Cloud Engineer III
Love's Travel Stops & Country Stores
About the Role
Benefits:
- Fuel Your Growth with Love's - company funded tuition assistance program
- Paid Time Off
- Flexible Scheduling
- 401(k) – 100% Match up to 5%
- Medical/Dental/Vision Insurance after 30 days
- Competitive Pay
- Career Development
- Hiring Immediately
Welcome to Love's:
Our Cloud Platform Operations Engineer is responsible for the day-to-day operation, stability, and continuous improvement of a primarily AWS-based cloud platform, along with a distributed set of edge compute environments. This role operates in a highly dynamic, ticket-driven environment, ensuring rapid incident resolution, high system reliability, and consistent service delivery. This position goes beyond traditional support by leveraging automation and AI-enabled tools to enhance troubleshooting, streamline operational workflows, and proactively identify and resolve systemic issues. The role partners across infrastructure, engineering, and security teams to ensure the platform is scalable, secure, and optimized for performance and cost.
Job Functions:
- Operate and support AWS infrastructure, including EC2, S3, IAM, VPC, and associated services in a production environment
- Respond to and resolve Tier II/III incidents impacting cloud and edge systems, ensuring timely restoration of service
- Troubleshoot complex issues across compute, storage, networking, and access layers using logs, metrics, and system data
- Perform root cause analysis and implement corrective actions to prevent recurrence
- Identify recurring operational issues and translate them into scalable solutions through automation or design improvements
- Maintain and enhance monitoring, alerting, and observability practices to improve system reliability
- Design, implement, and maintain Infrastructure-as-Code solutions (e.g., Terraform, CloudFormation)
- Develop and maintain automation scripts to reduce manual intervention and improve operational efficiency
- Apply an automation-first mindset to all repeatable operational processes
- Leverage AI tools (e.g., LLMs, copilots, log analysis platforms) to accelerate diagnostics, summarize system behavior, and recommend remediation actions
- Contribute to the development of AI-enabled operational capabilities, including knowledge retrieval, intelligent runbooks, and workflow automation
- Identify opportunities to integrate AI into operational processes to improve speed, accuracy, and scalability
- Monitor system performance, availability, and cost utilization, proactively addressing anomalies
- Optimize cloud spend by identifying underutilized or misconfigured resources and implementing cost controls
- Support reliability engineering practices to improve uptime and service resilience
- Support edge compute and hypervisor environments, including connectivity, synchronization, and integration with cloud platforms
- Troubleshoot hybrid infrastructure issues spanning on-premise/edge and cloud environments
- Develop and maintain runbooks, knowledge articles, and standard operating procedures
- Ensure documentation reflects current-state architecture and operational practices
- Contribute to a culture of operational discipline, knowledge sharing, and continuous improvement
Experience and Qualifications:
- Bachelor’s degree in Computer Science or a related discipline such as Information Technology, Software Engineering, or Computer Engineering is required
- 4–6+ years of experience supporting cloud or infrastructure environments in a production setting
- Demonstrated experience operating and supporting AWS-based platforms at scale
- Proven track record of resolving complex, real-world infrastructure issues end-to-end (compute, storage, networking, access)
- Hands-on experience implementing Infrastructure-as-Code solutions in a production environment (Terraform preferred)
- Experience working in ticket-driven, operational support environments with defined SLAs and incident management processes
- Exposure to hybrid or distributed environments, including edge compute or virtualization platforms
- Practical experience applying AI tools to infrastructure operations (e.g., log analysis, incident triage, workflow automation)
- Cloud Platforms: AWS (EC2, S3, IAM, VPC, networking fundamentals)
- Infrastructure as Code: Terraform (preferred), CloudFormation
- Configuration Management: Ansible or similar tools
- Systems Administration: Linux and/or Windows server environments
- Troubleshooting: Root cause analysis across compute, storage, networking, and access layers
- Monitoring & Observability: Experience with logging, metrics, and alerting tools
- Networking Fundamentals: DNS, routing, firewalls, and connectivity troubleshooting
- Automation & Scripting: Ability to automate repetitive operational tasks (e.g., Python, Bash, or similar)
- Cloud Security Fundamentals: IAM, access controls, and security best practices
- AI Tool Application: Use of LLMs, copilots, or log analysis platforms to support diagnostics and operational efficiency
- Edge/Hybrid Infrastructure (Preferred): Virtualization, edge compute, and cloud synchronization concepts
- Operational Excellence: Strong sense of ownership, accountability, and urgency in maintaining system stability and performance
- Analytical Thinking: Structured problem-solving with the ability to diagnose and resolve complex issues under pressure
- Automation Mindset: Proactively identifies opportunities to eliminate manual effort and improve efficiency
- Adaptability: Effective in fast-paced, ticket-driven environments with shifting priorities
- Collaboration: Works effectively across infrastructure, engineering, and security teams to drive outcomes
- Communication: Translates technical issues into clear, actionable insights for both technical and non-technical stakeholders
- Continuous Improvement: Actively seeks opportunities to enhance processes, tools, and platform reliability
- Learning Agility: Stays current with evolving cloud technologies, automation practices, and AI capabilities
- Enterprise Mindset & Team Contribution: Proactively supports broader team and enterprise priorities by contributing to initiatives beyond core responsibilities; steps in where needed to drive outcomes and ensure collective success
Our Culture:
Love's has fueled customers' journeys since 1964, and innovation leads the way for this family-owned and operated business headquartered in Oklahoma City. The company has nearly 40,000 team members. Travel stops are the core business, along with products and services that provide value for professional drivers, fleets, the traveling public, RVers, and alternative energy and wholesale fuel customers. Giving back to communities and an inclusive workplace are hallmarks of the award-winning culture.
Love's is an Equal Opportunity Employer. Veterans encouraged to apply.