Benefits:

Fuel Your Growth with Love's - company-funded tuition assistance program
Paid Time Off
Flexible Scheduling
401(k) – 100% Match up to 5%
Medical/Dental/Vision Insurance after 30 days
Competitive Pay
Career Development
Hiring Immediately

Welcome to Love's: Our Cloud Platform Operations Engineer is responsible for the day-to-day operation, stability, and continuous improvement of a primarily AWS-based cloud platform, along with a distributed set of edge compute environments. This role operates in a highly dynamic, ticket-driven environment, ensuring rapid incident resolution, high system reliability, and consistent service delivery. This position goes beyond traditional support by leveraging automation and AI-enabled tools to enhance troubleshooting, streamline operational workflows, and proactively identify and resolve systemic issues. The role partners across infrastructure, engineering, and security teams to ensure the platform is scalable, secure, and optimized for performance and cost.

Job Functions:

Operate and support AWS infrastructure, including EC2, S3, IAM, VPC, and associated services in a production environment
Respond to and resolve Tier II/III incidents impacting cloud and edge systems, ensuring timely restoration of service
Troubleshoot complex issues across compute, storage, networking, and access layers using logs, metrics, and system data
Perform root cause analysis and implement corrective actions to prevent recurrence
Identify recurring operational issues and translate them into scalable solutions through automation or design improvements
Maintain and enhance monitoring, alerting, and observability practices to improve system reliability
Design, implement, and maintain Infrastructure-as-Code solutions (e.g., Terraform, CloudFormation)
Develop and maintain automation scripts to reduce manual intervention and improve operational efficiency
Apply an automation-first mindset to all repeatable operational processes
Leverage AI tools (e.g., LLMs, copilots, log analysis platforms) to accelerate diagnostics, summarize system behavior, and recommend remediation actions
Contribute to the development of AI-enabled operational capabilities, including knowledge retrieval, intelligent runbooks, and workflow automation
Identify opportunities to integrate AI into operational processes to improve speed, accuracy, and scalability
Monitor system performance, availability, and cost utilization, proactively addressing anomalies
Optimize cloud spend by identifying underutilized or misconfigured resources and implementing cost controls
Support reliability engineering practices to improve uptime and service resilience
Support edge compute and hypervisor environments, including connectivity, synchronization, and integration with cloud platforms
Troubleshoot hybrid infrastructure issues spanning on-premises/edge and cloud environments
Develop and maintain runbooks, knowledge articles, and standard operating procedures
Ensure documentation reflects current-state architecture and operational practices
Contribute to a culture of operational discipline, knowledge sharing, and continuous improvement

Experience and Qualifications:

Bachelor’s degree in Computer Science or a related discipline such as Information Technology, Software Engineering, or Computer Engineering is required
4–6+ years of experience supporting cloud or infrastructure environments in a production setting
Demonstrated experience operating and supporting AWS-based platforms at scale
Proven track record of resolving complex, real-world infrastructure issues end-to-end (compute, storage, networking, access)
Hands-on experience implementing Infrastructure-as-Code solutions in a production environment (Terraform preferred)
Experience working in ticket-driven, operational support environments with defined SLAs and incident management processes
Exposure to hybrid or distributed environments, including edge compute or virtualization platforms
Practical experience applying AI tools to infrastructure operations (e.g., log analysis, incident triage, workflow automation)
Cloud Platforms: AWS (EC2, S3, IAM, VPC, networking fundamentals)
Infrastructure as Code: Terraform (preferred), CloudFormation
Configuration Management: Ansible or similar tools
Systems Administration: Linux and/or Windows server environments
Troubleshooting: Root cause analysis across compute, storage, networking, and access layers
Monitoring & Observability: Experience with logging, metrics, and alerting tools
Networking Fundamentals: DNS, routing, firewalls, and connectivity troubleshooting
Automation & Scripting: Ability to automate repetitive operational tasks (e.g., Python, Bash, or similar)
Cloud Security Fundamentals: IAM, access controls, and security best practices
AI Tool Application: Use of LLMs, copilots, or log analysis platforms to support diagnostics and operational efficiency
Edge/Hybrid Infrastructure (Preferred): Virtualization, edge compute, and cloud synchronization concepts
Operational Excellence: Strong sense of ownership, accountability, and urgency in maintaining system stability and performance
Analytical Thinking: Structured problem-solving with the ability to diagnose and resolve complex issues under pressure
Automation Mindset: Proactively identifies opportunities to eliminate manual effort and improve efficiency
Adaptability: Effective in fast-paced, ticket-driven environments with shifting priorities
Collaboration: Works effectively across infrastructure, engineering, and security teams to drive outcomes
Communication: Translates technical issues into clear, actionable insights for both technical and non-technical stakeholders
Continuous Improvement: Actively seeks opportunities to enhance processes, tools, and platform reliability
Learning Agility: Stays current with evolving cloud technologies, automation practices, and AI capabilities
Enterprise Mindset & Team Contribution: Proactively supports broader team and enterprise priorities by contributing to initiatives beyond core responsibilities; steps in where needed to drive outcomes and ensure collective success

Our Culture:

Love's has fueled customers' journeys since 1964, and innovation leads the way for this family-owned and operated business headquartered in Oklahoma City. The company has nearly 40,000 team members. Travel stops are its core business, along with products and services that provide value for professional drivers, fleets, the traveling public, RVers, and alternative energy and wholesale fuel customers. Giving back to communities and fostering an inclusive workplace are hallmarks of its award-winning culture.

Love's is an Equal Opportunity Employer. Veterans encouraged to apply.

Cloud Engineer III
