Infrastructure Engineer (GPU & Compute)

Jobgether

Remote (US) · Senior Level · $180k - $200k/yr
Posted 1 day ago

Benefits

  • Medical coverage
  • Dental coverage
  • Vision coverage
  • Retirement programs
  • Paid time off
  • Holidays
  • Paid parental leave

Perks

  • Equity participation
  • Financial wellness
  • Wellness stipend
  • Home office stipend
  • Remote OK

Skills

  • Linux Systems Administration
  • GPU Diagnostics
  • Python
  • Bare-metal Provisioning
  • NVIDIA DCGM
  • Image Management
  • System Validation
  • PXE Workflows
  • IPMI
  • iDRAC
  • Redfish
  • Infrastructure Automation
  • Hardware Qualification
  • Virtualization
  • Performance Monitoring
  • Compute Scaling

About the Role

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an Infrastructure Engineer (GPU & Compute) in the United States.

This role is at the core of building and scaling high-performance infrastructure designed for modern AI and machine learning workloads. You will work across hardware, systems, and software layers to ensure GPU-enabled environments are reliable, efficient, and production-ready from day one. The position combines deep technical expertise with hands-on ownership of image pipelines, system validation, and large-scale compute environments. You will play a critical role in enabling seamless deployment and operation of cutting-edge AI infrastructure by improving automation, diagnostics, and performance. Collaborating with cross-functional teams, you will help bring new systems online, validate next-generation hardware, and enhance operational efficiency. This is a high-impact opportunity within a fast-paced, innovation-driven environment focused on scaling compute for the future of AI.

Accountabilities:

  • Own and evolve systems for image management, deployment, and validation across large-scale bare-metal and GPU-enabled infrastructure environments.
  • Maintain and operate validation clusters used for system diagnostics, testing, and infrastructure bring-up to ensure readiness and reliability.
  • Lead GPU diagnostics and validation workflows, identifying performance bottlenecks, failure patterns, and system-level issues across hardware and software layers.
  • Build and enhance automation tools and workflows (primarily in Python) to streamline provisioning, validation, and operational processes.
  • Support hardware qualification efforts for new platforms, including firmware, drivers, and operating system validation.
  • Manage Linux-based production and validation environments, including virtualization and bare-metal provisioning systems (e.g., PXE workflows).
  • Collaborate with infrastructure, hardware, data center, and ML teams to align systems with workload requirements and ensure optimal performance.
  • Contribute to best practices for infrastructure lifecycle management, system diagnostics, and scalability improvements.

Requirements:

  • 5+ years of experience in infrastructure engineering, systems engineering, or related technical roles.
  • Strong expertise in Linux systems administration within production or large-scale environments.
  • Hands-on experience with GPU-enabled systems and performance/monitoring tools such as NVIDIA DCGM.
  • Solid understanding of bare-metal provisioning, system bring-up processes, and image-based deployment workflows.
  • Proficiency in Python or similar programming/scripting languages for building automation tools.
  • Demonstrated ability to troubleshoot complex issues across hardware, operating systems, GPUs, and system software layers.
  • Familiarity with hardware management interfaces such as IPMI, iDRAC, or Redfish.
  • Experience working with data center infrastructure and physical hardware environments is highly valued.
  • Bonus: Experience with high-performance interconnects (InfiniBand, NVLink), AI/ML or HPC workloads, and large-scale hardware validation frameworks.

Benefits:

  • Competitive base salary ranging from $180,000 to $200,000 USD, based on experience and location.
  • Performance-based bonus and meaningful equity participation.
  • Comprehensive medical, dental, and vision coverage.
  • Retirement and financial wellness programs.
  • Generous paid time off, holidays, and paid parental leave.
  • Flexible remote or hybrid work options within the United States.
  • Professional development support and learning opportunities.
  • Wellness and home office stipends.
  • Inclusive and collaborative work environment focused on innovation and balance.


How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

 

 
