
AI Data Infrastructure Engineer

Jobgether

Location: Remote (US)
Employment: Full-time
Level: Senior Level
Posted 4 days ago

About the Role

This role focuses on designing, building, and operating large-scale data systems that power modern AI training and evaluation workflows. You will work on complex, high-throughput data infrastructures that support multimodal datasets and ensure high-quality data delivery for machine learning pipelines.

Skills

Python, Java, Scala, Go, Spark, Beam, Ray, Distributed Systems, Data Modeling, CI/CD, System Design, Multimodal Datasets, Data Pipeline Architecture, Dataset Versioning, Lineage Tracking, Data Governance

Benefits

  • Health Insurance
  • Dental Insurance
  • Vision Insurance
  • 401(k) Retirement Savings Plan
  • Paid Time Off

Perks

  • Wellness Support
  • Financial Security Programs
  • Work-life Balance Support
  • Professional Growth Opportunities
  • Fully Remote

Full job details

 

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Data Infrastructure Engineer in the United States.

This role focuses on designing, building, and operating large-scale data systems that power modern AI training and evaluation workflows. You will work on complex, high-throughput data infrastructures that support multimodal datasets and ensure high-quality data delivery for machine learning pipelines. The position combines deep data engineering expertise with a strong understanding of AI system requirements, including scalability, reliability, and performance optimization. You will contribute to building ingestion, transformation, validation, and dataset management systems that directly influence model quality and training efficiency. Working in a highly technical environment, you will collaborate with ML engineers and researchers to align data architecture with evolving AI needs. This is a hands-on, impactful role ideal for engineers passionate about large-scale systems and cutting-edge AI infrastructure.



Accountabilities:
  • Design, build, and maintain large-scale data pipelines supporting AI training, evaluation, and continuous model improvement workflows.
  • Develop ingestion and processing systems for multimodal datasets including text, image, audio, video, and structured data.
  • Implement data cleaning, deduplication, validation, and quality assurance processes at petabyte scale.
  • Build dataset versioning, lineage tracking, and reproducibility systems to ensure reliable AI training environments.
  • Optimize high-throughput data delivery systems to maximize compute and GPU utilization.
  • Collaborate with ML researchers and engineers to support dataset construction, evaluation pipelines, and AI model development needs.
  • Design scalable storage architectures and implement observability tools for data quality, performance, and pipeline health.
  • Ensure data governance, privacy compliance, and secure handling of sensitive datasets across systems.

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • 6+ years of experience in data engineering, preferably supporting machine learning or AI systems.
  • Strong proficiency in Python and at least one systems or JVM-based language (e.g., Java, Scala, Go).
  • Hands-on experience with distributed data processing frameworks such as Spark, Beam, or Ray.
  • Experience operating large-scale or petabyte-level data infrastructure systems.
  • Strong understanding of distributed systems, data modeling, storage formats, and pipeline architecture.
  • Experience with dataset versioning, lineage tracking, and ML reproducibility workflows.
  • Strong software engineering practices including testing, CI/CD, and system design.
  • Excellent communication skills and ability to work cross-functionally with technical teams.
  • Experience with multimodal datasets, privacy-aware systems, or AI training pipelines is a plus.

Benefits:

  • Competitive salary aligned with experience and expertise (W2 employment).
  • Full-time, long-term remote position within the United States.
  • Comprehensive benefits package (health, dental, vision, and wellness support).
  • 401(k) retirement savings plan and financial security programs.
  • Paid time off, holidays, and work-life balance support.
  • Opportunity to work on cutting-edge AI infrastructure and large-scale data systems.
  • Professional growth in advanced AI, distributed systems, and data engineering domains.



How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

 

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

 

 
