Senior Azure Cloud Infrastructure Engineer (Healthcare AI Platform)
CIVIE
About the Role
Overview
We're looking for a Senior Azure Cloud Infrastructure Engineer to design, build, and operate a highly resilient, secure, and cost-efficient cloud platform supporting advanced AI workloads in a healthcare environment.
This role is responsible for mission-critical infrastructure powering our proprietary foundational AI model, including GPU-based compute, while meeting strict requirements for compliance, data protection, and high availability. You will play a key role in ensuring our systems are fault-tolerant, auditable, and continuously optimized for both performance and cost.
What You'll Do
- Architect and manage highly available, fault-tolerant systems on Microsoft Azure with multi-region redundancy and disaster recovery
- Design infrastructure with strict adherence to healthcare compliance standards (e.g., HIPAA, HITRUST, SOC 2)
- Provision and optimize GPU-based environments for AI/ML workloads, including large-scale model training and inference
- Build secure, zero-trust architectures (private networking, encryption, identity isolation, least privilege access)
- Implement backup, failover, and business continuity strategies with clearly defined RTO/RPO targets
- Continuously reduce infrastructure costs through intelligent scaling, reserved capacity, spot instances, and workload optimization
- Develop Infrastructure as Code (Terraform, Bicep, ARM) for repeatable, auditable deployments
- Partner with AI/ML teams to productionize and scale foundational models reliably
- Establish observability across systems (logging, monitoring, alerting) with proactive incident response
- Conduct architecture reviews, risk assessments, and security audits
Required Experience
- 5–8+ years of hands-on experience with Microsoft Azure cloud infrastructure
- Proven experience designing high-availability and disaster recovery systems in regulated environments
- Strong background in healthcare or other compliance-heavy industries
- Deep expertise in:
  - Azure Virtual Machines, VM Scale Sets, and GPU compute
  - Azure networking (VNets, Private Link, ExpressRoute, firewalls)
  - Storage solutions (Blob, Files, managed disks with redundancy options)
- Experience implementing compliance frameworks such as HIPAA or SOC 2
- Strong knowledge of identity and access control (RBAC, Microsoft Entra ID (formerly Azure AD), managed identities)
- Experience with Kubernetes (AKS) and containerized workloads
- Proficiency in scripting (Python, Bash, PowerShell)
Preferred Qualifications
- Experience with Azure AI ecosystem (Azure Machine Learning, Azure AI Foundry, Cognitive Services)
- Familiarity with distributed training, model parallelism, and GPU orchestration
- Experience implementing MLOps pipelines in regulated environments
- Azure certifications (Solutions Architect Expert, Security Engineer Associate, DevOps Engineer Expert)
- Experience with zero-downtime deployments and blue/green or canary strategies
Infrastructure Expectations
- Multi-region architecture with automated failover
- End-to-end encryption (data at rest and in transit)
- Segmented environments (dev/staging/prod) with strict isolation
- Real-time monitoring and alerting with defined SLAs
- Automated backup and recovery with regular testing
- Cost visibility and governance across all resources
What Success Looks Like
- Near-zero downtime systems with tested failover capabilities
- Full compliance readiness with audit trails and documentation
- Efficient GPU utilization supporting AI workloads at scale
- Measurable reduction in cloud spend without compromising reliability or security
- Seamless collaboration between infrastructure and AI teams
Why This Role Matters
You will be building the backbone of the next-generation healthcare AI platform, where reliability, security, and performance directly impact real-world outcomes. This is not just infrastructure; it is critical systems engineering at the intersection of cloud, AI, and healthcare.
We offer
- Paid vacation, sick time, and personal days
- 11 company-paid holidays
- Quarterly UberEats voucher
- Monthly Fringe benefits
- Flexible work schedules
- Professional development stipend
- Health, dental, and vision benefits, with employer HSA contribution
- Short-term disability (STD), long-term disability (LTD), and life insurance
- 401(k) company match and profit sharing
Collaborative Imaging Technology, LLC provides equal employment opportunities for all applicants and employees. All qualified applicants will be considered regardless of an individual’s race, color, sex, gender identity or expression, religion, age, national origin, citizenship, physical or mental disability, medical condition, family care status, marital status, domestic partner status, sexual orientation, military or veteran status, or any other basis protected by federal, state or local laws. If you cannot submit your application due to a disability, please email [email protected]; we will reasonably accommodate individuals with disabilities to the extent required by applicable law.