Mastercard is a leading global payments and technology company seeking a Lead DevOps Engineer to join the Foundry R&D team. This role drives platform infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. You will design secure, scalable, and repeatable systems using Infrastructure as Code (IaC) to support complex R&D workloads.
Key Responsibilities
- Drive Platform Infrastructure: Own DevOps and infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. Design secure, scalable, repeatable systems using Infrastructure as Code (IaC) to support R&D workloads.
- Build Secure CI/CD & Automation Systems: Enable secure tool access, workload isolation, and infrastructure for LLM-backed APIs and MCP servers, while partnering with security and compliance teams on access control, infrastructure governance, and auditability.
- Ensure Reliability & Observability: Implement monitoring, logging, and alerting. Tune observability for ML-specific workloads to ensure performance, reliability, and operational insight.
- Provide Technical Leadership: Offer hands-on leadership across DevOps and platform initiatives. Review code, enforce best practices, improve tooling, and promote clean, well-tested infrastructure.
- Cross-Functional Collaboration: Partner with ML, software, and platform engineers to design deployment strategies, scope work, manage agile deliverables, and meet milestones.
Requirements and Qualifications
- Education: Bachelor's degree in Computer Science, Engineering, or a related field.
- Experience: 8–12+ years of proven experience in DevOps, SRE, or platform engineering, including senior or lead roles. Experience architecting and operating production-grade infrastructure, especially infrastructure supporting AI/ML workloads.
- Cloud Expertise: Strong skills in cloud platforms (AWS, Azure, or GCP) and AI/ML components such as Databricks, Azure ML, and MLflow.
- Infrastructure as Code: Expertise in Terraform and IaC orchestration tools such as Terragrunt. Strong experience with configuration management and GitOps practices.
- Containerization: Expertise in Kubernetes and Docker, including their use to streamline ML development workflows. Experience with container security, networking, and cluster management at scale.
- Programming & Scripting: Advanced Bash and Python skills and strong software engineering fundamentals. Familiarity with Go or other systems programming languages is a plus.
- CI/CD & Automation: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools.
- Monitoring & Observability: Experience with monitoring stacks such as Prometheus, Grafana, Splunk, and ELK.
- Security & Networking: Knowledge of security best practices for MLOps, including data privacy, compliance, access controls, and encryption. Understanding of secure networking protocols such as mTLS.
Preferred Skills
- Hands-on experience with Databricks (workspace administration, cluster management, Unity Catalog, Delta Lake).
- Advanced experience with Azure ML, SageMaker, or similar ML platforms.
- Knowledge of ML frameworks like TensorFlow, PyTorch, or Scikit-learn.
- Experience implementing self-service platform automation or internal developer platforms (IDPs).
How to Apply
Interested and qualified? Go to the Mastercard careers portal to apply.