
Remote Job: Staff Site Reliability Engineer – Machine Learning Infrastructure at Wikimedia Foundation
Organization: Wikimedia Foundation
Position: Staff Site Reliability Engineer (Machine Learning Infrastructure)
Location: Remote (eligible countries in the Americas, Europe, and Africa)
Job Type: Full-time
Salary Range (U.S.): $129,347 – $200,824 (adjusted based on location outside the U.S.)
About This Opportunity
The Wikimedia Foundation, the non-profit powering Wikipedia and other free knowledge projects, is seeking a Staff Site Reliability Engineer focused on Machine Learning Infrastructure. This fully remote role is part of a distributed team building robust systems that empower machine learning researchers and engineers to deploy, scale, and monitor models in production.
This is a high-impact opportunity to shape the future of open-access AI tools at one of the most visited websites in the world.
Key Responsibilities
- Design, build, and maintain ML infrastructure for training, deployment, and monitoring
- Optimize system performance, availability, and scalability
- Collaborate with ML engineers, researchers, product teams, and volunteer contributors
- Lead infrastructure automation, observability, and monitoring
- Contribute documentation and mentor team members on reliability engineering best practices
- Ensure security, capacity planning, and operational excellence in ML systems
Required Qualifications
- 7+ years in SRE, DevOps, or infrastructure engineering roles
- Hands-on experience with on-premise ML infrastructure: Kubernetes, Docker, GPU acceleration
- Proficiency with automation tools: Terraform, Helm, Ansible, Argo CD
- Monitoring experience with Prometheus, Grafana, ELK Stack, etc.
- Familiarity with ML frameworks like PyTorch, TensorFlow, scikit-learn
- Strong communication skills and ability to work across global, asynchronous teams
Desired Strengths
- Expertise in scaling high-performance ML workloads
- Strong background in system reliability and operational automation
- Commitment to open-source software and decentralized knowledge access
- Experience mentoring and contributing to technical knowledge-sharing
What Wikimedia Offers
- Remote-first work across 40+ countries
- Competitive salary based on location and experience
- Transparent, mission-driven compensation structure
- Inclusive, diverse workplace committed to equity and openness
- Global impact on how billions access knowledge
Apply now to help build the machine learning infrastructure behind one of the world’s most important open knowledge platforms.
To apply for this job, please visit job-boards.greenhouse.io.