3M has a long-standing reputation as a company committed to innovation. We provide the freedom to explore and encourage curiosity and creativity. We gain new insight from diverse thinking, and take risks on new ideas. Here, you can apply your talent in bold ways that matter.
Job Description:
Collaborate with Innovative 3Mers Around the World
Choosing where to start and grow your career has a major impact on your professional and personal life, so it’s important to know that the company you choose, and its leaders, will support and guide you. With a diversity of people, global locations, technologies and products, 3M is a place where you can collaborate with 93,000 other curious, creative 3Mers.
The Impact You’ll Make in this Role
3M is seeking a Platform Ops & Reliability Lead to join the Corporate Research Digital Platforms (CRDP) team to ensure the reliability, stability, and operational excellence of our enterprise data and AI platform, including Databricks and supporting cloud infrastructure (AWS). In this role, you will own the runtime health of the platform, serving as the primary leader for incident response, operational processes, and system reliability. You will work across Databricks, Temporal workflows, AWS infrastructure, and metadata systems to ensure that platform services operate predictably, scale effectively, and meet reliability expectations.
You will operate at the intersection of engineering and operations, partnering closely with Platform Engineers, Data Engineers, and Governance teams to translate platform capabilities into reliable production systems.
This role is critical in enabling scalable adoption of the platform by ensuring systems are stable, supportable, and operationally mature.
In this role, you will have the opportunity to:
Ensure the stability, availability, and operational health of the platform, including Databricks, workflows, integrated systems, and AWS infrastructure.
Define and monitor health, performance, and reliability metrics such as platform availability, job success rates, workflow completion, and incident resolution times.
Lead the management of critical incidents by coordinating technical teams, running war rooms, and driving root cause analysis with a focus on continuous improvement.
Implement and enhance SRE practices, observability, monitoring, alerting, and automation to improve platform reliability.
Design and operate the platform support model, including intake, triage, escalation, SLAs, and operational KPIs.
Monitor and support production pipelines and workflows, acting to prevent and resolve failures, delays, and service disruptions.
Ensure adherence to operational and governance standards in production, including access controls, naming conventions, and approved platform policies.
Partner with Platform Engineering, Data Engineering, and Governance teams to ensure operational readiness, effective support, and sustainable platform evolution.
Maintain up-to-date operational documentation, including runbooks, incident playbooks, and troubleshooting guides.
Your Skills and Expertise
To set you up for success in this role from day one, 3M requires the following qualifications:
Strong experience in infrastructure operations, cloud operations, SRE, or platform operations roles.
Experience supporting Databricks or modern data platforms (Lakehouse architectures).
Proven experience managing production systems with high availability, reliability, and operational rigor.
Experience leading incident management, including major incident response and root cause analysis.
Hands-on experience with monitoring and observability tools (e.g., Datadog, Grafana, CloudWatch, Dynatrace).
Experience supporting cloud-based platforms (AWS preferred), including troubleshooting infrastructure and services.
Strong understanding of IT operations, support models (L1/L2/L3), and service management practices (ITIL).
Experience working in cross-functional environments, coordinating between engineering, platform, and support teams.
Bachelor’s degree or higher in Computer Science, Engineering, or related technical field.
Proficiency in English.
Additional qualifications that could help you succeed even further in this role include:
Familiarity with workflow orchestration platforms such as Temporal, Airflow, or Step Functions.
Experience implementing SRE practices, including SLIs, SLOs, and reliability metrics.
Experience with cloud cost management and FinOps practices.
Familiarity with IAM, access control models, and security best practices in cloud environments.
Basic scripting or automation experience (Python, Bash, or similar).
Strong communication and coordination skills, with the ability to lead cross-team operational efforts.
Work location
This role follows an on-site working model, requiring the employee to work at least four days a week at 3M in Sumaré/SP.
Supporting Your Well-being
3M offers many programs to help you live your best life – both physically and financially. To ensure competitive pay and benefits, 3M regularly benchmarks with other companies that are comparable in size and scope.
Chat with Max
For assistance with searching through our current job openings or for more information about all things 3M, visit Max, our virtual recruiting assistant on 3M.com/careers.
Learn more about 3M’s creative solutions to the world’s problems at www.3M.com or on Instagram, Facebook, and LinkedIn @3M.
3M is an equal opportunity employer. 3M will not discriminate against any applicant on the basis of race, color, age, religion, gender, sexual orientation, gender identity or expression, nationality, or disability.
Safety is a core value at 3M. All employees are expected to contribute to a strong Environmental Health and Safety (EHS) culture by following safety policies, identifying hazards, and engaging in continuous improvement.
Please note: your application may not be considered if you do not provide your education and work history, either by: 1) uploading a resume, or 2) entering the information into the application fields directly.