Sao Leopoldo, Rio Grande do Sul
Job Summary
As a Site Reliability Engineer at Staples, you will collaborate with a business-critical team of engineers responsible for the performance and availability of the B2B and B2C sites of one of the top eCommerce companies in the United States. You will be a key contributor to the success of our Public Cloud Adoption initiative, a program that will deliver critical technology and tangible business value using the latest cloud technologies. We are looking for a highly motivated and experienced Site Reliability Engineering lead who wants to grow their career and work with cutting-edge tools and technologies. The Senior Site Reliability Engineer must have a proven track record of supporting B2B and B2C sites and their integrations, both on-premises and in the public cloud, with demonstrated expertise in related technologies. We are looking for an experienced full-stack engineer who wants to innovate, automate, and transform the enterprise.
Key Responsibilities
- Engage and collaborate with cross-functional Product, Engineering, Security, Operations, and Infrastructure teams, as well as vendors, to improve MTTD and MTTR
- Design, develop, and implement infrastructure & application monitoring to ensure optimal platform availability and performance
- Research, analyze and recommend approaches for solving challenging operational issues
- Develop and maintain robust knowledge documentation for the Site Reliability Engineering team and its partners
- Proactively perform analysis and identify opportunities to innovate, automate, improve efficiency, and achieve cost savings
Skill Requirements
- Bachelor’s degree in Computer Science or a related field, with continuous and progressive experience
- Minimum of 5 years of related experience working with some of these technologies:
  - Application Performance Management and Monitoring tools such as New Relic, AppDynamics, SiteSpect, and Datadog
  - Infrastructure monitoring tools such as Zabbix and Prometheus
  - Databases such as MongoDB, Oracle, Couchbase, Redis, and MySQL
  - Frameworks such as Dust/Angular, Node.js, and Spring Boot
  - Log analytics tools such as Splunk and ELK/Elastic
  - Digital experience tools such as Fullstory
- 2 to 5 years of experience with Cloud Technologies, at least half of which should be on the Microsoft Azure platform
- Strong hands-on experience with infrastructure and services (systems, network, cloud technology, provisioning, storage, etc.)
- Strong experience programming in one or more scripting languages or tools (Python, Azure CLI, or PowerShell)
- Hands-on experience with tool sets related to automation, orchestration, and managing infrastructure (Terraform, Puppet, Ansible, or Jenkins)
- Experience configuring, deploying, and administering infrastructure and application monitoring tools that assist in troubleshooting performance and stability issues in a cloud environment
Other Requirements
- Awareness of AI/ML applications in observability and incident response
- Familiarity with LLMs and AI-driven automation tools
- Understanding of AI-enhanced anomaly detection and predictive analytics