About Us
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories. We are dedicated to uplifting everyone, everywhere by being the best way to pay and be paid.
At Visa, you'll have the opportunity to create impact at scale — tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world.
Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.
Job Description
The Site Reliability Engineer is responsible for supporting the deployment and configuration of monitoring and logging tools, automating routine operational tasks, and maintaining observability tools such as Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, and CloudWatch. This role works closely with team members to implement and maintain monitoring solutions across development, staging, and production environments, and contributes to the setup and maintenance of CI/CD pipelines to support automated build, test, and deployment processes.
The engineer provides support in managing cloud infrastructure (AWS, Azure) to ensure availability and security, learns and applies DevOps and SRE best practices, and assists with the implementation and management of containerization technologies like Docker and Kubernetes. Responsibilities include monitoring system performance, identifying and escalating issues, participating in troubleshooting and root cause analysis for production incidents, and creating and updating documentation for infrastructure and operational procedures.
All roles require digital fluency, including the ability to work with emerging technologies such as Generative AI tools (e.g. ChatGPT, Microsoft Copilot) to support everyday work.
Key Responsibilities:
- Support deployment and configuration of monitoring and logging tools.
- Automate routine operational tasks to improve efficiency and support system integration.
- Assist with maintenance and management of observability tools (Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, ElasticSearch, OpenSearch, CloudWatch).
- Implement and maintain monitoring solutions in development, staging, and production environments.
- Contribute to setup and maintenance of CI/CD pipelines for automated build, test, and deployment.
- Provide support in managing cloud infrastructure (AWS, Azure) for availability and security.
- Use infrastructure as code tools (Terraform, Ansible, CloudFormation) for environment configuration.
- Monitor system performance and assist in identifying and escalating issues.
- Support implementation and management of containerization technologies (Docker, Kubernetes).
- Participate in troubleshooting and root cause analysis for production incidents.
- Create and update documentation for infrastructure, processes, and operational procedures.
- Provide first-level support for routine infrastructure and deployment issues, escalating complex problems as needed.
- Seek opportunities to automate repetitive tasks and suggest workflow improvements.
This is a remote position. A remote position does not require that job duties be performed within proximity of a Visa office location. Employees in remote positions may be required to be present at a Visa office with scheduled notice.
Qualifications
Basic Qualifications:
- Bachelor's degree, OR 3+ years of relevant work experience
Preferred Qualifications:
- Hands-on experience designing and operating cloud-native infrastructure.
- Knowledge of Infrastructure as Code (Terraform), including contributing to reusable modules and platform components.
- Good understanding of Kubernetes and container orchestration concepts.
- Familiarity with CI/CD systems, pipeline configuration, automation, and secure deployment practices.
- Basic understanding of database technologies including SQL, NoSQL, and common data storage patterns.
- Experience using observability tools and stacks (Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar).
- Basic automation experience using Bash, Python, or Ansible-like tools.
- Strong problem-solving skills with demonstrated ability to reduce toil, address technical debt, and improve system stability.
- Availability to participate in on-call rotations, incident response, and post-incident reviews.
- Clear written and verbal English communication skills.
Visa is an EEO Employer
Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.