Role Summary
The Lead AWS Data Engineer provides technical leadership and hands‑on execution for enterprise data platforms hosted on Amazon Web Services (AWS). This role leads the design, migration, modernization, and operation of cloud‑native data architectures supporting mission‑critical financial, advisor payout, mobility, and corporate analytics platforms.
The role owns end‑to‑end technical decision‑making for these platforms, including architecture, security, orchestration, and production readiness. It requires deep expertise in AWS data services, Python and PySpark development, workflow orchestration, data lake design, infrastructure as code, and operational excellence, along with the ability to mentor engineers and partner effectively with infrastructure, IAM, and database teams.
Key Responsibilities
Technical Leadership & Platform Ownership
- Serve as the technical lead and design authority for AWS data engineering initiatives across multiple enterprise platforms.
- Own architectural decisions related to scalability, reliability, security, and cost optimization of AWS data platforms.
- Define and enforce engineering standards, coding patterns, and operational best practices for cloud data pipelines.
- Provide hands‑on technical guidance, design reviews, and code reviews for data engineers.
AWS Data Platform Engineering
- Lead the design, development, and support of cloud‑native data pipelines using Amazon S3, AWS Glue (PySpark), MWAA (Apache Airflow), and AWS Step Functions.
- Drive on‑premises to AWS data platform migrations, including reverse engineering of legacy ETL workflows and re‑implementation using AWS‑native services.
- Re‑architect legacy Oracle Data Integrator (ODI)–based ETL processes into scalable PySpark‑based Glue jobs.
- Optimize Spark workloads for performance, memory usage, and cost efficiency in AWS Glue environments.
Data Lake, Iceberg & Architecture Design
- Architect and implement enterprise AWS data lakes using Medallion architecture (Bronze, Silver, Gold).
- Design and manage Apache Iceberg tables to support incremental processing, schema evolution, and efficient data lake operations.
- Establish standardized ingestion, transformation, and consumption patterns across financial, mobility, and corporate datasets.
- Ensure data quality, reconciliation, lineage, and auditability across all layers of the data platform.
Workflow Orchestration & Automation
- Lead orchestration strategy using MWAA (Managed Workflows for Apache Airflow).
- Design and implement Airflow DAGs in Python to orchestrate end‑to‑end workflows, including Glue jobs, validations, and downstream dependencies.
- Implement scheduling, retry logic, monitoring, and failure handling to ensure resilient and scalable pipelines.
- Integrate orchestration workflows with AWS services such as S3, Glue, and Athena, as well as Iceberg‑based data lakes and downstream systems.
Security, Infrastructure & AWS Networking
- Drive implementation of AWS security best practices, including IAM role design, least‑privilege access, encryption using AWS KMS, and secrets management.
- Lead configuration of AWS networking components such as VPC Endpoints (VPCE) to enable secure service‑to‑service communication.
- Manage infrastructure provisioning using Terraform, ensuring repeatable and auditable deployments across DEV, QA, and PROD environments.
- Coordinate with IAM, network, DevOps, and DBA teams to resolve access, firewall, and Oracle database connectivity challenges.
Production Readiness, Operations & Support
- Own production readiness for AWS data platforms, including configuration, secrets, access controls, and deployment planning.
- Act as the escalation point for complex production issues, performing root‑cause analysis and implementing permanent fixes.
- Implement logging, metrics, and alerting using Amazon CloudWatch to meet enterprise SLAs and availability targets.
- Support parallel‑run and hybrid architectures during migration phases to ensure business continuity.
Automation, Compliance & Regulatory Enablement
- Design and oversee Python‑based automation solutions supporting operational efficiency and compliance initiatives (e.g., file retention and document processing).
- Ensure pipeline designs meet regulatory, audit, and enterprise governance requirements, including traceability and controlled data handling.
Collaboration & Stakeholder Engagement
- Partner closely with data architects, DevOps teams, infrastructure teams, and business stakeholders to deliver AWS data solutions aligned with enterprise strategy.
- Translate business and platform requirements into scalable technical designs and execution plans.
- Produce technical documentation and support knowledge transfer to enable long‑term platform sustainability.
Required Qualifications
- Experience with Apache Iceberg or similar data lake table formats.
- Exposure to analytics or BI platforms (e.g., ThoughtSpot, Tableau).
- Exposure to Oracle databases or legacy ETL tools (e.g., ODI).
- Experience in financial services or regulated enterprise environments.
- Familiarity with CI/CD practices for data engineering workloads.