• Serve as a subject-matter expert for a Hadoop-based self-service data platform (DPaaS) executing 45K+ daily ETL job instances (Spark, MapReduce, Java, Python, Shell), maintaining high availability and SLA compliance across regional clusters.
• Lead full-stack development and operations across platform services, including UI, job scheduling and execution, messaging, authentication, alerting, monitoring, metrics dashboards, and SQL databases, with regular participation in on-call incident response.
• Build and maintain shared platform libraries, authentication utilities, connector frameworks, and data processing tooling to standardize secure job execution and enable scalable integration across clusters and environments (connector sketch below).
• Develop and maintain CI/CD pipelines and containerized deployment workflows that package and release application services and data processing artifacts across Kubernetes, HDFS, and VM/bare-metal environments (release sketch below).
• Implement and operate large-scale conversation-data scrubbing pipelines that use LLMs with custom system prompts and classification criteria to identify and remove PII and sensitive content, and build monitoring dashboards that track pipeline health, scrubbing accuracy, and compliance metrics (scrubbing sketch below).
• Implement and operate end-to-end PII joining and data-delivery systems, including APIs, storage, orchestration jobs, and client-facing tooling, ensuring accurate, compliant, and reliable distribution to 800+ customers across regional and multi-cloud platforms (delivery sketch below).
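
The connector framework mentioned in the third bullet can be illustrated with a minimal sketch. The BaseConnector interface, the scheme registry, and LocalFileConnector are hypothetical names invented for illustration, assuming connectors are registered per source type and resolved when a job launches; the real framework's API is not reproduced here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type

# Hypothetical minimal connector contract: each data source (HDFS, JDBC,
# object storage, ...) implements the same read/write interface so jobs
# stay source-agnostic.
class BaseConnector(ABC):
    def __init__(self, config: Dict[str, str]):
        self.config = config

    @abstractmethod
    def read(self, path: str) -> Iterator[bytes]: ...

    @abstractmethod
    def write(self, path: str, records: Iterator[bytes]) -> None: ...

# Registry mapping a scheme ("hdfs", "jdbc", ...) to its connector class.
_REGISTRY: Dict[str, Type[BaseConnector]] = {}

def register(scheme: str):
    def wrap(cls: Type[BaseConnector]) -> Type[BaseConnector]:
        _REGISTRY[scheme] = cls
        return cls
    return wrap

def get_connector(scheme: str, config: Dict[str, str]) -> BaseConnector:
    # Resolve the connector for a job at launch time.
    return _REGISTRY[scheme](config)

@register("local")
class LocalFileConnector(BaseConnector):
    def read(self, path: str) -> Iterator[bytes]:
        with open(path, "rb") as f:
            yield from f  # one record per line, for the sketch

    def write(self, path: str, records: Iterator[bytes]) -> None:
        with open(path, "wb") as f:
            for record in records:
                f.write(record)
```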
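For the CI/CD bullet, a containerized release step might look like the following, assuming a Docker registry and a kubectl-managed cluster; the registry URL, service name, version, and manifest path are placeholders, not the platform's real values.

```python
import subprocess

# Placeholder registry; not the platform's real endpoint.
REGISTRY = "registry.example.com/dpaas"

def run(*cmd: str) -> None:
    # Fail the release immediately if any step exits non-zero.
    subprocess.run(cmd, check=True)

def release(service: str, version: str, manifest: str) -> None:
    image = f"{REGISTRY}/{service}:{version}"
    run("docker", "build", "-t", image, f"./services/{service}")
    run("docker", "push", image)
    run("kubectl", "apply", "-f", manifest)  # declarative rollout
    run("kubectl", "rollout", "status", f"deployment/{service}")

if __name__ == "__main__":
    release("scheduler", "1.4.2", "k8s/scheduler.yaml")
```

Gating each step with check=True keeps a failed build or push from ever reaching the rollout stage.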
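The LLM-based scrubbing bullet is sketched below under stated assumptions: call_model stands in for whichever LLM client the pipeline uses, and the system prompt and classification criteria are illustrative, not the production rubric.

```python
import json
from typing import Callable

# Illustrative system prompt; the production prompt and rubric differ.
SYSTEM_PROMPT = (
    "You are a data-scrubbing classifier. Given a conversation turn, "
    'return JSON: {"contains_pii": bool, "categories": [...]}. '
    "Flag names, emails, phone numbers, addresses, and account IDs."
)
REDACTED = "[REDACTED]"

def scrub_turn(text: str, call_model: Callable[[str, str], str]) -> str:
    """Classify one conversation turn and redact it if the model flags PII.

    call_model(system_prompt, user_text) returns a raw JSON string; it is
    a stand-in for the real LLM client, which is not shown here.
    """
    verdict = json.loads(call_model(SYSTEM_PROMPT, text))
    return REDACTED if verdict.get("contains_pii") else text

# Stub model for local testing; the real pipeline calls an LLM.
def fake_model(system_prompt: str, text: str) -> str:
    has_pii = "@" in text  # toy heuristic standing in for the model
    return json.dumps({"contains_pii": has_pii, "categories": []})

if __name__ == "__main__":
    print(scrub_turn("reach me at jane@example.com", fake_model))
    print(scrub_turn("the job finished in 42 minutes", fake_model))
```

The accuracy metric named in that bullet would then come from sampling scrubbed turns against human labels and emitting the agreement rate to the monitoring dashboards.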
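Finally, a sketch of the PII-joining and delivery shape from the last bullet, under assumptions: records join to per-customer entitlements on a salted hashed key so raw identifiers never appear in delivery artifacts; hashed_key, the salt, and the record fields are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

def hashed_key(raw_id: str, salt: str = "example-salt") -> str:
    # Salted hash keeps raw identifiers out of delivery artifacts.
    return hashlib.sha256((salt + raw_id).encode()).hexdigest()

def build_deliveries(
    records: List[Dict[str, str]],       # e.g. {"user_id": ..., "payload": ...}
    entitlements: Dict[str, List[str]],  # customer -> entitled hashed keys
) -> Dict[str, List[Dict[str, str]]]:
    # Index records by hashed join key.
    by_key: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        key = hashed_key(rec["user_id"])
        by_key[key].append({"key": key, "payload": rec["payload"]})

    # Bucket rows per customer; each customer gets only entitled rows.
    deliveries: Dict[str, List[Dict[str, str]]] = {}
    for customer, keys in entitlements.items():
        deliveries[customer] = [row for k in keys for row in by_key.get(k, [])]
    return deliveries
```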