Experienced Lead Data Engineer with a demonstrated history of working in the financial services industry and executing large-scale data projects. Skilled in Data Engineering, Big Data, Machine Learning, and Algorithm Design.
Experience
2019 — Now
San Francisco Bay Area
1) Managed, architected, developed, and maintained several petabyte-scale data pipelines (generating over 100 TB/day) for data products such as B2B web analytics and intent. (Tools: GCP Dataproc, AWS EMR, GCP BigQuery, GCP Dataflow, GCP Composer, AWS EKS, GCP GKE)
2) Developed and migrated high-volume web analytics and business-identification applications between clouds to reduce costs (e.g., Qubole -> GCP Dataproc, EMR), saving upwards of $300k/year.
3) Architected cloud cost-reduction tooling, saving upwards of $1M/year.
4) Developed systems combining algorithms, modeling, and statistical techniques to filter B2B intent signals from DSP auctions (a sketch follows the skills list below).
5) Migrated ETL pipelines from Airflow on GCP Composer to Astronomer.
6) Developed and contributed to the integration of back-end data systems with middleware and the platform.
7) Managed principal-level engineers delivering projects in a high-velocity environment.
8) Hired and mentored engineers to add strong talent to Demandbase Engineering.
Skills: Scala, Python, Spark, AWS, GCP
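Illustrative sketch of the intent-signal filtering in item 4: a Spark job that attributes DSP bid requests to business accounts and keeps high-relevance events. All schemas, paths, thresholds, and the IP-to-account lookup are hypothetical assumptions, not the actual Demandbase pipeline.

// Minimal sketch: filtering B2B intent signals from DSP bid-request logs.
// Schemas, paths, and thresholds are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IntentSignalFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("intent-signal-filter").getOrCreate()
    import spark.implicits._

    // Raw DSP auction events: one row per bid request (hypothetical schema).
    val auctions = spark.read.parquet("s3://bucket/dsp-auctions/dt=2019-06-01/")

    // Mapping of IP prefixes to business accounts (hypothetical lookup table).
    val ipToAccount = spark.read.parquet("s3://bucket/ip-account-map/")

    // Keep only auctions attributable to a known business network, on
    // business-relevant pages, and above a simple relevance threshold.
    val intentSignals = auctions
      .withColumn("ipPrefix", regexp_extract($"ip", """^(\d+\.\d+\.\d+)""", 1))
      .join(broadcast(ipToAccount), Seq("ipPrefix"))
      .filter($"pageCategory".isin("b2b_software", "finance", "security"))
      .filter($"relevanceScore" >= 0.7)
      .select($"accountId", $"pageUrl", $"keyword", $"relevanceScore", $"ts")

    intentSignals.write.mode("overwrite")
      .parquet("s3://bucket/intent-signals/dt=2019-06-01/")
    spark.stop()
  }
}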
2018 — 2019
Emeryville, California
Executed Realogy's vision of leveraging data to make Realogy a real estate leader.
Responsibilities:
1) Designed the technical architecture and developed innovative solutions for real estate products.
2) Leveraged technical resources to deliver software products and scale software development.
Selected projects:
* Master Data Management: Realogy has more than 20 business units doing broadly the same business, so we set out to build a centralized data enrichment platform for the organization. Common use cases: agents change jobs, work across brokerages/brands, and hold roles at multiple companies/offices simultaneously under different identities; leads arrive from different channels (e.g., Zillow, Facebook) with duplicated, low-quality data. We built a 360-degree, timeline-based view of each real-world person by cleansing, deduplicating, consolidating, and enriching data attributes, with API-, batch-, and streaming-based ingestion from and exposure to multiple sources. Owned end-to-end design (infra/network/DB), architecture, data modeling, infrastructure as code, and development. (Scala/Java/Go, CloudFormation, ECS services, Postgres, Kafka, MDM software) An illustrative sketch of the deduplication step follows.
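A minimal sketch of the deduplication/survivorship step described above, assuming hypothetical field names and a simple latest-record-wins rule; the production MDM system would use richer fuzzy matching.

// Minimal sketch of MDM deduplication: blocking on a normalized key, then
// keeping the most recently updated record per real-world person.
// Field names and the matching rule are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object AgentDedup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("agent-dedup").getOrCreate()
    import spark.implicits._

    // Agent records collected from the ~20 business units (hypothetical schema).
    val agents = spark.read.parquet("s3://bucket/mdm/raw-agents/")

    // Blocking key: normalized name + primary email. Real matching would add
    // fuzzy comparison (phone, license number, address) within each block.
    val keyed = agents.withColumn(
      "blockKey",
      concat_ws("|", lower(trim($"fullName")), lower(trim($"email"))))

    // Survivorship rule: latest update wins inside each block.
    val w = Window.partitionBy($"blockKey").orderBy($"updatedAt".desc)
    val golden = keyed
      .withColumn("rank", row_number().over(w))
      .filter($"rank" === 1)
      .drop("rank", "blockKey")

    golden.write.mode("overwrite").parquet("s3://bucket/mdm/golden-agents/")
    spark.stop()
  }
}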
2018 — 2018
Oakland, CA
* Built replication monitoring and remediation software for several TB of fan-in, multi-tenant MySQL databases replicating to Redshift (Airflow-based replication). Involves joining 20 billion row hashes from MySQL against Redshift datasets; guarantees data integrity and consistency and has significant business and brand impact. (EMR, Spark, Airflow, Scala, Python, MySQL, Redshift) A sketch of the hash comparison appears after this list.
* Built fast-refreshing ETLs populating data warehouses and data marts for transaction consistency (auth, loads, PIN), analytics, and stats across data entities. (Redshift, Airflow)
* Developed ETL stats consumed by fraud systems; also worked on a Kafka Streams based real-time analytics pipeline. (Redshift, MySQL, Airflow, Python)
* Supported testing and validation of third-party, fan-in MySQL binlog-based real-time replication to Redshift via Kafka. (MySQL, Redshift)
* Researched the Debezium framework for MySQL-to-Kafka fan-in replication.
* Recruited and mentored engineers; handled other administrative tasks.
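A minimal sketch of the hash comparison behind the replication audit in the first bullet, assuming hypothetical table layouts and that per-row hashes are exported from both sides.

// Minimal sketch of the replication integrity check: per-row hashes computed
// on the MySQL side are joined against hashes of the replicated Redshift rows,
// and mismatches are flagged for remediation. Layout is an assumption.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ReplicationAudit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("replication-audit").getOrCreate()
    import spark.implicits._

    // Snapshots of (tenantId, tableName, pk, rowHash) exported from each side.
    val sourceHashes = spark.read.parquet("s3://bucket/audit/mysql-hashes/")
    val targetHashes = spark.read.parquet("s3://bucket/audit/redshift-hashes/")

    // Full outer join: rows missing on either side, or present on both sides
    // with differing hashes, indicate replication drift.
    val drift = sourceHashes.alias("src")
      .join(targetHashes.alias("tgt"), Seq("tenantId", "tableName", "pk"), "full_outer")
      .filter($"src.rowHash".isNull || $"tgt.rowHash".isNull ||
              $"src.rowHash" =!= $"tgt.rowHash")
      .select($"tenantId", $"tableName", $"pk",
              $"src.rowHash".as("sourceHash"), $"tgt.rowHash".as("targetHash"))

    // Persist mismatches so an Airflow remediation task can re-replicate them.
    drift.write.mode("overwrite").parquet("s3://bucket/audit/drift/")
    spark.stop()
  }
}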
2014 — 2017
San Francisco Bay Area
* Data Platform ETL: data from MySQL, Mongo, S3, and other sources is enriched with inferred and derived information. The Cascading framework is used for data operations with Hadoop MapReduce support: a set of source taps extracts data from the input sources, flows perform conformation and transformation, and sink taps write the output to S3 and Redshift, where it is ingested by BIRST BI. Supported design and development since the project's inception. (Cascading, EMR, Java, MySQL, Redshift)
* Developed and maintained the financial RollForward process and automated the loan-security-backed creditor funding process. (Java, Scriptella, MySQL, Gradle, Groovy)
* Architected, designed, and developed an end-to-end automated GAAP accounting and reporting system, with features such as investor portfolio management, revenue recognition, accruals, receivables, and risk allowances. (Java, Scriptella, MySQL, Gradle, Groovy)
* Worked with teams on the financial engine's data model design and workflows. (MySQL)
* Designed and developed a new real-time, loan-level General Ledger system following GAAP standards; supports investor portfolio management and provides metrics and stats on loan performance. (MySQL)
* Developed data lakes, reporting projects, and back-end financial processes as needed for the accounting, compliance, risk, and analytics teams. (Java, Redshift, Scriptella, MySQL)
* Developed a Spark-based solution for processing large historical datasets for a back-end financial process (a sketch follows this list). (Scala, Spark, EMR, Docker)
* Developed resilient financial processing systems, automated processes, and CI/CD.
* Dockerized development projects to automate integration testing.
* Built a POC of Debezium MySQL/Mongo-to-Kafka replication and a framework for serverless dataset querying using Hive and Sqoop.
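A minimal sketch of the kind of Spark batch job in the historical-datasets bullet above, assuming a hypothetical loan-event schema; the real back-end financial process is not specified here.

// Minimal sketch: replaying years of immutable loan events (payments,
// charge-offs, adjustments) to rebuild monthly net balance changes.
// Event schema and output layout are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HistoricalLoanBalances {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("historical-loan-balances").getOrCreate()
    import spark.implicits._

    // Historical loan events accumulated over several years.
    val events = spark.read.parquet("s3://bucket/finance/loan-events/")

    // Net change per loan per accounting month; a cumulative sum over months
    // would then yield period-end balances.
    val balances = events
      .withColumn("month", date_format($"eventDate", "yyyy-MM"))
      .groupBy($"loanId", $"month")
      .agg(sum($"signedAmount").as("netChange"))

    balances.write.mode("overwrite")
      .partitionBy("month")
      .parquet("s3://bucket/finance/monthly-balances/")
    spark.stop()
  }
}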
2013 — 2014
San Francisco Bay Area
* Led a team of 6 across SF (2) and Vietnam (4).
* Designed and developed a new Java-based, high-performance web portal.
* Integrated automated SDK/package releases into the web portal.
* Integrated the security product certification workflow into the web portal.
* Integrated and developed customer support software (Zendesk, Desk, Magnolia).
* Architected and engineered applications, reviewed code, and ensured TDD.
* Ensured timely delivery of quality products to the business.
* Mentored and motivated engineers to stay focused on OPSWAT's goals.
Languages & software: Java stack (95%) (Spring, Hibernate, MVC, JPA); shell, Python, Perl (5%); JavaScript, jQuery, XML, HTML; Tomcat, Apache2; SAML2 SSO.
Databases: MySQL, MongoDB.
Education
University of Southern California
Master's degree
Kakatiya University