•Led the team building experimentation data infrastructure at Twitter, partnering closely with data science teams.
•Built data warehousing solutions to make Twitter's Ads revenue data available, leveraging large-scale distributed data processing technologies.
•Built a petabyte-scale data warehouse on GCP using BigQuery. Built and maintained legacy and new data pipelines to transfer data between HDFS, MySQL, BigQuery, and Druid. Used tools including BQE, BQR, Scalding, BigQuery-to-Druid data migration tools, Apache Airflow for orchestration, and Aurora for workflow deployment.
•Implemented the data layer for metrics and dimensions in Twitter's Experimentation dashboard, which played a critical role in ship/no-ship decisions for Ads and user experiments launched by revenue teams.
Tools used - BigQuery, Dataflow, Airflow, GCP, Scalding, MapReduce, Scala, batch processing, data engineering, Hadoop, Druid (distributed analytics platform), Kafka, Aurora, data quality monitoring (Great Expectations)
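For illustration only (not code from the actual Twitter pipelines): the data-quality monitoring mentioned above, done in production with Great Expectations, can be sketched in plain Python. All function names, column names, and thresholds here are hypothetical.

```python
# Illustrative sketch of batch data-quality checks in the spirit of
# Great Expectations, using only the standard library. The batch
# contents and expectation names below are hypothetical examples.

def expect_no_nulls(rows, column):
    """True if no row has a missing/None value in `column`."""
    return all(row.get(column) is not None for row in rows)

def expect_row_count_between(rows, low, high):
    """True if the batch size falls within the expected range."""
    return low <= len(rows) <= high

# Hypothetical batch of Ads revenue records.
batch = [
    {"ad_id": 1, "revenue_usd": 12.5},
    {"ad_id": 2, "revenue_usd": 3.0},
]

# Run the expectation suite; a pipeline would fail or alert on any False.
checks = {
    "no_null_revenue": expect_no_nulls(batch, "revenue_usd"),
    "row_count_ok": expect_row_count_between(batch, 1, 1_000_000),
}
print(all(checks.values()))
```

In a real deployment such checks would run as a gating step in the Airflow DAG, blocking downstream loads into BigQuery or Druid when an expectation fails.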