I work in the engineering division of the Data Services and Platform (DSP) team, formerly called Enterprise Data Warehousing (EDW), which oversees the Big Data platform and workflows at Expedia. I worked on a project called 'Jetstream', with the following highlights:
•Contributed to building highly available, scalable data lakes for various teams across Expedia.
•Built an ingestion system that replicates Hive data across data lakes.
•Built a Hive data analyzer that identifies which Hive partitions were updated and need to be replicated.
•Worked on systems that compress and compact Hive data to improve replication performance.
•Built an orchestration engine to coordinate various Hive data pipelines.
Big Data technologies used: Hadoop, Hive, HBase, Kafka, Presto, Elasticsearch, DynamoDB
AWS technologies used: Lambda, Data Pipeline, EMR, DynamoDB, API Gateway, Athena, ECS, EC2, S3, SNS
Deployment tools: Terraform, Jenkins
Languages: Java, TypeScript, Python