Designed, built, and managed a scalable data ingestion service for inventory data from some of the world’s largest e-commerce retailers
• Processed billions of lines of feed data, with tens of millions of products indexed daily, on top of Hadoop Cascading
• Customized Hadoop InputFormat and RecordReader implementations for a variety of data formats
• Modularized ingestion operations such as merge and aggregation to increase reusability
• Simplified complex custom data integrations by introducing a DSL-based module orchestration approach, cutting the average integration time in half
• Minimized operational cost by selecting the appropriate machine type and Hadoop configuration