Key developer of Amazon's product defect detection system, whose outputs were consumed by product search to refine search results and published to millions of third-party sellers to improve online product data and support sales:
• A fully distributed application processing 40 billion records per build from 16 input sources into 3 output targets, with over 200 steps synchronized using Amazon Simple Workflow Service (SWF).
• Production builds ran on an Elastic MapReduce (EMR) cluster of 350 hosts.
• Integrated with SABLE, an internal highly available and scalable NoSQL storage and pre-computation service.
Technologies: Java, S3, EC2, Hadoop, Elastic MapReduce (EMR), SimpleDB, Simple Workflow Service (SWF)
Designed and implemented a general-purpose web service for catalog management:
• Owned the full life cycle of a scalable web service for updating the Amazon catalog, hiding the underlying complexity of interacting with various heterogeneous services.
• Integrated Amazon's smart reconciliation engine and brand normalization engine with the service.
Technologies: Java, Amazon Service Framework, SOA, Jetty, REST, SOAP, XML, Oracle
Created a community-based product improvement system:
• Owned it end to end, from the front end to back-end processing and the admin tool.
• The auto-approval engine was gradually enhanced from a manually tuned naive model to a random forest-based machine learning model.
Technologies: Java, JavaScript, HTML, CSS, Tomcat, Hibernate, Oracle, Simple Queue, Random Forest