Java, Python, Hadoop, Cassandra, MongoDB, SQL Server, Oracle, PostgreSQL, MySQL, Hibernate, Spring, Camel, RabbitMQ, Elasticsearch, Kafka
Large-scale data integration, processing, storage and services on both relational and non-relational platforms.
Experience
2017 — Now
Redwood City, California, United States
2014 — 2017
Served as both a hands-on engineer and the manager of the team.
Led a team of engineers and data scientists building a machine learning system that provides classification and attribute extraction for Walmart eCommerce's catalog. The system is composed of knowledge stores, data services, training and evaluation processes, and data integration pipelines. It enables us to improve our classification results continuously, in an iterative and incremental fashion. We maintained or lifted our accuracy metrics while the catalog grew 4X over 1.5 years.
Roles:
• Took full ownership of the team's product features, product quality, productivity and delivery
• Built the team from scratch. Hired, motivated and grew team members
• Served as the technical leader. Mentored junior team members. Led or contributed to technical designs. Prototyped and explored forward-looking solutions
• Worked with business partners, product managers and other teams on product design, system architecture and project management
Individual Contribution:
* Designs:
• Architected the machine learning system that integrates knowledge storage, ML processes, web services and data integration processes
• Contributed to the initial design of the knowledge storage and its surrounding data services
• Contributed to the initial design of Walmart eCommerce's catalog data processing pipeline (micro-service based architecture)
• Participated in various API specification designs between teams
* Implementations:
• Implemented the initial REST API for classifications and the MapReduce jobs that process data for the knowledge storage
• Contributed to data services for our machine learning system
• Implemented a custom rule engine that executes classification rules 30X faster than the previous Drools implementation on an 8-core server and uses one tenth of the memory
2010 — 2014
Analytics Team:
Data Pipeline -- Designed and implemented the initial data pipeline that integrates and aggregates file-based customer performance data on a multi-tenant HDFS. Technology Stack: Java, Hadoop, MapReduce, Camel
Message Bus -- Designed and implemented the Message Bus platform that allows diverse, distributed processes to integrate at the event and data levels. Implemented the messaging infrastructure and client libraries that provide unified, open interfaces for applications to interact with the bus. Technology Stack: Java, RabbitMQ, Kafka (pre-production), Spring AMQP, Redis, S3
Change History -- Designed and implemented the Change History system, a central repository for all application events, such as process statuses, user actions and data changes, that are ingested from the Message Bus. A RESTful data service serves real-time attribute-based or time-based queries that give users or applications a holistic view of the historical events. Technology Stack: Java, RabbitMQ, Elasticsearch, Jersey
Data Warehouse ETL -- Designed and implemented reporting data ETL through the Message Bus into the multi-tenant reporting data warehouse. Technology Stack: Java, RabbitMQ, Spring Roo, PostgreSQL
Platform and Landing Pages Team:
Team lead for the web services and data services, built with Spring, Hibernate and related frameworks, that define metadata-driven, dynamic landing pages.
Designed and implemented the initial authentication, authorization and administration services for the multi-tenant environment. Technology Stack: Java, Hibernate, Spring Web, Spring Security, MySQL, Tomcat
2005 — 2009
Designed and implemented large-scale, multi-tier enterprise SaaS applications. Engineering lead for database design, data processing and integration of retail suppliers’ supply chain data. Technology Stack: Java, SQL Server, SOAP Web Services
Designed and implemented terabyte OLTP databases and data access API for retail suppliers.
Led performance, scalability and reliability (PSR) efforts. Improved data processing throughput and query performance by 10X over 2 years, reducing hardware costs by approximately 75%.
Managed offshore projects.
2000 — 2005
Worked on the Supplier Relationship Management product suite.
Education
Caltech
Bachelor of Science with Honor
Peking University