● Develop and maintain the data pipeline for real estate public records, covering data syncing, ingestion, and ETL (extraction, transformation, and loading into the database), with services deployed on K8S.
● Design and implement multiple new listing features end to end, from data providers to customers: transaction buyers and sellers, property tax information, and mortgages associated with each transaction, using Kafka for data streaming, AWS RDS for data storage, and React for front-end visualization.
● Collaborate with other teams on their feature requests: with the search team, design and implement owner name parsing, standardization, and permutation to enable searching properties by owner name; with the AI team, calculate and build ownership information (e.g., number of years) and owner occupancy status to feed the sales prediction model and help agents find potential sellers.
● Enrich the property database with property foreclosure data and NYC property tax assessor data from new data sources, including raw data quality and coverage analysis using SQL on PostgreSQL, schema design, pipeline design and implementation, and output validation.
● Set up a beta environment and enable pre-merge testing on Jenkins to reduce breakages in gamma and production.
● Migrate legacy Python projects running EMR jobs on EC2 servers to Java microservices and Kafka on K8S, redesigning the architecture and enabling incremental loading to reduce cost and improve data freshness.