# Feng Zhang

> PhD | Principal Software Engineer | Apache Sedona PMC Member

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/fengzhang

Distinguished software engineering leader and researcher with 15+ years of experience directing enterprise-wide implementations, managing technology initiatives, and leading teams. A dedicated, analytical, and proactive leader known for developing talent and building teams that achieve outstanding results through collaboration. Skilled in orchestrating strategic, high-impact projects that align technological solutions with organizational objectives. Combines technical expertise, business insight, and strategic planning to engage stakeholders and guide team operations.

Technical Proficiencies: Rust | Java | Scala | Python | R | JavaScript | Apache Spark | Apache Sedona | Apache Parquet | Apache Arrow | Apache Iceberg | Delta Lake | Apache Airflow | DuckDB | RocksDB | Unity Catalog | Kubernetes | AWS, GCP, Azure | TensorFlow | PyTorch | Chainer | MLflow

## Work Experience

### Principal Software Engineer @ Wherobots
Jan 2024 – Present | San Francisco, California, United States

- Led the development and enhancement of WherobotsDB and its distributed query engine, as well as WherobotsAI, by designing and implementing new features and APIs to expand geospatial query capabilities.
- Contributed to Apache Sedona by implementing advanced spatial joins, optimizing and refactoring code, and actively participating in community activities to support adoption and collaboration.
- Provided technical leadership and mentorship to team members, driving the successful delivery of complex projects while fostering a culture of collaboration and innovation.
- Focused on improving performance, observability, and reliability to keep data processing efficient at scale.

### Apache Sedona PMC Member & Committer @ The Apache Software Foundation
Jan 2024 – Present

Guiding strategic direction and project governance for Apache Sedona. Contributing core enhancements to the distributed spatial query engine, extending Apache Spark, Flink, and Snowflake. Implemented scalable kNN join, spatial indexing optimizations, and STAC-based data ingestion. Developed GPU-accelerated query execution and vectorized spatial query processing to improve performance at scale. Created SpatialBench, a comprehensive benchmarking framework for evaluating spatial query engine performance.

### Apache Parquet Contributor @ The Apache Software Foundation
Jan 2024 – Present

Contributed directly to the standardization of spatial data storage within the Parquet file format. This work involved developing improvements to geospatial metadata and logical type specifications, which enhance the interoperability and performance of spatial datasets across analytical engines.

### Apache Spark Contributor @ The Apache Software Foundation
Jan 2025 – Present

Identified, reported, and fixed correctness bugs in Spark SQL's aggregate function behavior under Adaptive Query Execution, contributing to the reliability of Spark's distributed query engine.

### Apache DataFusion Contributor @ The Apache Software Foundation
Jan 2024 – Present

Developed custom geospatial join and expression functionality for the in-memory query engine, enhancing its capabilities in the spatial domain.

### Apache ShardingSphere Contributor @ The Apache Software Foundation
Jan 2023 – Present

Contributed to distributed database features including sharding logic, scaling mechanisms, and SQL parsing extensions for complex analytics workloads.

### Technical Fellow, Database @ Space and Time
Jan 2023 – Jan 2024 | Los Angeles Metropolitan Area

- Developed and implemented a strategic technical vision for the team, focused on emerging technologies and industry trends in distributed data analytics
- Developed a verifiable computation layer for a blockchain-based decentralized database by incorporating sub-second zero-knowledge proofs
- Provided thought leadership on database optimization, query planning, workload management techniques, cluster-to-cluster communication, and scaling from proof of concept to “cluster scale” and eventually to hundreds of clusters with hundreds of terabytes each

### Senior Principal Software Engineer @ Space and Time
Jan 2022 – Jan 2023 | Los Angeles Metropolitan Area

- Managed a team of 20+ software engineers building a decentralized Web3-native HTAP database
- Developed vision and strategy for the product portfolio and managed execution across data acquisition, engineering, governance, database tuning, automated database management, sharding, and related areas
- Built blockchain dApps that join tamperproof data indexed from major blockchains and off-chain sources at enterprise scale

### Senior Principal Software Engineer @ Aetion
Jan 2022 – Jan 2022 | New York, United States

- Directed and grew the founding engineering team as the company raised funding from seed through Series A, B, and C
- Provided senior-level technical leadership across 7 engineering teams and led technical architecture from design through implementation for large-scale, cloud-based, data-intensive software products, interacting with external stakeholders
- Led a team of 8 senior technology experts who guided large strategic customers through their cloud journey and collaborated on new and disruptive technologies, including distributed computing and cloud-based analytics
- Coordinated with 5 senior executives on long-range plans of up to 5 years for engineering and product groups, including resource capacity planning and talent management

### Principal Software Engineer @ Aetion
Jan 2019 – Jan 2022 | New York, United States

- Built and maintained a portfolio of large-scale cloud-based SaaS products and delivered architect-level technical leadership material to overall company success by leading the strategy for Aetion's evidence platform, a cloud-based SaaS product
- Led design and implementation of a proprietary, domain-specific language (DSL) for data manipulation as core functionality of healthcare analytics, supporting strategies for real-world data, real-world evidence, and digital health

### Senior Staff Software Engineer @ Aetion
Jan 2015 – Jan 2019 | New York, United States

- Designed and implemented processes and layouts for complex, large-scale data services, including a multi-tenant platform, petabyte-scale datasets, and computing clusters with thousands of nodes
- Developed and led delivery of data models for longitudinal patient datasets, optimized for evidence-generation analytics, which power Aetion's evidence platform
- Constructed 200 data ingestion pipelines, managing data collection, cleaning, transformation, aggregation, and deployment
- Collaborated with 100 medical scientists to understand 200+ terabyte client datasets and validate data integrity

### Senior Technical Lead @ Esri
Jan 2011 – Jan 2014 | Redlands, CA

- Oversaw technical development of 40+ programming support unit employees, including software architecture design
- Developed software solutions based on GIS platforms and patterns, including web applications, embedded analytics, and desktop applications, that demonstrate the business value of geospatial technology for big data analytics by customizing the Hadoop framework
- Built statistical regression, classification, and clustering models for analyzing large geospatial datasets using statistical machine learning algorithms, as well as spatial application solutions and web services

### Research Associate @ Georgia Institute of Technology
Jan 2007 – Jan 2011

- Invented a new joint variable spatial downscaling (JVSD) technique for statistically downscaling gridded spatial variables to generate high-resolution gridded datasets
- Contributed to research projects and reports sponsored by U.S. and foreign organizations, including the National Aeronautics and Space Administration (NASA), US Geological Survey (USGS), National Oceanic and Atmospheric Administration (NOAA), World Bank, and National Climate Task Force
- Developed distributed statistical learning algorithms (e.g., spatial classification, clustering) on the Hadoop platform using Hive and MapReduce on large-scale climate datasets

### Teaching Assistant @ North Carolina State University
Jan 2006 – Jan 2007 | Raleigh, North Carolina, United States

### Research Lecturer @ Huazhong University of Science and Technology
Jan 2003 – Jan 2005 | Wuhan, Hubei, China

## Education

### Doctor of Philosophy (PhD) in Engineering
Georgia Institute of Technology

### Master of Science (MS) in Engineering
Huazhong University of Science and Technology

### Bachelor's degree in Engineering
Huazhong University of Science and Technology

## Contact & Social

- LinkedIn: https://linkedin.com/in/feng-zhang-data

---

Source: https://flows.cv/fengzhang
JSON Resume: https://flows.cv/fengzhang/resume.json
Last updated: 2026-04-12