# Prakhar Jain

> Staff Software Engineer at Databricks

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/prakharjain

I have around 10 years of software development experience and am passionate about working on databases and distributed systems. My latest work revolves around making the Hadoop and Spark stack run on the cloud (AWS, Azure and Oracle BMC) in a cost-effective and performant manner.

## Work Experience

### Staff Software Engineer @ Databricks
Jan 2021 – Present | San Francisco Bay Area

### Senior Software Engineer @ Databricks
Jan 2021 – Jan 2023 | United States

### Senior Software Engineer @ Microsoft
Jan 2019 – Jan 2021 | Bengaluru Area, India

Worked on making Apache Spark performant, resilient, scalable and cloud native:
- Improved Spark cluster downscaling by building features such as RDD cache decommissioning and shuffle offloading. Contributed these features to open source.
- Improved Spark TPC-DS performance by more than 2x. Researched, designed and implemented multiple SQL optimizations in Apache Spark: pre-aggregation, CNF-DNF predicate pushdown, better sort order selection, join reordering improvements, inner-to-semi join conversion, smart shuffle key selection, distinct pushdown, adaptive hash aggregates, etc.
- Worked on sub-query fusion optimization for eliminating redundant I/O in Spark.

### Member of Technical Staff, Spark @ Qubole
Jan 2017 – Jan 2019 | Bangalore

- Designed and developed features such as direct writes, Parquet metadata caching, executor packing, proactive shuffle data cleanup, recover partition improvements and smart executor sizing, which improve Spark performance and cluster utilization.
- Made SQL optimizations in the Spark Catalyst optimizer, such as skew join and UDF pushdown.
- Designed and developed a new Spark data source connector to support Hive ACID tables in Spark.

### Member of Technical Staff, Cluster Orchestration @ Qubole
Jan 2015 – Jan 2017

- Created Cloudman, a new cloud-agnostic, multi-engine framework, from scratch. It supports multiple clouds (AWS, Azure, Oracle OPC) and multiple engines (Hadoop, Hive, Presto, Spark). Gave a talk on it at the "Strata+Hadoop" conference.
- Added multiple features to the cluster orchestration layer: heterogeneous clusters, smart AZ selection, parallel master-slave bringup, cluster management for public/private subnets in a VPC, spot loss handling and a metrics collection framework.

### Software Design Engineer @ Chronus - Mentoring & Talent Development Solutions
Jan 2014 – Jan 2015

### Intern @ AnB Education
Jan 2012 – Jan 2012

Developed RSquare, a native Android application, from scratch with an attractive GUI that lets users fetch modules/chapters from the server, solve them offline and submit a report back to the server.

### Summer Internship @ MetrixLine
Jan 2012 – Jan 2012

- Prototyped the back end of MetrixLine's flagship product, MetrixTrack. Designed a NoSQL graphical representation of the data based on MySQL and tested it on up to 1.4 GB of data.
- Designed and developed a graphical representation of the tagged metrics to enable user-specific tagging and calculation of metrics, an open problem faced by the organization.

### Hostel Secretary @ Computer and Web, Hostel 5, IIT Bombay
Jan 2011 – Jan 2012

- Developed a new website from scratch, featuring the latest hostel news and an improved gallery.
- Developed a custom content management system and gave the website an online editing interface that lets the hostel secretaries update their respective sections themselves.

## Education

### Bachelor of Technology (B.Tech.) in Computer Science
Indian Institute of Technology, Bombay

### DAV Public School

## Contact & Social

- LinkedIn: https://linkedin.com/in/prakharjain09

---

Source: https://flows.cv/prakharjain
JSON Resume: https://flows.cv/prakharjain/resume.json
Last updated: 2026-04-12