# Prakhar Jain

> Staff Software Engineer at Databricks

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/prakharjain

I have around 10 years of software development experience and I am passionate about working on databases and distributed systems.

My latest work revolves around making the Hadoop and Spark stack run on the cloud (AWS, Azure and Oracle BMC) in a cost-effective and performant manner.

## Work Experience
### Staff Software Engineer @ Databricks
Jan 2021 – Present | San Francisco Bay Area

### Senior Software Engineer @ Databricks
Jan 2021 – Jan 2023 | United States

### Senior Software Engineer @ Microsoft
Jan 2019 – Jan 2021 | Bengaluru Area, India
Worked on making Apache Spark performant, resilient, scalable and cloud-native:

- Improved Spark cluster downscaling by building features such as RDD cache decommissioning and shuffle offloading, and contributed these features to open source.

- Improved Spark TPC-DS performance by more than 2x. Researched, designed and implemented multiple SQL optimizations in Apache Spark, including pre-aggregation, CNF/DNF predicate pushdown, better sort order selection, join reordering improvements, inner-to-semi join conversion, smart shuffle key selection, distinct pushdown and adaptive hash aggregates.

- Worked on subquery fusion optimization to eliminate redundant I/O in Spark.

### Member of Technical Staff, Spark @ Qubole
Jan 2017 – Jan 2019 | Bangalore
- Designed and developed features such as direct writes, Parquet metadata caching, executor packing, proactive shuffle data cleanup, recover-partition improvements and smart executor sizing, which improve Spark performance and cluster utilization.

- Implemented SQL optimizations in the Spark Catalyst optimizer, such as skew join handling and UDF pushdown.

- Designed and developed a new Spark Datasource connector to support HiveAcid tables in Spark.

### Member Of Technical Staff, Cluster Orchestration @ Qubole
Jan 2015 – Jan 2017
Created Cloudman, a new cloud-agnostic, multi-engine framework built from scratch that supports multiple clouds (AWS, Azure, Oracle OPC) and multiple engines (Hadoop, Hive, Presto, Spark). Gave a talk about it at the "Strata+Hadoop" conference.

Added multiple features to the cluster orchestration layer: heterogeneous clusters, smart AZ selection, parallel master-slave bring-up, cluster management for public/private subnets in a VPC, Spot loss handling and a metrics collection framework.

### Software Design Engineer @ Chronus - Mentoring & Talent Development Solutions
Jan 2014 – Jan 2015

### Intern @ AnB Education
Jan 2012 – Jan 2012
Developed RSquare, a native Android application, from scratch with an attractive GUI that lets users download modules/chapters from the server, solve them offline and submit reports back to the server.

### Summer Internship @ MetrixLine
Jan 2012 – Jan 2012
Prototyped the back end of MetrixLine's flagship product, MetrixTrack; designed a NoSQL graphical representation of the data based on MySQL and tested it on up to 1.4 GB of data.

Designed and developed a graphical representation of tagged metrics to enable user-specific tagging and metric calculations, an open problem the organization was facing.

### Hostel Secretary @ Computer and Web, Hostel 5, IIT Bombay
Jan 2011 – Jan 2012
Developed a new website from scratch, featuring the latest hostel news and an improved gallery.

Developed a custom content management system and gave the website an online editing interface that lets the hostel secretaries update their respective sections themselves.


## Education
### Bachelor of Technology (B.Tech.) in Computer Science
Indian Institute of Technology, Bombay

### DAV Public School


## Contact & Social
- LinkedIn: https://linkedin.com/in/prakharjain09

---
Source: https://flows.cv/prakharjain
JSON Resume: https://flows.cv/prakharjain/resume.json
Last updated: 2026-04-12