# Sandip Agarwala

> Engineer @ Databricks

Location: Cupertino, California, United States
Profile: https://flows.cv/sandipagarwala

Hands-on Technologist and Leader with a history of building successful large-scale distributed data storage systems and services, and private cloud computing infrastructure. Skilled at understanding customer pain points and market opportunities, driving technical strategy, and leading cross-functional projects that address those needs from conceptualization through architecture and development.

Principal architect of the industry’s first petabyte-scale distributed, key-value based, log-structured file system for private hyper-converged cloud. Designed and built from scratch the core data platform that grew annual revenue from zero to $500 million.

Designed and implemented several core distributed file system components and protocols, including sharding and replication, QoS, on-disk metadata format, checksums, erasure coding, snapshots, multi-tier caching, deduplication, garbage collection, encryption, resync and recovery.

Expertise in distributed systems, enterprise storage, disaster recovery, cloud computing and virtualization, performance monitoring and optimization, storage for streaming, batch and event processing, and Big Data analytics. 

28 patents and 17 research papers in refereed conferences. (https://scholar.google.com/citations?hl=en&user=SrIYV9UAAAAJ)

Mentored more than 25 engineers across geographies, from junior engineers to senior technical leads.

Software skills: Linux, C, C++, Java, Python, shell scripting (bash, awk, etc.)

## Work Experience
### Staff Software Engineer @ Databricks
Jan 2023 – Present | San Francisco Bay Area

### Principal Engineer @ Cisco
Jan 2017 – Jan 2023 | San Francisco Bay Area
•	Core Data Platform Architect and tech lead for Hyperflex (Springpath Acquisition)
•	Architected, designed, and implemented several data storage services, including erasure coding, virtual machine snapshots, and encryption, that increased the TAM by 50% to $750 million.
•	Collaborated with the CTO, product managers, sales, and cross-functional teams to improve the product and drive the technical strategy and product roadmap in areas such as edge and next-generation hardware accelerators.
•	Improved storage efficiency by 2X without any performance degradation by designing and implementing distributed erasure coding.
•	Developed a novel distributed snapshot feature that can create hundreds of virtual machine snapshots per second. This is used extensively for protecting VMs via backup and disaster recovery.
•	Strengthened the reliability of the cluster by more than 50% by creating a bit rot detection and recovery feature that eradicated silent disk corruptions.
•	Reduced cluster recovery time by more than 50% with a new resync-recovery mechanism.
•	Troubleshot complex distributed system bugs and performance issues for many large customers by analyzing logs and core dumps in gdb, instrumenting code, and monitoring events and stats.
•	Designed and implemented tools to troubleshoot and to improve productivity and product quality.
•	Mentored more than 25 engineers across geographies, from junior engineers to senior technical leads.

### Founding Engineer / Architect @ Springpath (Acquired by Cisco)
Jan 2012 – Jan 2017 | San Francisco Bay Area
•	Designed and implemented several core distributed file system components and protocols, including sharding and replication, QoS, on-disk metadata format, checksums, snapshots, multi-tier caching, deduplication, garbage collection, encryption, resync and recovery.
•	Architected a novel scale-out log-structured file system that achieved greater than 50% capacity savings using compression and deduplication without any read-modify-write performance penalty.
•	Drove consensus across teams and delivered key features with high quality under stringent deadlines in a fast-paced development environment.
•	Delivered technical presentations to executives, departmental staff, and external partners.
•	Assumed an active role in fulfilling requirements and addressing technical concerns of Cisco’s acquisition due diligence team. Rapidly delivered features and improvements to guarantee sub-millisecond latencies.
•	Designed and implemented a novel multi-tier caching architecture that delivered SSD-like performance at a fraction of the cost using hybrid nodes consisting of an SSD-based cache and an HDD-based capacity tier.
•	Designed and implemented a distributed ‘rebalance’ protocol that handled resource failure and addition in the caching tier gracefully with minimal performance or availability impact.
•	Developed a new metadata format that reduced the file system metadata overhead by 66%. This resulted in greater than 3X performance improvement for workloads with large working sets.
•	Designed and developed a prefetching algorithm that improved sequential read performance by 5X.
•	Wrote detailed technical documents including design specification, wikis, and onboarding procedures.
•	Led various innovations and patent initiatives resulting in the granting of eight patents.
•	Mentored engineers to quickly become productive during the rapid growth (5 to 125 in 4 years).

### Research Staff Member @ IBM Almaden Research Center
Jan 2007 – Jan 2012 | San Jose, CA
Research and development in storage, systems and virtualization management. Automation, performance and resource management for large-scale enterprise systems such as SAN, NAS, cloud computing and Big Data storage.

Conceived and led the architecture, design and implementation of multiple projects:
•	Built scale-out storage for virtualized environments and worked with the Business Development team and VMware on a joint clustered storage solution partnership.
•	Cloud-based storage optimization service that reduces the cost of data storage by more than 20% via intelligent data placement, compression and deduplication.
•	Automation for continuous health checks, problem determination, provisioning and chargeback in the IBM storage cloud services.
•	Tools for automated planning and provisioning in heterogeneous and multi-vendor SAN arrays.
•	Mentored interns and engineers on various projects; presented to customers, partners and executives, as well as at IEEE/ACM conferences.

### PhD, Computer Science @ Georgia Institute of Technology
Jan 2001 – Jan 2007 | Atlanta, GA
Thesis title: System Support for End-to-End Performance Management
Advisor: Prof. Karsten Schwan

### Summer Research Intern @ Hewlett-Packard Laboratories
Jan 2005 – Jan 2005 | Palo Alto, CA
Resource usage accounting in distributed systems; SLA enforcement with resource-aware scheduling in multi-tier enterprise applications

### Summer Research Intern @ Microsoft Research Lab
Jan 2003 – Jan 2003 | Cambridge, UK
Designed and implemented an online file system request tracking tool that matched low-level system events to high-level user requests for resource measurement and performance diagnosis.

### Summer Intern @ Intel Corporation
Jan 2002 – Jan 2002 | Santa Clara, CA
Analyzed Linux kernel performance and bottlenecks for Intel’s McKinley (Itanium 2) platform using low-level profiling tools such as VTune and EMON.


## Education
### PhD in Computer Science
Georgia Institute of Technology

### B. Tech. in Computer Science and Engineering
Indian Institute of Technology, Kharagpur


## Contact & Social
- LinkedIn: https://linkedin.com/in/agarwala
- Portfolio: http://researcher.ibm.com/person/us-sagarwala

---
Source: https://flows.cv/sandipagarwala
JSON Resume: https://flows.cv/sandipagarwala/resume.json
Last updated: 2026-04-12