# Rishabh Jain

> Software Engineer at ByteDance | TikTok

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/rishabhjain

Software developer with 6+ years of experience building products in the API and cloud security domains. Dedicated to developing, operating, and optimizing robust data infrastructure, with a focus on site reliability and cloud-managed platforms covering big data computing, orchestration, storage, AI/ML infrastructure, NoSQL, and relational databases.

## Work Experience
### Software Engineer - Data Infrastructure SRE @ ByteDance | TikTok
Jan 2024 – Present | San Jose, California, United States
- Develop, optimize, and oversee one of the industry's largest cloud infrastructures, with a focus on site reliability and cloud-managed platforms covering big data computing, orchestration, storage, AI/ML infrastructure, NoSQL, and relational databases.
- Participate in and improve the complete service lifecycle, from inception and design through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
- Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, and Flink.

### Senior Software Engineer (Site Reliability) @ Netskope
Jan 2020 – Jan 2024 | Santa Clara, California, United States
✧ Capacity Planning: Led manual capacity planning for multiple microservices across multiple production stacks
    - Built a capacity reporting tool to identify VMs/hosts with overprovisioned CPU and memory on the underlying KVM hosts; the automation helped fix major performance bottlenecks and reduced infrastructure costs by 30%
    - Led cross-team efforts on a tenant onboarding project, using a T-shirt sizing methodology to streamline capacity planning
✧ Monitoring & Alerting: Collaborated with cross-functional teams to understand complex application architectures and implement effective top-down monitoring strategies, improving service visibility, reducing mean time to detect (MTTD), and enabling proactive issue resolution
✧ Infrastructure & Automation: Developed Infrastructure as Code (IaC) libraries using Terraform to provision and operate infrastructure at large scale
    - Implemented Noname WAAF across Netskope to increase visibility into our web application firewall
✧ CI/CD: Enhanced the existing Jenkins deployment pipelines, reducing overall deployment time across multiple stacks from 12 to 3 hours
    - Implemented Spinnaker as the CI/CD solution for Kubernetes-native infrastructure, enabling faster release cadence, rollbacks, and canary deployments
✧ Onboarding: Led system design and feature work to improve the availability, scalability, latency, and efficiency of multiple microservices
    - Embedded with product teams to ensure that applications are production-ready, scalable, and reliable
    - Mentored newly onboarded team members on design principles, documentation efforts, troubleshooting production application services, and SRE best practices
    - Led incident post-mortems to identify root causes, ensure remediation, and define measures to prevent recurrence
    - Introduced and streamlined processes for on-call and incident management

### Software Engineer @ Netskope
Jan 2019 – Jan 2020 | Santa Clara, California
✧ Monitoring & Alerting: Created service monitoring dashboards, actionable incident alerts, and comprehensive runbooks
✧ CI/CD: Developed an Ansible CD pipeline to deploy packages across multiple microservices, reducing deployment time from 20 to 12 hours
✧ On-call: Served on 12/7 production on-call for a large fleet of hosts: monitored host and application health, triaged and resolved application- and host-level errors, identified and disabled faulty applications/features, leveraged SRE tools and automation, and mitigated outages
    - Reviewed and approved PRDs for new services and managed those services as they were onboarded for SRE support
    - Built an automated system to poll data from various SaaS apps and inject it into the production environment


## Education
### Master of Science - MS in Computer Science
University of Southern California

### Bachelor of Technology (B.Tech.) in Computer Science
Delhi University


## Contact & Social
- LinkedIn: https://linkedin.com/in/jainrishu95
- GitHub: https://github.com/jainrishu95

---
Source: https://flows.cv/rishabhjain
JSON Resume: https://flows.cv/rishabhjain/resume.json
Last updated: 2026-03-29