# Rishabh Jain

> Software Engineer at ByteDance | TikTok

Location: San Francisco Bay Area, United States

Profile: https://flows.cv/rishabhjain

Experienced Software Developer with 6+ years of building products in the API and cloud security domains. Dedicated to developing, operating, and optimizing a robust data infrastructure, with a focus on site reliability and cloud-managed platforms, covering big data computing, orchestration, storage, AI/ML infra, NoSQL, and relational databases.

## Work Experience

### Software Engineer - Data Infrastructure SRE @ TikTok

Jan 2024 – Present | San Jose, California, United States

- Dedicated to developing, optimizing, and overseeing one of the industry's most extensive cloud infrastructures, with a focus on site reliability and cloud-managed platforms, covering big data computing, orchestration, storage, AI/ML infra, NoSQL, and relational databases.
- Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
- Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.

### Software Engineer - Data Infrastructure SRE @ ByteDance

Jan 2024 – Present | San Jose, California, United States

- Dedicated to developing, optimizing, and overseeing one of the industry's most extensive cloud infrastructures, with a focus on site reliability and cloud-managed platforms, covering big data computing, orchestration, storage, AI/ML infra, NoSQL, and relational databases.
- Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
- Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.

### Senior Software Engineer (Site Reliability) @ Netskope

Jan 2020 – Jan 2024 | Santa Clara, California, United States

✧ Capacity Planning: Manually led capacity planning for multiple microservices across multiple production stacks
- Built a capacity reporting tool to identify VMs/hosts with overprovisioned CPU and memory on the underlying KVM. The automation helped fix major performance bottlenecks and saved 30% in infrastructure costs
- Led cross-team efforts on a tenant onboarding project, employing a T-shirt sizing methodology, leading to streamlined capacity planning

✧ Monitoring & Alerting: Collaborated with cross-functional teams to understand complex application architectures and implement effective top-down monitoring strategies, resulting in improved service visibility, reduced MTTD, and proactive issue resolution

✧ Infrastructure & Automation: Developed IaC libraries for provisioning and operating infrastructure at massive scale using Terraform
- Implemented Noname WAAF across Netskope to increase visibility into our web access firewall

✧ CI/CD: Enhanced existing Jenkins deployment pipelines to reduce overall deployment time from 12 to 3 hours across multiple stacks
- Implemented Spinnaker as a CI/CD solution for faster release churn, rollbacks, and canary deployments on Kubernetes-native infrastructure

✧ Onboarding: Led system designs and features to improve availability, scalability, latency, and efficiency of multiple microservices
- Embedded with product teams to ensure that applications are production-ready, scalable, and reliable
- Mentored newly onboarded team members on design principles, documentation efforts, troubleshooting production application
services, and SRE best practices
- Led incident post-mortems to identify root causes, ensure remediation, and identify measures to prevent recurrence
- Introduced and streamlined processes for on-call and incident management

### Software Engineer @ Netskope

Jan 2019 – Jan 2020 | Santa Clara, California

✧ Monitoring & Alerting: Created service monitoring dashboards, actionable incident alerts, and comprehensive runbooks

✧ CI/CD: Developed an Ansible CD pipeline to deploy packages across multiple microservices, reducing deployment time from 20 to 12 hours

✧ On-call: Worked 12/7 production on-call for a large fleet of hosts: monitoring host and application health, triaging and resolving errors at the application and host level, identifying and disabling faulty applications and features, leveraging SRE tools and automation, and mitigating outages
- Reviewed and approved PRDs for new services and managed new services as they were onboarded for SRE support
- Built an automated system to poll data from different SaaS apps and inject it into the production environment

## Education

### Master of Science - MS in Computer Science

University of Southern California

### Bachelor of Technology (B.Tech.) in Computer Science

Delhi University

## Contact & Social

- LinkedIn: https://linkedin.com/in/jainrishu95
- GitHub: https://github.com/jainrishu95

---

Source: https://flows.cv/rishabhjain

JSON Resume: https://flows.cv/rishabhjain/resume.json

Last updated: 2026-03-29