# Ibrahim Khan

> Building better stock portfolios

Location: San Francisco, California, United States
Profile: https://flows.cv/ibrahimkhan

I’m an engineer working on large-scale AI and compute infrastructure, focused on how scarce resources get allocated efficiently at global scale. I’ve spent time building reliability and observability systems for YouTube and now work on dynamic scheduling systems that power major AI and ML workloads across Google. University of Maryland alum. Computer Engineering by training.

## Work Experience

### Co-Founder & CEO @ Stealth AI Startup
Jan 2025 – Present | San Francisco Bay Area

Building better portfolios

### Software Engineer @ Google
Jan 2022 – Jan 2026

Orchestrating dynamic scheduling and capacity management for Google’s flagship AI products, including Gemini, DeepMind, Waymo, Ads and YouTube.

- Built an eval framework, adopted Google-wide, for analyzing LLM accuracy
- Designed and implemented an automated chip allocation system, reducing capacity provisioning time from several days to under 5 minutes
- Spearheaded the GQM Debugging Agent, enhancing customer satisfaction through self-resolution of scheduling conflicts
- Implemented critical observability and failure detection for the Workload Management Service (1.5M managed resources) and the Quota Management Service (700k+ owned capacity units), ensuring system resilience for Google's largest AI workloads
- Supported the development of our service’s response to power capping events in datacenters

### Software Engineer @ YouTube
Jan 2022 – Jan 2024

YouTube Core Reliability

Shorts Reliability
- Instrumented 17 error logs with context to track Android Shorts watch failures
- Triaged and fixed top-impacting issues, improving global Shorts Watch Time by X%

Client Error Logging
- Identified logging inconsistencies across YouTube clients
- Led standardization effort (adopted by 16 teams) for uniform metadata
- Mapped errors to UX flows, improving triage speed
- Reclassified error severities, reducing metric noise by 75%
- Added real-time signals for YT Music & YTTV, detecting 5 Major to Huge outages over 4 months
- Enabled pre-prod error detection to block regressions before launch

Stuck RPC Monitoring
- Built metric to track stuck unary/streaming RPCs
- Created dashboards, alerting, and mitigation playbook for on-call teams

Monitoring Consoles Migration
- Migrated observability from legacy internal tool to a new platform

Load Balancer CPU Optimization
- Increased CPU limits on YT’s frontend load balancers, saving ~2 SWE/year

Degradation Monitoring
- Added monitoring for optional dependencies returning degraded yet successful responses
- Focused on revenue and UX-critical paths in YouTube’s frontend service

### Student Software Engineer @ AFFORDABLE
Jan 2022 – Jan 2022

Worked on a service that connects medical donors to patients in need of monetary assistance for their medical bills.
We do this by providing donors with accurate Public Electronic Health Records (EHRs) from hospitals, available through EPIC Systems.

### Resident Assistant @ University of Maryland
Jan 2021 – Jan 2022 | College Park, Maryland, United States

- Fostered an inclusive community among 2200 residents, maximizing positive resident interactions
- Mediated and provided conflict resolution in a professional and timely manner

### Undergraduate Research Assistant @ University of Maryland
Jan 2021 – Jan 2021 | College Park, Maryland, United States

Research Assistant under Professor Udaya Shankar. I defined a service that builds on the widely used two-phase commit (2PC) protocol for databases. The protocol processes database transactions reliably and atomically in a distributed environment. I addressed the single-point-of-failure scenario in which the coordinator node crashes. I started with a stub implementation, followed by a service tester for testing applications and implementations.

### Software Engineering Intern @ Aramco
Jan 2021 – Jan 2021 | Houston, Texas, United States

### Software Engineering Intern @ University of Maryland
Jan 2020 – Jan 2020 | College Park, Maryland, United States

Designed, programmed, and deployed a web app, built with an Agile process, that enabled students to interact with lab equipment remotely. The project was initiated in response to the COVID-19 pandemic. The front end was developed using the React framework and the back end using Node.js. The wireframes and UI/UX were designed using Figma. The user database and sessions were maintained using MongoDB. Finally, the web app was deployed on Heroku. The web app is now defunct, as labs are no longer online.

### Private Tutor @ MyPrivateTutor
Jan 2019 – Jan 2020

Taught Mathematics, Physics, English, and Programming to students online and in person.
While most of the students I tutored were college students, some were high school students who needed help with standardized tests such as the SAT and IELTS. I also tutored college students on algorithms and object-oriented programming.

### Technology Intern @ National Skill Development Corporation
Jan 2018 – Jan 2019 | New Delhi Area, India

Performed data analysis and visualization of the employment rate, literacy rate, and dropout rate for each state. Core data was provided by the National Commission on Statistics. In response to the findings, I also led presentations that familiarized high school dropouts with employment opportunities in India's electronics sector. I was tasked with reaching out to new schools to expand the organization’s reach to high school and middle school dropouts. Lastly, I initiated and worked on the idea of establishing an online platform for educating aspiring candidates.

## Education

### Bachelor of Engineering - BE in Computer Engineering
University of Maryland
Jan 2017 – Jan 2022

### High School
Manarat Al-Riyadh School

## Contact & Social

- LinkedIn: https://linkedin.com/in/ibrahim-khan-632b2714a
- Website: https://ibrahimkkhan.github.io/

---

Source: https://flows.cv/ibrahimkhan
JSON Resume: https://flows.cv/ibrahimkhan/resume.json
Last updated: 2026-03-22