# Ahmad Sharif

> Software Engineer

Location: New York, New York, United States

Profile: https://flows.cv/ahmadsharif

I am legally authorized to work in the US and do not require sponsorship.

I am a Staff Software Engineer at Meta, experienced in building large-scale software systems. I love writing code, performance debugging, improving throughput, decreasing latency, and delighting customers with my work.

I hope to work with a team full of builders who are enthusiastic about building products. I value:

1. Technology/project: the project should be interesting and the technology should be cutting-edge.
2. Impact: there should be real users using the product. I am not here to do research. I have limited time in this world and I want my work to be used by real users.
3. Execution speed: low bureaucracy/red-tape/overhead.
4. Fair compensation: goes hand-in-hand with impact. If I can leverage my skills to create value for millions of customers, I deserve a small share of the value I create in this world.
5. Culture: lean, fast-moving teams that are passionate about delighting customers. Ain't no one got the time for optics or politics or corp-speak. I love scientific integrity (always speak the truth, even if it looks bad) and low ego (customers don't care about your ego).

I am open to all roles, including IC or TL/TLM. Give me resources and a problem to solve and I will work hard until I crack it!

Please read my full profile below before contacting me.

## Work Experience

### Software Engineer @ Meta

Jan 2023 – Present | New York, New York, United States

AI Infrastructure
- Improving GPU utilization fleet-wide for Meta by improving frequently used CUDA kernels

PyTorch Domains
- Media decoding, transforms, and preprocessing libraries for ML workloads
- Performance analysis and optimization

C++, Python, CUDA, NVDEC

### Software Engineer @ Google

Jan 2010 – Jan 2023 | New York, New York, United States

Projects in reverse-chronological order:

1. Time series data ingestion and serving at scale for searches with commercial intent. Think of near-realtime data, like stocks or currencies, that needs to be ingested and served to hundreds of millions of users worldwide.
2. Structured search and indexing on Bigtable and Spanner. Think of a corpus like Google Drive (petabytes of data), and of searching for a token within that corpus, except that the search should only find documents that you (the searcher) have access to. Access to a document can be granted directly to a user or through a group (groups can contain other groups). We used Zanzibar [1], Google's planet-scale authorization system, and built token and partitioning systems on top of it. We ended up reducing latency by a double-digit percentage while keeping serving costs minimal.
3. ChromeOS profile-guided optimization using compiler techniques. Chrome is one of the most widely used apps in the world. We profiled it using the same sample-based, extremely low-overhead tools that we use to profile workloads in Google datacenters [2]. These tools profiled real users running Chrome on ChromeOS in the wild. The collected data (instruction pointer data, not user data) was anonymized and sent to Google only for opt-in users, with care taken to respect their privacy settings. We then symbolized billions of samples using internal symbol servers and fed that data back into the compiler. The result is a double-digit performance improvement for Chrome.

[1] https://research.google/pubs/pub48190/

[2] https://research.google/pubs/pub36575/

### Systems Engineer @ Qualcomm

Jan 2009 – Jan 2010

Wrote kernels that ran on GPUs for image denoising, graphics shading, etc. These were used to do low-level (instruction- or cycle-level) performance analysis to guide the architecture of the next-generation mobile GPU.

Wrote many kernels in OpenCL and CUDA. Tuned the kernels for maximum occupancy and throughput.

Predicted the performance of kernels using ML and other models.

### GPU Architect @ NVIDIA

Jan 2008 – Jan 2008

Worked on micro-architecture performance analysis. The specific project was MMU/TLB simulation for the Fermi GPU architecture.

### Performance Architect @ Intel

Jan 2007 – Jan 2007

Captured GPU and CPU workloads for replay and performance analysis. Did very low-level (instruction- and cycle-level) performance simulation and analysis for next-generation CPU and GPU architectures. Predicted performance for GPU kernels and tuned workloads to perform well on Intel's next-generation architectures.

## Education

### Electrical Engineering and Computer Science

Georgia Institute of Technology

## Contact & Social

- LinkedIn: https://linkedin.com/in/ahmadsharif

---

Source: https://flows.cv/ahmadsharif

JSON Resume: https://flows.cv/ahmadsharif/resume.json

Last updated: 2026-04-05