# Sivaramakrishnan Subramanian

> Perception @ Zoox | Computer Vision @ CMU, RI | Past: Waymo, OpenMined

Location: San Francisco, California, United States

Profile: https://flows.cv/sivaramakrishnan

Working on large-scale foundation models to bring step-function improvements to self-driving vehicles at Zoox.

I was a graduate student studying Computer Vision and Machine Learning at the Robotics Institute, CMU. Last summer, I worked with Waymo's Perception and Research teams on large Vision Language Models (VLMs) to solve long-tail edge cases. I'm broadly interested in problems at the heart of perception, image synthesis, multi-modal learning, and all things machine intelligence.

Before grad school, I led the India DS team in the R&D division at App Orchid Inc., a Fast 500 AI company in the utilities and energy industry. My work focused on Document Representation Learning (DRL) and semantic PDF understanding for financial document cohorts, i.e., extracting document metadata from legal documents using DL and vision techniques.

My previous work and publications run the gamut from electrical motor design and applied statistical analysis to industrial machine vision.

Feel free to DM me via LinkedIn or at krishnansr [dot] siva [at] gmail [dot] com.

## Work Experience

### Software Engineer @ Zoox

Jan 2024 – Present | San Francisco Bay Area

Foundation models for autonomous driving | Zoox Perception

Using multimodal ML to help cars perceive everything, everywhere, all at once.

### Graduate Teaching Assistant @ Machine Learning Department at CMU

Jan 2023 – Jan 2023 | Pittsburgh, Pennsylvania, United States

Fall 2023:

- Returned as a TA (Prof. Matthew Gormley and Prof. Henry Chai) for 10-601, CMU's flagship ML course offered by MLD. Wrote practice problems and updated the later sections of the course to reflect current trends in the literature. The course enrolled 487 students (one of the largest in CMU SCS).

Spring 2023:

- Course TA for 10-601 (Prof. Matthew Gormley).
  Prepared exams and homework and automated MOSS plagiarism-detection tasks. Held course recitations and office hours for 392 students.

### Perception Intern @ Waymo

Jan 2023 – Jan 2023 | San Francisco Bay Area

Worked on foundation models for autonomous driving:

- Addressed object-understanding problems in the perception long tail by leveraging large vision-language foundation models (VLMs) to outperform production models.
- Implemented a 3D Point Cloud Transformer (PCT) in JAX, achieving XX F1-score on mission-critical objects.
- Extended Google Research's internal VLM with PCT and implemented distributed multi-pod TPU fine-tuning of the fused 5.3B-parameter model with a T5X backend.
- Presented the project to the broader perception team and received positive feedback. Also recognized in the 2023 Intern Spotlight: https://youtu.be/QPVlRHZFZog?t=7

### Graduate Research Assistant @ Carnegie Mellon University School of Computer Science

Jan 2022 – Jan 2022 | Pittsburgh, Pennsylvania, United States

- Investigated controllable-GAN pipelines for 3D scene representation of cells in Cryo-EM tomography images.
- Mentored 2 interns on self-supervised domain adaptation for detecting cell organelles in 3D tomogram slices.

### Senior Data Scientist @ App Orchid Inc

Jan 2022 – Jan 2022 | Greater Hyderabad Area

- Led the 12-member India data science team.
- Applied ML to contract lifecycle management (CLM) problems in the legal tech space (more info below).
- Engineered the end-to-end ML lifecycle (data engineering, model training loop, serving deployment, online maintenance) using a microservices architecture.
- As the resident CV expert, solved document analysis and text typography research problems (similar to those described below) using hypothesis testing and weak supervision to handle OOD inference.
### Data Scientist @ App Orchid Inc

Jan 2021 – Jan 2022 | Greater Hyderabad Area

A selection of the CV problems in Document Representation Learning that I worked on at AO:

- Analyzing page layout: Created a LayNet model (yolo_v4/faster_rcnn architecture) with model quantization. Increased mAP by ~30% to 0.73. Reduced CUDA runtime to ~60 ms from an initial Caffe2 benchmark of ~2 s through ablation studies.
- Extracting document table structures: Designed and built a hybrid solution combining a CV algorithm with a TabNet model (cascade_mask_rcnn/efficient_det architecture), reaching 0.48 AP. Reduced model cost and CPU latency by more than 3x (~9 s/page to ~2.5 s/page).
- Detecting author signatures: Designed a SignNet model (custom architecture with spatial pyramid pooling, SPP) for extracting signatures and related metadata. Deployed the model with 0.93 precision at 34 fps.

Some models were trained entirely on synthetic data and generalized out of distribution to client data. These vision models are core components of ao-Vision and ContractAI, our flagship contract-analysis product targeted at the legal and procurement sector.

### Associate Data Scientist @ App Orchid Inc

Jan 2019 – Jan 2021 | Greater Hyderabad Area

Semantic PDF understanding models and algorithms I worked on during this time:

- Detecting document artifacts: Extracted headers/footers, margins, and image orientation from scanned images using CV and custom DL models. More robust than prior deployed solutions, with 4x–80x lower latency.
- Identifying document structure: Extracted the section-clause tree hierarchy of legal documents using 2-level hierarchical clustering, including an extension to multi-column layouts using CV. This is a notoriously hard problem to generalize, given the long-tail distribution of real-world buyer/supplier paper contracts.

### Project Engineer (CV) @ Soliton Technologies

Jan 2018 – Jan 2019 | Greater Coimbatore Area

1. Machine Vision:
   - Developed a real-time algorithm to detect and track particulate air voids in industrial glass rods (using images from GigE Vision cameras).
   - The final CV pipeline, built on a KLT tracker and Delaunay triangulation, had to meet speed (real-time), accuracy, and compute (on-prem NI DAQ) constraints.
   - The pipeline was deployed in a ~$2MM pilot system and 3 replicas in North America.
2. OCR:
   - Built and shipped a sparse text detection-recognition pipeline (with seeded segmentation) to extract text from a sensory monitoring device's image display.
   - Redefined the evaluation metric (WER → CER); the custom classifier outperformed Google Vision API by 23% (0.781 CER) on the client dataset and replaced the benchmark recipe in production.
   - Business PoV: this was the core component of MAVIS, our automated V&V product targeted at the medical tech sphere; it helped onboard 2/10 Fortune 500 medical device companies for the beta version of our IP.

Responsible for the end-to-end pipeline phases (data collection, training, model deployment) of my work at Soliton. Also involved in team-building and promotional activities across the org.

### Teaching Assistant, Department of EE @ SSN College of Engineering

Jan 2017 – Jan 2017 | Greater Chennai Area

TA for the Special Electrical Machines (EE6703) course taught by Prof. Nagarajan VS.

### Product Development Intern @ Euro Process Automatik

Jan 2017 – Jan 2017 | Chennai Area, India

- Designed an Automated Guided Vehicle (AGV) to haul industrial payloads of 1–1.5 tonnes.
- Generated a binary occupancy grid of the AGV's environment to compute the shortest path to the AGV's set point.
- Used a SONAR rangefinder to detect dynamic obstacles and reroute the AGV after a preset time.
- Worked with ABB Variable Frequency Drives, PLC, CodeSys, and LabVIEW.

## Education

### Master's Degree in Computer Vision

Carnegie Mellon University

### Bachelor's Degree in Electrical Engineering

SSN College of Engineering

### High School (HSC)

Zion Matriculation Higher Secondary School

### Secondary School (SSC)

Boaz Public School - India

## Contact & Social

- LinkedIn: https://linkedin.com/in/sivaramakrishnan-subramanian
- Portfolio: https://krishnansr.github.io

---

Source: https://flows.cv/sivaramakrishnan

JSON Resume: https://flows.cv/sivaramakrishnan/resume.json

Last updated: 2026-03-29