A proficient software professional skilled in leading teams and projects, executing agile software development lifecycles, implementing SOA, Data/AI/ML Ops, CI/CD, and SRE/DevOps solutions, and delivering cloud-based services.
Experience
2024 — Now
United States
Develop enterprise AI/ML applications and infrastructure to bring autonomous IT to large companies.
2011 — Now
Sunnyvale
Manage a private-cloud GPU cluster of NVIDIA A100 GPUs for ML model training, using MIG, Triton, TensorFlow, Kubernetes, Kubeflow, and automated pipelines for data storage and model serving. Deploy the NVIDIA TensorFlow container with optimized GPU acceleration to improve model training and serving performance.
Launch the Xfinity notification service for TV viewing personalization, built on a suite of microservices and native AWS resources such as ALB, Route53, ECS, Aurora, Lambda, ElastiCache, SQS, SNS, S3, Kinesis, and CodePipeline. Architect a scalable platform for broadcast notifications, data analytics pipelines, A/B testing, and service observability.
Create hybrid-cloud CI/CD pipelines using Jenkins, Concourse, CodePipeline, Terraform, CloudFormation, ECR, and S3 that integrate legacy in-house software release tools with AWS-native tools for release process automation, reliability, and security. Deploy NVIDIA Triton, Kubeflow, Kubernetes, ZooKeeper, ElastiCache, Aurora MySQL, Kinesis, SQS, SNS, and Azkaban for cloud-based solutions with enhanced customization, resiliency, and scalability.
2007 — 2011
Manage the software build and release engineering team for the Advertising Product Group in Santa Clara. Implement Software Configuration Management (SCM) processes and automation for products competing with Google's AdSense and AdWords. Automate software builds and releases using Hudson, Maven, gmake, CppUnit, JUnit, and other tools. Work with multiple teams in Ads Science, Ads Selection, Ads Serving, Ads Creatives, and Ads Analytics on integrated release tracking across multiple tools (e.g., SVN code repository, bug tracking system, deployment configuration management system, change management system). Automate software releases for offline data processing on a grid running Hadoop. Set up servers running ZooKeeper and Memcached for application deployment testing. Work in cross-functional teams on release planning (sprints, resources, capex), release sign-off, product launches, and postmortems.
2005 — 2007
Take on a newly created position as Service Engineering (SE) lead for the Membership (user registration and login) services. Complete the migration of global service operation responsibilities from the development team to newly created SE teams in the US and Bangalore. Main accomplishments include the production rollout of anti-phishing protection with personalized sign-in seals, upgrading servers worldwide to support SSL (HTTPS), migrating services across data centers, revamping load balancing and failover configurations, and implementing Service Level Agreement (SLA) monitoring using in-house and open source tools. Contribute to company-wide initiatives in SLA improvement, postmortems, and capacity planning. Substantially improve the quality of service for the Membership services.
Education
University of Southern California
Ph.D.
University of Southern California