15+ years of performance engineering and tools development experience improving the resiliency and performance of small- to large-scale cloud and distributed platforms through performance benchmarking, scalability experiments, and bottleneck identification using profiling, monitoring, instrumentation tools, and data science.
Experience
2023 — Now
San Francisco Bay Area
Architecture and design to scale the Agentforce platform, which lets customers build and deploy agents that add AI capabilities to their products.
LLM and model benchmarking across various LLMs and GPUs.
Build tools to benchmark LLMs and find the optimal cost to serve.
Build a mock platform that mocks LLM responses, saving millions of dollars in internal development costs.
2019 — Now
San Francisco Bay Area
Founder of jetmanlabs.com
Jetman is a developer productivity platform. It enables development teams to design, build, mock, and test APIs and ship products faster with the highest quality.
Built everything from the ground up: product vision, architecture, design, and full-stack development in Node.js, JavaScript, HTML, CSS, etc.
Hire and manage a team of 5 engineers to build supporting products.
Developed a cross-platform macOS and Windows client for API development and testing.
Developed the cloud platform and portal using microservices running on Google Cloud (GCP).
2022 — 2023
San Francisco Bay Area
Leading the effort to scale and improve end-to-end user experience latency across the multi-tier, AWS-based cloud platform of the ClickUp CRM product.
Architect and implement distributed, multi-user API and user experience benchmarking frameworks that run in performance and production environments to ensure service and end-user latency and resiliency stay within SLA across releases and cloud technology changes.
Design and implement test data generation frameworks for backend (BE) and AI use cases for OpenAI and CRM data applications.
Monitor production services and infrastructure health and perform actionable trend analysis as new features ship or customer usage patterns change.
Instrument and profile end-user latency issues across the front end, services, and AWS cloud infrastructure.
Shorten the feedback cycle on service latency and its impact by shifting performance testing left into CI/CD.
Infrastructure capacity planning for datastores (Postgres, Elasticsearch, Redis) and other application runtime services running on ECS, by analyzing user traffic patterns and time of use and mapping them to infrastructure resource usage.
RCA and MTTR reduction for production performance and scalability issues, working closely with development teams to make improvements.
Architect instrumentation to collect performance metrics and signals for user page load time measurement and to identify opportunities to improve load time.
Perf Tools:
JMeter, LoadRunner, Locust, Playwright/Puppeteer (extended with multi-user support).
APM/Observability:
Datadog, New Relic, AppDynamics, Grafana, YourKit, JVM tuning, ELK
Programming languages:
Node.js, JavaScript, Java, Python
Technologies and Cloud:
EC2, SSD, ECS, Docker, Node.js, JVM, Cassandra, Postgres, AWS, Redis, Kafka, RabbitMQ, Elasticsearch, Kibana
2021 — 2022
San Francisco Bay Area
Improved the end-to-end user experience of the AI- and natural-language-powered Einstein Search platform by identifying scale limits and optimization areas through measuring the latency breakdown from UI interaction to the last hop.
Designed and implemented a new capability to measure real-user page load time, intercept traffic, capture the time spent in each component, and send it to a metrics store for waterfall breakdown comparison.
Act as Product Owner to prioritize and align short- and long-term performance goals.
Mentor and manage a small team.
Patent on a UI load generation and waterfall-based analytics user experience platform.
Custom instrumentation across the client and API tiers for latency breakdown.
Backend and UI performance load generation and metrics collection framework for real-time analysis and regression detection.
Built a system crawler to watch and collect system health and custom metrics from tens of thousands of production DB and runtime nodes; the data was later used for capacity planning and estimating private data center capacity additions.
SRE work: monitoring, debugging, and RCA of production services in private and public clouds.
Root-cause performance and scalability issues and work closely with development teams to make improvements.
Technology Stack: Oracle, Solr, Node.js, Postgres, AWS, Private Cloud, Java, Spring, AngularJS
APM: AppDynamics, New Relic, VisualVM, YourKit, Splunk
Programming: Python, Java, Node.js, JavaScript, jQuery/HTML
Cloud: Load Balancer, AWS, Elasticsearch, Kibana, Splunk, Redis, Auto Scaling, ECS, Docker
2017 — 2021
San Francisco Bay Area
Leading the Salesforce Einstein Search performance initiative: improving user experience and scaling to millions of queries per day.
Architect and design a backend and UI performance instrumentation framework to capture performance metrics.
Managing and leading the observability platform that accurately measures the waterfall breakdown of request and web page round trips. It supports actionable analytics for trending, drilling down into the causes of slowness, and finding optimization opportunities.
Education
Punjab Technical University