Experience
2024 — Now
● Led architecture for high-scale creator and gaming platforms serving 50M+ global consumers, designing distributed, event-driven systems across Node.js (TypeScript) and Python (FastAPI, asyncio).
● Architected and scaled horizontally distributed microservices and Kafka-based streaming pipelines powering real-time telemetry and mission-critical financial transactions, with strong consistency guarantees and resilience under burst traffic.
● Re-architected GraphQL APIs and data access layers, achieving 200% performance gains through resolver optimization, request batching, caching strategies, and query cost governance.
● Engineered non-blocking I/O, clustering, and async concurrency controls to maximize throughput while sustaining low-latency SLAs across high-traffic endpoints.
● Designed streaming-first telemetry contracts and enrichment pipelines enabling AI/ML-ready workloads for behavioral analytics, fraud detection, and adaptive monetization.
● Improved platform reliability and cost efficiency through structured observability (metrics, tracing, logs), and increased mobile performance by 35% through React Native Hermes optimization.
2024 — Now
Boston, MA
● Architected a distributed AI-native logistics platform (React Native + Kubernetes) supporting real-time dispatch and routing across thousands of concurrent delivery flows.
● Designed Kafka streaming pipelines processing 50K+ GPS events/min, enabling sub-second anomaly detection and improving route deviation accuracy by 28%.
● Built a RAG-powered operational intelligence layer that reduced manual escalations by 35% through contextual retrieval and exception automation.
● Implemented embedding pipelines and low-latency vector search (<120ms), powering similarity-based decision augmentation for routing and driver behavior analysis.
● Engineered a production-grade LLM orchestration layer (prompt routing, guardrails, deterministic fallbacks), increasing automated resolution rates from 42% to 71%.
● Optimized LLM inference costs via selective invocation and token control strategies, reducing token usage by 38% and lowering monthly AI infrastructure costs by 31%.
2021 — 2024
Boston, MA
● Co-defined the architecture for an AI voice assistant powered by Node.js, Python, and TensorFlow inference services.
● Designed a secure integration layer between mobile clients, ML inference services, and core banking systems.
● Implemented streaming API interactions for low-latency inference workflows.
● Designed a scalable microservice communication model across distributed domains.
● Reduced application footprint by 20% through modular build strategies and dependency rationalization.
● Led 22 engineers building an enterprise UI platform with 80%+ test coverage.
2019 — 2020
Boston, MA
● Architected a Kubernetes-native Node.js platform handling 100M+ monthly API requests, enabling automated horizontal scaling and high-availability multi-tenant SaaS workloads.
● Led migration from REST polling to WebSocket streaming, reducing end-to-end latency by 50% and doubling real-time concurrency capacity.
● Optimized cloud infrastructure through container rightsizing, caching strategies, and batch processing, reducing operational costs by $67K/month.
● Designed event-driven data pipelines with BigQuery and implemented distributed OAuth2/JWT identity across services, establishing scalable retrieval and indexing patterns later leveraged for semantic and vector-based search architectures.
2016 — 2019
Boston, MA
● Designed OAuth2 + JWT authentication systems supporting 5M+ daily logins.
● Built a Backend-for-Frontend layer consolidating multiple domain services.
● Developed 60+ REST services with resilience patterns and observability hooks.
● Improved fault isolation and error handling across distributed services.