# Shabari Vignesh

> AI & Data Engineer @ Swirepay | Ex-Peoplelens.ai, Capgemini | 4+ yrs in FinTech, Energy, and Enterprise Data | Actively Looking for Opportunities

Location: United States
Profile: https://flows.cv/shabari

🌟 AI & Data Engineer | LLM/Agentic Systems Builder | Data-Driven Problem Solver

I’m Shabari Vignesh, an AI Engineer with 4+ years of experience building scalable data platforms and AI-driven systems that support real business decisions. I have worked across FinTech/merchant payments, enterprise retail analytics, energy (smart grid & IoT), and aviation operations, partnering with product and engineering teams to turn messy, high-volume data into reliable, usable insights.

I enjoy roles where the problem is ambiguous, the data is imperfect, and the output needs to be accurate, auditable, and production-ready. My recent work has focused on AWS-based data lakes, event-driven pipelines, and verification-first AI agents that generate grounded answers from curated datasets rather than “best guesses.”

🔍 Key Highlights

- Built an enterprise AWS S3 data lake ingesting from Shopify, Fiserv, and Clover, processing 10M+ records/day, improving data freshness to under ~20 minutes and reducing fragmentation by ~70%.
- Delivered a ChatGPT-style merchant Q&A agent using AWS Bedrock + LangChain, enforcing deterministic SQL / internal tool execution to keep responses grounded; improved query success from ~70% to 92%+.
- Implemented AI observability and guardrails (prompt/tool traceability, adversarial testing, confidence-based fallbacks), cutting AI-related support tickets by ~50%.
- Built near real-time pipelines with Kafka + Spark Streaming for smart-meter/grid signals and optimized Synapse/SQL Server warehouses (fact-dimension modeling, partitioning, indexing) to bring queries down to single-digit seconds.
- Shipped production ETL workflows with Airflow + Python + SQL for airline and manufacturing finance datasets, adding data quality gates (freshness, uniqueness, control totals) to prevent silent failures.

💡 Core Skills

- AI / Agentic Systems: AWS Bedrock, LangChain, tool-use patterns, structured prompting, guardrails, adversarial testing, evaluation/traceability
- Data Engineering (Cloud): AWS (S3, Athena, Lambda, Step Functions, EventBridge, Glue), Azure (Data Factory, Synapse), Snowflake
- Pipelines & Orchestration: Airflow, Dagster, event-driven + batch ETL/ELT, incremental loads, reconciliation workflows
- Streaming & Compute: Kafka, Spark Streaming, Hadoop
- Languages & Analytics: Python (Pandas, PySpark, NumPy), SQL, Shell, Looker/Power BI/Tableau, dimensional modeling, query optimization

🤝 Let’s Connect

Email: vigneshshabari97@gmail.com
Phone: +1 (408) 390-3347

## Work Experience

### AI Engineer @ Swirepay
Jan 2025 – Present | Santa Clara, California, United States

**Project 1: Enterprise Data Lake & Merchant Data Platform (AWS)** – a single, reliable source of truth for merchant, payment, inventory, and settlement data

- Built a centralized AWS S3 data lake consolidating data from 3+ external systems (Shopify, Fiserv, Clover), processing 10M+ transaction and inventory records per day.
- Designed event-driven ingestion pipelines using EventBridge -> Step Functions -> multiple Lambda functions, enabling modular, fault-tolerant data collection workflows.
- Reduced data fragmentation by ~70%, replacing siloed, system-specific reports with curated, analytics-ready datasets consumed across BI and AI use cases.
- Initially stored ingestion outputs in JSONL format for flexibility, then migrated curated datasets to Parquet, reducing data scanned by ~70% and improving Athena query performance by ~4-6×.

**Project 2: AI-Powered Merchant Q&A System (ChatGPT-style Agent)** – enabling merchants to ask natural-language questions over financial data and receive accurate, grounded answers

- Delivered an agent-based conversational analytics system using AWS Bedrock and LangChain, supporting 250+ enterprise and mid-market clients.
- Enabled self-service insights for daily sales, inventory planning, and settlement analysis, significantly reducing reliance on dashboards and ad-hoc reporting workflows.
- Designed the system to enforce deterministic SQL and internal API execution, ensuring responses were always grounded in curated data lake tables rather than model inference.

**Project 3: AI & Data Observability, Guardrails, and Adversarial Testing** – ensuring AI-generated financial insights were safe and auditable

- Built an AI observability layer tracking prompts, tool calls, data sources, and outputs, enabling full traceability for merchant-facing financial answers.
- Designed and executed adversarial testing frameworks using real merchant queries and edge cases, significantly reducing hallucinated or speculative responses before production rollout.

### Data Engineer @ PeopleLens
Jan 2024 – Jan 2024 | Cupertino, California, United States

- Built curated demo and evaluation datasets derived from Salesforce, Google Calendar, and Highspot schemas, used to train and validate internal models powering the demo experience, improving product iteration speed by ~30%.
- Automated ingestion and transformation workflows, reducing manual data preparation and ad-hoc analysis by ~50% for sales, product, and demo teams.
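The deterministic SQL execution described under the Swirepay Q&A agent (Project 2) can be sketched in plain Python. This is an illustrative pattern, not the actual implementation: the model selects a named, pre-approved query template and supplies parameters, but never writes free-form SQL. The template names and table/column names below are hypothetical.

```python
# Sketch of a "deterministic SQL" tool layer for an LLM agent: the model
# may only invoke approved templates by name. All names here are
# illustrative assumptions, not the real Swirepay schema.

QUERY_TEMPLATES = {
    "daily_sales": (
        "SELECT sale_date, SUM(amount) AS total "
        "FROM curated.sales WHERE merchant_id = ? "
        "AND sale_date BETWEEN ? AND ? GROUP BY sale_date"
    ),
    "settlement_summary": (
        "SELECT settlement_id, status, net_amount "
        "FROM curated.settlements WHERE merchant_id = ?"
    ),
}

def run_tool_call(tool_name: str, params: list):
    """Resolve a model-issued tool call to an approved SQL template.

    Raises instead of guessing when the tool is unknown, so the agent can
    take a safe "I can't answer that" fallback path.
    """
    if tool_name not in QUERY_TEMPLATES:
        raise ValueError(f"unknown tool: {tool_name!r}")
    sql = QUERY_TEMPLATES[tool_name]
    if sql.count("?") != len(params):
        raise ValueError("parameter count mismatch")
    return sql, params  # handed to the warehouse client (e.g. Athena)
```

Because every answer flows through a fixed template, responses stay grounded in curated tables and each tool call is trivially loggable for the observability layer described in Project 3.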
- Generated highly realistic synthetic sales and calendar data in Python, aligned to real Salesforce, Calendar, and Highspot structures, enabling safe, repeatable demos without exposing sensitive customer information.
- Removed demo dependencies on live or limited production data, ensuring consistent demo reliability and allowing sales reps to confidently run customer demos at any time.
- Built Looker dashboards visualizing sales engagement, platform usage, and rep activity, helping leadership quickly assess performance and demo effectiveness.
- Directly supported customer-facing demos and sales conversations, contributing to two new qualified customer opportunities during the internship.

### Chief Coordinator - Alumni Relations @ Indian Students Organization - SJSU (ISO-SJSU)
Jan 2023 – Jan 2024 | San Jose, California, United States

### Associate Consultant - Data @ Capgemini
Jan 2021 – Jan 2023 | Bangalore Urban, Karnataka, India

**Client: Ellevio (Electricity Distribution)**

Ellevio manages large-scale grid and smart-meter data in Sweden to support grid monitoring, anomaly detection, and operational analytics. Operational and IoT data arrived at high volume from APIs and streaming sources, with quality issues and strict performance requirements for both analytics and near real-time monitoring.

- Built scalable ETL pipelines in Azure Data Factory with incremental loads, retries, and monitoring to ingest operational and meter data reliably.
- Implemented raw -> cleaned -> curated data layers, stabilizing downstream analytics and reducing data quality issues.
- Developed SQL transformations to standardize time and units, normalize device identifiers, deduplicate events, and aggregate data into reporting-friendly grains.
- Modeled and optimized Azure Synapse / SQL Server warehouses using fact-dimension design, partitioning, and indexing, cutting dashboard query times from timeouts to single-digit seconds.
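The cleaning steps described for the Ellevio meter data (timestamp standardization, identifier normalization, event deduplication, unit conversion) can be sketched in plain Python. The field names (`device_id`, `ts`, `value_wh`) and the Wh-to-kWh conversion are illustrative assumptions, not the actual Ellevio schema.

```python
from datetime import datetime, timezone

def clean_events(raw_events):
    """Normalize timestamps to UTC, standardize units and device IDs,
    and drop duplicate (device, timestamp) readings, keeping the
    last-ingested one. Schema here is illustrative only."""
    latest = {}
    for ev in raw_events:
        # Standardize to UTC so readings from different zones align
        ts = datetime.fromisoformat(ev["ts"]).astimezone(timezone.utc)
        # Normalize device identifiers (case/whitespace drift)
        key = (ev["device_id"].strip().upper(), ts)
        latest[key] = {
            "device_id": key[0],
            "ts": ts.isoformat(),
            "value_kwh": ev["value_wh"] / 1000.0,  # standardize units
        }
    return sorted(latest.values(), key=lambda e: (e["device_id"], e["ts"]))
```

In the actual pipelines this logic lived in SQL transformations over the raw layer; the sketch just makes the standardize-then-deduplicate ordering concrete.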
- Implemented Kafka and Spark Streaming pipelines for near real-time grid signals, handling late and duplicate events to support faster operational awareness.

### Associate IT Consultant - Data @ ITC Infotech
Jan 2019 – Jan 2021 | Bangalore Urban, Karnataka, India

**Client: Qatar Airways**

Qatar Airways operates large-scale airline booking and flight operations systems that generate high-volume, globally distributed operational data. Booking, flight, and partner data arrived from multiple systems with inconsistent timestamps, duplicate updates, schema drift, and partial daily loads, making reporting unreliable.

- Built production ETL pipelines using SQL, Python, and Airflow to ingest booking, ticketing, and flight operations data from multiple upstream systems.
- Standardized timestamps, time zones, and identifiers; cleaned status fields; and deduplicated records using composite keys with “latest update wins” logic.
- Joined bookings, flight schedules, and operational status into single reporting-ready tables, eliminating the need for analysts to query multiple systems.
- Orchestrated workflows with Airflow DAGs, implementing retries, dependencies, and alerting to prevent partial or silent data failures.

**Client: British American Tobacco (BAT)**

BAT’s Global Manufacturing Execution System (MES) supports production and finance workflows across plants and work centers worldwide. Manufacturing and finance data arrived late, incomplete, or corrected after initial load, and finance reporting could not tolerate partial or inconsistent data.

- Built and maintained ETL pipelines for global MES and finance data, integrating plant, work center, shift, and production events into curated finance-ready datasets.
- Implemented incremental and backfill logic to handle late-arriving, corrected production records across time zones.
- Applied business rules in SQL to distinguish valid production, rework, and scrap, aggregating data at day / shift / work-center levels for finance reporting.
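The “latest update wins” composite-key deduplication used in the Qatar Airways booking pipelines can be sketched in a few lines of Python. The key columns (`booking_id`, `segment_id`) and version column (`updated_at`) are illustrative assumptions, not the actual airline schema.

```python
def dedupe_latest(records, key_cols=("booking_id", "segment_id"),
                  version_col="updated_at"):
    """Keep only the most recent record per composite key.

    Key/column names are hypothetical; the real pipelines expressed the
    same idea in SQL (e.g. via window functions over the composite key).
    """
    latest = {}
    for rec in records:
        key = tuple(rec[c] for c in key_cols)
        # A later update for the same key replaces the earlier one
        if key not in latest or rec[version_col] > latest[key][version_col]:
            latest[key] = rec
    return list(latest.values())
```

Because upstream systems re-sent the same booking with corrections, collapsing to the newest version per key is what made the single reporting-ready tables trustworthy.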
- Used Airflow validation gates (expected plant coverage, control totals, row-count thresholds) to fail pipelines early when data was incomplete.
- Automated reconciliation summaries with Python, making it easy to identify missing plants, dates, or mismatched totals.

### Software Developer @ TechCiti Technologies Private Limited
Jan 2019 – Jan 2019 | Bangalore Urban, Karnataka, India

## Education

### Master of Science - MS in Applied Data Science
San José State University

### Bachelor of Engineering - BE in Computer Science
CMR Institute Of Technology

## Contact & Social

- LinkedIn: https://linkedin.com/in/shabari-vignesh

---

Source: https://flows.cv/shabari
JSON Resume: https://flows.cv/shabari/resume.json
Last updated: 2026-04-17