Experience
2023 — Now
Santa Clara, California, United States
Hands-on experience with Kubernetes (K8s): cluster installation and upgrades, autoscaling (including horizontal pod autoscaling), service creation, and effective issue troubleshooting.
Implemented an open-source Golang-based Kubernetes scheduler plugin, reducing GPU access wait times by 45%.
4+ years of experience working with Kubernetes operators and CRDs, and integrating them with Go frameworks, Prometheus, and gRPC. Created Go tools for observability metrics and gRPC API servers, and integrated Prometheus and Grafana for cluster metering and cost estimation.
Designed and implemented a distributed Prometheus system for a large service-oriented architecture handling a significant volume of data. Developed a comprehensive platform-wide solution for monitoring, alerting, logging, auditing, and notifications. Worked with OpenTelemetry to export logs, metrics, and traces.
2022 — 2022
Santa Clara, California, United States
Contributed to an existing scalable MLflow infrastructure to manage, track, and version machine learning models, resulting in improved model lifecycle management.
Designed and integrated custom authentication and authorization mechanisms in MLflow, enhancing security and access control for sensitive model data.
Managed Grafana, Prometheus, and Loki for K8s monitoring. Established CI/CD deployment for a customized MLflow server, improving ML experiment tracking and data collection.
2014 — 2022
San Jose, California, United States
Led monitoring efforts for IBM Cloud Pak for Data, a product that offers a wide selection of IBM and third-party services spanning the entire data lifecycle.
DevOps Engineer for the Predictive Customer Intelligence (PCI) solution
IBM Predictive Customer Intelligence personalizes the customer experience by making the recommendations most relevant to each customer based on their buying behavior, web activity, social media presence, and more.
Architected, designed, and implemented an automated end-to-end build, deploy, and test solution for PCI. Predictive Customer Intelligence runs on a multi-node architecture stack combining IBM products such as Cognos (business intelligence), SPSS (analytics), DB2, and WebSphere Application Server to better serve customer needs.
Wrote scripts and auxiliary programs in Jython and Groovy for testing the SPSS and Cognos solutions.
Developed a system to monitor solution performance, and performed advanced debugging and analysis to improve it.
Designed and developed a build management system to generate solution artifacts using Maven and Java. Set up source and version control using IBM Rational Team Concert (RTC).
Automated the entire build-deploy-test setup using UrbanCode Deploy and Chef.
Developed Python and Node.js applications to support asset migration between IBM Cognos tenants. Experience working with the MEAN (MongoDB, Express, Angular, Node.js) stack.
Built performant systems that handle hundreds of thousands of assets.
Experience working on CI/CD platforms with Maven, Chef, and UrbanCode Deploy. Proficient in RESTful web service development. Skilled in using task runners (e.g., Grunt), build automation tools (Ant, Maven), and log4j for logging and debugging.
Education
The University of Texas at Dallas
Master's
Fr. Conceicao Rodrigues College of Engineering