Machine Learning Engineer specializing in search, discovery, and recommendation systems, with experience building ranking platforms, ML infrastructure, and LLM-powered retrieval systems.
Experience
2022 – Now
New York, NY
Worked on infrastructure powering fuboTV's recommendation and personalization systems serving 1.5M+ subscribers, including ML ranking services, feature infrastructure, and distributed pipelines for candidate generation and recommendation scoring.
Helped build and scale fuboTV's recommendation platform, leading its transition from heuristics-based request-time logic to a precomputed machine learning recommender supporting personalized content discovery.
Partnered with data science to productionize a LightGBM ranking model for LiveTV recommendations, powering the platform's highest-traffic carousel.
Led the architecture, development, and introduction of sp-ranking-service, a FastAPI/Python ML ranking service, into a Golang microservice ecosystem, establishing CI/CD pipelines and Kubernetes deployment patterns for productionizing models.
Designed and implemented the team's first feature store, introducing an online/offline architecture with Bigtable for low-latency inference and BigQuery for scalable training pipelines.
Built Airflow-based recommendation precomputation pipelines generating predictions ahead of request time, improving latency by up to 20% across personalization services.
Redesigned the content representation and candidate generation pipeline to support multiple metadata sources, decoupling recommendation features from vendor-specific schemas and enabling recommendations for international content as the platform expanded into European markets.
Developed distributed recommendation feature pipelines using Scala, Scio, and Apache Beam on Google Cloud Dataflow, generating user profiles and recommendation features used by personalization systems.
Re-architected large-scale Scio data pipelines by replacing inefficient groupBy operations with aggregateByKey patterns and optimizing distributed joins as the platform scaled.
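The groupBy-to-aggregateByKey migration above was done in Scio/Scala, but the core idea carries over to plain Python. A minimal sketch (illustrative names, not the production pipeline) contrasting materializing full groups against folding each value into a small per-key accumulator:

```python
from collections import defaultdict

def group_then_reduce(pairs):
    """Anti-pattern analogue of groupBy-then-reduce: materialize every
    value per key, then aggregate. In a distributed engine this shuffles
    and holds entire value lists in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # whole group kept around
    return {k: sum(vs) for k, vs in groups.items()}

def aggregate_by_key(pairs, zero=0, seq_op=lambda acc, v: acc + v):
    """aggregateByKey analogue: fold each value into a running
    accumulator, so only one small accumulator per key is retained."""
    accs = defaultdict(lambda: zero)
    for key, value in pairs:
        accs[key] = seq_op(accs[key], value)
    return dict(accs)

events = [("user_a", 3), ("user_b", 1), ("user_a", 2)]
```

In Beam/Scio the same shift matters even more: combiner-style aggregation runs on each worker before the shuffle, so far less data crosses the network.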
2020 – 2022
Boston, MA
o Functioned as an individual contributor reporting to the Director of Software Development, working across teams to build internal tooling & backend services for data-intensive SaaS applications used by the top 25 biopharmaceutical companies to optimize clinical trials (~$40B industry) through the novel use of machine learning
o Developed power analysis capabilities in Eureka Trial Optimizer, a flagship SaaS product, by extending the Flask backend's statistical layer, enabling end users to view power & sample-size calculations for hypothetical clinical trials
o Architected & built a suite of microservices with REST endpoints for Eureka Digital Trial Solutions using Docker, FastAPI, & AWS Redshift/RDS; the suite acted as a middleware layer between external vendors and Eureka's eScreening, eConsent & ePRO modules, helping optimize clinical studies by identifying patients most likely to meet study criteria (site criteria optimization) and sites most likely to have patients for a trial (site selection optimization)
o Developed & tested a mission-critical data pipeline in AWS Glue (PySpark/Redshift) to update OMOP-compliant medical terminologies/vocabularies in an internal data model used by ConcertAI data products and the Eureka Foundation module
o Developed a data pipeline in AWS Glue (PySpark) using OMOP ontologies that increased the codification & standardization rate of clients' electronic health record (EHR) data by 30–50%
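The power and sample-size work above sat behind a Flask statistical layer; the underlying statistics can be sketched with a stdlib-only normal-approximation power calculation for a two-arm trial with a binary endpoint (function names are illustrative, not the product's actual API):

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # Inverse CDF by bisection; ample precision for power calculations.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample test of proportions."""
    z_alpha = norm_ppf(1.0 - alpha / 2.0)
    pbar = (p1 + p2) / 2.0
    se0 = math.sqrt(2.0 * pbar * (1.0 - pbar) / n_per_arm)        # under H0
    se1 = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)  # under H1
    z = (abs(p1 - p2) - z_alpha * se0) / se1
    return norm_cdf(z)
```

For example, detecting a 50% vs 60% response rate with roughly 388 patients per arm yields about 80% power at α = 0.05, in line with standard tables.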
2019 – 2020
Boston, MA
o Functioned 50% as a technical lead for product data science (A/B testing analysis, experimental design, analytics support for strategic business initiatives, project scoping & exploratory data analysis) and 50% as the sole analytics engineer building robust & efficient data pipelines for search data assets, a critical input for demand-based fleet planning
o Deployed Lifetime Value & RFM machine learning models on Airflow for easier orchestration, better logging, monitoring & automation than the previous deployment on AWS Lambda & an always-running EC2 instance
o Empowered & educated the analytics & data science team on using dbt & Airflow to author & automate analytical modeling pipelines, creating a multiplier effect & cutting turnaround time by 50% without data engineering support
o Developed & automated a search data pipeline using Dask (structuring raw JSON on S3 into Parquet files queryable via AWS Athena), dbt for transforming data on Redshift, & Airflow for orchestration & automation
o Optimized the financial data mart ETL through incremental dbt logic & Redshift performance tuning (altering sort & distribution keys), increasing query performance ~20x for analytics team members identifying transactions with waivers
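The RFM model mentioned above is straightforward to sketch. A minimal dependency-free version using rank-based binning as a stand-in for production quantile logic (all names and the binning scheme are illustrative):

```python
from datetime import date

def rfm_scores(orders, today, n_bins=5):
    """Score each customer 1..n_bins on Recency, Frequency, Monetary value.
    `orders` is a list of (customer_id, order_date, amount) tuples."""
    stats = {}
    for cust, d, amt in orders:
        last, freq, mon = stats.get(cust, (date.min, 0, 0.0))
        stats[cust] = (max(last, d), freq + 1, mon + amt)

    def bin_by_rank(values, reverse=False):
        # Rank customers by metric, then map rank -> 1..n_bins
        # (higher score = better customer on that dimension).
        ranked = sorted(values, key=lambda kv: kv[1], reverse=reverse)
        return {cust: 1 + (i * n_bins) // len(ranked)
                for i, (cust, _) in enumerate(ranked)}

    # Recency: fewer days since last order is better, so sort descending.
    r = bin_by_rank([(c, (today - s[0]).days) for c, s in stats.items()],
                    reverse=True)
    f = bin_by_rank([(c, s[1]) for c, s in stats.items()])
    m = bin_by_rank([(c, s[2]) for c, s in stats.items()])
    return {c: (r[c], f[c], m[c]) for c in stats}
```

Wrapped in an Airflow task, a function like this recomputes scores on a schedule with logging and retries, rather than running on an always-on instance.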
2018 – 2019
Boston, MA
2017 – 2018
Greater Boston
o Member of the Business & Clinical Intelligence (BCI) team charged with end-to-end support, maintenance, & development of the data warehouse (DW) infrastructure, ETL, BI reporting, and ad-hoc analysis capabilities for Iora's data assets
o Architected & redesigned the Medicare Risk Adjustment (MRA) Engine ETL, migrating from a legacy PostgreSQL database to Amazon Redshift by rewriting procedural PL/pgSQL functions as set-based SQL, reducing processing time from 6 hours to 5 minutes
o Updated & added CMS's logic for the Risk Adjustment Processing System (RAPS) & Encounter Data Processing System (EDPS) in the MRA Engine, enabling blended risk adjustment factor (RAF) calculations for every patient
o Redesigned & updated the MRA Engine's risk model logic, reference, and mapping tables, adding ETL flexibility to calculate the RAF for the entire patient population for any payment period
o Designed an ETL pipeline to support the reconciliation process with CMS by identifying Medicare Advantage patients with Hierarchical Condition Categories (HCCs) unpaid by CMS that have claim diagnosis evidence in Iora's data warehouse
o Designed & implemented analytic mining queries to generate a clinical suspects reporting list (possibly under-coded patients based on clinical mining rules for depression, breast cancer, CKD, DME oxygen, acute stroke, etc.)
o Redesigned & optimized legacy SQL, cutting query execution time by over 50% for quarterly KPI & cost reporting analytics (admissions, ER visits, specialist visits, imaging visits, etc.) for Carpenters of MA (client) patients
o Designed a Python/Jupyter notebook analysis of Carpenters' clinical data (cholesterol levels, BMI, etc.), creating charts and statistics for different measurements to present at the annual meeting with the sponsor
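The blended RAF calculation described in the MRA Engine bullets above can be sketched as a weighted sum of RAPS- and EDPS-sourced scores. All coefficients and the blend weight below are illustrative placeholders, not actual CMS model values:

```python
# Hypothetical demographic and HCC coefficients (placeholders only).
DEMO_COEF = {"F_70_74": 0.395}
HCC_COEF = {"HCC18": 0.302, "HCC85": 0.331}

def raf(demographic, hccs, coef=HCC_COEF):
    """Sum the demographic coefficient and each HCC's coefficient."""
    return DEMO_COEF[demographic] + sum(coef[h] for h in hccs)

def blended_raf(demographic, raps_hccs, edps_hccs, raps_weight=0.75):
    """Blend RAPS- and EDPS-sourced RAF scores for a payment period.
    The RAPS/EDPS weighting varied by payment year; 0.75 is illustrative."""
    raps = raf(demographic, raps_hccs)
    edps = raf(demographic, edps_hccs)
    return raps_weight * raps + (1.0 - raps_weight) * edps
```

The "unpaid HCC" reconciliation work amounts to diffing the HCC sets the warehouse supports with evidence against the sets CMS actually paid on.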
Education
Cornell University
Master of Engineering (MEng)
Stony Brook University
Bachelor of Engineering (B.E.)
ABRHS
Conant Elementary School