I worked with an impressive team of engineers to build a GPU-enabled distributed deep learning compute and experiment tracking platform. I mentored our clients’ ML and perception teams, helping them integrate their models, build pipelines on our platform, and adopt best practices as they transitioned from training on a single GPU to many. Using feedback from these interactions, I worked with our team to design and build new products and features.
Deep Learning Writings and Presentations:
•Led research showing how layer-wise optimizers (e.g., LAMB) can train object detectors (e.g., Mask R-CNN) with large batch sizes in a fraction of the time, with no degradation in accuracy. Results can be found on our company blog at https://bit.ly/35gfM0P.
•Built a cat detector using a TensorFlow implementation of RetinaNet, trained live on 64 GPUs in five minutes at VentureBeat Transform 2019. Our CEO’s demonstration can be found at https://bit.ly/2YdMbnr.
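The layer-wise scaling at the heart of optimizers like LAMB can be sketched in a few lines. This is an illustrative simplification (plain Python, a single layer, and `adam_update` standing in for the Adam direction), not the implementation from the research above:

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def lamb_layer_update(weights, adam_update, lr=0.001, weight_decay=0.01):
    """One LAMB-style layer-wise update (illustrative sketch).

    The trust ratio rescales the step per layer so that layers with
    large weight norms are not dominated by proportionally tiny (or
    huge) updates -- the property that keeps very large-batch training
    stable.
    """
    # Combine the adaptive direction with decoupled weight decay.
    update = [u + weight_decay * w for u, w in zip(adam_update, weights)]
    w_norm = l2_norm(weights)
    u_norm = l2_norm(update)
    # Trust ratio: size of this layer's weights relative to its update.
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return [w - lr * trust_ratio * u for w, u in zip(weights, update)]
```

Because the ratio is computed per layer rather than globally, a single learning rate can serve every layer even as the batch size grows.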
Software Development Examples:
•Led the design and implementation of our local offering, which allowed users to run deep learning experiments on their own hardware and compare results in the Engine Dashboard alongside their cloud jobs. The product tracked and persisted code changes, logs, outputs, model performance metrics, system utilization metrics, and dataset metadata. Technologies include: Kotlin, Python, NGINX, PostgreSQL, Hasura, GraphQL, InfluxDB, Elasticsearch.
•Designed and programmed an email alerting service that notified users when their experiments entered a terminal state. Technologies include: Kubernetes, Docker, Prometheus, PromQL, Python.
•Designed and programmed a feature to pre-fetch training data from S3 buckets into an in-memory read-through cache using Alluxio and its FUSE-based POSIX API, resulting in up to a 5x speedup when reading a remote file.
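The core logic of a terminal-state alerting service like the one above can be sketched as a transition check between consecutive polls. The state names and record shape here are assumptions for illustration, not the platform’s actual schema:

```python
# Illustrative terminal states; the real service's state machine may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def experiments_to_notify(previous, current):
    """Return experiment IDs that have just entered a terminal state.

    `previous` and `current` map experiment IDs to state strings, as a
    polling loop (e.g., evaluating a PromQL query against Prometheus on
    each scrape) might observe them. Alerting only on the transition,
    rather than on the state itself, avoids re-emailing users on every
    poll.
    """
    return sorted(
        exp_id
        for exp_id, state in current.items()
        if state in TERMINAL_STATES
        and previous.get(exp_id) not in TERMINAL_STATES
    )
```

Each ID returned would then be handed to the email-sending path exactly once.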
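The read-through caching pattern behind the S3 pre-fetch feature can be shown with a minimal in-process sketch. The real feature used Alluxio and its FUSE-based POSIX API rather than a Python dictionary; this only illustrates the access pattern:

```python
class ReadThroughCache:
    """Minimal read-through cache: serve from memory, fall back to a
    slow fetch (a stand-in for a remote S3 read) on a miss.
    """

    def __init__(self, fetch):
        self._fetch = fetch   # called only on a cache miss
        self._store = {}      # in-memory store
        self.misses = 0

    def read(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]
```

Pre-fetching amounts to calling `read` on upcoming files ahead of the training loop, so that subsequent reads are served from memory instead of crossing the network, which is where the observed speedup comes from.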