Experience
2026 — Now
Greater New York City Area
2024 — 2025
Greater New York City Area
Staff Engineer in Datadog’s Infrastructure organization, focused on global networking, large-scale distributed systems, and reliability engineering. My work spans the Infrastructure organization, ensuring Datadog remains highly available, scalable, and ready for future growth.
• Led the design and delivery of core platform capabilities, from deployment tooling and certificate automation to private connectivity and resilient edge infrastructure, supporting Datadog’s rapid global expansion.
• Drove multi-region failover approaches and a repeatable, fast path for bringing new datacenters online, including enablement for regulated environments such as FedRAMP High and IL5.
• Collaborated with product engineering teams, including On Call, MCP, and Status Pages, to ensure new services launched reliably on top of core infrastructure.
• Built cross-organizational practices that strengthened operational readiness and accelerated incident response.
• Managed technical execution with cloud and network service providers to deliver secure, cost-efficient connectivity and accelerate Datadog’s infrastructure roadmap.
2021 — 2024
Greater New York City Area
Senior Software Engineer building Datadog’s global edge and networking platform.
2019 — 2020
Greater New York City Area
Site Reliability Engineer at Dropbox focused on large-scale fleet lifecycle management, automation, and deployment orchestration across Dropbox’s hybrid cloud infrastructure.
• Core contributor and member of the Fleet Management team responsible for the lifecycle, allocation, and OS installation of Dropbox’s on-premise fleet consisting of over 75,000 bare metal hosts.
• Designed and implemented an image-based OS installer composed of several microservices that integrated with Dropbox’s deployment infrastructure stack. The new installer cut compute host p95 provisioning time to 10 minutes, allowing faster and more reliable deployment of servers into production.
• Designed and implemented an OS image building framework that abstracts the customization of a target image behind a simple YAML-based configuration syntax. As a result, new bare metal images could be created quickly without introducing redundant configuration or technical debt into the codebase.
• Built a catalog of user-friendly Grafana dashboards for foundational services, improving the operability of the team’s stack by letting on-call engineers quickly identify and troubleshoot issues.
• Led weekly operational meetings to systematically review incidents, user-reported issues, and alerts, identifying regressions and improving on-call health for the team.
2016 — 2019
Greater New York City Area
Site Reliability Engineer supporting Palantir’s most impactful customer deployments in sensitive, self-hosted environments. Built and operated in-house monitoring and automation systems to improve observability and reliability at scale. Played hands-on roles in environments where uptime and trust were critical, working directly with customer IT organizations to ensure secure and resilient operations. Contributed to both open source and internal platforms while driving adoption of modern reliability practices across distributed infrastructure.
• Provided infrastructure and systems support for large on-prem deployments, including server racking/cabling, OS installation and tuning, system administration, troubleshooting, and incident management in collaboration with customer IT teams.
• Designed and implemented an in-house monitoring platform to replace Nagios with Prometheus across multiple sites, enabling deeper visibility into host and service health.
• Built a Go-based health agent to surface critical hardware metrics from HP and Dell servers.
• Developed a RESTful API for programmatic recording/alerting rule management in Prometheus.
• Conducted code reviews and mentored engineers as they ramped up on Go.
• Contributed upstream improvements to Prometheus and Grafana.
• Core contributor to Palantir’s migration to Kubernetes and AWS for cloud-based platform deployments.
• Implemented Datadog monitoring and real-time dashboards for cluster operations.
• Reduced toil and improved resiliency of host bootstrapping via systemd-driven self-healing automation.
Education
University at Buffalo