I am a hands-on platform and backend engineer with 10+ years of experience architecting, building, and scaling mission-critical distributed systems from scratch. I’ve built large-scale infrastructure supporting 100M+ concurrent users and 5B+ messages per day.
Experience
2022 — Now
San Mateo County, California, United States
AI/ML Platform team at Roblox | Building SOTA LLM Safety Guardrails (RobloxGuard) | Reinforcement Learning (FAI-RL)
1. Developed RobloxGuard 1.0 Model — Roblox’s Open-Source, State-of-the-Art LLM Safety Guardrails
• Achieved SOTA performance, outperforming leading models such as Llama Guard, ShieldGemma, NVIDIA NeMo Guardrails, and even GPT-4o on key benchmarks
• Launched a Text Generation API powered by RobloxGuard 1.0
🔗 GitHub: https://github.com/Roblox/RobloxGuard-1.0
🔗 Paper: https://www.arxiv.org/abs/2512.05339
🔗 Text Generation API Announcement: https://devforum.roblox.com/t/beta-introducing-text-generation-api/3556520
2. Implemented & Open-Sourced FAI-RL — A Production-Ready Reinforcement Learning Framework for LLM Fine-Tuning
• Engineered a unified framework supporting RL algorithms (DPO, PPO, GRPO, GSPO) as well as Supervised Fine-Tuning (SFT).
• Designed a highly extensible system with YAML-based configs and support for custom reward functions and dataset templates.
🔗 GitHub: https://github.com/Roblox/FAI-RL
3. Implemented an LLM Labeling Platform (Built from Scratch, Integrated with Label Studio)
• Built an automated data pipeline for LLM labeling tasks.
• Built a test case framework to streamline prompt engineering.
• Implemented a daily search quality evaluation.
• Added sampling and quality evaluation of labeled data by human evaluators and LLMs.
• Fine-tuned a CLIP model for image labeling, effectively addressing a cold-start issue.
• Added retrieval of internal datasets for use in a RAG system.
Impact:
• Created over 30 datasets with the help of 300 human evaluators.
• Trained more than 15 models across 5 different teams.
• Enhanced search quality by 2.2% through a fine-tuned CLIP model.
2018 — 2022
San Francisco Bay Area
Built and scaled a real-time, multi-country A/B testing platform handling 200K QPS, with monitoring and circuit breakers.
1. Implemented an in-house A/B testing platform over more than 3 years, based on Java/Scala/Spring/Spark and Kafka/Hive/AWS.
• Implemented a Circuit Breaker from scratch to detect problematic A/B tests within a few minutes, based on Scala/Spark Streaming/Kafka/S3/Hive/Oozie/YARN/Sqoop.
• Implemented Exploration mode from scratch, based on ClickHouse/ZooKeeper, reducing the wait for a full result set update from up to 24 hours to around 10 minutes.
• Implemented a query-based monitoring system and a rule-based message generator from scratch, based on Prometheus/Kotlin/MySQL.
• Created a new data pipeline from scratch for the exposure details widget to detect major exposure logging issues as early as possible.
• Set up infra/batch/deployment/monitoring to expand the A/B test platform from one country to multiple countries.
2017 — 2018
Singapore
Carousell is a simple and easy way to buy and sell with anyone.
1. Moved the Search Service from a monolith toward microservices, based on Go/gRPC/Protobuf/Envoy and Docker/Kubernetes/Elasticsearch.
• Implemented real-time item quality and seller scores for search ranking from scratch using Apache Beam/Redis/Kafka/Elasticsearch.
• Ingested user impressions into Elasticsearch and implemented simple random bucketing for A/B testing.
• Implemented boosting of new sellers’ items on the home page.
• Implemented a cache layer to reduce the number of Elasticsearch requests, lowering latency and cost.
2015 — 2017
South Korea
Built high-performance messaging and data streaming platforms, handling high volumes of user messages and operating company-wide Kafka/RabbitMQ clusters.
1. Implemented a Messaging Platform to handle high volumes of user messages, based on Java/Vert.x/Spring and Redis/Cassandra.
• Implemented a user service to collect user activity logs from connected sessions using Vert.x/WebSocket/Redis.
• Implemented an inbox service to send and receive high volumes of messages using Spring/Redis/Cassandra.
2. Provided a company-wide Kafka cluster, built from scratch on Kafka/RabbitMQ/Spark and Mesos/Marathon, to aggregate data from different teams.
• Set up and maintained Kafka and RabbitMQ clusters for the entire company on AWS.
• Implemented cluster migration/data transmission between topics based on Spark Streaming/Mesos/Marathon.
2011 — 2015
South Korea
Built the LINE Push Platform handling 100M+ concurrent users and 5B daily messages, designing high-throughput distributed messaging and Redis-based services.
1. Implemented the LINE Push Platform from scratch over 4 years, serving over 100M concurrent users and 5B messages per day, based on Java/Netty/Spring and Redis/ZooKeeper/MySQL.
• Implemented Session-Service to handle over 1M concurrent users per server using Netty.
• Implemented Message-Service to distribute 5B messages per day using Spring/Redis.
• Implemented a high-throughput, distributed Message Queue platform using Luxun for asynchronous message processing, placed in front of the API Gateway and supporting over 1M messages per second at peak.
• Implemented service discovery to look up servers, support multiple regions, and handle failover, replacing load balancers.
• Implemented a Redis cluster manager for high scalability and availability using Redis/ZooKeeper.
Education
Yonsei University