# Lingnan Liu

> Staff Software Engineer at Confluent

Location: San Jose, California, United States
Profile: https://flows.cv/lingnanliu

Experienced software engineer with a passion for backend software development and distributed systems. I currently work at Confluent, where I am helping transform Apache Kafka into a cloud-native service.

I have strong hands-on experience with:

- Software Development: Java, Python, HTML, CSS, JavaScript, Bash, software design patterns
- Cloud Technologies: AWS (S3, DynamoDB, ECS, Lambda, Step Functions, CloudWatch)
- Containerization: Kubernetes, ECS, Docker

## Work Experience

### Staff Software Engineer @ Confluent

Jan 2022 – Present | San Francisco Bay Area

Cloud-Native Kafka Transformation and Engineering Leadership

Spearheading the transformation of Kafka into a cloud-native service, with a focus on Kafka cluster load balancing, multi-tenancy support, quota enforcement, and comprehensive observability.

- Team Lead – Workload Rebalancing: Led a team of three engineers in developing a dynamic workload rebalancing algorithm across Kafka broker cells, reducing mean time to recovery (MTTR) for imbalance issues from one week to one day. Enhanced cluster observability by surfacing critical metrics such as CPU usage, topic replica distribution, and network traffic, providing clear insight into cluster balance.
- Lead Engineer – Broker Visibility Metrics: Delivered a suite of observability enhancements, including:
  - Client metadata (Kafka client version and software name)
  - Hot partition detection (identifying partitions consuming >80% of broker resources)
  - Metrics for deprecated Kafka client requests

  These metrics significantly improved issue diagnosis and cluster health monitoring.
- Lead Engineer – Compute Offload: Designed and implemented the Compute Offload framework, enabling execution of stateless functions within Kafka Produce/Fetch request workflows.
  This unlocked use cases such as sensitive data masking and high-throughput schema validation. Led the architecture design and the seamless integration of custom function execution into core Kafka request handling.
- Core Contributor – Incremental Rebalancing: Played a key role in designing and implementing an incremental workload rebalancing algorithm that significantly improved balance in large-scale, multi-tenant Kafka clusters (100+ brokers). Results included a reduction in p99 end-to-end request latency from 400 ms to 20 ms and a >90% drop in customer escalations related to imbalance.
- Mentorship & Code Quality: Actively mentored junior engineers, guiding project design and code reviews to ensure quality, maintainability, and alignment with system goals.

### Software Development Engineer II @ Amazon

Jan 2019 – Jan 2022 | San Francisco Bay Area

- Designed and developed Physical Device Benchmarking, which allows ASR scientists to benchmark language models on real, managed Alexa devices. Physical device benchmarking is a key step in the fully automated ASR model release workflow.
- Developed the Alexa Device Provisioning workflow, enabling ASR scientists to reserve and configure Alexa devices with various firmware and model revisions in one click.
- Developed the capability to stream audio to the underlying devices directly from benchmarking services.
- Set up the in-office device lab and developed a device agent, running on the lab hosts, that encapsulates the complexity of managing devices and performing health checks on them.
- Mentored three engineers on career development, solving ambiguous problems, and AWS technologies.
- Took part in decisions on the team's feature request intake process and ticket resolution categories.

### Software Development Engineer II @ Amazon Lab126

Jan 2019 – Jan 2019 | San Francisco Bay Area

- Owned the Athena Utterance Paraphrasing Workflow (AUPW) backend and frontend.
  The system gives engineers a reliable way to generate paraphrases of given utterances and to compare the similarity of the paraphrases and of their corresponding answers. I built the entire system, from backend to frontend, with Java, Python, HTML, CSS, and JavaScript.
- Delivered the Athena Portal Dashboard, which shows stakeholders trends in their scheduled test runs using configurable metrics. Users can create, modify, and delete graphs on the dashboard with simple operations. The dashboard follows the Single-Page Application (SPA) design pattern. Its backend is built on Java, DynamoDB, and Amazon S3, and its frontend with HTML, CSS, and JavaScript.
- Trusted as a troubleshooter and mentor for new teammates, helping them get up to speed on our development and test procedures.

### Software Development Engineer I @ Amazon Lab126

Jan 2016 – Jan 2019 | San Francisco Bay Area

- Built a scalable test solution for French spellings and definitions sourced from the Synapse dictionary, and extended it so that the same framework can be used for other locales. After dictionary ingestion and quality testing, definitions QSR for the fr-FR locale increased by 38%.
- Built the database access layer for querying frequently asked questions from the FUD database, used by multiple parts of the Athena ecosystem.
- Built the Utterance Replay Service (URS), which allows engineers to replay questions in test environments to increase test utterance coverage.
- Built the Athena Device Testing Service, which handles automation requests for testing the screens of multimedia devices.
- The Athena Service lacked critical metrics, which made its status hard to monitor. I made the necessary changes to the code base and added a metrics page, which has been very useful for tracking the load on the Athena Service.
- Worked with the Arts & Entertainment team to build an intent analysis tool that gives the team a clean, highly configurable way to fetch, store, and compare DCQS intents and NLU intents.

### Software Development Engineer @ Turbonomic

Jan 2015 – Jan 2016 | Greater New York City Area

Developed and tested VMTurbo Operations Manager, a complex, market-based data center monitoring and control software product. I built an automated testing framework for the Operations Manager based on Robot Framework. The framework is intended to help System Test Engineers with regression testing, so that ideally they only need to test new features instead of spending large amounts of time on regression tests. I also took part in the development and manual QA process of the Operations Manager to familiarize myself with the product's features and to come up with ideas for automated testing.

### Research Assistant - ASIC Verification @ Cornell Computer System Laboratory

Jan 2013 – Jan 2014

Verified 79 ARM and Thumb instructions supported by a pre-designed, in-house single-cycle processor. Redesigned and tested 8 buggy instructions. Wrote detailed documentation of the entire instruction set. Proposed three major improvements (pipelining, superscalar execution, and out-of-order (OOO) execution), with design sketches, to the technical group for further development stages.

## Education

### Master's Degree in Electrical and Computer Engineering

Cornell University

### Bachelor's Degree in Electrical and Electronics Engineering

South China University of Technology

## Contact & Social

- LinkedIn: https://linkedin.com/in/lingnan

---

Source: https://flows.cv/lingnanliu
JSON Resume: https://flows.cv/lingnanliu/resume.json
Last updated: 2026-04-12