# Ian Dai

> Principal Software Engineer @ Roche | Master's in Computer Science

Location: San Jose, California, United States
Profile: https://flows.cv/iandai

Extensive experience in the IT industry, including microservices and multi-tier architectures. Strong in performance improvement, system monitoring, site observability, and quality assurance, and in RDBMS technology: data modeling, configuration, management, and application coding. Exploring the latest technologies, including cloud computing and AI observability. Experienced in project management and team management.

Certifications:

- Certified Programmer for the Java Platform
- Oracle Certified Professional (OCP) - Oracle Database Administrator

Patents:

- US9239741B2: System and method for flexible distributed massively parallel processing (MPP)

## Work Experience

### Principal Software Engineer @ Roche

Jan 2018 – Present | Santa Clara, California, United States

Roche is a global leader in the healthcare industry. Roche Diagnostics (RDS) builds the Navify Platform on AWS, Kubernetes, Okta, and other technologies, providing the foundation for upstream online clinical applications. The platform serves thousands of hospitals and millions of patients across data centers around the world.

My responsibilities:

1. Lead and primary owner of Roche Navify Platform performance, observability, and reliability.
2. Built the performance infrastructure from scratch: run workloads with JMeter, save results to InfluxDB, and display result metrics on Grafana dashboards.
3. Create performance test cases and test data for multiple microservices and GUI components; run regular, longevity, and step-up tests every week to simulate production workload and identify issues.
4. Build Datadog, AWS, and Prometheus dashboards to monitor system behavior and resource usage; profile to find the root cause of performance issues and work with teams to resolve them.
5. Deploy Datadog, AWS, and Prometheus dashboards to production environments for site reliability; set alerts for abnormal critical metrics and notify stakeholders.
6. Predict production workload and data size; run performance tests at several times the production workload to configure system capacity, and run step-up tests to stress system autoscaling.
7. Prepare release and production performance reports and present them to product managers and department leaders; participate in project architectural design and promote performance best practices.
8. Leverage AI to generate programs and scripts; use Datadog LLM Observability to monitor LLM RAG workflows.

Environment: Java, Spring Boot, Python, SQL, Datadog, AppDynamics, Grafana, InfluxDB, Prometheus, JMeter, AWS, Kubernetes, Terraform, Jenkins, GitHub Actions, PostgreSQL, MongoDB, Novu, Okta, Selenium WebDriver, SauceLabs, React, Chrome DevTools, OpenTelemetry, ArgoCD, Docker, Helm, CI/CD

### Lead Software Engineer @ Salesforce.com

Jan 2015 – Jan 2018 | San Francisco Bay Area

Core App Platform and Customization Performance

Enterprise API and Bulk API provide REST and SOAP interfaces for upstream applications to access Salesforce core app data. DFS (Data Federation Service) provides cross-cloud data access (for example, Retail to Commerce) through a canonical data model, using microservices technology including Docker, Kubernetes, Scone, and SAM. Customization provides tools for customers to enhance and build their own applications on the Salesforce platform. The Salesforce platform handles 1.3+ billion transactions per day.

My responsibilities:

1. Build and enhance daily backend/frontend workloads on VPod/AWS/Corsa using JMeter, Suzuki/STAF, Selenium WebDriver, SFXTester, SUIT, and Perfecto, along with Enterprise API, Apex, SOQL, Lightning SDK, Visualforce, App Builder, Community, and Developer Console on Salesforce orgs.
2. Design performance test environments on VPod/AWS.
3. Design metrics to measure performance in parallel, distributed environments with multiple downstream microservices.
4. Create performance dashboards using Splunk and Argus.
5. Report weekly performance status using PerfKit.
6. Identify and fix the root causes of performance regressions using Splunk, YourKit/JFR, AWR, EPT Viewer, and Chrome DevTools.
7. Resolve production (sandbox) performance issues using Splunk, SQL tracing/TKPROF/AWR, and Warden.
8. Participate in the project design phase and influence decisions from a performance perspective.
9. Code enhancements for Enterprise API, Bulk API, SUIT, and SFXTester.

Environment: Jetty, Docker, Kubernetes, Java, JavaScript, SQL, PL/SQL, Python, shell, Oracle, Sayonara (Postgres), HBase, Linux, Mac, JMeter, YourKit, Splunk, Selenium WebDriver, Perforce, Git, Jenkins, Visualforce, Lightning SDK, Apex, SOQL, Salesforce CI, Agile/Scrum, TestNG, Mobile App

### Architect @ Chanjet Information Technology Co. Ltd.

Jan 2013 – Jan 2015 | San Francisco Bay Area

Cloud platform for business social network and online ERP management applications. My responsibilities include Enterprise Event Messaging System design and implementation, Data Authorization System design and implementation, Subfamily implementation, and Global Schema development.

1. Data Authorization System: provides APIs for Enterprise Apps to configure which entity instances a particular user can select, insert, update, or delete. An app can make a user's privilege on one entity depend on that user's privilege on another entity, as long as the two entities are related. Users can grant privileges to and revoke them from other users. Privileges can also be configured for a user's department. All conditions are attached to each SQL/HQL statement during execution.
2. Subfamily: controls the visibility of shared entity instances between Enterprise Apps. The approach is similar to Data Authorization.
3. Global Schema: Enterprise Apps share entity instances if they are mapped together. Global Schema is a platform-level application for ISVs to specify which entities are mapped, as well as entity fields, events, enum types, and so on. It also covers app upgrade, data migration, mapping changes, and validation.
4. Event Message System: provides APIs for Enterprise Apps to send event messages within the enterprise. It hides the JMS details and complexity from app developers, with features including transactions, persistence, request/reply, system events, event map/match, entity association, and event message retrieval and notification.

Environment: Java, SQL, HQL, Stored Procedures, Postgres, Hibernate, Spring, XML, JAX-WS Web Services, Jersey, JSON, Jackson, JMS, ActiveMQ, JavaScript, Ajax, CVS, RESTful, Maven, Geronimo, Git, Eclipse, Log4j, JUnit

### System Architect @ FutureWei Technologies

Jan 2011 – Jan 2013 | San Francisco Bay Area

Big data solution based on PostgreSQL technology. My responsibilities include Distributed Data Management System architecture, database cluster management, and database workload management.

1. Distributed Data Management System architecture: data is partitioned across Postgres nodes; the optimizer distributes data to different nodes (DB partitions) and assembles the results; the Global Transaction Manager (GTM) coordinates transactions. Performance increases nearly linearly as the number of nodes grows (scale-out, shared-nothing, similar to Citus).
2. Database cluster management: start/stop the database cluster, add/remove nodes (DB partitions).
3. Workload management: resource allocation (memory, CPU, network, etc.).

Environment: C/C++, lex/yacc, Git, CentOS

### Principal Member of Technical Staff @ Oracle

Jan 2006 – Jan 2011 | Redwood Shores, CA

Oracle Database configuration, Oracle Real Application Clusters (RAC) management, Oracle Automatic Storage Management (ASM) configuration, assistant tools development, and Oracle Database/RAC Database/ASM upgrade.

1. Oracle Single Instance Database configuration and upgrade (release 9, 10, 11, 12)
2. Oracle RAC Database configuration and upgrade (release 9, 10, 11, 12)
3. Oracle ASM configuration and upgrade (release 10, 11, 12)
4. Oracle Network (NETCA) configuration (release 9, 10, 11, 12)

Environment: Java, SQL, PL/SQL, Oracle 9/10/11/12, Oracle RAC, Oracle ASM, Linux, Windows Server

## Education

### Master’s Degree in Computer Science & Technology

University of Science & Technology of China

### Business Administration and Management in General

Henley Business School

## Contact & Social

- LinkedIn: https://linkedin.com/in/ian-dai-4664074

---

Source: https://flows.cv/iandai
JSON Resume: https://flows.cv/iandai/resume.json
Last updated: 2026-04-12