# Venkata krishnan Sowrirajan

> Staff Engineer | Data Infrastructure | OSS Contributor | Speaker & Reviewer

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/venkatakrishnansowrirajan

I enjoy building scalable and reliable distributed systems, and have extensive experience working with Apache Spark, Apache Flink, Trino, and more. Currently, I'm leading an effort on compute convergence at LinkedIn using Apache Flink. I am passionate about contributing to open-source projects and have contributed significantly to projects like Spark and Flink. If you're interested in distributed systems or open source, feel free to connect with me!

## Work Experience

### Staff Software Engineer @ LinkedIn
Jan 2020 – Present | Mountain View, CA

### Staff Engineer @ Qubole
Jan 2019 – Jan 2020

### Member Of Technical Staff @ Qubole
Jan 2016 – Jan 2019 | Mountain View
● Worked on Spark CBO stats estimation fixes for the Aggregate and Sort operators, giving almost 2x speedup on select queries such as TPC-DS Q83. Contributed back to open source (commit id b1857a).
● Fixed a deadlock in Spark's UnsafeExternalSorter that affected one of the largest Qubole customer workloads. Contributed back to open source (commit id 6c4552c6).
● Built the Spark S3 Select connector for Qubole Spark, which automatically pushes down projections and filters for CSV and JSON. TPC-DS benchmarks: geo-mean speedup 2.9x, max speedup 5x (Blog: https://www.qubole.com/blog/amazon-s3-select-integration/).
● Built serverless Spark on AWS Lambda: Spark executors run entirely as Lambda functions, with S3 as the external storage for shuffle data (Blog: https://www.qubole.com/blog/spark-on-aws-lambda/).
● Worked on Qubole Spark autoscaling based on stage progress, with support for pluggable, custom auto-scaling policies.
● Implemented workload-based scaling limits leveraging Apache YARN's Fair Scheduler queue limits.
● Implemented HDFS auto-scaling, which scales up nodes based on DFS disk capacity and incoming data velocity.
● Mentored new grads and interns; reviewed PRs; handled Spark version upgrades, on-call duty, and customer issue troubleshooting.

### Software Engineer @ MapR Technologies
Jan 2014 – Jan 2015 | San Jose
● Worked on a real-time performance monitoring and troubleshooting dashboard for large MapR Hadoop clusters; contributed from the POC stage.
● Worked on Apache projects such as Drill, Spark, and Hive; contributed to Apache Drill; participated in user/dev groups of other open source projects such as Apache Samza.
● Developed real-time log analysis for MapR Hadoop clusters using the ELK stack.

### Software Engineer Intern @ Intel Corporation
Jan 2013 – Jan 2014 | Chandler
● Designed and developed a proof of concept for "Machine Learning as a Service", a cloud-based framework.
● The framework exposes machine learning algorithms available in various packages (Weka, SciPy, NumPy, Mahout, etc.) as web services.
● Developed visualizations using D3.js to analyze data samples over a timeline graph.
● Set up a single-node Apache Hadoop cluster to demonstrate the idea.

### Graduate student @ Arizona State University
Jan 2012 – Jan 2014 | Tempe, AZ
Masters in Computer Science

### Software Engineer Intern @ Apollo Group
Jan 2013 – Jan 2013
● Constructed data pipelines to aggregate and summarize instrumentation logs; extracted behavioral attributes from the generated logs using Hive UDFs; created an automated workflow using Oozie.
● Performed social graph analysis on discussion data stored in HDFS; calculated a prestige score for each participant to measure that participant's importance in the network.
● Developed a graph visualization of the discussion forum using D3.js.
Used: Hive, Oozie, Spring MVC, Java, D3.js

## Education

### Bachelor of Engineering (B.E.) in Computer Science and Engineering
Anna University Chennai

### Masters in Computer Science
Arizona State University

## Contact & Social

- LinkedIn: https://linkedin.com/in/venkatakrishnans

---

Source: https://flows.cv/venkatakrishnansowrirajan
JSON Resume: https://flows.cv/venkatakrishnansowrirajan/resume.json
Last updated: 2026-04-12