# Sujay Patil

> Research Software Engineer at Lawrence Berkeley National Laboratory

Location: San Francisco, California, United States

Profile: https://flows.cv/sujay

My current research/development interests include:

* Developing open-source tooling/software to standardize (meta)data capture and representation with schemas
* Applying these toolchains to government/federally funded scientific projects, largely in biology and medicine, to enable (meta)data integration by enforcing FAIR metadata guidelines
* Leveraging knowledge graphs and ontologies as knowledge representation structures that can serve as input to ML/prediction models

## Work Experience

### Research Software Engineer @ Berkeley Lab

Jan 2021 – Present | Berkeley, California, United States

> Contributing to the development of LinkML, a linked open data modeling language

> Developing and maintaining the LinkML schema for the DOE/OSTP-funded National Microbiome Data Collaborative (NMDC) project

> Developing ETL pipelines to integrate knowledge/metadata about microbiome projects from various federal data sources as part of the NMDC data repository

> Developing microbiome-related knowledge graphs to power ML/prediction models

> Developing bioinformatics pipelines to detect synthetic engineering

### Research Software Engineer @ Sage Bionetworks

Jan 2020 – Jan 2021 | Seattle, Washington, United States

> Added APIs/functionality to the Python backend/package of a novel data ingress ecosystem, which uses a schema.org-based data model specification (serialized as JSON-LD) and JSON Schema for validation

> Packaged and bundled the Python app backend, making it more modular and distribution-ready

> Added a RESTful web service to expose core module functionality using Python/Flask, Swagger/OpenAPI, and Connexion

> Designed, developed, and benchmarked an algorithm for easily modifying the nodes/edges of a NetworkX graph created from JSON-LD

> Integrated the Python/Flask backend with an R Shiny frontend

> Added unit and integration tests using pytest

> Added a CI/CD workflow to automate code linting and test suite runs

### Student Software Developer @ Keck Medicine of USC

Jan 2019 – Jan 2020 | Los Angeles, California, United States

Developed clinical informatics pipelines and downstream processing automation tools to support clinical research scientists and coordinators working on the NIH-funded All of Us research project.

> Wrote a script that uses the relevant APIs to pull enrollment status from the All of Us (AoU) research project's HealthPro system and automatically update patient health records in the CTSI-managed REDCap database.

> Developed automation scripts (using the PyCap library) and native REDCap plugins (using the REDCap External Modules framework) to streamline automatic report generation from REDCap based on branching logic specified externally in Google Sheets.

> Migrated scripts from SAS to Python.

### Software Engineer @ Mercedes-Benz Research and Development India

Jan 2018 – Jan 2019 | Bengaluru Area, India

> Worked in the Powertrain Mechatronics department, handling all Software-in-the-Loop (SiL) testing activities such as MISRA analysis and MATLAB/Polyspace analysis. Wrote scripts that generate test cases for ECU-specific unit testing modules/functions in the powertrain controller software. I also worked as a web developer, building web-based tools (mainly with PHP frameworks) to automate many processes in the unit testing workflow.

> Worked as a PostgreSQL/PostGIS database developer, writing PL/pgSQL and Python (including QGIS) scripts to analyze map data. This map data is used by the Integrated Predictive Powertrain Controls ECU to improve fuel efficiency and optimize gearshift strategy in Daimler Trucks and Buses.

> Laid out the infrastructure and wrote the software to store and process NOx (emissions) information parsed from text dumps of MATLAB/Simulink models, for use by the technical compliance team (tCMS) at Daimler AG, Germany. The backend for this proof of concept (PoC) was built with MongoDB and Java.

### Research Intern @ European Bioinformatics Institute | EMBL-EBI

Jan 2018 – Jan 2018 | Cambridge, United Kingdom

> Worked with the Protein Data Bank in Europe (PDBe) team to develop a query language, MolQL. Its ultimate goal was to enable meaningful sharing of annotated structural biology data as 3D views of structures, which is critical for exploiting data in the PDB. My task was to enable MolQL to parse generic PyMOL/JSmol/VMD scripts and manipulate the 3D model rendered in LiteMol accordingly.

> My basic workflow was to write a set of functions that could interpret PyMOL/JSmol scripts, then develop a parser that could translate the resulting abstract syntax tree (AST) into a MolQL expression tree that different viewers could ultimately interpret. The transpilers were written in TypeScript, and the visualization tool used React.js with WebGL.

### Research Intern @ Mazumdar Shaw Cancer Center

Jan 2017 – Jan 2018 | Bengaluru Area, India

> At the neuro-oncology lab at MSCTR, I helped develop a full-stack web application for efficient storage and retrieval of large-scale clinical and genomic data on brain tumours. I helped collate large, publicly available glioma datasets (omics, radiology and pathology images, and clinical data) with in-house generated datasets and store them in a NoSQL database with efficient querying utilities. I also provided users with interactive, biologically useful statistical visualizations such as Kaplan-Meier (KM) survival curves. The tools were deployed on the lab's server for use by research scholars, and the project was also supervised by scientists from the Institute of Bioinformatics, India.

### Research Intern @ National Centre for Biological Sciences

Jan 2017 – Jan 2017 | Bengaluru Area, India

> Worked at the neurobiology lab to help build a Java-based plugin for the open-source image processing tool ImageJ (Fiji), extending the ImageJ API to detect pyknotic (apoptotic) cells based on size, shape, and signal intensity.

> Because the analysis had to work on 3D images, it involved looping through the z-stack and analyzing each plane of the confocal image separately. The approach first identified 3D objects/cells in the z-stack, then used CNNs to classify each object as pyknotic or non-pyknotic.

### Research Intern @ Xcode Life

Jan 2016 – Jan 2017 | Chennai Area, India

> Built a custom web-based content management system for maintaining data, including patient data, genotyping data from patient saliva samples, and the health profiles generated by comparison against the Single Nucleotide Polymorphism database we maintained for each disease. Experts would ultimately use the reports (genetic profiles) generated from this data to suggest treatments/therapies that help patients avoid diseases.

### Software Development Intern @ VMware

Jan 2016 – Jan 2016 | Bengaluru Area, India

> Built a web application that allowed verified HR team members (authenticated via LDAP) to update employee details directly in Active Directory.

> The tool was implemented using JavaScript libraries such as jQuery, RESTful web services, and PHP.

## Education

### Master's degree in Bioinformatics

University of Southern California

### Bachelor's degree in Computer Science and Engineering

Vellore Institute of Technology

### Senior Secondary School

Delhi Public School - Bangalore South

### Secondary School

Delhi Public School - Bangalore South

## Contact & Social

- LinkedIn: https://linkedin.com/in/sujay-patil

---

Source: https://flows.cv/sujay

JSON Resume: https://flows.cv/sujay/resume.json

Last updated: 2026-03-29