Data Management Engine (DME) is a Java services layer that is part of the control plane for DataSphere.
Responsibilities
• Significantly improved product performance by eliminating bottlenecks (identified using the YourKit profiler) and improving thread and memory management. Refactored the data access layer (using an intermediate cache and batching of updates) and used native SQL instead of Hibernate for better performance. Scaled the system from around a million objects to a billion.
• Worked with the team to develop a module that copies files between an NFS data store and Amazon S3 (using the Amazon S3 Java SDK and a Java NFS client), all managed through rule-based objectives.
• Optimized the file creation and deletion process using a feedback loop that scales the work up or down depending on past performance.
• Developed a module that predicts volume latency from bandwidth and IOPS using machine learning (pandas, matplotlib, scikit-learn, TensorFlow).
Environment: Java, Spring, REST, Hibernate, PostgreSQL, Kafka, InfluxDB, Sun RPC, Protocol Buffers, Python, Flyway, NFS, YourKit Profiler, Machine Learning (matplotlib, pandas, scikit-learn), Amazon S3 SDK.
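
The feedback loop used for file creation and deletion can be illustrated with a minimal sketch. This is not the DME implementation: the class name FeedbackController, the per-operation latency target, and the additive-increase / multiplicative-decrease policy are all assumptions chosen for illustration. The sketch only shows the general idea of raising or lowering a concurrency level based on the measured cost of the previous batch.

```java
/**
 * Hypothetical feedback controller (illustrative only, not the DME code).
 * It adjusts a concurrency level from the measured cost of the last batch.
 */
public final class FeedbackController {
    private final int min;
    private final int max;
    private final double targetMillisPerOp; // assumed tuning parameter
    private int concurrency;

    public FeedbackController(int min, int max, int initial, double targetMillisPerOp) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("invalid bounds");
        }
        this.min = min;
        this.max = max;
        this.targetMillisPerOp = targetMillisPerOp;
        // Clamp the starting level into [min, max].
        this.concurrency = Math.max(min, Math.min(max, initial));
    }

    /** Feeds back the cost of the last batch and returns the new concurrency level. */
    public int record(long operations, long elapsedMillis) {
        if (operations <= 0) {
            return concurrency; // nothing was measured, keep the current level
        }
        double millisPerOp = (double) elapsedMillis / operations;
        if (millisPerOp <= targetMillisPerOp) {
            // Past performance met the target: scale up by one step.
            concurrency = Math.min(max, concurrency + 1);
        } else {
            // Past performance missed the target: scale down by half.
            concurrency = Math.max(min, concurrency / 2);
        }
        return concurrency;
    }

    public int current() {
        return concurrency;
    }

    public static void main(String[] args) {
        FeedbackController controller = new FeedbackController(1, 8, 4, 10.0);
        System.out.println(controller.record(100, 500));  // 5 ms/op, under target: prints 5
        System.out.println(controller.record(100, 2000)); // 20 ms/op, over target: prints 2
    }
}
```

A caller would invoke record(...) after each batch of file creations or deletions and size the next batch (or worker pool) from the returned value. Halving on a miss while adding one on a hit is one common way to back off quickly and recover gradually; other policies would fit the same loop.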