Lead engineer for the backend streaming analytical processing engine, built with Spark, Cassandra, Kafka, and JRI (Java/R Interface).
Extracted data from HDFS on the Hadoop infrastructure using Hive queries.
Worked extensively on the Spark Streaming job, improving its performance severalfold.
Deployed a multi-broker Kafka architecture to eliminate single points of failure.
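A multi-broker setup of this kind typically relies on partition replication across brokers; the broker settings below are a minimal illustrative sketch (host names and values are assumptions, not the actual production configuration):

```properties
# server.properties for one broker in a multi-broker cluster (illustrative)
broker.id=1                          # must be unique per broker
listeners=PLAINTEXT://broker1:9092
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
default.replication.factor=3         # each partition replicated to 3 brokers
min.insync.replicas=2                # survive one broker failure without data loss
unclean.leader.election.enable=false # never elect an out-of-sync replica as leader
```

With a replication factor of 3 and two in-sync replicas required, any single broker can fail without losing acknowledged writes.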
Designed and tuned schemas in DataStax Cassandra, choosing partition keys that spread data evenly across the cluster while matching the application's query patterns.
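One common way to achieve that even spread is a compound partition key that buckets rows by time; the table below is a hypothetical sketch (keyspace, table, and column names are illustrative, not the actual schema):

```sql
-- Bucketing each sensor's readings by day keeps any one partition from
-- growing unbounded, while still serving the typical query
-- "readings for sensor X on day Y, newest first".
CREATE TABLE sensor_data.readings (
    sensor_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```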
Stress tested the complete Kafka, Spark, and Cassandra integration for reliability and high availability.
Handled day-to-day operation and maintenance of the infrastructure, including installation, provisioning of new Spark and Cassandra clusters, configuration, administration, security, and performance measurement and tuning.
Built an error-logging application for Spark.
Used Protocol Buffers to serialize data captured from sensors.
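A sensor-reading message of this kind might be defined as below; this is a hypothetical sketch, and the message and field names are illustrative rather than the actual production schema:

```protobuf
syntax = "proto3";

// One reading captured from a sensor, compiled to Java classes
// with protoc and serialized on the wire as a compact binary payload.
message SensorReading {
  string sensor_id    = 1;  // which device produced the reading
  int64  timestamp_ms = 2;  // capture time, epoch milliseconds
  double value        = 3;  // the measured value
}
```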
Used the Java/R Interface (JRI) to run R algorithms inside Java applications.
Installed and integrated Apache Spark and Apache Kafka with DataStax Cassandra.
Implemented applications on the Spark cluster in Java, Python, and Scala.
Implemented Spark Streaming applications that read data from Apache Kafka, analyze it with Spark machine-learning algorithms, and store the results in Cassandra.
Upgraded Cassandra from the DataStax Community edition to the DataStax Enterprise edition.
Added and removed nodes in the Cassandra and Spark clusters.
Stress tested Spark and Cassandra to optimize cluster performance.
Installed and integrated SparkR with Apache Kafka to enable the R language on the Spark cluster.
Used Protocol Buffers to send and receive serialized data.