Los Angeles, California, United States
Center on Knowledge Graphs
• Transformed a NoSQL database into a Neo4j-compatible knowledge graph with 17B edges and 400M nodes. The graph, built using various graph database techniques, is used to study author success, citation and collaboration patterns, and citation intent.
• Implemented a pipeline using Elasticsearch and lexical similarity matching to link 5M records, improving data completeness.
• Used high-performance and parallel computing for graph analysis techniques such as community detection and snowball sampling.
• Produced 2 workshop papers on topics of author success and citation intent classification.
• Analyzed data and matched researcher data to voter records by name using the EM algorithm.
• Mined time-varying researcher affiliation data from publications to uncover work history, and geocoded work addresses via the Google Maps API to group records by metro area, narrowing the record-matching scope to a smaller area.
• Enhanced organization name matching, increasing match rates and producing high-quality ground truth data.