Within a ring-fenced division of 16+ people, my role is to design and implement solutions that facilitate the analysis of large volumes of structured and unstructured data.
The bulk of my activities are carried out using Python 3.
I have mainly worked on:
Back-End of an NLP web-application
• ------------------------------------
In charge of designing and implementing the class structure, which is split into three modules:
• > Data-ingestion
Extracts raw text in bulk from PDF, DOCX, and PPTX documents; the text is then sliced and stored in a custom data structure.
• > Preprocessing
Toolkit of functions performing various transformations on text data (stopword removal, lemmatization with memoization, bigram generation, ...).
• > Text Analysis
Exact Search: implements SQL LIKE/wildcard matching behavior.
Semantic Search: based on Gensim's Word2Vec implementation; words are matched via cosine similarity.
Named Entity Recognition: spaCy implementation to retrieve mentions of people, organizations, and geopolitical entities.
Insights via Top Words: generates the top N words per document using the Term Frequency-Inverse Document Frequency (TF-IDF) score.
FuzzyMatching utility
• ---------------------
Before: used the SQL LIKE function, which shows its limits when matching addresses and customer names across various data sources.
Now: implemented two different versions of fuzzy matching:
1: fuzzywuzzy package (based on string manipulation), which can be slow when the number of comparisons exceeds 10e6.
2: scikit-learn NearestNeighbors (K = 1) + TfidfVectorizer (with character trigrams); this solution significantly reduces processing time and is based on cosine similarity (linear_kernel).
Processing time for solution 2: 10e6 comparisons in 15 minutes.
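The second approach can be sketched as below: TF-IDF over character trigrams turns each string into a sparse vector, and a 1-nearest-neighbour lookup with cosine distance finds the closest candidate. The function name, parameters, and sample data are illustrative assumptions, not the original code.

```python
# Illustrative sketch: fuzzy matching via char-trigram TF-IDF + 1-NN.
# Assumes scikit-learn; `fuzzy_match` is a hypothetical helper name.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def fuzzy_match(queries, choices):
    """Return (query, best match, cosine similarity) for each query string."""
    # Character trigrams make the vectors robust to typos and abbreviations.
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
    choice_vecs = vectorizer.fit_transform(choices)
    # metric="cosine" forces a brute-force search, which accepts sparse input.
    nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(choice_vecs)
    distances, indices = nn.kneighbors(vectorizer.transform(queries))
    return [(q, choices[i[0]], 1 - d[0])
            for q, d, i in zip(queries, distances, indices)]

choices = ["10 Downing Street, London", "1600 Pennsylvania Avenue, Washington"]
matches = fuzzy_match(["10 downing st london"], choices)
```

Because TF-IDF vectors are L2-normalised, cosine similarity reduces to a dot product, which is why `linear_kernel` can be used interchangeably for the pairwise-similarity variant.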