Lukas

Verified

5+ Years

Data Analyst, NLP Engineer

ML Toptal, CultureX


Industry: IT & Software, E-commerce

Specialization: Anomaly Detection, Recommendation Systems, Natural Language Processing

Lithuania

Tech Stack: Python, SQL

Expert’s cases:

  1. As the primary machine learning engineer, led a project with an intern in which we implemented two language models that generated novel dictionary entries and ranked existing dictionary data.

  2. Despite tight deadlines, the two-person team I managed (an intern and I) delivered the solutions and received excellent feedback after an extensive review by dictionary editors. The final output is used by tens of millions of people worldwide.

  3. Used PyTorch, Pandas, fastText, NLTK, spaCy and other Python libraries to develop generative and ranking algorithms that combined Large Language Models, word vectors, pre-trained transformer models for toxicity filtering, spell-checking tools and programmatic rules designed with Subject Matter Experts.

  4. The project required accessing terabytes of public and private data stored in MongoDB and AWS S3, preprocessing it on powerful AWS EC2 instances, and using a Redis cache to speed up the final algorithm.

  5. Established a comprehensive MLOps pipeline hosted on an EC2 instance, which incorporated data retrieval from MongoDB, algorithmic data transformations in Python, and extensive validation of the model output.

  6. Iteratively refined the algorithm through close collaboration with Subject Matter Experts and metrics scored against a sample dataset. Led bi-weekly meetings with non-technical Subject Matter Experts, preparing diagrams and presentation slides and thoroughly explaining the steps of the algorithm.

  7. Guided user feedback and data-driven iterative planning with the CEO of a large eCommerce analytics company based in California, US. The long-term goal was to optimize over $250M of client spending through data science and machine learning.

  8. Researched terabytes of eCommerce data using Elasticsearch, MongoDB, and AWS Athena. Prepared dashboards and charts for stakeholder decision-making using Google Data Studio, Tableau, Plotly, or Matplotlib.

  9. Unlocked better spend opportunities by building proprietary automated insights. Developed the algorithms in Python, with data preprocessed using AWS Athena or Elasticsearch.

  10. Investigated an early version of Amazon Marketing Cloud containing 300+ features with interaction-level data on millions of users. Contributed to improving data infrastructure by identifying issues in data aggregation from high-traffic sources.

  11. Extracted insights from Amazon Marketing Cloud by developing complex SQL queries with multiple interrelated subquery components in the context of privacy restrictions and limited SQL functionality.