Stanislav

Verified

2+ Years

Computer Vision Engineer, Data Analyst

Relog.ai

Industry: Finance, IT & Software, Healthcare

Specialization: Fraud Detection, Computer Vision, Natural Language Processing

Bangkok, Thailand

Credit Scoring Model - Default Risk Prediction
https://github.com/stanislavlia/dtsa5509_cred_scoring_final_project

Overview: I designed a predictive model for the Home Credit Default Risk competition, focusing on assessing applicants' loan repayment capabilities, especially for those with limited credit histories. The work covered Exploratory Data Analysis (EDA), data merging with big data tools, preprocessing, and model training and selection. The goal was to build a reliable prediction system from an ensemble of machine learning models, served through a FastAPI application.

Result: The model efficiently integrates diverse data sources, including credit card balances, previous loan records, and payment histories, to forecast repayment probabilities. Performance was evaluated with the AUC metric to gauge accuracy and reliability for loan issuance decisions.

Technologies used: Python, Scikit-Learn, Pandas, NumPy, Matplotlib, Seaborn, Optuna (Bayesian Optimization), Polars, Ensembles, XGBoost, CatBoost, LightGBM, Docker, FastAPI, SHAP importances
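
To illustrate the evaluation described above, here is a minimal sketch of how an averaged ensemble could be scored with AUC. It is not taken from the project repository: the function names, the simple mean blending, and the toy labels and scores are all assumptions made for illustration, and the project itself may blend and evaluate its models differently.

```python
from __future__ import annotations

from typing import Sequence


def average_ensemble(predictions: Sequence[Sequence[float]]) -> list[float]:
    """Average per-applicant default probabilities across several models."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random defaulter is scored above a random
    non-defaulter; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = applicant defaulted, 0 = applicant repaid.
labels = [0, 0, 1, 1, 0, 1]
model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
model_b = [0.20, 0.30, 0.60, 0.90, 0.10, 0.50]

blended = average_ensemble([model_a, model_b])
print("model_a AUC:", roc_auc(labels, model_a))
print("blended AUC:", roc_auc(labels, blended))
```

The pairwise comparison is quadratic in the number of samples, so it only suits small examples; on competition-sized data a rank-based implementation such as the one in Scikit-Learn would be the practical choice.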



Image Captioning System
https://github.com/stanislavlia/dtsa5511_image_captioning

Overview: The project combines computer vision for interpreting visual content with natural language processing for generating coherent sentences. A convolutional neural network extracts visual features from images, and a recurrent neural network or transformer-based architecture then produces relevant captions.

Result: Developed and deployed an Image Captioning System based on the image2text model. The work used distributed training across multiple GPUs and deployment via Docker containers. Experiment tracking with Weights & Biases keeps the training pipeline reproducible for iterative model refinement. The architecture combines Transformer Encoder/Decoder Blocks with Convolutional Neural Networks for efficient feature extraction. The model can be deployed from pre-built containers on Docker Hub.

Technologies used: Python, TensorFlow, Keras, NumPy, FastAPI, Docker, EfficientNetB1, Transformer Architecture, Weights & Biases, Gradio


Analysis of Gene Expression Data for Cancer Detection
https://github.com/stanislavlia/tsne_genes_analysis

Overview: As part of the DTSA5510: Unsupervised ML class, this project analyzed high-dimensional gene expression data to uncover genetic markers for breast cancer diagnosis and prognosis. Bayesian optimization was used to tune the t-SNE dimensionality reduction technique, enabling effective visualization and analysis of 151 microarray samples from the CuMiDa GSE45827 dataset.

Result: Using statistical testing and mutual information criteria, the project narrowed an initial pool of 55,000 genes down to the 240 most significant ones. These genes were visualized with the optimized t-SNE algorithm, preserving both global and local structure in the data. The resulting embedding gave insight into how samples from the various cancer types relate to one another.

Technologies used: Python, Unsupervised Learning, Matplotlib, PCA, t-SNE, Scikit-Learn, SciPy, Logistic Regression, Bayesian Optimization, Statistical Tests (Chi-Square, Mutual Information), Model Interpretation, Multiple Hypothesis Testing
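
The gene filtering step relies on multiple hypothesis testing, which matters when tens of thousands of genes are tested at once. The sketch below shows one common correction, the Benjamini-Hochberg procedure. The profile does not say which correction the project used, so the choice of procedure, the function name, the gene names, and the p-values here are all assumptions for illustration only.

```python
from __future__ import annotations

from typing import Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> list[int]:
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values from some univariate test.
gene_p_values = {
    "gene_a": 0.001,
    "gene_b": 0.008,
    "gene_c": 0.039,
    "gene_d": 0.041,
    "gene_e": 0.60,
}
names = list(gene_p_values)
selected = [names[i] for i in benjamini_hochberg(list(gene_p_values.values()))]
print("genes kept after correction:", selected)
```

Controlling the false discovery rate is less strict than a Bonferroni correction, which is why it is often preferred for screening large gene panels where some false positives are tolerable.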