ML & AI Projects

Master's Thesis: Predicting the Injury Risk of Acute Hamstring Injury

Python • XGBoost • Scikit-learn • pycatch22 • Pandas • NumPy • Matplotlib • Seaborn

Problem

Hamstring muscle injuries are frequent and recurrent in elite sports (athletics and soccer). Manually identifying rising injury risk is unreliable. Key ML challenges include severe class imbalance (injuries are rare events), sparse longitudinal monitoring data, and non-linear interactions between risk factors (training load, fatigue, strength, mobility, prior injury).

Dataset

Longitudinal elite athlete monitoring data from athletics (sprinters/jumpers) and soccer players. Features span three domains: training load (objective and subjective RPE), strength screenings (eccentric/concentric), and mobility/flexibility screenings. Target label is binary: acute hamstring injury within a defined look-ahead window.

Approach

End-to-end ML pipeline: (1) data cleaning and athlete filtering, (2) preprocessing via rolling-window aggregation or feature engineering, (3) missing-data imputation (MissForest or within-athlete MissForest), (4) feature selection (PCA, single-feature classifier ranking, or none), (5) XGBoost classification trained with Leave-One-Athlete-Out (LOAO) cross-validation to prevent data leakage across athletes. Class imbalance is handled via scale_pos_weight. Generalization is validated across sports and injury types.
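The LOAO splitting and the class-weight computation can be illustrated with a minimal, dependency-free sketch. It is not the thesis code: the athlete IDs and labels are hypothetical toy data, and the negatives-to-positives ratio is only the commonly used starting value for XGBoost's scale_pos_weight, assumed here rather than taken from the thesis.

```python
from collections import Counter


def leave_one_athlete_out(athlete_ids):
    """Yield (held_out_athlete, train_idx, test_idx), one fold per athlete."""
    for held_out in sorted(set(athlete_ids)):
        train_idx = [i for i, a in enumerate(athlete_ids) if a != held_out]
        test_idx = [i for i, a in enumerate(athlete_ids) if a == held_out]
        yield held_out, train_idx, test_idx


def scale_pos_weight(labels):
    """Ratio of negative to positive samples in a training fold."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples in the training fold")
    return counts[0] / counts[1]


# Hypothetical toy data: one row per athlete-window, label 1 = injury in look-ahead window.
athlete_ids = ["a1", "a1", "a1", "a2", "a2", "a3", "a3", "a3"]
labels = [0, 0, 1, 0, 0, 0, 1, 0]

folds = list(leave_one_athlete_out(athlete_ids))
for held_out, train_idx, test_idx in folds:
    train_labels = [labels[i] for i in train_idx]
    # The weight is recomputed per fold so the held-out athlete never influences it.
    print(held_out, len(train_idx), len(test_idx), scale_pos_weight(train_labels))
```

Because every row of the held-out athlete lands in the test fold, no within-athlete pattern can leak from training into evaluation.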

Metrics

  • AUC-ROC
  • Average Precision (PR-AUC)
  • F1-score
  • Precision
  • Recall
  • Accuracy
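Why accuracy alone is a weak signal for rare injuries can be seen in a small sketch. The labels below are hypothetical toy data, not thesis results, and the helper name binary_metrics is an assumption for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 18 healthy windows, 2 injury windows.
y_true = [0] * 18 + [1] * 2

# A model that never predicts an injury still looks accurate but has zero recall.
always_negative = binary_metrics(y_true, [0] * 20)

# A model that flags two windows, one of them correctly.
one_hit = binary_metrics(y_true, [0] * 17 + [1] + [1, 0])

print(always_negative)
print(one_hit)
```

Both toy models reach the same accuracy, while precision, recall and F1 separate them, which is why the ranking-based and positive-class metrics are listed first.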

Computer Vision: Crafting and Learning Features

Python • OpenCV • NumPy • Scikit-learn (PCA, t-SNE, SVM) • Matplotlib • Pandas

Problem

Explore and compare the efficacy of traditional handcrafted feature extraction (SIFT, PCA) versus modern learned features for face recognition, specifically identifying celebrities and their look-alikes in a low-data regime.

Dataset

A subset of the VGG Face Dataset consisting of 80 training images and 1,816 test images (e.g., Jesse Eisenberg, Mila Kunis).

Approach

Face detection using Haar cascades; dimensionality reduction via Eigenfaces (PCA); feature extraction using SIFT; distribution visualization with t-SNE; classification using a non-linear SVM with an RBF kernel.
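The RBF kernel behind the non-linear SVM is compact enough to write out. The sketch below is a plain-Python illustration of the kernel function only, not the project's classifier; the feature vectors and the gamma value are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical low-dimensional face descriptors (e.g. a few PCA coefficients).
face_a = [0.0, 0.0]
face_b = [1.0, 1.0]
face_c = [3.0, 3.0]

same = rbf_kernel(face_a, face_a, gamma=0.5)   # identical vectors give similarity 1
near = rbf_kernel(face_a, face_b, gamma=0.5)   # squared distance 2
far = rbf_kernel(face_a, face_c, gamma=0.5)    # squared distance 18

print(same, near, far)
```

A larger gamma makes similarity fall off faster with distance, so the decision boundary follows individual training faces more closely; with only 80 training images that trade-off matters.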

Metrics

  • Accuracy: 0.78744

Computer Vision: In the Name of Deep Learning

Python • TensorFlow • Keras • OpenCV • NumPy • Pandas • Scikit-learn

Problem

Implementation and analysis of end-to-end deep learning pipelines for three distinct computer vision tasks: multi-class classification, semantic segmentation, and evaluation of model robustness against adversarial attacks.

Dataset

PASCAL VOC 2009 (comprising 20 object classes including animals, vehicles, and indoor objects).

Approach

Deep CNN construction for image categorization; semantic segmentation for pixel-level object masking (FCN/U-Net architecture); generation of adversarial examples using FGSM to test model vulnerabilities.
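The FGSM step itself is a single update: move the input by a small amount eps in the direction of the sign of the loss gradient. The sketch below applies it to a tiny logistic-regression model, a stand-in chosen so the gradient can be written by hand; the project used a CNN, and the weights, input and eps here are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """FGSM: x_adv = x + eps * sign(dLoss/dx) for binary cross-entropy loss."""
    p = predict(w, b, x)
    # For logistic regression with cross-entropy, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the true label is 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

p_clean = predict(w, b, x)
x_adv = fgsm(w, b, x, y, eps=0.5)
p_adv = predict(w, b, x_adv)

print(p_clean, x_adv, p_adv)
```

In this toy case the perturbation pushes the predicted probability for the true class from above 0.5 to below it, which is the behaviour the robustness evaluation looks for.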

Metrics

  • Accuracy: 0.6906

Naive RAG System for Culinary Knowledge

Python • Scikit-learn • NLTK • Hugging Face Transformers • Pandas

Problem

Developing a pipeline to answer natural language queries using a specific recipe database, overcoming the limitations of general LLMs regarding private or niche domain data.

Dataset

Recipe Knowledge Base (Parquet format) containing fields like ingredients, tags, and preparation steps.

Approach

Implemented a retrieval-augmented generation (RAG) pipeline. Used a TF-IDF vectorizer with NLTK-based preprocessing (tokenization, stopword removal, lemmatization) and a Nearest Neighbors model for retrieval. Integrated with Llama-3.2-1B-Instruct for final answer generation.
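The retrieval half of the pipeline can be sketched without any libraries. This is a simplified stand-in, not the project code: preprocessing is reduced to lowercasing and whitespace splitting (no NLTK stopword removal or lemmatization), the three recipes are hypothetical, and the smoothed IDF formula is an assumption modelled on the default documented for scikit-learn's TfidfVectorizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Simplified preprocessing: lowercase and keep alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vector(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unknown terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [tfidf_vector(toks, idf) for toks in tokenized]


def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def retrieve(query, idf, vectors, k=2):
    """Return the indices of the k documents most similar to the query."""
    q = tfidf_vector(tokenize(query), idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical recipe snippets standing in for the knowledge base.
recipes = [
    "creamy tomato soup with basil",
    "chocolate cake with dark chocolate frosting",
    "grilled chicken with lemon and basil",
]
idf, vectors = build_index(recipes)
top = retrieve("tomato basil soup", idf, vectors, k=2)
print([recipes[i] for i in top])
```

The retrieved snippets would then be placed in the prompt for the generator model; that step is omitted here.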

Metrics

  • Precision
  • Recall
  • F1-Score
  • Mean Average Precision (MAP)
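For reference, Mean Average Precision can be computed as in the following sketch. The document IDs and relevance judgments are hypothetical, and the function names are assumptions for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Dividing by all relevant items penalises relevant documents that were never retrieved.
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical results for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```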

Advanced Semantic Retrieval and Re-ranking

Python • Sentence-Transformers • PyTorch • Scikit-learn • Datasets (Hugging Face)

Problem

Improving retrieval accuracy by addressing the semantic gap and vocabulary mismatch issues inherent in sparse retrieval methods like TF-IDF.

Dataset

Recipe Knowledge Base and the WikIR1k benchmark dataset.

Approach

Developed a two-stage retrieval system. Stage 1: Dense retrieval using a Bi-Encoder (Sentence-BERT 'all-MiniLM-L6-v2') to fetch the top 100 candidates. Stage 2: Re-ranking using a Cross-Encoder ('ms-marco-MiniLM-L6-v2') to optimize the final top-k results based on deep semantic relevance.
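The control flow of retrieve-then-re-rank can be shown independently of the models. In the sketch below both scoring functions are toy stand-ins (word overlap instead of the Bi-Encoder, a phrase-match bonus instead of the Cross-Encoder), and the documents are hypothetical; only the two-stage structure mirrors the system described above.

```python
def two_stage_search(query, docs, first_stage_score, rerank_score,
                     n_candidates=100, top_k=3):
    """Cheap scoring over the whole corpus, expensive scoring on the shortlist only."""
    # Stage 1: rank every document with the cheap scorer and keep a shortlist.
    candidates = sorted(range(len(docs)),
                        key=lambda i: first_stage_score(query, docs[i]),
                        reverse=True)[:n_candidates]
    # Stage 2: re-order only the shortlist with the expensive scorer.
    reranked = sorted(candidates,
                      key=lambda i: rerank_score(query, docs[i]),
                      reverse=True)
    return reranked[:top_k]


def overlap_score(query, doc):
    """Stand-in for the Bi-Encoder: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_aware_score(query, doc):
    """Stand-in for the Cross-Encoder: looks at query and document jointly."""
    bonus = 2.0 if query.lower() in doc.lower() else 0.0
    return bonus + overlap_score(query, doc) / max(len(doc.split()), 1)


# Hypothetical mini-corpus.
docs = [
    "roasted tomato and basil chicken with rice",
    "soup dumplings with tomato chili oil",
    "tomato soup with fresh basil",
    "chocolate cake",
]
result = two_stage_search("tomato soup", docs, overlap_score, phrase_aware_score,
                          n_candidates=3, top_k=2)
print([docs[i] for i in result])
```

The first stage cannot tell the two documents containing both query words apart; the second stage promotes the one that actually matches the query, at the cost of scoring only three documents instead of the full corpus.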

Metrics

  • Macro/Micro Precision
  • Recall
  • F1-Score
  • MAP
  • nDCG (Normalized Discounted Cumulative Gain)
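nDCG rewards placing highly relevant documents near the top of the list. A minimal sketch follows, using the linear-gain form of DCG; it assumes the ideal ordering can be built from the relevance grades of the returned list itself (that is, every relevant document was retrieved), and the grades shown are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_relevances, k):
    """DCG of the top-k results divided by the DCG of the best possible ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg else 0.0


# Hypothetical graded relevance of results, in the order the system returned them.
perfect = ndcg_at_k([3, 2, 1], k=3)
reversed_order = ndcg_at_k([1, 2, 3], k=3)
print(perfect, round(reversed_order, 4))
```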

Mobile Robot Planning, Control, and Estimation

Python • NumPy • SciPy • Pinocchio (robot dynamics) • Meshcat (3D visualization) • URDF modeling

Problem

The project involved developing a full navigation stack for a mobile robot to traverse a 2D environment from a start pose to a goal pose while avoiding obstacles. Challenges included handling non-holonomic constraints, managing noisy GPS sensor data, and transitioning from differential-drive kinematics to a car-like bicycle model.

Dataset

Simulated 2D environment containing dynamic obstacle configurations, including cylindrical pillars, walls, and boundary towers. Sensor data was generated via simulated GPS measurements with additive Gaussian noise.

Approach

Implemented a multi-stage robotics pipeline. Planning: Used a Probabilistic Roadmap (PRM) for global pathfinding, integrated with an optimization-based local planner using the SLSQP algorithm to find optimal velocities. Estimation: Developed an Extended Kalman Filter (EKF) to recursively estimate the robot's state (x, y, theta) by fusing noisy GPS data with control inputs. Control: Utilized trajectory tracking for differential drive and a Stanley controller for the car-like bicycle model to minimize heading and crosstrack errors.
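The Stanley steering law used for the car-like model combines the heading error with a speed-scaled crosstrack term. The sketch below shows one common formulation under assumed sign conventions (heading error = path heading minus vehicle heading; crosstrack error positive when the path lies to the left of the front axle); the gain, softening constant and steering limit are hypothetical values, not the project's tuning.

```python
import math


def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def stanley_steering(heading_error, crosstrack_error, speed,
                     gain=1.0, softening=1e-3, max_steer=math.radians(30)):
    """Steering angle = heading error + atan(gain * crosstrack error / speed)."""
    # The softening constant keeps the crosstrack term well-behaved near zero speed.
    crosstrack_term = math.atan2(gain * crosstrack_error, speed + softening)
    steer = wrap_angle(heading_error) + crosstrack_term
    # Clamp to the physical steering limit of the vehicle.
    return max(-max_steer, min(max_steer, steer))


# On the path and aligned with it: no steering needed.
print(stanley_steering(0.0, 0.0, speed=1.0))

# Path is 0.2 m to the left: steer left, and more sharply at lower speed.
print(stanley_steering(0.0, 0.2, speed=2.0))
print(stanley_steering(0.0, 0.2, speed=0.5))
```

Dividing the crosstrack term by speed makes the correction gentler at high speed, which helps avoid oscillation around the path.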

Metrics

System performance was evaluated based on goal-reaching accuracy (Euclidean distance to goal ≤ 0.2 m), collision-free path execution, and state estimation convergence under varying sensor noise levels.

Constraint Solver: Smart Scheduler for a Paint Shop

C# • Google OR-Tools (CP-SAT) • REST API

Problem

Scheduling beams in industrial paint shops is done entirely by hand, consuming significant time and expert effort. The problem is highly constrained: color change sequences, cleaning times between jobs, maximum total production hours per shift, and whether beams require manual or automatic painting must all be respected simultaneously. Finding an optimal schedule that maximises on-time order delivery is beyond practical manual optimisation.

Dataset

Real incoming order data from a paint shop operated by Coat IT, a software company building control systems for paint shops. Each order specifies beam dimensions, required color, painting mode (manual/automatic), and delivery deadline.

Approach

Built a constraint programming solver using Google OR-Tools (CP-SAT). The pipeline first splits incoming orders into atomic paint jobs, then assigns and sequences those jobs across available painting stations while enforcing all hard constraints (color-change penalties, cleaning intervals, shift capacity, manual vs. automatic lanes). The objective function maximises the number of orders completed by their deadline. A C# backend (developed in Visual Studio) feeds live order data into the solver through an API and returns the generated schedule.
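The objective and the color-change constraint can be illustrated on a toy instance. The sketch below is deliberately not CP-SAT and not the project's C# code: it brute-forces every ordering of four hypothetical jobs on a single station in Python, with an assumed fixed cleaning time on each color change, to show what "maximise on-time jobs" means.

```python
from itertools import permutations

CLEANING_TIME = 2  # hypothetical hours lost whenever the color changes


def on_time_count(sequence):
    """Count jobs finishing by their deadline when run in the given order."""
    clock, previous_color, on_time = 0, None, 0
    for _name, color, duration, deadline in sequence:
        if previous_color is not None and color != previous_color:
            clock += CLEANING_TIME
        clock += duration
        if clock <= deadline:
            on_time += 1
        previous_color = color
    return on_time


def best_schedule(jobs):
    """Exhaustive search over all orderings; only viable for a handful of jobs."""
    return max(permutations(jobs), key=on_time_count)


# Hypothetical jobs: (name, color, duration in hours, deadline in hours).
jobs = [
    ("A", "red", 3, 4),
    ("B", "blue", 2, 12),
    ("C", "red", 2, 6),
    ("D", "blue", 3, 10),
]

arrival_order_score = on_time_count(jobs)
best = best_schedule(jobs)
print(arrival_order_score, [job[0] for job in best], on_time_count(best))
```

Exhaustive search grows factorially with the number of jobs, which is why a constraint solver is used for real order volumes; the toy instance only shows that grouping jobs by color avoids cleaning time and lets more jobs meet their deadline.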

Metrics

  • Number of on-time painted orders vs. manual schedule
  • Total schedule makespan reduction
  • Manual comparison with expert-crafted schedule (proof-of-concept evaluation)

Reinforcement Learning: Project Coming Soon

Problem

Still working on project

Dataset

Still working on project

Approach

Still working on project

Metrics

Still working on project

Profile

Thibault Willems

AI Engineer & Entrepreneur

MTEC Games

Immersive Experiences

ML & AI

Predictive Modeling & Data Science

Career

2025 - Now: Co-Founder, MTEC Games
2025: Summer Job, AI Engineer
2021 - Now (until summer 2026): Student, Engineering / Computer Science: AI
2015: Topsport Athlete

Front/Back-end

Scalable Architecture

Skills

tensorflow
sql
pytorch
python
kubernetes
jupyter
huggingface
github
git
docker

Contact

Let's build something

Front & Back-end Projects

https://olavkttt.com

OlavKTTT

Full-stack e-commerce platform for video creator Olav Koslosky. Includes a webshop for purchasing video content, a community forum with restricted access, and multi-tier user role management.

Tech Stack

Vite • React • TypeScript • Tailwind • Supabase • Bunny.net • Stripe • Resend

https://mtecgames.com

MTEC Games

Frontend website for MTEC Games, a high-tech immersive escape room co-founded with my father. Showcases the experience with 3D WebGL visuals and interactive storytelling.

Tech Stack

Next.js • React • TypeScript • Tailwind • React Three Fiber • Blender

https://thibaultwillems.com

Portfolio Thibault Willems

Personal portfolio showcasing past projects, skills and career journey. Built with a bento-grid layout, 3D models, and smooth Framer Motion animations.

Tech Stack

Next.js • React • TypeScript • Tailwind • React Three Fiber • Blender