I am a third-year engineering student at ENSTA Paris and a Master's student in Data Science at École Polytechnique (IP Paris), pursuing a dual degree through the academic partnership with:
- ENIT – National Engineering School of Tunis
- ENSTA Paris – École Nationale Supérieure des Techniques Avancées
My background combines applied mathematics, probability, statistics, and optimization with advanced data science and machine learning, with a strong focus on real-world, high-impact applications.
- Statistical Learning & Data Science
- Machine Learning & Deep Learning (MLP, CNN, RNN, Transformers)
- NLP & Large Language Models (RAG, fine-tuning, prompt engineering)
- Time Series & Spatio-Temporal Modeling
- Extreme Value Theory & Dependence Modeling (Copulas, Pickands dependence function, spatial extremes)
- Climate, Risk & Energy Data
I am particularly interested in projects that connect theoretical modeling with large-scale, real-world data, emphasizing interpretability, robustness, and high-quality visualizations.
- Languages: Python, R, SQL, C/C++, Bash
- ML / DL: Scikit-learn, PyTorch, TensorFlow, Optuna
- NLP & LLMs: Transformers, LangChain, RAG, LoRA, Quantization, SFT / DPO
- Data & Visualization: Pandas, NumPy, Matplotlib, Seaborn, Power BI
- Scientific Computing: SciPy, Statsmodels, RStudio
- Tools: Git, GitHub, Docker (basics), Linux, LaTeX
Neural estimation of the Pickands dependence function for Gumbel, Galambos, and Tawn copulas, including convex projection, simulation studies, and applications to spatial climate data.
Spatio-temporal modeling of meteorological and climate data (Météo-France), focusing on extreme events, dependence structures, and risk measures (VaR, TVaR).
Design of a Retrieval-Augmented Generation (RAG) system with dynamic knowledge bases, traceable answers, and interactive dashboards using LangChain, ChromaDB, and Streamlit.
End-to-end ML pipeline handling highly imbalanced data, feature selection, model comparison (XGBoost, Random Forest), and robust evaluation (ROC-AUC, F1-score).
Exploratory analysis and predictive modeling of fuel consumption (mpg) using the mtcars dataset: correlation analysis and visualization, multiple linear regression, stepwise AIC variable selection, and PCA + Principal Component Regression (PCR), with model comparison using ANOVA.
- Develop robust, interpretable, and scalable ML models
- Work on research-driven or high-impact applied data science projects
- Contribute to domains such as climate, risk, energy, and AI systems
- Maintain high standards in code quality, reproducibility, and visualization
- GitHub: you are here
- LinkedIn: Rayen Zargui
- Email: rayen.zargui@ensta-paris.fr

