🌎 Location: Montréal, Québec, Canada
🔗 LinkedIn: https://www.linkedin.com/in/annthomygilles
📖 AI Engineering (∼20%)
📖 Designing Data-Intensive Applications (∼20%)
Senior Data Scientist and AI Engineer with a multidisciplinary background spanning computational biology and advanced analytics. I specialize in developing data-driven solutions that transform complex challenges into actionable insights across industries including finance, healthcare, automotive, and government.
With expertise in both technical implementation and strategic product development, I excel at bridging the gap between cutting-edge technology and business value. My experience includes:
- AI/ML Engineering: Designing and implementing machine learning models and workflows with a focus on NLP, generative AI, and network analysis
- Product Development: Leading full-lifecycle product creation from ideation to market deployment, including an Anti-Money Laundering application
- Data Architecture: Architecting robust data pipelines and governance frameworks for complex, multi-source environments
- Cross-functional Leadership: Collaborating effectively with stakeholders from technical and non-technical backgrounds to deliver impactful solutions
ChatGPT: Reflet du bullshit en entreprise (ChatGPT: A Mirror of Corporate Bullshit)
Comprendre les métiers de la Data le temps d'une pause café. (Understanding Data Careers Over a Coffee Break.)
Is your organization TRULY data-driven? 12 questions to find out!
Le Temps Guérit Tout. Excepté Le Mauvais Code. (Time Heals Everything. Except Bad Code.)
MarketPulseAI is my most ambitious side project to date - an advanced real-time analytics platform that combines traditional stock market data analysis with social media sentiment to provide holistic market insights. The system processes millions of data points per minute to detect market patterns and sentiment shifts that often precede price movements, giving users a potential edge in understanding market dynamics.
MarketPulseAI stands apart through its dual-analysis approach:
- Traditional Price Data Analysis: Deep learning models process market metrics to predict potential price movements
- Social Media Sentiment Analysis: NLP algorithms capture market mood and emotional drivers of price action
This combination delivers a more complete picture of what's driving stock prices, integrating both quantitative factors and human sentiment that influences market behavior.
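As a rough illustration of how the two analyses might be merged, here is a minimal sketch of a weighted average over model outputs. The `Signal` class, the score range of -1 to +1, and the 0.7/0.3 weights are all assumptions made for this example; they are not MarketPulseAI's actual models or parameters.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One model's output (hypothetical structure, for illustration only)."""
    name: str
    score: float   # assumed range: -1 (strongly bearish) to +1 (strongly bullish)
    weight: float  # assumed relative weight, must be >= 0


def combine_signals(signals: list[Signal]) -> float:
    """Return the weighted average of the signal scores."""
    total_weight = sum(s.weight for s in signals)
    if total_weight <= 0:
        raise ValueError("at least one signal needs a positive weight")
    return sum(s.score * s.weight for s in signals) / total_weight


# Made-up scores and weights, chosen only to show the arithmetic.
price_signal = Signal("price_model", score=0.40, weight=0.7)
sentiment_signal = Signal("sentiment_model", score=-0.20, weight=0.3)
combined = combine_signals([price_signal, sentiment_signal])
print(f"combined signal: {combined:+.2f}")
```

In a real system the weights would likely be learned or adjusted per market regime rather than fixed by hand.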
- Ingestion: Apache Kafka + Kafka Connect
- Processing: Apache Spark (Stream Processing)
- Storage:
  - Redis (real-time features/online store)
  - Cassandra (historical market data/offline store)
  - Elasticsearch (social media content/text data)
- Market Analysis: Custom deep learning models
- Text Processing: Advanced NLP for sentiment analysis
- Signal Integration: Weighted ensemble models
- Containerization: Docker
- Orchestration: Kubernetes
- Monitoring: Prometheus + Grafana
- API Layer: FastAPI
- Visualization: Streamlit dashboards
- Real-Time Updates: WebSockets
- Social sentiment shifts sometimes predict price movements 1-3 hours before they appear in market data
- The relative importance of technical vs. sentiment features varies dramatically based on market conditions
- Quality and consistency of input data proved far more important than model sophistication for prediction accuracy
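One way to check whether sentiment leads price is a lagged correlation scan. The sketch below is a plain-Python illustration on synthetic data, not the project's code or results: the series are invented so that returns copy sentiment two steps later, and the function names (`pearson`, `best_lead`) are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def best_lead(sentiment, returns, max_lag):
    """Correlate sentiment[t] with returns[t + lag] for each lag in 0..max_lag.

    Returns (lag with the highest correlation, {lag: correlation}).
    """
    n = len(sentiment)
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(sentiment[: n - lag], returns[lag:])
    return max(scores, key=scores.get), scores


# Synthetic hourly series: returns repeat sentiment with a two-step delay.
sentiment = [0.1, 0.5, -0.3, 0.8, -0.6, 0.2, 0.4, -0.1, 0.7, -0.5]
returns = [0.0, 0.0] + sentiment[:-2]

lag, scores = best_lead(sentiment, returns, max_lag=3)
print(f"strongest correlation at lag {lag}")
```

On real market data the peak would be far less clean, and a high lagged correlation alone does not establish that sentiment predicts price.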
MarketPulseAI is still in active development. I'm currently working on:
- Expanding data sources (options flow, institutional trading patterns)
- Optimizing the pipeline for improved throughput and reduced latency
- Developing enhanced visualization dashboards
- Preparing for cloud deployment with managed services
Disclaimer: MarketPulseAI is an EDUCATIONAL exploration, not an investment tool.
Dashboard Project - Private 📊💻
- Building a dashboard connected to a database using Flask, MySQL, and web scraping
- Implemented automatic notifications sent to Discord and Telegram
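Notifications like these are commonly sent through a Discord webhook and the Telegram Bot API `sendMessage` method. The sketch below only builds the HTTP requests with the standard library; the helper names, the placeholder URL, token, and chat ID are assumptions for illustration, and the private project may work differently.

```python
import json
from urllib.request import Request, urlopen


def build_discord_request(webhook_url: str, message: str) -> Request:
    """Build a POST request for a Discord webhook (JSON body with 'content')."""
    payload = json.dumps({"content": message}).encode("utf-8")
    return Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_telegram_request(bot_token: str, chat_id: str, message: str) -> Request:
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": message}).encode("utf-8")
    return Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request, timeout: float = 10.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; nothing is sent until send() is called.
discord_request = build_discord_request(
    "https://example.invalid/webhook", "Dashboard refresh finished"
)
telegram_request = build_telegram_request(
    "PLACEHOLDER_TOKEN", "123456", "Dashboard refresh finished"
)
print(discord_request.get_method(), telegram_request.full_url)
```

Keeping request construction separate from `send` makes the payloads easy to inspect and test without network access.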
WhatsApp Integration - Private 📱💬
- Building a Python pipeline connected to a database using Flask, MongoDB, and Docker
- Implemented API integration with WhatsApp for automated messaging
Weather Data Aggregation with Kafka - Public ☁️🌡️
- Building a project to scrape weather data from different APIs
- Experimenting with Kafka to aggregate the data
- Integrating Spark for data analysis and processing
- Project is focused on learning Kafka and expanding knowledge of big data technologies
- Worked on a graph-based modelling project for COVID-19 infection spread and management
- Gained experience with Neo4j, Elasticsearch, PostgreSQL, MongoDB, Prefect, Dask, Python, Apache Airflow, unit testing, CI/CD, JIRA, Agile, APIs, Pandas, and scikit-learn
- Worked on a DataOps project to clean and prepare data from car sensors for R&D use cases
- Gained experience with AWS services, Dask, Python, unit testing, CI/CD, JIRA, Agile, APIs, Pandas, and scikit-learn
- Developed an automated tool for resume classification and summarization using NLP techniques
- Gained experience with Python, R, Shiny, MongoDB, TF-IDF, word2vec, doc2vec, Random Forest, XGBoost, and Docker
- Built a comprehensive web app dashboard for employee management and tracking
- Gained experience with Google Cloud Platform (GCP), Docker, web development, Firebase, R, JavaScript, MongoDB, Git, HTML, and CSS
- Worked on a decision support system to improve doctors' prescribing behavior for infectious diseases
- Gained experience with Python, R, inferential statistics, machine learning, dimensionality reduction, business intelligence, metagenomics, differential abundance analysis, nanopore technology, and SQL
- Developed a differential gene expression analysis workflow using Python, shell, and R languages
- Gained experience with the Tuxedo suite, DESeq2, MEME Suite, GATK, Picard tools, StringTie, GO enrichment, variant calling, and differential expression
- Characterized virulence factors and vaccine targets of a bacterial canine pathogen
- Gained experience with cell culture techniques, flow cytometry, genetic engineering, northern and western blotting, fluorescent and confocal microscopy, and PCR
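The resume classification tool listed above used TF-IDF features among other techniques. As a minimal, dependency-free sketch of that idea, the code below weights terms with a smoothed TF-IDF and assigns a resume the label of its most similar labeled example by cosine similarity. This nearest-neighbor step is a simplification swapped in for illustration; the original tool is described as using Random Forest and XGBoost, and the example texts and labels here are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: tf-idf weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(resume, labeled_examples):
    """Return the label of the labeled example most similar to the resume."""
    texts = [text for text, _ in labeled_examples] + [resume]
    vectors = tfidf_vectors(texts)
    query = vectors[-1]
    best_label, best_score = None, -1.0
    for (_, label), vector in zip(labeled_examples, vectors):
        score = cosine(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples, for illustration only.
examples = [
    ("python machine learning pandas scikit-learn models", "data science"),
    ("nursing patient care hospital clinical", "healthcare"),
    ("accounting audit tax financial reporting", "finance"),
]
predicted = classify("built machine learning models in python", examples)
print(predicted)
```

A production version would more likely rely on scikit-learn's `TfidfVectorizer` and a trained classifier than on hand-rolled vectors.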
| Category | Skills |
|---|---|
| Programming | 🐍 Python, R, 💻 Shell/Bash/Command line |
| Databases | 🍃 MongoDB, 🗃️ SQL, 🔗 Neo4j, 📊 Cassandra, 🔍 Elasticsearch |
| Big Data | 🔄 Apache Kafka, ⚡ Apache Spark, 🗄️ Redis, 📈 Stream Processing |
| Statistics & Machine Learning | 🔬 Inferential Statistics, 📈 Hypothesis testing, 📊 Regression methods, 🔄 Correlation, 📉 Descriptive Statistics, 🚦 Markov model, 🌐 Dimensionality reduction, 🧩 Clustering, 🌳 Decision tree, 🧠 KNN, 🎄 SVM, 🌱 Random forest |
| Tools | 🧰 Git, 📊 Matplotlib, 🔢 NumPy, 🐼 Pandas, 🍃 PyMongo, 🔬 SciPy, 🤖 scikit-learn, 🌊 Seaborn, 🔗 SQLAlchemy |
| Web Development | 🌐 HTML5/CSS3, 💻 JavaScript, TypeScript, NestJS, Prisma, 🌶️ Flask |
| DevOps & Cloud | 🐳 Docker, ☸️ Kubernetes, 🔍 Prometheus, 📊 Grafana, ☁️ AWS |
| Environment | 💻 High Performance Computing, 🐧 Linux |
| Data Science | 🛠️ Data Engineering, 🧑‍💼 Data Governance, 📈📉📊 Big Data, 🤖 Machine Learning, 📊 Data Analytics, 🍃 MongoDB, 🐳 Docker, 🗃️ PostgreSQL, ☁️ Amazon Web Services (AWS), 📈 JIRA, 🌐 Web Development, 🧑‍🔬 NLP |
Master's Degree in Bioinformatics and Statistics (2015 - 2018)
- Three-year Research & Professional Master's Degree in Bioinformatics, Statistics and Mathematics
- Curriculum covers management, processing, and analysis of sequences and massive data
- Data science: supervised learning (regression, decision trees, random forests, Markov chains, SVM, KNN, neural networks) and unsupervised learning (K-means, hierarchical agglomerative clustering)
Master's Degree in Bioengineering and Biomedical Engineering (2013 - 2015)
- Interdisciplinary program in biomedical research and engineering, drawing on bioengineering, cell and molecular biology, oncology, pharmacology, genetics, and microbiology
Bachelor's Degree (Licence) in Biochemistry and Biology (2010 - 2013)
- Curriculum covers biochemistry, cellular & molecular biology, immunology, physiology, biological statistics, organic chemistry

