SmarthBakshi/README.md

👋 Hi, I'm Smarth

💡 Data Scientist with experience across Sports Analytics, Biomedical AI, and Autonomous Vehicles
📜 Patent Holder – Patented algorithm in Data Science (based on my Master’s thesis)
🎓 M.Sc. Informatics (Machine Learning & Analytics) – Technical University of Munich
⚙️ Passionate about building production-grade ML systems that are scalable, modular, and impactful


🚀 About Me

I build scalable Python applications, real-time data pipelines, and end-to-end projects in machine learning, data science, and Generative AI. Each project follows industry-standard engineering practices, with the aim of being both robust and commercially viable.


📌 Featured Projects

Tech Stack: Python, Poetry, Streamlit, MLflow, Optuna, Docker, CI/CD

  • Real-time match analytics and player pass network visualizations
  • End-to-end ML pipeline for pass outcome prediction, trained using match event data
  • Integrated with MLflow for model tracking & reproducibility, and Optuna for hyperparameter optimization
  • Modular architecture: separate data ingestion, processing, modeling, and visualization layers
  • Built with production-readiness in mind: includes structured logging, environment isolation, and CI workflows

🎥 Watch Full Demo Video


2️⃣ ResearchAI (Work in Progress)

Tech Stack: FastAPI, Apache Airflow, OpenSearch (BM25 + vectors), PostgreSQL, MinIO, Gradio, Docker Compose, Poetry (Ollama planned)

  • Production-grade RAG with hybrid retrieval (BM25 + embeddings); infra-first Docker setup with service health checks
  • Automated ingestion (arXiv → MinIO/Postgres) via Airflow; modular OOP design for swappable chunkers, embedders, and vector stores
  • API + UI: FastAPI /ask with cited answers (WIP), Gradio frontend, CI scaffolding; evals via Recall@k/MRR; VPS-ready deployment
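For readers unfamiliar with the retrieval metrics mentioned above, here is a minimal sketch of Recall@k and MRR in plain Python. It is not taken from the ResearchAI codebase: the function names and the paper IDs (`p1`, `p2`, ...) are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical evaluation data: one (ranking, relevant set) pair per query.
runs = [
    (["p3", "p1", "p7"], {"p1", "p9"}),  # first relevant hit at rank 2
    (["p2", "p5", "p4"], {"p2"}),        # first relevant hit at rank 1
]

print(recall_at_k(*runs[0], k=3))   # 0.5  (1 of 2 relevant papers retrieved)
print(mean_reciprocal_rank(runs))   # 0.75 ((1/2 + 1/1) / 2)
```

Recall@k rewards retrieving as many relevant papers as possible within the top k, while MRR rewards placing the first relevant paper near the top, so the two are usually reported together.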

📋 Read the blog
💻 Access the webapp


3️⃣ ClimaCast – ETL Weather Pipeline (Work in Progress)

Tech Stack: Apache Airflow, AWS S3, FastAPI, Pandas, Requests

  • Production-grade ETL for weather data
  • Orchestrated with Airflow DAGs, stored in S3 for scalability
  • Designed to showcase data engineering best practices

📬 Connect with Me

LinkedIn
Email
Medium
Instagram


💡 This profile is a growing showcase of my work in ML, Data Engineering, and GenAI. Each project follows industry-standard engineering practices, with a focus on scalability, modularity, and clarity.

Pinned

  1. Research-AI (Public)

    This project is an end-to-end Retrieval-Augmented Generation (RAG) system designed to ingest, parse, embed, and semantically search scientific papers from arXiv.

    Python 11

  2. Stream-Processor (Public)

    Python 7

  3. ETL-Weather (Public)

    Python

  4. Video-Screening-MVP (Public)

    Python