soominmyung/README.md

👋 Hi, I’m Soomin. Welcome to my GitHub!

Personal website: https://soominmyung.com


🤖 I build practical AI & ML systems that turn complex data into decisions.

I specialise in applied machine learning, deep learning architectures, and AI-driven workflow automation.
My work spans building Transformer-based models, time-series forecasting systems, and scalable ETL pipelines that support real operational decision-making.

I particularly enjoy transforming ambiguous business or behavioural problems into structured, ML-ready formulations — from signal engineering to modelling, evaluation, and deployment.


💼 What I Do as an Applied ML & Data Scientist

⚙️ Deep Learning & ML Modelling – Transformers, Siamese networks, CNNs, anomaly detection, preference learning, forecasting models.

📈 Behavioural & Time-Series Modelling – ARIMA/Prophet/Statsmodels, sequence modelling, pattern extraction, regime-aware signal engineering.

🤖 AI Workflow Automation – RAG prototypes (GPT, LangChain, FAISS), document summarisation, email drafting, automated reasoning flows.

Data Engineering Foundations – PySpark ETL, SQL Server pipelines, scheduling/automation, enterprise-scale data integration.

📊 Operational Analytics – BI dashboards, KPI systems, end-to-end reporting automation.

🌍 Scientific & Spatial Modelling (Legacy Work) – Hydrology, spatial interpolation, environmental modelling (QGIS, R).


📂 Featured Projects on GitHub

📌 Pinned

🧠 Siamese Transformer for Financial Preference Learning
: 8-layer Siamese Transformer modelling pairwise behavioural differences between financial time series. Achieved 0.81 test accuracy.

⚡ Stock ETL Pipeline (40M+ rows)
: PySpark ETL replacing Excel logs for daily stock history, integrating SAP 9.1/9.3 data and automating forecasting inputs.

🗄️ SAP B1 SQL Portfolio
: Anonymised SQL queries for inventory, sales, costing, and fraud detection built at Korea Foods.

📈 Multi-Vintage Time-Series Forecasting
: Automated pipeline for collecting, aligning, and forecasting revision-prone macroeconomic datasets.

🖼 CNN for CIFAR-10
: Convolutional model whose accuracy improved from 73% to 81% through hyperparameter tuning.

🌍 Spatial & Environmental Modelling (GIS)
: Ecological and hydrological analysis using QGIS, R (terra, sf), Random Forest/SVM, ETo, cokriging, and Flood Modeller.
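
For readers unfamiliar with "vintages" in the forecasting project above: a revision-prone series is published several times, and each publication is a vintage. The sketch below illustrates one way to align such data, by taking the latest value that was available on a given date. It is not taken from the Vacancy_time_series repository; the record layout, the sample values, and the `snapshot_as_of` helper are all hypothetical.

```python
from datetime import date

# Hypothetical records: (reference_period, vintage_date, value).
# The same reference period appears more than once when it is revised.
observations = [
    ("2023-Q1", date(2023, 4, 30), 1.2),
    ("2023-Q1", date(2023, 7, 31), 1.4),   # revision of Q1
    ("2023-Q2", date(2023, 7, 31), 0.9),
    ("2023-Q2", date(2023, 10, 31), 1.1),  # revision of Q2
]


def snapshot_as_of(records, as_of):
    """Return {reference_period: value} from the latest vintage published on or before as_of."""
    latest = {}
    for period, vintage, value in records:
        if vintage > as_of:
            continue  # this vintage had not been published yet
        if period not in latest or vintage > latest[period][0]:
            latest[period] = (vintage, value)
    return {period: value for period, (_, value) in latest.items()}


# What a forecaster would have seen in mid-August versus at year end.
early = snapshot_as_of(observations, date(2023, 8, 15))
late = snapshot_as_of(observations, date(2023, 12, 31))
print(early)
print(late)
```

Building the training set from an as-of snapshot like this, rather than from the latest revised values, keeps a backtest from using information that was not available at forecast time.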


🗂 Etc.

🎲 Monopoly-SQL
: Full Monopoly game simulation using SQL triggers and procedures.

🚔 Police DB App
: PHP + MySQL web app with CRUD and role-based access.


⚙️ Core Skills

Deep Learning: PyTorch · Transformers · Siamese Networks · CNN · Attention Mechanisms
Machine Learning: scikit-learn · Statsmodels · ARIMA · Prophet · Forecasting · Anomaly Detection
AI Systems: RAG · LangChain · FAISS · GPT-based workflows · Retrieval pipelines
Data Engineering: PySpark · SQL Server (T-SQL) · ETL Pipelines · Automation (SQL Agent) · Parquet
Programming: Python · R · SQL · Version Control (Git)
Cloud & Infra: AWS (S3, SageMaker) · API Integration · FastAPI
Analytics & BI: Tableau · Power BI
Legacy Scientific Tools: QGIS · Spatial Modelling · Flood Modeller · Cokriging


📈 Focus Areas

Deep Learning · Applied ML · Behavioural Modelling · Sequence Modelling · AI Workflow Automation ·
Time-Series Forecasting · Anomaly Detection · Data Integration · ETL Engineering

Pinned repositories

  1. ETL_Stock (Python)

    End-to-end ETL pipeline project using PySpark and Python: Excel to Parquet to SQL Server.

  2. CNN-CIFAR10 (Jupyter Notebook)

    Simple CNN for CIFAR-10 with hyperparameter tuning (epochs, batch size, kernel size, filters, layers, normalization).

  3. Environmental-Modelling-and-Analysis

    Environmental modelling and analysis practical book.

  4. Vacancy_time_series (Jupyter Notebook)

    Vintage time series forecasting.

  5. SAPb1-SQL-queries (TSQL)

    An anonymised collection of SAP Business One SQL queries originally developed and used at Korea Foods.

  6. Pairwise_Siamese_transformer (Jupyter Notebook)

    Pairwise Preference Learning with Siamese Transformer Encoders.