Personal website: https://soominmyung.com
I specialise in applied machine learning, deep learning architectures, and AI-driven workflow automation.
My work spans building Transformer-based models, time-series forecasting systems, and scalable ETL pipelines that support real operational decision-making.
I particularly enjoy turning ambiguous business or behavioural problems into structured, ML-ready formulations, covering signal engineering, modelling, evaluation, and deployment.
⚙️ Deep Learning & ML Modelling – Transformers, Siamese networks, CNNs, anomaly detection, preference learning, forecasting models.
📈 Behavioural & Time-Series Modelling – ARIMA/Prophet/Statsmodels, sequence modelling, pattern extraction, regime-aware signal engineering.
🤖 AI Workflow Automation – RAG prototypes (GPT, LangChain, FAISS), document summarisation, email drafting, automated reasoning flows.
⚡ Data Engineering Foundations – PySpark ETL, SQL Server pipelines, scheduling/automation, enterprise-scale data integration.
📊 Operational Analytics – BI dashboards, KPI systems, end-to-end reporting automation.
🌍 Scientific & Spatial Modelling (Legacy Work) – Hydrology, spatial interpolation, environmental modelling (QGIS, R).
🧠 Siamese Transformer for Financial Preference Learning
: 8-layer Siamese Transformer modelling pairwise behavioural differences between financial time series. Achieved 0.81 test accuracy.
⚡ Stock ETL Pipeline (40M+ rows)
: PySpark ETL pipeline that replaces Excel-based logs of daily stock history, integrates SAP 9.1/9.3 data, and automates forecasting inputs.
🗄️ SAP B1 SQL Portfolio
: Anonymised SQL queries for inventory, sales, costing, and fraud detection built at Korea Foods.
📈 Multi-Vintage Time-Series Forecasting
: Automated pipeline for collecting, aligning, and forecasting revision-prone macroeconomic datasets.
🖼 CNN for CIFAR-10
: Convolutional model whose accuracy rose from 73% to 81% through hyperparameter tuning.
🌍 Spatial & Environmental Modelling (GIS)
: Ecological and hydrological analysis using QGIS, R (terra, sf), Random Forest/SVM, ETo, cokriging, and Flood Modeller.
🎲 Monopoly-SQL
: Full Monopoly game simulation using SQL triggers and procedures.
🚔 Police DB App
: PHP + MySQL web app with CRUD and role-based access.
Deep Learning: PyTorch · Transformers · Siamese Networks · CNN · Attention Mechanisms
Machine Learning: scikit-learn · Statsmodels · ARIMA · Prophet · Forecasting · Anomaly Detection
AI Systems: RAG · LangChain · FAISS · GPT-based workflows · Retrieval pipelines
Data Engineering: PySpark · SQL Server (T-SQL) · ETL Pipelines · Automation (SQL Agent) · Parquet
Programming: Python · R · SQL · Version Control (Git)
Cloud & Infra: AWS (S3, SageMaker) · API Integration · FastAPI
Analytics & BI: Tableau · Power BI
Legacy Scientific Tools: QGIS · Spatial Modelling · Flood Modeller · Cokriging
Deep Learning · Applied ML · Behavioural Modelling · Sequence Modelling · AI Workflow Automation · Time-Series Forecasting · Anomaly Detection · Data Integration · ETL Engineering