Deep Learning – Attention Mechanisms & OCR (CNN + Transformer)

This repo holds the report, notebooks, and run instructions for a Deep Learning course project (2025). It has two main parts:

Attention in LLMs: Self-Attention, Flash-Attention (simplified), Linear Attention, Sparse Attention (with illustrative code).
OCR: recognizing text in images with a CNN (ResNet34 backbone) + Transformer Decoder (with Spatial Attention, evaluated with BLEU).

Table of contents

Overview
Repo structure
Environment requirements
Dataset
How to run
Details of each part
- Question 1 – Attention (LLMs)
- Question 2 – OCR (ResNet + Transformer Decoder)
Reproducibility notes
Team members

Overview

Report: presents the theory and analyzes the strengths and weaknesses of each attention mechanism, and describes the CNN + Transformer-Decoder OCR model.
Notebooks:
- Cau1.ipynb: demo and comparison of the attention mechanisms (attention matrices, output shapes, examples with fixed and random inputs).
- Cau2.ipynb: OCR pipeline (data preprocessing → model → train/eval), including BLEU score computation.

Repo structure

Recommended layout when publishing on GitHub:

.
├─ README.md
├─ requirements.txt
├─ .gitignore
├─ report/
│  └─ 52200206_52200214_52200216.pdf
├─ notebooks/
│  ├─ cau1_attention_mechanisms.ipynb
│  └─ cau2_ocr_resnet_transformer.ipynb
└─ data/
   └─ Dataset.txt   # dataset download link (Google Drive)

You can keep the original file names (Cau1.ipynb, Cau2.ipynb), but renaming them as above looks cleaner on GitHub.

Environment requirements

Python 3.10+ (recommended)
PyTorch + TorchVision
Image-processing libraries, NLP metric libraries, etc.

Installation:

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt
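
After installing, a quick check like the one below can confirm the environment before opening the notebooks. It is only a sketch: the package list is an assumption based on the libraries named in this README, and requirements.txt remains the authoritative list.

```python
import importlib.util
import sys

# Assumed package names (taken from this README); requirements.txt is authoritative.
REQUIRED = ["torch", "torchvision", "nltk"]


def check_environment(required=REQUIRED, min_python=(3, 10)):
    """Return (python_ok, missing_packages) without importing the packages."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python 3.10+:", python_ok)
    print("Missing packages:", missing or "none")
```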

Dataset

Dataset.txt contains a Google Drive link to the OCR data.

The OCR notebook currently uses Kaggle-style paths (e.g. /kaggle/input/...). To run it locally, you need to:

Download the dataset to your machine (using the link in Dataset.txt)
Update the path variables in Cau2.ipynb to match your local directories, for example:
- csv_path = "data/ocr/mcocr_train_df.csv"
- images_dir = "data/ocr/train_images/"

Note: the dataset is fairly large, so do not commit the images to GitHub. Keep the dataset on Drive/Kaggle and provide download instructions instead.
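
To avoid editing paths by hand each time, the two variables can be derived from a single data root. This is a sketch, not code from the notebook: the helper `resolve_ocr_paths` and the `OCR_DATA_ROOT` environment variable are hypothetical names, and the file and folder names are the example values shown above.

```python
import os
from pathlib import Path


def resolve_ocr_paths(data_root=None):
    """Pick the data root: explicit argument, then OCR_DATA_ROOT, then data/ocr.

    OCR_DATA_ROOT is a hypothetical variable name used only in this sketch.
    """
    root = Path(data_root or os.environ.get("OCR_DATA_ROOT", "data/ocr"))
    return {
        "csv_path": root / "mcocr_train_df.csv",
        "images_dir": root / "train_images",
    }


paths = resolve_ocr_paths()
csv_path = str(paths["csv_path"])
images_dir = str(paths["images_dir"])

# Report anything that has not been downloaded yet.
for name, path in paths.items():
    if not path.exists():
        print(f"{name} not found at {path}; download the dataset first")
```

On Kaggle, passing the Kaggle input directory as `data_root` would keep the rest of the notebook unchanged.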

How to run

Open the notebooks:

jupyter lab

Run them in order:

notebooks/cau1_attention_mechanisms.ipynb
notebooks/cau2_ocr_resnet_transformer.ipynb

Details of each part

Question 1 – Attention (LLMs)

Goal: illustrate and compare common attention mechanisms:

Self-Attention: Q/K/V → softmax(QKᵀ/√d) → weighted sum(V)
FlashAttention (simplified): optimizes memory/compute by computing attention block by block (the notebook version only simulates the idea)
Linear Attention: an approximation that reduces complexity (typically from O(n²) to ~O(n) in the sequence length n)
Sparse Attention: attends to only a subset of tokens (window/strided/selected)
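
The first and last items can be sketched in a few lines of dependency-free Python. This is an illustration only, not the notebook's PyTorch code: `attention` is a hypothetical helper, it handles a single head, and the `window` argument implements just one possible sparse pattern (a sliding window).

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, window=None):
    """Scaled dot-product attention on lists of vectors (single head).

    window=None gives full self-attention. window=w lets token i attend
    only to tokens j with |i - j| <= w (a sliding-window sparse pattern).
    Returns (output, weights).
    """
    d = len(K[0])
    weights = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked position
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value vectors.
    out = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        for row in weights
    ]
    return out, weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full_out, full_w = attention(X, X, X)
sparse_out, sparse_w = attention(X, X, X, window=1)
print("full weights, token 0:  ", [round(w, 3) for w in full_w[0]])
print("sparse weights, token 0:", [round(w, 3) for w in sparse_w[0]])
```

With `window=1`, token 0 puts zero weight on tokens 2 and 3, while every row of weights still sums to 1.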

Output: prints the attention matrix (head 0) and the output shape for verification.
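
The same kind of check applies to the block-wise idea behind FlashAttention: visiting K/V in blocks with a running max and running sum should reproduce full softmax attention exactly. The sketch below shows this for one query vector in plain Python. Both function names are hypothetical, and it demonstrates only the arithmetic, not the memory savings of a real fused kernel or the notebook's implementation.

```python
import math


def full_attention_row(q, K, V):
    """Reference: softmax(q·Kᵀ/√d) V for one query, with all scores in memory."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, V)) / total for c in range(len(V[0]))]


def blockwise_attention_row(q, K, V, block_size=2):
    """Same result, but K/V are visited block by block (online softmax)."""
    d = len(q)
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(V[0])  # unnormalized weighted sum of V
    for start in range(0, len(K), block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new running max.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.1]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
q = [0.3, -0.7]
ref = full_attention_row(q, K, V)
blk = blockwise_attention_row(q, K, V, block_size=2)
print("max abs difference:", max(abs(a - b) for a, b in zip(ref, blk)))
```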

Question 2 – OCR (ResNet + Transformer Decoder)

Task: recognize the character sequence in a text image (sequence generation).

Idea:

A CNN backbone (ResNet34) extracts a feature map from the image
Spatial Attention highlights the regions that contain text
A Transformer Decoder generates the character sequence step by step (teacher forcing with the <start>/<end>/<pad>/<unk> tokens)
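
Teacher forcing means the decoder is fed the ground-truth sequence shifted right by one position, and a causal mask keeps each position from seeing later ones. The sketch below shows both pieces in plain Python. The helper names, the token ids, and the mask convention (True = blocked) are assumptions for illustration; the notebook may organize this differently.

```python
def teacher_forcing_pair(token_ids, start_id, end_id, pad_id, max_len):
    """Build (decoder_input, target) for one label.

    decoder_input = <start> + tokens, target = tokens + <end>,
    both right-padded with <pad> to max_len.
    """
    if len(token_ids) + 1 > max_len:
        raise ValueError("label does not fit in max_len")
    dec_in = [start_id] + list(token_ids)
    target = list(token_ids) + [end_id]
    pad = [pad_id] * (max_len - len(dec_in))
    return dec_in + pad, target + pad


def causal_mask(n):
    """mask[i][j] is True where position i may NOT attend to position j (j > i)."""
    return [[j > i for j in range(n)] for i in range(n)]


# Hypothetical ids: <start>=0, <end>=1, <pad>=2, characters from 5 upward.
dec_in, target = teacher_forcing_pair([5, 6, 7], start_id=0, end_id=1, pad_id=2, max_len=6)
print(dec_in)
print(target)
print(causal_mask(3))
```

At step t the decoder sees `dec_in[:t + 1]` and is trained to predict `target[t]`; padded positions are normally excluded from the loss.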

Preprocessing (as done in the notebook):

Resize images to (32, 128)
Augmentation: random rotation + color jitter (depending on configuration)
Encode labels character by character (character-level vocab)
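
A character-level vocabulary of the kind described in the last step could look like the sketch below. The ordering of the special tokens and the helper names are assumptions, not taken from the notebook.

```python
# Assumed ordering of the special tokens; the notebook may use different ids.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(labels):
    """Special tokens first, then every distinct character in sorted order."""
    chars = sorted({ch for text in labels for ch in text})
    itos = SPECIALS + chars
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def encode(text, stoi):
    """Map each character to its id; unseen characters become <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(ch, unk) for ch in text]


def decode(ids, itos):
    """Turn ids back into text, stopping at <end> and skipping other specials."""
    out = []
    for i in ids:
        tok = itos[i]
        if tok == "<end>":
            break
        if tok not in SPECIALS:
            out.append(tok)
    return "".join(out)


stoi, itos = build_vocab(["Hóa đơn", "Tổng cộng"])
ids = encode("Hóa đơn", stoi)
print(ids)
print(decode(ids, itos))
```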

Evaluation:

Compute the BLEU score between the predicted text and the ground-truth text
Loss/metric plots are available (depending on the cell)
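
For intuition about what BLEU measures, here is a dependency-free sketch of sentence-level BLEU: clipped n-gram precision up to 4-grams combined with a brevity penalty. It is not the notebook's implementation. `simple_bleu` is a hypothetical name, no smoothing is applied, and whitespace tokenization is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(reference, hypothesis, max_n=4):
    """BLEU for one reference/hypothesis pair of token lists (no smoothing)."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "tong cong thanh toan 120000 dong".split()
hypothesis = "tong cong thanh toan 120000".split()
print(round(simple_bleu(reference, hypothesis), 4))
```

Because the score is zero whenever any n-gram order has no match, short OCR outputs are usually scored with a smoothed variant or with fewer n-gram orders.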

Reproducibility notes

On CPU, OCR training will be much slower than on GPU.

With nltk, you sometimes need to download the tokenizer data:

python -c "import nltk; nltk.download('punkt')"

Results may vary slightly because of the random seed and augmentation.
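
Fixing the seeds at the top of each notebook reduces this variation. The helper below is a sketch: `set_seed` and the default value 42 are assumptions, and NumPy and PyTorch are seeded only when they are installed. Some GPU operations can remain nondeterministic even with fixed seeds.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_seed(42)
```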

Team members

52200206
52200214 – Trần Hồ Hoàng Vũ
52200216

Repository: tranhohoangvu/Deep-Learning
