📌 SQuAD

A short project on the SQuAD v2.0 dataset.


🚀 Features

  • ✨ Trained on the SQuAD v2.0 dataset
  • 🔧 Question Answering with RoBERTa / BERT / ALBERT-v2
  • 📦 Streamlit-based interactive QA demo

📚 Table of Contents

  • 📝 About
  • 🎥 Demo
  • 🛠 Installation
  • 📊 SQuAD v2.0 Statistics
  • 🛠 Preprocessing & Data Handling
  • Model
  • Experiments
  • 🔭 Next Steps
  • 📄 References

📝 About

This project demonstrates a Question Answering (QA) system trained on the SQuAD v2.0 dataset.
It can answer questions based on user-provided text and identify when no answer exists within the paragraph.

The project includes:

  • A fine-tuned QA model
  • A Streamlit web interface
  • An easy-to-use inference pipeline (sketched below)
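
For quick experimentation, the snippet below shows a minimal inference sketch built on the Hugging Face pipeline API. The checkpoint path "qa-model/albert" is a placeholder for whatever fine-tuned model you train or download; it is not shipped with this repository.

```python
from transformers import pipeline

# Placeholder path: point this at your own fine-tuned SQuAD v2.0 checkpoint.
qa = pipeline("question-answering", model="qa-model/albert")

context = (
    "SQuAD v2.0 combines answerable questions from SQuAD v1.1 with "
    "questions that cannot be answered from the given paragraph."
)

# handle_impossible_answer=True lets the pipeline return an empty answer
# when it judges that the paragraph does not contain one.
result = qa(
    question="What does SQuAD v2.0 combine?",
    context=context,
    handle_impossible_answer=True,
)
print(result)
```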

🎥 Demo

Click for Demo.

Demo GIF

🛠 Installation

1. Clone the Repository

git clone https://github.com/dionvou/squad.git
cd squad

2. Download SQuAD Dataset

SQuAD v2.0 can be downloaded at:
🔗 https://rajpurkar.github.io/SQuAD-explorer/

Or, use the download.sh script included in the repository:

# Make sure the script is executable
chmod +x download.sh

# Run the script to download the dataset
./download.sh


📊 SQuAD v2.0 Statistics

The SQuAD v2.0 dataset contains a mixture of answerable and unanswerable questions, which makes it more challenging than v1.1.

  • Training set: 130,319 questions
    • Answerable: 86,821 (~67%)
    • Unanswerable: 43,498 (~33%)
  • Development set: 11,873 questions
    • Answerable: 5,928 (~50%)
    • Unanswerable: 5,945 (~50%)

This dataset introduces unanswerable questions to train models to identify when no answer exists. To handle the varying lengths of contexts, questions, and answers during tokenization, we analyzed the distributions of token lengths using a BERT tokenizer.

Token Length Distributions

The plot shows the number of BERT tokens for:

  • Context: The full paragraph
  • Question: Each question text
  • Answer: Each answer span
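
These distributions can be reproduced with a short script; a minimal sketch, assuming the Hugging Face datasets loader for squad_v2 and the bert-base-uncased tokenizer, is shown below.

```python
from datasets import load_dataset
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
train = load_dataset("squad_v2", split="train")

# Token counts per context, question, and (first) answer span.
context_lens = [len(tokenizer.tokenize(ex["context"])) for ex in train]
question_lens = [len(tokenizer.tokenize(ex["question"])) for ex in train]
answer_lens = [
    len(tokenizer.tokenize(ex["answers"]["text"][0])) if ex["answers"]["text"] else 0
    for ex in train
]

print(max(context_lens), max(question_lens), max(answer_lens))
```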

🛠 Preprocessing & Data Handling

Tokenization & Overflow

  • Used Hugging Face tokenizers with return_overflowing_tokens=True to handle long contexts
  • Split long paragraphs into smaller overlapping chunks to avoid truncation (see the sketch below)
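
A minimal sketch of this chunking step, using the maximum length (384) and document stride (128) reported in the Experiments section; the question and context strings are placeholder data:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

questions = ["When was the university founded?"]            # placeholder example
contexts = ["The university was founded in 1891. " * 100]   # artificially long paragraph

encoded = tokenizer(
    questions,
    contexts,
    max_length=384,
    stride=128,
    truncation="only_second",          # only the context is ever truncated
    return_overflowing_tokens=True,    # emit one chunk per overflow window
    return_offsets_mapping=True,       # needed later to locate answer spans
    padding="max_length",
)

# Each chunk remembers which original (question, context) pair it came from.
print(encoded["overflow_to_sample_mapping"])
```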

Labeling Strategy

  • Initially, every chunk that did not contain an answer was labeled as impossible (answer positions set to 0)
  • This caused a heavy class imbalance and led the model to collapse to predicting the no-answer label
  • To fix this:
    • Removed chunks of answerable questions that no longer contained the answer after splitting
    • Kept only the answer-containing chunks of answerable questions for training
  • Result: a better balance between answerable and unanswerable examples and reduced model collapse (a sketch of the filtering rule follows)
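
The filtering rule can be summarised by a small helper; the function name and arguments below are illustrative, not the exact code used in this project:

```python
def keep_chunk(is_impossible, answer_start, answer_end, chunk_start, chunk_end):
    """Decide whether a tokenized chunk should be kept for training.

    Unanswerable questions are always kept and labelled as no-answer.
    For answerable questions, a chunk is kept only if the full answer span
    (character offsets answer_start..answer_end) lies inside the chunk's
    character range (chunk_start..chunk_end).
    """
    if is_impossible:
        return True
    return chunk_start <= answer_start and answer_end <= chunk_end
```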

Model

We evaluate the base variants of BERT, RoBERTa, ALBERT, DistilBERT, and SpanBERT.
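
The exact Hub checkpoints are not pinned in this README; one plausible set of base-sized checkpoints, loaded through the Auto classes, would look like this:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Illustrative Hub IDs for the base-sized variants (assumptions, not pinned by this repo).
CHECKPOINTS = [
    "bert-base-uncased",
    "roberta-base",
    "albert-base-v2",
    "distilbert-base-uncased",
    "SpanBERT/spanbert-base-cased",
]

def load_qa_model(name: str):
    """Load a tokenizer and a span-prediction QA head for the given checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForQuestionAnswering.from_pretrained(name)
    return tokenizer, model
```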

Experiments

To evaluate our models and select the best-performing architecture, we conducted a series of controlled experiments.
Due to time constraints, all evaluations were performed on a development split created from the original training set, using an 80% / 20% train–validation split.
After determining the strongest model, we retrained it on the full SQuAD training dataset.

All experiments employed early stopping with a patience of 3 epochs to prevent overfitting and reduce training time. The models were trained using a learning rate of 1e-5, a batch size of 64, a maximum sequence length of 384 tokens, and a document stride of 128.
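
A sketch of this setup with the transformers Trainer is shown below; the dataset variables are placeholders for the tokenized chunks from the preprocessing step, and the epoch cap is an assumption since early stopping normally ends training earlier.

```python
from transformers import (
    AutoModelForQuestionAnswering,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

def build_trainer(train_features, val_features, checkpoint="albert-base-v2"):
    """Wire up a Trainer with the hyperparameters listed above."""
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
    args = TrainingArguments(
        output_dir=f"qa-{checkpoint.split('/')[-1]}",
        learning_rate=1e-5,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        num_train_epochs=10,              # assumed cap; early stopping halts sooner
        evaluation_strategy="epoch",      # "eval_strategy" in newer transformers versions
        save_strategy="epoch",
        load_best_model_at_end=True,      # required by EarlyStoppingCallback
        metric_for_best_model="eval_loss",
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_features,     # tokenized chunks (80% split)
        eval_dataset=val_features,        # tokenized chunks (20% split)
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
```

Note that the maximum sequence length and document stride are tokenization parameters and are applied in the preprocessing step shown earlier.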

Training curves

The results follow the trends observed in previous literature:
ALBERT consistently achieves the highest validation EM and F1 scores among the tested models.
Based on this outcome, we select ALBERT as the architecture for full fine-tuning on the complete dataset.

🔭 Next Steps

In future work, we aim to further enhance the performance of our QA system by implementing techniques from the research paper “Retrospective Reader for Machine Reading Comprehension” (arXiv:2001.09694). This method pairs a sketchy reading stage, which makes a coarse judgment of whether a question is answerable, with an intensive reading stage that verifies that judgment and extracts the answer span, a design that is particularly well suited to SQuAD v2.0's unanswerable questions. By incorporating this approach, we hope to achieve state-of-the-art performance on SQuAD v2.0.

Additionally, we plan to explore model ensembling to boost overall accuracy.

📄 References

  • Know What You Don't Know: Unanswerable Questions for SQuAD
    Rajpurkar et al., 2018 – Introduces unanswerable questions in SQuAD 2.0 to improve model robustness.

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Devlin et al., 2019 – Introduces BERT, a deeply bidirectional transformer for language understanding tasks.

  • Question Answering on SQuAD 2.0: BERT Is All You Need
    Schwager et al., 2019 – Explores using BERT for SQuAD 2.0 and shows strong QA performance.

  • Really Paying Attention: A BERT + BiDAF Ensemble Model for Question Answering
    Yin et al., 2019 – Combines BERT with BiDAF in an ensemble to enhance QA accuracy on SQuAD.
