This is a repository containing code for analyzing educational texts in Northern Ireland. CO-PI: Dr. Jing Xu, University of Washington. Code Author: June Yang, University of Washington

jyang32/Northern_Ireland_Education_Text

Northern Ireland Education Text Analysis

This project analyzes Northern Ireland educational texts from the Option2 and Option1 perspectives. It covers content analysis, topic modeling with BERTopic, and sentiment analysis using OpenAI models, and it compares content across document types: textbooks, policy documents, and teacher interviews.

Project Structure

Northern_Ireland_Education_Text/
├── README.md
├── environment.yml
├── environment_setup.md
├── scripts/
│   ├── config.py
│   ├── utils.py
│   ├── file_reader.py
│   ├── main.py
│   ├── preprocess.py
│   ├── stopwords.txt
│   └── phrases.txt
├── analysis/
│   ├── BERTopic.ipynb
│   ├── descriptives.ipynb
│   ├── descriptives_no_interview.ipynb
│   └── sentiment.ipynb
├── data/
│   ├── metadata.csv
│   └── strand1/
│       ├── both/
│       │   ├── GCSE History (2017)-specification-Standard.docx
│       │   └── Reconciled_interviews/
│       │       ├── TeacherF_reconciled.docx
│       │       └── TeacherI_reconciled.docx
│       ├── option1/
│       │   ├── GCSE Planning Framework History Unit 1 Section B Option 1.docx
│       │   ├── Madden (2007) History for CCEA GCSE Revision Guide - Chapter 3 (Peace, War and Neutrality).docx
│       │   ├── Madden (2009) History for CCEA GCSE Second Edition - Chapter 2 (Peace, War and Neutrality) (1).docx
│       │   ├── Madden (2011) ccea revision guide chp 2 peace war neutrality.docx
│       │   ├── Madden and McBride History for CCEA GCSE Chapter 2.docx
│       │   ├── option1_combined all.docx
│       │   ├── TeacherC_reconciled.docx
│       │   ├── TeacherD_reconciled.docx
│       │   ├── TeacherE_reconciled.docx
│       │   ├── TeacherH_reconciled.docx
│       │   ├── TeacherL_reconciled.docx
│       │   ├── TeacherN_reconciled.docx
│       │   ├── TeacherO_reconciled.docx
│       │   ├── TeacherP_reconciled.docx
│       │   ├── TeacherQ_reconciled.docx
│       │   └── Updated Johnston&Johnston no textboxes.docx
│       └── option2/
│           ├── Doherty (2001) Northern Ireland since c.1960 .docx
│           ├── GCSE Planning Framework History Unit 1 Section B Option 2.docx
│           ├── Madden (2007) History for CCEA GCSE Revision Guide - Chapter 4 (Changing Relationships) (2).docx
│           ├── Madden (2009) History for CCEA GCSE Second Edition - Chapter 3 (Changing Relationships).docx
│           ├── Madden (2011) CCEA revision guide Chp 3. Changing Relationships.docx
│           ├── Madden and McBride History for CCEA GCSE Chapter 3.docx
│           ├── option2_combined all.docx
│           ├── TeacherA_reconciled.docx
│           ├── TeacherB_reconciled.docx
│           ├── TeacherG_reconciled.docx
│           ├── TeacherJ_reconciled.docx
│           ├── TeacherK_reconciled.docx
│           ├── TeacherM_reconciled.docx
│           └── Updated Doherty (2001) no textboxes.docx
├── outputs/
│   ├── processed_text_data.csv
│   ├── cleaned_text_data.csv
│   └── analysis_results/
│       ├── models/
│       ├── raw/
│       ├── no_interview/
│       ├── reduce_outlier/
│       ├── all_visuals/
│       ├── option1_sentiment_analysis_fixed/
│       └── option2_sentiment_analysis_fixed/

The subfolders of data/strand1/ group documents by perspective:

  • option2/: All Option2 perspective documents (textbooks, teacher interviews, etc.)
  • option1/: All Option1 perspective documents (textbooks, teacher interviews, etc.)
  • both/: Documents shared by both perspectives (e.g., reconciled teacher interviews, the GCSE specification)
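
The folder layout above implies that each document's perspective can be read from its location on disk. The following is a minimal sketch of that idea, not the project's own file reader: the function name `list_documents` and the rule "first folder below the root is the label" are assumptions made for illustration.

```python
from pathlib import Path

# Folder names come from the tree above; using them as labels is an assumption.
PERSPECTIVES = {"option1", "option2", "both"}


def list_documents(root):
    """Yield (perspective, path) for every .docx file under root.

    The perspective is the name of the first folder below root, so a file in
    both/Reconciled_interviews/ is still labelled "both".
    """
    root = Path(root)
    for path in sorted(root.rglob("*.docx")):
        perspective = path.relative_to(root).parts[0]
        if perspective in PERSPECTIVES:
            yield perspective, path


# Example usage: prints nothing if data/strand1 is absent.
for perspective, path in list_documents("data/strand1"):
    print(f"{perspective}\t{path.name}")
```

Files placed directly in the root, or in a folder with any other name, are skipped rather than guessed at.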

Document Types

  • Textbooks: Educational materials by Madden, Doherty, Johnston
  • Policy Documents: GCSE Planning Frameworks and specifications
  • Combined Resources: Comprehensive resource collections
  • Teacher Interviews: Teacher interview transcripts (can be under option2/, option1/, or both/)

Quick Start for New Users

1. Environment Setup

This project uses conda for dependency management. Follow these steps to set up your environment:

# Clone or download the project
cd Northern_Ireland_Education_Text

# Create the conda environment (recommended)
conda env create -f environment.yml

# Activate the environment
conda activate bertopic_env

2. OpenAI API Setup (for Sentiment Analysis)

To use sentiment analysis features, you'll need an OpenAI API key:

  1. Get an API key from OpenAI
  2. Create a .env file in the project root:
# In your .env file
OPENAI_API_KEY=your_actual_api_key_here
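
For readers who want to check that the key is picked up, here is a standard-library sketch of reading `OPENAI_API_KEY` from the environment, falling back to the `.env` file. The project itself most likely relies on python-dotenv for this; the helper names `read_env_file` and `get_openai_api_key` are hypothetical and only illustrate the lookup order.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_openai_api_key(env_path=".env"):
    """Return the key from the environment, else from the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
    return key
```

An exported environment variable takes precedence over the file in this sketch, so a key set in the shell is not silently replaced.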

3. Running the Analysis Pipeline

Step 1: Process Raw Data

# Read and process raw documents
python -m scripts.main

# Clean and preprocess text
python scripts/preprocess.py
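
This README does not describe what preprocess.py does internally. As a rough illustration of what a cleaning step built on stopwords.txt and phrases.txt could look like, the sketch below assumes both files hold one term per line, joins listed phrases with underscores so they survive tokenization, and drops stopwords. The function names and the file format are assumptions, not the project's actual implementation.

```python
import re


def load_terms(path):
    """Read one term per line, ignoring blank lines (file format is assumed)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


def clean_text(text, stopwords, phrases):
    """Lowercase, join known phrases with underscores, then drop stopwords."""
    text = text.lower()
    # Replace longer phrases first so shorter ones cannot split them.
    for phrase in sorted(phrases, key=len, reverse=True):
        joined = phrase.replace(" ", "_")
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, lambda _match, joined=joined: joined, text)
    tokens = re.findall(r"[\w']+", text)
    stop = set(stopwords)
    return " ".join(token for token in tokens if token not in stop)
```

Joining phrases before removing stopwords matters: a phrase that contains a stopword would otherwise be broken apart before it could be matched.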

Step 2: Descriptive Analysis

# Generate descriptive statistics
jupyter notebook analysis/descriptives.ipynb

Step 3: Topic Modeling

# Run Jupyter notebook for topic analysis
jupyter notebook analysis/BERTopic.ipynb

Step 4: Sentiment Analysis

# Run sentiment analysis on key terms
jupyter notebook analysis/sentiment.ipynb

4. Output Files

Your analysis will generate:

  • outputs/processed_text_data.csv - Processed document data
  • outputs/cleaned_text_data.csv - Cleaned text for analysis
  • outputs/analysis_results/ - Topic modeling results, visualizations, and sentiment analysis results

URL Processing with AI Fallback

The pipeline includes URL processing capabilities for combined documents:

  • Raw Content Fetching: Uses enhanced web scraping to fetch live content from URLs
  • AI Knowledge-Based Fallback: When raw fetching fails, uses OpenAI to generate summaries based on training data

Configuration

URL processing can be configured in scripts/config.py:

# URL processing parameters
FETCH_URLS = False  # Set this to False to skip all URL processing
MAX_URL_CHARS = 8000  # Maximum characters to extract from each URL
URL_TIMEOUT = 15  # Timeout for URL requests in seconds

# OpenAI fallback parameters
USE_OPENAI_FALLBACK = False  # Set this to False to disable AI completely
OPENAI_MODEL = "gpt-4o-mini"  # OpenAI model to use for summarization
# OpenAI API key will be loaded from .env file or environment variable
MAX_AI_SUMMARY_CHARS = 2000  # Maximum characters for AI-generated summaries
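
To make the role of MAX_URL_CHARS concrete, here is a small sketch of turning fetched HTML into plain text and truncating it to that limit. It uses only the standard library's `html.parser`; the README does not say which scraping approach the project uses, so the class `_TextExtractor` and the function `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser

MAX_URL_CHARS = 8000  # value shown in the config above


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html, max_chars=MAX_URL_CHARS):
    """Return the page's visible text, cut to at most max_chars characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]
```

Truncating after extraction, rather than truncating the raw HTML, keeps the character budget for readable text instead of markup.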

AI Fallback Setup

To use the AI fallback functionality:

  1. Install the required libraries:
pip install openai python-dotenv
  2. Set your OpenAI API key in the .env file:
# In your .env file
OPENAI_API_KEY=your-api-key-here
  3. The system will then automatically:
    • Try to fetch raw content from URLs using enhanced web scraping
    • If raw fetching fails, use OpenAI to generate knowledge-based summaries
    • Focus on Northern Ireland education and history relevance
    • Provide summaries based on AI's training data about the domain