Using language models to generate additional training examples that improve sentiment classifier performance in low-resource scenarios.

paramkpr/SentiSynth

SentiSynth: Synthetic Data Generation for Sentiment Analysis

SentiSynth explores how synthetic data can improve sentiment analysis models when labeled data is scarce.

Installation

# Clone the repository
git clone https://github.com/paramkpr/SentiSynth.git
cd SentiSynth

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
pip install -r requirements-dev.txt

Usage

More details coming soon!

Project Structure

  • sentisynth/: Main package
    • data/: Data loading and processing
    • models/: Model implementations
    • generation/: Synthetic data generation
    • evaluation/: Evaluation metrics and analysis
  • tests/: Unit tests
  • notebooks/: Jupyter notebooks for exploration
  • scripts/: Utility scripts

Training

To run on weftdrive:

  nohup /srv/gpurun.pl python src/senti_synth/cli/01_train_teacher.py configs/teacher/sst2_hf.yaml > ~/scratch/senti_synth/logs/$(date +%Y%m%d_%H%M).log 2>&1 &
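
The launch command above can also be assembled in Python, which makes the timestamped log name and the output redirection explicit. This is a minimal sketch and not part of the repository: `build_log_path`, `build_command`, and `launch` are hypothetical helpers, and the `/srv/gpurun.pl` wrapper path is copied from the command above.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_log_path(log_dir: str, now: datetime) -> Path:
    """Mirror the shell's `$(date +%Y%m%d_%H%M).log` naming."""
    return Path(log_dir).expanduser() / f"{now:%Y%m%d_%H%M}.log"


def build_command(script: str, config: str) -> list[str]:
    """Wrap the training script in the GPU runner used in the README command."""
    return ["/srv/gpurun.pl", "python", script, config]


def launch(cmd: list[str], log_path: Path) -> subprocess.Popen:
    """Start the run in the background, sending stdout and stderr to the log.

    start_new_session=True detaches the child from the terminal's session,
    which is roughly what `nohup ... &` achieves in the shell (assumption:
    a POSIX host).
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_file = open(log_path, "w")
    return subprocess.Popen(
        cmd,
        stdout=log_file,
        stderr=subprocess.STDOUT,  # same effect as `2>&1`
        start_new_session=True,
    )
```

Calling `launch(build_command(script, config), build_log_path(log_dir, datetime.now()))` with the script, config, and log directory from the command above would start the run and return immediately.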

Setting up on weftdrive

  1. SSH into weftdrive: ssh paramkapur@weftdrive.private.reed.edu
  2. Clone the repository: git clone https://github.com/paramkpr/senti_synth.git
  3. Set up conda: run /srv/conda/bin/conda init, then source ~/.bashrc
  4. Activate the conda environment: conda activate deep-learning
    1. Check which packages are installed: conda list
    2. Install the project's dependencies: pip install -r requirements.txt
    3. Install the project itself: pip install -e .
  5. Copy data/clean to your scratch space on weftdrive: scp -r data/clean paramkapur@weftdrive.private.reed.edu:~/scratch/paramkapur/data/clean
    1. Ensure that the config file's dataset_path points to the copied data, e.g. dataset_path: "~/scratch/data/clean" (adjust it if you copied the data to a different directory)
  6. Set up W&B:
    1. export WANDB_API_KEY="..."
    2. python -m wandb login
  7. Create the logs directory and log file: mkdir -p ~/scratch/paramkapur/logs and touch ~/scratch/paramkapur/logs/$(date +%Y%m%d_%H%M).log
  8. Run the training script: nohup /srv/gpurun.pl python src/cli/01_train_teacher.py configs/teacher/sst2_hf.yaml > ~/scratch/paramkapur/logs/$(date +%Y%m%d_%H%M).log 2>&1 &
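
Several of the steps above fail silently if skipped: the dataset path, the logs directory, and the W&B key. A small pre-flight check can catch these before a job is queued. The sketch below is an illustration only; `preflight` is a hypothetical helper that does not exist in the repository, and it assumes the training code expands `~` itself (YAML does not, since tilde expansion is a shell feature).

```python
import os
from pathlib import Path


def preflight(dataset_path: str, log_dir: str, env=None) -> list[str]:
    """Return a list of problems; an empty list means the run looks ready."""
    env = os.environ if env is None else env
    problems = []

    # "~" is expanded by the shell, not by YAML, so expand it explicitly.
    data_dir = Path(dataset_path).expanduser()
    if not data_dir.is_dir():
        problems.append(f"dataset_path does not exist: {data_dir}")

    # The shell redirect in the launch command fails if this directory is missing.
    if not Path(log_dir).expanduser().is_dir():
        problems.append(f"log directory does not exist: {log_dir}")

    # wandb reads the API key from this environment variable.
    if not env.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set")

    return problems
```

Running `preflight` with the `dataset_path` value from the config and the logs directory from step 7, then printing any returned messages, gives a quick sanity check before launching the training script.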

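If the copied data ends up nested at /scratch/paramkapur/data/clean/clean (scp -r places the source directory inside the destination when the destination already exists), point dataset_path at the nested directory.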
