
Few-shot LLM training of GPT-2 dialogue models to output empathetic responses


derekn4/EmpatheticLLM



Virtual Empathy: The Illusion of Conscientiousness in Conversational LMs

Training a GPT2 model with conversational data to demonstrate empathetic dialogues.

Table of Contents
  1. About The Project
  2. DialoGPT Code: How it works
  3. trltoxic Code: How it works
  4. Contact

About The Project

Dataset Curation using Few-shot learning for Empathetic Conversational Agents. In this project, we aimed to build a conversational agent for mental health applications that is more “human-like” in its approach and has the following characteristics:

  • Is empathetic
  • Has a sense of morality
  • Is self-aware (knows when not to respond)
  • Doesn’t generate triggering responses

Specifically, we graded the final model on these categories through Human Evaluations:

  • Natural Flow
  • Context Dependence
  • Topic Consistency
  • Speaker Consistency
  • Specificity
  • Interestingness

Our primary objective is to develop an empathetic conversational agent that is specifically tailored for self-care and emotional support settings.

(back to top)

Built With

  • Python

(back to top)

Libraries Used

  • Hugging Face Transformers
  • PyTorch
  • pandas
  • scikit-learn
  • NumPy

(back to top)

DialoGPT Code: How it works

  • Install all necessary libraries to run DialoGPT.ipynb

Data Processing

  • Dataset is pulled from local storage "FB_Multi_Train.csv"
  • Preprocessing steps required:
    • Tokenization
    • End-of-sentence token addition
    • Flattening conversations
    • Padding
    • Caching features

Args Class

  • After importing the Transformers library and the required PyTorch modules, the Args() class is defined
    • This class defines a set of parameters for configuring the training process.
    • Parameters include:
      • paths to model, tokenizer, and output directories
      • batch sizes
      • learning rates
      • gradient accumulation steps
      • number of epochs
      • and various other training hyperparameters.
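
A minimal sketch of what such a configuration class can look like. The field names and default values below are illustrative assumptions, not the notebook's exact settings:

```python
from dataclasses import dataclass

@dataclass
class Args:
    # All names and defaults here are illustrative placeholders.
    model_name_or_path: str = "microsoft/DialoGPT-small"
    tokenizer_name: str = "microsoft/DialoGPT-small"
    output_dir: str = "output"
    per_gpu_train_batch_size: int = 4
    per_gpu_eval_batch_size: int = 4
    gradient_accumulation_steps: int = 1
    learning_rate: float = 5e-5
    num_train_epochs: int = 3
    block_size: int = 512
    fp16: bool = False
    overwrite_cache: bool = False
```

Keeping every hyperparameter in one object makes it easy to pass the whole configuration to the train and evaluate functions.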

Construct Conversations Function

  • "construct_conv" function:
    • This function takes a conversation row, a tokenizer, and an optional argument eos (end-of-sentence)
    • Encodes each utterance in the conversation using the tokenizer and appends an end-of-sentence token if "eos" is True.
    • Conversation is flattened into a single list of token IDs
    • Function returns the flattened list of token IDs representing the conversation.
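
The behavior described above can be sketched as follows (the real function works with a Hugging Face tokenizer; the stub in the test only illustrates the flattening):

```python
from itertools import chain

def construct_conv(row, tokenizer, eos=True):
    # Encode each utterance, optionally append the EOS token after each turn,
    # then flatten the per-utterance lists into one token-ID sequence.
    encoded = [
        tokenizer.encode(utterance) + ([tokenizer.eos_token_id] if eos else [])
        for utterance in row
    ]
    return list(chain.from_iterable(encoded))
```

With a real tokenizer, a multi-utterance row becomes a single sequence with an end-of-sentence token marking each turn boundary.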

ConversationDataset Class

  • This class inherits from "Dataset", which is a PyTorch class for representing datasets in PyTorch.
  • The "__init__" method initializes the dataset.
    • Takes a tokenizer, args (training arguments), df (a DataFrame containing conversation data), and an optional "block_size" (the maximum sequence length).
    • Loads cached features if they exist and "overwrite_cache" is False; otherwise, it builds the features from the dataset and saves them to the cache.
    • Constructs examples by iterating over each row in the DataFrame: each conversation is encoded with the "construct_conv" function and added to the examples list if its length is less than block_size.
  • The "__len__" method returns the total number of examples in the dataset.
  • The "__getitem__" method retrieves an item from the dataset: it returns a PyTorch tensor containing the token IDs of the conversation at index item.
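
A condensed sketch of the class. The cached-features logic is omitted, and any iterable of conversation rows stands in for the pandas DataFrame:

```python
import torch
from torch.utils.data import Dataset

def construct_conv(row, tokenizer, eos=True):
    # Flatten one conversation into a single token-ID list.
    return [tok for u in row
            for tok in tokenizer.encode(u) + ([tokenizer.eos_token_id] if eos else [])]

class ConversationDataset(Dataset):
    # Sketch only: caching is omitted; the notebook iterates a DataFrame
    # via df.iterrows() rather than a plain iterable of rows.
    def __init__(self, tokenizer, args, rows, block_size=512):
        self.examples = []
        for row in rows:
            conv = construct_conv(row, tokenizer)
            if len(conv) < block_size:  # drop conversations that are too long
                self.examples.append(conv)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)
```

Because each example is one flattened conversation, the DataLoader's collate function only has to pad the sequences in a batch to a common length.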

Train Function

  • This function is responsible for training the model.
  • Initializes a TensorBoard writer for logging
  • Sets up training batch size and collation function for the DataLoader
  • Calculates total number of optimization steps based on the number of training examples, gradient accumulation steps, and number of epochs.
  • Initializes optimizer and scheduler for learning rate scheduling
    • loads optimizer and scheduler states if they already exist
  • Initializes mixed precision training if "args.fp16" is enabled.
  • Sets up multi-GPU and distributed training if multiple GPUs are available.
  • Iterates through epochs and batches, calculates loss, performs backpropagation, and updates model parameters.
  • Logs training progress, evaluates the model periodically, and saves checkpoints.
  • Manages the maximum number of steps for training.
  • Closes the TensorBoard writer.
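
The core of that loop can be condensed to the sketch below. A toy linear model stands in for GPT-2, and torch's LinearLR stands in for the warmup/decay schedule; logging, checkpointing, fp16, and multi-GPU handling are all omitted:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, dataset, epochs=2, batch_size=4, grad_accum=2, lr=5e-5):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    # Total optimization steps from dataset size, accumulation, and epochs.
    t_total = len(loader) // grad_accum * epochs
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=t_total)
    global_step = 0
    for _ in range(epochs):
        for step, (x, y) in enumerate(loader):
            loss = torch.nn.functional.mse_loss(model(x), y)
            # Scale the loss so accumulated gradients average correctly.
            (loss / grad_accum).backward()
            if (step + 1) % grad_accum == 0:
                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                global_step += 1
    return global_step
```

Gradient accumulation lets the effective batch size exceed what fits in GPU memory: parameters are updated only every `grad_accum` micro-batches.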

Evaluate Function

  • This function evaluates the model performance on a validation dataset.
  • Sets up the evaluation batch size and collation function for the DataLoader.
  • Initializes a DataLoader for the evaluation dataset.
  • Performs evaluation by iterating through batches, calculating loss, and accumulating evaluation metrics.
  • Computes the perplexity metric based on the evaluation loss.
  • Logs evaluation results and writes them to an output file.
  • Returns the evaluation results as a dictionary.
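
The perplexity computation reduces to the following (hedged sketch; the real function accumulates the language-modeling loss over DataLoader batches before averaging):

```python
import math

def evaluate(batch_losses):
    # Average the per-batch loss, then report perplexity = exp(mean loss).
    eval_loss = sum(batch_losses) / len(batch_losses)
    return {"eval_loss": eval_loss, "perplexity": math.exp(eval_loss)}
```

Lower perplexity means the model assigns higher probability to the held-out conversations.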

(back to top)

trltoxic Code: How it works

This script performs fine-tuning of a language model using the Proximal Policy Optimization (PPO) algorithm to generate less toxic text. Hence, "trl" for Transformer Reinforcement Learning.

Below are some key steps and components of the script:

Script Arguments and Configuration

  • The script uses a dataclass to define script arguments such as the model name, learning rate, and mini-batch size.
  • It uses HfArgumentParser to parse the arguments and configure the PPO training.

Dataset Building

  • The build_dataset function is defined to prepare the dataset for training.
  • It loads the data from a CSV file, tokenizes it, filters out short samples, and splits it into training and validation sets.
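
Those steps have roughly this shape (a sketch, not the script's exact code: the tokenizer is passed in, and the minimum-length threshold and split fraction are illustrative assumptions):

```python
def build_dataset(texts, tokenizer, min_tokens=5, eval_fraction=0.1):
    # Tokenize, drop samples that are too short, then split train/validation.
    samples = [tokenizer.encode(t) for t in texts]
    samples = [s for s in samples if len(s) >= min_tokens]
    split = int(len(samples) * (1 - eval_fraction))
    return samples[:split], samples[split:]
```

Filtering out very short samples avoids wasting PPO steps on queries that give the policy almost no context to respond to.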

Model Initialization

  • The script loads a pretrained language model for causal language modeling (LM).
  • It then creates a value head for the LM using AutoModelForCausalLMWithValueHead.

PPO Trainer Initialization

  • It initializes a PPOTrainer object, which orchestrates the PPO training process.
  • This includes setting up the model, reference model, tokenizer, optimizer, and dataset.
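
The wiring is roughly the following. This is an untested sketch against the trl API of the time; the model name, learning rate, and dataset handling are placeholders, not the script's actual values:

```python
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from transformers import AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint

config = PPOConfig(model_name=model_name, learning_rate=1.41e-5)
# Policy model with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# dataset=None here is a placeholder; pass the tokenized dataset from build_dataset.
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=None)
```

The reference model is kept fixed so the PPO objective can penalize the policy for drifting too far from the original language model.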

Reward Pipeline Setup

  • The script loads a toxicity detection model (RoBERTa) and tokenizer.
  • It defines the generation arguments and output length sampler.
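
The reward signal can be illustrated independently of the classifier itself. One common choice, used here for illustration, is the log-probability the toxicity model assigns to the non-toxic class; treating index 0 as the non-toxic class is an assumption (the script would read it from the classifier's label mapping):

```python
import math

def toxicity_reward(logits, nontoxic_index=0):
    # Softmax over the classifier's logits, then return the
    # log-probability of the non-toxic class as the PPO reward.
    exps = [math.exp(v) for v in logits]
    probs = [v / sum(exps) for v in exps]
    return math.log(probs[nontoxic_index])
```

Less toxic generations receive higher rewards, which is exactly the gradient signal PPO needs to steer the policy.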

PPO Training Loop and Model saving

  • Inside the training loop, it iterates over the dataset and generates responses using the policy model.
  • Sentiment scores (toxicity labels) are computed for the generated responses using the toxicity model.
  • PPO steps are performed to optimize the policy based on the generated responses and rewards.
  • Training statistics are logged, and the model is periodically saved during training.
  • After training, the script saves the trained PPO model.
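
Stripped of the trl specifics, the loop has this shape, where `generate`, `reward_fn`, and `ppo_step` stand in for the policy model's generation call, the toxicity scorer, and `ppo_trainer.step` respectively:

```python
def ppo_training_loop(queries, generate, reward_fn, ppo_step):
    # Schematic loop: generate a response per query, score it,
    # then hand (query, response, reward) to the PPO update.
    all_stats = []
    for query in queries:
        response = generate(query)
        reward = reward_fn(response)
        all_stats.append(ppo_step(query, response, reward))
    return all_stats
```

In the real script these calls operate on batches of token tensors and the returned statistics are logged per step; the control flow, however, is the same.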

Contact

Derek Nguyen

(back to top)
