This project analyzes various prompting strategies using two large language models: GPT-4 and Mistral Large. To evaluate the quality of the outputs, a third model (Llama 4 Maverick) is used as a judge, providing comparative assessments and insights into model behavior and performance.
This project is implemented in Python 3.9+ and is compatible with macOS, Linux, and Windows.
- Clone the repository to your workspace:

  ```shell
  git clone https://github.com/jdkuffa/cs420-assignment3.git
  ```

- Navigate into the repository:

  ```shell
  cd cs420-assignment3
  ```

- Set up a virtual environment and activate it (macOS/Linux):

  ```shell
  python -m venv venv
  source venv/bin/activate
  ```

- On Windows, install virtualenv:

  ```shell
  pip install virtualenv
  ```

- Create a virtual environment:

  ```shell
  python -m virtualenv venv
  ```

- Activate the environment:

  ```shell
  venv\Scripts\activate
  ```

  The name of your virtual environment should now appear within parentheses just before your prompt. To deactivate the virtual environment, use the command:

  ```shell
  deactivate
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Add a GitHub token:

  Create a file named token.txt and go to this link to make a fine-grained PAT to add to this file. You can do this manually or from the command line as shown below (feel free to replace nano with your preferred editor):

  ```shell
  touch token.txt && nano token.txt
  ```
- Run data_automation.py to process the incoming data.csv file containing prompts and problems:

  ```shell
  python3 data_automation.py
  ```
- Run judge_model.py to add a column to the output database containing the judge model's analyses:

  ```shell
  python3 judge_model.py
  ```
- Run evaluation_metrics.py to add a column to the output database containing the exact match, BLEU, or embedding-based similarity scores:

  ```shell
  python3 evaluation_metrics.py
  ```

The assignment report is available in the root directory, labelled as "analysis-report.pdf".
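To illustrate what these scores measure, here is a dependency-free sketch of exact match and a simplified unigram BLEU with a brevity penalty. The actual implementation in evaluation_metrics.py may differ (e.g. it may use a library BLEU with higher-order n-grams):

```python
import math
from collections import Counter


def exact_match(pred, ref):
    """1 if prediction and reference are identical after trimming, else 0."""
    return int(pred.strip() == ref.strip())


def bleu1(pred, ref):
    """Unigram BLEU: clipped token precision times a brevity penalty."""
    pred_toks, ref_toks = pred.split(), ref.split()
    if not pred_toks:
        return 0.0
    # Clipped overlap: each reference token can be matched at most once.
    overlap = sum((Counter(pred_toks) & Counter(ref_toks)).values())
    precision = overlap / len(pred_toks)
    # Penalize predictions shorter than the reference.
    if len(pred_toks) >= len(ref_toks):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref_toks) / len(pred_toks))
    return brevity * precision
```

Embedding-based similarity would instead encode both strings with a sentence-embedding model and take the cosine of the two vectors, which requires an external model and is omitted here.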
For the extra credit, we included Llama 4 Maverick 17B 128E Instruct FP8 as a judge model.
We wrote the judge_model.py script to write the model's resulting comparison and analysis to output_db.csv under the column "Output Model 3: Meta Llama 4 Maverick."
The metrics have also been added to the analyses sections of the report under each task's table for each prompt.