GenAI for Software Development (N-gram Model)

1 Introduction
2 Getting Started
3 Report

1. Introduction

This project explores Java code completion using N-gram language modeling. An N-gram model predicts the next token in a sequence from the probability distribution of tokens observed after the preceding N-1 tokens in the training data. At prediction time, the model selects the most probable token for the given context. N-gram models are a foundational technique in natural language processing and in software engineering automation.
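
As a rough illustration of the idea above, the following minimal sketch counts N-grams over a tokenized corpus and returns the most frequent next token for a given context. It is not the code in ngram.py: the function names (train_ngram, predict_next) and the toy Java token sequences are assumptions made for this example, and no smoothing or backoff is applied.

```python
from collections import Counter, defaultdict


def train_ngram(sequences, n):
    """Count, for every (n-1)-token context, how often each token follows it."""
    counts = defaultdict(Counter)
    for tokens in sequences:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    return counts


def predict_next(counts, context):
    """Return (token, probability) for the most frequent follower, or None if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    token, freq = followers.most_common(1)[0]
    return token, freq / sum(followers.values())


# Hypothetical, pre-tokenized Java methods standing in for the real corpus.
corpus = [
    "public int getX ( ) { return x ; }".split(),
    "public int getY ( ) { return y ; }".split(),
    "public void reset ( ) { x = 0 ; }".split(),
]

model = train_ngram(corpus, n=3)          # trigram model: 2-token contexts
prediction = predict_next(model, ["(", ")"])
unseen = predict_next(model, ["foo", "bar"])
```

In this toy corpus the context "( )" is always followed by "{", so that token is predicted with probability 1.0, while a context that never appeared in training yields None.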

The extracted data and the training, test, and evaluation sets are pre-generated in this repository. The code used for dataset collection and data splitting remains available in ngram.py.

2. Getting Started

This project is implemented in Python 3.9+ and is compatible with macOS, Linux, and Windows.

2.1 Preparations

(1) Clone the repository to your workspace:

~ $ git clone https://github.com/jdkuffa/cs420-assignment1.git


(2) Navigate into the repository:

~ $ cd cs420-assignment1
~/cs420-assignment1 $

(3) Set up a virtual environment and activate it:

For macOS/Linux:

~/cs420-assignment1 $ python -m venv ./venv/
~/cs420-assignment1 $ source venv/bin/activate
(venv) ~/cs420-assignment1 $ 


To deactivate the virtual environment, use the command:


(venv) $ deactivate

2.2 Install Packages

Install the required dependencies:

(venv) ~/cs420-assignment1 $ pip install -r requirements.txt

2.3 Run N-gram

(1) Run N-gram Demo

This script trains N-gram models on the provided corpus and selects the best-performing model based on our eval set. It then evaluates that model on the same eval set and generates the JSON output file results_teacher_model.json.

(venv) ~/cs420-assignment1 $ python ngram.py corpus.txt
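
The selection step described above can be sketched roughly as follows. This is an illustration only: the candidate values of N, the use of top-1 next-token accuracy as the selection metric, the helper names, and the JSON layout are all assumptions, and the actual logic and output format in ngram.py may differ.

```python
import json
from collections import Counter, defaultdict


def train(sequences, n):
    """Count followers for every (n-1)-token context."""
    counts = defaultdict(Counter)
    for tokens in sequences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return counts


def accuracy(counts, sequences, n):
    """Fraction of positions where the most frequent follower equals the true token."""
    hits = total = 0
    for tokens in sequences:
        for i in range(len(tokens) - n + 1):
            followers = counts.get(tuple(tokens[i:i + n - 1]))
            guess = followers.most_common(1)[0][0] if followers else None
            hits += int(guess == tokens[i + n - 1])
            total += 1
    return hits / total if total else 0.0


def select_best_n(train_set, eval_set, candidates=(2, 3, 5)):
    """Score each candidate n on the eval set; ties go to the first candidate listed."""
    scores = {n: accuracy(train(train_set, n), eval_set, n) for n in candidates}
    best_n = max(scores, key=scores.get)
    return best_n, scores


# Hypothetical toy data standing in for the repository's training and eval sets.
train_set = [
    "public int getX ( ) { return x ; }".split(),
    "public int getY ( ) { return y ; }".split(),
]
eval_set = ["public int getZ ( ) { return z ; }".split()]

best_n, scores = select_best_n(train_set, eval_set)
# Assumed JSON layout; the real results file may use different keys.
report = json.dumps({"best_n": best_n, "accuracy": scores[best_n]}, indent=2)
```

The sketch builds the JSON report as a string rather than writing a file, so it can be run without touching the repository's existing results.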

3. Report

The assignment report is available in the file "Assignment_Report.pdf".

