This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Project files for pattern recognition group assignment


fqixiang/PatternRecognition


PatternRecognition


Files

Currently contains the following files:

  1. data/WikiEssentials_L4.7z: output file of the WikiVitalArticles program. Each document is included in its entirety, split by paragraph.
  2. preprocess_utils.py: preprocessing functions for Wiki data.
  3. model_utils.py: various utility functions used for modeling (e.g. loading embeddings).
  4. 1_preprocess_raw_data.py: preprocessing of the raw input data. Currently truncates each article to its first 8 sentences.
  5. 2_baseline_model.py: tokenization and vectorization of the input data, plus the baseline model (a one-layer neural network with a softmax classifier).
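The preprocessing and baseline steps listed above can be illustrated without any third-party packages. This is a minimal sketch under stated assumptions, not the code in 1_preprocess_raw_data.py or 2_baseline_model.py: sentence splitting uses a naive regex, vectorization uses raw bag-of-words counts, and the "1-layer NN with softmax classifier" is written as one linear layer plus softmax, trained with plain SGD on cross-entropy loss. All function and class names (first_sentences, tokenize, build_vocab, vectorize, SoftmaxClassifier) are hypothetical, and the demo articles and labels are made up.

```python
import math
import random
import re
from collections import Counter

# Assumption: naive sentence splitter (break after ., ! or ? followed by whitespace).
# The real script may use a proper sentence tokenizer instead.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_TOKEN = re.compile(r"[a-z0-9']+")


def first_sentences(text, n=8):
    """Keep only the first n sentences of an article."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return " ".join(sentences[:n])


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return _TOKEN.findall(text.lower())


def build_vocab(docs):
    """Map every token seen in the training documents to a column index."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Bag-of-words count vector; tokens missing from the vocabulary are dropped."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxClassifier:
    """One linear layer followed by softmax, trained with SGD on cross-entropy."""

    def __init__(self, n_features, n_classes, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr

    def predict_proba(self, x):
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def fit(self, X, y, epochs=50):
        for _ in range(epochs):
            for x, label in zip(X, y):
                p = self.predict_proba(x)
                for k, row in enumerate(self.W):
                    # d(loss)/d(logit_k) = p_k - 1[k == label] for softmax + cross-entropy
                    grad = p[k] - (1.0 if k == label else 0.0)
                    self.b[k] -= self.lr * grad
                    for j, xj in enumerate(x):
                        if xj:  # zero features contribute no gradient
                            row[j] -= self.lr * grad * xj
        return self

    def predict(self, x):
        p = self.predict_proba(x)
        return p.index(max(p))


if __name__ == "__main__":
    # Two toy "articles" with hypothetical labels; not real project data.
    articles = [
        "Paris is the capital of France. It lies on the Seine. It is large.",
        "The cell is the basic unit of life. Cells divide. Cells grow.",
    ]
    labels = [0, 1]
    texts = [first_sentences(a, n=8) for a in articles]
    vocab = build_vocab(texts)
    X = [vectorize(t, vocab) for t in texts]
    clf = SoftmaxClassifier(len(vocab), n_classes=2).fit(X, labels)
    print([clf.predict(x) for x in X])
```

The update rule relies on the standard result that, for softmax followed by cross-entropy, the gradient with respect to logit k is p_k minus 1 for the true class and p_k for every other class. Since the setup installs PyTorch, a real implementation would more typically express the same model with a library linear layer and optimizer instead of hand-written loops.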

Setup

  1. Download and install Anaconda (Python 3).
  2. Download the latest version of RStudio, which is needed to run the Python scripts from within RStudio.
  3. In a terminal, go to this repository's folder and set up the Conda environment:
conda env create -f environment.yml
  4. Install PyTorch with CUDA 9.2 support:
conda activate VitalWikiClassifier
conda install pytorch torchvision cudatoolkit=9.2 -c pytorch -c defaults -c numba/label/dev
  5. In R, install the reticulate library:
install.packages("reticulate")
  6. Check the .Rprofile file to ensure that R knows where to find your Anaconda distribution.
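To confirm that the PyTorch install picked up CUDA support, a quick check along these lines can be run inside the activated VitalWikiClassifier environment (from a terminal, or from R through reticulate). check_environment is a hypothetical helper, not part of this repository; torch.__version__, torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes. If torch is not installed in the active environment, the helper simply reports it as missing.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version, the PyTorch version and its CUDA status (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in the active environment
    import torch

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the cudatoolkit=9.2 build installed, cuda_build would be expected to report a 9.2 version string; cuda_available is only True if a compatible GPU and driver are also present.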
