The project is based on a deep learning neural network. The code has been tested on Ubuntu 16.04 LTS.
- Python 3.3+ or Python 2.7
- macOS or Linux or Windows
- Python
- Keras
- Pandas
- Matplotlib
- HDF5
- h5py
- nltk.corpus
- re
- pickle
- NumPy
- GloVe Word2Vec

Place the extracted GloVe folder in the same directory as the code.
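
For orientation, GloVe text files store one word per line followed by its vector components, separated by spaces. The sketch below shows one way such a file could be read into a dictionary; `load_glove` is a hypothetical helper, not a function from this project, and the in-memory sample stands in for a real file inside the glove folder.

```python
import io

import numpy as np


def load_glove(lines):
    """Parse GloVe-style text lines ("word v1 v2 ... vn") into a dict of vectors.

    Hypothetical helper for illustration; the project's own loader may differ.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings


# Tiny in-memory stand-in for a real GloVe file (assumption: 3-dimensional vectors).
sample = io.StringIO("the 0.1 0.2 0.3\nquestion 0.4 0.5 0.6\n")
vectors = load_glove(sample)
print(sorted(vectors))
```

With a real file, the same function could be called as `load_glove(open(path, encoding="utf-8"))`, where `path` points at the vector file you extracted.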

Download the Quora dataset from http://qim.ec.quoracdn.net/quora_duplicate_questions.tsv and place it, in extracted form, in the same directory as the code. The dataset file must not be named "train.csv" or "test.csv".
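
As a quick check that the download is readable, the file can be loaded with pandas as a tab-separated table. The sketch below uses two made-up rows in the dataset's column layout (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`) instead of the real file; `load_pairs` is a hypothetical helper and not part of this project.

```python
import io

import pandas as pd

# Two made-up rows in the layout of quora_duplicate_questions.tsv.
sample_tsv = (
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "1\t3\t4\tWhat is an LSTM?\tHow tall is Mount Everest?\t0\n"
)


def load_pairs(source):
    """Read question pairs and labels from a tab-separated file or buffer."""
    df = pd.read_csv(source, sep="\t")
    # Drop rows where either question is missing.
    df = df.dropna(subset=["question1", "question2"])
    df["is_duplicate"] = df["is_duplicate"].astype(int)
    return df[["question1", "question2", "is_duplicate"]]


pairs = load_pairs(io.StringIO(sample_tsv))
print(pairs.shape)
```

For the real dataset, pass the path of the downloaded file to `load_pairs` in place of the `StringIO` buffer.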

Create the training and test data from the downloaded dataset:
python make_test_train_data_from_given_quora_dataset <dataset_name.csv>
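
The script's internals are not described here. As a rough illustration only, a random split with pandas might look like the sketch below; the 80/20 ratio, the seed, and the `split_train_test` helper are assumptions, as is the idea that the results end up in `train.csv` and `test.csv` (suggested by the naming restriction above).

```python
import pandas as pd


def split_train_test(df, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of rows as the test set (hypothetical helper)."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Ten made-up question pairs standing in for the real dataset.
df = pd.DataFrame(
    {
        "question1": [f"question a{i}" for i in range(10)],
        "question2": [f"question b{i}" for i in range(10)],
        "is_duplicate": [i % 2 for i in range(10)],
    }
)
train, test = split_train_test(df)
print(len(train), len(test))

# Assumed output names; uncomment to write the files.
# train.to_csv("train.csv", index=False)
# test.to_csv("test.csv", index=False)
```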

- For LSTM with word embedding:
  Training : python siamese_lstm_word.py train
  Testing : python siamese_lstm_word.py test
- For LSTM with char embedding:
  python char_embedding.py
  Training : python siamese_lstm_char.py train
  Testing : python siamese_lstm_char.py test
- For BiLSTM with word embedding:
  Training : python siamese_BiLSTM_word.py train
  Testing : python siamese_BiLSTM_word.py test

The documentation of the latest released version of Duplicate Question Pair Detection is available here.
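
Each model script above takes a single `train` or `test` argument. How the scripts handle it is not shown here; the sketch below is one possible way to dispatch on such an argument with `argparse`. The `parse_mode` and `run` functions and their placeholder return values are hypothetical.

```python
import argparse


def parse_mode(argv):
    """Return "train" or "test" from a command-line argument list."""
    parser = argparse.ArgumentParser(description="Hypothetical train/test entry point")
    parser.add_argument("mode", choices=["train", "test"])
    return parser.parse_args(argv).mode


def run(mode):
    """Dispatch to a training or testing routine (placeholders only)."""
    if mode == "train":
        return "training"  # a real script would build and fit the model here
    return "testing"  # a real script would load the saved model and evaluate it


# Simulates: python some_script.py train
print(run(parse_mode(["train"])))
```

In an actual script, `sys.argv[1:]` would be passed to `parse_mode` instead of a hard-coded list; any value other than `train` or `test` makes `argparse` exit with a usage message.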