Insurance QA data formatted as Python objects and pickled.
Clone locally
git clone https://github.com/codekansas/insurance_qa_python.git
cd insurance_qa_python
pwd  # where files are stored
Getting QA format with the files
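import pickle

def get_pickle(filename):
    # Open in binary mode and close the file once loading finishes
    with open(filename, 'rb') as f:
        return pickle.load(f)

# vocabulary maps word index -> word
vocab = get_pickle('vocabulary')

def translate_sent(sent):
    # Convert a list of word indices into a list of words
    return [vocab[word] for word in sent]

dev = get_pickle('dev')
answers = get_pickle('answers')

def get_answer(answer_id):
    # Look up an answer by its index and translate it into words
    return translate_sent(answers[answer_id])

# Print each question with its first good answer next to every bad answer
for data_item in dev:
    for bad_answer in data_item['bad']:
        print('Question:', translate_sent(data_item['question']))
        print('Good Answer:', get_answer(data_item['good'][0]))
        print('Bad Answer: ', get_answer(bad_answer), '\n============')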
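The loop above pairs every bad answer with the first good one. For training or scoring an answer-selection model, it may be handier to flatten an entry into labelled answer indices. The following is only a sketch, assuming the `good` and `bad` lists hold answer indices and do not overlap; `labelled_pairs` and the toy entry are hypothetical and not part of the repository.

```python
def labelled_pairs(entry):
    """Yield (answer_id, label) for one dev/test entry: 1 = good, 0 = bad."""
    for answer_id in entry['good']:
        yield answer_id, 1
    for answer_id in entry['bad']:
        yield answer_id, 0

# Toy entry with made-up indices, shaped like one item of `dev`
toy_entry = {'question': [1, 2, 3], 'good': [10], 'bad': [11, 12]}

pairs = list(labelled_pairs(toy_entry))
print(pairs)
```

With the real data, each `answer_id` can be passed to `get_answer` to recover the words.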
About files:
- vocabulary: dict object of (word index <int> -> word <str>) relationships
- answers: dict object of (answer index <int> -> word indices <list of ints>) relationships
- train: list of dict (one dict per entry), where each dict has:
  - question: the word indices for the question
  - answers: the answer indices for each of the question's ground truth
- dev / test1 / test2: list of dict (one dict per entry), where each dict has:
  - question: the word indices for the question
  - good: the ground truth
  - bad: the other answers from the dataset
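Unlike dev and the test sets, train entries carry an `answers` field rather than `good` and `bad`. A minimal sketch of decoding one train entry is shown below, assuming the structures described above. The helpers `decode` and `decode_train_entry` are hypothetical names, and the toy dictionaries are made-up stand-ins for the unpickled `vocabulary`, `answers`, and `train` objects.

```python
def decode(indices, vocab):
    """Map a list of word indices to a space-joined string."""
    return ' '.join(vocab[i] for i in indices)

def decode_train_entry(entry, vocab, answers):
    """Return (question text, list of ground-truth answer texts)."""
    question = decode(entry['question'], vocab)
    truths = [decode(answers[a], vocab) for a in entry['answers']]
    return question, truths

# Toy stand-ins (made-up values, not real dataset content)
toy_vocab = {1: 'what', 2: 'is', 3: 'insurance', 4: 'a', 5: 'contract'}
toy_answers = {10: [4, 5]}
toy_train = [{'question': [1, 2, 3], 'answers': [10]}]

question, truths = decode_train_entry(toy_train[0], toy_vocab, toy_answers)
print(question)
print(truths)
```

Passing the lookup tables as arguments keeps the helpers independent of module-level state, so they can be tried on toy data before loading the real pickles.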
Applying Deep Learning to Answer Selection: A Study and An Open Task
Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou. ASRU 2015.