All files are named according to the guidelines provided in the assignment. Below are the details of the various files and the models they use:
------------------- Intent Recognition --------------------
The initial dataset-setup logic is the same in almost all files, so the opening cells of these notebooks are largely identical. Please see the code for details.
ir_svm_en.ipynb : English, TF-IDF Vectorization, SVM
ir_rob_en.ipynb : English, "roberta-base" as model checkpoint
ir_bcbert_en.ipynb : English, "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
ir_xlmr_hi.ipynb : Hindi, "xlm-roberta-base" as model checkpoint
ir_dtrans_hi.ipynb : Hindi, GoogleTranslator(Hindi to English), "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
ir_bridge_hi.ipynb : Bengali, GoogleTranslator(Bengali to Hindi), GoogleTranslator(Hindi to English), "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
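The *_dtrans_hi and *_bridge_hi notebooks chain translation steps (e.g. Bengali to Hindi, then Hindi to English) before feeding text to the English-only checkpoint. A minimal sketch of that bridging idea, with the translators injected as plain callables — the notebooks themselves use GoogleTranslator, which needs network access, so the lambdas below are illustrative stand-ins:

```python
from typing import Callable, List

def bridge_translate(text: str, steps: List[Callable[[str], str]]) -> str:
    """Apply each translation step in order, e.g. Bengali -> Hindi -> English."""
    for step in steps:
        text = step(text)
    return text

# Stand-in translators for illustration only; in the notebooks these would be
# real translator calls such as GoogleTranslator(source='bn', target='hi').translate
# (hypothetical wiring, shown here as tagging lambdas).
bn_to_hi = lambda s: f"hi({s})"
hi_to_en = lambda s: f"en({s})"

print(bridge_translate("text", [bn_to_hi, hi_to_en]))  # en(hi(text))
```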
------------------- Entity Extraction --------------------
For entity extraction, the tags below were matched using the fuzzywuzzy library's threshold-based string matching (by computing a fuzzy ratio):
label__ = {
'O': 0,
'B-treatment': 1,
'I-treatment': 2,
'B-disease': 3,
'I-disease': 4,
'B-drug': 5,
'I-drug': 6
}
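The tag matching can be sketched as follows. This uses the stdlib's difflib.SequenceMatcher as a dependency-free analogue of fuzzywuzzy's fuzz.ratio (which is built on the same matcher); the threshold value of 80 here is an illustrative assumption, not the one used in the notebooks:

```python
from difflib import SequenceMatcher

label__ = {
    'O': 0,
    'B-treatment': 1, 'I-treatment': 2,
    'B-disease': 3, 'I-disease': 4,
    'B-drug': 5, 'I-drug': 6,
}

def fuzzy_ratio(a: str, b: str) -> int:
    """0-100 similarity score; fuzzywuzzy's fuzz.ratio behaves comparably."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def match_tag(tag: str, threshold: int = 80) -> int:
    """Map a possibly noisy tag string to the label id of the closest known
    tag, if it clears the threshold; otherwise fall back to 'O'."""
    best = max(label__, key=lambda t: fuzzy_ratio(tag, t))
    return label__[best] if fuzzy_ratio(tag, best) >= threshold else label__['O']

print(match_tag('B-diseas'))  # closest to 'B-disease' -> 3
print(match_tag('xyz'))       # no tag clears the threshold -> 0
```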
The initial dataset-setup logic is the same in almost all files, so the opening cells of these notebooks are largely identical. Please see the code for details.
ee_svm_en.ipynb : English, TF-IDF Vectorization, SVM
ee_rob_en.ipynb : English, "roberta-base" as model checkpoint
ee_bcbert_en.ipynb : English, "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
ee_xlmr_hi.ipynb : Hindi, "xlm-roberta-base" as model checkpoint
ee_dtrans_hi.ipynb : Hindi, GoogleTranslator(Hindi to English), "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
ee_bridge_hi.ipynb : Bengali, GoogleTranslator(Bengali to Hindi), GoogleTranslator(Hindi to English), "emilyalsentzer/Bio_ClinicalBERT" as model checkpoint
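The *_svm_en notebooks pair TF-IDF vectorization with an SVM classifier. A minimal scikit-learn sketch of that pipeline on toy data — the sentences and intent labels below are illustrative placeholders, not taken from the indic-health-demo dataset:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy data for illustration only; the notebooks train on indic-health-demo.
texts = [
    "what are the symptoms of diabetes",
    "which drug treats high fever",
    "book an appointment with a doctor",
    "side effects of this medicine",
]
intents = ["disease_info", "drug_info", "appointment", "drug_info"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),  # sparse TF-IDF features
    ("svm", LinearSVC()),          # linear-kernel SVM classifier
])
clf.fit(texts, intents)

print(clf.predict(["symptoms of diabetes"])[0])
```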
------------------- How to Run --------------------
- Keep the dataset "indic-health-demo" in the parent folder only.
- Open the Jupyter notebooks and run the cells one by one.
- Prefer running each notebook in one go, because the dataset tables (DataFrames) are updated at run time, which can cause issues if cells are re-run out of order.
- Missing libraries: 'pip install' commands are included in almost all files to install the required libraries. If a library is still missing, install it manually.
- CUDA out-of-memory errors: if this problem persists, use the CPU instead of the GPU by hard-coding device = "cpu" in the code.
- The dataset must be in the same folder as the notebooks; use the "indic-health-demo" dataset I have kept here.
- To run on Google Colab, update the dataset paths accordingly.
- If you face any other problem while running the code, please contact me.
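For the CUDA out-of-memory note above, a small device-selection snippet — this assumes the notebooks use PyTorch, and the try/except keeps it runnable even where torch is not installed:

```python
try:
    import torch
    # Pick the GPU when available; hard-code "cpu" instead to
    # sidestep CUDA out-of-memory errors, as suggested above.
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(device)
```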