This repository accompanies the paper "NoCoLA: The Norwegian Corpus of Linguistic Acceptability" by Matias Jentoft and David Samuel of the Language Technology Group at the University of Oslo. NoCoLA comprises two datasets: "class", consisting of Norwegian sentences with binary acceptability judgements, and "zero", consisting of pairs of unacceptable sentences and their acceptable counterparts.
NoCoLA is also available on Hugging Face at https://huggingface.co/datasets/ltg/nocola
Both linguistic acceptability datasets are published here. For the "class" version, we provide a pre-made 80/10/10 split for training purposes.
If you wish to test a Norwegian language model's competence in Norwegian grammar, all the necessary code is available in this repository.
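
To illustrate how paired data such as "zero" can be used, the sketch below computes the fraction of pairs in which a model scores the acceptable sentence higher than the unacceptable one. This is one common way to evaluate on minimal pairs and is not necessarily the procedure implemented in this repository. The names `pairwise_accuracy`, `toy_pairs`, and `toy_scores` are hypothetical, the (unacceptable, acceptable) tuple layout is an assumption about how the pairs are stored, and the placeholder strings and scores are made up for illustration. In practice `score` would be something like a sentence log-probability from your own language model.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (unacceptable, acceptable) pairs where the acceptable
    sentence receives a strictly higher score than the unacceptable one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sentence pairs given")
    correct = sum(
        1 for unacceptable, acceptable in pairs
        if score(acceptable) > score(unacceptable)
    )
    return correct / len(pairs)


# Made-up placeholder pairs, ordered (unacceptable, acceptable).
# These are NOT taken from NoCoLA.
toy_pairs = [
    ("bad a", "good a"),
    ("bad b", "good b"),
    ("bad c", "good c"),
]

# Made-up scores standing in for a real model (assumption for illustration).
toy_scores = {
    "good a": -1.0, "bad a": -2.0,
    "good b": -1.5, "bad b": -3.0,
    "good c": -4.0, "bad c": -2.5,
}

accuracy = pairwise_accuracy(toy_pairs, toy_scores.__getitem__)
print(f"pairwise accuracy: {accuracy:.3f}")
```

With the toy scores above, the stand-in model prefers the acceptable sentence in two of the three pairs. Replacing `toy_scores.__getitem__` with a function that scores a sentence using a real model is the only change needed to apply the same measure to actual data.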