Something seems to be going wrong somewhere. The file I am trying to convert with korp_mono.py contains analyses of this kind:
"<1024x768>"
"1024x" Err/MissingSpace"768" Num @HNOUN #2->0
This makes the script crash with the following message (I added the three lines starting with word_form, lemma and rest_cohort myself, to find out what the input looked like):
anders@debian:~/corpus/corpus-fao$ korp_mono --skip-existing --ncpus most analysed/blogs/web_mix.txt.xml
--skip-existing given. Skipping 0 files that are already processed
Processing 1 files in parallel (9 workers)
word_form='1024x768'
lemma='1024x_∞_@HNOUN #2->0'
rest_cohort='\t"1024x" Err/MissingSpace"768" Num @HNOUN #2->0'
[1/1 FAILED: /home/anders/corpus/corpus-fao/analysed/blogs/web_mix.txt.xml
list index out of range
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/anders/.pyenv/versions/3.11.1/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 528, in process_file
make_vrt_xml(file, analysed_file.lang),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 547, in make_vrt_xml
make_sentences(valid_sentences(old_root.find(".//body/dependency").text), lang)
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 888, in make_sentences
return [
^
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 889, in <listcomp>
make_sentence(current_sentence, current_lang) for current_sentence in sentences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 879, in make_sentence
[
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 880, in <listcomp>
make_analysis_tuple(word_form, rest_cohort, current_lang)
File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 840, in make_analysis_tuple
maybe_pos = parts[1].replace("_∞_", "").strip()
~~~~~^^^
IndexError: list index out of range