korp_mono.py list index out of range #3

@Phaqui

Something seems to be going wrong somewhere. The file I am trying to convert with korp_mono.py contains analyses like this one:

"<1024x768>"
	"1024x" Err/MissingSpace"768" Num @HNOUN #2->0

This makes the script crash with the message below. The three lines starting with word_form=, lemma= and rest_cohort= are debug output that I added myself to see what the input looked like.

anders@debian:~/corpus/corpus-fao$ korp_mono --skip-existing --ncpus most analysed/blogs/web_mix.txt.xml
--skip-existing given. Skipping 0 files that are already processed
Processing 1 files in parallel (9 workers)
word_form='1024x768'
lemma='1024x_∞_@HNOUN #2->0'
rest_cohort='\t"1024x" Err/MissingSpace"768" Num @HNOUN #2->0'
[1/1 FAILED: /home/anders/corpus/corpus-fao/analysed/blogs/web_mix.txt.xml
list index out of range
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/anders/.pyenv/versions/3.11.1/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 528, in process_file
    make_vrt_xml(file, analysed_file.lang),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 547, in make_vrt_xml
    make_sentences(valid_sentences(old_root.find(".//body/dependency").text), lang)
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 888, in make_sentences
    return [
           ^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 889, in <listcomp>
    make_sentence(current_sentence, current_lang) for current_sentence in sentences
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 879, in make_sentence
    [
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 880, in <listcomp>
    make_analysis_tuple(word_form, rest_cohort, current_lang)
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 840, in make_analysis_tuple
    maybe_pos = parts[1].replace("_∞_", "").strip()
                ~~~~~^^^
IndexError: list index out of range
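
The bare IndexError does not say which cohort line triggered it, which is why the debug prints above were needed. A minimal sketch of a guard that would report the offending line instead: `field_at` is a hypothetical helper, not part of korp_mono.py, and the `parts` list below is only a stand-in, since the traceback does not show how `make_analysis_tuple` actually builds it.

```python
def field_at(parts: list[str], index: int, source_line: str) -> str:
    """Return parts[index], or raise ValueError that names the offending line."""
    if index >= len(parts):
        raise ValueError(
            f"expected at least {index + 1} fields, got {len(parts)} "
            f"while parsing {source_line!r}"
        )
    return parts[index]


# The cohort line from this report.
rest_cohort = '\t"1024x" Err/MissingSpace"768" Num @HNOUN #2->0'

# Assumption: a one-element list standing in for the real `parts`,
# whose actual contents are not visible in the traceback.
parts = ["1024x_∞_@HNOUN #2->0"]

try:
    maybe_pos = field_at(parts, 1, rest_cohort).replace("_∞_", "").strip()
except ValueError as error:
    # The message now includes the raw cohort line that could not be parsed.
    print(error)
```

This only improves the error message. The underlying question of how to handle a reading like `Err/MissingSpace"768"`, where a quoted string follows a tag without a space, is still open.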
