Question about running extract_docs_from_index.py #37

I'm trying to run `extract_docs_from_index.py` with the following command, using the pre-built index provided by Pyserini:
```shell
awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py lucene index-robust04-20191213/ > data/robust/documents.tsv
```

but I get the following error:
![image](https://user-images.githubusercontent.com/44088154/111870022-29eaa580-89bd-11eb-82e3-fdffbf6526ac.png)
I have not changed any code in the file.
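For reference, the `awk '{print $3}'` step just takes the third whitespace-separated column of every line in the run files, which in the standard six-column TREC run format (`qid Q0 docid rank score tag`) is the document ID. `awk` does not remove duplicates, so a document retrieved for several queries is piped to the script once per occurrence. A minimal Python sketch of that step, assuming the standard run format (the sample lines and the helper names `docids_from_run` and `unique_in_order` are hypothetical):

```python
from typing import Iterable, List


def docids_from_run(lines: Iterable[str]) -> List[str]:
    """Return column 3 (the docid) of each TREC run line, like awk '{print $3}'.

    Assumes the six-column TREC format: qid Q0 docid rank score tag.
    Unlike awk, lines with fewer than three fields are skipped instead of
    being emitted as empty lines.
    """
    docids = []
    for line in lines:
        fields = line.split()  # splits on runs of whitespace, like awk's default FS
        if len(fields) >= 3:
            docids.append(fields[2])
    return docids


def unique_in_order(items: Iterable[str]) -> List[str]:
    """Drop repeated IDs, keeping first-seen order (awk alone does not do this)."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


if __name__ == "__main__":
    # Hypothetical sample lines; the real files are data/robust/*.run
    sample = [
        "301 Q0 FBIS3-10082 1 12.5 bm25",
        "301 Q0 FT921-7107 2 11.9 bm25",
        "302 Q0 FBIS3-10082 1 10.2 bm25",
    ]
    ids = docids_from_run(sample)
    print(ids)                   # ['FBIS3-10082', 'FT921-7107', 'FBIS3-10082']
    print(unique_in_order(ids))  # ['FBIS3-10082', 'FT921-7107']
```

Comparing `len(ids)` with `len(unique_in_order(ids))` over the real run files shows how many lines the script receives versus how many distinct documents they refer to.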

My Java version is:
```
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
```
Is this the correct Java version?
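Note that this version string uses the legacy `1.x` numbering, so `1.8.0_282` is Java 8; from Java 9 onward the major version comes first (e.g. `11.0.10`). A small sketch that extracts the major version from `java -version` output under both schemes (the function name `java_major_version` is hypothetical; `java -version` writes to stderr, not stdout):

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report "1.x" (e.g. "1.8.0_282" -> 8);
    Java 9+ report the major version first (e.g. "11.0.10" -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    # Legacy scheme: the real major version is the second number
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


if __name__ == "__main__":
    # Output copied from above; to query the local JVM instead, use
    # subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
    sample = 'openjdk version "1.8.0_282"\nOpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)'
    print(java_major_version(sample))  # 8
```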

Could you give me some advice on this error?
Thanks a lot!

------
I indexed the Robust04 document files myself and ran `extract_docs_from_index.py` successfully!
I then checked the `documents.tsv` file with pandas and found 73855 records. I don't know how many records there should be, so I would appreciate it if you could tell me the correct number.
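To cross-check that count independently of pandas' CSV parsing, here is a minimal sketch that counts non-empty lines and distinct IDs, assuming (hypothetically) that each line is `docid<TAB>text` with no header row. When loading with pandas, keep in mind that `pandas.read_csv` treats the first line as a header by default (`header=None` disables that), and that `"` characters in document text may be interpreted as CSV quoting (`quoting=csv.QUOTE_NONE` turns that off); either can change the reported row count.

```python
import io
from typing import Dict, Iterable


def summarize_tsv(lines: Iterable[str]) -> Dict[str, int]:
    """Count non-empty records and distinct first-column IDs in a TSV.

    Assumes (hypothetically) each line is docid<TAB>text with no header.
    Splits on the first tab only, so quotes or tabs in the text are ignored.
    """
    records = 0
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        records += 1
        ids.add(line.split("\t", 1)[0])
    return {"records": records, "distinct_ids": len(ids)}


if __name__ == "__main__":
    # Hypothetical sample; for the real file use:
    # with open("data/robust/documents.tsv", encoding="utf-8") as f: print(summarize_tsv(f))
    sample = io.StringIO(
        "FBIS3-10082\tSome text with a \"quote\n"
        "FT921-7107\tOther text\n"
        "FBIS3-10082\tSome text with a \"quote\n"
    )
    print(summarize_tsv(sample))  # {'records': 3, 'distinct_ids': 2}
```

If `records` and `distinct_ids` differ, the file contains the same document more than once, which is worth knowing before comparing against an expected total.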