"No such file" when running download.sh; "UnicodeDecodeError" when running featurize.py

Hi, when I ran the `data/download.sh` script, the command
`cat augmented_data/augmented_zips.zip.z* > augmented_train.json.zip`
raised an error:
`cat: augmented_data/augmented_zips.zip.z*: No such file or directory`

Since the downloaded parts are actually named `augmented_zips.z01` ... `augmented_zips.z10` and `augmented_zips.zip`, I replaced the glob with an explicit list of the parts, so the command became:
`cat augmented_data/augmented_zips.z01 augmented_data/augmented_zips.z02 augmented_data/augmented_zips.z03 augmented_data/augmented_zips.z04 augmented_data/augmented_zips.z05 augmented_data/augmented_zips.z06 augmented_data/augmented_zips.z07 augmented_data/augmented_zips.z08 augmented_data/augmented_zips.z09 augmented_data/augmented_zips.z10 augmented_data/augmented_zips.zip > augmented_train.json.zip`
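For anyone reproducing this, the same concatenation can be sketched in Python so that the part order does not depend on shell globbing. This is only an illustration: the helper names `ordered_parts` and `join_parts` are hypothetical and not part of the repository, and it assumes the parts follow the `.z01`, `.z02`, ..., `.zip` naming seen above, with the `.zip` file going last.

```python
import re
import shutil
from pathlib import Path


def ordered_parts(directory, stem="augmented_zips"):
    """Return the split-archive parts in order: .z01, .z02, ..., then .zip last.

    Hypothetical helper; assumes parts are named <stem>.zNN plus <stem>.zip.
    """
    directory = Path(directory)
    numbered = sorted(
        (p for p in directory.glob(f"{stem}.z*")
         if re.fullmatch(r"z\d+", p.suffix[1:])),
        key=lambda p: int(p.suffix[2:]),  # numeric sort, so z10 comes after z09
    )
    final = directory / f"{stem}.zip"
    if not final.exists():
        raise FileNotFoundError(f"missing final part: {final}")
    return numbered + [final]


def join_parts(directory, output, stem="augmented_zips"):
    """Concatenate the parts byte-for-byte into one file, like `cat`."""
    parts = ordered_parts(directory, stem)
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

Calling `join_parts("augmented_data", "augmented_train.json.zip")` should produce the same bytes as the explicit `cat` command. If the joined archive does not extract cleanly, Info-ZIP's `zip -s 0 augmented_data/augmented_zips.zip --out joined.zip` may be worth trying as an alternative way to merge a split archive; I have not verified that against this dataset.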

However, I am still unable to run `featurize.py` successfully afterwards. It fails with `UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte` at line 139 of `vocab.py` (`gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)`) during the "Building word embedding matrix..." step.
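One thing that might help narrow this down is checking whether the embedding file passed as `path` looks like a word2vec file at all. The sketch below is a guess at a useful diagnostic, not a confirmed cause: the function `inspect_embedding_file` is hypothetical, and it only assumes that a word2vec-format file starts with an ASCII header line of the form `<vocab_size> <vector_size>`, while gzip and zip files start with their well-known magic bytes.

```python
def inspect_embedding_file(path):
    """Peek at the first bytes of a file and guess what kind of file it is.

    Hypothetical diagnostic helper. Returns one of:
      "gzip"                    - starts with the gzip magic bytes
      "zip"                     - starts with the zip local-file magic bytes
      ("word2vec", vocab, dim)  - first line is two integers, as in a
                                  word2vec-format header
      "unknown"                 - none of the above
    """
    with open(path, "rb") as f:
        head = f.read(64)

    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head[:4] == b"PK\x03\x04":
        return "zip"

    # A word2vec header is a short ASCII line: b"<vocab_size> <vector_size>\n"
    first_line = head.split(b"\n", 1)[0]
    fields = first_line.split()
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return ("word2vec", int(fields[0]), int(fields[1]))
    return "unknown"
```

If this returns `"unknown"` or `"zip"` for the embedding file, the download may be incomplete or still archived, which could plausibly lead to a decode error like the one above. A `("word2vec", ...)` result only confirms the header and does not tell binary from text format.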

Is there any advice on what modifications I can make?
Thanks!
