
"No such file" when running download.sh; "UnicodeDecodeError" when running featurize.py #6

@maoredman

Hi, when I ran the data/download.sh script, the command
cat augmented_data/augmented_zips.zip.z* > augmented_train.json.zip
raised an error:
cat: augmented_data/augmented_zips.zip.z*: No such file or directory

Since the glob augmented_zips.zip.z* did not match any file, I replaced that command with one that lists the parts explicitly:
cat augmented_data/augmented_zips.z01 augmented_data/augmented_zips.z02 augmented_data/augmented_zips.z03 augmented_data/augmented_zips.z04 augmented_data/augmented_zips.z05 augmented_data/augmented_zips.z06 augmented_data/augmented_zips.z07 augmented_data/augmented_zips.z08 augmented_data/augmented_zips.z09 augmented_data/augmented_zips.z10 augmented_data/augmented_zips.zip > augmented_train.json.zip
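For reference, the same ordered concatenation can be sketched in Python. This is only an illustration of the workaround above, not part of download.sh: `ordered_parts` and `concatenate` are hypothetical helper names, and the part naming (`augmented_zips.z01` ... `augmented_zips.z10`, then `augmented_zips.zip` last) is taken from the command above.

```python
import re
import shutil
from pathlib import Path


def ordered_parts(directory, stem="augmented_zips"):
    """Return the split parts as .z01, .z02, ..., with the .zip part last."""
    directory = Path(directory)
    # Keep only numbered parts such as .z01; the glob also matches .zip,
    # which is filtered out here and appended at the end instead.
    numbered = sorted(
        (p for p in directory.glob(f"{stem}.z*")
         if re.fullmatch(r"\.z\d+", p.suffix)),
        key=lambda p: int(p.suffix[2:]),
    )
    final = directory / f"{stem}.zip"
    return numbered + ([final] if final.exists() else [])


def concatenate(parts, destination):
    """Write the bytes of each part, in order, into one destination file."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

Calling `concatenate(ordered_parts("augmented_data"), "augmented_train.json.zip")` would reproduce the explicit cat command. It only joins bytes; whether the result is a readable archive depends on how the parts were originally split.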

However, I am still unable to run featurize.py successfully afterwards. During "Building word embedding matrix...", line 139 of vocab.py (gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)) fails with:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte
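One check that does not need gensim is whether the file at `path` begins with the plain-text header used by the word2vec binary format (vocabulary size and vector dimension on the first line). The sketch below is only a diagnostic under that assumption: `read_word2vec_header` is a hypothetical helper, not part of vocab.py or gensim, and a header that parses does not rule out other causes of the error.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from the first line, or None.

    None means the first line is not two ASCII integers, which suggests
    the file is not a word2vec binary (for example, it may still be
    compressed or only partially downloaded).
    """
    with open(path, "rb") as f:
        # Cap the read so a file without newlines is not loaded whole.
        first_line = f.readline(100)
    try:
        parts = first_line.decode("ascii").split()
        if len(parts) != 2:
            return None
        return int(parts[0]), int(parts[1])
    except (UnicodeDecodeError, ValueError):
        return None
```

If the header does parse, `load_word2vec_format` also accepts `unicode_errors='ignore'`, which skips undecodable bytes in individual words. Whether that is appropriate for this embedding file is untested here.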

Do you have any advice on what I should change?
Thanks!
