How is the dataset size measured? #14

@hansbogert

The website/docs list the Rankings table at 6.38GB, but both the on-disk size and the size Spark reports are closer to 5.2GB.
I downloaded the txt format, which I expected to be close to the reported 6.38GB since it is uncompressed.
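
For reference, here is a minimal sketch of how the on-disk size could be checked, reporting both decimal GB (10^9 bytes) and binary GiB (2^30 bytes). The helper names and the `rankings/` path are hypothetical, not part of the benchmark tooling. Note that a GB-versus-GiB mismatch would only explain part of the gap: 6.38GB is about 5.94GiB, which is still above the roughly 5.2GB observed.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


def to_gib(n_bytes):
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


# Example usage (the path is an assumption; point it at the downloaded table):
# size = dir_size_bytes("rankings/")
# print(f"{to_gb(size):.2f} GB, {to_gib(size):.2f} GiB")
```

This counts apparent file sizes (as `ls -l` would), not allocated disk blocks (as `du` reports by default), so the two can differ slightly.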
