A command line utility to work with the Discogs data dumps.
It makes it super easy to:
- List data dumps
- Download a specific dumps
- Convert dumps to ndjson or parquet
- Import a dump into a PostgreSQL database
dgtools [global options] command [command options] [arguments...]
--discogs-bucket- The URL of the Discogs data dumps (default: "https://discogs-data-dumps.s3.us-west-2.amazonaws.com")
Work with Discogs data dump files.
List the files in the Discogs data dumps.
dgtools dump list [options]
Options:
--year- Filter by year--month- Filter by month--type- Filter by data type--no-table- Don't print the table (output filenames only)
Dump the structure of an XML file.
dgtools dump structure <file> [options]
Arguments:
file- The file to dump the structure of
Options:
--stop-after X- Stops analysis after X records
Download a Discogs data dump.
dgtools dump download [options] <name>
Arguments:
name- The file to download
Options:
--out-dir- The output directory (default: ".")--overwrite- Force the download even if the file already exists--checksum- Check the checksum of the file after downloading (default: true)
Convert a dump to a different format
dgtools dump convert <name> --out <name> [options]
Arguments:
name- The file to convert
Options:
--out- The output file--stop-after X- Stop conversion after X records
Work with a database.
Options:
--database-url- The URL of the database to connect to (default: "postgres://$USER@localhost:5432/dgtools", can be set via DATABASE_URL environment variable)
Prepare the database for import by running migrations.
dgtools db prepare
Import data from a dump file to the database.
dgtools db import <file>
Arguments:
file- The file to import the data from
Nuke the database by rolling back all migrations.
dgtools db nuke
Please see LICENSE.md