Optimize CourtListener ingress scripts #50

@ProbablyFaiz

A few things we can do:

  • Run the upsert queries as soon as we reach the requisite batch size while reading the JSON files, rather than all at the end (which consumes significant memory when there is a lot of opinion data)
  • Download CL tar files to disk rather than holding them in memory, to reduce memory consumption
  • Parallelize downloads but not tar extraction (faster downloads without blowing up RAM usage)
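
A rough sketch of how the three points above could fit together, using only the Python standard library. All function names here (`iter_batches`, `download_to_disk`, `iter_tar_json`, `ingest`) are hypothetical and not taken from the existing scripts; `upsert_batch` stands in for whatever actually performs the database upsert, and the assumption that each tar member is a single `.json` document may not match the real CL archive layout.

```python
import json
import shutil
import tarfile
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def iter_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size records without materializing the input."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def download_to_disk(url: str, dest_dir: Path) -> Path:
    """Stream one archive to dest_dir in chunks instead of buffering it in RAM."""
    dest = dest_dir / Path(urllib.parse.urlparse(url).path).name
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def iter_tar_json(tar_path: Path) -> Iterator[dict]:
    """Lazily parse each .json member (assumed layout), one member at a time."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".json"):
                yield json.load(tar.extractfile(member))


def ingest(
    urls: List[str],
    dest_dir: Path,
    upsert_batch: Callable[[List[dict]], None],  # hypothetical DB upsert hook
    batch_size: int = 1000,
    max_workers: int = 4,
) -> int:
    """Download in parallel, then extract and upsert sequentially in batches."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tar_paths = list(pool.map(lambda u: download_to_disk(u, dest_dir), urls))

    total = 0
    for tar_path in tar_paths:  # one archive at a time keeps memory bounded
        for batch in iter_batches(iter_tar_json(tar_path), batch_size):
            upsert_batch(batch)  # flush as soon as the batch is full
            total += len(batch)
    return total
```

With this shape, peak memory is roughly one batch of parsed records plus the download buffers, rather than every opinion in every archive. The thread pool only covers the network-bound downloads; extraction and upserts stay sequential, as the third bullet suggests.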

Labels

infrastructure: Dependency upgrades, refactors, etc.
