Slow image extraction #4

@scriptcoded

The current process for extracting images goes as follows:

  1. Find images within the document
  2. Render each page of the document as an image
  3. Use sharp to extract the images from the rendered pages
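
The issue does not show the implementation, so the following is only a sketch of what step 3 could involve. It maps an image's bounding box from PDF page space (points, origin at the bottom-left) to a pixel region on the rendered page. The `PdfBox` and `PixelRegion` types and the `toPixelRegion` helper are hypothetical names, and the code assumes an unrotated page whose origin is at (0, 0).

```typescript
// Bounding box of an image on a PDF page, in PDF points (1/72 inch),
// with the origin at the bottom-left corner of the page.
interface PdfBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Pixel region with a top-left origin, the shape a raster cropper expects.
interface PixelRegion {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper: maps a PDF-space box to a crop region on a page
// rendered at `dpi`. Assumes an unrotated page whose origin is (0, 0).
function toPixelRegion(box: PdfBox, pageHeightPt: number, dpi: number): PixelRegion {
  const scale = dpi / 72; // pixels per PDF point
  const left = Math.floor(box.x * scale);
  // Flip the y axis: PDF measures up from the bottom, rasters down from the top.
  const top = Math.floor((pageHeightPt - box.y - box.height) * scale);
  // Round outwards so the crop never cuts into the image.
  const right = Math.ceil((box.x + box.width) * scale);
  const bottom = Math.ceil((pageHeightPt - box.y) * scale);
  return { left, top, width: right - left, height: bottom - top };
}
```

The returned object has the same field names that sharp's `extract` operation takes (`left`, `top`, `width`, `height`), so it could be passed to it directly. Because every page has to be rasterised before any cropping can happen, the cost of this approach grows with page count and render resolution.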

This process is slow, and it requires an external library (sharp) that in turn has a native dependency (libvips).

Ideally we would extract the images directly from the PDF. I haven't found a way to do this yet, but perhaps we could get some clues from pdfcpu (https://github.com/pdfcpu/pdfcpu/blob/6b2e3b4ba26ed6839410ca2fd00f21cb4649efbe/pkg/pdfcpu/extract.go#L51).
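
As a starting point for direct extraction, here is a minimal sketch for the simplest case only. When an image XObject uses the `/DCTDecode` filter, its stream bytes are JPEG data and can be copied out without rendering anything. The function name `extractJpegStreams` and its helpers are hypothetical, and the byte scan is deliberately naive: it assumes unencrypted PDFs, objects that are not packed into object streams, a single filter per image, and dictionaries shorter than 1024 bytes. It also ignores `/Length` and relies on finding the `endstream` keyword. Images stored with other filters (for example `/FlateDecode`) would need decoding and colour-space handling that this sketch does not attempt.

```typescript
// Encode an ASCII string as bytes (avoids depending on TextEncoder or Buffer).
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}

// Index of `needle` in `hay` at or after `from`, or -1 if absent.
function indexOfBytes(hay: Uint8Array, needle: Uint8Array, from: number): number {
  outer: for (let i = from; i <= hay.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (hay[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Naive scan: returns the raw bytes of every stream whose dictionary looks
// like a JPEG-encoded image (/Subtype /Image with /DCTDecode).
function extractJpegStreams(pdf: Uint8Array): Uint8Array[] {
  const STREAM = ascii("stream");
  const ENDSTREAM = ascii("endstream");
  const images: Uint8Array[] = [];
  let pos = 0;
  while (true) {
    const s = indexOfBytes(pdf, STREAM, pos);
    if (s < 0) break;
    const e = indexOfBytes(pdf, ENDSTREAM, s + STREAM.length);
    if (e < 0) break;
    pos = e + ENDSTREAM.length; // continue after this stream next time

    // Read up to 1024 bytes before the keyword and keep what follows the
    // last "obj", which approximates this object's dictionary.
    let head = "";
    for (let i = Math.max(0, s - 1024); i < s; i++) head += String.fromCharCode(pdf[i]);
    const objAt = head.lastIndexOf("obj");
    const dict = objAt >= 0 ? head.slice(objAt + 3) : head;
    if (!/\/Subtype\s*\/Image/.test(dict) || dict.indexOf("/DCTDecode") < 0) continue;

    // Stream data starts after the end-of-line that follows "stream" and
    // ends before the end-of-line that precedes "endstream".
    let start = s + STREAM.length;
    if (pdf[start] === 0x0d) start++;
    if (pdf[start] === 0x0a) start++;
    let end = e;
    if (pdf[end - 1] === 0x0a) end--;
    if (pdf[end - 1] === 0x0d) end--;

    // Sanity check: JPEG data begins with the SOI marker FF D8.
    if (end - start >= 2 && pdf[start] === 0xff && pdf[start + 1] === 0xd8) {
      images.push(pdf.slice(start, end));
    }
  }
  return images;
}
```

Each returned array could be written to disk as a `.jpg` file. A robust version would parse the cross-reference table and object structure properly instead of scanning for keywords, which is the part where existing extractors are likely the most useful reference.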

Labels: enhancement (New feature or request)