OCR is picky about file format #247

@r-tae

Currently the UI says:

File names must match the form testing_ocr-{sequence number} where sequence number can be any sequence of digits followed by letters (a-z,A-Z) or _.

Accepted images types: tiff and jpg.

Files larger than 10MB are not able to be processed with OCR
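The naming rule in that UI text is a bit ambiguous, so here is a rough sketch of one way to read it: the `testing_ocr-` prefix, then one or more digits, then any number of letters or underscores. The helper name `matchesNamingRule`, the pattern itself, and the choice to strip the extension before matching are all assumptions for illustration, not what the app actually does.

```typescript
// Hypothetical reading of the UI rule: "testing_ocr-" + digits + optional letters/underscores.
// Assumption: the extension (".tiff", ".jpg", ...) is not part of the checked name.
const NAME_PATTERN = /^testing_ocr-\d+[A-Za-z_]*$/;

function matchesNamingRule(fileName: string): boolean {
  // Drop everything from the last dot onward, if there is one after the first character.
  const dot = fileName.lastIndexOf(".");
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  return NAME_PATTERN.test(stem);
}
```

Under this reading `testing_ocr-001.tiff` passes and `testing_ocr-abc.jpg` does not, since the sequence number has to start with a digit. If the real rule is looser than that, the pattern would need adjusting.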

I've had .jpg files fail pretty consistently, so there may be specific restrictions on encoding. I've also had a 21MB .tiff file work perfectly, so it's possible that Textract has changed since this was written.

Also, uploading a PNG is allowed, but the OCR usually fails (I think I remember it working with a PNG a couple of times, but I would need to double-check). If PNG is really not supported, we should stop people from picking them in the file picker and throw up an error for drag-n-drop.
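If we do go that way, something like the following might work as a starting point. It is only a sketch: the names (`UploadCandidate`, `rejectionReason`, `partitionDroppedFiles`) are made up, the accepted list just mirrors the current UI text plus the `.tif`/`.jpeg` spellings (an assumption), and it deliberately leaves out a size check since the 10MB limit is in doubt.

```typescript
// Minimal stand-in for the browser File object, so the sketch has no DOM dependency.
interface UploadCandidate {
  name: string;
  type: string; // MIME type as reported by the browser; may be an empty string
}

// Assumption: only TIFF and JPEG are accepted, as the current UI text says.
const ACCEPTED_EXTENSIONS = ["tif", "tiff", "jpg", "jpeg"];
const ACCEPTED_MIME_TYPES = ["image/tiff", "image/jpeg"];

// Value for <input type="file" accept="...">, so the picker filters out other types.
const ACCEPT_ATTRIBUTE = ACCEPTED_EXTENSIONS.map((ext) => "." + ext).join(",");

// Returns null when the file looks acceptable, otherwise a message for the user.
function rejectionReason(file: UploadCandidate): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot + 1).toLowerCase() : "";
  const extOk = ACCEPTED_EXTENSIONS.indexOf(ext) !== -1;
  // An empty MIME type means the browser could not tell; rely on the extension then.
  const mimeOk = file.type === "" || ACCEPTED_MIME_TYPES.indexOf(file.type) !== -1;
  if (extOk && mimeOk) {
    return null;
  }
  return file.name + ": only TIFF and JPEG images can be sent to OCR";
}

// For drag-n-drop, where the accept attribute does not apply: split files into
// ones to upload and error messages to show.
function partitionDroppedFiles(files: UploadCandidate[]): {
  accepted: UploadCandidate[];
  errors: string[];
} {
  const accepted: UploadCandidate[] = [];
  const errors: string[] = [];
  for (const file of files) {
    const reason = rejectionReason(file);
    if (reason === null) {
      accepted.push(file);
    } else {
      errors.push(reason);
    }
  }
  return { accepted, errors };
}
```

The picker side would use `ACCEPT_ATTRIBUTE`, and the drop handler would pass the dropped files through `partitionDroppedFiles` and show whatever ends up in `errors`. Whether PNG belongs in the accepted list depends on what Textract actually supports, which still needs checking.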