Open
Labels
Text extraction, Transformation matrix (Absolute position of items on a page), bug (Something isn't working)
Description
Is there an existing issue for this?
- I have searched the existing issues and found no similar reports
Are you using the latest version of this package?
- The issue I'm reporting exists in the latest release
Can other PDF readers read the file?
- The PDF I'm trying to read opens correctly in at least one other PDF reader
When running this snippet
$text = (new PdfParser())->parseFile('/path/to/file.pdf')->getText();
I run into the following issue/exception (Please attach the pdf)
@PrinsFrank I hope you are doing well. I haven't heard from you since your last mail.
Some words are missing characters, and these characters appear to be "moved" to the next line. For instance, the line Frühwald, Norbert, Hemau is extracted as:
rühwald, orbert, emau
FNH
where F, N, and H are the first letters of the words on the line above.
Raw extracted text
Related part in the PDF
PDFs
Here is the related PR for your PDF samples repository: PrinsFrank/pdf-samples#8
Uploaded PDF: BAnz AT 08.10.2025 B1.pdf
Online version of the PDF: https://www.bundesanzeiger.de/pub/publication/pifGpbbuJiFDBbgfH0P/content/pifGpbbuJiFDBbgfH0P/BAnz%20AT%2008.10.2025%20B1.pdf?inline
Do you allow attachment files to be used in tests to prevent regressions?
- Yes, I give permission to use this file as a test file to prevent future regressions (And am authorized to give this permission). The file is provided by the German government and is meant for the public.