[Bug]: Many words are split and sometimes characters are "moved" to other lines #249

@k00ni

Is there an existing issue for this?

  • I have searched the existing issues and found no similar reports

Are you using the latest version of this package?

  • The issue I'm reporting exists in the latest release

Can other PDF readers read the file?

  • The PDF I'm trying to read opens correctly in at least one other PDF reader

When running this snippet

$text = (new PdfParser())->parseFile('/path/to/file.pdf')->getText();

I run into the following issue/exception (Please attach the pdf)

@PrinsFrank I hope you are doing well. I haven't heard from you since your last mail.

Characters are missing from some words, and these characters appear to be "moved" to the next line. For instance, the line Frühwald, Norbert, Hemau is extracted as:

rühwald, orbert, emau 
FNH

Here, F, N and H are the first letters of the words in the line above.

Raw extracted text

[Screenshot of the raw extracted text]

Related part in the PDF

[Screenshot of the related part in the PDF]

PDFs

Here is the related PR for your PDF samples repository: PrinsFrank/pdf-samples#8

Uploaded PDF: BAnz AT 08.10.2025 B1.pdf

Online version of the PDF: https://www.bundesanzeiger.de/pub/publication/pifGpbbuJiFDBbgfH0P/content/pifGpbbuJiFDBbgfH0P/BAnz%20AT%2008.10.2025%20B1.pdf?inline

Do you allow attachment files to be used in tests to prevent regressions?

  • Yes, I give permission to use this file as a test file to prevent future regressions (and am authorized to give this permission). The file is provided by the German government and is meant for the public.
