I want to extract text from a PDF written in Gurmukhi script (the Punjabi language), but characters are read incorrectly when extracting the text from the PDF:
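```python
from pyxpdf import Document
from pyxpdf.xpdf import TextControl

pdf_path = '/content/Punjab2_new.pdf'
doc = Document(pdf_path)
# "physical" mode keeps the page's physical layout; insert_bom=True prepends a Unicode BOM
text_control = TextControl("physical", insert_bom=True)
for page in range(len(doc)):
    # Extract only the text inside the area (0, 90, 155, 700)
    out_res = doc[page].text((0, 90, 155, 700), text_control)
    print('\n_______________New_page_output_________________________\n')
    print(out_res)
```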
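To make the misreads easier to describe than with screenshots alone, a small check like the one below could tally every character in the printed output that falls outside the Gurmukhi Unicode block (U+0A00 to U+0A7F). This is a sketch, not a pyxpdf feature; the helper names `non_gurmukhi_chars` and `report` are hypothetical. The assumption behind it is that wrong characters often come from a font that maps Gurmukhi glyphs to Latin or Private Use Area code points, in which case those code points would show up in the tally.

```python
import unicodedata
from collections import Counter

# Gurmukhi Unicode block
GURMUKHI_START = 0x0A00
GURMUKHI_END = 0x0A7F


def non_gurmukhi_chars(text):
    """Hypothetical helper: count characters outside the Gurmukhi block.

    Whitespace, the BOM (U+FEFF) and non-letter ASCII (digits, punctuation)
    are ignored, so only suspicious characters remain in the result.
    """
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        if GURMUKHI_START <= cp <= GURMUKHI_END:
            continue  # expected Gurmukhi character
        if ch.isspace() or ch == "\ufeff":
            continue  # layout whitespace or the BOM added by insert_bom=True
        if cp < 0x80 and not ch.isalpha():
            continue  # ASCII digits and punctuation
        counts[ch] += 1
    return counts


def report(text):
    """Print each suspicious character with its code point and Unicode name."""
    for ch, n in non_gurmukhi_chars(text).most_common():
        name = unicodedata.name(ch, "<unnamed>")  # PUA characters have no name
        print(f"U+{ord(ch):04X} {name}: {n}")
```

Calling `report(out_res)` inside the loop would list, per page, which code points were produced instead of Gurmukhi; a page dominated by Latin letters or U+E000-range characters would point toward a font-encoding problem rather than a layout setting.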
Here are images of my expected and actual results.
The expected image is a sample of my input:
With the `text()` function, characters are recognized incorrectly:
PDF: download.pdf
It would be a great help if any pyxpdf parameter could solve this issue.

