Text Breaking when used for Gurmukhi (Punjabi) script #42

I want to extract text from a PDF written in Gurmukhi script (Punjabi language), but some characters are read incorrectly when the text is extracted.

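```python
from pyxpdf import Document
from pyxpdf.xpdf import TextControl

pdf_path = '/content/Punjab2_new.pdf'
doc = Document(pdf_path)

# "physical" mode keeps the original page layout; insert_bom prepends a byte-order mark
text_control = TextControl("physical", insert_bom=True)
for page in range(len(doc)):
    # restrict extraction to the rectangle (0, 90, 155, 700) of each page
    out_res = doc[page].text((0, 90, 155, 700), text_control)
    print('\n_______________New_page_output_________________________\n')
    print(out_res)
```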
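To narrow down exactly which characters are misread, it may help to dump `out_res` one code point at a time and compare it with the expected text. The helpers below are a hypothetical sketch using only Python's standard `unicodedata` module; they are not part of pyxpdf, and `describe_chars` / `gurmukhi_ratio` are illustrative names. The Gurmukhi Unicode block is U+0A00 to U+0A7F, and a leading U+FEFF is stripped because `insert_bom=True` may prepend one.

```python
import unicodedata

GURMUKHI_BLOCK = range(0x0A00, 0x0A80)  # Unicode Gurmukhi block U+0A00-U+0A7F


def describe_chars(text):
    """Return one line per character: code point, repr and Unicode name.

    Characters without a name (e.g. Private Use Area code points, which
    some legacy fonts produce) are shown as <unnamed>.
    """
    return [
        f"U+{ord(ch):04X}  {ch!r}  {unicodedata.name(ch, '<unnamed>')}"
        for ch in text.lstrip("\ufeff")  # drop the BOM that insert_bom=True may add
    ]


def gurmukhi_ratio(text):
    """Fraction of non-whitespace characters that fall in the Gurmukhi block."""
    chars = [ch for ch in text.lstrip("\ufeff") if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ord(ch) in GURMUKHI_BLOCK for ch in chars) / len(chars)


if __name__ == "__main__":
    sample = "\u0A2A\u0A70\u0A1C\u0A3E\u0A2C"  # "ਪੰਜਾਬ"; in practice pass out_res instead
    for line in describe_chars(sample):
        print(line)
    print(gurmukhi_ratio(sample))  # 1.0
```

Comparing the `describe_chars` output of the expected and extracted strings side by side should show whether characters are reordered, substituted, or unmapped (`<unnamed>` entries or a low Gurmukhi ratio).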


Here are my expected and actual results as images. The expected image is a sample of my input:

![expected_text](https://user-images.githubusercontent.com/91603328/185781904-2ade1044-e4d5-4612-b308-d345cacae3e1.png)

With the `text` function, some characters are recognized incorrectly:

![actual_output](https://user-images.githubusercontent.com/91603328/185781910-0b86d6ba-2361-4b40-b6a1-6169898c821c.png)
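One possible cause, which cannot be confirmed from the screenshots alone: the Gurmukhi vowel sign sihari (ਿ, U+0A3F) is drawn to the left of its consonant but stored after it in Unicode, so an extractor that emits glyphs in visual order can place it before the consonant. If the broken output shows that pattern, a post-processing step like the sketch below might restore logical order. This is a swapped-in regex workaround, not a pyxpdf parameter; `fix_sihari_order` is a hypothetical helper, and it should only be applied to text known to be in visual order.

```python
import re

# Gurmukhi consonants: U+0A15-U+0A39 plus the nukta letters U+0A59-U+0A5E.
CONSONANT = "[\u0A15-\u0A39\u0A59-\u0A5E]"
SIHARI = "\u0A3F"  # GURMUKHI VOWEL SIGN I, rendered left of its consonant
NUKTA = "\u0A3C"   # GURMUKHI SIGN NUKTA
VIRAMA = "\u0A4D"  # GURMUKHI SIGN VIRAMA, introduces a subjoined consonant

# Sihari followed by a consonant cluster: consonant, optional nukta,
# optional subjoined consonant (virama + consonant).
VISUAL_SIHARI = re.compile(
    SIHARI + "(" + CONSONANT + NUKTA + "?(?:" + VIRAMA + CONSONANT + ")?)"
)


def fix_sihari_order(text):
    """Hypothetical helper: move each sihari from before its consonant
    cluster (visual order) to after it (Unicode logical order).

    Only use on text known to be in visual order; on already-correct
    text it would wrongly move each sihari past the next consonant.
    """
    return VISUAL_SIHARI.sub(lambda m: m.group(1) + SIHARI, text)


if __name__ == "__main__":
    visual = "\u0A3F\u0A15\u0A24\u0A3E\u0A2C"  # "ਿਕਤਾਬ" as a visual-order extractor might emit it
    print(fix_sihari_order(visual))  # ਕਿਤਾਬ
```

If the output instead contains Latin letters or Private Use Area code points, the PDF may use a legacy non-Unicode Gurmukhi font, in which case reordering alone will not help.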

PDF 
[download.pdf](https://github.com/ashutoshvarma/pyxpdf/files/9389033/download.pdf)

It would be a great help if any pyxpdf parameters could solve this issue.
