TypeError: 'PDFObjRef' object is not subscriptable #92

@sgpinkus

Calling pdf.load() raises "TypeError: 'PDFObjRef' object is not subscriptable" for some PDFs downloaded from the Internet.

Test script:

import requests
import pdfquery
res = requests.get('https://www.aph.gov.au/Senators_and_Members/Members/Register/-/media/03_Senators_and_Members/32_Members/Register/47P/AB/Albanese_47P.pdf?la=en&hash=E76C6FAA27171CFB2A95FC26EA0A1E45084F69C1')
with open('test.pdf', 'wb') as f: f.write(res.content)
pdf = pdfquery.PDFQuery('test.pdf')
pdf.load()

Gives:

Traceback (most recent call last):
  File "/tmp/test2.py", line 6, in <module>
    pdf.load()
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 385, in load
    self.tree = self.get_tree(*_flatten(page_numbers))
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 487, in get_tree
    for n, page in pages:
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 608, in <genexpr>
    return (self.get_layout(page) for page in self._cached_pages())
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 603, in get_layout
    layout = self._add_annots(layout, page.annots)
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 647, in _add_annots
    annot = self._set_hwxy_attrs(annot)
  File "/tmp/venv/lib/python3.9/site-packages/pdfquery/pdfquery.py", line 665, in _set_hwxy_attrs
    attr['x0'] = bbox[0]
TypeError: 'PDFObjRef' object is not subscriptable

The same test.pdf opens without problems in multiple PDF viewers installed on my system.
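
The traceback suggests that the annotation's bbox is still an indirect reference (a PDFObjRef) when `_set_hwxy_attrs` indexes it. A minimal sketch of one possible workaround follows: dereference the value before indexing. This is not pdfquery's actual code. `FakeObjRef`, `deref`, and `set_hwxy_attrs` are hypothetical names used only for illustration, and the stand-in class only assumes that a reference exposes a `resolve()` method, as pdfminer's `PDFObjRef` does. With pdfminer installed, `pdfminer.pdftypes.resolve1` should serve the same purpose as `deref`.

```python
class FakeObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef, for illustration only."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # A real PDFObjRef looks the object up in the document;
        # this stand-in simply returns the wrapped value.
        return self._target


def deref(value):
    """Follow indirect references until a concrete value is reached."""
    while hasattr(value, "resolve"):
        value = value.resolve()
    return value


def set_hwxy_attrs(attr):
    """Hypothetical variant of the failing step that dereferences bbox first."""
    # Resolve the bbox itself and each of its four coordinates.
    bbox = [deref(v) for v in deref(attr["bbox"])]
    attr["x0"], attr["y0"], attr["x1"], attr["y1"] = bbox
    attr["width"] = attr["x1"] - attr["x0"]
    attr["height"] = attr["y1"] - attr["y0"]
    return attr


# An annotation whose bbox is stored as an indirect reference.
annot = {"bbox": FakeObjRef([10.0, 20.0, 110.0, 70.0])}
result = set_hwxy_attrs(annot)
print(result["x0"], result["y0"], result["width"], result["height"])
```

Indexing the reference directly (`FakeObjRef([...])[0]`) fails with the same kind of TypeError as in the traceback, while the dereferenced version yields plain numbers.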
