pdf2docx.page.RawPageFitz module

A wrapper around the PyMuPDF Page, used as the page engine.

class pdf2docx.page.RawPageFitz.RawPageFitz(page_engine=None)

Bases: RawPage

A wrapper of fitz.Page to extract source contents.

extract_raw_dict(**settings)

Extract source data with the page engine. Return a dict with the following structure:

```
{
    "width" : w,
    "height": h,
    "blocks": [{…}, {…}, …],
    "shapes": [{…}, {…}, …]
}
```
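As an illustration of the dict structure described above, here is a minimal sketch of how a caller might inspect the returned value. It does not call pdf2docx or PyMuPDF. The helper `summarize_raw_dict` and the sample values are hypothetical, and the contents of each block and shape are left empty because they are not specified here.

```python
def summarize_raw_dict(raw):
    """Summarize a raw page dict (hypothetical helper, not part of pdf2docx)."""
    return {
        # Page size as a (width, height) pair
        "size": (raw["width"], raw["height"]),
        # Number of entries in the "blocks" and "shapes" lists
        "n_blocks": len(raw.get("blocks", [])),
        "n_shapes": len(raw.get("shapes", [])),
    }


# Made-up sample standing in for the return value of extract_raw_dict();
# the inner dicts are empty placeholders, not real block or shape data.
sample = {
    "width": 595.0,
    "height": 842.0,
    "blocks": [{}, {}],
    "shapes": [{}],
}

summary = summarize_raw_dict(sample)
print(summary)
```

In real use, `sample` would presumably be replaced by the dict that `extract_raw_dict(**settings)` returns for a page; which settings that call requires is not covered by this sketch.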