pdf2docx.image.Image module

Image object.

The underlying data structure is the image block dictionary documented at https://pymupdf.readthedocs.io/en/latest/textpage.html:

{
    'type': 1,
    'bbox': (x0,y0,x1,y1),
    'width': w,
    'height': h,
    'image': b'',

    # --- discarded properties ---
    'ext': 'png',
    'colorspace': n,
    'xres': xres, 'yres': yres, 'bpc': bpc
}
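As a minimal sketch of the split between kept and discarded properties, the snippet below filters a raw image block down to the five retained keys. The helper name `keep_image_properties`, the `KEPT_KEYS` tuple, and the sample values are hypothetical and only illustrate the structure above; they are not part of pdf2docx.

```python
# Hypothetical illustration, not pdf2docx code: the keys listed above
# that are not marked as discarded.
KEPT_KEYS = ("type", "bbox", "width", "height", "image")


def keep_image_properties(raw: dict) -> dict:
    """Return a copy of `raw` containing only the retained keys."""
    return {key: raw[key] for key in KEPT_KEYS if key in raw}


# Sample raw block with made-up values.
raw_block = {
    "type": 1,
    "bbox": (72.0, 72.0, 172.0, 122.0),
    "width": 100,
    "height": 50,
    "image": b"\x89PNG",
    # Properties the module docstring marks as discarded.
    "ext": "png",
    "colorspace": 3,
    "bpc": 8,
}

kept = keep_image_properties(raw_block)
```

Here `kept` holds only `type`, `bbox`, `width`, `height`, and `image`; the remaining entries are dropped.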
class pdf2docx.image.Image.Image(raw: Optional[dict] = None)

Bases: Element

Base image object.

bbox: fitz.Rect
from_image(image)

Update this object from an image block/span.

Args:

image (Image): Target image block/span.

make_docx(paragraph)

Add image span to a docx paragraph.

plot(page, color: tuple)

Plot the image bbox with diagonal lines (for debugging purposes).

Args:

page (fitz.Page): Plotting page.
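The geometry behind such a debug plot can be sketched without any drawing library: given a bbox `(x0, y0, x1, y1)`, the two diagonals join opposite corners. The function `bbox_diagonals` below is a hypothetical helper for illustration only; how pdf2docx actually draws onto the `fitz.Page` is not shown here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bbox_diagonals(bbox: Tuple[float, float, float, float]) -> List[Tuple[Point, Point]]:
    """Return the two diagonal segments of a rectangle as (start, end) pairs."""
    x0, y0, x1, y1 = bbox
    return [
        ((x0, y0), (x1, y1)),  # first corner to the opposite corner
        ((x0, y1), (x1, y0)),  # the other pair of opposite corners
    ]


diagonals = bbox_diagonals((72.0, 72.0, 172.0, 122.0))
```

Each segment could then be passed to whatever line-drawing call the plotting backend provides, together with the `color` tuple.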

store()

Store the image with base64 encoding.

  • Encode the image bytes with base64 -> base64 bytes

  • Decode the base64 bytes to str so that the result can be serialized as JSON
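The two steps above can be sketched with the standard library alone. The function names `encode_image` and `decode_image` and the sample bytes are hypothetical and chosen for illustration; this is a sketch of the technique, not the body of `store()`.

```python
import base64
import json


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw bytes, then decode the result to a str."""
    base64_bytes = base64.b64encode(image_bytes)  # bytes -> base64 bytes
    return base64_bytes.decode()                  # base64 bytes -> str


def decode_image(text: str) -> bytes:
    """Reverse of encode_image: recover the original bytes."""
    return base64.b64decode(text)


image_bytes = b"\x89PNG\r\n\x1a\n"  # made-up sample data (PNG signature)
encoded = encode_image(image_bytes)

# Raw bytes cannot be passed to json.dumps, but the encoded str can.
payload = json.dumps({"type": 1, "image": encoded})
restored = decode_image(json.loads(payload)["image"])
```

The round trip is lossless, at the cost of the encoded string being roughly a third larger than the raw bytes.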

property text

Get an image placeholder <image>.