pdf2docx.text.Char module

Char object based on PDF raw dict extracted with PyMuPDF.

Each character follows the char dictionary structure described in the PyMuPDF documentation:

{
    'bbox'  : (x0, y0, x1, y1), 
    'c'     : str, 
    'origin': (x,y)
}
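
As an illustration of where such char dicts sit, the sketch below walks a hand-written sample that mimics the nesting of a PyMuPDF raw dict (blocks, lines, spans, chars) and collects the characters. The sample data and the helper name `iter_raw_chars` are assumptions made for this example; they are not part of pdf2docx.

```python
from typing import Iterator

# Hand-written sample mimicking the nesting of PyMuPDF's raw dict
# (blocks -> lines -> spans -> chars). With PyMuPDF installed, a real one
# would come from page.get_text("rawdict").
sample_page = {
    "blocks": [
        {
            "type": 0,  # 0 marks a text block
            "lines": [
                {
                    "spans": [
                        {
                            "chars": [
                                {"bbox": (10.0, 10.0, 16.0, 22.0), "c": "H", "origin": (10.0, 20.0)},
                                {"bbox": (16.0, 10.0, 21.0, 22.0), "c": "i", "origin": (16.0, 20.0)},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}


def iter_raw_chars(page_dict: dict) -> Iterator[dict]:
    """Yield every raw char dict found in the text blocks of a page dict.

    Hypothetical helper for this example only.
    """
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # skip non-text blocks, which carry no lines
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield from span.get("chars", [])


# Join the 'c' field of every char to recover the text.
text = "".join(ch["c"] for ch in iter_raw_chars(sample_page))
print(text)
```

Each yielded dict has the three keys shown above and could presumably be passed as the `raw` argument of `Char`.
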
class pdf2docx.text.Char.Char(raw: Optional[dict] = None)

Bases: Element

Object representing a character.

bbox: fitz.Rect
contained_in_rect(rect: Shape, horizontal: bool = True)

Detect whether the char is located in a rect.

Args:

rect (Shape): Target rect to check.
horizontal (bool, optional): Text direction is horizontal if True. Defaults to True.

Returns:

bool: Whether the Char is located in the target rect.

Note

A char is considered contained in the target rect if its intersection with the rect is larger than half of the char bbox.
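
The rule in the note can be sketched with plain tuples, without PyMuPDF. This is a minimal sketch, assuming that "larger than half" is measured along the text direction (overlap width for horizontal text, overlap height for vertical text); the library may measure it differently. The function names `intersect` and `char_in_rect` are hypothetical and exist only for this example.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box of a and b, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def char_in_rect(char_bbox: Box, rect_bbox: Box, horizontal: bool = True) -> bool:
    """Assumed rule: the overlap must exceed half of the char extent
    along the text direction."""
    overlap = intersect(char_bbox, rect_bbox)
    if overlap is None:
        return False
    if horizontal:
        # Compare overlap width with half of the char width.
        return (overlap[2] - overlap[0]) > 0.5 * (char_bbox[2] - char_bbox[0])
    # Vertical text: compare overlap height with half of the char height.
    return (overlap[3] - overlap[1]) > 0.5 * (char_bbox[3] - char_bbox[1])


char = (0.0, 0.0, 10.0, 10.0)
print(char_in_rect(char, (4.0, 0.0, 20.0, 10.0)))  # overlap width 6 of 10
print(char_in_rect(char, (6.0, 0.0, 20.0, 10.0)))  # overlap width 4 of 10
```

Under this assumption the first call reports the char as contained and the second does not, since only the first overlap covers more than half of the char width.
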

store()

Store properties in raw dict.