pdf2docx.text.Char module
Char object based on the PDF raw dict extracted with PyMuPDF.
Each character in that raw dict has the following structure:
{
'bbox' : (x0, y0, x1, y1),
'c' : str,
'origin': (x,y)
}
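As a rough illustration of the structure above, the following sketch builds one such dict by hand and reads its fields. The coordinate values and the `raw_char` name are made up for the example; real values come from PyMuPDF's raw dict extraction. Judging by the constructor signature below, a dict of this shape is presumably what gets passed as `raw`.

```python
# A hand-written dict in the documented shape (values are hypothetical).
raw_char = {
    "bbox": (72.0, 700.0, 78.5, 712.0),  # (x0, y0, x1, y1)
    "c": "A",                            # the character itself
    "origin": (72.0, 709.5),             # (x, y) of the glyph origin
}

# Unpack the bounding box and derive the glyph extent from it.
x0, y0, x1, y1 = raw_char["bbox"]
width = x1 - x0
height = y1 - y0
```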
- class pdf2docx.text.Char.Char(raw: Optional[dict] = None)
Bases: Element
Object representing a character.
- bbox: fitz.Rect
- contained_in_rect(rect: Shape, horizontal: bool = True)
Detect whether the char is located in a rect.
- Args:
rect (Shape): Target rect to check.
horizontal (bool, optional): Text direction is horizontal if True. Defaults to True.
- Returns:
bool: Whether the char is located in the target rect.
Note
The char is considered contained in the target rect if the intersection is larger than half of the char bbox.
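The rule in the note can be sketched without PyMuPDF, using plain `(x0, y0, x1, y1)` tuples. This is only one plausible reading, not the library's implementation: it assumes "half of the char bbox" is measured along the text direction (width when `horizontal` is True, height otherwise) and that the two boxes must also overlap on the other axis. The function name `overlap_exceeds_half` is hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_exceeds_half(char_bbox: BBox, rect: BBox, horizontal: bool = True) -> bool:
    """Return True if rect covers more than half of char_bbox along the text direction."""
    cx0, cy0, cx1, cy1 = char_bbox
    rx0, ry0, rx1, ry1 = rect

    # Overlap length on each axis; zero or negative means no overlap on that axis.
    overlap_w = min(cx1, rx1) - max(cx0, rx0)
    overlap_h = min(cy1, ry1) - max(cy0, ry0)
    if overlap_w <= 0 or overlap_h <= 0:
        return False

    # Assumption: compare along the text direction only.
    if horizontal:
        return overlap_w > 0.5 * (cx1 - cx0)
    return overlap_h > 0.5 * (cy1 - cy0)
```

For a 10 x 10 char box at the origin, a rect starting at x = 4 covers 6 units of its width and passes the check, while a rect starting at x = 6 covers only 4 units and fails.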
- store()
Store properties in raw dict.