pdf2docx.common.Element module
Object with a bounding box, e.g. Block, Line, Span.
In PyMuPDF, coordinates (e.g. the bbox values from page.get_text('rawdict')) are generally given relative to the un-rotated page, whereas pdf2docx works in the real page coordinate system (CS), i.e. with page rotation considered. Any instance created by this class therefore has the rotation matrix applied to it automatically.
Consequently, the bbox parameter used to create an Element instance MUST be relative to the un-rotated CS. If final coordinates are already available, create an empty object first and then update its bbox:
Element().update_bbox(final_bbox)
Note
An exception is page.get_drawings(): its coordinates are already converted to the real page CS.
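The relationship between the un-rotated CS and the real page CS can be illustrated without PyMuPDF. The sketch below is not pdf2docx code: the helper names, the page height, and the 90-degree matrix are assumptions chosen for illustration. It applies an affine matrix in the (a, b, c, d, e, f) layout to the four corners of a bbox and takes the enclosing box.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]
Matrix = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def transform_point(x: float, y: float, m: Matrix) -> Tuple[float, float]:
    """Apply an affine matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


def transform_bbox(bbox: BBox, m: Matrix) -> BBox:
    """Transform all four corners and return the enclosing axis-aligned box."""
    x0, y0, x1, y1 = bbox
    corners = [transform_point(x, y, m) for x in (x0, x1) for y in (y0, y1)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical page: un-rotated height 800, displayed rotated by 90 degrees.
PAGE_HEIGHT = 800.0
ROTATE_90: Matrix = (0.0, 1.0, -1.0, 0.0, PAGE_HEIGHT, 0.0)

raw_bbox = (10.0, 20.0, 110.0, 40.0)   # relative to the un-rotated page
real_bbox = transform_bbox(raw_bbox, ROTATE_90)
print(real_bbox)  # (760.0, 10.0, 780.0, 110.0)
```

A wide, short box on the un-rotated page becomes a narrow, tall box in the real page CS, which is why mixing the two systems gives wrong layouts.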
- class pdf2docx.common.Element.Element(raw: Optional[dict] = None, parent=None)
Bases: IText
Bounding box, stored as an attribute of type fitz.Rect.
- ROTATION_MATRIX = Matrix(1.0, 0.0, -0.0, 1.0, 0.0, 0.0)
- contains(e: Element, threshold: float = 1.0)
Whether the given element is contained in this instance, with margin considered.
- Args:
e (Element): Target element.
threshold (float, optional): Intersection rate. Defaults to 1.0. The larger, the stricter.
- Returns:
bool: True if e is considered contained in this instance, False otherwise.
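A minimal sketch of a threshold-based containment test, assuming that "intersection rate" means the intersection area divided by the area of the target element. This is a simplified illustration with hypothetical helper names, not the library's implementation, which may apply further checks.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(b: BBox) -> float:
    """Area of a bbox; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: BBox, b: BBox) -> BBox:
    """Overlapping region of two bboxes (may be degenerate)."""
    return max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])


def contains(outer: BBox, inner: BBox, threshold: float = 1.0) -> bool:
    """True if at least `threshold` of `inner`'s area lies inside `outer`."""
    inner_area = area(inner)
    if inner_area == 0:
        return False
    return area(intersection(outer, inner)) / inner_area >= threshold


page_block = (0.0, 0.0, 100.0, 100.0)
print(contains(page_block, (10.0, 10.0, 50.0, 50.0)))         # fully inside
print(contains(page_block, (90.0, 90.0, 110.0, 110.0)))       # only 25% inside
print(contains(page_block, (90.0, 90.0, 110.0, 110.0), 0.2))  # looser threshold
```

Lowering the threshold tolerates elements that stick out slightly, which is the "margin" the description refers to.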
- copy()
Make a deep copy.
- get_expand_bbox(dt: float)
Get the bbox expanded by a margin in both the x- and y-directions.
- Args:
dt (float): Expanding margin.
- Returns:
fitz.Rect: Expanded bbox.
Note
This method creates a new bbox, rather than changing the bbox of itself.
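The non-mutating behaviour described in the note can be sketched with plain tuples. The function name below is hypothetical, and the sketch assumes the margin is applied equally on all four sides.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def expand_bbox(bbox: BBox, dt: float) -> BBox:
    """Return a new bbox grown by `dt` on every side; the input is untouched."""
    x0, y0, x1, y1 = bbox
    return x0 - dt, y0 - dt, x1 + dt, y1 + dt


original = (10.0, 10.0, 50.0, 30.0)
expanded = expand_bbox(original, 2.0)
print(original)  # (10.0, 10.0, 50.0, 30.0)
print(expanded)  # (8.0, 8.0, 52.0, 32.0)
```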
- get_main_bbox(e, threshold: float = 0.95)
If the intersection with e exceeds the threshold, return the union of these two elements; else return None.
- Args:
e (Element): Target element.
threshold (float, optional): Intersection rate. Defaults to 0.95.
- Returns:
fitz.Rect: Union bbox or None.
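The "union or None" contract can be sketched as follows. This is an illustration only: it assumes the intersection rate is measured against the smaller of the two boxes, which the documentation does not state, and all helper names are hypothetical.

```python
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(b: BBox) -> float:
    """Area of a bbox; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def main_bbox(a: BBox, b: BBox, threshold: float = 0.95) -> Optional[BBox]:
    """Union of `a` and `b` if they overlap strongly enough, else None."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    smaller = min(area(a), area(b))
    if smaller == 0 or area(inter) / smaller < threshold:
        return None
    return min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])


big = (0.0, 0.0, 100.0, 100.0)
print(main_bbox(big, (10.0, 10.0, 50.0, 50.0)))      # (0.0, 0.0, 100.0, 100.0)
print(main_bbox(big, (90.0, 90.0, 150.0, 150.0)))    # None
```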
- horizontally_align_with(e, factor: float = 0.0, text_direction: bool = True)
Check whether two Element instances have enough intersection in the horizontal direction, i.e. along the reading direction.
- Args:
e (Element): Element to check with.
factor (float, optional): Threshold of overlap ratio; the larger it is, the more likely the two bboxes are truly aligned.
text_direction (bool, optional): Consider text direction or not. True by default.
Examples:
    +--------------+
    |              | L1  +--------------------+
    +--------------+     |                    | L2
                         +--------------------+
Sufficient intersection is defined relative to the smaller of the two box extents L1 and L2:
    L1+L2-L>factor*min(L1,L2)
Here L is presumably the total extent covered by both boxes in the same direction, so that L1+L2-L is the length of their overlap.
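The overlap criterion can be checked on plain one-dimensional intervals. In this sketch L is assumed to be the total extent spanned by both intervals, so that L1+L2-L is the overlap length; the function name is hypothetical and the code is not taken from pdf2docx.

```python
def overlaps_enough(start1: float, end1: float,
                    start2: float, end2: float,
                    factor: float = 0.0) -> bool:
    """Evaluate L1 + L2 - L > factor * min(L1, L2) for two intervals."""
    l1 = end1 - start1                                # extent of first box
    l2 = end2 - start2                                # extent of second box
    total = max(end1, end2) - min(start1, start2)     # L: span of both
    return l1 + l2 - total > factor * min(l1, l2)


# Intervals [0, 20] and [10, 40] overlap by 10; the smaller extent is 20.
print(overlaps_enough(0, 20, 10, 40, factor=0.4))  # True:  10 > 8
print(overlaps_enough(0, 20, 10, 40, factor=0.6))  # False: 10 > 12 fails
# Disjoint intervals give a negative "overlap" and never pass.
print(overlaps_enough(0, 20, 30, 40))              # False
```

With the default factor of 0.0, any positive overlap is enough; raising the factor demands that a larger share of the smaller box is covered.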
- in_same_row(e)
Check whether this instance is in the same row/line as the specified Element instance, with text direction considered.
Taking horizontal text as an example: the two elements are in the same row if the bottom edge of each box is lower than the centerline of the other one; otherwise, they are not.
- Args:
e (Element): Target object.
Note
The difference from horizontally_align_with: two elements may be aligned horizontally and yet not be in the same line.
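The centerline rule for horizontal text can be sketched with plain tuples. The function names are hypothetical, and the sketch relies on the fact that y grows downward in PyMuPDF page coordinates, so "lower" means a larger y value. A zero-threshold overlap check is included to show the case the note describes.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def in_same_row(a: BBox, b: BBox) -> bool:
    """Bottom edge of each box must lie at or below the other's centerline."""
    center_a = (a[1] + a[3]) / 2.0
    center_b = (b[1] + b[3]) / 2.0
    return a[3] >= center_b and b[3] >= center_a


def y_ranges_overlap(a: BBox, b: BBox) -> bool:
    """Loose alignment check: the vertical ranges share some positive length."""
    return min(a[3], b[3]) - max(a[1], b[1]) > 0


word = (0.0, 0.0, 50.0, 20.0)
neighbour = (60.0, 2.0, 110.0, 22.0)     # nearly the same vertical range
tall_block = (60.0, 18.0, 110.0, 60.0)   # overlaps `word` by only 2 units

print(in_same_row(word, neighbour))                                      # True
print(y_ranges_overlap(word, tall_block), in_same_row(word, tall_block)) # True False
```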
- property parent
- plot(page, stroke: tuple = (0, 0, 0), width: float = 0.5, fill: Optional[tuple] = None, dashes: Optional[str] = None)
Plot the bbox on a PDF page for debugging purposes.
- classmethod pure_rotation_matrix()
Pure rotation matrix used for calculating text direction after rotation.
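One plausible reading of "pure rotation matrix" is the global rotation matrix with its translation part removed, so that it can be applied to direction vectors rather than points. The sketch below illustrates that idea under this assumption; the names and the example matrix are hypothetical and not taken from pdf2docx.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def pure_rotation(m: Matrix) -> Matrix:
    """Keep the linear part (a, b, c, d) and zero the translation (e, f)."""
    a, b, c, d, _, _ = m
    return a, b, c, d, 0.0, 0.0


def transform_vector(vx: float, vy: float, m: Matrix) -> Tuple[float, float]:
    """Apply the matrix to a vector: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return a * vx + c * vy + e, b * vx + d * vy + f


# Illustrative 90-degree page rotation with a translation of 800 in x.
rotation_matrix: Matrix = (0.0, 1.0, -1.0, 0.0, 800.0, 0.0)

# A left-to-right reading direction on the un-rotated page...
direction = transform_vector(1.0, 0.0, pure_rotation(rotation_matrix))
print(direction)  # ...points down the page after rotation: (0.0, 1.0)
```

Without stripping the translation, the same vector would be shifted by 800 units and no longer represent a direction.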
- classmethod set_rotation_matrix(rotation_matrix)
Set the global rotation matrix.
- Args:
rotation_matrix (fitz.Matrix): Target matrix.
- store()
Store properties in raw dict.
- union_bbox(e)
Update the current bbox to its union with the specified Element.
- Args:
e (Element): The target element to unite with.
- Returns:
Element: self
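Because the method returns the instance itself, calls can be chained. The stand-in class below is hypothetical and only mimics that in-place, return-self contract with plain tuples.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


class Box:
    """Hypothetical stand-in for an object holding a bbox."""

    def __init__(self, bbox: BBox) -> None:
        self.bbox = bbox

    def union_bbox(self, other: "Box") -> "Box":
        """Grow this bbox in place to cover `other`, then return self."""
        a, b = self.bbox, other.bbox
        self.bbox = (min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3]))
        return self


line = Box((0.0, 0.0, 10.0, 10.0))
result = line.union_bbox(Box((5.0, 5.0, 20.0, 8.0))).union_bbox(Box((-2.0, 1.0, 3.0, 12.0)))
print(result is line)  # True
print(line.bbox)       # (-2.0, 0.0, 20.0, 12.0)
```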
- update_bbox(rect)
Update the current bbox to the specified rect.
- Args:
rect (fitz.Rect or list): bbox-like (x0, y0, x1, y1), in real page CS (with rotation considered).
- vertically_align_with(e, factor: float = 0.0, text_direction: bool = True)
Check whether two Element instances have enough intersection in the vertical direction, i.e. perpendicular to the reading direction.
- Args:
e (Element): Object to check with.
factor (float, optional): Threshold of overlap ratio; the larger it is, the more likely the two bboxes are truly aligned.
text_direction (bool, optional): Consider text direction or not. True by default.
- Returns:
bool: True if the two elements have enough intersection in the vertical direction.
Examples:
    +--------------+
    |              |
    +--------------+
            L1
            +-------------------+
            |                   |
            +-------------------+
                    L2
Sufficient intersection is defined relative to the smaller width of the two boxes:
    L1+L2-L>factor*min(L1,L2)
Here L is presumably the total width covered by both boxes, so that L1+L2-L is the length of their overlap.