pdf2docx.shape.Path module#
Objects representing PDF path (stroke and filling) extracted from pdf drawings and annotations.
Data structure based on results of page.get_drawings():
{
'color': (x,x,x) or None, # stroke color
'fill' : (x,x,x) or None, # fill color
'width': float, # line width
'closePath': bool, # whether to connect last and first point
'rect' : rect, # page area covered by this path
'items': [ # list of draw commands: lines, rectangle or curves.
("l", p1, p2), # a line from p1 to p2
("c", p1, p2, p3, p4), # cubic Bézier curve from p1 to p4, p2 and p3 are the control points
("re", rect), # a rect represented with two diagonal points
("qu", quad) # a quad represented with four corner points
],
...
}
Note
The coordinates extracted by page.get_drawings() are based on the real page coordinate system (CS), i.e. with rotation considered. This is different from page.get_text('rawdict').
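To make the raw path structure above concrete, here is a minimal sketch that walks one such dict. The sample data and the helper name `summarize_path` are hypothetical; points are written as plain tuples, whereas PyMuPDF returns its own point, rect and quad objects. Treating a non-None `color` or `fill` as "stroked" or "filled" is an assumption made for illustration and may differ from how the `is_stroke` and `is_fill` properties decide.

```python
from collections import Counter

# Hypothetical sample mirroring the documented structure (plain tuples only).
sample_path = {
    "color": (0.0, 0.0, 0.0),   # stroke color
    "fill": None,               # no fill color
    "width": 1.0,
    "closePath": False,
    "rect": (10.0, 10.0, 110.0, 60.0),
    "items": [
        ("l", (10.0, 10.0), (110.0, 10.0)),
        ("c", (110.0, 10.0), (120.0, 20.0), (120.0, 50.0), (110.0, 60.0)),
        ("re", (10.0, 10.0, 110.0, 60.0)),
    ],
}


def summarize_path(path: dict) -> dict:
    """Count draw commands and report whether stroke/fill colors are set."""
    commands = Counter(item[0] for item in path["items"])
    return {
        "stroked": path.get("color") is not None,  # assumption, see lead-in
        "filled": path.get("fill") is not None,    # assumption, see lead-in
        "commands": dict(commands),
    }


summary = summarize_path(sample_path)
```

The first element of each entry in `items` is the command tag ("l", "c", "re" or "qu"), which is what the segment classes below are keyed on.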
- class pdf2docx.shape.Path.C(item)#
Bases:
Segment
Bézier curve path with source ("c", p1, p2, p3, p4).
- class pdf2docx.shape.Path.L(item)#
Bases:
Segment
Line path with source ("l", p1, p2).
- property length#
- to_strokes(width: float, color: list)#
Convert to stroke dict.
- Args:
width (float): Specify width for the stroke.
color (list): Specify color for the stroke.
- Returns:
list: A list of Stroke dicts.
Note
A line corresponds to exactly one stroke, but for consistency with the other to_strokes() methods the stroke dict is returned inside a list, so the list always has length 1.
- class pdf2docx.shape.Path.Path(raw: dict)#
Bases:
object
Path extracted from PDF, consisting of one or more Segments.
- property is_fill#
- property is_iso_oriented#
It is iso-oriented when all contained segments are iso-oriented.
- property is_stroke#
- plot(canvas)#
Plot path for debug purpose.
- Args:
canvas: PyMuPDF drawing canvas created by page.new_shape().
- to_shapes()#
Convert path to Shape raw dicts.
- Returns:
list: A list of Shape dicts.
- class pdf2docx.shape.Path.R(item)#
Bases:
Segment
Rect path with source ("re", rect).
- to_strokes(width: float, color: list)#
Convert each edge to stroke dict.
- Args:
width (float): Specify width for the stroke.
color (list): Specify color for the stroke.
- Returns:
list: A list of Stroke dicts.
Note
One Rect path is converted to a list of 4 stroke dicts.
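The one-rect-to-four-strokes conversion noted above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `rect_to_edge_strokes` is hypothetical, the rect is taken as a plain `(x0, y0, x1, y1)` tuple, and the dict keys `start`, `end`, `width` and `color` are assumed for the example rather than taken from the Stroke documentation.

```python
def rect_to_edge_strokes(rect, width: float, color: list) -> list:
    """Split a rectangle into four edge dicts, one per side."""
    x0, y0, x1, y1 = rect
    # Corner points in order, repeating the first to close the outline.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    return [
        {
            "start": corners[i],      # hypothetical key names
            "end": corners[i + 1],
            "width": width,
            "color": color,
        }
        for i in range(4)
    ]


edges = rect_to_edge_strokes((0, 0, 10, 5), width=1.0, color=[0, 0, 0])
```

Each edge starts where the previous one ends, and the last edge returns to the first corner.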
- class pdf2docx.shape.Path.Segment(item)#
Bases:
object
A segment of a path, e.g. a line, a rectangle or a curve.
- to_strokes(width: float, color: list)#
- class pdf2docx.shape.Path.Segments(items: list, close_path=False)#
Bases:
object
A sub-path composed of one or more segments.
- property area#
Calculate segments area with Green's formula. Note that the boundary of a Bézier curve is simplified to its control points.
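For a closed polyline, Green's formula reduces to the shoelace sum over consecutive vertices. The sketch below shows that computation on plain `(x, y)` tuples; the function name `polygon_area` is hypothetical, and closing the loop automatically when the first point is not repeated is a choice made for this example rather than documented behavior. Under the simplification described above, a Bézier segment would contribute its control points as ordinary vertices.

```python
def polygon_area(points) -> float:
    """Area enclosed by a closed polyline (shoelace form of Green's formula)."""
    points = list(points)
    if len(points) < 3:
        return 0.0
    # Close the loop if the caller did not repeat the first point.
    if points[0] != points[-1]:
        points.append(points[0])
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        twice_area += x0 * y1 - x1 * y0
    # The sign only encodes orientation, so take the absolute value.
    return abs(twice_area) / 2.0


rect_area = polygon_area([(0, 0), (10, 0), (10, 5), (0, 5)])
tri_area = polygon_area([(0, 0), (10, 0), (0, 10)])
```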
- property bbox#
Calculate segments bbox.
- property is_iso_oriented#
ISO-oriented criterion: the ratio of the real area to the bbox area exceeds 0.9.
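The criterion above amounts to a single ratio test. A minimal sketch, assuming the enclosed area is already known and the bbox is a plain `(x0, y0, x1, y1)` tuple; the function name `is_iso_oriented_sketch` is hypothetical, and returning False for a zero-area bbox is an assumption of this example.

```python
def is_iso_oriented_sketch(area: float, bbox, threshold: float = 0.9) -> bool:
    """Return True when the shape fills more than `threshold` of its bbox."""
    x0, y0, x1, y1 = bbox
    bbox_area = abs(x1 - x0) * abs(y1 - y0)
    if bbox_area == 0:
        return False  # assumption: a degenerate bbox is not iso-oriented
    return area / bbox_area > threshold


# An axis-aligned 10 x 5 rectangle fills its bbox completely.
rect_ok = is_iso_oriented_sketch(50.0, (0, 0, 10, 5))
# A right triangle with legs of 10 fills only half of its 10 x 10 bbox.
tri_ok = is_iso_oriented_sketch(50.0, (0, 0, 10, 10))
```

An axis-aligned rectangle passes with a ratio of 1.0, while the triangle fails with a ratio of 0.5.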
- property points#
Connected points of segments.
- to_fill(color: list)#
Convert the closed area of the segments to a Fill dict.
- Args:
color (list): Specify fill color.
- Returns:
dict: Fill dict.
- to_strokes(width: float, color: list)#
Convert each segment to a Stroke dict.
- Args:
width (float): Specify stroke width.
color (list): Specify stroke color.
- Returns:
list: A list of Stroke dicts.