pdf2docx.shape.Shapes module

A group of Shape instances.

class pdf2docx.shape.Shapes.Shapes(instances: Optional[list] = None, parent=None)

Bases: ElementCollection

A collection of Shape instances: Stroke or Fill.

assign_to_tables(tables: list)

Add each Shape to the associated cells of the given tables.

Args:

tables (list): A list of TableBlock instances.
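As a rough illustration of what assigning shapes to cells can involve, the sketch below places each shape in the first cell whose bounding box fully contains it. The `Rect` alias and the `contains` and `assign` helpers are hypothetical names used only for this example, and the containment rule is an assumption; the matching rule used by pdf2docx may differ (for example, it may work on intersections instead).

```python
from typing import Dict, List, Tuple

# Hypothetical rectangle representation: (x0, y0, x1, y1).
Rect = Tuple[float, float, float, float]


def contains(outer: Rect, inner: Rect) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def assign(shapes: List[Rect], cells: Dict[str, Rect]) -> Dict[str, List[Rect]]:
    """Map each cell name to the shapes it contains (assumed rule)."""
    assigned: Dict[str, List[Rect]] = {name: [] for name in cells}
    for shape in shapes:
        for name, bbox in cells.items():
            if contains(bbox, shape):
                assigned[name].append(shape)
                break  # a shape is placed in at most one cell
    return assigned


cells = {"A": (0, 0, 100, 50), "B": (100, 0, 200, 50)}
shapes = [
    (10, 40, 90, 41),        # thin line inside cell A
    (110, 10, 190, 30),      # filled area inside cell B
    (300, 300, 310, 310),    # outside every cell, left unassigned
]
assigned = assign(shapes, cells)
```

Shapes that fall outside every cell are simply left out of the result in this sketch.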

clean_up(max_border_width: float, shape_min_dimension: float)

Clean rectangles.

  • Delete shapes outside the page.

  • Delete small shapes (judged by width and height; see shape_min_dimension).

  • Merge shapes with the same fill color.

  • Detect semantic type.

Args:

max_border_width (float): The max border width.

shape_min_dimension (float): Ignore a shape if both its width and height are lower than this value.
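The first two cleaning steps can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `Box` and `filter_shapes` are hypothetical names, and the size test assumes a shape counts as small only when both its width and its height are below `shape_min_dimension`, as the argument description states.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned rectangle; not a pdf2docx type."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def filter_shapes(boxes: List[Box], page: Box, shape_min_dimension: float) -> List[Box]:
    """Keep boxes that overlap the page and are not too small."""
    kept = []
    for b in boxes:
        # Skip boxes that have no overlap with the page.
        if b.x1 <= page.x0 or b.x0 >= page.x1 or b.y1 <= page.y0 or b.y0 >= page.y1:
            continue
        # Assumption: small means BOTH dimensions are below the threshold.
        if max(b.width, b.height) < shape_min_dimension:
            continue
        kept.append(b)
    return kept


page = Box(0, 0, 595, 842)
boxes = [
    Box(10, 10, 200, 10.5),    # thin but long line: kept
    Box(50, 50, 50.5, 50.5),   # tiny dot: dropped
    Box(700, 10, 800, 20),     # entirely right of the page: dropped
]
kept = filter_shapes(boxes, page, shape_min_dimension=2.0)
```

Testing the larger of the two dimensions keeps thin lines such as borders and underlines, which are narrow in one direction only.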

property fillings

Fill Shapes, including cell shading and highlight.

property hyperlinks

Hyperlink Shapes.

plot(page)

Plot shapes for debugging purposes. Different colors are used to display the shapes according to their detected semantic types, e.g. yellow for text-based shapes (stroke, underline and highlight). Due to overlaps between the Stroke and Fill related groups, some shapes are plotted twice.

Args:

page (fitz.Page): PDF page.

restore(raws: list)

Clear the current instances and restore them from a list of source dicts.
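The clear-then-rebuild pattern described here can be sketched as follows. `ShapeStore` is a hypothetical stand-in rather than the pdf2docx class, and the dict keys (`start`, `end`, `width`, `bbox`) and the rule that a `start` key marks a stroke are assumptions made for illustration only.

```python
from typing import List


class ShapeStore:
    """Hypothetical stand-in for a shape collection; not the pdf2docx class."""

    def __init__(self) -> None:
        self.instances: List[dict] = []

    def restore(self, raws: List[dict]) -> "ShapeStore":
        # Drop whatever is currently held, then rebuild from the dicts.
        self.instances.clear()
        for raw in raws:
            # Assumption: a dict with a "start" key describes a stroke,
            # anything else describes a fill.
            kind = "stroke" if "start" in raw else "fill"
            self.instances.append({"kind": kind, **raw})
        return self


store = ShapeStore()
store.restore([{"bbox": (0, 0, 10, 10)}])
# A second call replaces the earlier content instead of appending to it.
store.restore([
    {"start": (0, 0), "end": (100, 0), "width": 1.0},
    {"bbox": (10, 10, 50, 20)},
])
kinds = [s["kind"] for s in store.instances]
```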

property strokes

Stroke Shapes, including table border, text underline and strike-through.

property table_fillings

Potential table shadings.

property table_strokes

Potential table borders.

property text_style_shapes

Potential text style based shapes, e.g. underline, strike-through, highlight and hyperlink.
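One way to picture these filtered views is a collection that tags each item with candidate semantic flags and exposes properties that select by flag. The sketch below is an assumption about the general pattern, not the library's code: `Kind`, `Item` and `Group` are hypothetical names, and only the property names mirror the ones documented above.

```python
from enum import Flag, auto
from typing import Iterable, List, NamedTuple


class Kind(Flag):
    """Hypothetical semantic flags; an item may carry several candidates."""
    BORDER = auto()
    UNDERLINE = auto()
    STRIKE = auto()
    SHADING = auto()
    HIGHLIGHT = auto()
    HYPERLINK = auto()


class Item(NamedTuple):
    name: str
    kind: Kind


class Group:
    """Hypothetical collection exposing flag-filtered views."""

    def __init__(self, items: Iterable[Item]) -> None:
        self.items: List[Item] = list(items)

    def _select(self, mask: Kind) -> List[Item]:
        # An item matches if it shares at least one flag with the mask.
        return [i for i in self.items if i.kind & mask]

    @property
    def table_strokes(self) -> List[Item]:
        return self._select(Kind.BORDER)

    @property
    def table_fillings(self) -> List[Item]:
        return self._select(Kind.SHADING)

    @property
    def text_style_shapes(self) -> List[Item]:
        return self._select(
            Kind.UNDERLINE | Kind.STRIKE | Kind.HIGHLIGHT | Kind.HYPERLINK)


group = Group([
    Item("line", Kind.BORDER | Kind.UNDERLINE),  # ambiguous: border or underline
    Item("background", Kind.SHADING),
    Item("marker", Kind.HIGHLIGHT),
])
border_names = [i.name for i in group.table_strokes]
text_names = [i.name for i in group.text_style_shapes]
```

Because the ambiguous line carries two candidate flags, it appears in both `table_strokes` and `text_style_shapes`, which is the kind of overlap that can make a shape show up in more than one group.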