pdf2docx.table.TablesConstructor module
Parsing table blocks.
lattice table
: explicit borders represented by strokes.
stream table
: borderless table recognized from the layout of text blocks.
Term definitions:
From the appearance aspect, we say stroke and fill: the former looks like a line, while the latter looks like an area.
From the semantic aspect, we say border (cell border) and shading (cell shading).
An explicit border is determined by a certain stroke, while a stroke may also represent an underline of text.
An explicit shading is determined by a fill, while a fill may also represent a highlight of text.
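The appearance-level distinction between stroke and fill can be illustrated with a minimal sketch. This is not the pdf2docx implementation: the `Rect` helper, the `classify_shape` function, and the default of 6.0 points are hypothetical and chosen only for illustration. The sketch assumes a rectangle counts as a stroke when its thinner side does not exceed a maximum border width.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle in PDF points (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def classify_shape(rect: Rect, max_border_width: float = 6.0) -> str:
    """Return 'stroke' for a line-like rectangle, otherwise 'fill'.

    Assumption: a shape looks like a line when its thinner side is
    no larger than max_border_width (illustrative default).
    """
    thickness = min(rect.width, rect.height)
    return "stroke" if thickness <= max_border_width else "fill"


thin = Rect(0, 0, 100, 1)    # 1 pt tall: looks like a line
area = Rect(0, 0, 100, 40)   # 40 pt tall: looks like an area
print(classify_shape(thin), classify_shape(area))
```

Whether a stroke is then a cell border or a text underline (and whether a fill is a shading or a highlight) is a semantic decision that needs surrounding context, which this sketch does not attempt.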
A Border object is introduced to determine the borders of a stream table. A Border instance is a virtual border that is adaptive within a certain range; once finalized, it is converted to a stroke, and that stroke is then used to detect the table border.
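The idea of a virtual border that floats within a range until it is finalized can be sketched as follows. The `VirtualBorder` class and its `finalize` method are hypothetical names, not the library's Border API, and the finalization rule (prefer an explicit stroke inside the range, otherwise use the centre of the range) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VirtualBorder:
    """Hypothetical border that may sit anywhere in [lower, upper]."""
    lower: float
    upper: float
    position: Optional[float] = None  # set once the border is finalized

    def finalize(self, stroke_positions: Sequence[float]) -> float:
        """Fix the border position and return it.

        Assumption: use the first explicit stroke found inside the valid
        range; if there is none, fall back to the centre of the range.
        """
        for pos in stroke_positions:
            if self.lower <= pos <= self.upper:
                self.position = pos
                break
        else:
            # No explicit stroke in range: pick the middle of the range.
            self.position = (self.lower + self.upper) / 2.0
        return self.position


border = VirtualBorder(lower=45.0, upper=60.0)
print(border.finalize([10.0, 52.0, 58.0]))
```

Keeping the range open until finalization lets neighbouring borders be aligned with each other or with explicit strokes before a single position is committed.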
- class pdf2docx.table.TablesConstructor.TablesConstructor(parent)
Bases: object
Object parsing TableBlock for specified Layout.
- lattice_tables(connected_border_tolerance: float, min_border_clearance: float, max_border_width: float)
Parse table with explicit borders/shadings represented by rectangle shapes.
- Args:
connected_border_tolerance (float): Two borders are considered intersecting if the gap between them is lower than this value.
min_border_clearance (float): The minimum allowable clearance between two borders.
max_border_width (float): The maximum border width.
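To make the role of connected_border_tolerance concrete, here is a minimal sketch of a gap test between one horizontal and one vertical stroke. The function name `borders_connected`, the tuple layout of the strokes, and the default value of 0.5 are assumptions for illustration only; the library's own intersection test may differ.

```python
from typing import Tuple

HStroke = Tuple[float, float, float]  # (x0, x1, y) of a horizontal stroke
VStroke = Tuple[float, float, float]  # (x, y0, y1) of a vertical stroke


def borders_connected(h: HStroke, v: VStroke,
                      connected_border_tolerance: float = 0.5) -> bool:
    """Treat two strokes as intersecting if the gap between them is
    lower than the tolerance along both axes (hypothetical helper)."""
    hx0, hx1, hy = h
    vx, vy0, vy1 = v
    # Distance from the vertical stroke's x to the horizontal extent
    # (0 when x already lies inside [hx0, hx1]).
    gap_x = max(hx0 - vx, vx - hx1, 0.0)
    # Distance from the horizontal stroke's y to the vertical extent.
    gap_y = max(vy0 - hy, hy - vy1, 0.0)
    return gap_x < connected_border_tolerance and gap_y < connected_border_tolerance


top_edge = (0.0, 100.0, 50.0)
print(borders_connected(top_edge, (100.3, 0.0, 50.0)))  # small gap at the corner
print(borders_connected(top_edge, (102.0, 0.0, 50.0)))  # gap wider than the tolerance
```

A small positive tolerance lets corners that do not quite touch in the PDF drawing still form a closed cell.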
- stream_tables(min_border_clearance: float, max_border_width: float, line_separate_threshold: float)
Parse table with layout of text/image blocks, and update borders with explicit borders represented by rectangle shapes.
Refer to lattice_tables for the description of the arguments.
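A stream table has no drawn borders, so its borders have to be inferred from where the content sits. The sketch below shows one simple way to derive inner column borders from the horizontal extents of text blocks. It is not the algorithm used by stream_tables: the function `column_borders`, the mid-gap placement, and the way min_border_clearance is applied (dropping a candidate that lies too close to the previously kept border) are assumptions for illustration.

```python
from typing import List, Tuple


def column_borders(extents: List[Tuple[float, float]],
                   min_border_clearance: float = 2.0) -> List[float]:
    """Return x positions of inferred vertical borders between columns.

    extents: (x0, x1) horizontal extent of each text block.
    """
    # 1. Merge overlapping extents so each merged span is one column.
    merged: List[List[float]] = []
    for x0, x1 in sorted(extents):
        if merged and x0 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])

    # 2. Place a candidate border in the middle of each gap between columns.
    candidates = [(left[1] + right[0]) / 2.0
                  for left, right in zip(merged, merged[1:])]

    # 3. Keep only candidates at least min_border_clearance apart.
    borders: List[float] = []
    for x in candidates:
        if not borders or x - borders[-1] >= min_border_clearance:
            borders.append(x)
    return borders


blocks = [(0, 40), (10, 45), (60, 100), (120, 150)]
print(column_borders(blocks))
```

In this sketch the first two blocks overlap horizontally and are treated as one column, so only two inner borders are produced for the three resulting columns.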