pdf2docx.table.TableBlock module
Table block object parsed from raw image and text blocks.
Data Structure:
    {
        'type': int,
        'bbox': (x0, y0, x1, y1),
        'rows': [
            {
                'bbox': (x0, y0, x1, y1),
                'height': float,
                'cells': [
                    {
                        'bbox': (x0, y0, x1, y1),
                        'border_color': (sRGB,,,), # top, right, bottom, left
                        'bg_color': sRGB,
                        'border_width': (,,,),
                        'merged_cells': (x,y), # this is the bottom-right cell of the merged region: x rows, y cols
                        'blocks': [ {text blocks} ]
                    }, # end of cell
                    {},
                    None, # merged cell
                    ...
                ]
            }, # end of row
            {...} # more rows
        ] # end of rows
    }
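For illustration, the schema can be instantiated as a plain Python dict. The sketch below builds a hypothetical 2x2 table in which cell "b" spans both rows, so the slot it covers in the second row is None. The type code, the colors and the simplified {"text": ...} blocks are assumptions, and the optional 'merged_cells' key is omitted. Real pdf2docx output will differ in detail.

```python
# Hand-built instance of the Data Structure above (illustrative values only).
BLACK, WHITE = 0x000000, 0xFFFFFF  # sRGB colors as plain integers


def make_cell(bbox, text):
    """One cell dict; 'blocks' holds a simplified stand-in for real text blocks."""
    return {
        "bbox": bbox,
        "border_color": (BLACK, BLACK, BLACK, BLACK),  # top, right, bottom, left
        "bg_color": WHITE,
        "border_width": (1.0, 1.0, 1.0, 1.0),
        "blocks": [{"text": text}],  # hypothetical simplified block
    }


table = {
    "type": 3,  # hypothetical block type code
    "bbox": (0, 0, 200, 40),
    "rows": [
        {"bbox": (0, 0, 200, 20), "height": 20.0,
         # cell "b" spans both rows, so its bbox reaches y = 40
         "cells": [make_cell((0, 0, 100, 20), "a"), make_cell((100, 0, 200, 40), "b")]},
        {"bbox": (0, 20, 200, 40), "height": 20.0,
         # the second slot is covered by "b", hence None
         "cells": [make_cell((0, 20, 100, 40), "c"), None]},
    ],
}


def count_cells(table):
    """Return (real cells, merged placeholders) over all rows."""
    slots = [c for row in table["rows"] for c in row["cells"]]
    merged = sum(1 for c in slots if c is None)
    return len(slots) - merged, merged


print(count_cells(table))  # (3, 1)
```

Every row keeps one slot per grid column. This is why a merged region shows up as None placeholders rather than as shorter rows.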
- class pdf2docx.table.TableBlock.TableBlock(raw: Optional[dict] = None)
Bases: Block
Table block.
- append(row: Row)
Append a row to the table and update the table bbox accordingly.
- Args:
row (Row): Target row to add.
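Updating the bbox on append amounts to taking the union of the current table bbox and the new row's bbox. This is a minimal sketch under that assumption, using plain (x0, y0, x1, y1) tuples and a hypothetical TableSketch class in place of TableBlock and Row. PyMuPDF's fitz.Rect offers the same union through the | operator.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class TableSketch:
    """Hypothetical stand-in for TableBlock: tracks only row bboxes and the table bbox."""
    rows: List[Rect] = field(default_factory=list)
    bbox: Optional[Rect] = None

    def append(self, row_bbox: Rect) -> None:
        self.rows.append(row_bbox)
        # The first row defines the bbox; later rows can only enlarge it.
        self.bbox = row_bbox if self.bbox is None else union(self.bbox, row_bbox)


table = TableSketch()
table.append((0, 0, 100, 20))
table.append((0, 20, 120, 40))
print(table.bbox)  # (0, 0, 120, 40)
```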
- assign_blocks(blocks: list)
Assign blocks to their associated cells.
- Args:
blocks (list): A list of text/table blocks.
- assign_shapes(shapes: list)
Assign shapes to their associated cells.
- Args:
shapes (list): A list of Shape objects.
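Both assign_blocks and assign_shapes map items to cells by geometry. The sketch below places each item in the first cell whose bbox contains the item's center. This criterion is purely an assumption: pdf2docx may use intersection ratios or other tolerances instead. Text blocks and shapes both carry a bbox, so the same hypothetical assign helper works for either. Items that fall outside every cell are left unassigned.

```python
from typing import Dict, Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def contains(cell: Rect, point: Tuple[float, float]) -> bool:
    x, y = point
    return cell[0] <= x <= cell[2] and cell[1] <= y <= cell[3]


def assign(items: Sequence[Rect],
           cells: Sequence[Sequence[Optional[Rect]]]) -> Dict[int, Tuple[int, int]]:
    """Map item index -> (row, col) of the first cell containing the item's center."""
    result: Dict[int, Tuple[int, int]] = {}
    for k, item in enumerate(items):
        point = center(item)
        # Scan cells row by row; merged placeholders (None) are skipped.
        hit = next(((i, j) for i, row in enumerate(cells)
                    for j, cell in enumerate(row)
                    if cell is not None and contains(cell, point)), None)
        if hit is not None:
            result[k] = hit
    return result


cells = [[(0, 0, 100, 20), (100, 0, 200, 40)],
         [(0, 20, 100, 40), None]]          # None: slot covered by a merged cell
items = [(10, 2, 50, 18), (120, 25, 180, 35), (300, 0, 310, 10)]
print(assign(items, cells))  # {0: (0, 0), 1: (0, 1)}
```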
- bbox: fitz.Rect
- make_docx(table)
Create docx table.
- Args:
table (Table): python-docx table instance.
- property num_cols
Number of columns.
- property num_rows
Number of rows.
- property outer_bbox
Outer bbox, taking border widths into account.
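The three properties reduce to simple arithmetic on the row/cell grid. In the sketch below, num_cols assumes every row holds one slot per grid column, with merged slots included as None. outer_bbox assumes each border stroke is centered on the cell edge, so the bbox grows by half a stroke width on each side. Both assumptions, and the single table-wide border_width tuple, are simplifications rather than pdf2docx's actual implementation.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]      # (x0, y0, x1, y1)
Widths = Tuple[float, float, float, float]    # top, right, bottom, left


def num_rows(grid: List[List[Optional[dict]]]) -> int:
    return len(grid)


def num_cols(grid: List[List[Optional[dict]]]) -> int:
    # Assumes a rectangular grid: each row has one slot per column, None included.
    return len(grid[0]) if grid else 0


def outer_bbox(bbox: Rect, border_width: Widths) -> Rect:
    """Expand bbox by half of each border width (stroke assumed centered on the edge)."""
    x0, y0, x1, y1 = bbox
    top, right, bottom, left = border_width
    return (x0 - left / 2, y0 - top / 2, x1 + right / 2, y1 + bottom / 2)


grid = [[{}, {}], [{}, None]]
print(num_rows(grid), num_cols(grid))              # 2 2
print(outer_bbox((0, 0, 100, 50), (2, 2, 2, 2)))   # (-1.0, -1.0, 101.0, 51.0)
```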
- parse(**settings)
Parse the layout of the content inside each cell.
- Args:
settings (dict): Layout parsing parameters.
- plot(page)
Plot the table block (cells, lines and spans) for debugging purposes.
- Args:
page (fitz.Page): PDF page.
content (bool): Plot text blocks contained in cells if True.
style (bool): Plot cell style if True, e.g. border width, shading.
color (bool): Plot border stroke color if style=False.
- store()
Store attributes in JSON format.
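If store() yields attributes in JSON-compatible form, a stored table should survive a round trip through Python's json module. The sketch below assumes the stored value is a plain dict following the Data Structure above, with illustrative values. JSON has no tuple type, so bbox tuples come back as lists and None slots come back as None (null in the JSON text). Any loader therefore has to convert bboxes back into tuples or fitz.Rect.

```python
import json

# Hypothetical output of store(): a plain dict following the Data Structure above.
stored = {
    "type": 3,  # hypothetical block type code
    "bbox": (0, 0, 200, 20),
    "rows": [{"bbox": (0, 0, 200, 20), "height": 20.0,
              "cells": [{"bbox": (0, 0, 100, 20), "blocks": []}, None]}],
}

text = json.dumps(stored)     # tuples are written as JSON arrays, None as null
restored = json.loads(text)

print(restored["bbox"])                  # [0, 0, 200, 20]
print(restored["rows"][0]["cells"][1])   # None
```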
- property text
Get the text contained in each cell.
- Returns:
list: 2D list in which each element is the text of the corresponding cell.
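Here is a minimal sketch of deriving such a 2D text list from the stored structure. It assumes each block is simplified to a dict with a 'text' key and that a cell's blocks are joined with newlines. Keeping None for merged placeholders is also an assumption about how the real property treats them. The cell_texts helper is hypothetical.

```python
from typing import List, Optional


def cell_texts(rows: List[dict]) -> List[List[Optional[str]]]:
    """2D list of cell texts; merged placeholders (None) stay None."""
    out = []
    for row in rows:
        texts = []
        for cell in row["cells"]:
            if cell is None:
                texts.append(None)
            else:
                # Join the text of all blocks in the cell, one block per line.
                texts.append("\n".join(b["text"] for b in cell["blocks"]))
        out.append(texts)
    return out


rows = [{"cells": [{"blocks": [{"text": "a"}, {"text": "b"}]}, None]},
        {"cells": [{"blocks": [{"text": "c"}]}, {"blocks": []}]}]
result = cell_texts(rows)
```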