pdf2docx.table.TableBlock module

Table block object parsed from raw image and text blocks.

Data Structure:

{
    'type': int,
    'bbox': (x0, y0, x1, y1),
    'rows': [
        {
            "bbox": (x0, y0, x1, y1),
            "height": float,
            "cells": [
                {
                    'bbox': (x0, y0, x1, y1),
                    'border_color': (sRGB,,,), # top, right, bottom, left
                    'bg_color': sRGB,
                    'border_width': (,,,),
                    'merged_cells': (x,y), # this is the bottom-right cell of merged region: x rows, y cols
                    'blocks': [ {text blocks} ]
                }, # end of cell
                {},
                None, # merged cell
                ...
            ]
        }, # end of row
        {...} # more rows
    ] # end of rows
}
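
As a minimal sketch of how this structure can be traversed, the snippet below builds a small hand-written table dict in the documented shape (only a subset of the keys) and lists the positions of the cells that hold content. The sample values, the `iter_cells` helper, and the reading that a merged region stores its span on its first cell while the covered positions are `None` are assumptions for illustration, not part of pdf2docx.

```python
# Hand-written sample in the documented shape; all values are made up.
sample_table = {
    'type': 3,  # placeholder integer; the real type codes are not listed here
    'bbox': (0.0, 0.0, 200.0, 40.0),
    'rows': [
        {
            'bbox': (0.0, 0.0, 200.0, 20.0),
            'height': 20.0,
            'cells': [
                {
                    'bbox': (0.0, 0.0, 200.0, 20.0),
                    'merged_cells': (1, 2),  # assumed: spans 1 row, 2 columns
                    'blocks': [],
                },
                None,  # position covered by the merged cell
            ],
        },
        {
            'bbox': (0.0, 20.0, 200.0, 40.0),
            'height': 20.0,
            'cells': [
                {'bbox': (0.0, 20.0, 100.0, 40.0), 'merged_cells': (1, 1), 'blocks': []},
                {'bbox': (100.0, 20.0, 200.0, 40.0), 'merged_cells': (1, 1), 'blocks': []},
            ],
        },
    ],
}


def iter_cells(table):
    """Yield (row_index, col_index, cell) for every non-empty cell."""
    for i, row in enumerate(table['rows']):
        for j, cell in enumerate(row['cells']):
            if cell:  # skips None and empty {} placeholders
                yield i, j, cell


num_rows = len(sample_table['rows'])
num_cols = len(sample_table['rows'][0]['cells'])
positions = [(i, j) for i, j, _ in iter_cells(sample_table)]
```

Keeping the `None` placeholders in each row means every row has the same length, so a column index always refers to the same grid position.
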
class pdf2docx.table.TableBlock.TableBlock(raw: Optional[dict] = None)

Bases: Block

Table block.

append(row: Row)

Append row to table and update bbox accordingly.

Args:

row (Row): Target row to add.
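
One plausible reading of "update bbox accordingly" is that the table's bbox grows to the union of its current bbox and the new row's bbox. The sketch below illustrates that idea with plain tuples; `union_bbox` and `SketchTable` are hypothetical names, and the union rule itself is an assumption rather than a statement of how pdf2docx implements `append`.

```python
def union_bbox(a, b):
    """Return the smallest (x0, y0, x1, y1) rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class SketchTable:
    """Hypothetical stand-in for a table that tracks rows and a bbox."""

    def __init__(self):
        self.rows = []
        self.bbox = None  # no extent until the first row arrives

    def append(self, row_bbox):
        # Store the row, then grow the table bbox to cover it.
        self.rows.append(row_bbox)
        if self.bbox is None:
            self.bbox = row_bbox
        else:
            self.bbox = union_bbox(self.bbox, row_bbox)


table = SketchTable()
table.append((10.0, 10.0, 110.0, 30.0))
table.append((10.0, 30.0, 110.0, 55.0))
```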

assign_blocks(blocks: list)

Assign blocks to their associated cells.

Args:

blocks (list): A list of text/table blocks.
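
To make the idea of assigning blocks to cells concrete, here is a simplified sketch that places each block in the first cell whose bbox contains the block's center point. The center-point rule and the helper names (`center`, `contains`, `assign_by_center`) are assumptions chosen for brevity; the library may use a different geometric criterion.

```python
def center(bbox):
    """Return the center point of an (x0, y0, x1, y1) rectangle."""
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def contains(bbox, point):
    """True if the point lies inside or on the edge of the rectangle."""
    x0, y0, x1, y1 = bbox
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def assign_by_center(cells, blocks):
    """Append each block to the first matching cell; return unmatched blocks."""
    unassigned = []
    for block in blocks:
        for cell in cells:
            if contains(cell['bbox'], center(block['bbox'])):
                cell['blocks'].append(block)
                break
        else:
            # No cell contained the block's center.
            unassigned.append(block)
    return unassigned


cells = [
    {'bbox': (0, 0, 100, 20), 'blocks': []},
    {'bbox': (100, 0, 200, 20), 'blocks': []},
]
blocks = [
    {'bbox': (5, 2, 60, 18), 'text': 'left'},
    {'bbox': (110, 2, 190, 18), 'text': 'right'},
    {'bbox': (300, 300, 350, 320), 'text': 'outside'},
]
leftover = assign_by_center(cells, blocks)
```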

assign_shapes(shapes: list)

Assign shapes to their associated cells.

Args:

shapes (list): A list of Shape.

bbox: fitz.Rect

make_docx(table)

Create docx table.

Args:

table (Table): python-docx table instance.

property num_cols

Count of columns.

property num_rows

Count of rows.

property outer_bbox

Outer bbox with border considered.

parse(**settings)

Parse the layout within each cell.

Args:

settings (dict): Layout parsing parameters.

plot(page)

Plot table block, i.e. cell/line/span, for debugging purposes.

Args:

page (fitz.Page): PDF page.
content (bool): Plot text blocks contained in cells if True.
style (bool): Plot cell style if True, e.g. border width, shading.
color (bool): Plot border stroke color if style=False.

store()

Store attributes in JSON format.

property text

Get text contained in each cell.

Returns:

list: 2D-list with each element representing text in cell.
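
A 2D list of this kind is straightforward to export. The sketch below writes a made-up stand-in for the `text` value to CSV with the standard library. The sample data, the `to_csv` helper, and the assumption that positions covered by a merged cell appear as `None` are illustrative only.

```python
import csv
import io

# Made-up stand-in for the value of the `text` property.
table_text = [
    ['Item', 'Qty'],
    ['Apples', '3'],
    ['Total', None],  # assumed: None marks a position covered by a merged cell
]


def to_csv(rows):
    """Serialize a 2D list of cell text to CSV, writing None as an empty field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for row in rows:
        writer.writerow('' if value is None else value for value in row)
    return buffer.getvalue()


csv_text = to_csv(table_text)
```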