pdf2docx.text.Line module

Text Line objects based on the PDF raw dict extracted with PyMuPDF.

Data structure of a line in a text block, following the PyMuPDF raw dict layout:

{
    'bbox': (x0,y0,x1,y1),
    'wmode': m,
    'dir': [x,y],
    'spans': [ spans ]
}
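As a rough illustration of the structure above, the following sketch builds a line dictionary by hand. All values are made up, and the `text` key on each span is a simplification assumed here for readability (a real raw dict stores per-character data in each span). The helper `union_bbox` is hypothetical and not part of pdf2docx.

```python
# Illustrative values only; the top-level keys follow the structure shown above.
raw_line = {
    "bbox": (72.0, 700.0, 130.0, 712.0),  # (x0, y0, x1, y1)
    "wmode": 0,                           # writing mode
    "dir": (1.0, 0.0),                    # writing direction vector
    "spans": [
        # Simplified spans: the "text" key is an assumption of this sketch.
        {"bbox": (72.0, 700.0, 100.0, 712.0), "text": "Hello "},
        {"bbox": (100.0, 700.0, 130.0, 712.0), "text": "world"},
    ],
}


def union_bbox(spans):
    """Return the smallest rectangle enclosing every span bbox (hypothetical helper)."""
    x0 = min(s["bbox"][0] for s in spans)
    y0 = min(s["bbox"][1] for s in spans)
    x1 = max(s["bbox"][2] for s in spans)
    y1 = max(s["bbox"][3] for s in spans)
    return (x0, y0, x1, y1)


line_bbox = union_bbox(raw_line["spans"])
```

In this hand-made example the line bbox was chosen to match the union of its span boxes; real extracted data need not match exactly.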

class pdf2docx.text.Line.Line(raw: Optional[dict] = None)

Object representing a line in a text block.

add(span_or_list)

Add a span, or a list of spans, to the current Line.

intersects(rect)

Create a new Line object containing the spans that lie within the given bbox.
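The idea of keeping only the spans inside a rectangle can be sketched in plain Python. This is a minimal approximation under the assumption of whole-span containment; it is not the pdf2docx implementation, which may treat partially overlapping spans differently. The names `contains` and `spans_in_rect` are hypothetical.

```python
def contains(outer, inner):
    """Return True if rectangle `inner` lies entirely within `outer`."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


def spans_in_rect(spans, rect):
    """Keep only the spans whose bbox is fully contained in `rect`."""
    return [span for span in spans if contains(rect, span["bbox"])]


# Made-up spans: the second one extends past the right edge of the rectangle.
spans = [
    {"bbox": (10.0, 10.0, 50.0, 20.0), "text": "inside"},
    {"bbox": (50.0, 10.0, 120.0, 20.0), "text": "outside"},
]
kept = spans_in_rect(spans, (0.0, 0.0, 100.0, 100.0))
```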

property text: The joined text of all spans in this line. Note that an image is represented by the placeholder <image>.
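A minimal sketch of this joining behaviour, using plain dictionaries in place of span objects. The `text` and `image` keys and the function name `join_span_text` are assumptions made for illustration only.

```python
def join_span_text(spans):
    """Concatenate span text, substituting '<image>' for image spans."""
    parts = []
    for span in spans:
        if "image" in span:
            # Assumption: an image span is marked by an "image" key.
            parts.append("<image>")
        else:
            parts.append(span.get("text", ""))
    return "".join(parts)


spans = [{"text": "Figure 1: "}, {"image": b""}, {"text": " overview"}]
joined = join_span_text(spans)
```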

property text_direction

Get the text direction. Only LEFT_RIGHT and BOTTOM_TOP are considered.
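One plausible way to derive a direction from the line's `dir` vector is sketched below. It assumes that the two directions of interest are left-to-right and bottom-to-top, that `(1.0, 0.0)` denotes left-to-right text, and that `(0.0, -1.0)` denotes bottom-to-top text in a coordinate system whose y axis points down. The `TextDirection` enum here is a local stand-in, not an import from pdf2docx.

```python
from enum import Enum


class TextDirection(Enum):
    """Local stand-in enum for this sketch."""
    LEFT_RIGHT = 0
    BOTTOM_TOP = 1


def direction_from_dir(direction):
    """Map a (x, y) writing-direction vector to a TextDirection."""
    x, y = direction
    if x == 1.0:
        return TextDirection.LEFT_RIGHT
    if y == -1.0:
        return TextDirection.BOTTOM_TOP
    # Any other vector falls back to left-to-right in this sketch.
    return TextDirection.LEFT_RIGHT


horizontal = direction_from_dir((1.0, 0.0))
vertical = direction_from_dir((0.0, -1.0))
```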

property white_space_only: Whether this line contains only white space. If True, the line can safely be removed.
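The check can be approximated as follows, again with plain dictionaries standing in for spans. This sketch assumes text-only spans with a `text` key; how non-text spans such as images are handled is not covered here. The function name `is_white_space_only` is hypothetical.

```python
def is_white_space_only(spans):
    """Return True if every span's text is empty or consists of white space."""
    return all(span.get("text", "").strip() == "" for span in spans)


blank_line = [{"text": "  "}, {"text": "\t"}]
real_line = [{"text": "  "}, {"text": "word"}]

# Lines that contain nothing but white space can be filtered out.
lines = [blank_line, real_line]
kept_lines = [line for line in lines if not is_white_space_only(line)]
```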