pdf2docx.common.docx module#

docx operation methods based on python-docx.

pdf2docx.common.docx.add_float_image(p, image_path_or_stream, width, pos_x=None, pos_y=None)#

Add a floating image behind the text.

Args:

  • p (Paragraph): python-docx Paragraph object this picture belongs to.

  • image_path_or_stream (str, bytes): Image path or stream.

  • width (float): Display width of the picture, in Pt.

  • pos_x (float): X-position (English Metric Units) relative to the top-left point of the page's valid region.

  • pos_y (float): Y-position (English Metric Units) relative to the top-left point of the page's valid region.
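The arguments above mix two units: `width` is given in points, while `pos_x` and `pos_y` are given in English Metric Units (EMU). A minimal sketch of the conversion, assuming the standard Office Open XML definitions (914400 EMU per inch, 72 pt per inch). The helper names are hypothetical and not part of pdf2docx.

```python
# Hypothetical helpers, not part of pdf2docx: convert between points and EMU.
EMU_PER_INCH = 914400
PT_PER_INCH = 72
EMU_PER_PT = EMU_PER_INCH // PT_PER_INCH  # 12700 EMU in one point


def pt_to_emu(pt: float) -> int:
    """Convert points to English Metric Units, rounded to the nearest EMU."""
    return round(pt * EMU_PER_PT)


def emu_to_pt(emu: int) -> float:
    """Convert English Metric Units back to points."""
    return emu / EMU_PER_PT


one_inch_emu = pt_to_emu(72)       # one inch expressed in EMU
half_inch_pt = emu_to_pt(457200)   # half an inch expressed in points
```

A position computed in points can be passed through `pt_to_emu` before being used as `pos_x` or `pos_y`.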

pdf2docx.common.docx.add_hyperlink(paragraph, url, text)#

Create a hyperlink within a paragraph object.

Args:

  • paragraph (Paragraph): python-docx paragraph to add the hyperlink to.

  • url (str): The required url.

  • text (str): The text displayed for the url.

Returns:

Run: A Run object containing the hyperlink.

pdf2docx.common.docx.add_image(p, image_path_or_stream, width, height)#

Add image to paragraph.

Args:

  • p (Paragraph): python-docx paragraph instance.

  • image_path_or_stream (str, bytes): Image path or stream.

  • width (float): Image width in Pt.

  • height (float): Image height in Pt.

pdf2docx.common.docx.delete_paragraph(paragraph)#

Delete a paragraph.

Reference:

https://github.com/python-openxml/python-docx/issues/33#issuecomment-77661907

pdf2docx.common.docx.indent_table(table, indent: float)#

Indent a table.

Args:

  • table (Table): python-docx Table object.

  • indent (float): Indent value, the basic unit is 1/20 pt.
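Since `indent` is expressed in twentieths of a point, a value given in points has to be scaled by 20 first. The sketch below uses only the standard library to show that conversion and the `w:tblInd` element such an indent corresponds to in WordprocessingML. The helper names are hypothetical, and this is not necessarily how pdf2docx writes the property.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def pt_to_twips(pt: float) -> int:
    """Convert points to twentieths of a point."""
    return round(pt * 20)


def make_table_indent(indent_twips: int) -> ET.Element:
    """Build a w:tblInd element (hypothetical helper for illustration)."""
    elem = ET.Element(f"{{{W_NS}}}tblInd")
    elem.set(f"{{{W_NS}}}w", str(indent_twips))
    elem.set(f"{{{W_NS}}}type", "dxa")  # "dxa" means twentieths of a point
    return elem


# 36 pt is half an inch.
indent = make_table_indent(pt_to_twips(36))
```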

pdf2docx.common.docx.reset_paragraph_format(p, line_spacing: float = 1.05)#

Reset paragraph format, especially line spacing.

Two kinds of line spacing are supported, corresponding to the settings in MS Word:

  • line_spacing=1.05: single or multiple

  • line_spacing=Pt(1): exactly

Args:

  • p (Paragraph): python-docx paragraph instance.

  • line_spacing (float, optional): Line spacing. Defaults to 1.05.

Returns:

paragraph_format: Paragraph format.
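The two modes listed above differ in the kind of value supplied: a plain float is a multiple of single spacing, while a length is an exact line height. A stdlib-only sketch of how each mode maps to the `w:spacing` attributes in WordprocessingML follows. `ExactPt` and `spacing_attrs` are hypothetical names, and the 240-units-per-line convention comes from the Office Open XML format rather than from pdf2docx.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ExactPt:
    """Hypothetical marker for an exact line height given in points."""
    value: float


def spacing_attrs(line_spacing: Union[float, ExactPt]) -> dict:
    """Map a line-spacing value to w:spacing attribute values."""
    if isinstance(line_spacing, ExactPt):
        # "Exactly" in Word: the height is stored in twentieths of a point.
        return {"line": round(line_spacing.value * 20), "lineRule": "exact"}
    # "Single" or "Multiple" in Word: 240 units correspond to one line.
    return {"line": round(line_spacing * 240), "lineRule": "auto"}


multiple = spacing_attrs(1.05)       # the documented default
exact = spacing_attrs(ExactPt(12))   # exactly 12 pt
```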

pdf2docx.common.docx.set_cell_border(cell: _Cell, **kwargs)#

Set cell's border.

Args:

  • cell (_Cell): python-docx Cell instance you want to modify.

  • kwargs (dict): Dict with keys: top, bottom, start, end.

Usage:

set_cell_border(
    cell,
    top={"sz": 12, "val": "single", "color": "#FF0000", "space": "0"},
    bottom={"sz": 12, "color": "#00FF00", "val": "single"},
    start={"sz": 24, "val": "dashed", "shadow": "true"},
    end={"sz": 12, "val": "dashed"},
)
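Each keyword in the usage above names one edge, and each entry of its dict becomes an attribute of that edge's border element. A stdlib-only sketch of the resulting `w:tcBorders` structure, assuming the standard WordprocessingML namespace. `build_cell_borders` is a hypothetical helper, not the pdf2docx implementation.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

# Edge names accepted as keyword arguments, in the order they are emitted.
EDGES = ("top", "start", "bottom", "end")


def build_cell_borders(**kwargs) -> ET.Element:
    """Build a w:tcBorders element from per-edge attribute dicts."""
    borders = ET.Element(f"{{{W_NS}}}tcBorders")
    for edge in EDGES:
        attrs = kwargs.get(edge)
        if not attrs:
            continue  # edges that were not passed produce no child element
        edge_elem = ET.SubElement(borders, f"{{{W_NS}}}{edge}")
        for key, value in attrs.items():
            edge_elem.set(f"{{{W_NS}}}{key}", str(value))
    return borders


borders = build_cell_borders(
    top={"sz": 12, "val": "single", "color": "FF0000"},
    end={"sz": 12, "val": "dashed"},
)
```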
pdf2docx.common.docx.set_cell_margins(cell: _Cell, **kwargs)#

Set cell margins. Provided values are in twentieths of a point (1/1440 of an inch).

Args:

  • cell (_Cell): python-docx Cell instance you want to modify.

  • kwargs (dict): Dict with keys: top, bottom, start, end.

Usage:

set_cell_margins(cell, top=50, start=50, bottom=50, end=50)

pdf2docx.common.docx.set_cell_shading(cell: _Cell, srgb: int)#

Set cell background-color.

Reference:

https://stackoverflow.com/questions/26752856/python-docx-set-table-cell-background-and-text-color

Args:

  • cell (_Cell): python-docx Cell instance you want to modify.

  • srgb (int): RGB color value.
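Because `srgb` is a single integer, it usually has to be rendered as a six-digit hex string or split into channels before it can be written to a document. A minimal sketch, assuming the conventional 0xRRGGBB packing (the source does not spell out the layout). Both helper names are hypothetical.

```python
def srgb_to_hex(srgb: int) -> str:
    """Format a packed 0xRRGGBB integer as a six-digit uppercase hex string."""
    if not 0 <= srgb <= 0xFFFFFF:
        raise ValueError(f"sRGB value out of range: {srgb}")
    return f"{srgb:06X}"


def srgb_to_rgb(srgb: int) -> tuple:
    """Split a packed 0xRRGGBB integer into (red, green, blue) channels."""
    return (srgb >> 16) & 0xFF, (srgb >> 8) & 0xFF, srgb & 0xFF


red_hex = srgb_to_hex(0xFF0000)
teal_rgb = srgb_to_rgb(0x008080)
```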

pdf2docx.common.docx.set_char_scaling(p_run, scale: float = 1.0)#

Set character spacing: scaling.

Manual operation in MS Word: Font | Advanced | Character Spacing | Scaling.

Args:

  • p_run (docx.text.run.Run): Proxy object wrapping <w:r> element.

  • scale (float, optional): Scaling factor. Defaults to 1.0.

pdf2docx.common.docx.set_char_shading(p_run, srgb: int)#

Set character shading color, in case the color is out of highlight color scope.

Reference:

http://officeopenxml.com/WPtextShading.php

Args:

  • p_run (docx.text.run.Run): Proxy object wrapping <w:r> element.

  • srgb (int): Color value.

pdf2docx.common.docx.set_char_spacing(p_run, space: float = 0.0)#

Set character spacing.

Manual operation in MS Word: Font | Advanced | Character Spacing | Spacing.

Args:

  • p_run (docx.text.run.Run): Proxy object wrapping <w:r> element.

  • space (float, optional): Spacing value in Pt. A positive value expands the spacing; a negative value condenses it. Defaults to 0.0.

pdf2docx.common.docx.set_char_underline(p_run, srgb: int)#

Set underline and color.

Args:

  • p_run (docx.text.run.Run): Proxy object wrapping <w:r> element.

  • srgb (int): Color value.

pdf2docx.common.docx.set_columns(section, width_list: list, space=0)#

Set section column count and space.

Args:

  • section: python-docx Section instance.

  • width_list (list|tuple): Width of each column.

  • space (int, optional): Space between adjacent columns. Unit: Pt. Defaults to 0.

Scheme:

<w:cols w:num="2" w:space="0" w:equalWidth="0">
    <w:col w:w="2600" w:space="0"/>
    <w:col w:w="7632"/>
</w:cols>
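The scheme above can be reproduced with the standard library. The sketch assumes that column widths and spacing are given in Pt and written in twentieths of a point, and that every column except the last carries a `w:space` attribute, as the scheme suggests. `build_columns` and `w` are hypothetical helpers rather than the pdf2docx implementation.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name: str) -> str:
    """Return a namespace-qualified name for ElementTree."""
    return f"{{{W_NS}}}{name}"


def build_columns(width_list, space=0) -> ET.Element:
    """Build a w:cols element with one w:col child per column width."""
    space_twips = str(round(space * 20))  # assumption: Pt to 1/20 pt
    cols = ET.Element(w("cols"))
    cols.set(w("num"), str(len(width_list)))
    cols.set(w("space"), space_twips)
    cols.set(w("equalWidth"), "0")  # widths are specified per column
    for index, width in enumerate(width_list):
        col = ET.SubElement(cols, w("col"))
        col.set(w("w"), str(round(width * 20)))
        if index < len(width_list) - 1:
            # Only columns followed by another column get a spacing value.
            col.set(w("space"), space_twips)
    return cols


cols = build_columns([130, 381.6])
```

Under these assumptions, widths of 130 pt and 381.6 pt yield the `w:w` values shown in the scheme.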
pdf2docx.common.docx.set_equal_columns(section, num=2, space=0)#

Set section column count and space. All columns have the same width.

Args:

  • section: python-docx Section instance.

  • num (int): Column count. Defaults to 2.

  • space (int, optional): Space between adjacent columns. Unit: Pt. Defaults to 0.

pdf2docx.common.docx.set_hidden_property(p)#

Hide a paragraph. This method only sets the paragraph property; any text added to the paragraph must be hidden explicitly:

r = p.add_run()
r.text = "Hidden"
r.font.hidden = True

Args:

p (Paragraph): python-docx created paragraph.

pdf2docx.common.docx.set_vertical_cell_direction(cell: _Cell, direction: str = 'btLr')#

Set vertical text direction for cell.

Reference:

https://stackoverflow.com/questions/47738013/how-to-rotate-text-in-table-cells

Args:

  • cell (_Cell): python-docx Cell instance you want to modify.

  • direction (str): Either "tbRl" (top to bottom) or "btLr" (bottom to top).