pdf2docx.common.share module

Common methods.

class pdf2docx.common.share.BlockType(value)

Bases: Enum

Block types.

FLOAT_IMAGE = 4
IMAGE = 1
LATTICE_TABLE = 2
STREAM_TABLE = 3
TEXT = 0
UNDEFINED = -1

class pdf2docx.common.share.IText

Bases: object

Text related interface considering text direction.

property is_horizontal_text

Check whether text direction is from left to right.

property is_mix_text

Check whether text direction is either from left to right or from bottom to top.

property is_vertical_text

Check whether text direction is from bottom to top.

property text_direction

Text direction is from left to right by default.

class pdf2docx.common.share.RectType(value)

Bases: Enum

Shape type in context.

BORDER = 16
HIGHLIGHT = 1
SHADING = 32
STRIKE = 4
UNDERLINE = 2
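
The member values are distinct powers of two, which suggests they could be combined as bit flags. That is an inference from the values alone, not something this page states. A minimal sketch under that assumption, using a local stand-in enum that mirrors the documented values and two hypothetical helpers (`mask_of`, `has_type`) that are not part of pdf2docx:

```python
from enum import Enum


class RectType(Enum):
    """Local stand-in mirroring the documented member values."""
    HIGHLIGHT = 1
    UNDERLINE = 2
    STRIKE = 4
    BORDER = 16
    SHADING = 32


def mask_of(*types):
    """Hypothetical helper: OR the integer values into one bit mask."""
    mask = 0
    for t in types:
        mask |= t.value
    return mask


def has_type(mask, rect_type):
    """Hypothetical helper: test whether a mask contains a given type."""
    return bool(mask & rect_type.value)


# A shape that might be either an underline or a strike-through.
candidates = mask_of(RectType.UNDERLINE, RectType.STRIKE)
print(candidates, has_type(candidates, RectType.STRIKE))
```

Because each value occupies its own bit, any combination of members maps to a unique integer.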

class pdf2docx.common.share.TextAlignment(value)

Bases: Enum

Text alignment.

Note

The difference between NONE and UNKNOWN:

  • NONE: the text is neither left-, right-, nor center-aligned, so a TAB stop is needed

  • UNKNOWN: the alignment can’t be determined, e.g. when there is only a single line

CENTER = 2
JUSTIFY = 4
LEFT = 1
NONE = -1
RIGHT = 3
UNKNOWN = 0

class pdf2docx.common.share.TextDirection(value)

Bases: Enum

Text direction.

  • LEFT_RIGHT: from left to right within a line, and lines go from top to bottom

  • BOTTOM_TOP: from bottom to top within a line, and lines go from left to right

  • MIX: a mixture of LEFT_RIGHT and BOTTOM_TOP

  • IGNORE: neither LEFT_RIGHT nor BOTTOM_TOP

BOTTOM_TOP = 1
IGNORE = -1
LEFT_RIGHT = 0
MIX = 2

pdf2docx.common.share.cmyk_to_rgb(c: float, m: float, y: float, k: float, cmyk_scale: float = 100)

CMYK components to RGB value.
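
As an illustration of the conversion, here is a minimal sketch of the common CMYK-to-RGB formula. It is not taken from the pdf2docx source: the name `cmyk_to_rgb_floats` is hypothetical, and the sketch stops at float components in [0, 1] rather than producing a single color value.

```python
def cmyk_to_rgb_floats(c, m, y, k, cmyk_scale=100.0):
    """Hypothetical helper: CMYK components to (r, g, b) floats in [0, 1].

    Each component is first normalised by cmyk_scale, so the inputs may be
    percentages (scale 100) or fractions (scale 1).
    """
    scale = float(cmyk_scale)
    r = (1.0 - c / scale) * (1.0 - k / scale)
    g = (1.0 - m / scale) * (1.0 - k / scale)
    b = (1.0 - y / scale) * (1.0 - k / scale)
    return (r, g, b)


# Full magenta plus full yellow with no black gives pure red.
red = cmyk_to_rgb_floats(0, 100, 100, 0)
print(red)
```

The black component scales all three channels equally, which is why full black yields (0, 0, 0) regardless of the other inputs.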

pdf2docx.common.share.debug_plot(title: str, show=True)

Plot the objects returned by the inner function.

Args:

title (str): Page title.
show (bool, optional): Don’t plot if show==False. Defaults to True.

Note

Prerequisites for the inner function:
  • the first argument is a BasePage instance.

  • the last argument is configuration parameters in dict type.

pdf2docx.common.share.decode(s: str)

Try to decode a unicode string.

pdf2docx.common.share.flatten(items, klass)

Yield items from any nested iterable.
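
The role of `klass` is not described here. The sketch below assumes it names the type (or tuple of types) that should be yielded as a leaf even when it is itself iterable, such as `str`. The name `flatten_sketch` is hypothetical and the code is an illustration, not the library implementation.

```python
from collections.abc import Iterable


def flatten_sketch(items, klass):
    """Recursively yield leaves from a nested iterable.

    Assumption: instances of `klass` are treated as leaves and never
    descended into, even if they are iterable.
    """
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, klass):
            yield from flatten_sketch(item, klass)
        else:
            yield item


nested = [1, [2, (3, 4)], "ab", [["cd"]]]
flat = list(flatten_sketch(nested, str))
print(flat)
```

Without the leaf type, a string would be expanded character by character, and each one-character string would recurse forever.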

pdf2docx.common.share.is_number(str_number)

Check whether str_number can be converted to a float.
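
One plausible way to perform such a check is to attempt the conversion and catch the failure. The function name `is_number_sketch` is hypothetical, and this is a sketch rather than the library code:

```python
def is_number_sketch(str_number):
    """Return True if float() accepts the value, False otherwise."""
    try:
        float(str_number)
    except (TypeError, ValueError):
        # ValueError: malformed text; TypeError: unsupported type such as None.
        return False
    return True


print(is_number_sketch("3.14"), is_number_sketch("abc"))
```

Note that under this approach strings such as "nan" and "inf" also count as numbers, because `float()` accepts them.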

class pdf2docx.common.share.lazyproperty(func)

Bases: object

Compute the property value only once and cache it.
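
A compute-once property of this kind is commonly written as a non-data descriptor that stores the result on the instance under the same name. The sketch below shows that general technique. `lazyproperty_sketch` and the `Page` class are hypothetical names for illustration, and the actual pdf2docx implementation may differ.

```python
class lazyproperty_sketch:
    """Non-data descriptor: compute on first access, then cache on the instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, not an instance
        value = self.func(instance)
        # Store under the same name; the instance attribute now shadows
        # the descriptor, so later lookups skip __get__ entirely.
        setattr(instance, self.func.__name__, value)
        return value


class Page:
    """Hypothetical class used only to demonstrate the caching."""

    def __init__(self):
        self.calls = 0

    @lazyproperty_sketch
    def bbox(self):
        self.calls += 1
        return (0.0, 0.0, 595.0, 842.0)


page = Page()
first = page.bbox
second = page.bbox
print(page.calls)
```

The cache lives in each instance's `__dict__`, so deleting the attribute (`del page.bbox`) forces a recomputation on the next access.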

pdf2docx.common.share.lower_round(number: float, ndigits: int = 0)

Round a number down to the specified number of decimal digits, e.g. lower_round(1.26, 1) = 1.2.
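
The documented example can be reproduced by scaling, flooring, and scaling back. The sketch below uses `math.floor`, which is an assumption: the library might truncate toward zero instead, and the two differ for negative inputs. The name `lower_round_sketch` is hypothetical.

```python
import math


def lower_round_sketch(number, ndigits=0):
    """Round down (toward negative infinity) to `ndigits` decimal places."""
    scale = 10.0 ** ndigits
    return math.floor(number * scale) / scale


rounded = lower_round_sketch(1.26, 1)
print(rounded)
```

Binary floating point can leave a scaled value marginally below the expected integer, so results for inputs that sit exactly on a rounding boundary should be treated with care.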

pdf2docx.common.share.new_page(doc, width: float, height: float, title: str)

Insert a new page with given title.

Args:

doc (fitz.Document): PDF document object.
width (float): Page width.
height (float): Page height.
title (str): Page title shown on the page.

pdf2docx.common.share.rgb_component(srgb: int)

Convert an sRGB integer value to R, G, B components, e.g. 16711680 -> (255, 0, 0).

Equivalent to the PyMuPDF built-in method:

[int(255*c) for c in fitz.sRGB_to_pdf(srgb)]
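
The same decomposition can be written without PyMuPDF by shifting and masking the packed integer. This is a sketch of the general technique under the assumption that the value is laid out as 0xRRGGBB, with a hypothetical function name:

```python
def rgb_component_sketch(srgb):
    """Split a packed 0xRRGGBB integer into (r, g, b), each in 0..255."""
    r = (srgb >> 16) & 0xFF
    g = (srgb >> 8) & 0xFF
    b = srgb & 0xFF
    return (r, g, b)


components = rgb_component_sketch(16711680)  # 0xFF0000
print(components)
```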

pdf2docx.common.share.rgb_component_from_name(name: str = '')

Get a named RGB color (or random color) from fitz predefined colors, e.g. ‘red’ -> (1.0,0.0,0.0).

pdf2docx.common.share.rgb_to_value(rgb: list)

RGB components to decimal value, e.g. (1,0,0) -> 16711680.
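
The example implies that the components are floats in [0, 1] and that the result is a packed 0xRRGGBB integer. A minimal sketch under those assumptions follows. The name `rgb_to_value_sketch`, the clamping, and the round-to-nearest step are choices made for this illustration, and the library may truncate instead of rounding.

```python
def rgb_to_value_sketch(rgb):
    """Pack three float components in [0, 1] into a 0xRRGGBB integer."""
    value = 0
    for component in rgb:
        # Clamp out-of-range input so each channel fits in one byte.
        clamped = min(max(float(component), 0.0), 1.0)
        value = (value << 8) | int(round(clamped * 255))
    return value


red = rgb_to_value_sketch((1, 0, 0))
print(red, format(red, "06x"))
```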

pdf2docx.common.share.rgb_value(components: list)

Gray/RGB/CMYK mode components to color value.
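
One way to support all three modes is to dispatch on the number of components: one for gray, three for RGB, four for CMYK. The sketch below assumes every component is a float in [0, 1] and that the result is a packed 0xRRGGBB integer. The names `pack_rgb` and `rgb_value_sketch` are hypothetical, and raising `ValueError` for other lengths is a choice made here, not documented library behavior.

```python
def pack_rgb(r, g, b):
    """Hypothetical helper: floats in [0, 1] to a packed 0xRRGGBB integer."""
    def to_byte(x):
        return int(round(x * 255))
    return (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)


def rgb_value_sketch(components):
    """Dispatch on component count: 1 = gray, 3 = RGB, 4 = CMYK."""
    values = [float(x) for x in components]
    if len(values) == 1:
        gray = values[0]
        return pack_rgb(gray, gray, gray)
    if len(values) == 3:
        return pack_rgb(*values)
    if len(values) == 4:
        c, m, y, k = values
        return pack_rgb((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    raise ValueError("expected 1, 3 or 4 components, got %d" % len(values))


print(rgb_value_sketch([1, 0, 0]), rgb_value_sketch([0, 1, 1, 0]))
```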