pdf2docx.main module

Entry point for the pdf2docx command line.

class pdf2docx.main.PDF2DOCX

Bases: object

Command line interface for pdf2docx.

static convert(pdf_file: str, docx_file: Optional[str] = None, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)

Convert pdf file to docx file.

Args:

pdf_file (str): PDF filename to read from.

docx_file (str, optional): docx filename to write to. Defaults to None.

password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.

start (int, optional): First page to process. Defaults to 0.

end (int, optional): Last page to process. Defaults to None.

pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.

kwargs (dict): Configuration parameters.

Note

Refer to Converter.convert() for a detailed description of the above arguments.
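The call above can be sketched from Python as follows. This is a minimal illustration, not the library's own usage: the helper names convert_kwargs and convert_range are hypothetical, and only the parameter names are taken from the documented signature. The module-level parse() documented in this module takes the same arguments.

```python
from typing import Optional


def convert_kwargs(pdf_file: str,
                   docx_file: Optional[str] = None,
                   password: Optional[str] = None,
                   start: int = 0,
                   end: Optional[int] = None,
                   pages: Optional[list] = None) -> dict:
    """Package arguments using the parameter names from the signature.

    Hypothetical helper: unset (None) values are dropped so that the
    library's own defaults apply.
    """
    args = {"pdf_file": pdf_file, "docx_file": docx_file,
            "password": password, "start": start, "end": end, "pages": pages}
    return {name: value for name, value in args.items() if value is not None}


def convert_range(pdf_file: str, docx_file: Optional[str] = None, **options):
    """Run the conversion (hypothetical wrapper around PDF2DOCX.convert)."""
    # Imported lazily so this sketch loads even where pdf2docx is not installed.
    from pdf2docx.main import PDF2DOCX
    PDF2DOCX.convert(**convert_kwargs(pdf_file, docx_file, **options))
```

A call such as convert_range("in.pdf", "out.docx", start=1, end=3) would then forward start and end unchanged; the file names here are placeholders.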

static debug(pdf_file: str, password: Optional[str] = None, page: int = 0, docx_file: Optional[str] = None, debug_pdf: Optional[str] = None, layout_file: str = 'layout.json', **kwargs)

Convert one PDF page and plot layout information for debugging.

Args:

pdf_file (str): PDF filename to read from.

password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.

page (int, optional): Index of the page to convert. Defaults to 0.

docx_file (str, optional): docx filename to write to. Defaults to None.

debug_pdf (str, optional): Filename for the new PDF that stores layout information. Defaults to the same name as the PDF file.

layout_file (str, optional): Filename for the new JSON file that stores parsed layout data. Defaults to layout.json.

kwargs (dict): Configuration parameters.
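Because debug() writes up to three files, it can help to name them explicitly. The sketch below is one way to do that; the helper names debug_outputs and debug_page, and the "_debug" / "_layout" suffixes, are assumptions made for illustration and are not conventions of the library.

```python
from pathlib import Path


def debug_outputs(pdf_file: str, out_dir: str) -> dict:
    """Derive explicit output paths for a debug run (hypothetical naming)."""
    stem = Path(pdf_file).stem
    out = Path(out_dir)
    return {
        "docx_file": str(out / f"{stem}.docx"),
        "debug_pdf": str(out / f"{stem}_debug.pdf"),
        "layout_file": str(out / f"{stem}_layout.json"),
    }


def debug_page(pdf_file: str, page: int = 0, out_dir: str = ".") -> dict:
    """Convert one page and return the paths that were requested."""
    # Imported lazily so this sketch loads even where pdf2docx is not installed.
    from pdf2docx.main import PDF2DOCX
    outputs = debug_outputs(pdf_file, out_dir)
    # Keyword names match the documented signature of debug().
    PDF2DOCX.debug(pdf_file, page=page, **outputs)
    return outputs
```

Keeping the three outputs in a separate directory avoids overwriting a layout.json left behind by an earlier run.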

static gui()

Simple user interface.

static table(pdf_file, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)

Extract table content from pdf pages.

Args:

pdf_file (str): PDF filename to read from.

password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.

start (int, optional): First page to process. Defaults to 0.

end (int, optional): Last page to process. Defaults to None.

pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
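This section does not document what table() returns. The sketch below assumes the result is a list of tables, each a list of rows, each row a list of cell values that may be None; treat that shape as an assumption to verify against your installed version. The helper names tables_to_csv and extract_tables are hypothetical.

```python
import csv
import io


def tables_to_csv(tables) -> str:
    """Serialize tables (assumed shape: list of tables -> rows -> cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for index, table in enumerate(tables):
        # Marker row separating one table from the next.
        writer.writerow([f"# table {index}"])
        for row in table:
            # Write missing cells (None) as empty strings.
            writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables(pdf_file: str, **options):
    """Call PDF2DOCX.table with the documented keyword arguments."""
    # Imported lazily so this sketch loads even where pdf2docx is not installed.
    from pdf2docx.main import PDF2DOCX
    return PDF2DOCX.table(pdf_file, **options)
```

For example, tables_to_csv(extract_tables("in.pdf", pages=[0])) would produce CSV text for the first page, if the assumed return shape holds.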

pdf2docx.main.main()

Command line entry.
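A minimal sketch of driving the command line from a script. It assumes the installed executable is named pdf2docx, that subcommands mirror the static methods of PDF2DOCX (convert, debug, table), and that options use the --name=value form shown in the argument descriptions (e.g. --pages=1,3,5). The helper build_cli_args is hypothetical.

```python
def build_cli_args(command: str, pdf_file: str, *positional, **options) -> list:
    """Build an argv list, e.g. pdf2docx convert in.pdf out.docx --pages=1,3,5."""
    argv = ["pdf2docx", command, pdf_file, *positional]
    for name, value in options.items():
        if value is None:
            # Leave unset options to the tool's defaults.
            continue
        if isinstance(value, (list, tuple)):
            # Page lists are written comma-separated, as in --pages=1,3,5.
            value = ",".join(str(item) for item in value)
        argv.append(f"--{name}={value}")
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True); passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.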

pdf2docx.main.parse(pdf_file: str, docx_file: Optional[str] = None, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)

Convert pdf file to docx file.

Args:

pdf_file (str): PDF filename to read from.

docx_file (str, optional): docx filename to write to. Defaults to None.

password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.

start (int, optional): First page to process. Defaults to 0.

end (int, optional): Last page to process. Defaults to None.

pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.

kwargs (dict): Configuration parameters.

Note

Refer to Converter.convert() for a detailed description of the above arguments.