pdf2docx.main module
Entry for the pdf2docx command line.
- class pdf2docx.main.PDF2DOCX
Bases: object
Command line interface for pdf2docx.
- static convert(pdf_file: str, docx_file: Optional[str] = None, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)
Convert pdf file to docx file.
- Args:
pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str, optional): Password for an encrypted PDF. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.
Note
Refer to convert() for a detailed description of the above arguments.
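Since convert is a static method, it can presumably be called from Python as well as from the command line. The sketch below wraps it in a hypothetical helper, convert_page_range (not part of pdf2docx), and forwards only parameters listed in the signature above. The import is deferred to call time, so the helper can be defined even where the package is not installed.

```python
def convert_page_range(pdf_file, docx_file, start=0, end=None):
    """Convert a range of pages by delegating to PDF2DOCX.convert.

    Hypothetical helper for illustration; not part of pdf2docx.
    """
    # Deferred import: the pdf2docx package is only needed at call time.
    from pdf2docx.main import PDF2DOCX

    # Forward only the arguments documented in the signature above.
    PDF2DOCX.convert(pdf_file, docx_file, start=start, end=end)
    return docx_file
```

A call such as convert_page_range("report.pdf", "report.docx", start=0, end=2) uses placeholder file names. This page does not state whether end is inclusive, so check convert() before relying on either reading.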
- static debug(pdf_file: str, password: Optional[str] = None, page: int = 0, docx_file: Optional[str] = None, debug_pdf: Optional[str] = None, layout_file: str = 'layout.json', **kwargs)
Convert one PDF page and plot layout information for debugging.
- Args:
pdf_file (str): PDF filename to read from.
password (str, optional): Password for an encrypted PDF. Defaults to None if not encrypted.
page (int, optional): Page index to convert.
docx_file (str, optional): docx filename to write to.
debug_pdf (str, optional): Filename for the new PDF storing layout information. Defaults to the same name as the PDF file.
layout_file (str, optional): Filename for the new JSON file storing parsed layout data. Defaults to layout.json.
kwargs (dict): Configuration parameters.
- static gui()
Simple user interface.
- static table(pdf_file, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)
Extract table content from pdf pages.
- Args:
pdf_file (str): PDF filename to read from.
password (str, optional): Password for an encrypted PDF. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
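This page does not say what table returns. The sketch below assumes it returns a list of tables, each a list of rows, each row a list of cell values that may be None. Under that assumption, the hypothetical helpers table_to_csv and extract_tables_as_csv (neither is part of pdf2docx) turn each extracted table into CSV text.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table (a list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Assumption: empty cells may arrive as None; write them as "".
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buffer.getvalue()


def extract_tables_as_csv(pdf_file, **page_selection):
    """Run PDF2DOCX.table and return one CSV string per extracted table.

    Assumption: table() returns nested lists as described above.
    """
    # Deferred import: the pdf2docx package is only needed at call time.
    from pdf2docx.main import PDF2DOCX

    tables = PDF2DOCX.table(pdf_file, **page_selection)
    return [table_to_csv(rows) for rows in (tables or [])]
```

Keeping the serialization in its own function means it can be checked on plain nested lists without a PDF at hand.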
- pdf2docx.main.main()
Command line entry.
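As a rough illustration of how the command line maps onto the convert arguments, the sketch below builds an argument list for a subprocess call. It assumes a pdf2docx executable is on the PATH and that keyword arguments are passed as --name=value flags, as the --pages=1,3,5 example in the argument descriptions suggests. The helper name build_convert_command is hypothetical.

```python
def build_convert_command(pdf_file, docx_file=None, start=0, end=None, pages=None):
    """Build an argv list for the assumed `pdf2docx convert` sub-command.

    Hypothetical helper for illustration; not part of pdf2docx.
    """
    command = ["pdf2docx", "convert", pdf_file]
    if docx_file is not None:
        command.append(docx_file)
    # Emit only the flags that differ from the documented defaults.
    if start:
        command.append(f"--start={start}")
    if end is not None:
        command.append(f"--end={end}")
    if pages:
        command.append("--pages=" + ",".join(str(p) for p in pages))
    return command
```

The resulting list can be handed to subprocess.run(command, check=True). Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.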
- pdf2docx.main.parse(pdf_file: str, docx_file: Optional[str] = None, password: Optional[str] = None, start: int = 0, end: Optional[int] = None, pages: Optional[list] = None, **kwargs)
Convert pdf file to docx file.
- Args:
pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str, optional): Password for an encrypted PDF. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.
Note
Refer to convert() for a detailed description of the above arguments.