Convert PDF#

We can use either the Converter class, or a wrapped method parse() to convert all/specified pdf pages to docx. Multi-processing is supported in case pdf file with a large number of pages.

Example 1: convert all pages#

from pdf2docx import Converter

pdf_file = '/path/to/sample.pdf'
docx_file = 'path/to/sample.docx'

# convert pdf to docx
cv = Converter(pdf_file)
cv.convert(docx_file)      # all pages by default
cv.close()

An alternative using parse method:

from pdf2docx import parse

pdf_file = '/path/to/sample.pdf'
docx_file = 'path/to/sample.docx'

# convert pdf to docx
parse(pdf_file, docx_file)

Example 2: convert specified pages#

  • Specify pages range by start (from the first page if omitted) and end (to the last page if omitted):

    # convert from the second page to the end (by default)
    cv.convert(docx_file, start=1)
    
    # convert from the first page (by default) to the third (end=3, excluded)
    cv.convert(docx_file, end=3)
    
    # convert from the second page and the third
    cv.convert(docx_file, start=1, end=3)
    
  • Alternatively, set separate pages by pages:

    # convert the first, third and 5th pages
    cv.convert(docx_file, pages=[0,2,4])
    

Note

Refer to convert() for detailed description on the input arguments.

Example 3: multi-Processing#

Turn on multi-processing with default count of CPU:

cv.convert(docx_file, multi_processing=True)

Specify the count of CPUs:

cv.convert(docx_file, multi_processing=True, cpu_count=4)

Note

Multi-processing works for continuous pages specified by start and end only.

Example 4: convert encrypted pdf#

Provide password to open and convert password protected pdf:

cv = Converter(pdf_file, password)
cv.convert(docx_file)
cv.close()

This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.

Discord logo