Convert PDF
We can use either the Converter class or the wrapper function parse() to convert all or specified PDF pages to docx. Multi-processing is supported for PDF files with a large number of pages.
Example 1: convert all pages
from pdf2docx import Converter
pdf_file = '/path/to/sample.pdf'
docx_file = '/path/to/sample.docx'
# convert pdf to docx
cv = Converter(pdf_file)
cv.convert(docx_file) # all pages by default
cv.close()
An alternative using the parse() function:
from pdf2docx import parse
pdf_file = '/path/to/sample.pdf'
docx_file = '/path/to/sample.docx'
# convert pdf to docx
parse(pdf_file, docx_file)
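In the Converter example above, cv.close() is skipped if convert() raises. The following is a minimal sketch of one way to guard against that with try/finally. The helper name convert_pdf and its converter_cls parameter are hypothetical and not part of pdf2docx; only Converter, convert() and close() come from the examples above.

```python
def convert_pdf(pdf_file, docx_file, converter_cls=None, **kwargs):
    """Convert pdf_file to docx_file and always close the converter.

    convert_pdf and converter_cls are illustrative names, not pdf2docx API.
    Extra keyword arguments are passed through to convert() unchanged.
    """
    if converter_cls is None:
        # Imported lazily so that this helper can be defined (and tested
        # with a stand-in class) even where pdf2docx is not installed.
        from pdf2docx import Converter as converter_cls

    cv = converter_cls(pdf_file)
    try:
        cv.convert(docx_file, **kwargs)
    finally:
        # Runs whether convert() succeeded or raised.
        cv.close()
```

It would be called as convert_pdf(pdf_file, docx_file), optionally with the same keyword arguments that convert() accepts.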
Example 2: convert specified pages
Specify the page range with start (from the first page if omitted) and end (to the last page if omitted). Page indexes in these examples are zero-based, and end is excluded:
# convert from the second page to the end (by default)
cv.convert(docx_file, start=1)
# convert from the first page (by default) to the third (end=3, excluded)
cv.convert(docx_file, end=3)
# convert the second and third pages
cv.convert(docx_file, start=1, end=3)
Alternatively, select individual pages with pages:
# convert the first, third and fifth pages
cv.convert(docx_file, pages=[0,2,4])
Note
Refer to convert() for a detailed description of the input arguments.
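To make the selection rules in this example concrete, here is a small pure-Python sketch that lists which zero-based page indexes a given combination of start, end and pages describes. The function selected_pages is hypothetical and is not the library's implementation; it also assumes that pages, when given, takes precedence over start and end.

```python
def selected_pages(page_count, start=0, end=None, pages=None):
    """Return the zero-based page indexes described by a selection.

    Illustrative only: mirrors the behaviour shown in the examples,
    assuming `pages` wins over `start`/`end` when both are supplied.
    """
    if pages is not None:
        return list(pages)
    # Slice semantics: start is included, end is excluded,
    # and end=None means "through the last page".
    return list(range(page_count))[start:end]


# For a five-page document:
print(selected_pages(5, start=1))           # second page to the end
print(selected_pages(5, end=3))             # first three pages
print(selected_pages(5, start=1, end=3))    # second and third pages
print(selected_pages(5, pages=[0, 2, 4]))   # first, third and fifth pages
```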
Example 3: multi-processing
Turn on multi-processing with the default CPU count:
cv.convert(docx_file, multi_processing=True)
Specify the number of CPUs to use:
cv.convert(docx_file, multi_processing=True, cpu_count=4)
Note
Multi-processing works only for a continuous page range specified by start and end, not for a pages list.
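Because of this restriction, a list of page indexes has to be expressed as start and end before multi-processing can be used. The sketch below shows one way to check whether a list is a continuous run and, if so, to derive the range. The helper as_page_range is a hypothetical name, not part of pdf2docx.

```python
def as_page_range(pages):
    """Return (start, end) if pages form a continuous run, else None.

    Indexes are zero-based and the returned end is exclusive, matching
    the start/end convention used in the examples above.
    """
    ordered = sorted(set(pages))
    if not ordered:
        return None
    start, last = ordered[0], ordered[-1]
    # A continuous run has exactly as many items as the span it covers.
    if last - start + 1 != len(ordered):
        return None
    return start, last + 1


# Possible use with an open Converter `cv` (not executed here):
# page_range = as_page_range(wanted)
# if page_range is not None:
#     start, end = page_range
#     cv.convert(docx_file, start=start, end=end, multi_processing=True)
# else:
#     cv.convert(docx_file, pages=wanted)
```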
Example 4: convert encrypted pdf
Provide the password to open and convert a password-protected PDF:
cv = Converter(pdf_file, password)
cv.convert(docx_file)
cv.close()
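The snippet above leaves open where password comes from. As a hedged sketch, it could be read from an environment variable, with an interactive prompt as a fallback, so that it is not hard-coded in the script. The helper get_pdf_password and the variable name PDF_PASSWORD are assumptions made for illustration only.

```python
import os
from getpass import getpass


def get_pdf_password(env_var="PDF_PASSWORD"):
    """Return the password from the environment, or prompt for it.

    Both the helper and the PDF_PASSWORD variable name are illustrative.
    """
    password = os.environ.get(env_var)
    if password is None:
        # getpass() reads from the terminal without echoing the input.
        password = getpass("PDF password: ")
    return password


# Possible use with the example above (not executed here):
# cv = Converter(pdf_file, get_pdf_password())
```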