Welcome to pdf2docx

pdf2docx is a Python library that extracts data from PDF files with PyMuPDF, parses the layout with rules, and generates docx files with python-docx.

pdf2docx is hosted on GitHub and registered on PyPI.
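The following is a minimal sketch of the extract, parse, and generate flow, assuming the package has been installed from PyPI (`pip install pdf2docx`). It uses the library's `Converter` class; the helper names `default_docx_path` and `convert_pdf` are hypothetical and introduced here only for illustration.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Return the PDF path with its suffix swapped for .docx (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, docx_path=None):
    """Convert a PDF file to docx with pdf2docx (hypothetical wrapper)."""
    # Imported inside the function so this module still loads
    # when pdf2docx is not installed.
    from pdf2docx import Converter

    target = Path(docx_path) if docx_path else default_docx_path(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        # With no page arguments, all pages are converted.
        cv.convert(str(target))
    finally:
        # Release the underlying PDF document even if conversion fails.
        cv.close()
    return target
```

Under these assumptions, calling `convert_pdf("sample.pdf")` would write the result next to the input as `sample.docx`; the file name is a placeholder.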





This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.
