Processing PDFs with the Mathpix Python SDK

The Mathpix Python SDK lets you process entire PDF files and extract their content as text, LaTeX, Mathpix Markdown (MMD), and other formats.

Code Example

from mpxpy.mathpix_client import MathpixClient

client = MathpixClient(
    app_id="your-app-id",
    app_key="your-app-key"
)

# Submit a PDF by URL and request Markdown output
pdf_file = client.pdf_new(
    file_url="http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf",
    conversion_formats={
        "md": True
    }
)

# Block until processing finishes or the 60-second timeout elapses
pdf_file.wait_until_complete(timeout=60)

# Download the Markdown output to the local ./output folder
pdf_file.download_output_to_local_path("md", "./output")
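
The example above uses placeholder credentials. One way to keep real keys out of source code is to read them from the environment. The following is a minimal sketch, not part of the SDK: the helper name load_credentials and the variable names MATHPIX_APP_ID and MATHPIX_APP_KEY are assumptions chosen for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return app_id/app_key read from environment variables.

    The variable names are assumptions for this sketch; use whatever
    names your deployment defines.
    """
    app_id = env.get("MATHPIX_APP_ID")
    app_key = env.get("MATHPIX_APP_KEY")

    # Report every missing variable at once instead of failing on the first.
    pairs = (("MATHPIX_APP_ID", app_id), ("MATHPIX_APP_KEY", app_key))
    missing = [name for name, value in pairs if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {"app_id": app_id, "app_key": app_key}
```

The returned dictionary matches the keyword arguments used above, so it could be passed as MathpixClient(**load_credentials()).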

Rendered Output Example: From PDF to MMD

This example shows how the Mathpix Python SDK can process a PDF and convert its content into structured Markdown with math formatting.

The converted result includes structured text, LaTeX math, and tables extracted from the original document.

This makes it practical to turn scientific PDFs into clean, editable Markdown that you can post-process or publish.
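
As an illustration of post-processing, the sketch below pulls the LaTeX snippets out of a converted Markdown string using only the standard library. It assumes math is delimited by \( ... \) for inline formulas and by \[ ... \] or $$ ... $$ for display formulas; adjust the patterns if your output uses other delimiters. The function name extract_math and the sample text are hypothetical.

```python
import re

# Assumed delimiters: \( ... \) inline, \[ ... \] or $$ ... $$ display.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]|\$\$(.+?)\$\$", re.DOTALL)


def extract_math(markdown_text):
    """Return (inline, display) lists of LaTeX snippets found in the text."""
    inline = [m.group(1).strip() for m in INLINE_MATH.finditer(markdown_text)]
    display = [
        (m.group(1) or m.group(2)).strip()
        for m in DISPLAY_MATH.finditer(markdown_text)
    ]
    return inline, display


# Hand-written sample standing in for a converted document.
sample = r"""The hypothesis is \( h_\theta(x) = \theta^T x \).

\[
J(\theta) = \frac{1}{2} \sum_i (h_\theta(x^{(i)}) - y^{(i)})^2
\]
"""

inline, display = extract_math(sample)
print(inline)
print(display)
```

In practice the input would be the contents of the Markdown file written to ./output rather than a hand-written sample.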