
Update for image and PDF API: new layout model, document tree outputs, parse text inside charts and diagrams

2025-04-16 · api, updates

Model update

We have released our latest layout parsing model, SuperNet. This model is larger and more capable than its predecessors. It combines several of our special-purpose models (e.g., table layout parser, PDF page parser) into one larger model. The new approach improves accuracy and will let us ship improvements more rapidly going forward.
Among the many improvements:
  • New capability to parse text fragments from diagrams (see below for more details)
  • Improved PDF parsing for distorted and imperfect scans
  • Improved general table parsing accuracy
  • Improved code/pseudocode recognition
  • Improved handling of rotated images and rotated PDF pages
  • Improved recognition of +/-90 degrees rotated table cells
  • Basic support for recognition of diagonally split table cells
We have tested this model extensively on our internal benchmarks; it is our most thoroughly tested model update yet.

Document tree information in line-by-line outputs

We have added new fields to our line-by-line data outputs in order to provide a detailed document tree output for each page.
In particular, we have added id, parent_id, and children_ids to the line-by-line outputs for PDFs and images. This is critical for diagram text extraction (see below), as we want to be able to associate text fragments with their parent containers. Diagrams can themselves have diagrams as children.
So that API clients can differentiate between “search only” text and normal “conversion” text, we have added a new conversion_output boolean field to each line. It is false whenever the line is not directly included in the converted Markdown / DOCX / HTML / LaTeX outputs. Diagrams are still included as images in outputs and have conversion_output=true. The text inside them has conversion_output=false and can now be parsed if you specify "include_diagram_text": true in the request options.
For tables, the parent table has conversion_output=true but the children cells (if available) have conversion_output=false. The cells themselves may have children (e.g., diagrams, text lines), and so on.
For a more detailed breakdown of the updates to the line-by-line data, see our changelog. The updates to the API, including new returned fields and a full breakdown of line types, are documented at Mathpix Docs.
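As a rough illustration of how a client might consume these fields, the sketch below indexes a list of line objects by id, groups children under their parent_id, and picks out the “search only” lines. The sample data and the helper names (index_lines, search_only_lines) are hypothetical, and treating a missing conversion_output flag as conversion text is an assumption, not documented behavior.

```python
from collections import defaultdict


def index_lines(lines):
    """Index lines by id and group child ids under each parent_id."""
    by_id = {line["id"]: line for line in lines}
    children = defaultdict(list)
    roots = []
    for line in lines:
        parent_id = line.get("parent_id")
        if parent_id is None:
            roots.append(line["id"])  # top-level line on the page
        else:
            children[parent_id].append(line["id"])
    return by_id, dict(children), roots


def search_only_lines(lines):
    """Return lines that are not directly part of the converted output."""
    # Assumption: a line without the flag is treated as conversion text.
    return [line for line in lines if line.get("conversion_output", True) is False]


# Hypothetical, shortened line data in the shape described above.
sample_lines = [
    {"id": "p1", "type": "text", "conversion_output": True, "text": "Caption"},
    {"id": "d1", "type": "diagram", "conversion_output": True, "children_ids": ["t1"]},
    {"id": "t1", "type": "text", "conversion_output": False, "parent_id": "d1", "text": "Label"},
]

by_id, children, roots = index_lines(sample_lines)
print(roots)
print(children)
print([line["id"] for line in search_only_lines(sample_lines)])
```

Grouping by parent_id rather than trusting children_ids alone keeps the index usable even when a parent's children list is only partially present in the data you hold.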

Parse text inside charts and diagrams

To parse text fragments inside charts and diagrams, use "include_diagram_text": true in your request options. Note that for the image API (v3/text), you need to specify "include_line_data": true to get the lines data.
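A request body with these options might be assembled as in the sketch below. Only include_diagram_text and include_line_data come from this post; the "src" field, the placeholder image URL, and the helper name build_text_request are assumptions for illustration and should be checked against Mathpix Docs before use.

```python
import json


def build_text_request(image_url):
    """Build a JSON body for the image API with diagram text enabled."""
    return {
        "src": image_url,              # assumed field name for the image location
        "include_line_data": True,     # needed on v3/text to receive line data
        "include_diagram_text": True,  # parse text fragments inside diagrams
    }


# Placeholder URL, not a real image.
payload = build_text_request("https://example.com/diagram.png")
print(json.dumps(payload, indent=2))
```

The body would then be sent to the v3/text endpoint along with your API credentials, using whichever HTTP client you prefer.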
For example, this image:
[Image: Example diagram]
will get parsed as:
[Image: Example diagram, parsed]
with line outputs that look like this (some fields and the rest of the list are omitted for brevity):
[
  {
    "type": "text",
    "conversion_output": true,
    "id": "b36ee632ed9f40a98be59f58876fc2c1",
    "text": "Figure 16.2.1. Current architecture for Lustre"
  },
  {
    "type": "diagram",
    "conversion_output": false,
    "id": "5ffc764d50ea4108a034d2842413b293",
    "children_ids": [
      "dfa08e22c9f34dbe864833b4b95685b8",
      "ef84af0a9e9642f5a8c49b4d566118a1",
      "b3f4a8cff5eb46e5af0b438bf348548d",
      "cb4e078fd4ab4e69a1ac754d96b9f7e2",
      "b217826830094fdcab48c657ccc48a87",
      "61558bb3f115410ba9fd192f78046cef",
      "2f769041cd7d4850a1f51b9590b6af08"
    ],
    "error_id": "image_not_supported"
  },
  {
    "type": "chart_info",
    "conversion_output": false,
    "id": "dfa08e22c9f34dbe864833b4b95685b8",
    "parent_id": "5ffc764d50ea4108a034d2842413b293",
    "children_ids": [
      "8d68793a3e6e4fcd951a98da1569e787",
      "2d7c06b837ea46a191c6b808c0fa3bba"
    ],
    "text": "Lustre Lite filesystem"
  },
  {
    "type": "text",
    "conversion_output": false,
    "id": "8d68793a3e6e4fcd951a98da1569e787",
    "parent_id": "dfa08e22c9f34dbe864833b4b95685b8",
    "text": "Lustre Lite"
  },
  {
    "type": "text",
    "conversion_output": false,
    "id": "2d7c06b837ea46a191c6b808c0fa3bba",
    "parent_id": "dfa08e22c9f34dbe864833b4b95685b8",
    "text": " filesystem"
  },
  ...
]
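One way to gather the text fragments belonging to each diagram from output like this is sketched below. It walks children_ids and keeps the text of the nearest descendants that carry a "text" field, so a chart_info line is not duplicated by its own child lines. The helper name diagram_fragments and the shortened ids are hypothetical, and skipping child ids that are absent from the list is an assumption made so that truncated data still works.

```python
def diagram_fragments(lines):
    """Map each diagram id to the text fragments found beneath it."""
    by_id = {line["id"]: line for line in lines}

    def fragments(line):
        out = []
        for child_id in line.get("children_ids", []):
            child = by_id.get(child_id)
            if child is None:
                continue  # child not present in this (possibly truncated) list
            if "text" in child:
                out.append(child["text"])  # stop here to avoid duplicate text
            else:
                out.extend(fragments(child))  # e.g. a nested diagram
        return out

    return {line["id"]: fragments(line) for line in lines if line.get("type") == "diagram"}


# Hypothetical data mirroring the structure of the example output.
sample_lines = [
    {"id": "d1", "type": "diagram", "children_ids": ["c1", "missing"]},
    {"id": "c1", "type": "chart_info", "parent_id": "d1",
     "children_ids": ["t1", "t2"], "text": "Lustre Lite filesystem"},
    {"id": "t1", "type": "text", "parent_id": "c1", "text": "Lustre Lite"},
    {"id": "t2", "type": "text", "parent_id": "c1", "text": " filesystem"},
]

result = diagram_fragments(sample_lines)
print(result)
```

If you need finer granularity, such as the individual text lines inside each fragment, descend one level further instead of stopping at the first line that has text.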

Additional considerations

  • We are deprecating the triangle shape associated with "include_geometry_data"; diagram label outputs are unchanged
  • Latency may increase for very large tables because the improved algorithm is more compute-intensive
  • Table content is now left-aligned by default for improved readability