Table of Contents

Extract content from PDF as JSON

Extracts content from a PDF file and returns it as a JSON string.

Note

If you want to programatically query the document structure of a PDF document, consider using the Extract content from PDF as document tree action instead.

Properties

Name Type Description
fileContent Required Byte array of the PDF file to be extracted.
outputFormat Required Defines the structure of the extracted result. Options: JSON_Raw, JSON_Simplified, JSON_Hierarchical.
fileData Optional Name of the variable to store the extracted result. Defaults to fileData.

Output format options

JSON_Raw

  • Contains low-level layout and styling metadata such as bounding boxes, fonts, position coordinates, justification, line height, etc.
  • Best for use cases where exact PDF layout reproduction or fine-grained analysis is needed.

JSON_Simplified

  • Extracts only plain text, organized by page.
  • Suitable for basic text search and lightweight parsing.

JSON_Hierarchical

  • Outputs a tree structure reflecting logical document structure (Document -> Page -> heading, paragraph, table, etc.).
  • Best choice for semantically meaningful documents like theses, reports, contracts.

Examples

JSON_Raw format

{
  "elements": [
    {
      "Bounds": [126.02, 315.34, 488.76, 358.76],
      "Font": {
        "family_name": "Times New Roman",
        "weight": 700
      },
      "Lang": "en",
      "page": 6,
      "path": "/Document/P[6]",
      "text": "Results: Results showed decrease in the intensity of the symptoms of Attention-Deficit/Hyperactivity Disorder...",
      "TextAlign": "Justify"
    }
  ]
}

JSON_Simplified format

{
  "elements": [
    {
      "page": 6,
      "path": "/Document/P[6]",
      "text": "Results: Results showed decrease in the intensity of the symptoms of Attention-Deficit/Hyperactivity Disorder..."
    }
  ]
}

JSON_Hierarchical format

{
  "elementType": "Document",
  "children": [    
    {
      "elementType": "Page",
      "page": 1,
      "children": [        
        {
          "elementType": "Sect",
          "page": 1,
          "children": [
            {
              "elementType": "H1",
              "value": "Competitive Analysis",
              "page": 1,
              "children": []
            },
            {
              "elementType": "P",
              "value": "Our company’s market share in the travel industry has been steadily increasing since the introduction of our company in 2011, and currently hovers around approximately 15% of US sales, 10% of European sales, and 7% of Asian sales. We do believe, however, that increased marketing efforts are needed to maintain this growth, due to ever-increasing competition from other travel brands.",
              "page": 1,
              "children": []
            },
            {
              "elementType": "Figure",
              "value": "15% 85% Market Share: US Our Company Combined Competitors 10% 90% Market Share: Europe Our Company Combined Competitors",
              "page": 1,
              "children": []
            },
            {
              "elementType": "Sect",
              "page": 1,
              "children": [
                {
                  "elementType": "H2",
                  "value": "Market Share: Asia",
                  "page": 1,
                  "children": []
                },
                {
                  "elementType": "Figure",
                  "value": "7% 93% Our Company Combined Competitors",
                  "page": 1,
                  "children": []
                }
              ]
            }
          ]
        }
      ]
    }    
  ]
}