Affinda · Arazzo Workflow

Affinda Split a Document and Re-parse

Version 1.0.0

Upload a multi-page document, wait for parsing, then split it into separate documents that re-parse.

1 workflow · 1 source API · 1 provider
Tags: AI, Artificial Intelligence, Document Processing, Intelligent Document Processing, IDP, OCR, Resume Parsing, Invoice Parsing, Receipt Parsing, Document Extraction, Document Classification, Document Splitting, Recruitment, Banking, Insurance, Logistics, Healthcare, Government, Arazzo, Workflows

Provider

affinda

Workflows

split-document-and-reparse
Upload a document, wait for parsing, then split its pages into new documents.
Uploads a file, polls until ready, and submits a split request that carves the document into the supplied page groups, each of which is re-parsed.
Steps: 3
Inputs: file, splits, workspace
Outputs: childDocuments, parentIdentifier
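The workflow types `splits` only as an array of objects, so the exact item shape has to come from the `editDocumentPages` request schema in the source OpenAPI description. The sketch below turns groups of page IDs into such an array under an assumed item shape of `{"pages": [{"id": <page id>}]}`. Both that shape and the helper name `build_splits` are hypothetical, and the checks (no page in two groups, optionally every page assigned) are sanity checks chosen for this sketch rather than documented API rules.

```python
from typing import Any, Dict, List, Optional, Sequence, Set


def build_splits(
    page_groups: Sequence[Sequence[int]],
    all_page_ids: Optional[Sequence[int]] = None,
) -> List[Dict[str, Any]]:
    """Turn page-ID groups into a hypothetical `splits` array, one entry per child document.

    ASSUMPTION: each split item looks like {"pages": [{"id": <page id>}, ...]};
    verify against the editDocumentPages request schema before sending it.
    """
    seen: Set[int] = set()
    splits: List[Dict[str, Any]] = []
    for index, group in enumerate(page_groups):
        if not group:
            raise ValueError(f"group {index} is empty")
        for page_id in group:
            # Sketch-level rule: a page may belong to only one child document.
            if page_id in seen:
                raise ValueError(f"page {page_id} appears in more than one group")
            seen.add(page_id)
        splits.append({"pages": [{"id": page_id} for page_id in group]})

    # Optional sanity check against the parent's known page IDs.
    if all_page_ids is not None:
        expected = set(all_page_ids)
        missing = expected - seen
        unknown = seen - expected
        if missing or unknown:
            raise ValueError(
                f"unassigned pages: {sorted(missing)}; unknown pages: {sorted(unknown)}"
            )
    return splits
```

Each entry in the returned list would describe one child document, and the list as a whole would be passed to the workflow as the `splits` input.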
1. uploadDocument (createDocument): Upload the file with wait=false so an identifier is returned for polling.
2. pollUntilReady (getDocument): Poll the document until meta.ready becomes true.
3. splitPages (editDocumentPages): Split the document into the supplied page groups, which triggers re-parsing of each resulting child document.
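Step 2 amounts to a bounded polling loop. The sketch below assumes a caller-supplied `fetch(identifier)` function (hypothetical) that performs the `getDocument` call and returns the HTTP status code and parsed JSON body. The 5-second interval and 60-attempt cap are illustrative defaults, not Affinda guidance.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: fetch(identifier) -> (status_code, parsed_json_body).
Fetch = Callable[[str], Tuple[int, Dict[str, Any]]]


def poll_until_ready(
    fetch: Fetch,
    identifier: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch until it returns HTTP 200 with body["meta"]["ready"] == True.

    Returns the ready document body, or raises TimeoutError after max_attempts calls.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(identifier)
        meta = body.get("meta") or {}
        # Mirrors the step's two success criteria: status 200 and meta.ready == true.
        if status == 200 and meta.get("ready") is True:
            return body
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"document {identifier} not ready after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable without real delays; by default it uses `time.sleep`.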

Source API Descriptions

affindaV3Api (openapi): ../openapi/affinda-v3-openapi.yml

Arazzo Workflow Specification

affinda-split-document-and-reparse-workflow.yml
arazzo: 1.0.1
info:
  title: Affinda Split a Document and Re-parse
  summary: Upload a multi-page document, wait for parsing, then split it into separate documents that re-parse.
  description: >-
    Handles documents that bundle several logical documents together. A file is
    uploaded and polled until ready, then the split endpoint is called to break it
    into multiple child documents along page boundaries, which triggers re-parsing
    of each resulting document. Every step spells out its request inline so the
    flow can be read and executed without opening the underlying OpenAPI
    description.
    Note: the page-splitting endpoint is marked deprecated in the v3 specification
    but remains the documented way to split, merge, or rotate document pages.
  version: 1.0.0
sourceDescriptions:
- name: affindaV3Api
  url: ../openapi/affinda-v3-openapi.yml
  type: openapi
workflows:
- workflowId: split-document-and-reparse
  summary: Upload a document, wait for parsing, then split its pages into new documents.
  description: >-
    Uploads a file, polls until ready, and submits a split request that carves the
    document into the supplied page groups, each of which is re-parsed.
  inputs:
    type: object
    required:
    - workspace
    - file
    - splits
    properties:
      workspace:
        type: string
        description: The workspace identifier to upload the document into.
      file:
        type: string
        description: The multi-page document file contents (binary) to upload.
      splits:
        type: array
        description: The split definitions describing how pages should be grouped into documents.
        items:
          type: object
  steps:
  - stepId: uploadDocument
    description: Upload the file with wait=false so an identifier is returned for polling.
    operationId: createDocument
    requestBody:
      contentType: multipart/form-data
      payload:
        file: $inputs.file
        workspace: $inputs.workspace
        wait: false
    successCriteria:
    - condition: $statusCode == 201
    outputs:
      identifier: $response.body#/meta/identifier
  - stepId: pollUntilReady
    description: Poll the document until meta.ready becomes true.
    operationId: getDocument
    parameters:
    - name: identifier
      in: path
      value: $steps.uploadDocument.outputs.identifier
    successCriteria:
    - condition: $statusCode == 200
    - context: $response.body
      condition: $.meta.ready == true
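      type: jsonpath
    onFailure:
    # Retry while meta.ready is still false; interval (seconds) and cap are illustrative.
    - name: retryUntilReady
      type: retry
      retryAfter: 5
      retryLimit: 60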
    outputs:
      identifier: $response.body#/meta/identifier
  - stepId: splitPages
    description: >-
      Split the document into the supplied page groups, which triggers re-parsing
      of each resulting child document.
    operationId: editDocumentPages
    parameters:
    - name: identifier
      in: path
      value: $steps.uploadDocument.outputs.identifier
    requestBody:
      contentType: application/json
      payload:
        splits: $inputs.splits
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      childDocuments: $response.body
  outputs:
    parentIdentifier: $steps.uploadDocument.outputs.identifier
    childDocuments: $steps.splitPages.outputs.childDocuments
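The step outputs above use Arazzo runtime expressions such as `$response.body#/meta/identifier`, where the part after `#` is a JSON Pointer (RFC 6901), while `childDocuments: $response.body` takes the whole body. The sketch below shows how a runner might evaluate those two forms. `resolve_json_pointer` and `resolve_body_expression` are hypothetical helpers that cover only the `$response.body` forms this workflow uses, not the full runtime-expression grammar.

```python
from typing import Any


def resolve_json_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/meta/identifier" against doc."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = doc
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: replace "~1" with "/" first, then "~0" with "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def resolve_body_expression(expression: str, body: Any) -> Any:
    """Evaluate "$response.body" or "$response.body#<pointer>" against a parsed body."""
    prefix = "$response.body"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression!r}")
    fragment = expression[len(prefix):]
    if fragment == "":
        return body  # e.g. childDocuments: $response.body
    if not fragment.startswith("#"):
        raise ValueError(f"unsupported expression: {expression!r}")
    return resolve_json_pointer(body, fragment[1:])
```

Applied to the `uploadDocument` response body, `resolve_body_expression("$response.body#/meta/identifier", body)` yields the identifier that `pollUntilReady` and `splitPages` place in their `identifier` path parameter.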