sensible-so · Arazzo Workflow

Sensible Classify Then Extract

Version 1.0.0

Classify a document to discover its best-fit document type, then submit an asynchronous extraction against that type.

1 workflow · 2 source APIs · 1 provider

Provider

sensible-so

Workflows

classify-then-extract
Classify a document and then asynchronously extract it using the discovered document type.
Classifies the document bytes synchronously to resolve the best-fit document type, then submits an asynchronous URL extraction for that type and the supplied config, and returns the extraction ID.
2 steps · inputs: apiKey, configName, documentBytes, documentUrl · outputs: documentTypeName, extractionId
1. classifyDocument (classify-document-sync)
Classify the document bytes synchronously to discover which document type in the account the document is most similar to.
2. extractWithClassifiedType (provide-a-download-url-with-config)
Submit an asynchronous URL extraction using the document type discovered by classification and the supplied config.
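The two steps above can be sketched in Python as plain HTTP calls. This is a minimal sketch under stated assumptions, not the workflow's reference runner: the base URL `https://api.sensible.so/v0` and the `/classify` and `/extract_from_url/{document_type}/{config_name}` paths are assumptions to verify against the source API descriptions, and `classify_then_extract`, `Transport`, and `urllib_transport` are hypothetical names. The HTTP call is injected as a `send` function so the step chaining can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable, Dict, Tuple
from urllib.parse import quote

# Assumed endpoint layout; verify against the source OpenAPI descriptions.
BASE_URL = "https://api.sensible.so/v0"
CLASSIFY_PATH = "/classify"  # hypothetical path for classify-document-sync
EXTRACT_PATH = "/extract_from_url/{document_type}/{config_name}"  # hypothetical path

# send(method, url, headers, body) -> (status code, parsed JSON body)
Transport = Callable[[str, str, Dict[str, str], bytes], Tuple[int, dict]]


def urllib_transport(method: str, url: str, headers: Dict[str, str],
                     body: bytes) -> Tuple[int, dict]:
    """Real transport; urlopen raises HTTPError for 4xx/5xx responses."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def classify_then_extract(api_key: str, document_bytes: bytes, document_url: str,
                          config_name: str, send: Transport) -> Dict[str, str]:
    auth = {"Authorization": f"Bearer {api_key}"}

    # Step 1 (classifyDocument): POST the raw PDF bytes.
    status, body = send("POST", BASE_URL + CLASSIFY_PATH,
                        {**auth, "Content-Type": "application/pdf"},
                        document_bytes)
    if status != 200:  # successCriteria: $statusCode == 200
        raise RuntimeError(f"classifyDocument failed with status {status}")
    document_type_name = body["document_type"]["name"]  # /document_type/name

    # Step 2 (extractWithClassifiedType): the classified type and the config
    # become path segments; the JSON body only carries document_url.
    url = BASE_URL + EXTRACT_PATH.format(
        document_type=quote(document_type_name, safe=""),
        config_name=quote(config_name, safe=""),
    )
    payload = json.dumps({"document_url": document_url}).encode("utf-8")
    status, body = send("POST", url,
                        {**auth, "Content-Type": "application/json"}, payload)
    if status != 200:
        raise RuntimeError(f"extractWithClassifiedType failed with status {status}")

    # Workflow outputs: extraction is asynchronous, so only its id is returned.
    return {"documentTypeName": document_type_name, "extractionId": body["id"]}
```

Injecting `send` keeps the data flow visible: the classified `document_type.name` becomes a URL-quoted path segment (type names may contain spaces), and the workflow returns only the extraction `id`, since the extraction itself completes later.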

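Source API Descriptions

classificationApi · openapi · ../openapi/sensible-classification-api-openapi.yml
extractionsApi · openapi · ../openapi/sensible-extractions-api-openapi.yml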

Arazzo Workflow Specification

sensible-so-classify-then-extract-workflow.yml
arazzo: 1.0.1
info:
  title: Sensible Classify Then Extract
  summary: Classify a document to discover its best-fit document type, then submit an asynchronous extraction against that type.
  description: >-
    A routing-then-extraction flow. The workflow first classifies the document
    bytes synchronously to discover which document type in the account the
    document most resembles, then feeds that document type name into an
    asynchronous extract-from-url request so the document is extracted using
    the right type and config. The classify step posts the raw document bytes,
    while the extraction step references a URL where the same document is
    hosted. Every step spells out its request inline, including the Bearer
    authorization.
  version: 1.0.0
sourceDescriptions:
- name: classificationApi
  url: ../openapi/sensible-classification-api-openapi.yml
  type: openapi
- name: extractionsApi
  url: ../openapi/sensible-extractions-api-openapi.yml
  type: openapi
workflows:
- workflowId: classify-then-extract
  summary: Classify a document and then asynchronously extract it using the discovered document type.
  description: >-
    Classifies the document bytes synchronously to resolve the best-fit
    document type, then submits an asynchronous URL extraction for that type
    and the supplied config and returns the extraction id.
  inputs:
    type: object
    required:
    - apiKey
    - documentBytes
    - documentUrl
    - configName
    properties:
      apiKey:
        type: string
        description: Sensible API key used as the Bearer token.
      documentBytes:
        type: string
        description: The raw (non-encoded) document bytes to classify.
      documentUrl:
        type: string
        description: A publicly accessible or presigned URL returning the same document bytes for extraction.
      configName:
        type: string
        description: The config to use for extraction once the document type is resolved.
  steps:
  - stepId: classifyDocument
    description: >-
      Classify the document bytes synchronously to discover which document type
      in the account the document is most similar to.
    operationId: $sourceDescriptions.classificationApi.classify-document-sync
    parameters:
    - name: Authorization
      in: header
      value: "Bearer {$inputs.apiKey}"
    requestBody:
      contentType: application/pdf
      payload: $inputs.documentBytes
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      documentTypeName: $response.body#/document_type/name
      documentTypeId: $response.body#/document_type/id
  - stepId: extractWithClassifiedType
    description: >-
      Submit an asynchronous URL extraction using the document type discovered
      by classification and the supplied config.
    operationId: $sourceDescriptions.extractionsApi.provide-a-download-url-with-config
    parameters:
    - name: Authorization
      in: header
      value: "Bearer {$inputs.apiKey}"
    - name: document_type
      in: path
      value: $steps.classifyDocument.outputs.documentTypeName
    - name: config_name
      in: path
      value: $inputs.configName
    requestBody:
      contentType: application/json
      payload:
        document_url: $inputs.documentUrl
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      extractionId: $response.body#/id
      status: $response.body#/status
  outputs:
    documentTypeName: $steps.classifyDocument.outputs.documentTypeName
    extractionId: $steps.extractWithClassifiedType.outputs.extractionId
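The runtime expressions used in this spec can be illustrated with a small resolver. This is a hypothetical sketch (`resolve_pointer`, `evaluate`, and `render` are illustrative names, and the `ctx` dictionary is an assumed context shape) that covers only the expression forms this workflow uses: `$inputs.*`, `$steps.<id>.outputs.*`, `$statusCode`, and `$response.body#<pointer>`, where the part after `#` is an RFC 6901 JSON Pointer. It follows the Arazzo rule that an expression embedded inside a larger string is wrapped in `{}`; a full runner would also need the specification's other expression sources and condition syntax.

```python
import re
from typing import Any, Dict

# Matches an embedded expression such as {$inputs.apiKey}.
EMBEDDED = re.compile(r"\{(\$[^{}]+)\}")


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer like '/document_type/name'."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescape order
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def evaluate(expr: str, ctx: Dict[str, Any]) -> Any:
    """Evaluate one bare runtime expression against an assumed context dict."""
    if expr == "$statusCode":
        return ctx["statusCode"]
    if expr.startswith("$response.body"):
        _, _, pointer = expr.partition("#")
        return resolve_pointer(ctx["responseBody"], pointer)
    if expr.startswith("$inputs."):
        return ctx["inputs"][expr[len("$inputs."):]]
    if expr.startswith("$steps."):
        step_id, section, name = expr[len("$steps."):].split(".", 2)
        return ctx["steps"][step_id][section][name]
    raise ValueError(f"unsupported expression: {expr}")


def render(value: str, ctx: Dict[str, Any]) -> Any:
    """A whole-value expression keeps its type; embedded ones are stringified."""
    if value.startswith("$"):
        return evaluate(value, ctx)
    return EMBEDDED.sub(lambda m: str(evaluate(m.group(1), ctx)), value)
```

Under this reading, `render("Bearer {$inputs.apiKey}", ctx)` yields `Bearer ` followed by the key, while a bare `$inputs.apiKey` that follows other text in the same string passes through as literal text.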