Adobe Creative Suite · Arazzo Workflow

Adobe PDF Services OCR PDF

Version 1.0.0

Upload a scanned PDF, apply OCR, poll the job, and get the searchable output download URI.

1 workflow · 1 source API · 1 provider

Tags: Creative, Design, Graphics, Photography, Video, Arazzo, Workflows

Provider

adobe-creative-suite

Workflows

ocr-pdf

Upload a scanned PDF, apply OCR, and resolve the searchable download URI.

Registers a source PDF asset, submits an OCR operation in the chosen locale, polls the job to a terminal state, and returns the download URI of the searchable PDF on success.

5 steps · inputs: ocrLang · outputs: errors, jobID, outputDownloadUri, sourceAssetID

createUpload (operation: uploadAsset)

Request an upload asset for the scanned source PDF.

submitOCR (operation: ocrPDF)

Submit an OCR operation against the uploaded PDF asset in the chosen locale. The API returns a job id for polling.

pollOperation (operation: getOperationStatus)

Poll the OCR operation status, repeating while it is in progress and branching once it is done or failed.

getOutput (operation: getAsset)

Retrieve the searchable PDF asset metadata and a fresh temporary download URI for the output file.

reportFailure (operation: getOperationStatus)

Surface the error details from the failed OCR operation.
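
The branching in pollOperation can be sketched in Python as a small loop. This is only an illustration under stated assumptions: `get_status` is a hypothetical callable standing in for one getOperationStatus request, and the status strings ("done", "failed", "in progress"), the 5 second delay, and the limit of 30 retries are taken from the specification on this page. It does not represent how any particular Arazzo runner is implemented.

```python
import time
from typing import Any, Callable, Dict, Tuple


def poll_operation(
    get_status: Callable[[], Dict[str, Any]],
    retry_after: float = 5.0,
    retry_limit: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Dict[str, Any]]:
    """Poll until a terminal status and return (next step id, last response body).

    get_status is a hypothetical stand-in for one getOperationStatus call;
    it must return the parsed JSON body of the response.
    """
    retries = 0
    while True:
        body = get_status()
        status = body.get("status")
        if status == "done":
            # Mirrors the operationDone action: continue with getOutput.
            return "getOutput", body
        if status == "failed":
            # Mirrors the operationFailed action: continue with reportFailure.
            return "reportFailure", body
        if status != "in progress":
            raise ValueError(f"unexpected operation status: {status!r}")
        if retries >= retry_limit:
            raise TimeoutError(f"operation still in progress after {retries} retries")
        # Mirrors the stillRunning action: wait, then poll again.
        retries += 1
        sleep(retry_after)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller that talks to the live API would leave the default `time.sleep` in place.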

Source API Descriptions

openapi

pdfServicesApi https://raw.githubusercontent.com/api-evangelist/adobe-creative-suite/refs/heads/main/openapi/adobe-creative-suite-pdf-services-openapi.yml
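
Inside the specification, the same source description is referenced by the relative URL `../openapi/adobe-creative-suite-pdf-services-openapi.yml`. The Python sketch below shows how such a relative reference resolves against the location of the Arazzo document. The `arazzo/` directory and the `ocr-pdf.yml` file name in `ARAZZO_DOCUMENT_URL` are hypothetical placeholders chosen for illustration; the actual path of the workflow file is not stated on this page.

```python
from urllib.parse import urljoin

# Hypothetical location of the Arazzo document (directory and file name are assumptions).
ARAZZO_DOCUMENT_URL = (
    "https://raw.githubusercontent.com/api-evangelist/adobe-creative-suite/"
    "refs/heads/main/arazzo/ocr-pdf.yml"
)

# Relative URL exactly as it appears under sourceDescriptions in the specification.
SOURCE_DESCRIPTION_URL = "../openapi/adobe-creative-suite-pdf-services-openapi.yml"


def resolve_source_description(document_url: str, relative_url: str) -> str:
    """Resolve a source description URL relative to the Arazzo document URL."""
    return urljoin(document_url, relative_url)


resolved = resolve_source_description(ARAZZO_DOCUMENT_URL, SOURCE_DESCRIPTION_URL)
print(resolved)
```

Under that assumed location, the relative reference resolves to the raw GitHub URL listed above for pdfServicesApi.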

Arazzo Workflow Specification

arazzo: 1.0.1
info:
  title: Adobe PDF Services OCR PDF
  summary: Upload a scanned PDF, apply OCR, poll the job, and get the searchable output download URI.
  description: >-
    Applies optical character recognition to a scanned or image-based PDF to
    make its text searchable and selectable, using the Adobe PDF Services
    asynchronous job model. The workflow registers the source PDF by requesting
    an upload asset, submits an OCR operation with the chosen recognition
    locale, polls the operation status until it is done or failed, and resolves
    a temporary download URI for the searchable output. Every step spells out
    its request inline so the flow can be read and executed without opening the
    underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: pdfServicesApi
  url: ../openapi/adobe-creative-suite-pdf-services-openapi.yml
  type: openapi
workflows:
- workflowId: ocr-pdf
  summary: Upload a scanned PDF, apply OCR, and resolve the searchable download URI.
  description: >-
    Registers a source PDF asset, submits an OCR operation in the chosen locale,
    polls the job to a terminal state, and returns the download URI of the
    searchable PDF on success.
  inputs:
    type: object
    properties:
      ocrLang:
        type: string
        description: BCP 47 locale for OCR language model selection (e.g. en-US).
  steps:
  - stepId: createUpload
    description: Request an upload asset for the scanned source PDF.
    operationId: uploadAsset
    requestBody:
      contentType: application/json
      payload:
        mediaType: application/pdf
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      assetID: $response.body#/assetID
      uploadUri: $response.body#/uploadUri
  - stepId: submitOCR
    description: >-
      Submit an OCR operation against the uploaded PDF asset in the chosen
      locale. The API returns a job id for polling.
    operationId: ocrPDF
    requestBody:
      contentType: application/json
      payload:
        assetID: $steps.createUpload.outputs.assetID
        ocrLang: $inputs.ocrLang
    successCriteria:
    - condition: $statusCode == 201
    outputs:
      jobID: $response.body#/jobID
  - stepId: pollOperation
    description: >-
      Poll the OCR operation status, repeating while it is in progress and
      branching once it is done or failed.
    operationId: getOperationStatus
    parameters:
    - name: jobId
      in: path
      value: $steps.submitOCR.outputs.jobID
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      status: $response.body#/status
      assetID: $response.body#/asset/assetID
    onSuccess:
    - name: operationDone
      type: goto
      stepId: getOutput
      criteria:
      - context: $response.body
        condition: $.status == "done"
        type: jsonpath
    - name: operationFailed
      type: goto
      stepId: reportFailure
      criteria:
      - context: $response.body
        condition: $.status == "failed"
        type: jsonpath
    - name: stillRunning
      type: retry
      stepId: pollOperation
      retryAfter: 5
      retryLimit: 30
      criteria:
      - context: $response.body
        condition: $.status == "in progress"
        type: jsonpath
  - stepId: getOutput
    description: >-
      Retrieve the searchable PDF asset metadata and a fresh temporary download
      URI for the output file.
    operationId: getAsset
    parameters:
    - name: assetID
      in: path
      value: $steps.pollOperation.outputs.assetID
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      downloadUri: $response.body#/downloadUri
      size: $response.body#/size
    onSuccess:
    - name: done
      type: end
  - stepId: reportFailure
    description: Surface the error details from the failed OCR operation.
    operationId: getOperationStatus
    parameters:
    - name: jobId
      in: path
      value: $steps.submitOCR.outputs.jobID
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      errors: $response.body#/errors
  outputs:
    sourceAssetID: $steps.createUpload.outputs.assetID
    jobID: $steps.submitOCR.outputs.jobID
    outputDownloadUri: $steps.getOutput.outputs.downloadUri
    errors: $steps.reportFailure.outputs.errors
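
Output expressions in the specification above, such as `$response.body#/asset/assetID`, pair a runtime source (`$response.body`) with a JSON Pointer fragment (RFC 6901). The following Python sketch shows one way such an expression could be evaluated against a parsed response body. The function names are hypothetical, and the sample body, including its asset ID value, is invented to match the shape implied by the pollOperation outputs; it is not a recorded API response.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Evaluate an RFC 6901 JSON Pointer against parsed JSON data."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def resolve_body_expression(expression: str, body: Any) -> Any:
    """Evaluate a "$response.body" or "$response.body#/pointer" expression."""
    prefix = "$response.body"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression!r}")
    remainder = expression[len(prefix):]
    if remainder == "":
        return body
    if not remainder.startswith("#"):
        raise ValueError(f"unsupported expression: {expression!r}")
    return resolve_pointer(body, remainder[1:])


# Invented sample body shaped like the pollOperation outputs expect.
sample_body = {"status": "done", "asset": {"assetID": "urn:example:asset"}}
print(resolve_body_expression("$response.body#/status", sample_body))
print(resolve_body_expression("$response.body#/asset/assetID", sample_body))
```

A missing key raises `KeyError` in this sketch; a real runner may instead treat an unresolved pointer as an empty output.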