Bright Data · Arazzo Workflow

Bright Data Web Archive Search and Deliver

Version 1.0.0

Submit a Web Archive search, poll it until ready, and deliver the corpus to a cloud destination.

1 workflow · 1 source API · 1 provider

Tags: Web Data, Web Scraping, Proxy, Residential Proxy, Datacenter Proxy, ISP Proxy, Mobile Proxy, SERP, Web Unlocker, Scraping Browser, Dataset Marketplace, MCP, AI Agents, Arazzo, Workflows

Provider

bright-data

Workflows

search-and-deliver-archive

Submit an archive search, poll until ready, then deliver the corpus.

Submits a historical web archive search for a domain, polls the search status until it is ready, and schedules delivery of the matching corpus to a configured cloud destination.

3 steps

Inputs: apiToken, bucket, credentials, destinationType, domain, format, query

Outputs: deliveryResult, searchId, status
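
The workflow's input schema marks apiToken, domain, destinationType, and bucket as required; query, credentials, and format are optional. A minimal Python sketch of a pre-flight check on an inputs object follows. The helper names `missing_inputs` and `unknown_inputs` are hypothetical and are not part of any Bright Data or Arazzo tooling.

```python
# Input names copied from the workflow's `inputs` schema.
REQUIRED_INPUTS = ("apiToken", "domain", "destinationType", "bucket")
OPTIONAL_INPUTS = ("query", "credentials", "format")


def missing_inputs(inputs: dict) -> list:
    """Return required input names that are absent or empty (hypothetical helper)."""
    return [name for name in REQUIRED_INPUTS if not inputs.get(name)]


def unknown_inputs(inputs: dict) -> list:
    """Return supplied names that the workflow does not declare, sorted."""
    known = set(REQUIRED_INPUTS) | set(OPTIONAL_INPUTS)
    return sorted(name for name in inputs if name not in known)
```

Running both checks before starting the workflow surfaces a missing bucket or a misspelled input name before any request is sent.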

submitSearch

submitArchiveSearch

Submit the archive search for the domain, returning a search id used to poll status and deliver.
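
As a rough illustration of what this step sends, the Python sketch below assembles the Bearer header and JSON body and reads the search id back from a response. `build_submit_request` and `extract_search_id` are hypothetical helpers. The endpoint URL is deliberately left out because it is defined in the OpenAPI description, not in the workflow. Omitting an unset `query` is an assumption.

```python
import json
from typing import Optional


def build_submit_request(api_token: str, domain: str, query: Optional[str] = None) -> dict:
    """Assemble headers and JSON body for submitArchiveSearch (hypothetical helper)."""
    payload = {"domain": domain}
    if query is not None:
        # Assumption: an unset optional query is omitted rather than sent as null.
        payload["query"] = query
    return {
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(payload),
    }


def extract_search_id(response_body: dict) -> str:
    """Mirror the step output expression `$response.body#/search_id`."""
    return response_body["search_id"]
```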

pollSearch

getArchiveSearch

Poll the search status until it reaches a terminal state. A ready status means the matching corpus can be delivered.
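
The polling logic can be sketched in Python as a bounded loop. The workflow names the statuses ready, pending, and running; treating any other value as a terminal state that cannot be delivered is an assumption, as are the `interval_seconds` and `max_attempts` defaults. `fetch_status` is a hypothetical callable standing in for the getArchiveSearch request.

```python
import time
from typing import Callable

# Status values named in the workflow's criteria.
READY = "ready"
IN_PROGRESS = {"pending", "running"}


def poll_until_terminal(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the status leaves the in-progress set.

    fetch_status must return the parsed response body. The interval and
    attempt limit are illustrative defaults, not values from the workflow.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body.get("status") not in IN_PROGRESS:
            return body  # "ready" or some other terminal state
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"search still in progress after {max_attempts} attempts")


def is_deliverable(body: dict) -> bool:
    """Only a ready search has a corpus that can be delivered."""
    return body.get("status") == READY
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A caller would check `is_deliverable` on the returned body before moving on to delivery.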

deliverArchive

deliverArchiveToCloud

Schedule delivery of the matching corpus to the configured cloud destination in the requested format.
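
The delivery request body nests the destination type, bucket, and credentials under a `destination` object next to the search id and format. A minimal Python sketch of that shape follows. `build_delivery_payload` is a hypothetical helper, and leaving out unset optional fields is an assumption; the workflow itself passes them through as given.

```python
from typing import Optional


def build_delivery_payload(
    search_id: str,
    destination_type: str,
    bucket: str,
    credentials: Optional[dict] = None,
    fmt: Optional[str] = None,
) -> dict:
    """Assemble the deliverArchiveToCloud request body (hypothetical helper)."""
    destination = {"type": destination_type, "bucket": bucket}
    if credentials is not None:
        # Assumption: credentials are omitted when not supplied.
        destination["credentials"] = credentials
    payload = {"search_id": search_id, "destination": destination}
    if fmt is not None:
        # Assumption: format is omitted when not supplied.
        payload["format"] = fmt
    return payload
```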

Source API Descriptions

openapi

webArchiveApi https://raw.githubusercontent.com/api-evangelist/bright-data/refs/heads/main/openapi/bright-data-web-archive-api-openapi.yml

Arazzo Workflow Specification

arazzo: 1.0.1
info:
  title: Bright Data Web Archive Search and Deliver
  summary: Submit a Web Archive search, poll it until ready, and deliver the corpus to a cloud destination.
  description: >-
    The historical Web Archive corpus pattern. The workflow submits a domain
    search against Bright Data's historical web index, polls the search status
    until it reaches a terminal state, and then schedules delivery of the
    matching corpus to a cloud destination. Every step spells out its request
    inline so the flow can be read and executed without opening the underlying
    OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: webArchiveApi
  url: ../openapi/bright-data-web-archive-api-openapi.yml
  type: openapi
workflows:
- workflowId: search-and-deliver-archive
  summary: Submit an archive search, poll until ready, then deliver the corpus.
  description: >-
    Submits a historical web archive search for a domain, polls the search
    status until it is ready, and schedules delivery of the matching corpus to
    a configured cloud destination.
  inputs:
    type: object
    required:
    - apiToken
    - domain
    - destinationType
    - bucket
    properties:
      apiToken:
        type: string
        description: Bright Data API token used as a Bearer credential.
      domain:
        type: string
        description: Domain to search the historical web index for.
      query:
        type: string
        description: Optional full-text query to constrain the search.
      destinationType:
        type: string
        description: Delivery destination type (s3, azure, gcs).
      bucket:
        type: string
        description: Destination bucket or container name.
      credentials:
        type: object
        description: Destination credentials object.
      format:
        type: string
        description: Delivery format (json, ndjson, parquet).
  steps:
  - stepId: submitSearch
    description: >-
      Submit the archive search for the domain, returning a search id used to
      poll status and deliver.
    operationId: submitArchiveSearch
    parameters:
    - name: Authorization
      in: header
      value: "Bearer $inputs.apiToken"
    requestBody:
      contentType: application/json
      payload:
        domain: $inputs.domain
        query: $inputs.query
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      searchId: $response.body#/search_id
  - stepId: pollSearch
    description: >-
      Poll the search status until it reaches a terminal state. A ready status
      means the matching corpus can be delivered.
    operationId: getArchiveSearch
    parameters:
    - name: Authorization
      in: header
      value: "Bearer $inputs.apiToken"
    - name: search_id
      in: path
      value: $steps.submitSearch.outputs.searchId
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      status: $response.body#/status
      records: $response.body#/records
    onSuccess:
    - name: searchReady
      type: goto
      stepId: deliverArchive
      criteria:
      - context: $response.body
        condition: $.status == "ready"
        type: jsonpath
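    - name: keepPolling
      type: goto
      stepId: pollSearch
      criteria:
      - context: $response.body
        condition: $.status == "pending" || $.status == "running"
        type: jsonpath
    # Fallback: any other status is terminal but not deliverable, so end the
    # workflow here rather than continuing to deliverArchive.
    - name: searchNotDeliverable
      type: end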
  - stepId: deliverArchive
    description: >-
      Schedule delivery of the matching corpus to the configured cloud
      destination in the requested format.
    operationId: deliverArchiveToCloud
    parameters:
    - name: Authorization
      in: header
      value: "Bearer $inputs.apiToken"
    requestBody:
      contentType: application/json
      payload:
        search_id: $steps.submitSearch.outputs.searchId
        destination:
          type: $inputs.destinationType
          bucket: $inputs.bucket
          credentials: $inputs.credentials
        format: $inputs.format
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      deliveryResult: $response.body
  outputs:
    searchId: $steps.submitSearch.outputs.searchId
    status: $steps.pollSearch.outputs.status
    deliveryResult: $steps.deliverArchive.outputs.deliveryResult