Hyperbrowser · Arazzo Workflow

Hyperbrowser Scrape and Retrieve

Version 1.0.0

Start a scrape job for a single URL, poll its status, then fetch the result.

1 workflow · 1 source API · 1 provider
Tags: Headless Browser, Browser Infrastructure, Web Scraping, Web Crawling, Data Extraction, AI Agents, Browser Automation, Computer Use, Stealth, Proxies, CAPTCHA Solving, MCP, HyperAgent, X402, Arazzo, Workflows

Provider

hyperbrowser

Workflows

scrape-and-retrieve
Scrape one page asynchronously and return its extracted content.
Submits a scrape job, waits for it to complete by polling status, and pulls the scraped page data once the job finishes.
3 steps · inputs: apiKey, url · outputs: jobId, links, markdown

1. startScrape (post-api-scrape)
Submit a scrape job for the supplied URL and capture the returned jobId used to track and retrieve results.

2. pollStatus (get-api-scrape-id-status)
Poll the scrape job status. The status moves through pending and running before reaching completed, failed, or stopped; loop back while still in progress and branch out on a terminal state.

3. getResult (get-api-scrape-id)
Fetch the completed scrape job to return the extracted markdown, html, links, and page metadata.
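
The three steps above amount to a submit, poll, fetch loop. A minimal Python sketch of that control flow follows, assuming the three operations are supplied as plain callables that return parsed JSON. The names `scrape_and_retrieve`, `start_scrape`, `get_status`, `get_result`, `poll_interval`, and `max_polls` are illustrative and not part of the Hyperbrowser API. The workflow itself defines no polling delay or retry limit; both are assumptions added here so the loop is guaranteed to terminate.

```python
import time
from typing import Callable

# Statuses the workflow treats as "still in progress".
IN_PROGRESS = {"pending", "running"}


class ScrapeJobError(RuntimeError):
    """Raised when the job ends in a state other than "completed"."""


def scrape_and_retrieve(
    url: str,
    start_scrape: Callable[[str], dict],
    get_status: Callable[[str], dict],
    get_result: Callable[[str], dict],
    poll_interval: float = 1.0,  # assumption: not specified by the workflow
    max_polls: int = 60,         # assumption: not specified by the workflow
) -> dict:
    # Step 1 (startScrape): submit the job and keep the returned jobId.
    job_id = start_scrape(url)["jobId"]

    # Step 2 (pollStatus): loop while pending/running, leave on any other status.
    for _ in range(max_polls):
        status = get_status(job_id)["status"]
        if status not in IN_PROGRESS:
            break
        time.sleep(poll_interval)
    else:
        raise TimeoutError(f"job {job_id} still in progress after {max_polls} polls")

    # failed and stopped are terminal but carry no result to fetch.
    if status != "completed":
        raise ScrapeJobError(f"job {job_id} ended with status {status!r}")

    # Step 3 (getResult): fetch the full record and map it to the workflow outputs.
    data = get_result(job_id)["data"]
    return {"jobId": job_id, "markdown": data["markdown"], "links": data["links"]}
```

Passing the operations in as callables keeps the sketch independent of any HTTP client or base URL, neither of which is given on this page.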

Source API Descriptions

scrapeApi (openapi): ../openapi/hyperbrowser-scrape-api-openapi.yml

Arazzo Workflow Specification

hyperbrowser-scrape-and-retrieve-workflow.yml
arazzo: 1.0.1
info:
  title: Hyperbrowser Scrape and Retrieve
  summary: Start a scrape job for a single URL, poll its status, then fetch the result.
  description: >-
    The canonical asynchronous scrape pattern. The workflow starts a scrape job
    for a single URL with the requested output formats, polls the lightweight
    status endpoint until the job reaches a terminal state, and then fetches the
    full job record containing the scraped markdown, html, links, and metadata.
    Every step spells out its request inline so the flow can be read and
    executed without opening the underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: scrapeApi
  url: ../openapi/hyperbrowser-scrape-api-openapi.yml
  type: openapi
workflows:
- workflowId: scrape-and-retrieve
  summary: Scrape one page asynchronously and return its extracted content.
  description: >-
    Submits a scrape job, waits for it to complete by polling status, and pulls
    the scraped page data once the job finishes.
  inputs:
    type: object
    required:
    - apiKey
    - url
    properties:
      apiKey:
        type: string
        description: Hyperbrowser account API key sent in the x-api-key header.
      url:
        type: string
        description: The URL of the page to scrape.
  steps:
  - stepId: startScrape
    description: >-
      Submit a scrape job for the supplied URL and capture the returned jobId
      used to track and retrieve results.
    operationId: post-api-scrape
    parameters:
    - name: x-api-key
      in: header
      value: $inputs.apiKey
    requestBody:
      contentType: application/json
      payload:
        url: $inputs.url
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      jobId: $response.body#/jobId
  - stepId: pollStatus
    description: >-
      Poll the scrape job status. The status moves through pending and running
      before reaching completed, failed, or stopped; loop back while still in
      progress and branch out on a terminal state.
    operationId: get-api-scrape-id-status
    parameters:
    - name: x-api-key
      in: header
      value: $inputs.apiKey
    - name: id
      in: path
      value: $steps.startScrape.outputs.jobId
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      status: $response.body#/status
    onSuccess:
    - name: scrapeComplete
      type: goto
      stepId: getResult
      criteria:
      - context: $response.body
        condition: $.status == "completed"
        type: jsonpath
    - name: scrapeRunning
      type: goto
      stepId: pollStatus
      criteria:
      - context: $response.body
        condition: $.status == "pending" || $.status == "running"
        type: jsonpath
  - stepId: getResult
    description: >-
      Fetch the completed scrape job to return the extracted markdown, html,
      links, and page metadata.
    operationId: get-api-scrape-id
    parameters:
    - name: x-api-key
      in: header
      value: $inputs.apiKey
    - name: id
      in: path
      value: $steps.startScrape.outputs.jobId
    successCriteria:
    - condition: $statusCode == 200
    - context: $response.body
      condition: $.status == "completed"
      type: jsonpath
    outputs:
      status: $response.body#/status
      markdown: $response.body#/data/markdown
      links: $response.body#/data/links
      metadata: $response.body#/data/metadata
  outputs:
    jobId: $steps.startScrape.outputs.jobId
    markdown: $steps.getResult.outputs.markdown
    links: $steps.getResult.outputs.links
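
The `outputs` entries in the getResult step use runtime expressions of the form `$response.body#/...`, where the part after `#` is a JSON Pointer (RFC 6901) into the response body. The Python sketch below shows how those pointers select values. `resolve_pointer` is a hypothetical helper, and the sample body is invented for illustration; only the field names and pointer paths are taken from the workflow.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/data/links" against parsed JSON."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Invented sample body; only the field names come from the workflow.
sample_body = {
    "status": "completed",
    "data": {
        "markdown": "# Example",
        "links": ["https://example.com/about"],
        "metadata": {"title": "Example"},
    },
}

# Pointer fragments copied from the getResult step's outputs.
get_result_outputs = {
    "status": "/status",
    "markdown": "/data/markdown",
    "links": "/data/links",
    "metadata": "/data/metadata",
}

resolved = {
    name: resolve_pointer(sample_body, pointer)
    for name, pointer in get_result_outputs.items()
}
```

Of these four step-level values, the workflow-level `outputs` expose only markdown and links, alongside the jobId captured by startScrape.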