Dataiku · Arazzo Workflow

Dataiku Build Dataset and Poll Job

Version 1.0.0

Start a build job for a dataset output and poll the job until it reaches a terminal state.

1 workflow · 1 source API · 1 provider
Tags: Analytics, Artificial Intelligence, Data Platform, Data Science, Machine Learning, Arazzo, Workflows

Provider

dataiku

Workflows

build-dataset-job
Start a dataset build job and poll it to a terminal state.
Starts a build job for one dataset output and polls getJob until the job is DONE, FAILED, or ABORTED.
2 steps
Inputs: apiKey, datasetName, projectKey
Outputs: finalState, jobId

Step 1: startBuild (operation: startJob)
Start a build job targeting the dataset as a single output.

Step 2: pollJob (operation: getJob)
Poll the job status. While the job is NOT_STARTED or RUNNING, loop back and poll again; once it reaches DONE, FAILED, or ABORTED, the workflow branches to completion.
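
The polling behavior described for pollJob can be sketched in Python as follows. This is a minimal illustration, not part of the workflow itself: `get_job` is a hypothetical callable standing in for the getJob operation, and `delay_seconds`, `max_polls`, and the error raised on an unrecognized state are assumptions of the sketch (the workflow as written defines no delay or poll limit).

```python
import time
from typing import Callable, Mapping

# States taken from the workflow description.
IN_PROGRESS_STATES = frozenset({"NOT_STARTED", "RUNNING"})
TERMINAL_STATES = frozenset({"DONE", "FAILED", "ABORTED"})


def next_action(state: str) -> str:
    """Mirror the pollJob branching: loop while in progress, end when terminal."""
    if state in IN_PROGRESS_STATES:
        return "goto pollJob"
    if state in TERMINAL_STATES:
        return "end"
    # Assumption of this sketch: an unrecognized state is treated as an error.
    raise ValueError(f"unexpected job state: {state!r}")


def poll_until_terminal(
    get_job: Callable[[str], Mapping[str, object]],  # hypothetical getJob wrapper
    job_id: str,
    delay_seconds: float = 5.0,  # assumed pause between polls
    max_polls: int = 720,        # assumed safety limit
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_job repeatedly and return the first terminal state seen."""
    for _ in range(max_polls):
        job = get_job(job_id)
        state = str(job["state"])
        if next_action(state) == "end":
            return state
        sleep(delay_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting, for example by supplying `lambda seconds: None`.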

Source API Descriptions

Arazzo Workflow Specification

dataiku-build-dataset-job-workflow.yml
arazzo: 1.0.1
info:
  title: Dataiku Build Dataset and Poll Job
  summary: Start a build job for a dataset output and poll the job until it reaches a terminal state.
  description: >-
    Builds a dataset in Dataiku DSS and waits for the build to finish. The
    workflow starts a build job targeting a single dataset output, then polls
    the job status endpoint, looping while the job is NOT_STARTED or RUNNING and
    branching to completion when the job reaches DONE, FAILED, or ABORTED. Every
    step inlines its request so the flow can be executed without opening the
    underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: dssPublicApi
  url: ../openapi/dataiku-public-api-openapi.yml
  type: openapi
workflows:
- workflowId: build-dataset-job
  summary: Start a dataset build job and poll it to a terminal state.
  description: >-
    Starts a build job for one dataset output and polls getJob until the job is
    DONE, FAILED, or ABORTED.
  inputs:
    type: object
    required:
    - apiKey
    - projectKey
    - datasetName
    properties:
      apiKey:
        type: string
        description: DSS API key passed as a Bearer token in the Authorization header.
      projectKey:
        type: string
        description: Project key that owns the dataset to build.
      datasetName:
        type: string
        description: Name of the dataset to build (used as the output id).
  steps:
  - stepId: startBuild
    description: Start a build job targeting the dataset as a single output.
    operationId: startJob
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.apiKey}
    - name: projectKey
      in: path
      value: $inputs.projectKey
    requestBody:
      contentType: application/json
      payload:
        outputs:
        - projectKey: $inputs.projectKey
          id: $inputs.datasetName
          type: DATASET
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      jobId: $response.body#/id
  - stepId: pollJob
    description: >-
      Poll the job status. While the job is NOT_STARTED or RUNNING, loop back
      and poll again; once it reaches DONE, FAILED, or ABORTED, the workflow
      branches to completion.
    operationId: getJob
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.apiKey}
    - name: projectKey
      in: path
      value: $inputs.projectKey
    - name: jobId
      in: path
      value: $steps.startBuild.outputs.jobId
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      state: $response.body#/state
      endTime: $response.body#/endTime
    onSuccess:
    - name: stillBuilding
      type: goto
      stepId: pollJob
      criteria:
      - context: $response.body
        condition: $.state == "RUNNING" || $.state == "NOT_STARTED"
        type: jsonpath
    - name: terminal
      type: end
      criteria:
      - context: $response.body
        condition: $.state == "DONE" || $.state == "FAILED" || $.state == "ABORTED"
        type: jsonpath
  outputs:
    jobId: $steps.startBuild.outputs.jobId
    finalState: $steps.pollJob.outputs.state
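
To make the shape of the startBuild request concrete, the following Python sketch assembles the same header, path parameter, and JSON payload from the three workflow inputs, and reads `jobId` from the response as the step's `outputs` does. The function names and the returned dictionary layout are hypothetical conveniences of this sketch; the actual URL path comes from the referenced OpenAPI description and is deliberately not reproduced here.

```python
from typing import Any, Dict, Mapping

# Inputs the workflow marks as required.
REQUIRED_INPUTS = ("apiKey", "projectKey", "datasetName")


def build_start_job_request(inputs: Mapping[str, str]) -> Dict[str, Any]:
    """Assemble the startJob request parts from the workflow inputs."""
    missing = [name for name in REQUIRED_INPUTS if not inputs.get(name)]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    return {
        "operationId": "startJob",
        "headers": {
            "Authorization": f"Bearer {inputs['apiKey']}",
            "Content-Type": "application/json",
        },
        "path_params": {"projectKey": inputs["projectKey"]},
        # Single dataset output, as in the step's requestBody payload.
        "json": {
            "outputs": [
                {
                    "projectKey": inputs["projectKey"],
                    "id": inputs["datasetName"],
                    "type": "DATASET",
                }
            ]
        },
    }


def extract_job_id(status_code: int, body: Mapping[str, Any]) -> str:
    """Apply the step's success criterion, then read jobId from the body's id field."""
    if status_code != 200:
        raise RuntimeError(f"startJob failed with status {status_code}")
    return str(body["id"])
```

With made-up inputs such as `{"apiKey": "secret", "projectKey": "SALES", "datasetName": "orders_clean"}`, the payload's single output carries `id` set to `orders_clean` and `type` set to `DATASET`.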