DataHub · Arazzo Workflow

DataHub Tag a Dataset

Version 1.0.0

Confirm a dataset exists, then write its globalTags aspect to apply governance tags.

1 workflow · 1 source API · 1 provider
Tags: Data Catalog, Data Discovery, Data Governance, Data Lineage, Metadata, Arazzo, Workflows

Provider

datahub

Workflows

tag-dataset
Verify a dataset entity and apply a globalTags aspect to it.
Reads the latest aspects for a dataset URN to confirm it exists, then writes the globalTags aspect to associate one or more tags with the dataset.
2 steps · inputs: entityUrn, globalTags, token · outputs: taggedUrn

1. confirmDataset (operation: getEntityLatestAspects)
   Retrieve the latest aspects for the dataset URN to confirm the entity exists before applying tags.
2. applyTags (operation: upsertEntities)
   Upsert the globalTags aspect for the confirmed dataset URN to associate the supplied tags with the dataset.
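
The two steps above can be sketched in Python as plain request-building helpers. This is a minimal illustration, not the DataHub client: the helper names, the example URN, and the shape of the globalTags value are assumptions made for the example, and a real deployment may require additional fields in the aspect depending on the API version.

```python
from typing import Any, Dict, List


def auth_header(token: str) -> Dict[str, str]:
    """Authorization header sent by both steps (token passed as a Bearer token)."""
    return {"Authorization": f"Bearer {token}"}


def build_confirm_query(entity_urn: str) -> Dict[str, str]:
    """Query parameters for step 1, confirmDataset (getEntityLatestAspects)."""
    return {"urns": entity_urn}


def build_upsert_payload(entity_urn: str, global_tags: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Request body for step 2, applyTags (upsertEntities): a one-element list."""
    return [
        {
            "entityUrn": entity_urn,
            "entityType": "dataset",
            "aspectName": "globalTags",
            "aspect": global_tags,
        }
    ]


# Illustrative inputs only (assumed values, not taken from the workflow document).
example_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
example_tags = {"tags": [{"tag": "urn:li:tag:PII"}]}

confirm_query = build_confirm_query(example_urn)
upsert_payload = build_upsert_payload(example_urn, example_tags)
```

Keeping the payload construction separate from any HTTP call makes it easy to inspect exactly what the workflow would send before pointing it at a live catalog.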

Source API Descriptions

Arazzo Workflow Specification

datahub-tag-dataset-workflow.yml
arazzo: 1.0.1
info:
  title: DataHub Tag a Dataset
  summary: Confirm a dataset exists, then write its globalTags aspect to apply governance tags.
  description: >-
    A common cataloging task in DataHub: applying tags to a dataset for
    discovery and governance. The workflow first reads the latest aspects for the
    target dataset URN to confirm the entity exists, and then upserts the
    globalTags aspect carrying the desired tag associations. Every step spells out
    its request inline so the flow can be read and executed without opening the
    underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: datahubApi
  url: ../openapi/datahub-openapi-openapi.yml
  type: openapi
workflows:
- workflowId: tag-dataset
  summary: Verify a dataset entity and apply a globalTags aspect to it.
  description: >-
    Reads the latest aspects for a dataset URN to confirm it exists, then writes
    the globalTags aspect to associate one or more tags with the dataset.
  inputs:
    type: object
    required:
    - token
    - entityUrn
    - globalTags
    properties:
      token:
        type: string
        description: DataHub personal access token passed as a Bearer token.
      entityUrn:
        type: string
        description: The dataset URN to tag.
      globalTags:
        type: object
        description: The globalTags aspect value containing the tag associations to apply.
  steps:
  - stepId: confirmDataset
    description: >-
      Retrieve the latest aspects for the dataset URN to confirm the entity
      exists before applying tags.
    operationId: getEntityLatestAspects
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.token}
    - name: urns
      in: query
      value: $inputs.entityUrn
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      confirmedUrn: $response.body#/0/entityUrn
    onSuccess:
    - name: datasetExists
      type: goto
      stepId: applyTags
      criteria:
      - context: $response.body
        condition: $[0]
        type: jsonpath
  - stepId: applyTags
    description: >-
      Upsert the globalTags aspect for the confirmed dataset URN to associate
      the supplied tags with the dataset.
    operationId: upsertEntities
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.token}
    requestBody:
      contentType: application/json
      payload:
      - entityUrn: $steps.confirmDataset.outputs.confirmedUrn
        entityType: dataset
        aspectName: globalTags
        aspect: $inputs.globalTags
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      taggedUrn: $response.body#/0/entityUrn
  outputs:
    taggedUrn: $steps.applyTags.outputs.taggedUrn
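
The step outputs in the specification use runtime expressions of the form `$response.body#/0/entityUrn`, where the part after `#` is a JSON Pointer into the response body. The sketch below shows one way such a pointer could be resolved, together with the existence check that confirmDataset relies on. The function names and the sample response body are hypothetical; the body shape (a JSON array of objects carrying `entityUrn`) is only what the workflow itself assumes.

```python
from typing import Any, Optional


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as '/0/entityUrn' against parsed JSON."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: '~1' first, then '~0'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


def confirmed_urn(status_code: int, body: Any) -> Optional[str]:
    """Mirror confirmDataset: require HTTP 200 and a non-empty array, then read /0/entityUrn."""
    if status_code != 200 or not isinstance(body, list) or not body:
        return None
    return resolve_pointer(body, "/0/entityUrn")


# Hypothetical response body, shaped the way the workflow's expressions assume.
sample_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
sample_body = [{"entityUrn": sample_urn, "aspects": {}}]
```

Returning `None` for a non-200 status or an empty array keeps the "dataset does not exist" case explicit, so a caller can stop before attempting the upsert.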