DataHub · Arazzo Workflow

DataHub Upsert Dataset and Verify

Version 1.0.0

Write a dataset's properties aspect into the metadata graph, then read the entity back to confirm the write landed.

1 workflow · 1 source API · 1 provider
Tags: Data Catalog, Data Discovery, Data Governance, Data Lineage, Metadata, Arazzo, Workflows

Provider

datahub

Workflows

upsert-dataset
Create or update a dataset's datasetProperties aspect and read it back.
Writes the datasetProperties aspect for a dataset URN and then fetches the latest aspects for that same URN to verify the entity is present.
2 steps · inputs: createEntityIfNotExists, datasetProperties, entityUrn, token · outputs: aspects, entityUrn

1. writeDataset (operationId: upsertEntities): Upsert the datasetProperties aspect for the supplied dataset URN into the DataHub metadata graph.
2. readDataset (operationId: getEntityLatestAspects): Retrieve the latest aspects for the dataset URN that was just written to confirm the entity exists in the metadata graph.
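
The two steps can be sketched as plain data, without sending anything over the network. This is a minimal sketch, not DataHub's client: `build_requests` is a hypothetical helper, the dict layout is invented for illustration, and URL paths are left out because they live in the referenced OpenAPI description rather than in this workflow. The example property value is a placeholder.

```python
from typing import Any, Optional


def build_requests(
    token: str,
    entity_urn: str,
    dataset_properties: dict[str, Any],
    create_entity_if_not_exists: Optional[bool] = None,
) -> list[dict[str, Any]]:
    """Describe both workflow steps as plain dicts (hypothetical helper).

    Nothing is sent; the dicts only mirror the parameters and request
    body that the workflow declares for each step.
    """
    auth_header = {"Authorization": f"Bearer {token}"}

    # Step 1 (writeDataset, operationId upsertEntities).
    write_query: dict[str, Any] = {}
    if create_entity_if_not_exists is not None:
        # Assumption of this sketch: the optional input is omitted when unset.
        write_query["createEntityIfNotExists"] = create_entity_if_not_exists
    write_step = {
        "stepId": "writeDataset",
        "operationId": "upsertEntities",
        "headers": auth_header,
        "query": write_query,
        # The payload is a one-element array, matching the workflow's requestBody.
        "json": [
            {
                "entityUrn": entity_urn,
                "entityType": "dataset",
                "aspectName": "datasetProperties",
                "aspect": dataset_properties,
            }
        ],
    }

    # Step 2 (readDataset, operationId getEntityLatestAspects). In the real
    # workflow the URN comes from step 1's writtenUrn output; the input URN
    # stands in for it here.
    read_step = {
        "stepId": "readDataset",
        "operationId": "getEntityLatestAspects",
        "headers": auth_header,
        "query": {"urns": entity_urn, "aspectNames": ["datasetProperties"]},
    }
    return [write_step, read_step]


# Example call with placeholder values.
steps = build_requests(
    token="example-token",
    entity_urn="urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    dataset_properties={"description": "Sample Hive table"},
)
```

Keeping the steps as data makes the ordering explicit: the read step only makes sense once the write step has returned a URN.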

Source API Descriptions

Arazzo Workflow Specification

datahub-upsert-dataset-workflow.yml
arazzo: 1.0.1
info:
  title: DataHub Upsert Dataset and Verify
  summary: Write a dataset's properties aspect into the metadata graph, then read the entity back to confirm the write landed.
  description: >-
    A foundational catalog ingestion pattern for DataHub. The workflow upserts a
    dataset entity by writing its datasetProperties aspect through the OpenAPI
    entities endpoint, then immediately retrieves the latest aspects for the same
    URN to confirm the entity now exists in the metadata graph. Every step spells
    out its request inline so the flow can be read and executed without opening
    the underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: datahubApi
  url: ../openapi/datahub-openapi-openapi.yml
  type: openapi
workflows:
- workflowId: upsert-dataset
  summary: Create or update a dataset's datasetProperties aspect and read it back.
  description: >-
    Writes the datasetProperties aspect for a dataset URN and then fetches the
    latest aspects for that same URN to verify the entity is present.
  inputs:
    type: object
    required:
    - token
    - entityUrn
    - datasetProperties
    properties:
      token:
        type: string
        description: DataHub personal access token passed as a Bearer token.
      entityUrn:
        type: string
        description: The dataset URN to upsert (e.g. urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)).
      datasetProperties:
        type: object
        description: The datasetProperties aspect value to write for the dataset.
      createEntityIfNotExists:
        type: boolean
        description: When true, only create the entity if it does not already exist.
  steps:
  - stepId: writeDataset
    description: >-
      Upsert the datasetProperties aspect for the supplied dataset URN into the
      DataHub metadata graph.
    operationId: upsertEntities
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.token}
    - name: createEntityIfNotExists
      in: query
      value: $inputs.createEntityIfNotExists
    requestBody:
      contentType: application/json
      payload:
      - entityUrn: $inputs.entityUrn
        entityType: dataset
        aspectName: datasetProperties
        aspect: $inputs.datasetProperties
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      writtenUrn: $response.body#/0/entityUrn
  - stepId: readDataset
    description: >-
      Retrieve the latest aspects for the dataset URN that was just written to
      confirm the entity exists in the metadata graph.
    operationId: getEntityLatestAspects
    parameters:
    - name: Authorization
      in: header
      value: Bearer {$inputs.token}
    - name: urns
      in: query
      value: $steps.writeDataset.outputs.writtenUrn
    - name: aspectNames
      in: query
      value:
      - datasetProperties
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      entityUrn: $response.body#/0/entityUrn
      aspects: $response.body#/0/aspects
  outputs:
    entityUrn: $steps.readDataset.outputs.entityUrn
    aspects: $steps.readDataset.outputs.aspects
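
The step outputs above use expressions such as `$response.body#/0/entityUrn`, where the part after `#` is a JSON Pointer into the response body. The sketch below shows how such an expression could be evaluated. It is an illustration under assumptions, not a full Arazzo runtime: `resolve_pointer` and `body_output` are hypothetical helpers, and `sample_body` is a made-up response whose shape is inferred only from the pointers the workflow uses.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/0/entityUrn"."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Undo the two RFC 6901 escapes: "~1" is "/", then "~0" is "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def body_output(expression: str, body: Any) -> Any:
    """Evaluate a "$response.body#<pointer>" expression against a parsed body."""
    prefix = "$response.body"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression!r}")
    rest = expression[len(prefix):]
    if rest == "":
        return body
    if not rest.startswith("#"):
        raise ValueError(f"unsupported expression: {expression!r}")
    return resolve_pointer(body, rest[1:])


# Made-up response body: an array whose first element carries the URN and aspects.
sample_body = [
    {
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
        "aspects": {"datasetProperties": {"description": "Sample Hive table"}},
    }
]

entity_urn = body_output("$response.body#/0/entityUrn", sample_body)
aspects = body_output("$response.body#/0/aspects", sample_body)
```

If the response array is empty, the pointer lookup raises an `IndexError` in this sketch, which is one way a runner might surface that the entity was not found.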