Amazon SageMaker · Arazzo Workflow

Amazon SageMaker Train Model and Poll Job

Version 1.0.0

Start a SageMaker training job and poll its status until it reaches a terminal state.

1 workflow · 1 source API · 1 provider
Tags: AI, Inference, Machine Learning, MLOps, Training, Arazzo, Workflows

Provider

amazon-sagemaker

Workflows

train-and-poll-job
Create a training job and poll it to a terminal status.
Submits a training job with the supplied algorithm, role, data, and compute configuration, then describes the job and loops until the training status is terminal (Completed, Failed, or Stopped).
2 steps
inputs: inputDataConfig, instanceCount, instanceType, maxRuntimeInSeconds, roleArn, s3OutputPath, trainingImage, trainingInputMode, trainingJobName, volumeSizeInGB
outputs: modelArtifacts, trainingJobArn, trainingJobStatus
Step 1: createTrainingJob (operation: CreateTrainingJob)
Start a model training job using the supplied algorithm image, IAM role, input data, output location, and compute resources.
Step 2: pollTrainingJob (operation: DescribeTrainingJob)
Describe the training job to read its current status. Repeat this step while the job is still running and continue once it reaches a terminal state (Completed, Failed, or Stopped).
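
The two steps above amount to a create call followed by a describe loop. The loop can be sketched in Python as below. This is a minimal illustration, not part of the workflow: `poll_training_job`, `delay_seconds`, and `max_polls` are hypothetical names, the delay between polls is an assumption (the workflow itself specifies none), and only Completed, Failed, and Stopped are treated as terminal, following the workflow description. The `describe` callable is injected so the sketch runs without AWS credentials.

```python
import time
from typing import Any, Callable, Dict

# Statuses the workflow description names as terminal.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def poll_training_job(
    describe: Callable[[str], Dict[str, Any]],
    job_name: str,
    delay_seconds: float = 30.0,  # assumption: the workflow defines no delay
    max_polls: int = 1000,        # assumption: guard against looping forever
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call describe(job_name) until TrainingJobStatus is terminal.

    `describe` must return a dict shaped like a DescribeTrainingJob
    response body. The returned keys mirror the pollTrainingJob outputs.
    """
    for _ in range(max_polls):
        body = describe(job_name)
        status = body["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return {
                "trainingJobStatus": status,
                # ModelArtifacts and FailureReason may be absent.
                "modelArtifacts": body.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
                "failureReason": body.get("FailureReason"),
            }
        # Still running (InProgress or Stopping): wait, then describe again.
        sleep(delay_seconds)
    raise TimeoutError(f"{job_name} was not terminal after {max_polls} polls")
```

With boto3 installed and credentials configured, `describe` could be something like `lambda name: client.describe_training_job(TrainingJobName=name)` for a `boto3.client("sagemaker")` instance.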

Source API Descriptions

Arazzo Workflow Specification

amazon-sagemaker-train-and-poll-job-workflow.yml
arazzo: 1.0.1
info:
  title: Amazon SageMaker Train Model and Poll Job
  summary: Start a SageMaker training job and poll its status until it reaches a terminal state.
  description: >-
    The foundational SageMaker training pattern. The workflow submits a training
    job with an algorithm specification, input and output data configuration,
    and compute resources, then repeatedly describes the job to watch its status
    transition through InProgress until it reaches a terminal Completed, Failed,
    or Stopped state. Each step spells out its request inline so the flow can be
    read and executed without opening the underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: sagemakerApi
  url: ../openapi/amazon-sagemaker-openapi.yml
  type: openapi
workflows:
- workflowId: train-and-poll-job
  summary: Create a training job and poll it to a terminal status.
  description: >-
    Submits a training job with the supplied algorithm, role, data, and compute
    configuration, then describes the job and loops until the training status is
    terminal (Completed, Failed, or Stopped).
  inputs:
    type: object
    required:
    - trainingJobName
    - trainingImage
    - roleArn
    - s3OutputPath
    - instanceType
    - instanceCount
    - volumeSizeInGB
    - maxRuntimeInSeconds
    properties:
      trainingJobName:
        type: string
        description: A unique name for the training job.
      trainingImage:
        type: string
        description: The registry path of the Docker image that contains the training algorithm.
      trainingInputMode:
        type: string
        description: The input mode the algorithm supports (Pipe or File).
        default: File
      roleArn:
        type: string
        description: The ARN of the IAM role SageMaker assumes to perform training.
      inputDataConfig:
        type: array
        description: The input data channels for the training job.
        items:
          type: object
      s3OutputPath:
        type: string
        description: The S3 path where SageMaker stores the model artifacts.
      instanceType:
        type: string
        description: The ML compute instance type to use for training.
      instanceCount:
        type: integer
        description: The number of ML compute instances to use.
      volumeSizeInGB:
        type: integer
        description: The size of the ML storage volume attached to each instance, in GB.
      maxRuntimeInSeconds:
        type: integer
        description: The maximum length of time, in seconds, that the training job can run.
  steps:
  - stepId: createTrainingJob
    description: >-
      Start a model training job using the supplied algorithm image, IAM role,
      input data, output location, and compute resources.
    operationId: CreateTrainingJob
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.CreateTrainingJob
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        TrainingJobName: $inputs.trainingJobName
        AlgorithmSpecification:
          TrainingImage: $inputs.trainingImage
          TrainingInputMode: $inputs.trainingInputMode
        RoleArn: $inputs.roleArn
        InputDataConfig: $inputs.inputDataConfig
        OutputDataConfig:
          S3OutputPath: $inputs.s3OutputPath
        ResourceConfig:
          InstanceType: $inputs.instanceType
          InstanceCount: $inputs.instanceCount
          VolumeSizeInGB: $inputs.volumeSizeInGB
        StoppingCondition:
          MaxRuntimeInSeconds: $inputs.maxRuntimeInSeconds
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      trainingJobArn: $response.body#/TrainingJobArn
  - stepId: pollTrainingJob
    description: >-
      Describe the training job to read its current status. Repeat this step
      while the job is still running and continue once it reaches a terminal
      state (Completed, Failed, or Stopped).
    operationId: DescribeTrainingJob
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.DescribeTrainingJob
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        TrainingJobName: $inputs.trainingJobName
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      trainingJobStatus: $response.body#/TrainingJobStatus
      secondaryStatus: $response.body#/SecondaryStatus
      modelArtifacts: $response.body#/ModelArtifacts/S3ModelArtifacts
      failureReason: $response.body#/FailureReason
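    onSuccess:
    # Keep polling while the job is running. Stopping is a transitional
    # status, not a terminal one, so it loops as well.
    - name: stillTraining
      type: goto
      stepId: pollTrainingJob
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus == "InProgress"
        type: jsonpath
    - name: stillStopping
      type: goto
      stepId: pollTrainingJob
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus == "Stopping"
        type: jsonpath
    # End only when both criteria hold, i.e. the status is Completed,
    # Failed, or Stopped.
    - name: trainingTerminal
      type: end
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus != "InProgress"
        type: jsonpath
      - context: $response.body
        condition: $.TrainingJobStatus != "Stopping"
        type: jsonpath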
  outputs:
    trainingJobArn: $steps.createTrainingJob.outputs.trainingJobArn
    trainingJobStatus: $steps.pollTrainingJob.outputs.trainingJobStatus
    modelArtifacts: $steps.pollTrainingJob.outputs.modelArtifacts
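
The mapping from workflow inputs to the createTrainingJob request body can be sketched in Python as below. This is an illustration under stated assumptions, not part of the specification: `build_create_training_job_payload` is a hypothetical helper, the example values are placeholders, and omitting InputDataConfig when the optional `inputDataConfig` input is unset is an assumption about how a runner might treat a missing input.

```python
import json
from typing import Any, Dict


def build_create_training_job_payload(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Map workflow inputs onto the CreateTrainingJob request body."""
    payload: Dict[str, Any] = {
        "TrainingJobName": inputs["trainingJobName"],
        "AlgorithmSpecification": {
            "TrainingImage": inputs["trainingImage"],
            # The workflow declares File as the default input mode.
            "TrainingInputMode": inputs.get("trainingInputMode", "File"),
        },
        "RoleArn": inputs["roleArn"],
        "OutputDataConfig": {"S3OutputPath": inputs["s3OutputPath"]},
        "ResourceConfig": {
            "InstanceType": inputs["instanceType"],
            "InstanceCount": inputs["instanceCount"],
            "VolumeSizeInGB": inputs["volumeSizeInGB"],
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": inputs["maxRuntimeInSeconds"],
        },
    }
    # inputDataConfig is not in the workflow's required list, so it is
    # included only when supplied (assumption, see lead-in).
    if inputs.get("inputDataConfig"):
        payload["InputDataConfig"] = inputs["inputDataConfig"]
    return payload


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_inputs = {
        "trainingJobName": "example-job",
        "trainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>",
        "roleArn": "arn:aws:iam::<account>:role/<role-name>",
        "s3OutputPath": "s3://<bucket>/output/",
        "instanceType": "ml.m5.xlarge",
        "instanceCount": 1,
        "volumeSizeInGB": 50,
        "maxRuntimeInSeconds": 3600,
    }
    print(json.dumps(build_create_training_job_payload(example_inputs), indent=2))
```

A missing required input raises KeyError, which mirrors the `required` list in the workflow's input schema.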