Amazon SageMaker · Arazzo Workflow

Amazon SageMaker Train Then Deploy

Version 1.0.0

Train a model to completion, then register it from the produced artifacts and stand up a hosted endpoint.

1 workflow · 1 source API · 1 provider
Tags: AI, Inference, Machine Learning, MLOps, Training, Arazzo, Workflows

Provider

amazon-sagemaker

Workflows

train-then-deploy
Train a model and, on success, deploy the trained artifacts to an endpoint.
Submits a training job, polls it until it is no longer InProgress, and branches. When training completes, it registers a model from the produced artifacts, creates an endpoint configuration and an endpoint, and polls the endpoint until it leaves the Creating state. Otherwise, it ends.
6 steps
Inputs: endpointConfigName, endpointName, hostingInstanceType, inferenceImage, initialInstanceCount, inputDataConfig, instanceCount, instanceType, maxRuntimeInSeconds, modelName, roleArn, s3OutputPath, trainingImage, trainingInputMode, trainingJobName, variantName, volumeSizeInGB
Outputs: endpointArn, endpointStatus, modelArn, modelArtifacts, trainingJobArn
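Before the first step runs, a runner has to check the supplied values against the workflow's input schema. The sketch below is a minimal, hypothetical helper (`prepare_inputs` is not part of Arazzo or any SageMaker SDK). It rejects missing required inputs, checks the four integer-typed inputs, and applies the schema's only default, `trainingInputMode: File`.

```python
from typing import Any

# Mirrors the workflow's `inputs.required` list.
REQUIRED_INPUTS = (
    "trainingJobName", "trainingImage", "roleArn", "s3OutputPath",
    "instanceType", "instanceCount", "volumeSizeInGB", "maxRuntimeInSeconds",
    "modelName", "inferenceImage", "endpointConfigName", "variantName",
    "initialInstanceCount", "hostingInstanceType", "endpointName",
)

# Inputs the schema declares as `type: integer`.
INTEGER_INPUTS = ("instanceCount", "volumeSizeInGB", "maxRuntimeInSeconds", "initialInstanceCount")

# Defaults declared in the schema (only trainingInputMode has one).
DEFAULTS = {"trainingInputMode": "File"}


def prepare_inputs(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate workflow inputs and fill in schema defaults (hypothetical helper)."""
    missing = [name for name in REQUIRED_INPUTS if name not in raw]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    for name in INTEGER_INPUTS:
        value = raw[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
    # Defaults only fill gaps; explicitly supplied values win.
    return {**DEFAULTS, **raw}
```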
Step 1: createTrainingJob (operation CreateTrainingJob)
Start a model training job using the supplied algorithm image, IAM role, input data, output location, and compute resources.
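The createTrainingJob step maps flat workflow inputs onto the nested CreateTrainingJob request body. The following is a minimal sketch of that mapping using a hypothetical `build_create_training_job` helper. It makes one deliberate change from the YAML: it leaves out `InputDataConfig` when no channels are supplied, because that input is optional in the schema, rather than sending an unresolved value.

```python
from typing import Any


def build_create_training_job(inputs: dict[str, Any]) -> dict[str, Any]:
    """Assemble a CreateTrainingJob body shaped like the createTrainingJob payload."""
    body: dict[str, Any] = {
        "TrainingJobName": inputs["trainingJobName"],
        "AlgorithmSpecification": {
            "TrainingImage": inputs["trainingImage"],
            # Falls back to the schema default when the caller omits it.
            "TrainingInputMode": inputs.get("trainingInputMode", "File"),
        },
        "RoleArn": inputs["roleArn"],
        "OutputDataConfig": {"S3OutputPath": inputs["s3OutputPath"]},
        "ResourceConfig": {
            "InstanceType": inputs["instanceType"],
            "InstanceCount": inputs["instanceCount"],
            "VolumeSizeInGB": inputs["volumeSizeInGB"],
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": inputs["maxRuntimeInSeconds"]},
    }
    # inputDataConfig is optional in the workflow schema; only include it when given.
    if inputs.get("inputDataConfig") is not None:
        body["InputDataConfig"] = inputs["inputDataConfig"]
    return body
```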
Step 2: pollTrainingJob (operation DescribeTrainingJob)
Describe the training job to read its current status. Repeat while the status is InProgress. Then go to createModel if the status is Completed, or end the workflow on any other status (Failed, Stopping, or Stopped).
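Together, the three onSuccess actions on pollTrainingJob cover every possible status, and no two of them overlap, so exactly one should fire per poll. The hypothetical `next_action` function below evaluates the same three conditions in plain Python (not JSONPath) and checks that exactly one matches.

```python
# The three onSuccess criteria of pollTrainingJob, in declaration order.
CRITERIA = {
    # goto pollTrainingJob
    "stillTraining": lambda status: status == "InProgress",
    # goto createModel
    "trainingCompleted": lambda status: status == "Completed",
    # end
    "trainingFailed": lambda status: status != "Completed" and status != "InProgress",
}


def next_action(status: str) -> str:
    """Return the name of the single onSuccess action whose criterion matches."""
    matches = [name for name, criterion in CRITERIA.items() if criterion(status)]
    # The criteria are mutually exclusive and cover every status,
    # so anything other than one match means the rule set is broken.
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one matching action, got {matches}")
    return matches[0]
```

`Stopping` is not a final SageMaker status, but it fails both of the first two conditions. It therefore matches `trainingFailed`, and the workflow ends instead of waiting for `Stopped`.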
Step 3: createModel (operation CreateModel)
Register a model from the inference container and the S3 artifacts the training job produced.
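createModel is where an earlier step's output feeds a later request: `ModelDataUrl` comes from `$steps.pollTrainingJob.outputs.modelArtifacts`. The sketch below resolves only the two runtime-expression forms this step uses. `resolve` and `resolve_payload` are hypothetical helpers, not a full Arazzo expression evaluator.

```python
from typing import Any


def resolve(expr: Any, inputs: dict[str, Any], steps: dict[str, dict[str, Any]]) -> Any:
    """Resolve `$inputs.<name>` or `$steps.<stepId>.outputs.<name>`; pass other values through."""
    if not isinstance(expr, str) or not expr.startswith("$"):
        return expr
    parts = expr[1:].split(".")
    if parts[0] == "inputs" and len(parts) == 2:
        return inputs[parts[1]]
    if parts[0] == "steps" and len(parts) == 4 and parts[2] == "outputs":
        return steps[parts[1]]["outputs"][parts[3]]
    raise ValueError(f"unsupported expression: {expr}")


def resolve_payload(payload: Any, inputs: dict[str, Any], steps: dict[str, dict[str, Any]]) -> Any:
    """Walk a nested payload and resolve every expression leaf."""
    if isinstance(payload, dict):
        return {key: resolve_payload(value, inputs, steps) for key, value in payload.items()}
    if isinstance(payload, list):
        return [resolve_payload(value, inputs, steps) for value in payload]
    return resolve(payload, inputs, steps)


# The createModel payload exactly as written in the workflow.
CREATE_MODEL_PAYLOAD = {
    "ModelName": "$inputs.modelName",
    "PrimaryContainer": {
        "Image": "$inputs.inferenceImage",
        "ModelDataUrl": "$steps.pollTrainingJob.outputs.modelArtifacts",
    },
    "ExecutionRoleArn": "$inputs.roleArn",
}
```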
Step 4: createEndpointConfig (operation CreateEndpointConfig)
Define an endpoint configuration that places the trained model on a single production variant.
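The endpoint configuration links its single production variant to the model by name, reusing `$inputs.modelName`. It takes its hosting instance type from `hostingInstanceType`, a separate input from the training `instanceType`. The following is a minimal sketch of the request body, with `build_endpoint_config` as a hypothetical helper.

```python
from typing import Any


def build_endpoint_config(inputs: dict[str, Any]) -> dict[str, Any]:
    """Build a CreateEndpointConfig body with exactly one production variant."""
    variant = {
        "VariantName": inputs["variantName"],
        # Must name the model registered by the createModel step.
        "ModelName": inputs["modelName"],
        "InitialInstanceCount": inputs["initialInstanceCount"],
        # Hosting compute is chosen independently of the training instance type.
        "InstanceType": inputs["hostingInstanceType"],
    }
    return {
        "EndpointConfigName": inputs["endpointConfigName"],
        "ProductionVariants": [variant],
    }
```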
Step 5: createEndpoint (operation CreateEndpoint)
Create a hosted endpoint from the endpoint configuration so SageMaker provisions resources and deploys the trained model.
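Each step sends a POST request whose `X-Amz-Target` header names the SageMaker action and whose body uses the `application/x-amz-json-1.1` content type. The sketch below builds such a request with Python's standard `urllib.request` but does not sign or send it. A real call also needs AWS Signature Version 4 headers, which are omitted. The `api.sagemaker.<region>.amazonaws.com` host is assumed here as the regional SageMaker API endpoint, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(region: str, target: str, body: dict) -> urllib.request.Request:
    """Build (but do not sign or send) an AWS JSON 1.1 request for a SageMaker action."""
    return urllib.request.Request(
        url=f"https://api.sagemaker.{region}.amazonaws.com/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The action name travels in this header, not in the URL path.
            "X-Amz-Target": f"SageMaker.{target}",
            "Content-Type": "application/x-amz-json-1.1",
        },
        method="POST",
    )
```

urllib normalizes header names with `str.capitalize()`, so the stored keys are `X-amz-target` and `Content-type`.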
Step 6: pollEndpoint (operation DescribeEndpoint)
Describe the endpoint and repeat while its status is Creating. End the workflow once the status leaves Creating (for example, InService or Failed).
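The workflow loops with `goto` and sets neither a delay between polls nor a cap on attempts, so each runner has to choose both. The following is a minimal polling sketch. It assumes a caller-supplied `describe` callable in place of a real DescribeEndpoint call. The 30-second interval and 120-attempt cap are hypothetical defaults, not SageMaker requirements.

```python
import time
from typing import Callable


def poll_endpoint(
    describe: Callable[[], dict],
    interval_seconds: float = 30.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `describe` while EndpointStatus is Creating; return the first other response."""
    for attempt in range(max_attempts):
        response = describe()
        if response["EndpointStatus"] != "Creating":
            # Corresponds to endpointTerminal: InService, Failed, or any non-Creating status.
            return response
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"endpoint still Creating after {max_attempts} polls")
```

Injecting `sleep` keeps the loop testable without real waiting. The same pattern would apply to pollTrainingJob with `TrainingJobStatus` and `InProgress`.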

Source API Descriptions

Arazzo Workflow Specification

amazon-sagemaker-train-then-deploy-workflow.yml
arazzo: 1.0.1
info:
  title: Amazon SageMaker Train Then Deploy
  summary: Train a model to completion, then register it from the produced artifacts and stand up a hosted endpoint.
  description: >-
    The end-to-end SageMaker lifecycle in a single flow. The workflow starts a
    training job, polls it until training completes, and only when the job
    reaches Completed does it register a model from the produced S3 artifacts,
    build an endpoint configuration, and create a hosted endpoint that it then
    polls until the endpoint leaves the Creating state. If training does not
    complete, the flow ends without deploying. Each step spells out its request
    inline so the flow can be read
    and executed without opening the underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: sagemakerApi
  url: ../openapi/amazon-sagemaker-openapi.yml
  type: openapi
workflows:
- workflowId: train-then-deploy
  summary: Train a model and, on success, deploy the trained artifacts to an endpoint.
  description: >-
    Submits a training job, polls it until it is no longer InProgress, and
    branches. When training completes, it registers a model from the produced
    artifacts, creates an endpoint configuration and an endpoint, and polls the
    endpoint until it leaves the Creating state. Otherwise, it ends.
  inputs:
    type: object
    required:
    - trainingJobName
    - trainingImage
    - roleArn
    - s3OutputPath
    - instanceType
    - instanceCount
    - volumeSizeInGB
    - maxRuntimeInSeconds
    - modelName
    - inferenceImage
    - endpointConfigName
    - variantName
    - initialInstanceCount
    - hostingInstanceType
    - endpointName
    properties:
      trainingJobName:
        type: string
        description: A unique name for the training job.
      trainingImage:
        type: string
        description: The registry path of the Docker image that contains the training algorithm.
      trainingInputMode:
        type: string
        description: The input mode the algorithm supports (File, Pipe, or FastFile).
        default: File
      roleArn:
        type: string
        description: The ARN of the IAM role SageMaker assumes for training and hosting.
      inputDataConfig:
        type: array
        description: The input data channels for the training job.
        items:
          type: object
      s3OutputPath:
        type: string
        description: The S3 path where SageMaker stores the model artifacts.
      instanceType:
        type: string
        description: The ML compute instance type to use for training.
      instanceCount:
        type: integer
        description: The number of ML compute instances to use for training.
      volumeSizeInGB:
        type: integer
        description: The size of the ML storage volume attached to each training instance, in GB.
      maxRuntimeInSeconds:
        type: integer
        description: The maximum length of time, in seconds, that the training job can run.
      modelName:
        type: string
        description: A unique name for the model to register from the trained artifacts.
      inferenceImage:
        type: string
        description: The registry path of the Docker image that contains the inference code.
      endpointConfigName:
        type: string
        description: A unique name for the endpoint configuration.
      variantName:
        type: string
        description: The name of the production variant.
      initialInstanceCount:
        type: integer
        description: The initial number of instances to launch for the variant.
      hostingInstanceType:
        type: string
        description: The ML compute instance type to deploy for hosting.
      endpointName:
        type: string
        description: A unique name for the endpoint.
  steps:
  - stepId: createTrainingJob
    description: >-
      Start a model training job using the supplied algorithm image, IAM role,
      input data, output location, and compute resources.
    operationId: CreateTrainingJob
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.CreateTrainingJob
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        TrainingJobName: $inputs.trainingJobName
        AlgorithmSpecification:
          TrainingImage: $inputs.trainingImage
          TrainingInputMode: $inputs.trainingInputMode
        RoleArn: $inputs.roleArn
        InputDataConfig: $inputs.inputDataConfig
        OutputDataConfig:
          S3OutputPath: $inputs.s3OutputPath
        ResourceConfig:
          InstanceType: $inputs.instanceType
          InstanceCount: $inputs.instanceCount
          VolumeSizeInGB: $inputs.volumeSizeInGB
        StoppingCondition:
          MaxRuntimeInSeconds: $inputs.maxRuntimeInSeconds
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      trainingJobArn: $response.body#/TrainingJobArn
  - stepId: pollTrainingJob
    description: >-
      Describe the training job to read its current status. Repeat while the
      status is InProgress. Then go to createModel if the status is Completed,
      or end the workflow on any other status (Failed, Stopping, or Stopped).
    operationId: DescribeTrainingJob
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.DescribeTrainingJob
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        TrainingJobName: $inputs.trainingJobName
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      trainingJobStatus: $response.body#/TrainingJobStatus
      modelArtifacts: $response.body#/ModelArtifacts/S3ModelArtifacts
      failureReason: $response.body#/FailureReason
    onSuccess:
    - name: stillTraining
      type: goto
      stepId: pollTrainingJob
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus == "InProgress"
        type: jsonpath
    - name: trainingCompleted
      type: goto
      stepId: createModel
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus == "Completed"
        type: jsonpath
    - name: trainingFailed
      type: end
      criteria:
      - context: $response.body
        condition: $.TrainingJobStatus != "Completed" && $.TrainingJobStatus != "InProgress"
        type: jsonpath
  - stepId: createModel
    description: >-
      Register a model from the inference container and the S3 artifacts the
      training job produced.
    operationId: CreateModel
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.CreateModel
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        ModelName: $inputs.modelName
        PrimaryContainer:
          Image: $inputs.inferenceImage
          ModelDataUrl: $steps.pollTrainingJob.outputs.modelArtifacts
        ExecutionRoleArn: $inputs.roleArn
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      modelArn: $response.body#/ModelArn
  - stepId: createEndpointConfig
    description: >-
      Define an endpoint configuration that places the trained model on a single
      production variant.
    operationId: CreateEndpointConfig
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.CreateEndpointConfig
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        EndpointConfigName: $inputs.endpointConfigName
        ProductionVariants:
        - VariantName: $inputs.variantName
          ModelName: $inputs.modelName
          InitialInstanceCount: $inputs.initialInstanceCount
          InstanceType: $inputs.hostingInstanceType
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      endpointConfigArn: $response.body#/EndpointConfigArn
  - stepId: createEndpoint
    description: >-
      Create a hosted endpoint from the endpoint configuration so SageMaker
      provisions resources and deploys the trained model.
    operationId: CreateEndpoint
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.CreateEndpoint
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        EndpointName: $inputs.endpointName
        EndpointConfigName: $inputs.endpointConfigName
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      endpointArn: $response.body#/EndpointArn
  - stepId: pollEndpoint
    description: >-
      Describe the endpoint and repeat while its status is Creating. End the
      workflow once the status leaves Creating (for example, InService or Failed).
    operationId: DescribeEndpoint
    parameters:
    - name: X-Amz-Target
      in: header
      value: SageMaker.DescribeEndpoint
    requestBody:
      contentType: application/x-amz-json-1.1
      payload:
        EndpointName: $inputs.endpointName
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      endpointStatus: $response.body#/EndpointStatus
      failureReason: $response.body#/FailureReason
    onSuccess:
    - name: stillCreating
      type: goto
      stepId: pollEndpoint
      criteria:
      - context: $response.body
        condition: $.EndpointStatus == "Creating"
        type: jsonpath
    - name: endpointTerminal
      type: end
      criteria:
      - context: $response.body
        condition: $.EndpointStatus != "Creating"
        type: jsonpath
  outputs:
    trainingJobArn: $steps.createTrainingJob.outputs.trainingJobArn
    modelArtifacts: $steps.pollTrainingJob.outputs.modelArtifacts
    modelArn: $steps.createModel.outputs.modelArn
    endpointArn: $steps.createEndpoint.outputs.endpointArn
    endpointStatus: $steps.pollEndpoint.outputs.endpointStatus
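Step outputs such as `$response.body#/ModelArtifacts/S3ModelArtifacts` put a JSON Pointer (RFC 6901) after the `#`. The following is a minimal resolver sketch. `json_pointer` and `step_output` are hypothetical helper names, and only the `$response.body#` form is handled.

```python
from typing import Any


def json_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as '/ModelArtifacts/S3ModelArtifacts'."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping order: '~1' -> '/' first, then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def step_output(body: dict, expression: str) -> Any:
    """Evaluate a '$response.body#/...' output expression against a response body."""
    prefix = "$response.body#"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression}")
    return json_pointer(body, expression[len(prefix):])
```

This sketch raises `KeyError` for a missing member, for example `FailureReason` on a job that completed. A real Arazzo runner may handle an unresolvable pointer differently.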