Amazon Polly · Arazzo Workflow

Amazon Polly List Voices and Synthesize Speech

Version 1.0.0

Pick an available voice for a language and synthesize speech with it.

1 workflow · 1 source API · 1 provider

Tags: AI, Machine Learning, Speech Synthesis, Text-To-Speech, TTS, Voice, SSML, Neural Engine, Generative AI, Arazzo, Workflows

Provider

amazon-polly

Workflows

list-voices-synthesize-speech

Discover an available voice for a language and synthesize speech with it.

Lists the voices available for the requested engine and language, selects the first match, and synthesizes the supplied text into an audio stream using that voice and output format.

2 steps

inputs: amzDate, authorization, contentSha256, engine, languageCode, outputFormat, securityToken, text, textType

outputs: requestCharacters, selectedVoiceId
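The amzDate and contentSha256 inputs can be computed locally before the workflow runs. The following is a minimal Python sketch, assuming the usual SigV4 conventions (a compact UTC timestamp and a lowercase hex SHA-256 of the exact request body). The helper names and the sample voice id are illustrative and not part of the workflow; producing the authorization value requires the full SigV4 signing procedure and is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def amz_date(now: datetime) -> str:
    """Format a datetime as an X-Amz-Date value, e.g. 20260604T120000Z."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def content_sha256(body: bytes) -> str:
    """Lowercase hex SHA-256 digest of the exact bytes sent as the request body."""
    return hashlib.sha256(body).hexdigest()


# DescribeVoices is a GET with no body, so its digest is that of the empty string.
describe_voices_digest = content_sha256(b"")

# SynthesizeSpeech sends a JSON body; hash the same bytes that go on the wire.
# The field values here are placeholders chosen for illustration.
speech_body = json.dumps(
    {"OutputFormat": "mp3", "Text": "Hello", "VoiceId": "Joanna"}
).encode("utf-8")
synthesize_speech_digest = content_sha256(speech_body)
```

Because the two requests carry different bodies, their digests differ, so a single contentSha256 value can match only one of the two steps.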

describeVoices

DescribeVoices

List the voices available for the requested engine and language so a concrete voice id can be selected for synthesis.
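The selection rule in this step is simply "take the first voice returned". A small Python sketch of that rule follows, assuming the response body has already been parsed from JSON. The function name is hypothetical and the sample body is made up for illustration; the guard for an empty list is an addition that the workflow itself does not define.

```python
from typing import Any, Optional


def select_first_voice_id(body: dict[str, Any]) -> Optional[str]:
    """Mirror the pointer /Voices/0/Id: return the first voice's Id, or None."""
    voices = body.get("Voices") or []
    if not voices:
        # No voice matched the engine/language filter.
        return None
    return voices[0].get("Id")


# Illustrative response body, trimmed to the fields this step reads.
sample_body = {
    "Voices": [
        {"Id": "Joanna", "LanguageCode": "en-US"},
        {"Id": "Matthew", "LanguageCode": "en-US"},
    ],
    "NextToken": None,
}
selected = select_first_voice_id(sample_body)
```

A caller would typically treat a None result as a reason to stop before synthesis, since no voice id is available to send.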

synthesizeSpeech

SynthesizeSpeech

Synthesize the supplied text into an audio stream using the voice selected from the DescribeVoices response.
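The request body for this step combines the caller's inputs with the voice id chosen in the previous step. The sketch below shows one way to assemble it in Python. The function name is hypothetical, and dropping unset optional fields is an assumption made here; how a given Arazzo runner treats missing optional inputs is not stated in this workflow.

```python
from typing import Any, Optional


def build_synthesize_payload(
    text: str,
    output_format: str,
    voice_id: str,
    engine: Optional[str] = None,
    language_code: Optional[str] = None,
    text_type: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the SynthesizeSpeech JSON body, omitting optional fields that are unset."""
    payload: dict[str, Any] = {
        "OutputFormat": output_format,
        "Text": text,
        "VoiceId": voice_id,
    }
    optional = {
        "Engine": engine,
        "LanguageCode": language_code,
        "TextType": text_type,
    }
    # Only include optional fields the caller actually supplied.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


# Placeholder values for illustration.
payload = build_synthesize_payload("Hello", "mp3", "Joanna", engine="neural")
```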

Source API Descriptions

openapi

pollyApi https://raw.githubusercontent.com/api-evangelist/amazon-polly/refs/heads/main/openapi/amazon-polly-openapi-original.yaml

Arazzo Workflow Specification

arazzo: 1.0.1
info:
  title: Amazon Polly List Voices and Synthesize Speech
  summary: Pick an available voice for a language and synthesize speech with it.
  description: >-
    A common Amazon Polly text-to-speech pattern. The workflow first
    calls DescribeVoices, optionally filtered by an engine and language code, to
    discover an available voice. It captures the first returned voice id and
    then calls SynthesizeSpeech to turn the supplied input text into an audio
    stream using that voice. Every step spells out its request inline, including
    the AWS Signature Version 4 signing headers, so the flow can be read and
    executed without opening the underlying OpenAPI description.
  version: 1.0.0
sourceDescriptions:
- name: pollyApi
  url: ../openapi/amazon-polly-openapi-original.yaml
  type: openapi
workflows:
- workflowId: list-voices-synthesize-speech
  summary: Discover an available voice for a language and synthesize speech with it.
  description: >-
    Lists the voices available for the requested engine and language, selects
    the first match, and synthesizes the supplied text into an audio stream
    using that voice and output format.
  inputs:
    type: object
    required:
    - amzDate
    - authorization
    - text
    - outputFormat
    properties:
      amzDate:
        type: string
        description: The X-Amz-Date timestamp (e.g. 20260604T120000Z) used to sign the requests.
      authorization:
        type: string
        description: The full SigV4 Authorization header value for the request. A signature covers one request only, so each step needs its own value in practice.
      contentSha256:
        type: string
        description: The X-Amz-Content-Sha256 hex digest of the request payload.
      securityToken:
        type: string
        description: Optional X-Amz-Security-Token for temporary credentials.
      engine:
        type: string
        description: Engine to filter voices and to use for synthesis (standard, neural, long-form, or generative).
      languageCode:
        type: string
        description: Polly language code to filter voices by and to pass to synthesis (e.g. en-US).
      text:
        type: string
        description: The input text (plain text or SSML) to synthesize.
      textType:
        type: string
        description: Whether the input is plain text or SSML (text or ssml).
      outputFormat:
        type: string
        description: The audio output format (mp3, ogg_vorbis, pcm, or json).
  steps:
  - stepId: describeVoices
    description: >-
      List the voices available for the requested engine and language so a
      concrete voice id can be selected for synthesis.
    operationId: DescribeVoices
    parameters:
    - name: Engine
      in: query
      value: $inputs.engine
    - name: LanguageCode
      in: query
      value: $inputs.languageCode
    - name: X-Amz-Date
      in: header
      value: $inputs.amzDate
    - name: Authorization
      in: header
      value: $inputs.authorization
    - name: X-Amz-Content-Sha256
      in: header
      value: $inputs.contentSha256
    - name: X-Amz-Security-Token
      in: header
      value: $inputs.securityToken
    successCriteria:
    - condition: $statusCode == 200
    outputs:
      voices: $response.body#/Voices
      selectedVoiceId: $response.body#/Voices/0/Id
      nextToken: $response.body#/NextToken
  - stepId: synthesizeSpeech
    description: >-
      Synthesize the supplied text into an audio stream using the voice selected
      from the DescribeVoices response.
    operationId: SynthesizeSpeech
    parameters:
    - name: X-Amz-Date
      in: header
      value: $inputs.amzDate
    - name: Authorization
      in: header
      value: $inputs.authorization
    - name: X-Amz-Content-Sha256
      in: header
      value: $inputs.contentSha256
    - name: X-Amz-Security-Token
      in: header
      value: $inputs.securityToken
    requestBody:
      contentType: application/json
      payload:
        Engine: $inputs.engine
        LanguageCode: $inputs.languageCode
        OutputFormat: $inputs.outputFormat
        Text: $inputs.text
        TextType: $inputs.textType
        VoiceId: $steps.describeVoices.outputs.selectedVoiceId
    successCriteria:
    - condition: $statusCode == 200
    outputs:
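      # SynthesizeSpeech returns the audio stream as the response body;
      # the content type and character count arrive as HTTP response headers.
      contentType: $response.header.Content-Type
      requestCharacters: $response.header.x-amzn-RequestCharacters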
  outputs:
    selectedVoiceId: $steps.describeVoices.outputs.selectedVoiceId
    requestCharacters: $steps.synthesizeSpeech.outputs.requestCharacters
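
The control flow of the specification above can be sketched in Python as two sequential calls with a status check after each. This is an illustration under stated assumptions, not an Arazzo runner: the `call` transport and its request/response dictionaries are hypothetical, the empty-list check is an addition the workflow does not define, and the character count is read from the `x-amzn-RequestCharacters` response header on the assumption that the service returns it there.

```python
from typing import Any, Callable

# Hypothetical transport: takes an operationId and a request dict with
# "headers", "query", and "body"; returns a dict with "status", "headers", "body".
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]


def _drop_unset(values: dict[str, Any]) -> dict[str, Any]:
    """Remove keys whose value is None so optional inputs are simply omitted."""
    return {key: value for key, value in values.items() if value is not None}


def run_workflow(inputs: dict[str, Any], call: Transport) -> dict[str, Any]:
    """Run describeVoices then synthesizeSpeech and return the workflow outputs."""
    headers = _drop_unset({
        "X-Amz-Date": inputs["amzDate"],
        "Authorization": inputs["authorization"],
        "X-Amz-Content-Sha256": inputs.get("contentSha256"),
        "X-Amz-Security-Token": inputs.get("securityToken"),
    })

    # Step describeVoices: the success criterion is a 200 status code.
    query = _drop_unset({
        "Engine": inputs.get("engine"),
        "LanguageCode": inputs.get("languageCode"),
    })
    listed = call("DescribeVoices", {"headers": headers, "query": query, "body": None})
    if listed["status"] != 200:
        raise RuntimeError(f"describeVoices failed with status {listed['status']}")
    voices = listed["body"].get("Voices") or []
    if not voices:
        raise RuntimeError("describeVoices returned no voices to select from")
    selected_voice_id = voices[0]["Id"]  # equivalent to the pointer /Voices/0/Id

    # Step synthesizeSpeech: send the text with the selected voice.
    payload = _drop_unset({
        "Engine": inputs.get("engine"),
        "LanguageCode": inputs.get("languageCode"),
        "OutputFormat": inputs["outputFormat"],
        "Text": inputs["text"],
        "TextType": inputs.get("textType"),
        "VoiceId": selected_voice_id,
    })
    spoken = call("SynthesizeSpeech", {"headers": headers, "query": {}, "body": payload})
    if spoken["status"] != 200:
        raise RuntimeError(f"synthesizeSpeech failed with status {spoken['status']}")

    return {
        "selectedVoiceId": selected_voice_id,
        "requestCharacters": spoken["headers"].get("x-amzn-RequestCharacters"),
    }


def fake_call(operation_id: str, request: dict[str, Any]) -> dict[str, Any]:
    """Stand-in transport with canned responses, used only to exercise the sketch."""
    if operation_id == "DescribeVoices":
        return {"status": 200, "headers": {}, "body": {"Voices": [{"Id": "Joanna"}]}}
    return {"status": 200, "headers": {"x-amzn-RequestCharacters": "5"}, "body": b""}


result = run_workflow(
    {
        "amzDate": "20260604T120000Z",
        "authorization": "placeholder-signature",
        "text": "Hello",
        "outputFormat": "mp3",
    },
    fake_call,
)
```

Injecting the transport keeps the sketch free of network access and signing logic, so the step ordering and output wiring can be checked in isolation.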