NVIDIA NIM · Arazzo Workflow
NVIDIA NIM Generate Image And Caption
Version 1.0.0
Generate an image from a text prompt, then caption the generated image with a vision-language model.
View Spec
View on GitHub
AIArtificial IntelligenceInferenceMicroservicesLLMFoundation ModelsGPUKubernetesNVIDIAOpenAI CompatibleArazzoWorkflows
Provider
Workflows
generate-image-and-caption
Generate an image, then describe it with a VLM to produce a caption.
Generates an image from a prompt with a visual generative model, then passes the returned base64 artifact to a vision-language model for captioning.
1
generateImage
generateImage
Generate an image from the text prompt with the selected publisher/model visual generative NIM.
2
captionImage
createVisionChatCompletion
Caption the freshly generated image by passing its base64 artifact as a data-URI image_url into a vision-language model.
Source API Descriptions
openapi