NVIDIA NIM · Arazzo Workflow
NVIDIA NIM Vision Describe And Summarize
Version 1.0.0
Describe an image with a vision-language model, then condense the description into a short caption with an LLM.
View Spec
View on GitHub
AIArtificial IntelligenceInferenceMicroservicesLLMFoundation ModelsGPUKubernetesNVIDIAOpenAI CompatibleArazzoWorkflows
Provider
Workflows
vision-describe-and-summarize
Describe an image with a VLM, then summarize the description into a caption.
Sends an image to a vision-language model for a detailed description, then asks a text chat model to condense that description into a short caption.
1
describeImage
createVisionChatCompletion
Ask the vision-language model to produce a detailed description of the image supplied as an image_url content part.
2
summarizeDescription
createChatCompletion
Condense the detailed image description into a single short caption using a text-only chat model.
Source API Descriptions
openapi