Describe Scenes
What it does
Use this when you are preparing a video from storyboard frames, keyframes, or generated visuals and you need clean scene text before editing starts. Send an ordered list of scene images and the API returns one short description per scene in the same order, so you can turn loose visual assets into structured scene data for shot planning, timeline prep, or the next postproduction step.
If you also provide metadata.narrationText, the output stays more consistent with the story you want the final video to tell. That is especially useful when an editor or AI agent needs scene text that is brief, stable, and ready to reuse instead of writing those descriptions manually.
When that narration context is strong enough, the response may also carry the additive interpretation, anchorText, and startCueText fields. For downstream narration alignment, pass anchorText and startCueText into scene-timestamps.
How it works
Think of this endpoint as a scene description normalizer. You send the images in the order you want to keep, optionally add full-story narration for context, and you get back one description per image in that same order.
From a caller perspective, the main contract is simple: images[] defines the scene order, sceneIds[] lets you carry your own identifiers through the response, sceneOptions[] lets paid callers mark a few scenes for deeper analysis, and index is the stable key you can always rely on when mapping results back into your editor, database, or agent workflow.
If you already know the story arc, add metadata.narrationText. If you only have images, omit it and still get usable output. In both cases the API returns concise scene text rather than long prose, so the result is ready for the next step in the pipeline instead of needing another cleanup pass.
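The request shape can be sketched in Python. This mirrors the cURL example on this page: the metadata part is a single JSON string sent alongside the images[] files. The helper name, file names, and narration text are illustrative placeholders, not part of the API contract.

```python
import json

def build_metadata(narration, scene_ids, scene_options=None,
                   language="en", style="normal"):
    """Assemble the metadata JSON string for the multipart request.
    Field names follow the request shape shown in the cURL example;
    this helper itself is a hypothetical client-side convenience."""
    payload = {
        "narrationText": narration,
        "sceneIds": scene_ids,
        "hints": {"languageCode": language, "style": style},
    }
    if scene_options:
        payload["sceneOptions"] = scene_options
    return json.dumps(payload)

metadata = build_metadata(
    "A cyclist crosses a bridge, enters a crowded market, then reaches a sunset skyline.",
    ["scene-1", "scene-2", "scene-3"],
    scene_options=[{}, {"extraDetail": True}, {}],
)

# The actual call can use any HTTP client that supports multipart/form-data,
# e.g. with the third-party `requests` package:
#
#   files = [("images", open(p, "rb"))
#            for p in ["scene-01.png", "scene-02.png", "scene-03.png"]]
#   resp = requests.post(
#       "https://api.creatornode.io/postproduction/v1/describe-scenes",
#       headers={"X-API-Key": "YOUR_KEY"},
#       files=files,
#       data={"metadata": metadata},
#   )
```

Keeping metadata construction in one place makes it easy to enforce that images, sceneIds, and sceneOptions stay the same length and order before the request goes out.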
What comes back
- One result per uploaded image. Response order matches upload order exactly.
- index is the canonical mapping key. Use it to join API output back to your editor or asset list.
- sceneIds[] are echoed when supplied. Helpful for UIs, but still treat index as the source of truth.
- Cue hints are additive. interpretation, anchorText, and startCueText may appear when narration context supports them, but the baseline response shape still works without them.
- No narration text is returned. The public contract returns only narrationHash, not the raw text.
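Because index is the canonical key, joining results back to your own assets is a one-pass loop. A minimal sketch: only the response field names come from this page, while join_scenes and the asset-list layout (an ordered list matching upload order) are assumptions about your own data model.

```python
def join_scenes(response, local_assets):
    """Map API scene results onto a local asset list keyed by `index`.
    Cue-hint fields (anchorText, startCueText) are additive and may be
    absent, so they are read defensively with dict.get()."""
    joined = []
    for scene in response["data"]["scenes"]:
        asset = local_assets[scene["index"]]  # upload order == index
        joined.append({
            "asset": asset,
            "description": scene["description"],
            "anchorText": scene.get("anchorText"),      # may be absent
            "startCueText": scene.get("startCueText"),  # may be absent
        })
    return joined
```

Note the join ignores sceneIds entirely; they are convenient for display, but index alone keeps the mapping correct even if your IDs drift.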
Why use it?
- Prepare scene text from images. Useful when you have storyboard frames, thumbnails, or keyframes but not structured scene descriptions yet.
- Feed the next postproduction step. The output is designed to become input for scene-timestamps, ideally carrying anchorText and startCueText when available.
- Keep scene mapping deterministic. Upload order is preserved end to end, which makes automation simpler.
- Keep descriptions within editor-friendly limits. The endpoint enforces per-tier description length instead of returning arbitrary long prose.
Examples
cURL example
curl -X POST 'https://api.creatornode.io/postproduction/v1/describe-scenes' \
-H 'X-API-Key: YOUR_KEY' \
-F 'images=@scene-01.png' \
-F 'images=@scene-02.png' \
-F 'images=@scene-03.png' \
-F 'metadata={"narrationText":"A cyclist crosses a bridge, enters a crowded market, then reaches a sunset skyline.","sceneIds":["scene-1","scene-2","scene-3"],"sceneOptions":[{},{"extraDetail":true},{}],"hints":{"languageCode":"en","style":"normal"}}'

Response excerpt
{
"success": true,
"data": {
"narrationHash": "a0a14d2d5b8d0f1d6f3c0d9a66f41c2c0a7fd9d3dbe8e6af0a0d0123456789ab",
"scenes": [
{ "index": 0, "id": "scene-1", "imageName": "scene-01.png", "description": "Cyclist crossing a bridge in early morning light." },
{ "index": 1, "id": "scene-2", "imageName": "scene-02.png", "description": "Crowded market lane with fast movement and city energy." },
{ "index": 2, "id": "scene-3", "imageName": "scene-03.png", "description": "Sunset skyline closing the sequence with a calm wide shot." }
]
},
"meta": {
"requestId": "req_123",
"processingTimeMs": 410,
"imageCount": 3
}
}

Tips & tricks
- You can omit narrationText. The endpoint still works from images alone; narration is a quality hint, not a required field.
- Give scenes in final timeline order. The endpoint does not detect or reorder scenes for you.
- Use style deliberately. short is better for labels, normal for default usage, and detailed when you want richer scene text on paid tiers.
- Provide sceneIds[] when your UI already has stable IDs. They come back in the response and save one mapping step client-side.
- Use sceneOptions[].extraDetail sparingly. Premium and Enterprise can mark up to 5 ambiguous or important scenes for deeper analysis at +1 credit per marked scene.
- See the full schema in the API docs. OpenAPI reference: Describe Scenes docs.
Cost & Limits
| Feature | Detail |
|---|---|
| Base cost | 5 credits (includes up to 5 images) |
| Extra cost | +1 credit per image above 5 |
| Enhanced detail scenes | +1 credit per scene marked with sceneOptions[].extraDetail (max 5, paid tiers only) |
| Input format | multipart/form-data (metadata JSON + images[]) |
| Best paired with | Scene Timestamps for cut-point alignment on narration audio |
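The pricing rules above reduce to simple arithmetic, which makes pre-flight cost estimates easy. A minimal sketch: request_cost is a hypothetical client-side helper, not an API call, and only the numbers come from the table above.

```python
def request_cost(image_count, extra_detail_count=0):
    """Estimate credits for one describe-scenes call:
    base 5 credits covers up to 5 images, +1 credit per image above 5,
    +1 credit per scene marked with sceneOptions[].extraDetail
    (max 5 marked scenes, paid tiers only)."""
    if not 1 <= image_count <= 50:
        raise ValueError("Premium allows at most 50 images per request")
    if not 0 <= extra_detail_count <= min(5, image_count):
        raise ValueError("at most 5 scenes can request extra detail")
    return 5 + max(0, image_count - 5) + extra_detail_count

# 3 images, one marked for extra detail: 5 + 0 + 1 = 6 credits
# 8 images, none marked:                5 + 3 + 0 = 8 credits
```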
Tier Limits
| Limit | Free | Premium |
|---|---|---|
| Max images per request | 5 | 50 |
| Max image size | 2 MB | 5 MB |
| Max total image payload | 5 MB | 50 MB |
| Max narration length | 2 000 chars | 20 000 chars |
| Max description length / scene | 200 chars | 500 chars |
| Max enhanced-detail scenes | Not available | 5 |
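These tier limits can be checked client-side before spending credits. A minimal sketch: the TIER_LIMITS table simply transcribes the numbers above (sizes in MB), and validate_request is a hypothetical helper, not part of the API.

```python
TIER_LIMITS = {
    "free":    {"max_images": 5,  "max_image_mb": 2, "max_total_mb": 5,
                "max_narration": 2000},
    "premium": {"max_images": 50, "max_image_mb": 5, "max_total_mb": 50,
                "max_narration": 20000},
}

def validate_request(tier, image_sizes_mb, narration=""):
    """Return 'ok' or a human-readable reason the request would be rejected,
    based on the tier-limits table."""
    limits = TIER_LIMITS[tier]
    if len(image_sizes_mb) > limits["max_images"]:
        return "too many images"
    if any(size > limits["max_image_mb"] for size in image_sizes_mb):
        return "an image exceeds the per-file size limit"
    if sum(image_sizes_mb) > limits["max_total_mb"]:
        return "total image payload too large"
    if len(narration) > limits["max_narration"]:
        return "narration too long"
    return "ok"
```

Failing fast on the client side avoids a round trip for requests the API would reject anyway.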
Other Endpoints
Compose OTIO
Compose an OpenTimelineIO timeline from an explicit manifest and deterministic editorial timing.
Scene Timestamps
Align ordered scene descriptions and optional cue hints to narration audio, then return transition timestamps.