Describe Scenes
What it does
Use this when you are preparing a video from storyboard frames, keyframes, or generated visuals and you need clean scene text before editing starts. Send an ordered list of scene images and the API returns one short description per scene in the same order, so you can turn loose visual assets into structured scene data for shot planning, timeline prep, or the next postproduction step.
If you also provide metadata.narrationText, the output stays more consistent with the story you want the final video to tell. That is especially useful when an editor or AI agent needs scene text that is brief, stable, and ready to reuse instead of writing those descriptions manually.
When that narration context is strong enough, the response may also carry the additive interpretation, anchorText, and startCueText fields. For downstream narration alignment, pass anchorText and startCueText into scene-timestamps.
How it works
Think of this endpoint as a scene description normalizer. You send the images in the order you want to keep, optionally add full-story narration for context, and you get back one description per image in that same order.
From a caller perspective, the main contract is simple: images[] defines the scene order, sceneIds[] lets you carry your own identifiers through the response, sceneOptions[] lets paid callers mark a few scenes for deeper analysis, and index is the stable key you can always rely on when mapping results back into your editor, database, or agent workflow.
If you already know the story arc, add metadata.narrationText. If you only have images, omit it and still get usable output. In both cases the API returns concise scene text rather than long prose, so the result is ready for the next step in the pipeline instead of needing another cleanup pass.
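The request shape can be sketched in Python. This mirrors the cURL example on this page: the metadata part is a single JSON string sent alongside the images[] files. The helper name, file names, and narration text are illustrative placeholders, not part of the API contract.

```python
import json

def build_metadata(narration, scene_ids, scene_options=None,
                   language="en", style="normal"):
    """Assemble the metadata JSON string for the multipart request.
    Field names follow the request shape shown in the cURL example;
    this helper itself is a hypothetical client-side convenience."""
    payload = {
        "narrationText": narration,
        "sceneIds": scene_ids,
        "hints": {"languageCode": language, "style": style},
    }
    if scene_options:
        payload["sceneOptions"] = scene_options
    return json.dumps(payload)

metadata = build_metadata(
    "A cyclist crosses a bridge, enters a crowded market, then reaches a sunset skyline.",
    ["scene-1", "scene-2", "scene-3"],
    scene_options=[{}, {"extraDetail": True}, {}],
)

# The actual call can use any HTTP client that supports multipart/form-data,
# e.g. with the third-party `requests` package:
#
#   files = [("images", open(p, "rb"))
#            for p in ["scene-01.png", "scene-02.png", "scene-03.png"]]
#   resp = requests.post(
#       "https://api.creatornode.io/postproduction/v1/describe-scenes",
#       headers={"X-API-Key": "YOUR_KEY"},
#       files=files,
#       data={"metadata": metadata},
#   )
```

Keeping metadata construction in one place makes it easy to enforce that images, sceneIds, and sceneOptions stay the same length and order before the request goes out.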
What comes back
- One result per uploaded image. Response order matches upload order exactly.
- index is the canonical mapping key. Use it to join API output back to your editor or asset list.
- sceneIds[] are echoed when supplied. Helpful for UIs, but still treat index as the source of truth.
- Cue hints are additive. interpretation, anchorText, and startCueText may appear when narration context supports them, but the baseline response shape still works without them.
- No narration text is returned. The public contract returns only narrationHash, not the raw text.
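Because index is the canonical key, joining results back to your own assets is a one-pass loop. A minimal sketch: only the response field names come from this page, while join_scenes and the asset-list layout (an ordered list matching upload order) are assumptions about your own data model.

```python
def join_scenes(response, local_assets):
    """Map API scene results onto a local asset list keyed by `index`.
    Cue-hint fields (anchorText, startCueText) are additive and may be
    absent, so they are read defensively with dict.get()."""
    joined = []
    for scene in response["data"]["scenes"]:
        asset = local_assets[scene["index"]]  # upload order == index
        joined.append({
            "asset": asset,
            "description": scene["description"],
            "anchorText": scene.get("anchorText"),      # may be absent
            "startCueText": scene.get("startCueText"),  # may be absent
        })
    return joined
```

Note the join ignores sceneIds entirely; they are convenient for display, but index alone keeps the mapping correct even if your IDs drift.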
Why use it?
- Prepare scene text from images. Useful when you have storyboard frames, thumbnails, or keyframes but not structured scene descriptions yet.
- Feed the next postproduction step. The output is designed to become input for scene-timestamps, ideally carrying anchorText and startCueText when available.
- Keep scene mapping deterministic. Upload order is preserved end to end, which makes automation simpler.
- Keep descriptions within editor-friendly limits. The endpoint enforces per-tier description length instead of returning arbitrary long prose.
Examples
cURL example
curl -X POST 'https://api.creatornode.io/postproduction/v1/describe-scenes' \
-H 'X-API-Key: YOUR_KEY' \
-F 'images=@scene-01.png' \
-F 'images=@scene-02.png' \
-F 'images=@scene-03.png' \
-F 'metadata={"narrationText":"A cyclist crosses a bridge, enters a crowded market, then reaches a sunset skyline.","sceneIds":["scene-1","scene-2","scene-3"],"sceneOptions":[{},{"extraDetail":true},{}],"hints":{"languageCode":"en","style":"normal"}}'

Response excerpt
{
"success": true,
"data": {
"narrationHash": "a0a14d2d5b8d0f1d6f3c0d9a66f41c2c0a7fd9d3dbe8e6af0a0d0123456789ab",
"scenes": [
{ "index": 0, "id": "scene-1", "imageName": "scene-01.png", "description": "Cyclist crossing a bridge in early morning light." },
{ "index": 1, "id": "scene-2", "imageName": "scene-02.png", "description": "Crowded market lane with fast movement and city energy." },
{ "index": 2, "id": "scene-3", "imageName": "scene-03.png", "description": "Sunset skyline closing the sequence with a calm wide shot." }
]
},
"meta": {
"requestId": "req_123",
"processingTimeMs": 410,
"imageCount": 3
}
}

Tips & tricks
- You can omit narrationText. The endpoint still works from images alone; narration is a quality hint, not a required field.
- Give scenes in final timeline order. The endpoint does not detect or reorder scenes for you.
- Use style deliberately. short is better for labels, normal for default usage, and detailed when you want richer scene text on paid tiers.
- Provide sceneIds[] when your UI already has stable IDs. They come back in the response and save one mapping step client-side.
- Use sceneOptions[].extraDetail sparingly. Premium and Enterprise can mark up to 5 ambiguous or important scenes for deeper analysis at +1 credit per marked scene.
- See the full schema in the API docs. OpenAPI reference: Describe Scenes docs.
Cost & Limits
| Feature | Detail |
|---|---|
| Base cost | 5 credits (includes up to 5 images) |
| Extra cost | +1 credit per image above 5 |
| Enhanced detail scenes | +1 credit per scene marked with sceneOptions[].extraDetail (max 5, paid tiers only) |
| Input format | multipart/form-data (metadata JSON + images[]) |
| Best paired with | Scene Timestamps for cut-point alignment on narration audio |
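The pricing rules above reduce to simple arithmetic, which makes pre-flight cost estimates easy. A minimal sketch: request_cost is a hypothetical client-side helper, not an API call, and only the numbers come from the table above.

```python
def request_cost(image_count, extra_detail_count=0):
    """Estimate credits for one describe-scenes call:
    base 5 credits covers up to 5 images, +1 credit per image above 5,
    +1 credit per scene marked with sceneOptions[].extraDetail
    (max 5 marked scenes, paid tiers only)."""
    if not 1 <= image_count <= 50:
        raise ValueError("Premium allows at most 50 images per request")
    if not 0 <= extra_detail_count <= min(5, image_count):
        raise ValueError("at most 5 scenes can request extra detail")
    return 5 + max(0, image_count - 5) + extra_detail_count

# 3 images, one marked for extra detail: 5 + 0 + 1 = 6 credits
# 8 images, none marked:                5 + 3 + 0 = 8 credits
```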
Tier Limits
| Limit | Free | Premium |
|---|---|---|
| Max images per request | 5 | 50 |
| Max image size | 2 MB | 5 MB |
| Max total image payload | 5 MB | 50 MB |
| Max narration length | 2 000 chars | 20 000 chars |
| Max description length / scene | 200 chars | 500 chars |
| Max enhanced-detail scenes | Not available | 5 |
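These tier limits can be checked client-side before spending credits. A minimal sketch: the TIER_LIMITS table simply transcribes the numbers above (sizes in MB), and validate_request is a hypothetical helper, not part of the API.

```python
TIER_LIMITS = {
    "free":    {"max_images": 5,  "max_image_mb": 2, "max_total_mb": 5,
                "max_narration": 2000},
    "premium": {"max_images": 50, "max_image_mb": 5, "max_total_mb": 50,
                "max_narration": 20000},
}

def validate_request(tier, image_sizes_mb, narration=""):
    """Return 'ok' or a human-readable reason the request would be rejected,
    based on the tier-limits table."""
    limits = TIER_LIMITS[tier]
    if len(image_sizes_mb) > limits["max_images"]:
        return "too many images"
    if any(size > limits["max_image_mb"] for size in image_sizes_mb):
        return "an image exceeds the per-file size limit"
    if sum(image_sizes_mb) > limits["max_total_mb"]:
        return "total image payload too large"
    if len(narration) > limits["max_narration"]:
        return "narration too long"
    return "ok"
```

Failing fast on the client side avoids a round trip for requests the API would reject anyway.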
Other Endpoints
Compose OTIO
Compose an OpenTimelineIO timeline from an explicit manifest and deterministic editorial timing.
Scene Timestamps
Align ordered scene descriptions and optional cue hints to narration audio, then return transition timestamps.