Postproduction API Reference
Scene-level postproduction endpoints for creator editing workflows. This API currently includes production endpoints for three tasks: composing deterministic timelines, generating index-aligned scene descriptions with optional cue fields, and aligning the required scene cue fields to narration audio.
How the Postproduction API Fits Together
The /postproduction/v1/describe-scenes endpoint turns ordered scene images into concise scene descriptions, and may also emit anchorText and startCueText when the narration context supports them. The /postproduction/v1/scene-timestamps endpoint takes the ordered scene cue fields plus one narration audio file, and can optionally use narrationText and languageCode for stronger alignment. The /postproduction/v1/compose-otio endpoint assembles an explicit editorial manifest into an OpenTimelineIO (OTIO) timeline artifact that you can pass straight to downstream editing or export tooling.
- Use Compose OTIO when you already know the exact clip order, trims, and audio layering you want to export.
- Use Describe Scenes to generate stable, ordered scene text and cue fields from images.
- Use Scene Timestamps to align required scene cue fields to narration audio on the final timeline.
- All three endpoints preserve caller-provided order and expose tracking metadata through meta.requestId.
- Public responses stay provider-agnostic and focus on user-visible behaviour only.
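Describe Scenes emits anchorText and startCueText only when the narration context supports them, while Scene Timestamps requires both fields on every scene, so a client chaining the two endpoints should check the cues in between. The following Python sketch assumes the describe-scenes response has already been parsed into an ordered list of per-scene dicts; that response shape and the helper name `to_timestamp_scenes` are hypothetical.

```python
from typing import Any


def to_timestamp_scenes(described: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Map parsed describe-scenes output to scene-timestamps metadata.scenes[].

    Hypothetical helper: `described` is assumed to be an ordered list of
    per-scene dicts that may carry the optional anchorText/startCueText fields.
    """
    scenes: list[dict[str, str]] = []
    # Iterate in the original order so images, descriptions, and
    # timestamped scenes stay index-aligned.
    for index, scene in enumerate(described):
        anchor = scene.get("anchorText")
        start_cue = scene.get("startCueText")
        # Scene Timestamps requires both cue fields, so fail before uploading audio.
        if not anchor or not start_cue:
            raise ValueError(f"scene {index} lacks anchorText or startCueText")
        scenes.append({"anchorText": anchor, "startCueText": start_cue})
    return scenes
```

Raising instead of silently dropping a scene keeps the scene list aligned with the original images; a caller could fill in the missing cues by hand and retry.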
Compose OTIO
- application/json request with project, folder.items[], and an explicit intent.
- responseFormat: defaults to a file download; an optional json wrapper is available for API-first consumers.
- output.targetConsumer: optional import tuning for downstream editors.
- Returns a deterministic OTIO timeline artifact instead of best-effort scene analysis.
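The Compose OTIO limits in the Tier Limits table (manifest item count and total timeline duration) can be checked on the client before sending. A minimal Python sketch, assuming the caller can supply one trimmed duration in seconds per folder.items[] entry; the helper name and that duration list are hypothetical, because the manifest item schema is not described here. Summing the durations assumes the items play back to back with no gaps or overlaps, so treat the total as an estimate.

```python
# Compose OTIO limits from the Tier Limits table.
COMPOSE_LIMITS = {
    "free": {"max_items": 20, "max_duration_sec": 180},
    "premium": {"max_items": 200, "max_duration_sec": 3600},
    "enterprise": {"max_items": 200, "max_duration_sec": 3600},
}


def check_compose_manifest(tier: str, item_durations_sec: list[float]) -> list[str]:
    """Return limit violations for a manifest (empty list when it fits).

    Hypothetical helper: item_durations_sec holds one trimmed duration per
    folder.items[] entry; summing them assumes back-to-back playback.
    """
    limits = COMPOSE_LIMITS[tier]
    problems: list[str] = []
    if len(item_durations_sec) > limits["max_items"]:
        problems.append(f"{len(item_durations_sec)} items exceeds {limits['max_items']}")
    total = sum(item_durations_sec)
    if total > limits["max_duration_sec"]:
        problems.append(f"{total:g} sec exceeds {limits['max_duration_sec']} sec")
    return problems
```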
Describe Scenes
- metadata.narrationText: optional story context text.
- metadata.sceneIds: optional array; its length must match the image count.
- metadata.hints.languageCode: optional language hint.
- metadata.hints.style: short, normal, or detailed (premium and enterprise).
- images[]: required ordered scene image files.
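The metadata rules above, together with the Describe Scenes column of the Tier Limits table, can be enforced while building the JSON metadata part. A Python sketch under stated assumptions: the helper name is hypothetical, 1 MB is taken as 1,000,000 bytes, and the "(premium and enterprise)" note is read as restricting only the detailed style.

```python
from typing import Optional

# Describe Scenes limits from the Tier Limits table.
# Assumption: 1 MB = 1,000,000 bytes.
DESCRIBE_LIMITS = {
    "free": {"max_images": 5, "max_total_bytes": 5_000_000, "max_narration_chars": 2_000},
    "premium": {"max_images": 50, "max_total_bytes": 50_000_000, "max_narration_chars": 20_000},
    "enterprise": {"max_images": 100, "max_total_bytes": 100_000_000, "max_narration_chars": 20_000},
}
STYLES = ("short", "normal", "detailed")


def build_describe_metadata(
    tier: str,
    image_sizes: list[int],
    narration_text: Optional[str] = None,
    scene_ids: Optional[list[str]] = None,
    language_code: Optional[str] = None,
    style: Optional[str] = None,
) -> dict:
    """Validate inputs and return the metadata object (hypothetical helper)."""
    limits = DESCRIBE_LIMITS[tier]
    if not image_sizes:
        raise ValueError("images[] is required")
    if len(image_sizes) > limits["max_images"]:
        raise ValueError(f"{len(image_sizes)} images exceeds {limits['max_images']}")
    if sum(image_sizes) > limits["max_total_bytes"]:
        raise ValueError("total image size exceeds the tier limit")
    # sceneIds, when present, must line up one-to-one with the ordered images.
    if scene_ids is not None and len(scene_ids) != len(image_sizes):
        raise ValueError("sceneIds length must match the image count")
    if narration_text is not None and len(narration_text) > limits["max_narration_chars"]:
        raise ValueError("narrationText exceeds the tier limit")
    if style is not None and style not in STYLES:
        raise ValueError(f"unknown style {style!r}")
    # Assumption: only the detailed style is restricted to paid tiers.
    if style == "detailed" and tier == "free":
        raise ValueError("detailed style needs premium or enterprise")

    metadata: dict = {}
    if narration_text is not None:
        metadata["narrationText"] = narration_text
    if scene_ids is not None:
        metadata["sceneIds"] = list(scene_ids)
    hints = {}
    if language_code is not None:
        hints["languageCode"] = language_code
    if style is not None:
        hints["style"] = style
    if hints:
        metadata["hints"] = hints
    return metadata
```

The returned dict would be serialized with json.dumps and sent as the metadata part alongside the ordered images[] files.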
Scene Timestamps
- multipart/form-data with one audio file and JSON metadata.
- metadata.scenes[]: required ordered scenes, each with required anchorText and startCueText.
- metadata.startOffsetMs: optional offset for the first scene on the final timeline.
- metadata.languageCode: optional top-level language hint for speech-to-text.
- Response includes ordered scene startMs values and transition confidence intervals.
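The request shape above (one audio file plus a JSON metadata field) can be assembled with the Python standard library alone. In this sketch the field names audio and metadata come from this reference; the helper name, the POST method, the URL, the filename, and the audio content type are assumptions. Authentication is not covered in this section, so no auth header is added.

```python
import json
import urllib.request
import uuid


def build_scene_timestamps_request(
    url: str,
    audio_name: str,
    audio_bytes: bytes,
    audio_type: str,
    metadata: dict,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data request.

    Hypothetical helper: the "audio" and "metadata" field names follow this
    reference; the POST method and all other details are assumptions.
    """
    boundary = uuid.uuid4().hex  # random token, unlikely to appear in the payload
    # Part 1: the JSON metadata as a form field named "metadata".
    metadata_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "Content-Type: application/json\r\n\r\n"
    ).encode() + json.dumps(metadata).encode() + b"\r\n"
    # Part 2: the single narration audio file under the "audio" field.
    audio_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{audio_name}"\r\n'
        f"Content-Type: {audio_type}\r\n\r\n"
    ).encode() + audio_bytes + b"\r\n"
    body = metadata_part + audio_part + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to urllib.request.urlopen would send it; building and sending are kept separate so the payload can be inspected first.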
Tier Limits
| Limit | Compose OTIO (Free / Premium / Enterprise) | Describe Scenes (Free / Premium / Enterprise) | Scene Timestamps (Free / Premium / Enterprise) |
|---|---|---|---|
| Primary payload | 20 / 200 / 200 manifest items | 5 / 50 / 100 images | 5 MB / 25 MB / 25 MB audio |
| Duration / total payload | 180 sec / 3600 sec / 3600 sec total timeline duration | 5 MB / 50 MB / 100 MB total images | 5 min / 20 min / 60 min audio duration |
| Max scenes | Caller-defined via explicit sequence | 5 / 50 / 100 via images | 10 / 100 / 100 aligned scenes |
| Narration text | Not applicable | 2 000 / 20 000 / 20 000 chars | 5 000 / 50 000 / 50 000 chars |
| Per-scene text field | Not applicable | 200 / 500 / 500 chars | 300 / 500 / 500 chars per anchorText or startCueText field |
| Total scene text | Not applicable | Not applicable | 4 000 / 40 000 / 40 000 chars across all scene text fields |
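The Scene Timestamps column of the table combines a scene-count cap, a per-field cap on anchorText and startCueText, and a cap on the total text across those fields. A Python sketch of a client-side pre-check follows; the helper name is hypothetical, and it assumes len() (Unicode code points) matches how the service counts characters.

```python
from typing import Optional

# Scene Timestamps text limits from the Tier Limits table.
SCENE_TEXT_LIMITS = {
    "free": {"max_scenes": 10, "max_field_chars": 300, "max_total_chars": 4_000, "max_narration_chars": 5_000},
    "premium": {"max_scenes": 100, "max_field_chars": 500, "max_total_chars": 40_000, "max_narration_chars": 50_000},
    "enterprise": {"max_scenes": 100, "max_field_chars": 500, "max_total_chars": 40_000, "max_narration_chars": 50_000},
}
CUE_FIELDS = ("anchorText", "startCueText")


def scene_text_violations(
    tier: str, scenes: list[dict], narration_text: Optional[str] = None
) -> list[str]:
    """Return limit violations for metadata.scenes[] (hypothetical helper)."""
    limits = SCENE_TEXT_LIMITS[tier]
    problems: list[str] = []
    if len(scenes) > limits["max_scenes"]:
        problems.append(f"{len(scenes)} scenes exceeds {limits['max_scenes']}")
    total_chars = 0
    for index, scene in enumerate(scenes):
        for field in CUE_FIELDS:
            text = scene.get(field)
            # Both cue fields are required on every scene.
            if not text:
                problems.append(f"scene {index}: {field} is required")
                continue
            if len(text) > limits["max_field_chars"]:
                problems.append(f"scene {index}: {field} has {len(text)} chars")
            total_chars += len(text)
    # The total cap applies across all scene text fields combined.
    if total_chars > limits["max_total_chars"]:
        problems.append(f"total scene text {total_chars} chars exceeds {limits['max_total_chars']}")
    if narration_text is not None and len(narration_text) > limits["max_narration_chars"]:
        problems.append("narrationText exceeds the tier limit")
    return problems
```

For example, ten free-tier scenes with 300-character values in both fields pass every per-field check but total 6,000 characters, which exceeds the 4,000-character cap.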
Billing models: Compose OTIO starts at 4 credits for deterministic timeline composition. Describe Scenes starts at 5 credits, adds +1 per extra image beyond the included set, and adds +1 for each scene marked with metadata.sceneOptions[].extraDetail on paid tiers. Scene Timestamps starts at 9 credits for requests with up to 5 minutes of audio and up to 10 scenes. That minimum already includes the 6-credit request base and the first started 5-minute audio block. Each additional started 5-minute block adds +3, and each additional 10-scene block after the first 10 scenes adds +1. Audio file size remains a limit, not a billing dimension.
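The Scene Timestamps rule works out to 6 + 3 × (started 5-minute audio blocks) + (10-scene blocks − 1). For example, 12 minutes of narration with 25 scenes starts 3 audio blocks (9 credits) and spans 3 scene blocks (+2), for 6 + 9 + 2 = 17 credits. The Python sketch below encodes that arithmetic. The helper names are hypothetical; it assumes a partially filled 10-scene block counts as a full one (the text states "started" only for audio blocks), and it treats extraDetail as unavailable on the free tier. The number of images covered by the Describe Scenes base is not stated in this reference, so it must be passed in.

```python
def scene_timestamps_credits(audio_duration_ms: int, scene_count: int) -> int:
    """Estimate Scene Timestamps credits (hypothetical helper)."""
    if audio_duration_ms <= 0 or scene_count <= 0:
        raise ValueError("audio duration and scene count must be positive")
    base = 6
    # Every started 5-minute (300,000 ms) audio block costs 3 credits.
    audio_blocks = (audio_duration_ms + 299_999) // 300_000
    # Assumption: a partially filled 10-scene block counts as a full block.
    scene_blocks = (scene_count + 9) // 10
    return base + 3 * audio_blocks + (scene_blocks - 1)


def describe_scenes_credits(
    tier: str, image_count: int, included_images: int, extra_detail_scenes: int = 0
) -> int:
    """Estimate Describe Scenes credits (hypothetical helper).

    included_images must come from your plan: this reference does not state
    how many images the 5-credit base covers.
    """
    if extra_detail_scenes and tier == "free":
        # Assumption: extraDetail is unavailable on the free tier.
        raise ValueError("extraDetail is billed on paid tiers only")
    return 5 + max(0, image_count - included_images) + extra_detail_scenes
```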
End-to-end usage walkthroughs: Compose OTIO guide, Describe Scenes guide, and Scene Timestamps guide.
/postproduction/v1/compose-otio
Compose an OpenTimelineIO timeline from an explicit manifest with deterministic trims, transitions, and audio layering.
/postproduction/v1/describe-scenes
Generate structured per-scene descriptions and optional cue hints from ordered scene images and narration context.
/postproduction/v1/scene-timestamps
Align required scene cue fields to narration audio and return scene start times plus transition recommendations.
Swagger Demo
Demo endpoints are available only for interactive Swagger testing on this website. They are not production endpoints and cannot be called directly from external clients.