Postproduction API Reference

Scene-level postproduction endpoints for creator editing workflows. The API currently provides three production endpoints: one composes deterministic timelines, one generates index-aligned scene descriptions with optional cue fields, and one aligns required scene cues back to narration audio.

How the Postproduction API Fits Together

/postproduction/v1/describe-scenes turns ordered scene images into concise scene descriptions and may also emit anchorText and startCueText when narration context supports them. /postproduction/v1/scene-timestamps takes ordered scene cue fields plus one narration audio file and can optionally use narrationText and languageCode for stronger alignment. /postproduction/v1/compose-otio assembles an explicit editorial manifest into an OpenTimelineIO timeline artifact that you can move straight into downstream editing or export tooling.

Use Compose OTIO when you already know the exact clip order, trims, and audio layering you want to export.
Use Describe Scenes to generate stable, ordered scene text and cue fields from images.
Use Scene Timestamps to align required scene cue fields to narration audio on the final timeline.
All three endpoints preserve caller-provided order and expose tracking metadata through meta.requestId.
Public responses stay provider-agnostic and focus on user-visible behaviour only.

Compose OTIO

application/json request with project, folder.items[], and explicit intent.
responseFormat: defaults to file download, optional json wrapper for API-first consumers.
output.targetConsumer: optional import tuning for downstream editors.
Returns a deterministic OTIO timeline artifact instead of best-effort scene analysis.

Describe Scenes

metadata.narrationText: optional story context text
metadata.sceneIds: optional array, must match image count
metadata.hints.languageCode: optional language hint
metadata.hints.style: short, normal, or detailed (premium and enterprise)
images[]: required ordered scene image files
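The metadata fields above can be assembled and checked on the client before any upload. The following is a minimal sketch, not the service's own validation: the function name build_describe_scenes_metadata and its parameters are hypothetical, and the sketch only builds the metadata JSON (how the images and metadata are attached to the request is left out).

```python
import json

# Style values listed in this reference; availability depends on the tier.
ALLOWED_STYLES = {"short", "normal", "detailed"}


def build_describe_scenes_metadata(image_count, narration_text=None,
                                   scene_ids=None, language_code=None,
                                   style=None):
    """Return a JSON string for the Describe Scenes metadata (hypothetical helper)."""
    if image_count < 1:
        raise ValueError("images[] is required: send at least one image")

    metadata = {}
    if narration_text is not None:
        metadata["narrationText"] = narration_text

    if scene_ids is not None:
        # sceneIds is optional, but when present it must match the image count.
        if len(scene_ids) != image_count:
            raise ValueError(
                f"sceneIds has {len(scene_ids)} entries but {image_count} images were given"
            )
        metadata["sceneIds"] = list(scene_ids)

    hints = {}
    if language_code is not None:
        hints["languageCode"] = language_code
    if style is not None:
        if style not in ALLOWED_STYLES:
            raise ValueError(f"style must be one of {sorted(ALLOWED_STYLES)}")
        hints["style"] = style
    if hints:
        metadata["hints"] = hints

    return json.dumps(metadata)
```

Catching a sceneIds mismatch locally avoids spending a request on a payload that the documented rule would reject.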

Scene Timestamps

multipart/form-data with one audio file and JSON metadata.
metadata.scenes[]: required ordered scenes with required anchorText and startCueText.
metadata.startOffsetMs: optional offset for the first scene on the final timeline.
metadata.languageCode: optional top-level language hint for speech-to-text.
Response includes ordered scene startMs values and transition confidence intervals.
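Because every scene must carry both anchorText and startCueText, it can help to verify that before uploading the audio. Below is a minimal sketch of such a check; the helper name build_scene_timestamps_metadata is hypothetical, and the placement of narrationText is not specified in this section, so it is left out.

```python
import json

REQUIRED_SCENE_FIELDS = ("anchorText", "startCueText")


def build_scene_timestamps_metadata(scenes, start_offset_ms=None,
                                    language_code=None):
    """Return a JSON string for the Scene Timestamps metadata (hypothetical helper)."""
    if not scenes:
        raise ValueError("metadata.scenes[] is required and must not be empty")

    ordered = []
    for index, scene in enumerate(scenes):
        for field in REQUIRED_SCENE_FIELDS:
            value = scene.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"scenes[{index}].{field} is required")
        # Copy each scene and keep the caller-provided order.
        ordered.append(dict(scene))

    metadata = {"scenes": ordered}
    if start_offset_ms is not None:
        metadata["startOffsetMs"] = int(start_offset_ms)
    if language_code is not None:
        metadata["languageCode"] = language_code
    return json.dumps(metadata)
```

The resulting string would be sent as the JSON metadata part of the multipart/form-data request, next to the single audio file.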

Tier Limits

Limit	Compose OTIO (Free / Premium / Enterprise)	Describe Scenes (Free / Premium / Enterprise)	Scene Timestamps (Free / Premium / Enterprise)
Primary payload	20 / 200 / 200 manifest items	5 / 50 / 100 images	5 MB / 25 MB / 25 MB audio
Duration / total payload	180 sec / 3600 sec / 3600 sec total timeline duration	5 MB / 50 MB / 100 MB total images	5 min / 20 min / 60 min audio duration
Max scenes	Caller-defined via explicit sequence	5 / 50 / 100 via images	10 / 100 / 100 aligned scenes
Narration text	Not applicable	2 000 / 20 000 / 20 000 chars	5 000 / 50 000 / 50 000 chars
Per-scene text field	Not applicable	200 / 500 / 500 chars	300 / 500 / 500 chars per anchorText or startCueText field
Total scene text	Not applicable	Not applicable	4 000 / 40 000 / 40 000 chars across all scene text fields
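The Scene Timestamps column of the table can be turned into a client-side pre-flight check. This is a sketch under stated assumptions: the names are hypothetical, 1 MB is taken as 1024 * 1024 bytes, characters are counted with Python's len(), and "all scene text fields" is read as anchorText plus startCueText. The service may count any of these differently.

```python
MB = 1024 * 1024  # assumption: the service may instead use 1,000,000 bytes

# Values copied from the Scene Timestamps column of the tier table.
SCENE_TIMESTAMPS_LIMITS = {
    "free": {"audio_bytes": 5 * MB, "audio_seconds": 5 * 60, "scenes": 10,
             "narration_chars": 5_000, "field_chars": 300, "total_scene_chars": 4_000},
    "premium": {"audio_bytes": 25 * MB, "audio_seconds": 20 * 60, "scenes": 100,
                "narration_chars": 50_000, "field_chars": 500, "total_scene_chars": 40_000},
    "enterprise": {"audio_bytes": 25 * MB, "audio_seconds": 60 * 60, "scenes": 100,
                   "narration_chars": 50_000, "field_chars": 500, "total_scene_chars": 40_000},
}


def check_scene_timestamps_limits(tier, audio_bytes, audio_seconds, scenes,
                                  narration_text=""):
    """Return a list of human-readable limit violations (empty when none are found)."""
    limits = SCENE_TIMESTAMPS_LIMITS[tier]
    problems = []
    if audio_bytes > limits["audio_bytes"]:
        problems.append("audio file exceeds the size limit")
    if audio_seconds > limits["audio_seconds"]:
        problems.append("audio exceeds the duration limit")
    if len(scenes) > limits["scenes"]:
        problems.append("too many scenes")
    if len(narration_text) > limits["narration_chars"]:
        problems.append("narration text is too long")

    total_chars = 0
    for index, scene in enumerate(scenes):
        for field in ("anchorText", "startCueText"):
            text = scene.get(field, "")
            total_chars += len(text)
            if len(text) > limits["field_chars"]:
                problems.append(f"scenes[{index}].{field} is too long")
    if total_chars > limits["total_scene_chars"]:
        problems.append("total scene text is too long")
    return problems
```

Returning a list rather than raising on the first violation lets a caller report every problem in one pass.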

Billing models: Compose OTIO starts at 4 credits for deterministic timeline composition. Describe Scenes starts at 5 credits, adds +1 per extra image beyond the included set, and adds +1 for each scene marked with metadata.sceneOptions[].extraDetail on paid tiers. Scene Timestamps starts at 9 credits for requests up to 5 minutes and 10 scenes. That minimum already includes the 6-credit request base and the first started 5-minute audio block. Each additional started 5-minute block adds +3, and each additional 10-scene block after the first 10 scenes adds +1. Audio size remains a limit, not a billing dimension.
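The Scene Timestamps pricing rule can be worked through as a small estimator. This is a sketch based only on the figures above: the function name is hypothetical, and it assumes that a partially filled 10-scene block counts as a full block, in the same way a started 5-minute audio block does. Actual charges are determined by the service.

```python
import math

BASE_CREDITS = 6          # request base
AUDIO_BLOCK_SECONDS = 300  # one 5-minute audio block
AUDIO_BLOCK_CREDITS = 3    # per started audio block
SCENE_BLOCK_SIZE = 10      # scenes per block
SCENE_BLOCK_CREDITS = 1    # per additional scene block after the first


def estimate_scene_timestamps_credits(audio_seconds, scene_count):
    """Estimate credits for one Scene Timestamps request (hypothetical helper)."""
    if audio_seconds <= 0 or scene_count < 1:
        raise ValueError("audio_seconds must be positive and scene_count at least 1")

    # Every started 5-minute block is billed, including the first one.
    audio_blocks = math.ceil(audio_seconds / AUDIO_BLOCK_SECONDS)
    # The first 10 scenes are included; assumption: later blocks count once started.
    extra_scene_blocks = math.ceil(scene_count / SCENE_BLOCK_SIZE) - 1

    return (BASE_CREDITS
            + AUDIO_BLOCK_CREDITS * audio_blocks
            + SCENE_BLOCK_CREDITS * extra_scene_blocks)
```

Under these assumptions, a 5-minute request with 10 scenes comes to 6 + 3 = 9 credits, which matches the stated minimum, while 12 minutes of audio with 25 scenes comes to 6 + 3 * 3 + 2 = 17 credits.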

End-to-end usage walkthroughs: Compose OTIO guide, Describe Scenes guide, and Scene Timestamps guide.

POST /postproduction/v1/compose-otio

Compose an OpenTimelineIO timeline from an explicit manifest with deterministic trims, transitions, and audio layering.

POST /postproduction/v1/describe-scenes

Generate structured per-scene descriptions and optional cue hints from ordered scene images and narration context.

POST /postproduction/v1/scene-timestamps

Align required scene cue fields to narration audio and return scene start times plus transition recommendations.

SWAGGER DEMO

Demo endpoints are available only for interactive Swagger testing on this website. They are not production endpoints and cannot be called directly from external clients.

POST /postproduction/describe-scenes-demo

Describe Scenes demo in Swagger UI with a separate demo OpenAPI specification.


POST /postproduction/scene-timestamps-demo

Scene Timestamps demo in Swagger UI with a separate demo OpenAPI specification.
