Postproduction API Reference

Scene-level postproduction endpoints for creator editing workflows. This API currently includes production endpoints for composing deterministic timelines, generating index-aligned scene descriptions with optional cue fields, and aligning required scene cues back to narration audio.

How the Postproduction API Fits Together

/postproduction/v1/describe-scenes turns ordered scene images into concise scene descriptions and may also emit anchorText and startCueText when narration context supports them. /postproduction/v1/scene-timestamps takes ordered scene cue fields plus one narration audio file and can optionally use narrationText and languageCode for stronger alignment. /postproduction/v1/compose-otio assembles an explicit editorial manifest into an OpenTimelineIO timeline artifact you can move straight into downstream editing or export tooling.

  • Use Compose OTIO when you already know the exact clip order, trims, and audio layering you want to export.
  • Use Describe Scenes to generate stable, ordered scene text and cue fields from images.
  • Use Scene Timestamps to align required scene cue fields to narration audio on the final timeline.
  • All three endpoints preserve caller-provided order and expose tracking metadata through meta.requestId.
  • Public responses stay provider-agnostic and focus on user-visible behaviour only.

Compose OTIO

  • application/json request with project, folder.items[], and explicit intent.
  • responseFormat: defaults to file download, optional json wrapper for API-first consumers.
  • output.targetConsumer: optional import tuning for downstream editors.
  • Returns a deterministic OTIO timeline artifact instead of best-effort scene analysis.
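
The request fields listed above can be assembled as in the following minimal sketch. Only the top-level names (project, folder.items, intent, responseFormat, output.targetConsumer) come from this reference. The inner shapes used here (a project "name", item "id" values, and an intent "sequence") are hypothetical placeholders, as is the helper name build_compose_request; check the Compose OTIO guide for the real schema.

```python
import json


def build_compose_request(project_name, items, want_json=False, target_consumer=None):
    """Assemble a Compose OTIO request body as a plain dict.

    Inner field shapes are illustrative assumptions, not the documented schema.
    """
    body = {
        "project": {"name": project_name},            # hypothetical inner shape
        "folder": {"items": [dict(item) for item in items]},
        # Hypothetical: express the explicit clip order as a list of item ids.
        "intent": {"sequence": [item["id"] for item in items]},
    }
    if want_json:
        # The reference says the default is a file download; "json" is opt-in.
        body["responseFormat"] = "json"
    if target_consumer is not None:
        body["output"] = {"targetConsumer": target_consumer}
    return body


if __name__ == "__main__":
    demo = build_compose_request("demo", [{"id": "clip-1"}, {"id": "clip-2"}], want_json=True)
    print(json.dumps(demo, indent=2))
```

Caller order is kept as given, which matches the endpoint's promise to preserve caller-provided order.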

Describe Scenes

  • metadata.narrationText: optional story context text
  • metadata.sceneIds: optional array, must match image count
  • metadata.hints.languageCode: optional language hint
  • metadata.hints.style: short, normal, or detailed (premium and enterprise)
  • images[]: required ordered scene image files
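
As a rough illustration of the metadata rules above, the sketch below builds the metadata object and checks the two constraints this reference states: sceneIds must match the image count, and style must be one of short, normal, or detailed. The helper name build_describe_metadata is hypothetical, and tier restrictions on style are not enforced here.

```python
ALLOWED_STYLES = {"short", "normal", "detailed"}


def build_describe_metadata(image_count, scene_ids=None, narration_text=None,
                            language_code=None, style=None):
    """Return a Describe Scenes metadata dict, validating the documented rules."""
    if image_count < 1:
        raise ValueError("at least one image is required")
    metadata = {}
    if narration_text is not None:
        metadata["narrationText"] = narration_text
    if scene_ids is not None:
        # The reference requires one scene id per image when sceneIds is sent.
        if len(scene_ids) != image_count:
            raise ValueError("sceneIds length must match the image count")
        metadata["sceneIds"] = list(scene_ids)
    hints = {}
    if language_code is not None:
        hints["languageCode"] = language_code
    if style is not None:
        if style not in ALLOWED_STYLES:
            raise ValueError("style must be short, normal, or detailed")
        hints["style"] = style
    if hints:
        metadata["hints"] = hints
    return metadata
```

Running this check client-side catches a mismatched sceneIds array before any images are uploaded.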

Scene Timestamps

  • multipart/form-data with one audio file and JSON metadata.
  • metadata.scenes[]: required ordered scenes with required anchorText and startCueText.
  • metadata.startOffsetMs: optional offset for the first scene on the final timeline.
  • metadata.languageCode: optional top-level language hint for speech-to-text.
  • Response includes ordered scene startMs values and transition confidence intervals.
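
The metadata part of a Scene Timestamps request might be prepared as sketched below. The field names scenes, anchorText, startCueText, startOffsetMs, languageCode, and narrationText come from this reference. The helper name and the choice to reject blank strings are assumptions made for illustration.

```python
REQUIRED_SCENE_FIELDS = ("anchorText", "startCueText")


def build_timestamps_metadata(scenes, start_offset_ms=None, language_code=None,
                              narration_text=None):
    """Return Scene Timestamps metadata, checking the required cue fields."""
    if not scenes:
        raise ValueError("metadata.scenes must contain at least one scene")
    for index, scene in enumerate(scenes):
        for field in REQUIRED_SCENE_FIELDS:
            value = scene.get(field)
            # Assumption: a blank string is treated the same as a missing field.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"scene {index} is missing required {field}")
    # Copy scenes in caller order; the endpoint preserves that order.
    metadata = {"scenes": [dict(scene) for scene in scenes]}
    if start_offset_ms is not None:
        metadata["startOffsetMs"] = start_offset_ms
    if language_code is not None:
        metadata["languageCode"] = language_code
    if narration_text is not None:
        metadata["narrationText"] = narration_text
    return metadata
```

The resulting dict would be serialized to JSON and sent alongside the audio file in the multipart/form-data request.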

Tier Limits

| Limit | Compose OTIO (Free / Premium / Enterprise) | Describe Scenes (Free / Premium / Enterprise) | Scene Timestamps (Free / Premium / Enterprise) |
|---|---|---|---|
| Primary payload | 20 / 200 / 200 manifest items | 5 / 50 / 100 images | 5 MB / 25 MB / 25 MB audio |
| Duration / total payload | 180 sec / 3600 sec / 3600 sec total timeline duration | 5 MB / 50 MB / 100 MB total images | 5 min / 20 min / 60 min audio duration |
| Max scenes | Caller-defined via explicit sequence | 5 / 50 / 100 via images | 10 / 100 / 100 aligned scenes |
| Narration text | Not applicable | 2 000 / 20 000 / 20 000 chars | 5 000 / 50 000 / 50 000 chars |
| Per-scene text field | Not applicable | 200 / 500 / 500 chars | 300 / 500 / 500 chars per anchorText or startCueText field |
| Total scene text | Not applicable | Not applicable | 4 000 / 40 000 / 40 000 chars across all scene text fields |
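
A client can check a Scene Timestamps request against these limits before sending it. The sketch below hard-codes the Scene Timestamps column of the table. It assumes that "all scene text fields" means anchorText plus startCueText and that characters are counted with Python's len(); the service may count differently. The function and dictionary names are hypothetical.

```python
SCENE_TIMESTAMPS_LIMITS = {
    "free": {"audio_mb": 5, "audio_minutes": 5, "scenes": 10,
             "narration_chars": 5000, "field_chars": 300, "total_scene_chars": 4000},
    "premium": {"audio_mb": 25, "audio_minutes": 20, "scenes": 100,
                "narration_chars": 50000, "field_chars": 500, "total_scene_chars": 40000},
    "enterprise": {"audio_mb": 25, "audio_minutes": 60, "scenes": 100,
                   "narration_chars": 50000, "field_chars": 500, "total_scene_chars": 40000},
}


def check_scene_timestamps_limits(tier, audio_mb, audio_minutes, scenes, narration_text=""):
    """Return a list of limit violations; an empty list means the request fits."""
    limits = SCENE_TIMESTAMPS_LIMITS[tier]
    problems = []
    if audio_mb > limits["audio_mb"]:
        problems.append("audio size")
    if audio_minutes > limits["audio_minutes"]:
        problems.append("audio duration")
    if len(scenes) > limits["scenes"]:
        problems.append("scene count")
    if len(narration_text) > limits["narration_chars"]:
        problems.append("narration text")
    total_chars = 0
    for index, scene in enumerate(scenes):
        for field in ("anchorText", "startCueText"):
            text = scene.get(field, "")
            total_chars += len(text)
            if len(text) > limits["field_chars"]:
                problems.append(f"scene {index} {field}")
    if total_chars > limits["total_scene_chars"]:
        problems.append("total scene text")
    return problems
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems in a single pass.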

Billing models: Compose OTIO starts at 4 credits for deterministic timeline composition. Describe Scenes starts at 5 credits, adds +1 per extra image beyond the included set, and adds +1 for each scene marked with metadata.sceneOptions[].extraDetail on paid tiers. Scene Timestamps starts at 9 credits for requests up to 5 minutes and 10 scenes. That minimum already includes the 6-credit request base and the first started 5-minute audio block (+3). Each additional started 5-minute block adds +3, and each additional 10-scene block after the first 10 scenes adds +1. Audio size remains a limit, not a billing dimension.
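
The Scene Timestamps arithmetic can be written out as a small estimator. This is a sketch based only on the numbers in the paragraph above: a 6-credit base, +3 per started 5-minute audio block, and +1 per 10-scene block beyond the first. It assumes scene blocks are counted as started blocks in the same way as audio blocks, which the text does not state explicitly. The function name is hypothetical.

```python
import math


def scene_timestamps_credits(audio_seconds, scene_count):
    """Estimate Scene Timestamps credits from audio length and scene count."""
    # Every request pays for at least one started 5-minute (300 s) audio block.
    audio_blocks = max(1, math.ceil(audio_seconds / 300))
    # Assumption: a partially filled 10-scene block counts as a full block.
    scene_blocks = max(1, math.ceil(scene_count / 10))
    return 6 + 3 * audio_blocks + (scene_blocks - 1)


if __name__ == "__main__":
    print(scene_timestamps_credits(300, 10))   # 6 + 3 = 9, the stated minimum
    print(scene_timestamps_credits(600, 25))   # 6 + 3*2 + 2 = 14
```

Treat the result as an estimate only; actual charges are determined by the service.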

End-to-end usage walkthroughs: Compose OTIO guide, Describe Scenes guide, and Scene Timestamps guide.

Swagger Demo

Demo endpoints are available only for interactive Swagger testing on this website. They are not production endpoints and cannot be called directly from external clients.