Doubao Seedance 2.0 / 2.0 Fast video generation — multimodal inputs (text / image / video / audio), first-last frame locking, reference images, native audio. Volcengine native and amux universal endpoints.

Doubao Seedance 2.0 is ByteDance Volcengine Ark's multimodal video generation model. It accepts four input modalities (text, image, video, and audio), with up to 9 images, 3 video segments, and 3 audio segments per task. amux-api exposes two task-creation paths, Volcengine native and amux universal, which share the same task status endpoint.

Model Variants

Model ID	Positioning	Use Case
`doubao-seedance-2.0`	Standard	Final renders, best quality / consistency. Supports 720p / 1080p
`doubao-seedance-2.0-fast`	Fast	Speed-first and cheaper — ideal for prompt iteration & rough drafts. 720p only — does not support 1080p

See https://api.amux.ai/pricing for current pricing.

Create Video Generation Task

Two calling paths, sharing the same task status endpoint:

Path A (Volcengine native): POST /api/v3/contents/generations/tasks — fully compatible with the Volcengine Ark contract; Volcengine SDKs work with just a base_url change.
Path B (amux universal): POST /v1/video/generations — vendor-neutral schema; Volcengine-specific fields go through metadata.

Path A — Volcengine Native Endpoint

Field	Value
Method	`POST`
Full URL	`https://api.amux.ai/api/v3/contents/generations/tasks`
Auth	`Authorization: Bearer <AMUX_API_KEY>`
Request format	`application/json`
Response format	`application/json`

amux's gateway treats a request as Volcengine native format when the body contains content and does not contain prompt. If both are present, the request is handled as the universal format (Path B conversion chain).
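
The detection rule above can be sketched as a small predicate. This is only an illustration of the rule as described here, not amux's actual gateway code; the function name `detect_request_format` is hypothetical, and the handling of a body with neither `content` nor `prompt` is an assumption, since the text does not cover that case.

```python
def detect_request_format(body: dict) -> str:
    """Classify a request body as "native" (Path A) or "universal" (Path B)."""
    has_content = "content" in body
    has_prompt = "prompt" in body

    # Native format: `content` is present and `prompt` is absent.
    if has_content and not has_prompt:
        return "native"

    # Both present, or prompt only: handled as the universal format.
    # Neither present is not described in the docs; falling through to
    # "universal" here is an assumption of this sketch.
    return "universal"
```

A practical consequence: adding a top-level `prompt` to an otherwise native body switches it to the Path B conversion chain.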

Request Body Fields

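The examples on this page use the following fields:

Prop	Type	Notes
`model`	string	Model ID, e.g. `doubao-seedance-2.0`
`content`	array of ContentItem	Text prompt and media inputs (see ContentItem Element)
`ratio`	string	Aspect ratio, e.g. `"16:9"`
`duration`	integer	Video length in seconds, e.g. `5`
`resolution`	string	`"720p"` or `"1080p"` (`doubao-seedance-2.0-fast` supports 720p only)
`generate_audio`	boolean	Whether to generate native audio
`callback_url`	string	URL that receives the task callback
`trace_id`	string	Caller-defined trace identifier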

ContentItem Element

Each element of the content array is a ContentItem distinguished by type:

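type	Payload field	role
`text`	`text` (the prompt string)	none
`image_url`	`image_url.url` (HTTP/HTTPS URL or Base64 data URL)	`first_frame`, `last_frame`, or `reference_image`
`video_url`	`video_url.url`	none
`audio_url`	`audio_url.url`	none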

Scenarios & Roles

Tasks fall into 4 scenarios. The 3 image-bearing scenarios are mutually exclusive — they cannot be mixed within a single task:

Scenario	image_url	video_url	audio_url
Text-to-Video	—	—	—
Image-to-Video (first frame)	1, `role: first_frame`	—	—
Image-to-Video (first + last)	2, `role: first_frame` + `last_frame`	—	—
Multimodal reference	0–9, `role: reference_image`	0–3	0–3

In multimodal reference mode, all three modalities are optional — you can pass images only, videos only, audios only, or any combination.
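
A client-side pre-check for the scenario rules above might look like the sketch below. The function name `classify_scenario` and the returned labels are hypothetical, and rejecting images without an explicit role is a choice made for this sketch, not documented upstream behavior.

```python
ALLOWED_ROLES = {"first_frame", "last_frame", "reference_image"}


def classify_scenario(content: list[dict]) -> str:
    """Return the scenario a content array falls into, or raise ValueError."""
    images = [c for c in content if c.get("type") == "image_url"]
    videos = [c for c in content if c.get("type") == "video_url"]
    audios = [c for c in content if c.get("type") == "audio_url"]
    roles = [img.get("role") for img in images]

    # Sketch-level choice: require an explicit, known role on every image.
    if any(role not in ALLOWED_ROLES for role in roles):
        raise ValueError("every image needs an explicit, known role")

    has_frames = any(r in ("first_frame", "last_frame") for r in roles)
    has_references = "reference_image" in roles or bool(videos) or bool(audios)

    # The image-bearing scenarios are mutually exclusive.
    if has_frames and has_references:
        raise ValueError("first/last-frame images cannot be mixed with references")

    if has_frames:
        if roles == ["first_frame"]:
            return "image_to_video_first_frame"
        if sorted(roles) == ["first_frame", "last_frame"]:
            return "image_to_video_first_last"
        raise ValueError("expected one first_frame, optionally plus one last_frame")

    if has_references:
        if len(images) > 9 or len(videos) > 3 or len(audios) > 3:
            raise ValueError("at most 9 images, 3 videos, and 3 audios")
        return "multimodal_reference"

    return "text_to_video"
```

Running this before submission turns a mixed-scenario request into a local error instead of an upstream rejection.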

First/last-frame behavior:

The first and last frames may be the same image
When their aspect ratios differ, the first frame's ratio dominates and the last frame is auto-cropped to fit

Approximating "first/last frame + multimodal" via prompt:

In multimodal reference mode, you can use @imageN in the prompt to nominate a particular reference image as a first / last frame — an indirect way to combine "first/last frame + multimodal reference". For strict first/last-frame locking, use the Image-to-Video (first + last) scenario with explicit role: first_frame / last_frame.
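
When a prompt uses @imageN, it can help to check that each N points at an existing reference image. The sketch below assumes N is 1-based and follows the order of the `reference_image` entries in the content array; the two-image example on this page suggests that ordering, but it is not stated explicitly. The helper name `dangling_image_mentions` is hypothetical.

```python
import re


def dangling_image_mentions(prompt: str, content: list[dict]) -> list[int]:
    """Return the @imageN indices in the prompt that match no reference image."""
    reference_count = sum(
        1
        for item in content
        if item.get("type") == "image_url" and item.get("role") == "reference_image"
    )
    # Assumption: @image1 is the first reference image, @image2 the second, ...
    mentioned = {int(n) for n in re.findall(r"@image(\d+)", prompt)}
    return sorted(n for n in mentioned if n < 1 or n > reference_count)
```

An empty result means every mention resolves to an image under that assumption.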

Resource Constraints

Resource	Per-file limit	Duration	Count	Format
Image (image_url)	30 MB	—	See scenarios above	HTTP/HTTPS URL or Base64 data URL
Video (video_url)	50 MB	[2, 15]s per segment	≤ 3 segments, total ≤ 15s	mp4 / mov; resolutions 480p / 720p / 1080p
Audio (audio_url)	15 MB	[2, 15]s per segment	≤ 3 segments, total ≤ 15s	wav / mp3
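
The video and audio rows of the table can be checked locally before submission. The following is a minimal sketch: the `Segment` type and `check_segments` helper are hypothetical, file sizes and durations are assumed to be known to the caller already, and "MB" is read as MiB, which is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the table's "MB" is interpreted as MiB

# Per-file size limits from the table above.
SIZE_LIMITS = {"video": 50 * MB, "audio": 15 * MB}


@dataclass
class Segment:
    size_bytes: int
    duration_s: float


def check_segments(kind: str, segments: list[Segment]) -> list[str]:
    """Return human-readable violations for video or audio inputs (empty = OK)."""
    problems = []
    if len(segments) > 3:
        problems.append(f"{kind}: {len(segments)} segments, at most 3 allowed")
    for i, seg in enumerate(segments, start=1):
        if seg.size_bytes > SIZE_LIMITS[kind]:
            problems.append(f"{kind} #{i}: file exceeds the per-file size limit")
        if not 2 <= seg.duration_s <= 15:
            problems.append(f"{kind} #{i}: duration must be within [2, 15] s")
    total = sum(seg.duration_s for seg in segments)
    if total > 15:
        problems.append(f"{kind}: total duration {total:g} s exceeds 15 s")
    return problems
```

Note that three segments of 6 s each pass the per-segment check but fail the 15 s total.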

Overall limit: total request body ≤ 64 MB. Do not Base64-encode large files — submit via public HTTP/HTTPS URLs to avoid the 64 MB cap and reduce upload latency.
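
To see why Base64 is risky for large files: standard Base64 emits 4 output bytes for every 3 input bytes, so a payload grows by about one third. The sketch below works through that arithmetic; the helper names are hypothetical and "MB" is again read as MiB as an assumption.

```python
MB = 1024 * 1024  # assumption: "MB" is interpreted as MiB


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of padded standard Base64 output for `raw_bytes` of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_body(raw_bytes: int, body_cap: int = 64 * MB) -> bool:
    """True if the Base64 form alone stays within the request body cap."""
    return base64_encoded_size(raw_bytes) <= body_cap
```

A video at the 50 MB per-file limit encodes to roughly 66.7 MB, which already exceeds the 64 MB body cap before the JSON wrapper and any other inputs are counted.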

Example — Text-to-Video

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms" }
    ],
    "ratio": "16:9",
    "duration": 5,
    "resolution": "720p",
    "generate_audio": true
  }'

Example — Image-to-Video (first/last-frame lock)

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Camera pushes from the courtyard slowly into the room" },
      { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
      { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" },   "role": "last_frame"  }
    ],
    "ratio": "16:9",
    "duration": 5
  }'

Example — Multiple Reference Images

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "@image1 and @image2 meet in the garden" },
      { "type": "image_url", "image_url": { "url": "https://example.com/character-a.jpg" }, "role": "reference_image" },
      { "type": "image_url", "image_url": { "url": "https://example.com/character-b.jpg" }, "role": "reference_image" }
    ],
    "ratio": "16:9"
  }'

Example — Video + Audio Multimodal Reference

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Continue the picture pacing, sync to the audio" },
      { "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
      { "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
    ],
    "ratio": "16:9",
    "duration": 5
  }'

Example — With Callback and Custom trace_id

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Harbor at dusk, seagulls gliding past" }
    ],
    "ratio": "16:9",
    "duration": 5,
    "callback_url": "https://your-app.example.com/api/seedance/callback",
    "trace_id": "biz-order-20260428-001"
  }'

Response (on submission)

{
  "id": "cgt-2024xxxxxxxxxxxx",
  "model": "doubao-seedance-2.0",
  "status": "queued",
  "created_at": 1714300000
}

id is the task ID — usable directly with Get Task Status.

Path B — amux Universal Endpoint

Vendor-neutral contract. Standard fields (prompt / image / images) at the top level; Volcengine-specific fields go through metadata.

Field	Value
Method	`POST`
Full URL	`https://api.amux.ai/v1/video/generations`
Auth	`Authorization: Bearer <AMUX_API_KEY>`
Request format	`application/json`

For the full universal contract, see Create Video Generation Task (universal).

prompt is required: amux's validation layer rejects an empty prompt, even for first/last-frame locking and pure reference-image cases. Always provide a short prompt.

Top-Level Fields

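Fields referenced on this page:

Prop	Type	Notes
`model`	string	Model ID, e.g. `doubao-seedance-2.0`
`prompt`	string	Required; must be non-empty
`image` / `images`	see the universal contract	Standard image inputs; added to content without a role
`metadata`	object	Volcengine-specific fields: `ratio`, `duration`, `resolution`, `generate_audio`, `callback_url`, `trace_id`, `content`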

Unlisted metadata keys are not recognized: amux unmarshals metadata into Path A's requestPayload struct, so keys that don't exist on the struct are dropped. Common mistake: passing metadata.first_frame / metadata.last_frame / metadata.reference_images directly — none of these take effect; use metadata.content array with the role field instead.
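
One way to avoid the mistake above is to rewrite the ignored shortcut keys into `metadata.content` on the client before sending. The helper below is a hypothetical sketch, not part of amux; it assumes the shortcut values are plain URL strings (and a list of URL strings for `reference_images`).

```python
# Keys named above as silently dropped by the universal endpoint.
IGNORED_SHORTCUT_KEYS = ("first_frame", "last_frame", "reference_images")


def image_item(url: str, role: str) -> dict:
    """Build one image ContentItem with an explicit role."""
    return {"type": "image_url", "image_url": {"url": url}, "role": role}


def normalize_metadata(metadata: dict) -> dict:
    """Return a copy of metadata with shortcut keys moved into `content`."""
    fixed = {k: v for k, v in metadata.items() if k not in IGNORED_SHORTCUT_KEYS}
    content = list(metadata.get("content", []))

    # Assumption: shortcut values are URL strings.
    if "first_frame" in metadata:
        content.append(image_item(metadata["first_frame"], "first_frame"))
    if "last_frame" in metadata:
        content.append(image_item(metadata["last_frame"], "last_frame"))
    for url in metadata.get("reference_images", []):
        content.append(image_item(url, "reference_image"))

    if content:
        fixed["content"] = content
    return fixed
```

The result can be passed as the `metadata` object of a Path B request.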

Specifying Image Roles / Injecting Video / Audio

To attach a role to an image, or inject video_url / audio_url on the universal endpoint, use metadata.content with the full Doubao content array (same shape as Path A's ContentItem Element):

{
  "model": "doubao-seedance-2.0",
  "prompt": "Camera pushes in slowly",
  "metadata": {
    "ratio": "16:9",
    "duration": 5,
    "content": [
      { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
      { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg"   }, "role": "last_frame"  }
    ]
  }
}

Images sent via top-level image / images are added to content without a role — the upstream applies its default. For explicit role / multimodal control, always go through metadata.content.

3 mutually exclusive scenarios: Image-to-Video (first frame), Image-to-Video (first + last), and Multimodal reference cannot be mixed within the same task — see Scenarios & Roles above.

amux's adapter also applies one extra rule: when content contains only reference_image (no first/last frame), duration is automatically stripped — the upstream rejects duration in multimodal reference mode with InvalidParameter.
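
The stripping rule can be pictured with the sketch below. It mirrors the behavior described above and is not amux's adapter code; the function name is hypothetical, and reading "contains only reference_image" as "at least one reference_image and no first_frame / last_frame" is an interpretation.

```python
import copy


def strip_duration_for_reference_only(metadata: dict) -> dict:
    """Return a copy of metadata, without `duration` in reference-image mode."""
    image_roles = {
        item.get("role")
        for item in metadata.get("content", [])
        if item.get("type") == "image_url"
    }
    # Interpretation: reference images present, no first/last frame present.
    reference_only = "reference_image" in image_roles and not (
        image_roles & {"first_frame", "last_frame"}
    )

    result = copy.deepcopy(metadata)  # leave the caller's dict untouched
    if reference_only:
        result.pop("duration", None)
    return result
```

On Path A no such adapter step is described, so it is safer to omit `duration` yourself when sending reference images.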

Example — Text-to-Video (standard)

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "resolution": "720p",
      "generate_audio": true
    }
  }'

import os
import httpx

response = httpx.post(
    "https://api.amux.ai/v1/video/generations",
    headers={"Authorization": f"Bearer {os.environ['AMUX_API_KEY']}"},
    json={
        "model": "doubao-seedance-2.0",
        "prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
        "metadata": {
            "ratio": "16:9",
            "duration": 5,
            "resolution": "720p",
            "generate_audio": True,
        },
    },
)
response.raise_for_status()  # raise on 4xx/5xx before parsing the body
task = response.json()
print(f"Task ID: {task['task_id']}")

const response = await fetch('https://api.amux.ai/v1/video/generations', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.AMUX_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'doubao-seedance-2.0',
    prompt:
      'A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms',
    metadata: {
      ratio: '16:9',
      duration: 5,
      resolution: '720p',
      generate_audio: true,
    },
  }),
});

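// Surface HTTP errors instead of reading task_id from an error body.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const task = await response.json();
console.log(`Task ID: ${task.task_id}`);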

Example — First + Last Frame Lock

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0-fast",
    "prompt": "Camera pushes slowly from the courtyard into the room",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
        { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" },   "role": "last_frame"  }
      ]
    }
  }'

Example — Video + Audio Multimodal Reference

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "Continue the picture pacing, sync to the audio",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "content": [
        { "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
        { "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
      ]
    }
  }'

Example — With Callback and Custom trace_id

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "Harbor at dusk, seagulls gliding past",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "callback_url": "https://your-app.example.com/api/seedance/callback",
      "trace_id": "biz-order-20260428-001"
    }
  }'

Get Video Task Status

Regardless of whether you submitted via Path A or Path B, status query and video download share the same unified endpoints:

Purpose	Endpoint
Query task status; get the video URL	`GET /v1/video/generations/{task_id}`
Stream the video bytes	`GET /v1/videos/{task_id}/content`

See Get Video Task Status for full details.
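
A polling loop against the status endpoint might look like the sketch below. Only the `queued` status appears on this page, so the terminal status names `succeeded` and `failed` are assumptions to verify against the status documentation; the `wait_for_task` helper and its `fetch_status` parameter are hypothetical. The HTTP call is injected so the loop stays independent of any particular client.

```python
import time
from typing import Callable

STATUS_URL = "https://api.amux.ai/v1/video/generations/{task_id}"


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    *,
    done_statuses: tuple = ("succeeded", "failed"),  # assumed terminal states
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the task status URL until a terminal status or the poll budget ends."""
    url = STATUS_URL.format(task_id=task_id)
    for _ in range(max_polls):
        task = fetch_status(url)  # expected to return the parsed JSON body
        if task.get("status") in done_statuses:
            return task
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

With httpx, `fetch_status` could be a small function that issues `httpx.get(url, headers={"Authorization": f"Bearer {api_key}"})` and returns `.json()`. If you set `callback_url`, polling can serve as a fallback rather than the primary mechanism.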

Pricing Notes

amux-api applies an automatic video-input discount for the Doubao Seedance 2.0 family: when the request content contains a video_url entry (continuation or multimodal video reference), the per-task price is multiplied by the upstream's "with-video-input" discount factor (approximately 0.59 to 0.61). Requests without video input are billed at the base rate.

Note: although the "with-video-input" tier has a lower per-task rate, the input video itself consumes extra input tokens. Once that's added in, the final total cost is roughly comparable to a request without video input — the discount is just pricing tier reclassification, not a net savings.
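
The per-task rate rule can be written out as simple arithmetic. In the sketch below the base price is a hypothetical input (see the pricing page for real values), the default factor of 0.60 is just the midpoint of the stated range, and the function names are made up for illustration. It covers only the per-task rate, not the extra input tokens consumed by the video itself.

```python
def has_video_input(content: list[dict]) -> bool:
    """True if any content item is a video_url entry."""
    return any(item.get("type") == "video_url" for item in content)


def per_task_rate(
    base_rate: float,
    content: list[dict],
    discount_factor: float = 0.60,  # assumed midpoint of the 0.59 to 0.61 range
) -> float:
    """Apply the with-video-input factor to a hypothetical base rate."""
    if has_video_input(content):
        return base_rate * discount_factor
    return base_rate
```

For example, a hypothetical base rate of 10 units becomes about 6 units when the content includes a `video_url` entry, before the additional input-token cost is added back.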

In addition, resolution: "1080p" is priced higher than 720p, and doubao-seedance-2.0-fast does not support 1080p — see pricing for full details.

FAQ

Standard vs Fast — which one?

Why didn't `metadata.first_frame` take effect?

Top-level `duration` not working?

Is `prompt` always required?

What are the multimodal limits?

My output has no audio?

How does `trace_id` work?

callback_url isn't being hit?
