Doubao Seedance 2.0 Series
Doubao Seedance 2.0 / 2.0 Fast video generation — multimodal inputs (text / image / video / audio), first-last frame locking, reference images, native audio. Volcengine native and amux universal endpoints.
Doubao Seedance 2.0 is ByteDance Volcengine Ark's multimodal video generation model. It accepts four input modalities (text, image, video, and audio), with up to 9 images, 3 video segments, and 3 audio segments per task. amux-api exposes two task-creation paths, Volcengine native and amux universal, which share the same task status endpoint.
Model Variants
| Model ID | Positioning | Use Case |
|---|---|---|
| doubao-seedance-2.0 | Standard | Final renders, best quality / consistency. Supports 720p / 1080p |
| doubao-seedance-2.0-fast | Fast | Speed-first and cheaper; ideal for prompt iteration and rough drafts. 720p only (does not support 1080p) |
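The resolution limits in the table can be checked client-side before a task is submitted. The following is a minimal sketch: `SUPPORTED_RESOLUTIONS` and `check_resolution` are hypothetical names, and because the table does not say whether other resolutions are accepted, the sketch treats anything not listed as unsupported.

```python
# Resolutions per model ID, transcribed from the Model Variants table.
SUPPORTED_RESOLUTIONS = {
    "doubao-seedance-2.0": {"720p", "1080p"},
    "doubao-seedance-2.0-fast": {"720p"},
}


def check_resolution(model: str, resolution: str) -> None:
    """Raise ValueError if the table rules out this model/resolution pair."""
    allowed = SUPPORTED_RESOLUTIONS.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model}")
    if resolution not in allowed:
        # Assumption: resolutions not listed in the table are rejected.
        raise ValueError(f"{model} does not support {resolution}")


check_resolution("doubao-seedance-2.0", "1080p")  # passes silently
```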
See https://api.amux.ai/pricing for current pricing.
Create Video Generation Task
Two calling paths, sharing the same task status endpoint:
- Path A (Volcengine native): POST /api/v3/contents/generations/tasks. Fully compatible with the Volcengine Ark contract; Volcengine SDKs work with just a base_url change.
- Path B (amux universal): POST /v1/video/generations. Vendor-neutral schema; Volcengine-specific fields go through metadata.
Path A — Volcengine Native Endpoint
| Field | Value |
|---|---|
| Method | POST |
| Full URL | https://api.amux.ai/api/v3/contents/generations/tasks |
| Auth | Authorization: Bearer <AMUX_API_KEY> |
| Request format | application/json |
| Response format | application/json |
amux's gateway treats a request as Volcengine native format when the body contains content and does not contain prompt. If both are present, the request is handled as the universal format (Path B conversion chain).
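The routing rule can be mirrored client-side to predict how a body will be interpreted. This is only a sketch of the documented rule, not the gateway's actual code: `detect_request_format` is a hypothetical helper, and the handling of bodies that carry only prompt (or neither key) is an assumption.

```python
def detect_request_format(body: dict) -> str:
    """Predict whether a parsed JSON body is routed as native or universal."""
    has_content = "content" in body
    has_prompt = "prompt" in body
    if has_content and not has_prompt:
        return "native"
    # Both keys present is documented as universal. Prompt-only and
    # neither-key bodies landing here is an assumption of this sketch.
    return "universal"
```

For example, adding a top-level prompt to an otherwise native body changes the result from "native" to "universal".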
Request Body Fields
ContentItem Element
Each element of the content array is a ContentItem distinguished by type:
Scenarios & Roles
Tasks fall into 4 scenarios. The 3 image-bearing scenarios are mutually exclusive — they cannot be mixed within a single task:
| Scenario | image_url | video_url | audio_url |
|---|---|---|---|
| Text-to-Video | — | — | — |
| Image-to-Video (first frame) | 1, role: first_frame | — | — |
| Image-to-Video (first + last) | 2, role: first_frame + last_frame | — | — |
| Multimodal reference | 1–9, role: reference_image | 0–3 | 0–3 |
In multimodal reference mode, all three modalities are optional — you can pass images only, videos only, audios only, or any combination.
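The scenario table can be expressed as a small classifier that also rejects mixed tasks. This is a sketch under stated assumptions: `classify_scenario` and the returned scenario names are hypothetical, and every image is assumed to carry an explicit role (images without a role are out of scope here).

```python
FRAME_ROLES = {"first_frame", "last_frame"}
KNOWN_ROLES = FRAME_ROLES | {"reference_image"}


def classify_scenario(content: list[dict]) -> str:
    """Return the scenario name for a content array, or raise ValueError."""
    roles = [item.get("role") for item in content if item["type"] == "image_url"]
    n_video = sum(item["type"] == "video_url" for item in content)
    n_audio = sum(item["type"] == "audio_url" for item in content)

    unknown = [r for r in roles if r not in KNOWN_ROLES]
    if unknown:
        # Assumption: every image carries an explicit, documented role.
        raise ValueError(f"unsupported image roles: {unknown}")

    frames = [r for r in roles if r in FRAME_ROLES]
    n_refs = len(roles) - len(frames)

    if frames:
        if n_refs or n_video or n_audio:
            raise ValueError("first/last-frame images cannot be mixed with reference media")
        if frames == ["first_frame"]:
            return "image_to_video_first_frame"
        if sorted(frames) == ["first_frame", "last_frame"]:
            return "image_to_video_first_last"
        raise ValueError(f"invalid frame combination: {frames}")

    if n_refs or n_video or n_audio:
        if n_refs > 9 or n_video > 3 or n_audio > 3:
            raise ValueError("at most 9 images, 3 videos, and 3 audios are allowed")
        return "multimodal_reference"

    return "text_to_video"
```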
First/last-frame behavior:
- The first and last frames may be the same image
- When their aspect ratios differ, the first frame's ratio dominates and the last frame is auto-cropped to fit
Approximating "first/last frame + multimodal" via prompt:
In multimodal reference mode, you can use @imageN in the prompt to nominate a particular reference image as a first / last frame — an indirect way to combine "first/last frame + multimodal reference". For strict first/last-frame locking, use the Image-to-Video (first + last) scenario with explicit role: first_frame / last_frame.
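Because @imageN mentions must line up with the images actually supplied, a quick pre-flight check can catch dangling references. The sketch below uses hypothetical helpers, and it assumes numbering is 1-based and follows the order of the image_url items in content, which this page's @image1 / @image2 example suggests but does not state outright.

```python
import re


def referenced_image_indices(prompt: str) -> list[int]:
    """Return the N of every @imageN mention, in order of appearance."""
    return [int(n) for n in re.findall(r"@image(\d+)", prompt)]


def check_image_mentions(prompt: str, image_count: int) -> None:
    """Raise ValueError if the prompt mentions an image that was not supplied."""
    for n in referenced_image_indices(prompt):
        # Assumption: @imageN is 1-based and ordered like the content array.
        if not 1 <= n <= image_count:
            raise ValueError(f"@image{n} has no matching reference image")
```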
Resource Constraints
| Resource | Per-file limit | Duration | Count | Format |
|---|---|---|---|---|
| Image (image_url) | 30 MB | — | See scenarios above | HTTP/HTTPS URL or Base64 data URL |
| Video (video_url) | 50 MB | [2, 15]s per segment | ≤ 3 segments, total ≤ 15s | mp4 / mov; resolutions 480p / 720p / 1080p |
| Audio (audio_url) | 15 MB | [2, 15]s per segment | ≤ 3 segments, total ≤ 15s | wav / mp3 |
Overall limit: total request body ≤ 64 MB. Do not Base64-encode large files — submit via public HTTP/HTTPS URLs to avoid the 64 MB cap and reduce upload latency.
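The per-file and per-modality limits can be checked before upload. The following is an illustrative sketch only: `MediaFile` and `check_media` are hypothetical names, sizes and durations are assumed to be measured by the caller, and 1 MB is assumed to mean 1024 * 1024 bytes, which this page does not specify.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes
SIZE_LIMIT_MB = {"image": 30, "video": 50, "audio": 15}


@dataclass
class MediaFile:
    kind: str                # "image", "video", or "audio"
    size_bytes: int
    duration_s: float = 0.0  # ignored for images


def check_media(files: list[MediaFile]) -> list[str]:
    """Return human-readable violations; an empty list means all limits hold."""
    problems = []
    for i, f in enumerate(files):
        if f.size_bytes > SIZE_LIMIT_MB[f.kind] * MB:
            problems.append(f"file {i}: {f.kind} exceeds {SIZE_LIMIT_MB[f.kind]} MB")
        if f.kind != "image" and not 2 <= f.duration_s <= 15:
            problems.append(f"file {i}: duration {f.duration_s}s outside [2, 15]s")
    for kind in ("video", "audio"):
        clips = [f for f in files if f.kind == kind]
        if len(clips) > 3:
            problems.append(f"more than 3 {kind} segments")
        if sum(f.duration_s for f in clips) > 15:
            problems.append(f"total {kind} duration exceeds 15s")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every violation at once.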
Example — Text-to-Video
curl https://api.amux.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"content": [
{ "type": "text", "text": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms" }
],
"ratio": "16:9",
"duration": 5,
"resolution": "720p",
"generate_audio": true
}'
Example — Image-to-Video (first/last-frame lock)
curl https://api.amux.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"content": [
{ "type": "text", "text": "Camera pushes from the courtyard slowly into the room" },
{ "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
{ "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" }, "role": "last_frame" }
],
"ratio": "16:9",
"duration": 5
}'
Example — Multiple Reference Images
curl https://api.amux.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"content": [
{ "type": "text", "text": "@image1 and @image2 meet in the garden" },
{ "type": "image_url", "image_url": { "url": "https://example.com/character-a.jpg" }, "role": "reference_image" },
{ "type": "image_url", "image_url": { "url": "https://example.com/character-b.jpg" }, "role": "reference_image" }
],
"ratio": "16:9"
}'
Example — Video + Audio Multimodal Reference
curl https://api.amux.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"content": [
{ "type": "text", "text": "Continue the picture pacing, sync to the audio" },
{ "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
{ "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
],
"ratio": "16:9",
"duration": 5
}'
Example — With Callback and Custom trace_id
curl https://api.amux.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"content": [
{ "type": "text", "text": "Harbor at dusk, seagulls gliding past" }
],
"ratio": "16:9",
"duration": 5,
"callback_url": "https://your-app.example.com/api/seedance/callback",
"trace_id": "biz-order-20260428-001"
}'
Response (on submission)
{
"id": "cgt-2024xxxxxxxxxxxx",
"model": "doubao-seedance-2.0",
"status": "queued",
"created_at": 1714300000
}
id is the task ID, usable directly with Get Task Status.
Path B — amux Universal Endpoint
Vendor-neutral contract. Standard fields (prompt / image / images) at the top level; Volcengine-specific fields go through metadata.
| Field | Value |
|---|---|
| Method | POST |
| Full URL | https://api.amux.ai/v1/video/generations |
| Auth | Authorization: Bearer <AMUX_API_KEY> |
| Request format | application/json |
For the full universal contract, see Create Video Generation Task (universal).
prompt is required: amux's validation layer rejects an empty prompt, even for first/last-frame locking and pure reference-image cases. Provide a short prompt regardless.
Top-Level Fields
Unlisted metadata keys are not recognized: amux unmarshals metadata into Path A's requestPayload struct, so keys that don't exist on the struct are dropped. Common mistake: passing metadata.first_frame / metadata.last_frame / metadata.reference_images directly — none of these take effect; use metadata.content array with the role field instead.
Specifying Image Roles / Injecting Video / Audio
To attach a role to an image, or inject video_url / audio_url on the universal endpoint, use metadata.content with the full Doubao content array (same shape as Path A's ContentItem Element):
{
"model": "doubao-seedance-2.0",
"prompt": "Camera pushes in slowly",
"metadata": {
"ratio": "16:9",
"duration": 5,
"content": [
{ "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
{ "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" }, "role": "last_frame" }
]
}
}
Images sent via top-level image / images are added to content without a role, so the upstream applies its default. For explicit role / multimodal control, always go through metadata.content.
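A small builder helps keep roles inside metadata.content rather than in unrecognized keys such as metadata.first_frame. This is a minimal sketch: `build_first_last_payload` is a hypothetical helper, and it only assembles the request body; sending it is left to the caller.

```python
def build_first_last_payload(
    model: str, prompt: str, first_url: str, last_url: str, **options
) -> dict:
    """Build a Path B body that locks first and last frames via metadata.content."""
    if not prompt.strip():
        # The universal endpoint rejects an empty prompt.
        raise ValueError("prompt is required on the universal endpoint")
    content = [
        {"type": "image_url", "image_url": {"url": first_url}, "role": "first_frame"},
        {"type": "image_url", "image_url": {"url": last_url}, "role": "last_frame"},
    ]
    # Options such as ratio or duration sit beside content inside metadata.
    return {"model": model, "prompt": prompt, "metadata": {**options, "content": content}}


payload = build_first_last_payload(
    "doubao-seedance-2.0",
    "Camera pushes in slowly",
    "https://example.com/start.jpg",
    "https://example.com/end.jpg",
    ratio="16:9",
    duration=5,
)
```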
3 mutually exclusive scenarios: Image-to-Video (first frame), Image-to-Video (first + last), and Multimodal reference cannot be mixed within the same task — see Scenarios & Roles above.
amux's adapter also applies one extra rule: when content contains only reference_image (no first/last frame), duration is automatically stripped — the upstream rejects duration in multimodal reference mode with InvalidParameter.
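The duration-stripping rule might look roughly like the sketch below. It is a guess at the rule's shape, not amux's adapter code: `strip_duration_if_reference_only` is a hypothetical name, and because the behavior for content holding only video or audio is not specified here, the sketch leaves duration untouched in that case.

```python
import copy


def strip_duration_if_reference_only(metadata: dict) -> dict:
    """Return a copy of metadata with duration removed in reference-only mode."""
    result = copy.deepcopy(metadata)
    roles = [
        item.get("role")
        for item in result.get("content", [])
        if item.get("type") == "image_url"
    ]
    # Reference-only: at least one image, and every image is a reference_image.
    reference_only = bool(roles) and all(r == "reference_image" for r in roles)
    if reference_only:
        result.pop("duration", None)
    return result
```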
Example — Text-to-Video (standard)
curl https://api.amux.ai/v1/video/generations \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
"metadata": {
"ratio": "16:9",
"duration": 5,
"resolution": "720p",
"generate_audio": true
}
}'

import os
import httpx
response = httpx.post(
"https://api.amux.ai/v1/video/generations",
headers={"Authorization": f"Bearer {os.environ['AMUX_API_KEY']}"},
json={
"model": "doubao-seedance-2.0",
"prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
"metadata": {
"ratio": "16:9",
"duration": 5,
"resolution": "720p",
"generate_audio": True,
},
},
)
task = response.json()
print(f"Task ID: {task['task_id']}")

const response = await fetch('https://api.amux.ai/v1/video/generations', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.AMUX_API_KEY}`,
},
body: JSON.stringify({
model: 'doubao-seedance-2.0',
prompt:
'A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms',
metadata: {
ratio: '16:9',
duration: 5,
resolution: '720p',
generate_audio: true,
},
}),
});
const task = await response.json();
console.log(`Task ID: ${task.task_id}`);
Example — First + Last Frame Lock
curl https://api.amux.ai/v1/video/generations \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0-fast",
"prompt": "Camera pushes slowly from the courtyard into the room",
"metadata": {
"ratio": "16:9",
"duration": 5,
"content": [
{ "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
{ "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" }, "role": "last_frame" }
]
}
}'
Example — Video + Audio Multimodal Reference
curl https://api.amux.ai/v1/video/generations \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"prompt": "Continue the picture pacing, sync to the audio",
"metadata": {
"ratio": "16:9",
"duration": 5,
"content": [
{ "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
{ "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
]
}
}'
Example — With Callback and Custom trace_id
curl https://api.amux.ai/v1/video/generations \
-H "Authorization: Bearer $AMUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-2.0",
"prompt": "Harbor at dusk, seagulls gliding past",
"metadata": {
"ratio": "16:9",
"duration": 5,
"callback_url": "https://your-app.example.com/api/seedance/callback",
"trace_id": "biz-order-20260428-001"
}
}'
Get Video Task Status
Regardless of whether you submitted via Path A or Path B, status query and video download share the same unified endpoints:
| Purpose | Endpoint |
|---|---|
| Query task status; get the video URL | GET /v1/video/generations/{task_id} |
| Stream the video bytes | GET /v1/videos/{task_id}/content |
See Get Video Task Status for full details.
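Since both paths share these endpoints, one polling helper can serve either. The sketch below is illustrative: `status_url`, `content_url`, and `poll_task` are hypothetical names, the HTTP call is injected as `fetch` so the loop stays transport-agnostic, and the caller supplies `is_finished` because terminal status values are not listed on this page.

```python
import time
from typing import Callable

BASE_URL = "https://api.amux.ai"


def status_url(task_id: str) -> str:
    """URL for querying task status and the resulting video URL."""
    return f"{BASE_URL}/v1/video/generations/{task_id}"


def content_url(task_id: str) -> str:
    """URL for streaming the generated video bytes."""
    return f"{BASE_URL}/v1/videos/{task_id}/content"


def poll_task(
    task_id: str,
    fetch: Callable[[str], dict],
    is_finished: Callable[[dict], bool],
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch(status_url) until is_finished(task) is true, then return the task."""
    for _ in range(max_attempts):
        task = fetch(status_url(task_id))
        if is_finished(task):
            return task
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

In practice `fetch` could wrap an authenticated GET that returns the parsed JSON body, for instance with the httpx client used in the Python example on this page.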
Pricing Notes
amux-api applies an automatic video-input discount for the Doubao Seedance 2.0 family: when the request content contains a video_url entry (continuation / multimodal video reference), the per-task price is multiplied by the upstream's "with-video-input" discount factor (roughly 0.59 to 0.61). Requests without video input are billed at the base rate.
Note: although the "with-video-input" tier has a lower per-task rate, the input video itself consumes extra input tokens. Once that is added in, the final total cost is roughly comparable to a request without video input. The discount is a pricing-tier reclassification, not a net saving.
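The per-task multiplication can be sketched as follows. The helper names are hypothetical, the default factor of 0.6 is just a midpoint of the quoted range, and the base price must come from the pricing page. The extra input-token cost of the video is deliberately not modeled, so the result is the per-task rate only, not the final bill.

```python
def has_video_input(content: list[dict]) -> bool:
    """True if any content item is a video_url entry."""
    return any(item.get("type") == "video_url" for item in content)


def per_task_price(base_price: float, content: list[dict], discount_factor: float = 0.6) -> float:
    """Apply the with-video-input factor to a per-task base price."""
    if not 0 < discount_factor <= 1:
        raise ValueError("discount_factor must be in (0, 1]")
    # Assumption: 0.6 stands in for the upstream factor (roughly 0.59 to 0.61).
    if has_video_input(content):
        return base_price * discount_factor
    return base_price
```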
In addition, resolution: "1080p" is priced higher than 720p, and doubao-seedance-2.0-fast does not support 1080p — see pricing for full details.