Doubao Seedance 2.0 Series

Doubao Seedance 2.0 / 2.0 Fast video generation — multimodal inputs (text / image / video / audio), first-last frame locking, reference images, native audio. Volcengine native and amux universal endpoints.

Doubao Seedance 2.0 is ByteDance Volcengine Ark's multimodal video generation model — accepts text, image, video, and audio as four input modalities, up to 9 images / 3 video segments / 3 audio segments. amux-api exposes both Volcengine native and amux universal task-creation paths, sharing the same task status endpoint.

Model Variants

| Model ID | Positioning | Use Case |
| --- | --- | --- |
| doubao-seedance-2.0 | Standard | Final renders, best quality / consistency. Supports 720p / 1080p |
| doubao-seedance-2.0-fast | Fast | Speed-first and cheaper; ideal for prompt iteration and rough drafts. 720p only (no 1080p) |

See https://api.amux.ai/pricing for current pricing.

Create Video Generation Task

Two calling paths, sharing the same task status endpoint:

  • Path A (Volcengine native): POST /api/v3/contents/generations/tasks — fully compatible with the Volcengine Ark contract; Volcengine SDKs work with just a base_url change.
  • Path B (amux universal): POST /v1/video/generations — vendor-neutral schema; Volcengine-specific fields go through metadata.

Path A — Volcengine Native Endpoint

| Field | Value |
| --- | --- |
| Method | POST |
| Full URL | https://api.amux.ai/api/v3/contents/generations/tasks |
| Auth | Authorization: Bearer <AMUX_API_KEY> |
| Request format | application/json |
| Response format | application/json |

amux's gateway treats a request as Volcengine native format when the body contains content and does not contain prompt. If both are present, the request is handled as the universal format (Path B conversion chain).
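The routing rule above can be sketched as a small predicate. This is only an illustration of the documented behavior, not the gateway's actual code; the function name and return labels are hypothetical, and "contains" is assumed to mean key presence rather than a non-empty value.

```python
from typing import Any, Mapping


def detect_request_format(body: Mapping[str, Any]) -> str:
    """Apply the documented rule: native only if `content` is present and `prompt` is absent."""
    # Assumption: "contains" is read as "the key exists in the JSON body".
    if "content" in body and "prompt" not in body:
        return "volcengine_native"
    return "universal"


native_body = {
    "model": "doubao-seedance-2.0",
    "content": [{"type": "text", "text": "Harbor at dusk"}],
}
mixed_body = {**native_body, "prompt": "Harbor at dusk"}

print(detect_request_format(native_body))  # volcengine_native
print(detect_request_format(mixed_body))   # universal
```

In practice this means a stray prompt key in a Path A body silently switches the request to the Path B conversion chain.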

Request Body Fields

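Fields and JSON types as they appear in the examples on this page:

| Prop | Type |
| --- | --- |
| model | string |
| content | array of ContentItem |
| ratio | string |
| duration | integer |
| resolution | string |
| generate_audio | boolean |
| callback_url | string |
| trace_id | string |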

ContentItem Element

Each element of the content array is a ContentItem distinguished by type:

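| Prop | Type |
| --- | --- |
| type | string: text, image_url, video_url, or audio_url |
| text | string (for type: text) |
| image_url | object with a url field (for type: image_url) |
| video_url | object with a url field (for type: video_url) |
| audio_url | object with a url field (for type: audio_url) |
| role | string: first_frame, last_frame, or reference_image (image_url items) |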

Scenarios & Roles

Tasks fall into 4 scenarios. The 3 image-bearing scenarios are mutually exclusive — they cannot be mixed within a single task:

| Scenario | image_url | video_url | audio_url |
| --- | --- | --- | --- |
| Text-to-Video | - | - | - |
| Image-to-Video (first frame) | 1, role: first_frame | - | - |
| Image-to-Video (first + last) | 2, role: first_frame + last_frame | - | - |
| Multimodal reference | 1–9, role: reference_image | 0–3 | 0–3 |

In multimodal reference mode, all three modalities are optional — you can pass images only, videos only, audios only, or any combination.
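A client-side pre-check can catch mixed scenarios before a task is submitted. The sketch below is one possible reading of the scenario table, not the upstream validator: the function name and the returned labels are hypothetical, it treats video or audio alongside a first/last frame as a mix, and it insists on an explicit role for every image.

```python
from typing import Any, Dict, List

FRAME_ROLES = ("first_frame", "last_frame")


def classify_scenario(content: List[Dict[str, Any]]) -> str:
    """Name the scenario a content array falls into; raise ValueError on a mix."""
    images = [item for item in content if item.get("type") == "image_url"]
    roles = [item.get("role") for item in images]
    if any(role is None for role in roles):
        # Assumption of this sketch only: every image carries an explicit role.
        raise ValueError("this sketch requires an explicit role on every image")
    frames = sorted(role for role in roles if role in FRAME_ROLES)
    references = roles.count("reference_image")
    has_clips = any(item.get("type") in ("video_url", "audio_url") for item in content)

    if frames and (references or has_clips):
        raise ValueError("first/last-frame images cannot be mixed with references")
    if frames == ["first_frame"]:
        return "image_to_video_first_frame"
    if frames == ["first_frame", "last_frame"]:
        return "image_to_video_first_last"
    if frames:
        raise ValueError(f"unsupported frame roles: {frames}")
    if references > 9:
        raise ValueError("at most 9 reference images are allowed")
    if references or has_clips:
        return "multimodal_reference"
    return "text_to_video"


def image(url: str, role: str) -> Dict[str, Any]:
    return {"type": "image_url", "image_url": {"url": url}, "role": role}


print(classify_scenario([
    {"type": "text", "text": "Camera pushes in"},
    image("https://example.com/start.jpg", "first_frame"),
    image("https://example.com/end.jpg", "last_frame"),
]))  # image_to_video_first_last
```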

First/last-frame behavior:

  • The first and last frames may be the same image
  • When their aspect ratios differ, the first frame's ratio dominates and the last frame is auto-cropped to fit

Approximating "first/last frame + multimodal" via prompt:

In multimodal reference mode, you can use @imageN in the prompt to nominate a particular reference image as a first / last frame — an indirect way to combine "first/last frame + multimodal reference". For strict first/last-frame locking, use the Image-to-Video (first + last) scenario with explicit role: first_frame / last_frame.

Resource Constraints

| Resource | Per-file limit | Duration | Count | Format |
| --- | --- | --- | --- | --- |
| Image (image_url) | 30 MB | - | See scenarios above | HTTP/HTTPS URL or Base64 data URL |
| Video (video_url) | 50 MB | 2–15 s per segment | ≤ 3 segments, total ≤ 15 s | mp4 / mov; resolutions 480p / 720p / 1080p |
| Audio (audio_url) | 15 MB | 2–15 s per segment | ≤ 3 segments, total ≤ 15 s | wav / mp3 |

Overall limit: total request body ≤ 64 MB. Do not Base64-encode large files — submit via public HTTP/HTTPS URLs to avoid the 64 MB cap and reduce upload latency.
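The video and audio limits above lend themselves to a local pre-flight check. The following is a minimal sketch under stated assumptions: the Clip type and check_clips function are hypothetical, sizes and durations are supplied by the caller (nothing is probed from the files), and "MB" is assumed to mean 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024  # assumption: the documented limits use binary megabytes
SIZE_LIMITS = {"video": 50 * MB, "audio": 15 * MB}


@dataclass
class Clip:
    kind: str          # "video" or "audio"
    size_bytes: int
    duration_s: float


def check_clips(clips: List[Clip]) -> List[str]:
    """Return human-readable violations of the documented video/audio limits."""
    problems: List[str] = []
    for kind, size_limit in SIZE_LIMITS.items():
        group = [clip for clip in clips if clip.kind == kind]
        if len(group) > 3:
            problems.append(f"{kind}: {len(group)} segments (max 3)")
        total = sum(clip.duration_s for clip in group)
        if total > 15:
            problems.append(f"{kind}: total duration {total:g}s (max 15s)")
        for index, clip in enumerate(group, start=1):
            if not 2 <= clip.duration_s <= 15:
                problems.append(f"{kind} #{index}: duration {clip.duration_s:g}s outside 2-15s")
            if clip.size_bytes > size_limit:
                problems.append(f"{kind} #{index}: file exceeds {size_limit // MB} MB")
    return problems


print(check_clips([Clip("video", 10 * MB, 5), Clip("audio", 2 * MB, 5)]))  # []
```

The 64 MB request-body cap is not modeled here, since it only matters when media is inlined as Base64 rather than passed by URL.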

Example — Text-to-Video

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms" }
    ],
    "ratio": "16:9",
    "duration": 5,
    "resolution": "720p",
    "generate_audio": true
  }'

Example — Image-to-Video (first/last-frame lock)

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Camera pushes from the courtyard slowly into the room" },
      { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
      { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" },   "role": "last_frame"  }
    ],
    "ratio": "16:9",
    "duration": 5
  }'

Example — Multiple Reference Images

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "@image1 and @image2 meet in the garden" },
      { "type": "image_url", "image_url": { "url": "https://example.com/character-a.jpg" }, "role": "reference_image" },
      { "type": "image_url", "image_url": { "url": "https://example.com/character-b.jpg" }, "role": "reference_image" }
    ],
    "ratio": "16:9"
  }'

Example — Video + Audio Multimodal Reference

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Continue the picture pacing, sync to the audio" },
      { "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
      { "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
    ],
    "ratio": "16:9",
    "duration": 5
  }'

Example — With Callback and Custom trace_id

curl https://api.amux.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "content": [
      { "type": "text", "text": "Harbor at dusk, seagulls gliding past" }
    ],
    "ratio": "16:9",
    "duration": 5,
    "callback_url": "https://your-app.example.com/api/seedance/callback",
    "trace_id": "biz-order-20260428-001"
  }'

Response (on submission)

{
  "id": "cgt-2024xxxxxxxxxxxx",
  "model": "doubao-seedance-2.0",
  "status": "queued",
  "created_at": 1714300000
}

id is the task ID — usable directly with Get Task Status.

Path B — amux Universal Endpoint

Vendor-neutral contract. Standard fields (prompt / image / images) at the top level; Volcengine-specific fields go through metadata.

| Field | Value |
| --- | --- |
| Method | POST |
| Full URL | https://api.amux.ai/v1/video/generations |
| Auth | Authorization: Bearer <AMUX_API_KEY> |
| Request format | application/json |

For the full universal contract, see Create Video Generation Task (universal).

prompt is required: amux's validation layer rejects empty prompt even for first/last-frame locking and pure reference-image cases. Provide a short prompt regardless.

Top-Level Fields

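Top-level fields: model, prompt, image / images, and metadata. The Volcengine-specific keys used in the examples on this page (ratio, duration, resolution, generate_audio, callback_url, trace_id, content) go inside metadata.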

Unlisted metadata keys are not recognized: amux unmarshals metadata into Path A's requestPayload struct, so keys that don't exist on the struct are dropped. Common mistake: passing metadata.first_frame / metadata.last_frame / metadata.reference_images directly — none of these take effect; use metadata.content array with the role field instead.

Specifying Image Roles / Injecting Video / Audio

To attach a role to an image, or inject video_url / audio_url on the universal endpoint, use metadata.content with the full Doubao content array (same shape as Path A's ContentItem Element):

{
  "model": "doubao-seedance-2.0",
  "prompt": "Camera pushes in slowly",
  "metadata": {
    "ratio": "16:9",
    "duration": 5,
    "content": [
      { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
      { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg"   }, "role": "last_frame"  }
    ]
  }
}

Images sent via top-level image / images are added to content without a role — the upstream applies its default. For explicit role / multimodal control, always go through metadata.content.
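A small builder keeps image roles inside metadata.content instead of the unrecognized metadata.first_frame / metadata.last_frame keys. This is a sketch only: image_item and build_frame_lock_request are hypothetical helper names, and the default model ID is just the one used in the examples on this page.

```python
from typing import Any, Dict, Optional


def image_item(url: str, role: str) -> Dict[str, Any]:
    """One ContentItem of type image_url with an explicit role."""
    return {"type": "image_url", "image_url": {"url": url}, "role": role}


def build_frame_lock_request(
    prompt: str,
    first_url: str,
    last_url: Optional[str] = None,
    model: str = "doubao-seedance-2.0",
    **metadata: Any,
) -> Dict[str, Any]:
    """Build a universal-endpoint body that locks the first (and optionally last) frame."""
    if not prompt.strip():
        # The universal endpoint rejects an empty prompt, so fail early.
        raise ValueError("prompt must be non-empty on the universal endpoint")
    content = [image_item(first_url, "first_frame")]
    if last_url is not None:
        content.append(image_item(last_url, "last_frame"))
    # Roles travel inside metadata.content, alongside the other metadata keys.
    return {"model": model, "prompt": prompt, "metadata": {**metadata, "content": content}}


body = build_frame_lock_request(
    "Camera pushes in slowly",
    "https://example.com/start.jpg",
    "https://example.com/end.jpg",
    ratio="16:9",
    duration=5,
)
print(sorted(body["metadata"]))  # ['content', 'duration', 'ratio']
```

The resulting dictionary can be passed as the JSON body of a POST to /v1/video/generations.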

3 mutually exclusive scenarios: Image-to-Video (first frame), Image-to-Video (first + last), and Multimodal reference cannot be mixed within the same task — see Scenarios & Roles above.

amux's adapter also applies one extra rule: when content contains only reference_image (no first/last frame), duration is automatically stripped — the upstream rejects duration in multimodal reference mode with InvalidParameter.
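The duration-stripping rule can be illustrated as a pure function over a native-shaped payload. This is a guess at the behavior described above, not amux's adapter code; the function name is hypothetical, and "only reference_image" is interpreted as "at least one reference image and no first_frame / last_frame role".

```python
from typing import Any, Dict


def strip_duration_for_reference_mode(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload, dropping `duration` when images are reference-only."""
    roles = {
        item.get("role")
        for item in payload.get("content", [])
        if item.get("type") == "image_url"
    }
    # Assumed reading of the rule: reference images present, no frame roles present.
    reference_only = "reference_image" in roles and not roles & {"first_frame", "last_frame"}
    if not reference_only:
        return dict(payload)
    return {key: value for key, value in payload.items() if key != "duration"}


reference_payload = {
    "model": "doubao-seedance-2.0",
    "duration": 5,
    "content": [
        {"type": "text", "text": "@image1 walks through the garden"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/character-a.jpg"},
            "role": "reference_image",
        },
    ],
}
print("duration" in strip_duration_for_reference_mode(reference_payload))  # False
```

Callers on Path A do not get this adjustment automatically, so omitting duration in reference-only requests there avoids the InvalidParameter error.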

Example — Text-to-Video (standard)

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "resolution": "720p",
      "generate_audio": true
    }
  }'

The same request in Python (using httpx):

import os
import httpx

response = httpx.post(
    "https://api.amux.ai/v1/video/generations",
    headers={"Authorization": f"Bearer {os.environ['AMUX_API_KEY']}"},
    json={
        "model": "doubao-seedance-2.0",
        "prompt": "A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms",
        "metadata": {
            "ratio": "16:9",
            "duration": 5,
            "resolution": "720p",
            "generate_audio": True,
        },
    },
)
response.raise_for_status()  # surface HTTP errors before reading the body
task = response.json()
print(f"Task ID: {task['task_id']}")

The same request in JavaScript (using fetch):

const response = await fetch('https://api.amux.ai/v1/video/generations', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.AMUX_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'doubao-seedance-2.0',
    prompt:
      'A cute baby sea otter wearing a beret, rolling in a spring courtyard with falling cherry blossoms',
    metadata: {
      ratio: '16:9',
      duration: 5,
      resolution: '720p',
      generate_audio: true,
    },
  }),
});

const task = await response.json();
console.log(`Task ID: ${task.task_id}`);

Example — First + Last Frame Lock

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0-fast",
    "prompt": "Camera pushes slowly from the courtyard into the room",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
        { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" },   "role": "last_frame"  }
      ]
    }
  }'

Example — Video + Audio Multimodal Reference

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "Continue the picture pacing, sync to the audio",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "content": [
        { "type": "video_url", "video_url": { "url": "https://example.com/sample-clip.mp4" } },
        { "type": "audio_url", "audio_url": { "url": "https://example.com/voice.mp3" } }
      ]
    }
  }'

Example — With Callback and Custom trace_id

curl https://api.amux.ai/v1/video/generations \
  -H "Authorization: Bearer $AMUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0",
    "prompt": "Harbor at dusk, seagulls gliding past",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "callback_url": "https://your-app.example.com/api/seedance/callback",
      "trace_id": "biz-order-20260428-001"
    }
  }'

Get Video Task Status

Regardless of whether you submitted via Path A or Path B, status query and video download share the same unified endpoints:

| Purpose | Endpoint |
| --- | --- |
| Query task status; get the video URL | GET /v1/video/generations/{task_id} |
| Stream the video bytes | GET /v1/videos/{task_id}/content |

See Get Video Task Status for full details.
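A polling loop over the status endpoint might look like the sketch below. Several parts are assumptions rather than documented behavior: the status response is assumed to carry a status field like the submission response does, the terminal values "succeeded" and "failed" are guesses, and wait_for_task with its injected fetch_json callable is a hypothetical helper chosen so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable, Dict

BASE_URL = "https://api.amux.ai"
# Assumption: these terminal status names are not confirmed by this page.
TERMINAL_STATUSES = {"succeeded", "failed"}


def status_url(task_id: str) -> str:
    return f"{BASE_URL}/v1/video/generations/{task_id}"


def content_url(task_id: str) -> str:
    return f"{BASE_URL}/v1/videos/{task_id}/content"


def wait_for_task(
    task_id: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the status endpoint until a terminal status or the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_json(status_url(task_id))
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([{"status": "queued"}, {"status": "running"}, {"status": "succeeded"}])
result = wait_for_task("cgt-demo", lambda url: next(canned), sleep=lambda seconds: None)
print(result["status"])  # succeeded
```

With httpx, fetch_json could be a lambda that calls httpx.get(url, headers={"Authorization": ...}).json(); once the task finishes, content_url(task_id) gives the endpoint that streams the video bytes.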

Pricing Notes

amux-api applies an automatic video-input discount for the Doubao Seedance 2.0 family: when the request content contains a video_url entry (continuation / multimodal video reference), the per-task price is multiplied by the upstream's "with-video-input" discount factor (roughly 0.59 to 0.61). Requests without video input are billed at the base rate.

Note: although the "with-video-input" tier has a lower per-task rate, the input video itself consumes extra input tokens. Once those are added in, the final total cost is roughly comparable to a request without video input. The discount is a pricing-tier reclassification, not a net saving.
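As a worked illustration of the tier selection, the sketch below applies the factor to a made-up base price. The base price of 1.00, the midpoint factor of 0.60, and both function names are hypothetical; the extra input-token cost of the video itself is deliberately not modeled.

```python
from typing import Any, Dict, List


def has_video_input(content: List[Dict[str, Any]]) -> bool:
    """True when the content array carries at least one video_url entry."""
    return any(item.get("type") == "video_url" for item in content)


def per_task_rate(base_rate: float, content: List[Dict[str, Any]], factor: float = 0.60) -> float:
    """Per-task rate before input tokens; `factor` is an assumed midpoint of 0.59 to 0.61."""
    return base_rate * factor if has_video_input(content) else base_rate


text_only = [{"type": "text", "text": "Harbor at dusk"}]
with_video = text_only + [
    {"type": "video_url", "video_url": {"url": "https://example.com/sample-clip.mp4"}}
]

print(per_task_rate(1.00, text_only))             # 1.0
print(round(per_task_rate(1.00, with_video), 2))  # 0.6
```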

In addition, resolution: "1080p" is priced higher than 720p, and doubao-seedance-2.0-fast does not support 1080p — see pricing for full details.
