Transform natural language descriptions into cinema-quality video clips. One API call. No post-production. Powered by Seedance 2.0.
Start Generating →
Text-to-video generation uses a diffusion-based AI model to convert written descriptions into video frames. You provide a text prompt describing the scene, camera angle, lighting, and action. The model interprets your description and synthesizes a coherent video clip with realistic motion, physics, and temporal consistency.
Seedance 2.0 excels at understanding complex prompts with multiple elements: camera movements (dolly, pan, zoom), environmental conditions (rain, fog, golden hour), and subject actions (walking, pouring, rotating). The model produces output at up to 2K resolution with up to 10 seconds of fluid motion per clip.
Through US Video API, you access Seedance 2.0 via a simple REST endpoint. No ML infrastructure, no model hosting, no GPU management. Just HTTP requests and video files.
```python
import time

import requests

API_KEY = "your-api-key"  # from your account dashboard

# Step 1: Submit generation job
job = requests.post(
    "https://usvideoapi.com/v1/videos",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": "Aerial drone shot of a coastal highway at sunset, "
                  "waves crashing against cliffs, golden light",
        "resolution": "1080p",
        "duration": 5,
    },
).json()

# Step 2: Poll until the job leaves the "pending" state
while job["status"] == "pending":
    time.sleep(5)
    job = requests.get(
        f"https://usvideoapi.com/v1/videos/{job['id']}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    ).json()

# Step 3: Download video
print(job["video_url"])  # Direct MP4 link
```
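The quickstart loop above assumes every job finishes successfully. A slightly more defensive sketch separates polling into a helper with a timeout; note that the `"completed"` and non-`"pending"` failure statuses here are assumptions about the API's terminal states, not documented values, and `fetch_job` is an injected callable so the logic can be exercised without network access:

```python
import time

def wait_for_video(job, fetch_job, interval=5, timeout=300):
    """Poll until the job leaves "pending", or give up after `timeout` seconds.

    fetch_job: callable taking a job id and returning the latest job dict
    (e.g. a GET to /v1/videos/{id}, as in the quickstart).
    """
    deadline = time.monotonic() + timeout
    while job["status"] == "pending":
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job['id']} still pending after {timeout}s")
        time.sleep(interval)
        job = fetch_job(job["id"])
    if job["status"] != "completed":
        # "completed" is an assumed terminal status; check the API's actual values
        raise RuntimeError(f"job {job['id']} ended with status {job['status']!r}")
    return job["video_url"]
```

Injecting `fetch_job` keeps the retry logic testable offline; in production it would wrap the authenticated GET request from the quickstart.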
```shell
curl -X POST https://usvideoapi.com/v1/videos \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Close-up of coffee being poured into a ceramic mug, steam rising, soft morning light",
    "resolution": "720p",
    "duration": 5
  }'

# Response: {"id": "job_a1b2c3", "status": "pending", "price": "$1.25"}
```
```javascript
const response = await fetch("https://usvideoapi.com/v1/videos", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "Timelapse of a flower blooming, macro lens, studio lighting",
    resolution: "720p",
    duration: 5,
  }),
});
const job = await response.json();
console.log(job.id); // "job_x7y8z9"
```
The quality of your text-to-video output depends heavily on how you write your prompts. Here are practical tips for getting the best results from Seedance 2.0:
Instead of "a city at night," write "slow dolly forward through a neon-lit Tokyo alley at night, rain-slicked streets reflecting lights." Seedance 2.0 responds well to cinematographic direction: dolly, pan, tilt, crane, tracking shot, static wide, handheld.
Lighting makes or breaks a shot. Specify: "golden hour backlight," "overcast diffused light," "dramatic side-lit chiaroscuro," or "warm tungsten interior." The model uses lighting cues to set mood and realism.
Static scenes work, but motion sells. Add action: "steam rises from the mug," "hair blows in the wind," "leaves fall in slow motion." Describe the subject's movement and the camera's movement separately for best results.
If you want to avoid certain artifacts, you can include a negative_prompt parameter: "blurry, distorted, watermark." But most of the time, a well-written positive prompt produces clean output without needing negation.
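A request with `negative_prompt` looks like the basic submission with one extra field. The helper below only builds the JSON body — post it to `/v1/videos` as in the quickstart. Omitting the field entirely when unused (rather than sending `null`) is an assumption on my part, but a safe default:

```python
def build_job(prompt, resolution="720p", duration=5, negative_prompt=None):
    """Build the JSON body for a /v1/videos submission."""
    body = {"prompt": prompt, "resolution": resolution, "duration": duration}
    if negative_prompt:
        # Only include the field when set; sending null may be rejected
        body["negative_prompt"] = negative_prompt
    return body

body = build_job(
    "Timelapse of a flower blooming, macro lens, studio lighting",
    negative_prompt="blurry, distorted, watermark",
)
# requests.post("https://usvideoapi.com/v1/videos",
#               headers={"Authorization": f"Bearer {API_KEY}"}, json=body)
```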
For rapid iteration and prompt testing, use 480p ($0.10/sec, under 30s generation). When you have a prompt you are happy with, re-generate at 1080p ($0.50/sec) for production quality. This workflow keeps costs low during development.
Text-to-video generation is billed per second of output at your chosen resolution: $0.10/sec at 480p, $0.25/sec at 720p, and $0.50/sec at 1080p.
No subscriptions. No minimums. Prepaid balance — add funds and start generating. Volume discounts available for accounts spending $500+/month.
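Since billing is a flat rate times output length, estimating a job's cost is simple arithmetic. The rates below come from this page ($0.10/sec at 480p, $0.50/sec at 1080p, and $0.25/sec at 720p implied by the $1.25 five-second example above); the 2K rate is not listed here, so it is omitted:

```python
RATES_PER_SEC = {"480p": 0.10, "720p": 0.25, "1080p": 0.50}  # USD, from this page

def estimate_cost(resolution, duration, clips=1):
    """Estimated prepaid-balance cost for `clips` clips of `duration` seconds."""
    return round(RATES_PER_SEC[resolution] * duration * clips, 2)

# The draft-at-480p workflow in numbers:
drafts = estimate_cost("480p", 5, clips=10)  # ten 5s iterations: $5.00
final = estimate_cost("1080p", 5)            # one production render: $2.50
```

Ten rounds of prompt iteration plus the final render cost less than two 1080p generations — which is why drafting at 480p pays off.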
Seedance 2.0 is ByteDance's flagship video generation model. It excels at complex multi-element prompts, cinematographic camera control, realistic motion and physics, and temporal consistency across frames.
Visit our homepage demo section to see real API output — unedited, no cherry-picking.

Register, add funds, and generate your first text-to-video clip in under 60 seconds.
Get Your API Key →