GENERATIVE · VIDEO · VEO

Veo 3.1 Fast.

Cinematic video. Native audio. Fast tier.

Google's flagship video model in its speed-optimized tier. Synchronized native audio out of the box, deep object permanence across an 8-second clip, and the cleanest Foley sync in the catalog.

TYPICAL CLIP · 5–8s @ 1080p + audio
STARTING AT · $0.10 / SEC
Cinematic close-up of a singer in a teal jacket on a neon-lit Tokyo street at night, synth-pop instrumental in the audio.
Cinematic tabletop POV: a chef's hands plating a single hand-rolled ravioli, ambient kitchen Foley.
Cinematic product hero: a matte-black smartphone slowly rotating on a plinth, glass reflections, subtle synthesizer pad.
Cinematic 5-second slow-motion shot: a lone surfer carves down the glassy face of a wave at sunset, ambient ocean sound.
Generated by Veo 3.1 Fast · Google

Where Veo 3.1 Fast shines.

Ad creative B-roll

8-second cinematic clips for ad cuts and commercial inserts. Native audio means you can drop clips into an edit without a sound-design pass.

EXAMPLE · A surfer carving down a glassy wave at sunset, slow-motion, ambient ocean sound.

Social shorts

TikTok / Reels / Shorts content with synchronized audio out of the box. Fast tier keeps cost-per-clip in budget.

EXAMPLE · A barista pulling an espresso shot, 1080p vertical, ambient cafe sound.

Product demos

Quick product motion previews for hero pages, app stores, and pitch decks. Native audio adds production value cheaply.

EXAMPLE · Smartphone rotating slowly on a black surface, glass reflections, subtle synthesizer pad.

Music-video concepts

Generate a dozen scene concepts before booking a director. Veo's cinematic vocabulary is the closest to that of a real director of photography (DP).

EXAMPLE · Cinematic close-up of a singer in neon-lit rain, synth-pop instrumental.

Animated marketing

Brand video with animated transitions, motion typography, and synchronized audio cues.

EXAMPLE · Brand teaser: rotating product on a black plinth, music swell on reveal.

Storyboard-to-video

Convert key-frame storyboards into 8-second motion previews. Fast iteration before committing to a real shoot.

EXAMPLE · From a storyboard: chef plating a single dish, hands only, soft kitchen light.

Generated with Veo 3.1 Fast.

A live cross-section of the model's range: portraits, products, typography, illustration, fashion, cinematic.

Cinematic slow-motion: a lone surfer carving down the glassy face of an emerald wave at sunset, ambient ocean roar.
Cinematic time-lapse of a flower blooming in a sunlit meadow, soft ambient breeze and birdsong.
Cinematic POV walking through a rain-soaked Hong Kong night market, neon reflected in puddles.
Slow-motion: a barista's hands pulling an espresso shot, golden crema rolling, ambient cafe murmur.
Cinematic crane up over a misty Japanese onsen at dawn, steam rising, ambient bamboo wind chimes.
Cinematic dolly-in on a vintage red Vespa at a sunlit Roman cobblestone alley at golden hour.
Slow-motion: a chef tossing fresh pasta dough in a dimly-lit Italian trattoria, flour cloud mid-air.
Cinematic two-shot: an elderly couple sharing a quiet laugh on a Parisian café terrace.
Slow-motion underwater: a freediver gliding through a kelp forest, shafts of sunlight piercing canopy.
Cinematic crane shot rising over a Brazilian beach at golden hour with surfers paddling out.
Slow-motion: a vintage steam locomotive thundering through a snow-dusted alpine pass.
Cinematic POV motorcycle ride through a misty Vietnamese mountain pass at dawn.
Cinematic close-up of an artisan glass blower shaping a glowing molten vase.
Cinematic dolly-in on a desert dune at dusk, silhouette of a Bedouin walking with a camel.

By the numbers.

#9 · On Infer's video generation · 1208 Elo (Arena Elo)
Best in class · Audio sync accuracy
Top 1 · Object permanence (8s clip)
Yes · Native 4K
Kling 3.0 Pro · Better synchronized native audio; Kling is more stylized but needs a separate audio pass.
Seedance 2.0 Pro · Both have native audio; Veo wins on global object permanence in 8s+ clips.
Hailuo 02 Pro · More expensive; better global object permanence across the clip.
$0.10 / second of video

Pay only for successful generations. No idle, no minimums, no per-seat. Volume discounts kick in at 10K req/mo.
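
As a rough budgeting aid, the per-second rate above can be turned into a small estimator. This is a minimal sketch, assuming the $0.10 per second list price applies as a flat rate and that failed generations are simply not billed. The function name is illustrative and not part of any Infer SDK, and volume discounts are left out because this page does not state their size.

```python
from decimal import Decimal

# List price quoted on this page, treated as a flat per-second rate (assumption).
PRICE_PER_SECOND = Decimal("0.10")


def estimate_cost(clip_seconds, successful_clips):
    """Estimate spend in USD for a batch of equal-length clips.

    Only successful generations are counted, mirroring the
    "pay only for successful generations" statement above.
    """
    if clip_seconds <= 0 or successful_clips < 0:
        raise ValueError("clip_seconds must be positive and successful_clips non-negative")
    return PRICE_PER_SECOND * Decimal(clip_seconds) * Decimal(successful_clips)


# One 8-second clip, then a batch of 100 of them.
single_clip = estimate_cost(8, 1)
batch = estimate_cost(8, 100)
print(single_clip, batch)
```

Under these assumptions a single 8-second clip comes to $0.80 and a batch of 100 such clips to $80.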

VS NATIVE · Same per-second price as Google's direct Veo API, but with one Infer key, batched billing, retries, and region routing.
VS SELF-HOST · Closed weights, so self-hosting isn't an option. Infer is the production path.

Things teams ask.

Q.01 How good is the native audio?
Best in catalog. Veo synthesizes ambient sound, Foley, and music tied to the on-screen action, so most use cases need no manual sound-design pass.
Q.02 What aspect ratios are supported?
16:9, 9:16, and 1:1. 16:9 is the fastest path; vertical and square add ~10–15% to latency.
Q.03 What's the max clip length?
8 seconds at 1080p. For longer clips, chain segments with the extend-video endpoint, up to ~148 seconds total.
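
To get a feel for what chaining means in practice, the sketch below plans how many extend calls a target duration would need. It assumes each extend call adds a fixed 7 seconds to a full 8-second base clip. That increment is consistent with the ~148-second ceiling above (8 + 20 × 7 = 148) but is not stated on this page, so treat `EXTEND_SECONDS` and the function name as assumptions and confirm the actual increment in the endpoint's documentation.

```python
import math

BASE_SECONDS = 8         # max single clip length stated above
MAX_TOTAL_SECONDS = 148  # approximate chained ceiling stated above
EXTEND_SECONDS = 7       # ASSUMPTION: seconds added per extend call


def plan_extensions(target_seconds):
    """Return (extend_calls, resulting_seconds) for a target duration.

    The resulting length assumes a full-length base clip and can overshoot
    the target, because extensions come in fixed increments.
    """
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    if target_seconds > MAX_TOTAL_SECONDS:
        raise ValueError(f"target exceeds the ~{MAX_TOTAL_SECONDS}-second ceiling")
    if target_seconds <= BASE_SECONDS:
        return 0, BASE_SECONDS
    calls = math.ceil((target_seconds - BASE_SECONDS) / EXTEND_SECONDS)
    return calls, BASE_SECONDS + calls * EXTEND_SECONDS


# A 30-second target needs more than the base clip, so it is chained.
calls, total = plan_extensions(30)
print(calls, total)
```

With these assumed numbers, a 30-second target needs 4 extend calls and yields a 36-second clip, which would then be trimmed in the edit.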
Q.04 Is there watermarking?
Yes. Outputs carry both a small visible Google watermark and a SynthID-Audio watermark embedded in the audio track for provenance.
Q.05 Can I use the outputs commercially?
Yes. Infer passes through Google's commercial terms for Veo outputs.
Q.06 What are the rate limits?
Default tier is 60 requests per minute. Video generation is queued and asynchronous: Infer returns a job ID immediately, then streams progress.
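
The job-ID flow described above can be sketched as a wait loop on the client side. This is a minimal sketch under stated assumptions: it uses plain polling rather than the streamed progress mentioned above, because the streaming interface is not described on this page. The `get_status` callable, the state names (`queued`, `running`, `succeeded`, `failed`), and the response shape are all hypothetical stand-ins for whatever the real SDK exposes.

```python
import time

REQUESTS_PER_MINUTE = 60                         # default tier stated above
MIN_SUBMIT_INTERVAL = 60 / REQUESTS_PER_MINUTE   # seconds between submissions to stay under the limit


def wait_for_job(job_id, get_status, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll a job until it succeeds, fails, or the timeout elapses.

    `get_status` is a caller-supplied function (hypothetical) that takes a
    job ID and returns a dict such as {"state": "running", "progress": 0.4}.
    """
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status["state"] == "succeeded":
            return status
        if status["state"] == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Stand-in status source used only to exercise the loop without a network call.
_fake_states = iter([
    {"state": "queued"},
    {"state": "running", "progress": 0.5},
    {"state": "succeeded"},
])
result = wait_for_job("job-123", lambda job_id: next(_fake_states), sleep=lambda seconds: None)
print(result["state"])
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular client library and makes it easy to test.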
Q.07 How is this different from calling Google directly?
One key, one bill, one SDK shape across 100+ models. Drop in by changing one URL.
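
To illustrate the "one URL" idea, the sketch below builds a request where the base URL is the only provider-specific part. Everything concrete in it is hypothetical: the base URL, the `/video/generations` route, the model ID `veo-3.1-fast`, and the payload field names are placeholders, not Infer's documented API. Only the aspect ratios and the 8-second limit come from this page. The request is constructed but never sent.

```python
import json
import urllib.request

# HYPOTHETICAL base URL, for illustration only.
INFER_BASE_URL = "https://api.infer.example/v1"

SUPPORTED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # from the FAQ above
MAX_CLIP_SECONDS = 8                               # from the FAQ above


def build_generation_request(base_url, api_key, model, prompt,
                             aspect_ratio="16:9", duration_seconds=8):
    """Build (but do not send) a video generation request.

    The route and payload field names are assumptions made for this sketch.
    """
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if not 1 <= duration_seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"duration must be between 1 and {MAX_CLIP_SECONDS} seconds")
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration_seconds": duration_seconds,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/video/generations",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# Switching providers would mean changing only the first argument.
request = build_generation_request(
    INFER_BASE_URL,
    api_key="YOUR_API_KEY",
    model="veo-3.1-fast",  # hypothetical model ID
    prompt="A barista pulling an espresso shot, 1080p vertical, ambient cafe sound.",
    aspect_ratio="9:16",
)
print(request.get_method(), request.full_url)
```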

Ship with Veo 3.1 Fast.

One key. One bill. One SDK shape — across 100+ models. Free credits on signup, no card required.