Google has unveiled a new artificial intelligence model called Omni, which the company says will eventually convert any type of input (text, images, audio, or video) into any other format. For now, the first iteration, Omni Flash, focuses on video generation and is available through Google's Flow platform. Early tests reveal a tool that is both astonishingly realistic and frustratingly inconsistent, offering a glimpse of a future in which AI-generated content becomes indistinguishable from reality, though not without serious caveats.
What Is Omni?
Omni represents a leap forward from Google's previous video generation model, Veo. While Veo could create videos from text prompts, Omni introduces the ability to upload a starting video and use it alongside a text prompt to generate new scenes. Google also claims Omni incorporates more real-world knowledge, helping maintain character consistency and scene coherence. The model is part of a broader push at Google to unify its generative AI capabilities under the Gemini brand.
The technology behind Omni builds on advances in transformer architectures and diffusion models. By training on vast datasets of videos and their descriptions, the model learns to predict realistic motion, lighting, and object interactions. However, as with all generative AI, the results can be unpredictable.
Testing the Limits: A Stuffed Deer Goes on Vacation
To evaluate Omni's capabilities, we conducted a series of tests reminiscent of a famous Gemini advertisement from last year. The goal was to create video clips of a plush deer named Buddy on various adventures. The results were a mixed bag. Some clips were remarkably coherent—Buddy packing a suitcase, boarding a cruise ship, and even skydiving—but others suffered from bizarre glitches. For instance, during a skydiving scene, Buddy's orientation would suddenly flip, breaking the illusion of gravity. In another clip, a bottle of honey that Buddy packed kept changing shape and color, from a jar to a squeeze bottle to a clear container filled with water.
These inconsistencies highlight the current limitations of AI video generation. While the model can produce individual frames that look photorealistic, maintaining logical continuity across multiple seconds remains a challenge. Characters may spontaneously gain or lose features: Buddy occasionally sprouted antlers, which the real plush toy does not have. When we prompted the model to remove the antlers in one scene, it obliged but then added them to every other scene.
Despite these flaws, the best clips were genuinely impressive. One montage had Buddy packing a jar of honey, which later appeared in a beach scene where he squirted it onto his hoof as if it were sunscreen. The model even added dialogue ("Uh oh") and a playful narrative arc. For a few seconds, it felt like watching a real animation.
Deepfaking the Author: A Personal Test
Perhaps the most unsettling test involved deepfaking the author. Starting with a simple selfie video of herself with a neutral expression, she prompted Omni to generate videos of eating spaghetti, sitting in an airplane seat, and posing in front of the Eiffel Tower while eating a baguette. The results were shockingly realistic.
In the pasta-eating clip, the fork hitting the bowl produced a slightly artificial clink, and a woman in the background of the airplane video appeared twice, but aside from those tells, the videos were convincing. The author's husband—who has seen her face every day for a decade—was fooled, believing the pasta video was real except for an unfamiliar bowl. The Eiffel Tower clips varied in quality, but one was so realistic that it required multiple viewings to spot the AI artifacts.
This demonstration underscores the ethical concerns around AI-generated content. With a smartphone video and a few minutes of prompting, anyone can create convincing footage of themselves doing things they never did. The implications for misinformation, identity theft, and trust in visual media are profound.
Cost and Accessibility
Omni is not free. Generating videos consumes credits, with prices ranging from 15 to 40 credits per clip depending on length and complexity. Edits cost 40 credits each. Google's AI Pro plan, which costs $20 per month, includes 1,000 credits. In our tests, generating 20 clips with occasional edits used 855 credits, leaving only 145. For users with a specific vision in mind, the cost can escalate quickly as they iterate to refine results.
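The credit figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not an official pricing calculator: the function names are hypothetical, and it assumes the whole $20 plan price is attributed to the 1,000 credits, which may overstate the per-credit cost if the plan bundles other benefits.

```python
import math

# Figures taken from the article; everything else is an illustrative assumption.
PLAN_PRICE_USD = 20.0        # AI Pro plan, per month
PLAN_CREDITS = 1000          # credits included in the plan
CLIP_COST_RANGE = (15, 40)   # credits per generated clip (min, max)
EDIT_COST = 40               # credits per edit


def usd_per_credit():
    """Rough dollar value of one credit, assuming the plan price covers credits only."""
    return PLAN_PRICE_USD / PLAN_CREDITS


def remaining_credits(used):
    """Credits left in the monthly allowance after spending `used`."""
    return PLAN_CREDITS - used


def edit_count_bounds(clips, used):
    """Bounds on how many edits are consistent with `clips` generations and `used` credits.

    The lower bound assumes every clip cost the maximum; the upper bound
    assumes every clip cost the minimum.
    """
    low_clip, high_clip = CLIP_COST_RANGE
    min_edit_credits = max(0, used - clips * high_clip)
    max_edit_credits = max(0, used - clips * low_clip)
    return math.ceil(min_edit_credits / EDIT_COST), max_edit_credits // EDIT_COST


if __name__ == "__main__":
    used = 855
    print("Remaining credits:", remaining_credits(used))
    print("Approximate spend in USD:", round(used * usd_per_credit(), 2))
    print("Edits implied (min, max):", edit_count_bounds(20, used))
```

Under these assumptions, 20 clips alone would account for between 300 and 800 credits, so the 855 credits spent in testing imply that at least a couple of 40-credit edits were part of the total.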
This pricing model positions Omni as a professional tool rather than a casual toy. It competes with other AI video platforms like RunwayML, Pika Labs, and OpenAI's Sora (still in limited preview). However, Google's integration with its cloud ecosystem and existing user base gives it a significant advantage.
Real-World Implications
Omni's release comes at a time when generative AI is reshaping creative industries. Filmmakers, advertisers, and social media creators are exploring AI-generated content for storyboarding, concept art, and final outputs. But the technology also raises pressing questions about authenticity and regulation.
Google has implemented safeguards, including watermarking AI-generated content and restricting certain prompts that could be used for harmful deepfakes. However, these measures are imperfect. In our tests, the model produced high-quality deepfakes without requiring explicit consent from the individuals depicted. As Omni and similar tools become more accessible, the line between real and generated video will continue to blur.
Historically, each leap in AI video generation—from early GAN-based faces to diffusion models like Veo—has democratized content creation while amplifying risks. Omni represents the latest step in this trajectory. Its ability to edit existing videos with text prompts is particularly powerful, enabling users to change backgrounds, add objects, or alter character expressions with minimal effort. But the same features could be exploited to manipulate evidence, create fake news, or impersonate public figures.
Google has stated that Omni will eventually support audio and image inputs as well, further expanding its utility. The company is also developing tools to detect AI-generated content and verify provenance, but these efforts are still in early stages.
Technical Underpinnings
Omni Flash uses a variant of Google's Gemini architecture, which combines multimodal understanding with autoregressive generation. The model processes input sequences—whether text, images, or video frames—and predicts subsequent frames in a temporal sequence. It leverages a large-scale training dataset of diverse video clips paired with detailed captions, allowing it to learn common physical interactions, camera movements, and scene transitions.
One key innovation is Omni's ability to maintain character identity across scenes. By learning to map facial features, clothing, and objects to consistent representations, the model reduces the "melting face" effect common in earlier AI videos. However, as our tests showed, this consistency is not foolproof. Objects still warp, and characters can morph unexpectedly.
The model also incorporates a form of world modeling, meaning it has learned an approximation of basic physics, such as how a bottle of honey should pour or how a deer might move. But this knowledge is probabilistic, leading to occasional violations of reality (e.g., a jar of honey turning into a different container). These glitches provide insight into the model's limitations: it mimics patterns in its training data without a true understanding of causality.
Comparison with Competitors
Omni enters a crowded field. OpenAI's Sora, first announced in February 2024, stunned observers with its ability to generate minute-long videos from text prompts. However, Sora remains in closed beta. RunwayML's Gen-3 Alpha offers similar capabilities with a focus on creative tools, while Pika Labs targets quick turnaround for social media content. Each platform has strengths: Runway excels at generating surreal art, Pika is optimized for short loops, and Sora aims for cinematic realism.
Omni's differentiator is its multimodal input and tight integration with Google's ecosystem. Users can start with a photo from Google Photos, edit it using Gemini, and then generate a video—all within Flow. This seamless experience could attract users already invested in Google services. Additionally, Omni's emphasis on video-to-video editing sets it apart from competitors that primarily focus on text-to-video.
However, in terms of pure output quality, Omni does not yet surpass Sora's best demos. The clips we generated often had lower resolution and more artifacts than what OpenAI has shown. But Google's model is publicly available, whereas Sora is not, giving Omni a first-mover advantage in the mass market.
Ethical and Regulatory Landscape
As AI video generation becomes more accessible, governments and platforms are scrambling to establish guidelines. The European Union's AI Act, passed in 2024, requires that deepfakes and other AI-generated content be disclosed as such, and it imposes stricter rules on applications it classifies as high-risk. In the United States, the Biden administration's 2023 executive order on AI directed federal agencies to develop guidance on content authentication and watermarking. Some states, like California, have introduced laws specifically targeting deepfakes in political advertising.
Tech companies are also taking voluntary measures. Google, Microsoft, and OpenAI have joined the Coalition for Content Provenance and Authenticity (C2PA), whose standard attaches cryptographically signed provenance metadata to media files. However, these systems are not foolproof: metadata can be stripped, and detection tools are still being developed.
During our tests, all generated videos were clearly marked as AI-generated within the Flow platform, but once downloaded and shared on social media, those watermarks could be removed. This highlights the need for platform-level detection and user education.
The Uncanny Valley and Beyond
The test results place Omni firmly in the uncanny valley. Individual frames often look photorealistic, but motion and physics betray the artificial origin. For example, a deepfake of the author eating spaghetti showed a slight delay in the fork's movement relative to her mouth, and the pasta itself glistened with unnatural uniformity. These imperfections are likely to diminish as models improve, but for now, they serve as a reminder that AI video generation is not yet seamless.
Google has indicated that Omni will receive regular updates, including improved physics modeling and longer video generation. The company is also exploring user controls to fine-tune realism versus creativity. In the future, users might specify a "stylized" mode for cartoons or a "hyperreal" mode for documentary-style content.
For creatives, Omni offers a powerful sandbox for experimentation. The ability to quickly generate video variations from a single input could accelerate storyboarding, advertising, and education. But with great power comes great responsibility—both for the creators using the tools and the platforms hosting the content.
A New Era of Video Creation
Omni represents a significant milestone in the evolution of generative AI. While not yet perfect, it brings the dream of anything-to-anything conversion closer to reality. The technology will undoubtedly transform how we create and consume video, but it also demands a critical conversation about trust, authenticity, and regulation. As our tests showed, we are deep in the uncanny valley, but the path to the other side is becoming clearer, and faster than many anticipated.
Source: The Verge News