AI Video Generation: A Nascent Field with Rapid Progress

How AI Transforms Video Production
Jan 07, 2024 00:00

The current AI landscape is bustling with applications for generating text and images, but AI-driven video generation is still in its nascent stages. Despite early attempts and breakthroughs, it lags well behind other kinds of AI-generated content. Artifacts like jitter, flickering, and frequent scene jumps create a "glitchy" effect, and stable output remains difficult to achieve.

However, the recent buzz around Pika has reignited interest in AI video generation, and even drove a surge in the share price of the listed company run by the founder's father. Tech media outlets are touting Pika as a formidable competitor to Runway, the more established AI video generation tool. Runway gained recognition for its use in the 2022 hit film "Everything Everywhere All at Once."

Some worried that Pika 1.0's capabilities were being over-hyped, raising unrealistic public expectations for AI video. In response, Pika quickly released beta test videos that closely matched its promotional claims, reversing the skepticism overnight.

The AI video generation field is heating up, with participants ranging from universities and research teams to tech giants and fast-growing AI startups. Internationally, Runway, Pika Labs, Meta (Emu Video), and Fei-Fei Li's team are competing intensely to produce the most stable and impressive results. In China, companies like ByteDance and Meitu are quietly competing as well, each developing its own AI video generation tools.

Pika's launch raises the question of who will lead AI video generation. Why has AI video suddenly exploded, and what does it mean for the industry? What were the critical constraints on its development, and how have they been overcome? Does this signal the arrival of a killer app in this field, akin to a "GPT moment"? And in what direction will it evolve?


01. Overtaking Runway?

Why has Pika, a company just over six months old, gained such attention? In just seven months, Pika has completed three funding rounds, raising $55 million at a valuation of over $250 million. Investors include OpenAI scientists and board members. In April 2023, founder Wenjing Guo and a classmate left their Stanford PhD programs to start Pika, aiming to create an AI video generator that lets everyone be the director of their own story.

Until now, Pika was less capable than Runway and was known mainly within video circles. Because it is free to use, it built a loyal user base that has now reached 500,000, with millions of videos created weekly. Pika runs hundreds of GPUs to support this demand.

Since November 29, reports on Pika have flooded the media. Pika 1.0 significantly lowers the barrier to entry with its pitch of "Start just by typing," and it exceeds industry expectations in semantic understanding and image detail. Pika plans to commercialize next year, offering higher-quality footage and video segments while keeping videos under one minute.

Creators with early access report that Pika 1.0 supports three video generation methods: text-to-video, image-to-video, and video-to-video. To them, it feels like a complete model overhaul. It excels in 2D and 3D animation, quickly producing what they describe as Pixar-level animations. It seems Runway has met its match!

However, in a Forbes interview, Guo modestly stated that she doesn't intend to rival industry giants like Adobe and Runway. Pika focuses on consumers, which differentiates it from competitors like Runway that serve both consumers and businesses. Runway offers a free trial and then costs $12/month, whereas Pika is currently free and is considering a subscription model in the future.


02. Power Players in Fierce Competition

The rise of short videos and social media marketing, along with the creator economy, has led to an influx of promising new editing products. Currently, demand for video editing surpasses demand for video generation. ByteDance's CapCut, for example, has gained immense popularity overseas, with over a hundred million users.

AI has significantly changed post-production and editing: average users can now rely on lightweight AI editing features without traditional software or extensive training.

According to a16z's statistics, video editing remains central to such applications. Of the six video AIGC apps they surveyed, only two focus solely on video generation: Runway and Kaiber. Of the rest, D-ID and Fliki repurpose existing materials, while Kapwing and Veed offer video editing.

However, as production costs rise, generated content becomes more appealing, indicating a shift in video production. Products driving this trend fall into two categories: digital human products like HeyGen and Synthesia, and video generation products like Runway and Pika Labs, which create new videos from text or images.

October saw Moonvalley shift from AI image/text generation to video, claiming to produce the "most powerful video AI" to date. November brought updates from Runway, Meta (with Emu Video and Emu Edit), and Stability AI's Stable Video Diffusion, marking a significant leap in AI video generation.


03. The "GPT Moment" in AI Video Still Awaits

Large-scale models for text, image, and audio generation have improved markedly, but video generation still lags behind. This gap suggests AI video is on the cusp of a breakthrough, full of potential and opportunities.

Pika, established just over six months ago, has already secured significant funding, in contrast with Runway, an AI startup founded in 2018. Pika is confident that the "GPT moment" is near, with founder Wenjing Guo expecting significant improvements next year.

Realistically, AI video generation still faces challenges, especially in keeping imagery stable from frame to frame. The process is reminiscent of early hand-drawn animation, where many still frames are strung together to create motion. Current AI technology struggles to determine keyframes accurately, which leads to unstable results.

Moreover, people still pay for traditional software like Adobe's, while few users pay for video generation tools, which raises questions about their profitability.

Because of technical limitations, AI video tools are mostly used to produce short-form content and have had little impact on longer videos and films. As iQiyi CEO Gong Yu recently noted, AIGC has started to affect the content industry, though it is still far from meeting industry standards.

Therefore, while Pika has raised expectations for AI-generated video, there's still a long way to go before reaching ideal results. AI video's "iPhone moment" or killer app has yet to arrive.



Pika's AI video generation app has reignited interest in this field, which is backed by multimodal large-scale models like GPT-4V and technologies like DALL-E 3, Midjourney, and Stable Diffusion. These advancements are set to revolutionize industries like film, entertainment, and advertising.

AI video still faces uncertainties and challenges, including copyright issues, originality protection, and ensuring content quality and compliance. The computational cost of running advanced AI models, which rises as user bases grow, is also a concern.

Supported by investors and former OpenAI board members and scientists, Pika is positioned as a potential game-changer, offering a platform akin to TikTok but more automated and intelligent, making video creation simpler for average users.

The rise of video AI marks a new phase in AI competition, with companies and research institutions vying for dominance. Future innovations in algorithms, like Transformers, could herald a new era of video models, potentially leading to the next hot content platform like TikTok.