In December 2023, Google launched Gemini, a multimodal AI model that could understand and generate text, images, and audio.
I worked on the small team tasked with creating a series of promotional videos for the launch. Multimodal AI was a new and complex concept, so the videos needed to explain the technology to a wide audience.
Hands on with Gemini
One of the videos was a six-minute demo of the AI in action that used abridged versions of the prompts. I helped with the concepting and production of the video, alongside a team of three other brilliant filmmakers.
Controversy
The launch was an immediate success, lifting Google's stock by 5.5% overnight. However, the launch video's popularity was soon followed by a wave of scrutiny.
In the video, it was not made clear that the prompts voiced aloud were abridged versions of the actual prompts sent to the model.
The underlying prompts, which anyone can use to recreate the results, are outlined in this developer post.
Media Coverage
Personal Note
I am EXTREMELY glad news outlets criticized this launch. I believe it's important to hold companies accountable for their claims, especially when introducing complex technology.
It shifted my values from making AI feel magical towards making it feel approachable.
Snackable Snippets
Another series of videos from the launch (which faced less scrutiny) were one-minute videos demonstrating novel use cases, like picking an outfit or combining different emojis.
Coding Snippet
I was responsible for concepting, scripting, and voicing the use case video for coding.
I personally love fractal trees, and especially the ties between biomimicry and code, so I made a video highlighting the model's ability to generate code by bridging patterns across different domains.
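To give a flavor of what that kind of output looks like, here is a minimal sketch of a recursive fractal tree in Python using turtle graphics. This is my own illustrative example, not the code from the video or from the prompts in the developer post, and the branch angles, scaling factor, and recursion depth are arbitrary choices.

```python
import turtle

def draw_branch(t, length, depth):
    # Base case: stop recursing once the branches get too short
    if depth == 0 or length < 5:
        return
    t.forward(length)
    t.left(25)                                  # split into a left sub-branch
    draw_branch(t, length * 0.7, depth - 1)
    t.right(50)                                 # then a right sub-branch
    draw_branch(t, length * 0.7, depth - 1)
    t.left(25)                                  # restore heading
    t.backward(length)                          # walk back down to the fork

def main():
    screen = turtle.Screen()
    t = turtle.Turtle()
    t.speed(0)
    t.left(90)              # point the turtle upward, like a trunk
    t.up()
    t.goto(0, -200)         # start near the bottom of the window
    t.down()
    draw_branch(t, 120, 7)
    screen.mainloop()

if __name__ == "__main__":
    main()
```

The same branching rule repeated at smaller scales is what gives the tree its organic, plant-like shape, which is the biomimicry connection the video leaned on.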