New York (CNN) –
Thanks to Microsoft’s new AI technology, the Mona Lisa can now do more than just smile.
Last week, Microsoft researchers unveiled a new AI model they developed that can take a still image of a face and an audio clip of someone speaking and automatically create a realistic-looking video of that face speaking. The videos — which can be made from real faces as well as caricatures or artwork — feature convincing lip syncing and natural face and head movements.
In one experimental video, the researchers showed how they animated the Mona Lisa to perform a comedic rap by actress Anne Hathaway.
The output of the AI model, called VASA-1, is both amusing and somewhat unsettling in its realism. Microsoft says the technology could be used for education, “to improve accessibility for people with communication difficulties,” or perhaps to create virtual companions for people. But it’s also easy to see how the tool could be abused to impersonate real people.
It’s a concern that goes beyond Microsoft: as more tools emerge to create compelling AI-generated images, videos and audio clips, experts worry that their misuse could enable new forms of misinformation. Some also fear the technology could disrupt creative industries, from film to advertising.
For now, Microsoft says it has no plans to release the VASA-1 model to the public. The approach is similar to how Microsoft partner OpenAI is handling concerns around its own AI-generated video tool, Sora: OpenAI teased Sora in February but so far has made it available only to a small number of professional users and cybersecurity professionals for testing purposes.
“We reject any behavior designed to create content that is misleading or harmful to real people,” Microsoft researchers wrote in a blog post. They added that the company “has no plans to publicly release the product.”
The researchers said the new AI model was trained on numerous videos of human faces speaking and was designed to recognize natural facial and head movements, including “lip movements, (non-lip) expressions, eye gaze, among other things.” The result is more realistic-looking video when VASA-1 animates a still image.
For example, in an experimental video in which someone appeared excited while playing a video game, the talking face had furrowed brows and pursed lips.
The AI tool can also be directed to produce a video in which the subject looks in a certain direction or expresses a specific emotion.
If you look closely, there are still signs that the videos are machine-made, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft believes its model “significantly outperforms” other similar tools and “paves the way for real-time interaction with lifelike avatars that mimic human conversational behavior.”