Counterfeiters Are Using AI And Machine Learning To Make Better Fakes

Source: Yahoo Finance, Andrew Tarantola
Photo: Synthesizing Obama – Learning Lip Sync from Audio

It’s terrifyingly easy to just make stuff up online these days; such is life in the post-truth era. But recent advancements in machine learning (ML) and artificial intelligence (AI) have compounded the issue exponentially. It’s not just the news that’s fake anymore; all sorts of media and consumer goods can now be knocked off thanks to AI. Audio tracks, video clips, financial transactions, counterfeit products, even your own handwriting can be mimicked with startling accuracy. But what if we could leverage the same computer systems that created these fakes to reveal them just as easily?

People have been falling for trickery and hoaxes since forever. Human history is filled with false prophets, demagogues, snake-oil peddlers, grifters and con men. The problem is that these days, any two-bit huckster with a conspiracy theory and a supplement brand can hop on YouTube and instantly reach a global audience. And while the definition of “facts” now depends on who you’re talking to, one thing that most people agreed on prior to January 20th this year was the veracity of hard evidence. Video and audio recordings have long been considered reliable sources of evidence, but that’s changing thanks to recent advances in AI.

In July 2017, researchers at the University of Washington demonstrated a machine learning system that takes an audio recording of a person speaking and lip-syncs their words onto video of them, vocal mannerisms and all. Essentially, given a clip of someone’s voice, you can create a video of them appearing to say whatever is on it. Take the team’s demo video, for example. They trained the ML system using footage of President Obama’s weekly address. The recurrent neural network learned to associate various audio features with their respective mouth shapes. From there, the team generated CGI mouth movements, and with the help of 3D pose matching, ported the animated lips onto a separate video of the president. Basically, they’re able to generate a photorealistic video of the president using only its associated audio track.
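For a rough sense of the moving parts, here is a minimal sketch of the core mapping step, assuming MFCC audio features and a small set of mouth-shape coefficients; the model, dimensions and names are illustrative stand-ins, not the UW team’s actual code.

```python
import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    """Toy recurrent model in the spirit of the UW pipeline: map a
    sequence of audio features (e.g., MFCCs) to low-dimensional
    mouth-shape coefficients (e.g., PCA weights over lip landmarks)."""

    def __init__(self, n_audio_feats=13, hidden=128, n_mouth_params=18):
        super().__init__()
        self.rnn = nn.LSTM(n_audio_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_mouth_params)

    def forward(self, audio_seq):
        # audio_seq: (batch, time, n_audio_feats)
        out, _ = self.rnn(audio_seq)
        return self.head(out)  # (batch, time, n_mouth_params)

model = AudioToMouth()
mfccs = torch.randn(1, 200, 13)   # ~2 seconds of audio frames
mouth = model(mfccs)              # one mouth-shape vector per frame
# Training would regress these predictions against mouth shapes tracked
# in the weekly-address footage; rendering and compositing the final
# photorealistic mouth onto a target video comes afterward.
```

The recurrent structure is the important part: each predicted mouth shape depends on the audio leading up to that frame, which is what lets the model capture coarticulation rather than mapping each sound to a fixed pose.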

While the team took an outsized amount of blowback over the potential misuses of such technology, they had far more mundane uses for it in mind. “The ability to generate high-quality video from audio could significantly reduce the amount of bandwidth needed in video coding/transmission (which makes up a large percentage of current internet bandwidth),” they suggested in their study, Synthesizing Obama: Learning Lip Sync from Audio. “For hearing-impaired people, video synthesis could enable lip-reading from over-the-phone audio. And digital humans are central to entertainment applications like film special effects and games.”

UW isn’t the only group looking into this sort of technology. Last year, a team from Stanford debuted the Face2Face system. Unlike UW’s technology, which generates video from audio, Face2Face generates video from other video. It uses a regular webcam to capture the user’s facial expressions and mouth shapes, then uses that information to deform a target YouTube video to best match the user’s expressions and speech — all in real time.
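One way to picture the reenactment step is as swapping expression coefficients in a parametric face model. The toy sketch below works under that assumption, with random matrices standing in for a fitted 3D morphable model; it illustrates the idea, not the Face2Face implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3D morphable face model: a face mesh is a mean shape plus linear
# identity and expression offsets. Face2Face fits a model of this kind
# to both the webcam actor and the target video, then re-renders the
# target face with the actor's expression each frame.
N_VERTS, N_ID, N_EXPR = 5000, 80, 64
mean_shape = rng.normal(size=N_VERTS * 3)
id_basis = rng.normal(size=(N_VERTS * 3, N_ID))
expr_basis = rng.normal(size=(N_VERTS * 3, N_EXPR))

def face(id_coeffs, expr_coeffs):
    """Reconstruct a face mesh from identity and expression coefficients."""
    return mean_shape + id_basis @ id_coeffs + expr_basis @ expr_coeffs

target_id = rng.normal(size=N_ID)      # fitted once for the YouTube clip
source_expr = rng.normal(size=N_EXPR)  # tracked per webcam frame

# Reenactment: the target's identity, driven by the source's expression.
reenacted_mesh = face(target_id, source_expr)
```

Because only a few dozen expression coefficients change per frame, the transfer itself is cheap, which is what makes the real-time demo plausible; the expensive parts are the tracking and the final photorealistic re-rendering.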

AI-based audio-video conversion is a two-way street. Just as UW’s system managed to generate video from an audio feed, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) figured out how to create audio from a silent video reel, and to do it well enough to fool human audiences.

“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” Andrew Owens, the paper’s lead author, told MIT News. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”

MIT’s deep learning system was trained over the course of a few months on roughly 1,000 videos containing some 46,000 sounds of different objects being poked, struck or scraped with a drumstick. Like the UW algorithm, MIT’s learned to associate particular audio properties with specific onscreen actions and to synthesize those sounds as the video played. When the results were tested online against videos with authentic sound, people chose the fake audio over the real thing twice as often as they did with a baseline algorithm.
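A stripped-down sketch of that idea might look like the following, assuming per-frame image features have already been extracted and a bank of real recorded snippets is available for example-based matching; all names and sizes here are illustrative, not CSAIL’s code.

```python
import torch
import torch.nn as nn

class VideoToSound(nn.Module):
    """Toy version of the CSAIL idea: per-frame image features feed an
    RNN that predicts a sound-feature vector for each video frame."""

    def __init__(self, img_feat=512, hidden=256, sound_feat=42):
        super().__init__()
        self.rnn = nn.LSTM(img_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, sound_feat)

    def forward(self, frame_feats):
        out, _ = self.rnn(frame_feats)
        return self.head(out)

model = VideoToSound()
frames = torch.randn(1, 90, 512)   # ~3 s of drumstick video at 30 fps
pred = model(frames)[0]            # (90, 42) predicted sound features

# Example-based synthesis: rather than generating raw audio, splice in
# the real recorded snippet whose features best match each prediction.
snippet_bank = torch.randn(1000, 42)            # features of real sounds
nearest = torch.cdist(pred, snippet_bank).argmin(dim=1)
```

The retrieval step is a big part of why the fakes sound convincing: the audience always hears genuine recorded impacts, just ones chosen by the network to match what’s happening on screen.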

The MIT team figures that they can leverage this technology to help give robots better situational awareness. “A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if they stepped on either of them,” Owens said. “Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world.”

https://finance.yahoo.com/news/counterfeiters-using-ai-machine-learning