After watching hours of video footage of former President Barack Obama delivering his weekly address, Shruti Agarwal began to notice a few quirks about the way Obama speaks.
“Every time he says ‘Hi, everybody,’ he moves his head up to the left or the right, and then he purses his lips,” said Agarwal, a computer science graduate student at UC Berkeley.
Agarwal and her thesis advisor Hany Farid, an incoming professor in the Department of Electrical Engineering and Computer Science and in the School of Information at UC Berkeley, are racing to develop digital forensics tools that can unmask “deepfakes,” hyper-realistic AI-generated videos of people doing or saying things they never did or said.
Seeing these patterns in the real Obama’s speech gave Agarwal an idea.
“I realized that there is one thing common among all these deepfakes, and that is that they tend to change the way a person talks,” Agarwal said.
Agarwal’s insight led her and Farid to create the latest weapon in the war against deepfakes: a new forensic approach that can use the subtle characteristics of how a person speaks, such as Obama’s distinct head nods and lip purses, to recognize whether a new video of that individual is real or a fake.
Their technique, which Agarwal presented this week at the Computer Vision and Pattern Recognition conference in Long Beach, CA, could be used to help journalists, policy makers, and the public stay one step ahead of bogus videos of political or economic leaders that could be used to swing an election, destabilize a financial market, or even incite civil unrest and violence.
“Imagine a world now, where not just the news that you read may or may not be real — that’s the world we’ve been living in for the last two years, since the 2016 elections — but where the images and the videos that you see may or may not be real,” said Farid, who begins his tenure at UC Berkeley on July 1. “It is not just about these latest advances in creating fake images and video. It is the injection of these techniques into an ecosystem that is already promoting fake news, sensational news and conspiracy theories.”
The new technique works because all three of the most common deepfake techniques — known as “lip-sync,” “face swap,” and “puppet-master” — involve combining audio and video from one source with an image from another source, creating a disconnect that may be uncovered by a keen viewer — or a sophisticated computer model.
On the left, Saturday Night Live star Kate McKinnon impersonates Elizabeth Warren during a skit; on the right, face swap deepfake technology has been used to superimpose Warren’s face onto McKinnon’s. Image credit: Stephen McNally.
Using the “face swap” technique, for example, one could create a deepfake of Donald Trump by superimposing Trump’s face onto a video of Alec Baldwin doing an impersonation of Trump, so that it is almost as if Baldwin is wearing a skin-tight Trump mask. But Baldwin’s facial expressions will still show through the mask, Agarwal said.
“The new image that is created will have the expressions and facial behavior of Alec Baldwin, but the face of Trump,” Agarwal said.
Likewise, in a “lip-sync” deepfake, AI algorithms take an existing video of a person talking and alter the lip movements to match a new audio track, which may be an older speech taken out of context, an impersonator speaking, or synthesized speech.
Last year, actor and director Jordan Peele used this technique to create a viral video of Obama saying inflammatory things about President Trump.
But in these videos, only the lip movements are changed, so the expressions on the rest of the face may no longer match the words being spoken.
To test the idea, Agarwal and Farid gathered video footage of five major political figures – Hillary Clinton, Barack Obama, Bernie Sanders, Donald Trump and Elizabeth Warren – and ran them through the open-source facial behavior analysis toolkit OpenFace2, which picked out facial tics like raised brows, nose wrinkles, jaw drops and pressed lips.
OpenFace tracking software analyzes a real video of President Obama (left) and a “lip-sync” deepfake (right). Image credit: Stephen McNally.
They then used the outputs to create what the team calls “soft biometric” models, which correlate facial expressions and head movements for each political leader.
They found that each leader had a distinct way of speaking. When they used these models to analyze real videos and deepfakes created by their collaborators at the University of Southern California, the models could accurately tell the real from the fake between 92 and 96 percent of the time, depending on the leader and the length of the video.
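The pipeline described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the researchers’ actual system: it assumes per-frame facial action-unit intensities (the kind of signals the OpenFace toolkit can export) are already available as arrays, encodes a person’s “soft biometric” as the pairwise Pearson correlations between those signals over a clip, and fits a one-class SVM to a leader’s real footage so that clips whose correlation pattern deviates are flagged. The synthetic data, feature counts, and SVM parameters are all illustrative.

```python
# Illustrative "soft biometric" deepfake check (not the authors' code).
# Assumes per-frame action-unit/head-pose intensity tracks, e.g. as
# exported by a facial behavior toolkit like OpenFace.
import numpy as np
from sklearn.svm import OneClassSVM

def clip_features(tracks):
    """tracks: (n_signals, n_frames) array of facial-behavior signals.
    Returns the vector of pairwise Pearson correlations for one clip."""
    corr = np.corrcoef(tracks)            # (n_signals, n_signals) matrix
    iu = np.triu_indices_from(corr, k=1)  # upper triangle, no diagonal
    return corr[iu]

rng = np.random.default_rng(0)

def synth_clip(coupled):
    """Toy stand-in for tracked footage: 4 signals over 300 frames.
    'Real' footage couples the signals (consistent mannerisms);
    a 'fake' leaves them independent."""
    if coupled:
        base = rng.standard_normal(300)
        return np.stack([base + 0.3 * rng.standard_normal(300)
                         for _ in range(4)])
    return rng.standard_normal((4, 300))

real = np.array([clip_features(synth_clip(True)) for _ in range(40)])
fake = np.array([clip_features(synth_clip(False)) for _ in range(10)])

# Learn the leader's real "style"; anything far from it is suspect.
model = OneClassSVM(nu=0.1, gamma="scale").fit(real)
print("real clips flagged genuine:", (model.predict(real) == 1).mean())
print("fake clips flagged fake:   ", (model.predict(fake) == -1).mean())
```

Because the “real” clips share a common driving signal, their correlation vectors cluster tightly, while the decorrelated “fake” clips land far outside that cluster and are rejected by the one-class model.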
“The basic idea is we can build these soft biometric models of various world leaders, such as 2020 presidential candidates, and then as the videos start to break, for example, we can analyze them and try to determine if we think they are real or not,” Farid said.
Video credit: UC Berkeley.
Unlike some digital forensics techniques, which identify fakes by spotting image artifacts left behind during the fabrication process, the new method can still recognize fakes that have been altered through simple digital processing like resizing or compressing.
But it’s not foolproof. The technique works well when applied to political figures giving speeches and formal addresses because they tend to stick to well-rehearsed behaviors in these settings.
But it might not work as well for videos of these people in other settings: for example, Obama may not give the same characteristic head nod when greeting his buddies.
Deepfake creators could also become savvy to these speech patterns and learn to incorporate them into their videos of world leaders, the researchers said.
Agarwal says she hopes the new approach will help buy a little time in the ever-evolving race to spot deepfakes.
“We are just trying to gain a little upper-hand in this cat and mouse game of detecting and creating new deepfakes,” Agarwal said.
Deepfakes — a technology originally used by Reddit perverts who wanted to superimpose their favorite actresses’ faces onto the bodies of porn stars — have come a long way since the original Reddit group was banned.
Deepfakes use artificial intelligence (AI) to create bogus videos, analyzing a subject’s facial expressions and movements so that one person’s face and/or voice can be replaced with another’s.
Using computer technology to synthesize videos isn’t exactly new.
Remember in Forrest Gump, how Tom Hanks kept popping up in the background of footage of important historical events, and got a laugh from President Kennedy?
It wasn’t created using AI, but the end result is the same.
In other cases, such technology has been used to complete a film when an actor dies during production.
The difference between these examples and the latest deepfake technology is a question of ease and access.
Historically, these altered videos have required a lot of money, patience, and skill. But as computer intelligence has advanced, so too has deepfake technology.
Now the computer does the work instead of the human, making it relatively fast and easy to create a deepfake video.
In fact, Stanford researchers created such a technology using a standard PC and webcam, as I reported in 2016.
Nowadays, your average Joe can access open source deepfake apps for free. All you need is some images or video of your victim.
While the technology has mostly been used for fun – such as superimposing Nicolas Cage into classic films – deepfakes could and have been used for nefarious purposes.
There is growing concern that deepfakes could be used for political disruption, for example, to smear a politician’s reputation or influence elections.
Legislators in the House and Senate have requested that intelligence agencies report on the issue.
The Department of Defense has already commissioned researchers to teach computers to detect deepfakes.
One promising technology, developed at the University at Albany, analyzes blinking to detect deepfakes, as subjects in the faked videos usually do not blink as often as real humans do.
Ironically, in order to teach computers how to detect them, researchers must first create many deepfake videos.
It seems that deepfake creators and detectors are locked in a sort of technological arms race.
The falsified videos have the potential to exacerbate the information wars, either by producing false videos, or by calling into question real ones.
People are already all too eager to believe conspiracy theories and fake news as it is, and a surge of faked videos could be created to back up these bogus theories.
Others worry that the existence of deepfake videos could cast doubt on actual, factual videos.
Thomas Rid, a professor of strategic studies at Johns Hopkins University, says that deepfakes could lead to “deep denials” – in other words, “the ability to dispute previously uncontested evidence.”
While there have not yet been any publicly documented cases of attempts to influence politics with deepfake videos, people have already been harmed by the faked videos.
Women have been specifically targeted.
Celebrities and civilians alike have reported that their likeness has been used to create fake sex videos.
Deepfakes prove that just because you can achieve an impressive technological feat doesn’t always mean you should.
Kara Manke – UC Berkeley
Original Research: The findings were presented at the Computer Vision and Pattern Recognition conference in Long Beach, California.