If you’re listening to or have already listened to the beginning of this episode, you’ve probably noticed that the voice that was supposed to be Kimberly Adams sounded a little strange, right?
That’s because it was an audio “deepfake,” created by Yisroel Mirsky of the Offensive AI Research Lab through a process called voice cloning.
The software studies a short sample of someone’s voice and generates a deepfake from it; in this case, the fake took only 15 seconds to generate, according to Mirsky.
It’s far from a perfect replica, but scammers have recently started using deepfake audio in phone scams, and in some cases they have succeeded.
Marketplace’s Kimberly Adams recently spoke with Kyle Alspach, a cybersecurity reporter at Protocol, about how the technology behind audio deepfakes works and how they’re used in phone scams. The following is an edited transcript of their conversation.
Kyle Alspach: The technology relies on [artificial intelligence], deep learning, and an audio sample of a person’s voice. The sample can be as short as three seconds, and the system trains itself on the way the person sounds. It creates a model that can then be used to replicate that person’s voice. The way it currently works is text to speech: you type a sentence, and the software speaks it in the cloned voice.
Kimberly Adams: How are these audio deepfakes used in scams?
Alspach: So we know they are currently being used to target businesses, especially larger ones. Someone will use a cloned voice to impersonate someone else, usually an executive or an employee’s boss, and ask the employee to transfer money or hand over password information, that sort of thing. And in some cases they have been successful.
Adams: Do you have any idea how common audio deepfakes are now? And where do industry experts see it going?
Alspach: You know, among the bigger companies, I think more and more of them are starting to see this, because they’re really attractive targets for this kind of thing. Most people aren’t seeing it yet, but this is the beginning of the technology being used in this way.
Adams: What do cybersecurity experts suggest that ordinary people or companies do to avoid falling for these deepfake audio scams?
Alspach: On the business side, if you’re going to transfer money, for example, you might want to add some extra steps to that process, like having someone say a challenge phrase that everyone agrees on beforehand, [to] prevent that from happening. But the most important thing is just to be aware that this can happen to you. Pay attention when someone calls: make sure the voice really sounds the way you’d expect, and that they’re asking for things they would normally ask for. It’s a pretty low-tech approach to defending against this sort of thing right now, but it plants the possibility in your mind. And I think, unfortunately, you need to be a little more skeptical now when people call you.
You can read more of Alspach’s reporting on audio deepfakes here.
Yisroel Mirsky told us that the technology he used to generate the deepfake audio for Alspach’s article and for our show is “relatively old.”
Alspach also mentioned in his piece that there is an open-source voice cloning tool online that anyone, even scammers, can download and potentially use.
But as we found out, getting that software to work isn’t as simple as downloading it and pressing an install button.
Technical barriers can trip you up at several steps in the process. For example, if you don’t know the Python programming language, or if your computer isn’t powerful enough, you probably won’t be able to use it, as our producer Daniel Shin discovered.