This Voice-Cloning Tool Could Get Banned (So Let's Play With It!)

Intro

In this video, we're going to talk about voice cloning, using those cloned voices for text-to-speech, and a tool that might land one company in a lot of hot water. And we're going to play with it ourselves.

News Article

So let's dig in. You may have seen some of these news articles lately: "4chan users embrace AI voice clone tool to generate celebrity hate speech." "The generated audio ranges in content from memes and erotica to virulent hate speech." "4chan members used ElevenLabs to make deepfake voices of Emma Watson, Joe Rogan and others saying racist, transphobic and violent things." "ElevenLabs, a startup that offered voice synthesis by mimicking any voice, was shut down just days after its launch announcement."

That last one is actually not true. You can still use it, we have access, and we're going to play with it in this video.

Sample Clip

Here's a sample clip that's been circulating of Joe Biden. Take a listen.

"Hello, this is an AI generated voice clone of President Joe Biden. This voice clone was trained using audio of Biden's speeches downloaded from YouTube. This audio was generated in a matter of seconds."

There you go. Pretty convincing, right?

Text to Speech Generators

Now, in a previous video, I talked about text-to-speech generators, and I even trained my voice into a tool called Resemble AI. And it's actually pretty dang good.

"This is an example of what my voice sounds like coming from Resemble AI."

Not great, not horrible either, but it didn't sound like that Joe Biden clip we were just listening to. And when you go to train a voice in Resemble AI, it has you record 25 samples of very specific text prompts that you need to read to it. So you couldn't just take somebody else's voice and train it in unless you had a recording of them saying the exact prompts it needs.

Instant Voice

Now, with ElevenLabs, on the other hand, all you have to do to train a voice is click add instant voice and upload sound clips of existing audio that's out there, and it trains on that voice. The only real restriction it gives you when uploading voices is that you've got to check a box that says, "I agree that I take full responsibility for potential copyright infringement when cloning voices I do not have rights to". As long as you check that box, you can pretty much upload any audio you want.

I'm not going to go crazy here and try to clone voices that I don't have the rights to. I don't want to be named in any of the most likely inevitable lawsuits that come out of it. But I do want to play around with this tool because I'm a nerd. I love cool tools. And although this tool definitely raises some ethical concerns, I'm excited to see what it can do with my voice.

So I'm going to upload some audio of my own voice and see what we get out of it. Here's a quick snippet of what I'm going to be uploading.

"Welcome to the first episode of the Future Tools Podcast."

It was one of only three, but that's what it sounds like for two minutes and 43 seconds. So I'm going to go ahead and drag and drop this MP3 file right in here, let it upload, and I'm going to name it Matt Wolf Voice.

I'm going to agree to the terms that I take full responsibility for my own voice, and I'm going to click add voice. Now, I can add more samples, and the sound quality will improve with more samples. But for now, let's just see how it works with two minutes and 45 seconds of my voice uploaded into the platform. Let's click use, enter some text, and I'll click generate.

"This is an example of what my voice sounds like using ElevenLabs."

Not too dang bad. I mean, I feel like it made my voice a little bit deeper, but let's mess with some of these settings here. Let's crank this up a little bit more. Let's bring this up to 90% and try some new text. Let's generate it one more time.

"My name is Matt Wolf and I love to nerd out about cool tools, especially tools that leverage artificial intelligence."

Yeah, I don't really feel like it sounds too much like me. Maybe I just need to add some more audio. So let's go grab some more audio and see if I can actually improve the sound quality and make it sound even more like me.

Converting Video To MP3

In order to do this, I'm going to take one of the recent videos that I made and drag it into a video-to-MP3 converter that converts it to an MP3 file for free. So all I have to do is take one of my recent videos here. Let's grab this Mixo video that I made the other day, drop it in here, and click convert. It's going to upload this file and then allow me to download it as an MP3. All right, so it's finished converting. Let's go ahead and click download.
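If you'd rather do this conversion locally instead of using a web converter, here's a rough sketch of the same step. It assumes you have the ffmpeg command-line tool installed, which is not what the video uses; the function names and the example file name are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_mp3_command(video_path: str, quality: int = 2) -> list[str]:
    """Build an ffmpeg command that extracts a video's audio track as an MP3.

    quality is the LAME VBR setting: 0 is the best quality, 9 the smallest file.
    """
    if not 0 <= quality <= 9:
        raise ValueError("quality must be between 0 and 9")
    source = Path(video_path)
    target = source.with_suffix(".mp3")  # same name, .mp3 extension
    return [
        "ffmpeg",
        "-i", str(source),         # input video file
        "-vn",                     # ignore the video stream, keep audio only
        "-codec:a", "libmp3lame",  # encode the audio as MP3
        "-q:a", str(quality),      # variable bitrate quality level
        str(target),
    ]


def convert_to_mp3(video_path: str) -> None:
    """Run the conversion (assumes ffmpeg is installed and on your PATH)."""
    subprocess.run(build_mp3_command(video_path), check=True)
```

Calling `convert_to_mp3("mixo-video.mp4")` would write `mixo-video.mp3` next to the original, which you could then upload as another voice sample.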

Editing Voice

Let's go back over to our voice lab and let's edit Matt Wolf's voice here. And I'm going to grab this file that I just converted and upload that as another sample. Agree to the terms once again and click edit voice.

Generating Longer Text

Let it process the new audio. All right, so it just processed some more audio. Let's go ahead and use this again and generate it one more time. See how it comes out.

"My name is Matt Wolf and I love to nerd out about cool tools, especially tools that leverage artificial intelligence."

Not too bad. It definitely sounds like it got closer. It took a little bit longer to generate that second time around. Let's try a different, longer prompt.

I jumped over to my MattWolf.com blog here, so let's just go ahead and grab a paragraph. This was actually a paragraph generated by AI, if you watched my previous video. We'll jump back over to ElevenLabs, paste this whole paragraph in, and see how it does with longer-form text.

Let's mess with the variability a little bit. This says that increasing variability can make speech more expressive, with output varying between regenerations. It can also lead to instabilities. So let's go ahead and move this more towards the middle. And then for clarity and similarity enhancement, it says low values are recommended if background artifacts are present in generated speech. I don't believe I have too much background noise, but I'm going to go ahead and bring this down a little bit. Let's generate one more time and see how it comes out with longer text.

"In this blog post, we will explore the different ways entrepreneurs can use AI for business process automation, the benefits it can bring, and the steps entrepreneurs need to take to implement it in their businesses. Our goal is to show entrepreneurs how they can use AI to automate their business processes and gain a competitive advantage in their industry. So whether you're a small business owner or an entrepreneur just starting out, this post is for you."

I've got to say it doesn't really sound that close to me, at least not in my opinion. You can let me know what you think in the comments. I think it's got some similarities, especially when it first starts talking. You can tell it's kind of close, but the longer the text goes on, the less I feel like it sounds like me. However, this is the best text-to-speech generator that I've come across. Even if the voice doesn't sound totally like me, how natural did that sound? I mean, it sounded like a real human speaking, so much more so than any of the other text-to-speech generators we've played with previously.

And you can see right now I'm on a completely free trial plan. I haven't paid a cent to use ElevenLabs yet, and I still have a quota of 9,239 remaining. This refers to the number of characters. That's not a word count. That's a character count. So this text above is using 461 out of a possible 2,500 characters.

Let's go ahead and generate one last sample just out of curiosity. I'm going to mess with the voice settings one more time, and we'll see how it sounds. I'm going to grab a bit of text from my build in public blog post here. Let's grab this and jump back over. I'll paste this in. Let's see how it handles spaces, just out of curiosity. Let's click on our voice settings, bring them both to 50%, and see how that affects things.

"As I started building it, I began sharing the results I was getting on Twitter and Facebook, and people were loving the behind the scenes look at what I was doing. These simple actions helped build more momentum for the site, grew my followers, and really started to build my confidence again. I figured it would be a fun idea to create periodic updates on this site about what I've been doing, what results I'm seeing, and what I'm learning."

So whether or not you think it sounds totally like me, you can't really disagree that the audio quality is actually really, really good and sounds very natural. The more training data you give it inside of the voice lab here, the better the audio quality is going to sound.

And as you can imagine, people like Joe Rogan and Joe Biden and all of these various celebrities, especially podcasters, have a ton of content out there that anybody could abuse. Anybody can take this audio content, put it inside of ElevenLabs, and then generate those people saying whatever they can imagine. And they have. And that's a little bit scary, because we're no longer going to be able to trust our ears when we hear people's voices. And I don't know how I feel about that.

Either way, training your own voice into this system, the way that this platform was designed to be used, is really, really cool and can give you the most natural-sounding voice that I've ever heard from a text-to-speech platform. And if you don't want to train your own voice, they have a lot of other great voices in here, as we discovered in a previous video. For example, here's what Josh sounds like.

"As I started building it, I began sharing the results I was getting on Twitter and Facebook and people were loving the behind the scenes look at what I was doing."

And for reference, here's a female voice.

"As I started building it, I began sharing the results I was getting on Twitter and Facebook and people were loving the behind the scenes look at what I was doing."

For some reason that sounded like Aubrey Plaza to me. So ElevenLabs is a winner in my book, and hopefully there aren't too many lawsuits or problems down the road for them.

Now just to close the loop on this product, if you're curious, they do have a subscription. The free forever plan gives you 10,000 characters per month, which is really, really generous. You can see all of these examples that I just gave you only used about 1,200 characters. Now, in order to use these voices on the free platform, you do have to give attribution and credit ElevenLabs for creating them. However, you do not have to give attribution to ElevenLabs on a paid account.
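Since the quota is a character count and not a word count, here's a small sketch of how you could check a piece of text before pasting it in. The two limits are the figures mentioned in this video (10,000 characters per month on the free plan, 2,500 per generation). The function names are made up, and the sketch assumes every character counts, including spaces and punctuation, which the video doesn't confirm.

```python
FREE_MONTHLY_CHARACTERS = 10_000  # free plan allowance quoted in the video
PER_REQUEST_LIMIT = 2_500         # per-generation limit shown in the video


def characters_used(text: str) -> int:
    """Count characters (not words); assumes spaces and punctuation count too."""
    return len(text)


def remaining_quota(texts: list[str], monthly_limit: int = FREE_MONTHLY_CHARACTERS) -> int:
    """Characters left this month after generating every text in the list."""
    used = sum(characters_used(t) for t in texts)
    return max(monthly_limit - used, 0)


def fits_in_one_request(text: str, limit: int = PER_REQUEST_LIMIT) -> bool:
    """True if the text is short enough to generate in a single request."""
    return characters_used(text) <= limit


sample = "My name is Matt Wolf and I love to nerd out about cool tools."
print(len(sample.split()), "words,", characters_used(sample), "characters")
print("fits in one request:", fits_in_one_request(sample))
```

The gap between the two printed numbers is the point: a short sentence is only a handful of words but several times that many characters, so the quota drains faster than a word count would suggest.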

Conclusion

And if you're curious, here's the pricing: free forever, 22 bucks a month for 100,000 characters, or 100 bucks a month for 500,000 characters.
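To put those numbers side by side, here's a quick sketch that works out the cost per 1,000 characters and picks the cheapest plan for a given monthly usage. The prices and allowances are the ones quoted in this video; the plan labels and function names are made up for illustration, and the sketch assumes dollar prices and no overage charges.

```python
# (monthly price in dollars, characters per month), as quoted in the video.
# The labels are illustrative, not official plan names.
PLANS = {
    "free": (0, 10_000),
    "22_per_month": (22, 100_000),
    "100_per_month": (100, 500_000),
}


def cost_per_thousand(price: float, characters: int) -> float:
    """Price paid for every 1,000 characters of the monthly allowance."""
    return price * 1000 / characters


def cheapest_plan(characters_needed: int) -> str:
    """Lowest-priced plan whose allowance covers the characters needed."""
    covering = [
        (price, label)
        for label, (price, allowance) in PLANS.items()
        if allowance >= characters_needed
    ]
    if not covering:
        raise ValueError("no listed plan covers that many characters")
    return min(covering)[1]


for label, (price, allowance) in PLANS.items():
    print(label, round(cost_per_thousand(price, allowance), 2), "dollars per 1,000 characters")
```

By this math the 22-dollar plan works out to about 22 cents per 1,000 characters and the 100-dollar plan to about 20 cents, so the bigger plan is only slightly cheaper per character.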

So not only is the text-to-speech some of the best I've come across, it's actually some of the most reasonably priced. Anyway, let me know your thoughts on all of this. I know there are a lot of ethical concerns, and I know there are probably going to be some legal battles ahead with technology like this. But I really, really hope they get it figured out and find ways that people can use this platform more ethically, because I really love the voice generations that come out of it. And if there's any one text-to-speech tool that I'm probably going to play around with much, much more myself, it's ElevenLabs.

Future Tools

It's really that good. And if you want to keep your finger on the pulse of more really cool technology coming out in the AI space, make sure you check out futuretools.io. This is where I curate all of the really cool tools that I come across. And I add an average of something like 20 new tools a day. Really, really cool stuff. You can find it at futuretools.io. Thanks so much for tuning in. I hope you enjoyed this one. See you guys in the next one.