The BEST LOCAL AI Voice Cloning TTS Pipeline - Tortoise TTS + RVC

57,069 views

Jarods Journey

1 year ago

Links referenced in the video:
ZERO-code Tortoise TTS installation - • Local AI Voice Cloning...
Tortoise TTS Playlist - • Tortoise TTS
RVC Playlist - • RVC (Retrieval-based V...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Mother Board - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/JarodMica
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoffee.com/jarodsjo...

Comments: 125
@SyntheticVoices 1 year ago
A one-click pipeline would be awesome.
@daniellewis6228 1 year ago
Jarod, I just found this channel, and you are going to blow up, man. You earned it. Thank you for existing.
@Jarods_Journey 1 year ago
Appreciate it 🙏🙏!
@WillyFlowerz 1 year ago
Excellent video, man. Can't wait for the one-click pipeline, that would be awesome!
@isaac10231 1 year ago
This is so funny, I saw every AJATT channel in the last 2-3 years except yours.
@Jarods_Journey 1 year ago
LOL, I never went full AJATT. I think it's a good concept for getting fluent quickly, but I never dived in hard 😂
@JohnSmith-vs6yy 11 months ago
This needs to be updated now. TorToiSE-TTS-Fast has gained the 5x GPU performance lift in training that 11Labs uses. 11Labs is still REALLY fast for training. Did they implement some of that "5-second voice clone source sample" training technology that was talked about online before? Also, what's the deal with the current model generation creating such noisy, muted output? UVR is okay, but it would probably be of benefit to add an EQ filter pass to the output to enhance it. And what about higher sample rates for output? Can we resample it to CD-quality 44.1 kHz?
@user-ob4cj9dq8w 1 year ago
This is really fantastic.
@nottomayo 1 year ago
You're a legend, mate.
@benboo8895 11 months ago
Hey Jarod, that's amazing... crazy to see how good the AI voice can sound with a proper technique. How is the process of merging the work steps going? Keep up the great video quality. You're quite helpful! :)
@Djsong4u 1 year ago
That's crazy good, thank you.
@RamInMinecraft 1 year ago
Thanks man, everything works.
@MaisnerProductions 11 months ago
Super good info!
@Artholos 1 year ago
This is really fantastic. The one thing that would make this whole system truly out of this world would be to be able to use multiple GPUs on each step. Good sir Jarod, do you know if it’s possible, or how, to generate on Tortoise with multiple GPUs? I’m searching for this answer but I’ve yet to find it 😅
@user-dr7fi1xn8k 7 months ago
Thanks for this idea, Jarod! I have an issue, though. When I generate clips from Tortoise TTS (mrq) and then use those clips inside RVC, RVC changes the voice a tiny bit. How can I make sure RVC doesn't change the way words are said? I am using the same voice in Tortoise and in RVC.
@dzhan84 1 year ago
Great content, thanks. Could you please post a link to the video about the audiobook creator you referred to at the beginning of your video? I couldn't find it on your channel.
@Qareio 1 year ago
What does the new setting in the AI voice changer mean? It's under input and output, and it's called monitor.
@ced.studios 1 year ago
Can you use Tortoise-trained models interchangeably with RVC?
@scottmurray2776 10 months ago
I've finally gotten everything working, but I have noticed that when it slices the audio for training it cuts it into 57 small files, so it only trains from a small portion of the full audio. Also, no matter how long I train for, when I finally do a text-to-speech generation it never really sounds anything like the device, and it has either an American accent or a really posh-sounding southern-UK one. (I'm from northern England.)
@SumriseHD 1 year ago
Hey! Are there alternatives to RVC? Which is the best for TTS?
@hottyblah123 1 year ago
I set up the program the right way and it worked fine for a while, but then I got the latest update to version 3.8 and the audio won't play through any of the applications, though I can still hear it otherwise?
@Jarods_Journey 1 year ago
Not sure, it could be a package dependency issue where you'll have to reinstall many things. You could get away with deleting some things and then retrying the installation.
@Airbender131090 1 year ago
How did you clone your voice?! I trained on an 8-hour voice sample for 12 hours and got horrible results…
@GraveUypo 1 year ago
I've been playing around with RVC, and this is exactly the sort of thing I was thinking of trying. Well, first I wanted to see what it would do with audio from my melodica, THEN this. Anyway, I don't want to keep paying ElevenLabs, with their unfair character resets at the end of the month and total lack of control. Ideally I would want to try this with Bark instead, since Bark gives way better control than Tortoise. But I'm just too mentally drained from work and all sorts of projects I have going (learning to read Japanese, which so far has been more fun and way easier than I thought it would be; making a visual novel, which is why I need the AI voices; and fixing my broken 3D printer) to try to get Bark and Tortoise to work. I know it won't be difficult once I try, but the activation energy is more than I have right now.
@Jarods_Journey 1 year ago
Lol, wish you the best on all of those. I'm looking into Bark, so hopefully there'll be something out on my channel when you regain your energy.
@GraveUypo 1 year ago
@Jarods_Journey I've tried it. The raw Tortoise voice was very disappointing, but an RVC pass COMPLETELY fixed it. From my short testing it's on the level of ElevenLabs, or at least good enough that I don't mind the difference (I actually prefer my local voice, since I have more control and it doesn't sound like it's always in a hurry). Good stuff.
@reijin999 8 months ago
Is there a way to do the reverse, using RVC models as voices for Tortoise?
@mik3lang3lo 1 year ago
I will give Tortoise a chance, thank you.
@stevecato 10 months ago
A pipeline would definitely be good. One-click, no; I would like to be able to run this from other software for complete automation. It would also be nice if it handled segmenting, so you could give it a long text input file and come out with the spoken version. Thanks.
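Editor's note: the segmenting idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Tortoise TTS or the mrq web UI splits text. The function name `split_into_chunks` and the `max_chars` limit are hypothetical, and the sentence splitting is a plain regex that will stumble on abbreviations such as "Dr.".

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than
    being cut in the middle of a word.
    """
    # Split after ., ! or ? when followed by whitespace (naive sentence split).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to whatever TTS call is in use, one generation per chunk, and the resulting clips joined in order.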
@coexist675 1 year ago
Wanted to ask if you could maybe upload the voice data of some VTubers so I can have more in my voice changer.
@BTMYYY 10 months ago
I already have an RVC model of my friend, and now I want to use Tortoise.
@zafkieldarknesAnimation 1 year ago
Hello, please help me with this error. When I start training I get: (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte. It also prints: If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling scheduler.step() at the wrong time.
@88RunnerBlade 11 months ago
Hi Jarod. Thank you for this. I have one question: in TTS, when I generate the voice and press play to hear it inside of TTS, it sounds good, but when I download it and try to play it, it sounds much worse, with a lot of static. Why is this happening?
@Jarods_Journey 11 months ago
Not sure, this has never happened in my case.
@TheDailyMemesShow 11 months ago
Hey there, Jarod! Back again stalking your comment section 👋😊 I was wondering whether this solution has a good shot at running in the cloud? Thanks! P.S. Your channel is going to blow up very soon 💪💯
@user-ul8tr4ko6j 11 months ago
Thanks for the amazing vids! What are your thoughts on YourTTS compared to Tortoise TTS?
@Jarods_Journey 11 months ago
I've heard YourTTS, and I still think Tortoise TTS is better.
@KenDoStudios 11 months ago
Could you teach us how to install Tortoise TTS + RVC so it looks like yours?
@DaveK-by3wq 1 year ago
If I trained a model in Tortoise, can I use it in RVC?
@aniversext 5 months ago
Can I do this for any language?
@sukhpalsukh3511 1 year ago
Great
@chaks2432 11 months ago
Is there a website similar to CivitAI but for voice cloning models?
@sownheard 1 year ago
How do I use the .pth files that I have downloaded for Tortoise TTS + RVC? Where should I place them so that the voices can be loaded in? I found the solution: the .pth needs to be in a folder with at least one audio file before you can run (Re)Compute Voice Latents and generate audio. File location = C:\YourPathLocation\ai-voice-cloning\voices
@emperorjustinianIII4403 1 year ago
I don't understand it. I placed my .pth in that path, but it can't seem to find it in the GUI voices.
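Editor's note: going only by the two comments above, a quick way to check the layout is a short Python script like the sketch below. It assumes each voice lives in its own subfolder of `voices`, next to at least one audio file; that is a reading of the comment, not something confirmed against the ai-voice-cloning repository. The function name and the list of audio extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever your files use.
AUDIO_EXTS = {".wav", ".mp3", ".flac"}


def check_voice_folders(voices_dir):
    """Inspect a voices directory.

    Returns (report, loose_pth) where report maps each subfolder name to
    (has_pth, has_audio), and loose_pth lists .pth files sitting directly
    in voices_dir instead of inside a voice subfolder.
    """
    root = Path(voices_dir)
    report = {}
    loose_pth = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            suffixes = {p.suffix.lower() for p in entry.iterdir() if p.is_file()}
            report[entry.name] = (".pth" in suffixes, bool(suffixes & AUDIO_EXTS))
        elif entry.suffix.lower() == ".pth":
            loose_pth.append(entry.name)
    return report, loose_pth
```

A voice whose entry is `(True, False)` has the model file but no audio beside it, and anything in `loose_pth` was dropped straight into `voices` rather than into a subfolder; either may be why a voice does not show up in the list.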
@hypukpetersberg9068 1 year ago
Hello, thanks a lot for your guides! Can you please tell me the name of the microphone you use?
@Jarods_Journey 1 year ago
Behringer XM8500 with a Xenyx 802 sound board :)
@lemonaide6101 1 year ago
I kinda hoped there'd also be a version for Google Colab, because I can't really run it 😅
@speedeespeedboi9527 7 months ago
Is there a community for Tortoise TTS? I need to ask a question.
@HASHIM8ALHASHIM 1 year ago
Hello Mr. Jarod, can you please tell me how to make a dubbed video in another language with voice cloning and TTS that simulates feelings and emotions, for example 3 minutes of an anime or movie? ❤❤❤ Thank you.
@Poiyo 1 year ago
I once listened to Ina (from Hololive) reading the Spice and Wolf light novel. She didn't do the entire series, and I really wish she had. With this kind of thing, that will be possible, albeit with a lack of emotion.
@Jarods_Journey 1 year ago
Opens up a whole new world tbh, I see fan-made content getting a boost in the future.
@JesseJuup 1 year ago
Lack of emotion, for now... In 1-2 years the emotion will be there.
@Not_criptikl 1 year ago
How do I get the voice changer to work on Discord?
@MolobOfficial 1 year ago
I have problems connecting my Sony WH-100XM4.
@CarrloX 1 year ago
Does it only work with voices in English?
@trilogen 22 days ago
Can Tortoise TTS be used for commercial purposes, i.e. in KZfaq videos, TikTok, etc.?
@chunchunmaru1688 1 year ago
Hey! I appreciate all your videos. I was wondering if you would make a tutorial on how to make any AI model sing songs?
@Jarods_Journey 1 year ago
I think this is already done with the RVC one lol
@chunchunmaru1688 1 year ago
@Jarods_Journey Isn't RVC only for talking? I meant having a popular song sung by the AI itself.
@Jarods_Journey 1 year ago
@chunchunmaru1688 Not yet. You still need to do that manually, like with Vocaloid. AI can't yet "sing" on its own.
@Bigjuergo 7 months ago
Did you finish the one-click pipeline?
@ezrachua1317 11 months ago
If there's a ComfyUI-style pipeline for this, that would be cool.
@user-rr4pj4jh6d 1 year ago
Can I use it on Windows 11?
@endersteve27 1 year ago
Can you also teach us how to remove it?
@hdhdhvdjgdjjdbjdb5541 1 year ago
Is an i5-12500H with an RTX 3050 enough? I'm new to AI and really captivated by learning it.
@Jarods_Journey 1 year ago
You should be able to start with that, but there will be a lot of things you run out of memory for due to the 4 GB of VRAM. These can be resolved with smaller files, but you may eventually have to upgrade to get more.
@daryladhityahenry 8 months ago
Okay, I've watched so many of your videos, and I have one question. I'd really like to run it on my local computer instead of Google Colab if possible. Is an RTX 2070 Super enough to run all of this? I don't mind waiting 24 hours for training, etc. Is it possible, or is it not enough? Thank you for sharing these.
@Jarods_Journey 8 months ago
Yup, a 2070 Super is enough. I believe that one has 8 GB of VRAM.
@daryladhityahenry 8 months ago
@Jarods_Journey Wogh! Nice! Thanks :D:D
@PenguinjitsuX 6 months ago
OK, this was hilarious, because even after you switched back to your real voice, I thought it was still the AI voice LOL. I then tried it in reverse, and the AI voice sounded like your real voice to me. (Funny how I get tricked just by the order.)
@Jarods_Journey 6 months ago
😂 that's some human psychology for you right there
@daffertube 1 year ago
Awesome tutorial! Thanks for spending $5 lol
@XXXIVth 1 year ago
I’ve fallen down a rabbit hole. I'm catching up, but it’s a lot to take in. I’m going to try everything you’ve taught. Even then, though, once you’ve come up with your “one-click pipeline” (assuming it’ll be like selling an easy button for all this), you have a buyer here, and I’m sure you’ll make a sh** ton.
@jonidimo 1 year ago
Is there any multilingual open-source model?
@Jarods_Journey 1 year ago
This would be it, but it requires a bit of tinkering with the tokenizer to use it for other languages.
@ayron419 10 months ago
Is there any way to load in pre-trained models for Tortoise? I haven't been getting very favorable training results, which I'm working on improving, but I'm curious if this is possible in the meantime.
@Jarods_Journey 10 months ago
If you have someone else's .pth models, you can use those instead for the autoregressive decoder model.
@ayron419 10 months ago
@Jarods_Journey Thanks! Are you aware of any sites for these? I haven't been able to find any.
@Andalusic 9 months ago
@ayron419 lmk if you find
@LEGENDSNEVERDIE720 1 year ago
Can you make a Colab tutorial as well?
@sergialbert97 10 months ago
How well does it perform with multilingual transformations?
@Jarods_Journey 10 months ago
It can be done in Tortoise, but I've only gotten a successful English-speaking voice as of rn.
@daniellewis6228 1 year ago
Does anyone know any good Colabs for Tortoise TTS?
@Jarods_Journey 1 year ago
The mrq repository has an ipynb file that can be used on Colab, but unfortunately I haven't tried to get it running.
@heerarodriguez9563 1 year ago
Is it possible to use RVC models on Tortoise for TTS? If not, any good places to download TTS models?
@Jarods_Journey 1 year ago
Not possible ATM to my knowledge, and I'm not sure of places to download.
@heerarodriguez9563 1 year ago
@Jarods_Journey Thank you! Keep putting out that good content, bro. You're the only one who goes through all these programs and even hosts your own VTuber.
@Zielloss 1 year ago
@Jarods_Journey So do you make all your models in Tortoise? It seems a lot slower for training than RVC. I've been trying to figure out how to use my RVC models on Tortoise lol.
@omnipresence8089 1 year ago
You should try 11 Labs with the v2 model. The similarity is a LOT better. I can actually clone anime English dub voices, which I couldn’t do before with v1. Tortoise never really worked for me, but you seem to have figured it out. RVC is great. I have made some funny covers with it.
@jjjbeastfd 1 year ago
But 11 Labs is not free.
@kftnight6598 1 year ago
Tortoise is very flaky and the audio quality is not as good. But yeah, it’s free, and I think this combination of Tortoise and RVC is a very good alternative.
@jjjbeastfd 1 year ago
@kftnight6598 How long does it take to generate, and what’s the word limit per generation?
@_nom_ 1 year ago
Costs too much. We need to do thousands of lines.
@Kemahudson 8 months ago
So I have yet to get RVC to accept any of my trained voices from Tortoise TTS. They work in Tortoise just fine, but when I try to load them into RVC I get AttributeErrors. I have tried both RVC1 and 2. :/ Also, love your videos! They have been a big help in my AI journey!
@Jarods_Journey 8 months ago
RVC doesn't take Tortoise models (voices); you need to train an RVC model for that.
@Kemahudson 8 months ago
@Jarods_Journey Thanks for the reply! Yeah, I figured that out eventually lol. I am still learning. Ended up training the voice in RVC. Keep making videos, btw. I am not sure I would even be as far along as I am without your guides and info.
@Kemahudson 8 months ago
@Jarods_Journey I was mainly just confused, since I assumed you were using the same model of yourself in both Tortoise and RVC. Once I figured out that this wasn't possible, I realized that I had to train my model in RVC as well as the one I trained for Tortoise.
@FenrirRobu 1 year ago
I'll try to connect Tortoise to RVC.
@user-vg8fz3ux9f 1 year ago
It has a major issue, which is the inability to fine-tune languages other than English. I've tried to address this problem by replacing the pre-trained model and tokenizer, but it resulted in various errors. Finally, I came across the author's statement that the model is trained specifically for English, and if we want to use models for other languages, we would need to retrain them within this software, which presents significant challenges in terms of data quantity and time requirements.
@Blacktacmaus 1 year ago
Please help me, I get an error that says torchaudio not found.
@Jarods_Journey 1 year ago
You need to uninstall and reinstall torch.
@Blacktacmaus 1 year ago
@Jarods_Journey Thanks
@Segaco4 11 months ago
I'm a little confused, is RVC a TTS thing?
@danzirvine 10 months ago
No, RVC is a voice-cloning modelling system. It cannot produce audio by itself; instead it needs another source to apply itself to, like a face filter on Snapchat. So the TTS here generates the baseline audio, and RVC then acts as a filter over it.
@axerawr 1 year ago
WTF, Marine Senpai speaks English!
@adnansukruozturk9717 1 year ago
Where's the link?
@Jarods_Journey 1 year ago
Description
@adnansukruozturk9717 1 year ago
@Jarods_Journey I couldn't see it.
@adnansukruozturk9717 1 year ago
@Jarods_Journey Can you send it on chat?
@adnansukruozturk9717 1 year ago
@Jarods_Journey Brother, there's no link in the description. Will you send it, pls?
@monsterking7676 1 year ago
...If it can't properly say "AI" or other words, just spell it phonetically, using words or spellings that produce the right sound: "Ay Eye."
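Editor's note: the respelling trick in the comment above is easy to automate before text goes to the TTS step. Below is a minimal Python sketch under stated assumptions: the `RESPELLINGS` table and the `respell` function are hypothetical names, the single entry comes straight from the comment, and matching is case-sensitive and whole-word only so that "AI" inside a longer word is left alone.

```python
import re

# Hypothetical respelling table; extend it by ear for words your voice mangles.
RESPELLINGS = {
    "AI": "Ay Eye",
}


def respell(text, table=RESPELLINGS):
    """Replace whole-word, case-sensitive matches with their phonetic respelling."""
    if not table:
        return text
    # Try longer keys first so they win over shorter overlapping ones.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

Running the prompt through `respell` first keeps the original script readable while the TTS model receives the spelling it pronounces correctly.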
@pace_cruzer 1 year ago
Can we make it speak Hindi?
@Jarods_Journey 1 year ago
Yes, but you need a custom tokenizer.
@Afflictionability 1 year ago
first!
@Jarods_Journey 1 year ago
Very fast xD!
@Afflictionability 1 year ago
@Jarods_Journey These are coder hours.
@MaikeNoShinSeikatsu 1 year ago
17 seconds after upload :P But still too slow for first comment @Afflictionability
@Jarods_Journey 1 year ago
Aye, second ain't bad! Appreciate y'all :D!
@user-yr9qr3dn9o 1 year ago
Your videos are long and say nothing. Other videos cover the same content in 3 minutes.
@IronKnee963 11 months ago
Awesome video, thank you. I'm wondering, is there a website where you can download already-trained voice models? I only found one on Hugging Face.
@Jarods_Journey 11 months ago
Hugging Face is the only one I know of, but not much is being shared in terms of trained Tortoise models atm. Mainly just RVC stuff.
@IronKnee963 11 months ago
@Jarods_Journey Gotcha, I thought I was just too dumb to find anything. Thx for the fast response!
@sosososo4348 1 year ago
RVC: Please answer my question. Do you have Telegram or Facebook so I can communicate with you? I took an audio clip, reduced it to 10 seconds, and trained on it. Do you know how long training takes on my laptop (Lenovo, 10th-generation Core i5, 12 GB RAM, AMD graphics card), and whether training will succeed? Please answer my question.