FREE Voice Cloning in Microsoft Windows with Coqui TTS

Views: 37,234

Thorsten-Voice

1 day ago

Step by step tutorial on how to clone your voice for free using Coqui TTS in Microsoft Windows. All of this locally without internet connection or cloud services.
Please subscribe to my channel 😊.
kzfaq.info...
Useful links:
---
* Dependency Python
www.python.org/downloads/rele...
* Dependency eSpeak-ng
github.com/espeak-ng/espeak-n...
* Dependency Microsoft Build Tools
visualstudio.microsoft.com/en...
* Dependency PyTorch (for CUDA support)
pytorch.org/get-started/locally/
* Great tutorial on PyTorch & CUDA by ‪@CloudCastsAlanSmith‬
• PyTorch & CUDA Setup -...
* Complete process (inc. recording setup and LJspeech creation)
• Create your own Text t...
* Coqui TTS
github.com/coqui-ai/TTS
* Coqui TTS documentation
tts.readthedocs.io/en/latest/
* The used "Thorsten" recipe
github.com/thorstenMueller/Th...
* Based on this recipe
github.com/coqui-ai/TTS/blob/...
* Coqui TTS workaround for Windows "freeze" support
github.com/coqui-ai/TTS/issue...
* Introduction to Coqui TTS configuration handling
• Coqui TTS model traini...
#tts #voice #machinelearning #python #tutorial #windows #free #privacy #artificialintelligence
Chapters:
---
00:00 Intro
02:00 Dependency Python
03:35 Dependency eSpeak-ng
04:06 Dependency Microsoft Build Tools
04:38 Verify dependency installation
05:50 Create Python virtual environment
07:10 PyTorch for CUDA support
08:30 Install Coqui TTS
10:30 Start TTS Model (Voice Clone) training
12:02 Closer look at the training recipe (config)
17:38 Windows-specific change to the recipe
19:05 Running tensorboard
21:04 Showing complete configuration
21:50 Audio samples in tensorboard
23:10 Synthesize voice from TTS training
27:45 Test audio after 2 hours of training
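The "Verify dependency installation" step from the chapter list could be sketched in Python roughly as follows. This is only a minimal sketch, not what the video shows: the function name `check_dependencies`, the minimum Python version, and the assumption that the eSpeak-ng executable is called `espeak-ng` and is on PATH are all illustrative. Microsoft Build Tools are not checked here.

```python
import importlib.util
import shutil
import sys


def check_dependencies(min_python=(3, 8)):
    """Return a dict of name -> bool for some prerequisites from the chapter list.

    min_python is a placeholder; check the Coqui TTS documentation for the
    Python versions it actually supports.
    """
    return {
        # Compare the running interpreter with the assumed minimum version.
        "python": sys.version_info[:2] >= min_python,
        # Look for the eSpeak-ng executable on PATH (name is an assumption).
        "espeak-ng": shutil.which("espeak-ng") is not None,
        # Only checks that the torch package can be found, not CUDA support.
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

Running the script inside the activated virtual environment prints one line per dependency, so a missing entry shows which installation step to revisit.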
========================
To support the channel please subscribe and give videos a thumb up (👍🏽).
========================
---
- www.Thorsten-Voice.de
- github.com/thorstenMueller/Th...

Comments: 239
@john_blues 1 year ago
Yay! I've been waiting on this one. Thank you so much.
@connordissident6881 1 year ago
Thanks for listening to us and making this video!
@ThorstenMueller 1 year ago
You're welcome. I'm always happy for feedback and suggestions from my community and try to make the right content for you 😊.
@scndsky 1 year ago
Great help for figuring out all these little details you just have to know somehow. Tnx!
@guilherme1556 1 year ago
That's great you brought this tutorial for the windows community. I personally use linux to train my models, but it's awesome you are making an effort to make the windows open voice community stronger.
@ThorstenMueller 1 year ago
Yes, personally i use Linux for training, too. But model training on Windows has been requested quite often.
@ThorstenMueller 6 months ago
@@user-wc2jy4jr7r Not sure if I got you right. Do you mean "SAPI" in the context of Windows integrated TTS voices?
@manuelherrerahipnotista8586 1 year ago
Really good video man. Well explained and researched. Thanks a lot
@ThorstenMueller 1 year ago
Thanks for your nice feedback. I'm happy that you liked it 😊.
@christopherwoods3339 1 year ago
Thank you very much for your videos. I almost never subscribe but I was so thankful for these that I've been liking every one and I did subscribe. :)
@ThorstenMueller 1 year ago
Wow, that's probably some of the best feedback I've received for my work on these videos 🤩.
@toykotokyoto 1 year ago
nice! giving Windows some love :D
@ThorstenMueller 1 year ago
Thanks Josh, at least a little bit 😁.
@jonnypawan4650 1 year ago
Great and Unique Videos Always, Thank you for your time and efforts.
@ThorstenMueller 1 year ago
Thank you so much. Feedback like yours always keeps me motivated ☺️.
@CezarPopescu 10 months ago
Thanks for sharing, Thorsten! Got yourself a new subscriber (y)
@ThorstenMueller 10 months ago
Thank you and welcome 🤗.
@prakharpaw-de7vh 1 year ago
Thank you so much for this video, really helpful!
@ThorstenMueller 1 year ago
Thank you for your nice feedback 😊.
@hangtime79 1 year ago
Came here looking for some information on Coqui as I'm looking to do a voice clone for voice over work. Fantastic job.
@ThorstenMueller 1 year ago
Great feedback like yours always keeps me motivated - thank you 😊.
@user-ez4so5nj4i 1 year ago
Thank you so much!
@ThorstenMueller 1 year ago
Thank you for this really nice feedback. Feedback like yours keeps me motivated 😊.
@Vito_0912 1 year ago
Thank you for this tutorial and your entire audio series. I once started with TorToiSe, which was too slow for me. Then I found Coqui and your public voice model, which is really good and understandable, and with the factor 0.41 it is also super fast for me. For my use case, however, the pronunciation of proper names is still too odd. Through this video I could finally create my own voice model that is completely adapted to the requirements of telling stories. It still sounds a bit shaky here and there and has just 100k steps (with increasing audio material), but it is already on the way to improvement. Due to the recording conditions and my unfortunately not so great narrator voice, I even get a loss of 26-36%, so there is still room to readjust properly. For all who are interested in the stats, in case they also want to do something like that:
Specs: RTX 2070, I7-10900k, Samsung Evo 970
Step time: 0.5-0.6
Batch size (you can go higher): 20
Checkpoint_steps: 1000 (just because I am lazy and train in the middle of some idle periods, school work etc., so I don't have to wait for 10000)
Audio dataset:
Specs: HyperX idk (the RGB one) with pop filter, relatively big room
I can't make a general statement here, and if you start with the "total" you will get faster results. I trained in stages with an increasing number of audio files:
- 0-5k: about 230 files ~ 0.4h
- 5-10k: about 350 files ~ 0.6h
- 10-30k: about 500 files ~ 1h
- 30-60k: about 800 files ~ 1.6h
- 60k-100k: about 1200 files ~ 2h
Current total: 1200 ~ 2h
Milestones:
- from 10k: first hints of something other than noise (not understandable)
- from 20k: first word recognizable without knowing the text
- from 30-40k: understandable text (but not nearly speech)
- from 80k: it's okay :)
* Please note, however, that I used books and book excerpts with many proper names and Denglish (German with some English words in books) as input. This makes the training process slower in any case and generally worse (but very good in the trained areas, proper names).
Recording: For the recordings I wrote a Python script that automatically splits the text of a text file into sentences (ignoring sentences below 5 words) and outputs them. Then the recording was started automatically and stopped as soon as the sound stayed below 50 dB for one second. Then this audio was trimmed so that everything at the front and back below 50 dB is dropped (to guarantee speech starts instantly) and padded with 50 ms of silence. Then it was normalized and saved in LJSpeech format. A delete function is included.
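The text-preparation part of the recording workflow described above could look roughly like the sketch below. It is not the commenter's script, which was not shared: the function names, the punctuation-based splitting rule, and the `rec_00001` style file ids are assumptions for illustration. Only the "ignore sentences below 5 words" rule and the LJSpeech row layout (id, text, normalized text, separated by `|`) come from the description. Recording, silence trimming and normalization are left out.

```python
import re


def split_sentences(text, min_words=5):
    """Split text after ., ! or ? and drop sentences with fewer than min_words words."""
    # Split at whitespace that follows sentence-ending punctuation.
    candidates = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in candidates if len(s.split()) >= min_words]


def ljspeech_line(index, sentence, prefix="rec"):
    """Format one metadata.csv row in LJSpeech style: id|text|normalized text."""
    # The id doubles as the WAV file name without extension (hypothetical scheme).
    file_id = f"{prefix}_{index:05d}"
    return f"{file_id}|{sentence}|{sentence}"


if __name__ == "__main__":
    sample = "This is a short test sentence. Too short. Here is another sentence with enough words!"
    for number, sentence in enumerate(split_sentences(sample), start=1):
        print(ljspeech_line(number, sentence))
```

A plain punctuation split is a deliberate simplification; abbreviations such as "z.B." would be cut in the wrong place and need extra handling.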
@ThorstenMueller 1 year ago
Thanks for sharing your great setup and training step times 👏😊. This will help other users for sure. I agree that pronouncing foreign words is still a challenge.
@bifrostbeberast3246 11 months ago
Hi, sounds great! I want to train an anime voice from a particular voice actress but am struggling to get more than 20 minutes of clean audio. Do you have any suggestions for me? I need a child voice in English for my use case (virtual assistant with visual representation as an anime character). I am thankful for any help and advice! I have seen ads where they promise good results with just a few seconds of audio. I wonder how that is possible, when we actually need many hours of audio.
@loiclacaille8683 8 months ago
Your content is amazing, really useful. Thx.
@ThorstenMueller 8 months ago
Thanks a lot for your nice feedback 😊. I'm always happy to hear if people find my content helpful.
@MrArdo-branch-main 1 year ago
This is very well explained. Thank you Thorsten-Voice, this video helps me to continue my hobby and research.
@ThorstenMueller 1 year ago
Thank you. Nice feedback like yours always keeps me motivated to continue this journey ☺️.
@nestboxcam-Surabaya 1 year ago
Thank you for this
@ThorstenMueller 1 year ago
You're welcome 😊. I hope it's been helpful for you.
@seansean995 7 months ago
i subscribed 1st video great teacher!!!!!!!!
@ThorstenMueller 7 months ago
Thanks a lot for your very nice feedback - and welcome 😊
@belalgaber555 8 months ago
I love your knowledge man
@ThorstenMueller 8 months ago
Thank you so much 😊
@anthonyschilling7132 1 year ago
I spent ages trying to get this to work and finally ended up installing WSL which made the setup work. You should make a video on how to create your own dataset for training! Kind regards from the USA!
@ThorstenMueller 1 year ago
So now you have another way to train a TTS model in addition to wsl. Hope you enjoyed this video 😊. I've created a tutorial on recording and creating a voice dataset here: kzfaq.info/get/bejne/ar-Ea7qLucXcZGw.html
@anthonyschilling7132 1 year ago
@@ThorstenMueller Ah very cool, I'll have to give that a shot. I've been using OpenAI's Whisper to transcribe audio I downloaded from youtube videos and podcasts and it's getting close. But I think I need to do a better job cleaning up and organizing the audio I download. Any suggestions for how large the dataset should be when using VITS? I've been using about 1-3 hours of clips and it's starting to sound ok...but I'm guessing I just need more and cleaner data. Thanks again!
@ThorstenMueller 1 year ago
@@anthonyschilling7132 My voice datasets are way longer - at least 10k recordings, meaning > 10 hours of pure audio. But more important might be a good phoneme coverage.
@omarharbah6972 7 months ago
A lot of thanks man !
@ThorstenMueller 6 months ago
You're very welcome 😊.
@masamiakita993 1 year ago
Thanks a lot!!
@ThorstenMueller 1 year ago
You're very welcome 😊.
@o_ortcloud 1 year ago
Nice thank youu
@ThorstenMueller 1 year ago
You're very welcome 😊.
@user-kz1hh3jz9t 1 year ago
thank you for your video, it's great work
@ThorstenMueller 1 year ago
You're very welcome. Happy it's helpful for you 😊.
@bifrostbeberast3246 11 months ago
Thank you very much for your videos, Thorsten! You definitely have my subscription! Kind regards from Taiwan :)
@ThorstenMueller 11 months ago
Thank you so much for your kind comment and your subscription 😊. Kind regards back to Taiwan 👋.
@devinhedge 1 year ago
I love this if for no other reason it helps me learn German dialects.
@ThorstenMueller 1 year ago
So, i'm your reference for a german dialect? 😆👍
@der-putz 1 year ago
Great video once again. Is there an ATI equivalent?
@ThorstenMueller 1 year ago
Thank you very much for the kind compliment 😊. I have no experience with ATI graphics cards in this context. CUDA is primarily designed for NVIDIA cards. There is/was apparently an old project called "gpuocelot" that aimed to help in this area. But I can't really help you further with that.
@peethaer 1 year ago
You are my hero.
@ThorstenMueller 1 year ago
I probably wouldn't go that far 😉. But I'm very happy about this more than kind feedback 😊.
@phen-themoogle7651 1 year ago
I subscribed although I could only watch for a few mins because of some health problems I’m having nowadays. If possible I Would like a cool tutorial or explanation on ways to do this without downloading anything new to my computer or going through a long process, like maybe if it’s possible to do this 100% online then that would be awesome! Since technology is improving so fast nowadays I’m sure there’s some sites that have to exist where we can do this online right..
@ThorstenMueller 1 year ago
First of all, i hope you get well soon 😊. Thanks for subscribing and i agree, right now the process is not a simple 1-2-3 process, but voice cloning is getting better and for english voices it might be possible (in near future) to clone your voice easier. Not sure how perfect the cloned voice will be with a simple process, but we'll see.
@phen-themoogle7651 1 year ago
@@ThorstenMueller Thanks! I'm fluent in Japanese, and looking forward to doing this in Japanese sometime too.
@shazams461 1 year ago
Okay 👍🏻👍🏻
@Supratim-jc9kz 1 year ago
Thanks for the video. Also can you make a video on how to run tortoise tts locally on your computer.
@ThorstenMueller 1 year ago
Thanks for your comment 🙂. I've TorToiSe TTS already on my TODO list.
@Supratim-jc9kz 1 year ago
@@ThorstenMueller tyvm
@user-qj1br8ze7x 1 year ago
Thanks for the tutorial. It's really helpful. Can you also make a tutorial on how we can use Coqui TTS to fine-tune YourTTS for a low-resource language with better quality? That would be really helpful. Thanks and keep inspiring :)
@ThorstenMueller 1 year ago
Thanks for your nice feedback. So you mean a model that is fast enough for e.g. a Raspberry Pi but with a high quality?
@user-qj1br8ze7x 1 year ago
@@ThorstenMueller With low resource language I mean Hindi, Korean, Arabic etc
@ThorstenMueller 1 year ago
@@user-qj1br8ze7x Okay, sorry, I got that wrong 🤦‍♂. Not sure on that. Maybe you can get a good answer by asking this good/important question in the Coqui TTS community.
@TheCeratius 1 year ago
Hi Thorsten, thanks for this awesome tutorial which worked perfectly on my machine. However, I trained my model and it's great but not perfect. Is there an option to continue training with this model instead of training a new one (which would take ages just to get to the point where i am now)? I am relatively new to python, so I am not sure if I just have to modify the training script a little or if there is a command somewhere which does this, or if it's just not possible. If you could give me a pointer that would be great!
@ThorstenMueller 1 year ago
Thanks for your nice feedback 😊. You're looking for restore_path and/or continue_path. I've made a special video tutorial on continuing a TTS model training from a previous step checkpoint. kzfaq.info/get/bejne/hZx7q62Dnpu7oHk.html
@TheCeratius 1 year ago
@@ThorstenMueller wow, i didn't see that. Sorry about that and thanks a lot for the quick reply and help!
@vadzimyesman7693 11 months ago
Great tutorial! Thank you for all the details! I have a question though about the training process and dataset. I used 102 samples for my dataset. In order to record them I used Audacity with default recording settings (mono, 44100 Hz, 32-bit float). For the recipe file, I used the one you show in your video (named something like a "youtube recipe"). After 1000 epochs I checked the results by synthesizing some words and sentences using tts-server. It sounded very slow, not normal. While checking the config.json file I found out that the sample rate was set to 22050. After I changed it to 44100 and restarted the tts-server, the voice sounded closer to mine, but the quality is still really bad. Could the fact that all the samples were recorded at 44100 Hz affect the whole training since the default sample_rate in that config.json file is 22050, or is it irrelevant and I just need to train it more? Or do I need to start over using samples recorded at 22050 Hz?
@ThorstenMueller 11 months ago
Thanks for your nice feedback on the details in my tutorial 😊. I guess that you might not get great results with just 102 recordings. Did the training process run even though the sample rate did not match? I would have thought this should abort the training process. However, just changing the value after training, only for synthesis, will not work. The sample rate in the config and the sample rate of the wave files must match before starting the training process. It doesn't matter whether it's 22 or 44 kHz, as long as the config matches reality 🙃
@vadzimyesman7693 11 months ago
@@ThorstenMueller The training process did run even though the sample rate did not match, 1000 epochs.
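The mismatch discussed in this thread can be caught before training with a small check like the one below. This is a minimal sketch using only Python's standard `wave` module and is not part of Coqui TTS; the function name `find_sample_rate_mismatches` is hypothetical, and the default of 22050 Hz is simply the value the commenter found in config.json. The `wave` module only reads PCM files, so 32-bit float WAVs would have to be converted first.

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_dir, expected_rate=22050):
    """Return {file name: actual rate} for WAV files whose rate differs from expected_rate."""
    mismatches = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # Read only the header information; no audio data is loaded.
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches[wav_path.name] = rate
    return mismatches


if __name__ == "__main__":
    # "wavs" is an assumed folder name, matching the LJSpeech layout.
    if Path("wavs").is_dir():
        for name, rate in find_sample_rate_mismatches("wavs").items():
            print(f"{name}: {rate} Hz")
```

An empty result means every file already matches the configured rate; otherwise either the recordings need resampling or the sample rate in the training recipe needs changing before training starts.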
@user-yx8yd3bo2j 9 months ago
Hello, thanks so much for the video. I'm in the process of training a custom VITS TTS model using a dataset that I've created. Around the 200,000-step mark, the average loss on my trainEpochstats/avg_loss_1 is creeping up. My dataset is fairly small, approximately 1 hour in length, but it does have good coverage of phonemes. When I tested the audio, it had the correct voice quality but the speech was nonsensical. Should I halt the training to expand my dataset, or is it typical for models to require more training steps to produce meaningful audio?
@ThorstenMueller 9 months ago
You're welcome 😊. If your dataset is nicely phonetically balanced it should produce usable results. My VITS model has been trained (I guess) for 600k steps, so there might be room for more training. But maybe you can ask this in the Coqui TTS GitHub discussions, because there are real pros in machine learning there. If available, add some Tensorboard screenshots for analysis.
@amaarboss2115 1 year ago
Hello, Mister @Thorsten, I wanted to know how you do the training a thousand times, and yet the sound does not sound clear, but when I use your voice through the tts-server, the sound appears very clear .... How did you train your voice? (which is on the server) and Thank you for this great effort.
@ThorstenMueller 1 year ago
Thanks for your feedback. The training in this video is just for the demo. With 3,000 steps there cannot be a clear voice. My publicly released models with tts-server have been trained for over 2 months with around 600,000 steps. Does this explanation help you?
@amaarboss2115 1 year ago
@@ThorstenMueller Thank you for this useful information. The picture is now clearer
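To get a feeling for the gap between a 3,000-step demo and a 600,000-step model, the wall-clock time can be estimated from the time per step. The sketch below is plain arithmetic, not a Coqui TTS feature; the function name is hypothetical, and the 0.5 seconds per step is an assumption borrowed from the step time another commenter reported for an RTX 2070 (taking that figure to be in seconds). Slower hardware takes proportionally longer.

```python
def training_hours(total_steps, seconds_per_step):
    """Estimate wall-clock training time in hours for a constant step time."""
    return total_steps * seconds_per_step / 3600


if __name__ == "__main__":
    # Step counts mentioned in the comments; 0.5 s per step is an assumption.
    for steps in (3_000, 100_000, 600_000):
        print(f"{steps:>7} steps: about {training_hours(steps, 0.5):.1f} h")
```

The estimate ignores evaluation runs, checkpoint writing and pauses between sessions, so real training takes longer.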
@anaveragegoogleaccountname 1 year ago
I would have appreciated you breaking down how the audio samples should be formatted, maybe a bit more explanation of the code and also torch audio does not install along with torch either.
@ThorstenMueller 1 year ago
Thanks for your suggestion. I thought diving too deep into the code might be hard to follow, but I'll think about a more detailed video, which will be longer though.
@pocketsfullofdynamite 1 year ago
Which graphic card do you use pls. Thanks for the info.
@ThorstenMueller 1 year ago
In this video i've used an NVIDIA GTX 1050 Ti. But for my other models training i use an NVIDIA Jetson Xavier AGX.
@mi16chap 11 months ago
Hi Thorsten, thanks for putting the video together. When I try to run my version of your train_vits_.py script, I get an error saying ModuleNotFoundError: No module named 'TTS.tts.configs.shared_configs'. Any pointers? (I tried to add the project path to my system environment variable, but no luck)
@ThorstenMueller 11 months ago
Hi, are you in your Python venv? Does "pip list" show a TTS package?
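The two questions in this reply can also be answered from Python itself. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `installed_version` are hypothetical, and "TTS" is the package name used with `pip install TTS`.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """Return True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("TTS version:", installed_version("TTS") or "not installed")
```

If the script reports that no virtual environment is active or that TTS is not installed, the training script is probably being run with a different interpreter than the one Coqui TTS was installed into.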
@MrAngryWh1te 10 months ago
Hello! Thanks for the tutorial! Just finished training. My bot can't string letters into words at all. I would like to ask you what scale the dataset should be, and is it possible to speed up the training with Google Colab?
@ThorstenMueller 10 months ago
You are welcome 🙂. Not sure what you mean by "letters into words"? Do you mean, as example, "TTS" vs. "T T S"? pronunciation? Google colab provides simple GPU power which is far better than CPU, but it disconnects sessions regularly (in the free edition).
@MrAngryWh1te 10 months ago
@@ThorstenMueller First, thanks for the reply! I mean my bot can't say a word, it's more like a monster roar (like grr). But at the same time, he can change the tone of speech, using, for example, an exclamation mark. I asked about the dataset in my first comment because I think it's my problem and the quality of my dataset is not high enough.
@justelesnews 1 year ago
Hi, nice video ! Could you tell me what you think of the new arduino for speech recognition ? -> nicla voice
@ThorstenMueller 1 year ago
Personally i've no experience with arduino. You think it's worth to check this topic?
@justelesnews 1 year ago
@@ThorstenMueller I don't know. Arduino says this is the first time that we can recognize voice commands with neural decision processor, ultra low power consumption and very good recognition. I don't know if it's true or not. It's expensive but I think I'll give it a try
@MatyssMatyss 4 months ago
hello! I just wanted to know how many audio files I need to clone a voice, since I just recorded like 50 wav files but when I start the trainer the script fails since "there is no sample left"
@ThorstenMueller 4 months ago
I guess 50 is way too few. I recorded over 10k wave files for my German "Thorsten-Voice" voice clone. Maybe give it a try with 1000 recordings.
@kostas9849 1 year ago
Hello,i just subscribe to your channel and i have one question:does this work with foreign languages or only english?
@ThorstenMueller 1 year ago
Thank you for joining my channel 😊. This will work in other languages as well. I've created an earlier video (not Windows specific) with some more detail if that's helpful for you. kzfaq.info/get/bejne/ar-Ea7qLucXcZGw.html
@kostas9849 1 year ago
@@ThorstenMueller Thank you so much,you are the best!
@zsoltvastagh7023 1 year ago
awesome tutorial, thank you... unfortunately, it keeps getting interrupted with a multiprocess error before the last step, I'm looking for a solution to solve the error. If others have succeeded, and I see in the video that it works for you, maybe it will work for me too. :) Could there be a difference between Windows that could cause this error?
@ThorstenMueller 1 year ago
Thanks for your nice feedback 😊. Different Windows version might be a reason. Which version do you use? Is there an error message shown?
@jaylee6488 1 month ago
Hello Thorsten: I tried to figure it out by myself following the steps, but somehow it doesn't work. Can I make an appointment with you for about half an hour, so that you can give me some guidance?
@ThorstenMueller 27 days ago
You can contact me by using my contact form here, but it might take some time until i can respond. www.thorsten-voice.de/en/contact/
@qodeninja 1 year ago
cool video, can you do this with a docker setup, sans windows?
@ThorstenMueller 1 year ago
Thanks for your feedback 🙂. Do you mean training a TTS model using Coqui TTS inside a Docker container?
@qodeninja 1 year ago
@@ThorstenMueller yes, exactly. is that even possible or do you need GPU? I want to be able to use my local NAS for something more than a filestore so I was wondering if this was possible
@qodeninja 8 months ago
@@ThorstenMueller yes please
@cmyk8964 11 months ago
I started training the model, and after 8 hours, only 2 epochs were completed. Is this normal and do I need to complete all 1000?
@ThorstenMueller 11 months ago
What do you mean by "completed"? Normally the training process runs until you stop it manually. Did training end automatically?
@mukhamejantalap4526 4 months ago
hey, I am trying to train my model on my language (Kazakh) following your tutorial. It has been training for over 1 day, but I am getting some weird noises of speakers. I didn't see that you change or add any symbols, so neither did I. Do I need to add the alphabet of my language?
@ThorstenMueller 3 months ago
In general one day is not much time for training a tts model. Do you use phoneme or character based training?
@mukhamejantalap4526 3 months ago
@@ThorstenMueller I've used phoneme based. Well I was thinking maybe at least I will get something. The data was containing over 12k audio samples with a lot of speakers, each speaker has 250 samples. Maybe because of that the feature it didn't match.
@EzmiTV 8 months ago
Hi! Everything works fine, thanx! Except that it refuses to handle accented Hungarian characters (éáűőúöüóí). Does it need to be converted somewhere to handle these letters as well? For sentences without an accented character, it is perfect.
@ThorstenMueller 8 months ago
Do you mean you have problems training the model with these chars, or did training run well and you're having problems synthesizing? Have you trained using phonemes or characters? Maybe you can run this script on your dataset and add any special chars to your config. github.com/coqui-ai/TTS/blob/dev/TTS/bin/find_unique_chars.py
@EzmiTV 8 months ago
@@ThorstenMueller Yes, "abcdefgh..." - ok. "éáőúöüó..." - omits it from the speech. A new config.json is created in the new folder at every start. Where can I add the returned values to the configuration?
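The idea behind the linked script, collecting every character that occurs in the dataset transcripts, can be illustrated with a few lines of standard-library Python. This is a simplified sketch and not a copy of `find_unique_chars.py`: the function name and the assumption that the text to check sits in the third `|`-separated column of an LJSpeech style metadata file are illustrative. Where the resulting characters have to be entered depends on the recipe and the Coqui TTS version and is not covered here.

```python
def unique_characters(metadata_lines, text_column=2):
    """Collect every distinct character used in the given transcript column."""
    chars = set()
    for line in metadata_lines:
        columns = line.rstrip("\n").split("|")
        # Skip malformed rows that do not have the expected column.
        if len(columns) > text_column:
            chars.update(columns[text_column])
    return sorted(chars)


if __name__ == "__main__":
    # Two made-up rows containing accented Hungarian characters.
    rows = ["rec_00001|Én|én", "rec_00002|Ők|ők"]
    print("".join(unique_characters(rows)))
```

When reading a real metadata file on Windows, opening it with `encoding="utf-8"` avoids accented characters being decoded with the system code page.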
@boogeyman8099 1 year ago
How do I fix the freeze issue? I can't find anything about it other than the resource you provided (bug), which was 'closed' with the author's comment being 'we don't support windows' when you've clearly done it on Windows! I've spent a lot of time on this and would like to figure it out, any help would be appreciated.
@boogeyman8099 1 year ago
Nevermind, I didn't get to the part where you explained it!
@ThorstenMueller 1 year ago
😄, good luck :-)
@user-dt9he1br2o 1 year ago
Is it possible to combine two voices? And what sample rate should I use for the dataset?
@ThorstenMueller 1 year ago
What do you mean with "combining two voices"? I've trained my TTS models with 22kHz samplerate.
@o_ortcloud 1 year ago
I would like to train a new TTS model for a new language. Is this the same way to do that? Can you give me some advice please? It would really help me.
@ThorstenMueller 1 year ago
You're right. It's working the same way. Maybe you can watch this tutorial showing how to create a voice dataset for your new language model. kzfaq.info/get/bejne/ar-Ea7qLucXcZGw.html
@ThugLife-is1yo 1 year ago
confused where exactly did you put your voice file for training ?
@ThorstenMueller 1 year ago
You're looking for the parameter "dataset_config" in the training recipe file. There you can write the file location to your voice files (in LJSpeech format) for training.
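Before pointing the recipe at a dataset folder, it can help to verify that the folder really follows the LJSpeech layout. The sketch below assumes the common layout of a `metadata.csv` file next to a `wavs` folder, with the first `|`-separated column holding the file name without extension; the function name `missing_wavs` is hypothetical and the check is not part of Coqui TTS.

```python
from pathlib import Path


def missing_wavs(dataset_dir, meta_file="metadata.csv"):
    """List file ids from the metadata file that have no matching wavs/<id>.wav."""
    root = Path(dataset_dir)
    missing = []
    with open(root / meta_file, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore empty lines
            # The first column is the file id, e.g. "rec_00001".
            file_id = line.split("|", 1)[0].strip()
            if not (root / "wavs" / f"{file_id}.wav").is_file():
                missing.append(file_id)
    return missing


if __name__ == "__main__":
    # "my_dataset" is a placeholder path for illustration.
    if Path("my_dataset/metadata.csv").is_file():
        print(missing_wavs("my_dataset"))
```

An empty list means every transcript row has a recording; any ids returned point to rows whose audio file is missing or misnamed.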
@kostas9849 1 year ago
I need help! Inside the folder TTS - training there are some archives as you show in the video. How were these archives found there? How do I put exactly the same in the TTS - Training folder I made? And when I change directory, enter the TTS - Training folder and type the python command, nothing happens. Please could you help me on that? :(
@ThorstenMueller 1 year ago
I'm not sure if i understand your question right. So training process starts and the "output_folder" is created and filled with files. Are you already trying to synthesize voice while training? Are audio samples in Tensorboard available?
@kostas9849 1 year ago
@@ThorstenMueller I don't know how the output file was created in your video and filled with files. I followed your steps one by one, I installed Python, eSpeak-ng and Microsoft Build Tools, and when you open the command prompt I really got stuck there. I created the directory as you did, but in my directory there are not the files that you show in the video. I type the python commands but nothing happens. What did I do wrong? :(
@ThorstenMueller 1 year ago
@@kostas9849 Strange, the output directory with the training_run name and a timestamp for the training start date will be created automatically. Did cloning the Coqui TTS repo and adjusting the recipe work?
@michaelb1099 1 year ago
great tutorial but i am trying to replace my microsoft voices with my cloned voice is this doable?
@ThorstenMueller 1 year ago
Thanks for your nice feedback 😊 and great question. I tried this some time ago too, but didn't find an easy solution for this. But if this is interesting in general I might give it a closer look. Most voices seem to come out of their Microsoft Azure cloud services.
@RogueMandoGaming 6 months ago
So i'm getting as far as running the "pip install -e ." command before getting errored out with status code 1 something about wheel
@ThorstenMueller 6 months ago
Try running "pip install setuptools wheel -U" before, maybe this helps.
@gael3023 1 year ago
I have a question. How can I stop in the middle and learn from that point again?
@ThorstenMueller 1 year ago
This is called continue/restore and i've made a video on that. kzfaq.info/get/bejne/hZx7q62Dnpu7oHk.html Does this help you?
@youngphlo 8 months ago
I follow every step up until 08:33 but when I run `pip install TTS` it tries to install every version of transformers. I would share a screenshot if I could. Never seen a `pip install` go through all the different versions of a package
@ThorstenMueller 8 months ago
Maybe Coqui TTS dependencies have changed in newer releases? Could you download/clone the version i've used in the video just to check if this works.
@deeber35 1 year ago
Can you change the tone of the voice reading text {e.g. excited, sad, etc}?
@ThorstenMueller 1 year ago
Emotions aren't supported on Coqui TTS models (as far i know). Maybe SSML in Mimic 3 might be at least a little bit helpful in that context.
@andiratze9591 1 year ago
Hey Thorsten. Can Coqui be installed with all the models and functions, like on the website, so that you no longer have to type commands and can use it completely offline via the user interface?
@ThorstenMueller 1 year ago
Hi Andi, I assume you mean Coqui Studio. As far as I know, that is not part of their open-source release. So I'd say that's not possible. Only the "tts-server" command provides a locally running web frontend, which of course can't be compared with Coqui Studio.
@andiratze9591 1 year ago
Is there other software that can be used offline once everything has been set up, or at least Coqui with a few pretrained models?
@ThorstenMueller 1 year ago
@@andiratze9591 You can use all Coqui TTS models offline, just not through an interface as comfortable as Coqui Studio. Do you know this video of mine? I show that there. kzfaq.info/get/bejne/l9KgfJB107zQf2Q.html
@andiratze9591 1 year ago
Ah thanks, I thought that was just a video with terminal commands, without an existing user interface. I'll reinstall my Windows later and try it out.🙂
@andiratze9591 1 year ago
I'll try to learn Python later. Maybe I can program my own TTS-VC. It's impossible to find free software in this area that is easy to use. For everything else I find something, photo, video etc., but TTS is really bad🥴
@Hellfreezer
@Hellfreezer Жыл бұрын
Is there a way to stop and resume training? The continue path command does begin the process but it then fails when generating sample sentences.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
It's been some time since I last continued/restored a training. I guess you know my video on exactly this topic? kzfaq.info/get/bejne/hZx7q62Dnpu7oHk.html Isn't this working? Then maybe it's a bug or a changed use case in Coqui TTS.
@Hellfreezer
@Hellfreezer Жыл бұрын
@@ThorstenMueller Yes, that's the video I found the method in. I'm not sure if anyone else is having the same trouble, but I haven't been able to find a solution at present.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@Hellfreezer Is there any specific error message when running continue and while generating sample sentences?
@Hellfreezer
@Hellfreezer Жыл бұрын
@@ThorstenMueller I tried to post the full info but it seems to have been hidden. Basically the traceback ends in TypeError: expected string or bytes-like object
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@Hellfreezer There's a closed issue on that. Maybe this is helpful for you. github.com/coqui-ai/TTS/issues/2070
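For readers following this thread, resuming a run can be sketched as below. This is only a minimal sketch: it assumes the training script hands its command line to the Coqui trainer (which, as far as I know, accepts `--continue_path`), and the helper names `latest_run_dir` and `continue_command` are made up for illustration. It does not address the TypeError discussed above.

```python
from pathlib import Path


def latest_run_dir(output_dir):
    """Return the most recently modified run folder inside output_dir, or None."""
    runs = [p for p in Path(output_dir).iterdir() if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)


def continue_command(script, output_dir):
    """Build the command that resumes the newest run (hypothetical helper)."""
    run = latest_run_dir(output_dir)
    if run is None:
        raise FileNotFoundError(f"no run folders found in {output_dir}")
    # Assumption: the script forwards --continue_path to the Coqui trainer.
    return ["python", script, "--continue_path", str(run)]
```

The returned list can be passed to `subprocess.run` or simply printed and typed into the prompt by hand.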
@muhammadalfahrezi1745
@muhammadalfahrezi1745 Жыл бұрын
I want to make a new model for the Indonesian language, but espeak-ng doesn't support that language. Is it still possible to make a new model?
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Thanks for your good question. Yes, that's possible. You can set "use_phonemes" to "false" and then it will use character based training. Maybe this helps a bit. tts.readthedocs.io/en/latest/tutorial_for_nervous_beginners.html?highlight=use_phonemes
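As a rough illustration of the reply above: in a Python recipe the switch is the `use_phonemes` field of the model config, and in an exported `config.json` it is a key of the same name. The sketch below only flips that key in a JSON string; the function name `disable_phonemes` and the sample config are hypothetical, and a real config contains many more settings.

```python
import json


def disable_phonemes(config_text):
    """Return the config JSON with character-based training enabled."""
    config = json.loads(config_text)
    # With use_phonemes set to false, training works on raw characters,
    # so no phonemizer support for the language is required.
    config["use_phonemes"] = False
    return json.dumps(config, indent=2, ensure_ascii=False)


# Tiny stand-in for a real config file.
sample = '{"use_phonemes": true, "phoneme_language": "de"}'
print(disable_phonemes(sample))
```

All other keys are left untouched, so the result can be written back to the same file.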
@muhammadalfahrezi1745
@muhammadalfahrezi1745 Жыл бұрын
@@ThorstenMueller Does it still use espeak or not? The alphabet is the same as in English; only the spelling is different. Sorry for asking so much.
@Ecoute_AI
@Ecoute_AI Жыл бұрын
Sir, while running the last line, an error occurs: 'charmap' codec can't decode bytes. Please help.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Is your config file in UTF-8?
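One way to answer that question yourself is sketched below. On Windows, Python falls back to a legacy code page such as cp1252 when no encoding is given, which is a common source of 'charmap' errors. The helper names are made up, and cp1252 as the source encoding is only an assumption; your file may use a different one.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file as UTF-8, assuming it is currently in source_encoding."""
    target = Path(path)
    text = target.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))
```

Keep a backup before converting, because a wrong guess at the source encoding silently produces wrong characters.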
@magenta6
@magenta6 Жыл бұрын
Thanks Thorsten for your endless efforts at communicating a complex subject with enthusiasm and passion to people who don't know much about python. I see that you have linked another video about preparing recordings kzfaq.info/get/bejne/ar-Ea7qLucXcZGw.html
@ThorstenMueller
@ThorstenMueller Жыл бұрын
You're very welcome 😊. And yes, i'm really passionate about this topic.
@IngridUterus
@IngridUterus 6 ай бұрын
I have Python 3.11 installed. Do I have to uninstall it and install 3.8? That would really suck.
@ThorstenMueller
@ThorstenMueller 5 ай бұрын
According to the README, Python 3.11 should work (python >= 3.9, < 3.12).
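The range quoted from the README can be checked in a few lines. This is just a sketch: the bounds are the ones mentioned in the reply and may differ for other Coqui TTS releases, and `is_supported` is a hypothetical helper name.

```python
import sys


def is_supported(version=None, minimum=(3, 9), below=(3, 12)):
    """Check whether a Python version satisfies minimum <= version < below."""
    major_minor = tuple((version or sys.version_info)[:2])
    return minimum <= major_minor < below


print("This interpreter is in the quoted range:", is_supported())
```

Run it inside the virtual environment you train in, since that may use a different interpreter than the system default.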
@forgottendreams
@forgottendreams 10 ай бұрын
I can't get the pip command to work, help!!
@ThorstenMueller
@ThorstenMueller 10 ай бұрын
What error message are you receiving?
@mungamurisairamiiitdharwad7451
@mungamurisairamiiitdharwad7451 Жыл бұрын
How many samples do we need for the training?
@ThorstenMueller
@ThorstenMueller Жыл бұрын
As always, it depends 😉. With fewer than 100 the training process will not start. I recorded more than 10,000 phrases for my German "Thorsten-Voice" TTS models. But phonetic coverage might be more important than the sheer number of recordings.
@-.nocturna.-
@-.nocturna.- 9 ай бұрын
How long does it take to train a model? Kind regards
@ThorstenMueller
@ThorstenMueller 9 ай бұрын
Hello 👋. For my Thorsten-Voice models, training took around three months of 24/7 compute time. But this depends on the hardware you have available for training.
@-.nocturna.-
@-.nocturna.- 9 ай бұрын
@@ThorstenMueller Whoa, did you train it yourself? What GPU did you use? That's insanely long in these trying times of high energy prices. :/
@ThorstenMueller
@ThorstenMueller 9 ай бұрын
@@-.nocturna.- Absolutely. This is the usual trade-off between graphics performance and duration. I used an NVIDIA Jetson Xavier AGX, which has a relatively low power consumption.
@-.nocturna.-
@-.nocturna.- 9 ай бұрын
@@ThorstenMueller That's a nice one. 30 W vs. the 320 W of my 4080 :| I think I will do it if my other projects fail :P Have a nice night :>
@JamesBond-ix8rn
@JamesBond-ix8rn Жыл бұрын
How long does training take until it sounds good?
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Depends on what you mean by "good" 😉. By step 30k you should be able to hear a voice with lots of background noise. From step 100k on, the voice should be clearer. Then it's up to your personal expectations.
@JamesBond-ix8rn
@JamesBond-ix8rn Жыл бұрын
@@ThorstenMueller Thanks for the prompt response. How long does this take in hours/days/months, and approximately how much input data would I need?
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@JamesBond-ix8rn It's hard to give specific values, as it depends on the hardware you have available for training. It might be anything from some hours to weeks or months of training time. Ensure a good phonetic balance and add more recordings over time if you're not satisfied with the result.
@techterry5299
@techterry5299 5 ай бұрын
5:36 is not very clear where did that come from?
@ThorstenMueller
@ThorstenMueller 5 ай бұрын
You mean the voice dataset in this LJSpeech file and directory structure?
@MistakingManx
@MistakingManx 2 ай бұрын
Right, how should I go about creating the dataset though?
@ThorstenMueller
@ThorstenMueller 2 ай бұрын
Hi, do you know my tutorial on Piper-Recording-Studio for doing so? kzfaq.info/get/bejne/kJego9epsbrDY30.html
@MistakingManx
@MistakingManx 2 ай бұрын
@@ThorstenMueller I started following your Mimic Recording Studio tutorial and its instructions so I could make my own Coqui LJSpeech model, but it isn't working for some reason. Some files don't exist anymore, and it seems mad about numpy.
@ThorstenMueller
@ThorstenMueller Ай бұрын
@@MistakingManx Hmm, as Mimic-Recording-Studio is not actively maintained, it might stop working due to newer package versions (like numpy). I'd use Piper-Recording-Studio, as it will generate an LJSpeech-like dataset too.
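For readers unsure what an LJSpeech-like dataset looks like: it is a folder with a pipe-separated `metadata.csv` and a `wavs` subfolder holding one WAV file per entry. The sketch below checks that layout under these assumptions. The function name is made up, and the original LJSpeech corpus uses three fields per line (id, transcript, normalized transcript), while this check only insists on an id and some text.

```python
from pathlib import Path


def check_ljspeech(dataset_dir):
    """Count metadata entries and list problems in an LJSpeech-style folder."""
    root = Path(dataset_dir)
    problems = []
    entries = 0
    with open(root / "metadata.csv", encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("|")
            if len(fields) < 2:
                problems.append(f"line {line_no}: expected 'id|text'")
                continue
            entries += 1
            # Each id should have a matching recording in wavs/.
            if not (root / "wavs" / f"{fields[0]}.wav").is_file():
                problems.append(f"line {line_no}: missing wavs/{fields[0]}.wav")
    return entries, problems
```

The entry count also gives a quick answer to whether the dataset is large enough for training to start.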
@MistakingManx
@MistakingManx Ай бұрын
@@ThorstenMueller I already used mimic-recording-studio, it's what the tutorials used, and it seemingly worked fine, minus the part I had to fix. Your script that makes the dataset was useful, I just can't get the training stuff to work at all. I wanted to use windows since I have a 4090ti on it. Would it be possible to talk on a platform like discord?
@ThorstenMueller
@ThorstenMueller Ай бұрын
​@@MistakingManx You can send me an email using my contact form here: www.thorsten-voice.de/en/contact/ But it might take some time to respond for me so please be a little bit patient 🙂.
@kaymat2368
@kaymat2368 Жыл бұрын
11:09 Help please, I'm stuck at this step because it gave this error: "OSError: [WinError 126] The specified module could not be found. Error loading "cudart64_110.dll" or one of its dependencies."
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Seems like your CUDA installation is broken. Are you sure CUDA is installed correctly?
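A quick way to check this from the same virtual environment is sketched below. It only reports what PyTorch itself can see and does not repair a broken CUDA installation; the function name `cuda_status` is made up for illustration.

```python
def cuda_status():
    """Describe whether PyTorch loads and whether it can use a CUDA device."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A missing DLL such as cudart64_110.dll surfaces here as an OSError.
        return f"PyTorch could not be loaded: {exc}"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch loaded, but CUDA is not available"


print(cuda_status())
```

If the first message appears, reinstalling PyTorch with the build that matches your CUDA version is usually the next thing to try.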
@kaymat2368
@kaymat2368 Жыл бұрын
@@ThorstenMueller I'm not sure, I followed your steps closely.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@kaymat2368 Hard to say what might cause this issue. Maybe try installing a newer CUDA version.
@kaymat2368
@kaymat2368 Жыл бұрын
@@ThorstenMueller Ok, thanks for replying, btw, my GPU is nvidia GeForce GT 520, Os Win 7
@tesitest378
@tesitest378 8 ай бұрын
Coqui Eleutherodactylus a frog from Puerto Rico 🇵🇷
@ThorstenMueller
@ThorstenMueller 8 ай бұрын
True, true 👍
@BaDHamisteR
@BaDHamisteR 10 ай бұрын
is it possible to train the model to speak in Portuguese?
@ThorstenMueller
@ThorstenMueller 10 ай бұрын
Sure, if you have a Portuguese voice dataset ready for training.
@BaDHamisteR
@BaDHamisteR 10 ай бұрын
@@ThorstenMueller well.. i have my own voice 🤣. i wanna try that.
@thebluefacedbeastyangzhi
@thebluefacedbeastyangzhi 9 ай бұрын
Is there a non CUDA version?
@ThorstenMueller
@ThorstenMueller 9 ай бұрын
Coqui has a command line parameter called "use_cuda" which can be set to "false", but i guess training will take waaay longer than with CUDA.
@thebluefacedbeastyangzhi
@thebluefacedbeastyangzhi 9 ай бұрын
@@ThorstenMueller Thank you for the reply. I have AMD and not Nvidia. So should I give up on this method?
@ThorstenMueller
@ThorstenMueller 8 ай бұрын
@@thebluefacedbeastyangzhi Hard to say, but maybe try a Google Colab notebook with a GPU that supports CUDA. That might be an easier way for you if you don't have access to a local NVIDIA GPU.
@thebluefacedbeastyangzhi
@thebluefacedbeastyangzhi 8 ай бұрын
@@ThorstenMueller thank you again for this information
@recrieprodutora
@recrieprodutora 11 ай бұрын
The process returns the error: "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process..." I used your code.
@ThorstenMueller
@ThorstenMueller 11 ай бұрын
I've seen this error previously, but i'm not absolutely sure about the reason. Is training running nevertheless or not starting? Does running command line prompt as admin change the behavior?
@recrieprodutora
@recrieprodutora 11 ай бұрын
@@ThorstenMueller The training starts, but then the error occurs. I don't know how to fix it.
@recrieprodutora
@recrieprodutora 11 ай бұрын
@@ThorstenMueller I tried changing the root folder and the permissions of the prompt, but the error keeps returning. Have you ever seen anything like it? Even using your "train..." script, which already contains "if __name__ == '__main__':", I get an error in training. Can you imagine which way I should go? 😪😥
@shadaaan
@shadaaan 10 ай бұрын
Same error, I am getting it too. Has any solution been found?
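For context on the guard mentioned in this thread: Windows starts worker processes by re-importing the main script, so anything that launches training has to sit behind the `__main__` check, and the video description links a "freeze" support workaround for the recipe. The skeleton below is only a sketch of that pattern. The body of `main` is a placeholder rather than the actual recipe, and it is not a confirmed fix for the PermissionError.

```python
from multiprocessing import freeze_support


def main():
    # Placeholder: a real recipe would build the config, load the dataset,
    # and start the trainer here.
    return "training started"


if __name__ == "__main__":
    # Child processes re-import this file on Windows; the guard keeps them
    # from starting a second training run of their own.
    freeze_support()
    print(main())
```

Code at module level, outside `main`, still runs in every worker, so keep it limited to imports and definitions.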
@JoeLinux2000
@JoeLinux2000 11 ай бұрын
Waiting for Linux to get proper HQ Text to Speech.
@ThorstenMueller
@ThorstenMueller 11 ай бұрын
With Coqui TTS or Piper TTS there are some pretrained and really nice sounding TTS models available for Linux in multiple languages 😊. Do you know these?
@azer0013
@azer0013 10 ай бұрын
Where is TTS-training??
@ThorstenMueller
@ThorstenMueller 10 ай бұрын
It is an empty folder in which you start working. I created a new folder "TTS-Training" but you can name it whatever you want.
@tarekhassan6958
@tarekhassan6958 Жыл бұрын
It looks like mining issues
@a.tevetoglu3366
@a.tevetoglu3366 10 ай бұрын
Hey there, how's it going?! ;)
@ThorstenMueller
@ThorstenMueller 10 ай бұрын
Great, and yourself? ;)
@a.tevetoglu3366
@a.tevetoglu3366 10 ай бұрын
@@ThorstenMueller Getting by, as one does. By the way, thanks a lot for your content. I bought two RTX A5000 cards and am wondering what to do with them, since I'm not a gamer, architect, or programmer (the original plan to build a rendering workstation became obsolete for various reasons), and your videos inspire some quite interesting experiments. I was interested in running my own AI projects, and it seems you offer the know-how for that. Best regards from Turkey, from an exile from the Rhineland.
@KominoStyle
@KominoStyle Жыл бұрын
Well something on my end is not working -.-!
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Do you get any specific error message?
@KominoStyle
@KominoStyle Жыл бұрын
@@ThorstenMueller Well, sorry for the late response. I tried many different ways to install and use TTS, but one big problem was that I can't install Python 3.8 for all users. With every other version I can, and I'm not sure if that's the big problem.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@KominoStyle Which Python version are you using then?
@Hinterfrage
@Hinterfrage Жыл бұрын
Oh, the gentleman only uploads scam clips, interesting, there is a lot to report here ...
@HighTechHomestead
@HighTechHomestead 5 ай бұрын
Thank you, this video has helped me get to this point. Can you help with this error? I am stuck here and can't seem to find a solution. I followed your video, but when I run the trainer I get the following error:
(TTS) C:\Users\7danny\Documents\CoquiTTS\TTS>python .\train_vits_win.py
Traceback (most recent call last):
  File ".\train_vits_win.py", line 6, in <module>
    from TTS.tts.configs.vits_config import VitsConfig
  File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\tts\configs\vits_config.py", line 5, in <module>
    from TTS.tts.models.vits import VitsArgs, VitsAudioConfig
  File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\tts\models\vits.py", line 38, in <module>
    from TTS.vocoder.models.hifigan_generator import HifiganGenerator
  File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\vocoder\models\hifigan_generator.py", line 6, in <module>
    from torch.nn.utils.parametrizations import weight_norm
ImportError: cannot import name 'weight_norm' from 'torch.nn.utils.parametrizations' (C:\Users\7danny\Documents\CoquiTTS\TTS\lib\site-packages\torch\nn\utils\parametrizations.py)
@ThorstenMueller
@ThorstenMueller 5 ай бұрын
You're welcome. Did you update all python packages before starting the training?
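Whether the installed PyTorch is recent enough for that import can be checked without starting a training run. To my knowledge `weight_norm` only appeared under `torch.nn.utils.parametrizations` in newer PyTorch releases (around 2.1), but treat that version number as an assumption. The helper name below is made up.

```python
import importlib


def has_parametrized_weight_norm():
    """Return True/False if torch is importable, or None if it is not."""
    try:
        module = importlib.import_module("torch.nn.utils.parametrizations")
    except (ImportError, OSError):
        return None
    # Older PyTorch builds ship the module without the weight_norm function.
    return hasattr(module, "weight_norm")


print("weight_norm available:", has_parametrized_weight_norm())
```

A result of False points to an outdated PyTorch inside the virtual environment, so upgrading it there is a reasonable first attempt.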
@OmriDaxia
@OmriDaxia Жыл бұрын
This is an awesome tutorial, thank you for doing all the trial and error that I kept running into. I do have one problem though. I've used your modified training script and only changed the directories, but I'm still getting a permission error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:/TTS/ThorstenTut/ljsAlex01-April-26-2023_05+12PM-0000000\\events.out.tfevents.1682554375.DESKTOP-IUNHJ2B' Is there any workaround for this? It's pointing to one of the files it just generated, which means it's not being used by any other process, so it must be that multithreading problem you mentioned still being an issue somehow.
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Thanks for your nice feedback 😃. I ran into that permission thing once, too. I'm not sure how I solved it. I'll check my notes for this video, and when I remember I'll share it here. Running the command line prompt as local admin might be worth a first try.
@zsoltvastagh7023
@zsoltvastagh7023 Жыл бұрын
@@ThorstenMueller I have the same problem. Please let me know if you have found a solution to the error. Thank you very much!
@tuiskusaarelainen5635
@tuiskusaarelainen5635 Жыл бұрын
Any updates regarding this issue?
@OmriDaxia
@OmriDaxia Жыл бұрын
@@tuiskusaarelainen5635 nope, still stuck here. Not sure what to do
@ThorstenMueller
@ThorstenMueller Жыл бұрын
@@tuiskusaarelainen5635 Might this issue help you? For me it worked while testing for this tutorial. Hopefully it'll work for you too. If so, I could add the link to the video description. github.com/coqui-ai/TTS/issues/1711
@psyk0l0ge
@psyk0l0ge Жыл бұрын
It tells me that I might need to install a third-party phonemizer for the language de... Where do you get the extra files from that you have installed and cd into at about 10:37?
@ThorstenMueller
@ThorstenMueller Жыл бұрын
Did you install espeak-ng as shown here? kzfaq.info/get/bejne/mLCarbagxMyzg2w.html
@captainlavenderVHS
@captainlavenderVHS 7 ай бұрын
I had this problem too... A reboot seemed to fix it, but I also did a "pip install phonemizer" before, which may not have actually been necessary. In case anyone else is wondering, got this running on Win 11, using Anaconda 2.5.1 (Python 3.11.5), CUDA 12.3.5.1, and Coqui TTS 0.21.2