Local voice cloning with 6 seconds audio | Coqui XTTS on Windows

29,017 views

Tutorial on local voice cloning with Coqui XTTS on Windows with just 6 (!) seconds of audio. Easy to integrate in Python scripts. Please share your thoughts on the quality in the comments.
* XTTS Interview with Josh: • XTTS FAQ | Interview w...
* XTTS docs: docs.coqui.ai/en/dev/models/x...
* XTTS Huggingface space: huggingface.co/spaces/coqui/xtts
* CPML: coqui.ai/cpml
00:00 Intro
01:00 XTTS documentation
02:35 Playing around on Huggingface (no installation required)
04:00 Recording 6 seconds reference audio
05:00 Voice cloning on XTTS Huggingface space
09:50 Installing XTTS on Microsoft Windows
14:25 Using Python script to clone your voice locally
19:22 Some XTTS samples (emotional and neutral, in English and German)
Please subscribe to my channel 😊.
kzfaq.info...
---
- www.Thorsten-Voice.de
- github.com/thorstenMueller/Th...
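The description says XTTS is easy to integrate in Python scripts. A minimal sketch of such a script, assuming the Coqui `TTS` package is installed (`pip install TTS`) and that the model name below still appears in `tts --list_models`. The function name `clone_voice` and the file names are made up for illustration, and this is not necessarily the exact script shown in the video:

```python
from pathlib import Path

# Assumption: model name as listed by `tts --list_models`; adjust if yours differs.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(text, speaker_wav, language="en", out_path="output.wav", use_gpu=False):
    """Synthesize `text` in the voice of the short reference clip `speaker_wav`."""
    ref = Path(speaker_wav)
    if not ref.is_file():
        raise FileNotFoundError(f"Reference audio not found: {ref}")

    # Imported lazily so the file check above works without the package installed.
    from TTS.api import TTS

    # gpu=True only works if PyTorch can see a CUDA device.
    tts = TTS(model_name=MODEL_NAME, gpu=use_gpu)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(ref),
        language=language,
        file_path=str(out_path),
    )
    return Path(out_path)
```

A call would look like `clone_voice("Hello from my cloned voice.", "my_6_seconds.wav", language="en")`. The first run downloads the model, so it can take a while.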

Comments: 173

@toykotokyoto 6 ай бұрын

another great video, Thorsten 👏 We have a happy update... you can now use unlimited audio for the 0-shot clone :D no longer are you limited to just 6 seconds. The HuggingFace space is still hard coded to max out at 30 seconds though... so we don't overload their servers 😆

@ThorstenMueller 6 ай бұрын

You're very welcome and thanks for the update 😊.

@juanjesusligero391 6 ай бұрын

This is great news! :D You probably should make another video comparing the quality differences between the 6 seconds and 30 seconds input audio! (or maybe more, if you can change that max value in the local installation) ^^ @@ThorstenMueller

@ThorstenMueller 6 ай бұрын

@@juanjesusligero391 A comparison video with audio samples for different input lengths is already in the making 😉.

@tsunderes_were_a_mistake 5 ай бұрын

Does the output sound better with longer audio? I tried the Japanese version on hugging face and output sounded robotic.

@ThorstenMueller 5 ай бұрын

@@tsunderes_were_a_mistake With my German model I didn't notice a change depending on the text length, but I didn't check this specific aspect closely. If you think it would be helpful, I can test it more specifically (with a German model). I can't say anything about the Japanese model, though.

@schakuun1995 6 ай бұрын

Great video! I'm really getting into TTS and it's so exciting to see what's possible now. It's incredible how something that needed hours of data a year ago can now be done in just 6 seconds. It's fascinating to watch this tech evolve

@ThorstenMueller 6 ай бұрын

Thank you for your nice feedback 😊. I'm really curious to see where quality is going in the near future.

@juanjesusligero391 6 ай бұрын

I was exactly like you, I also had too high expectations for Coqui XTTS, haha ^_^ While the outcome wasn't quite what I was expecting, the results are still quite impressive, especially considering they are based on just a 6-second sample. I was also really happy to read in the comments that the devs are working on improvements, like allowing for voice samples longer than 6 seconds. I loved the video! Thanks a lot for your work, Thorsten! ^^

@ThorstenMueller 6 ай бұрын

Thanks a lot for your nice feedback 🥰.

@secondaccount5512 6 ай бұрын

Great video, expectations after listening to the interview with Josh were high, but XTTS is still kinda new, so I am excited for the future improvements.

@ThorstenMueller 6 ай бұрын

I'm excited too 😊.

@Reincarnated_Recap Ай бұрын

omg, the quality is so good compared to all the other voice-cloning TTS

@__________________________6910 6 ай бұрын

Sir, your explanation is very easy to understand.

@ThorstenMueller 6 ай бұрын

Thank you, happy to hear that 😊.

@MarcoManzo 6 ай бұрын

Great! I was looking forward to this, only got it running on linux. Thank you for the tech support ;-)

@MarcoManzo 6 ай бұрын

😂 maybe cuda is exactly my problem on windows🤷‍♂

@ThorstenMueller 6 ай бұрын

Thanks and you're welcome 😊. I'm happy if people find my videos helpful.

@Cmapukan 3 ай бұрын

Thanks for the good explanation and clear example. I wish you prosperity and new opportunities. I apologize for my broken English.

@ThorstenMueller 3 ай бұрын

Thank you for your nice comment. I wish you all the best, too 😊.

@anarmustafayev9145 6 ай бұрын

Exactly what we were looking for. Thank you very much 👍

@ThorstenMueller 6 ай бұрын

I'm very glad to hear that 😊.

@chrispeters8295 4 ай бұрын

Thank you for the super informative video! You're awesome!

@ThorstenMueller 3 ай бұрын

Wow, thanks a lot for your nice feedback 😊.

@AmrAli-ig2mk 2 ай бұрын

Thanks a lot for your efforts. you are doing great work, keep it up.

@ThorstenMueller 2 ай бұрын

Thank you a lot for your kind feedback - this keeps me motivated 😊

@chrsl3 6 ай бұрын

Amazing result.

@nerdynav 6 ай бұрын

Hi Thorsten, I am a computer engineer and AI YouTuber myself (who isn't nowadays? haha :P). Just wanted to say that you make great tutorials on AI voice. I stumbled on this tutorial while exploring Coqui and it is the best tutorial I found. Thanks for taking the time to do these. Also, a subscriber asked me on Reddit for a resource on Coqui TTS tutorials, and I shared your channel! Keep up the great work.

@ThorstenMueller 6 ай бұрын

Hi 👋. Thanks for your kind feedback on my content 😊. You're right, we are not alone on AI content 😆.

@ThatGuyNamedBender 4 ай бұрын

Pretty much 95% of youtube and the working class are against AI lmfao but keep daydreaming

@nuborn.studio 4 ай бұрын

Nice tool and great respect to the developer! I think the idea is great, but personally I couldn't do anything with this quality. But hey, for 6 seconds of input I think that's a great result!

@ThorstenMueller 4 ай бұрын

I agree with that 😊.

@64jcl 6 ай бұрын

Quite amazing that they can do this with such a short clip. I had the same results as you with English: it doesn't really sound like me, even though I tried to speak my best English. :) How would you compare it with Piper with regard to TTS performance? Of course Piper is quite difficult to train for new voices, but it's free to use, even commercially. I wish there was some simpler way to clone voices with it; that would be golden. I have looked at your video on this, but preparing the training set seems like a chore.

@ThorstenMueller 6 ай бұрын

Thanks for your comment 😊. I didn't compare the performance of XTTS and Piper TTS. I guess if you want the best free voice clone, I'd go with Piper TTS right now, but the effort is higher, as you said.

@user-jf6li8mn3l 3 ай бұрын

Thanks for the informative video and interesting presentation. Please make a guide on how to train a model on a custom dataset.

@ThorstenMueller 3 ай бұрын

Thanks for your nice feedback 😊. This topic is already on my (growing) TODO list.

@JamBassMusic 4 ай бұрын

Thank you!!

@user-xu4so8fw3m 5 ай бұрын

amazing video! I am wondering if it's possible to train a given voice and then just use that voice for future use. In the "clone your voice locally" section, the code requires the reference audio as an input. I'm thinking in terms of efficiency and that if you plan to use the same voice over and over, you shouldn't need to train the model each time.

@ThorstenMueller 5 ай бұрын

Good question. I didn't think about that - up to now.
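On the efficiency question: zero-shot cloning does not train anything, but the speaker data derived from the reference clip is recomputed on every call. One way to avoid that is to cache the derived data on disk. A minimal sketch, assuming a `compute` callable that you supply yourself. With XTTS this could wrap the lower-level model's `get_conditioning_latents(audio_path=[...])`, which to my knowledge returns the latents that `inference(...)` accepts, but treat that part as an assumption and check it against the XTTS docs. The helper name and cache layout are hypothetical:

```python
import hashlib
import pickle
from pathlib import Path


def cached_speaker_data(reference_wav, compute, cache_dir="speaker_cache"):
    """Return compute(reference_wav), reusing a pickled result if one exists.

    The cache key includes a hash of the audio bytes, so replacing the
    reference clip automatically invalidates the old entry.
    """
    ref = Path(reference_wav)
    digest = hashlib.sha256(ref.read_bytes()).hexdigest()[:16]
    cache_file = Path(cache_dir) / f"{ref.stem}-{digest}.pkl"

    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    data = compute(str(ref))  # expensive step, runs only on a cache miss
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(data, fh)
    return data
```

Only load cache files you created yourself, since unpickling untrusted files is unsafe.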

@64jcl 6 ай бұрын

Btw, how do I get the gpu parameter to work? I have a 3000 series GPU, but even if I select gpu=True it says CUDA is not available. I have also noticed that the voice cloned from my own speech sometimes shifts to a British accent and sometimes to an American one (likely because my accent is neither). But it also means it is impossible to get consistent results with this. Is there some way to save a snapshot of whatever it decided was "the voice" and reuse that as input for subsequent generations? If not, it is quite useless and really just a fun demo.

@ThorstenMueller 6 ай бұрын

Did you install CUDA and is it working? There are Python code snippets available to check if CUDA is working.
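A minimal sketch of such a check, assuming PyTorch is the library that Coqui TTS uses for GPU access in your environment. The function name `cuda_report` is made up for illustration. A CPU-only PyTorch build reports `built_for_cuda` as `None`, which is a common reason for "CUDA is not available" even when the NVIDIA driver and toolkit are installed:

```python
def cuda_report():
    """Return a small dict describing whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled for (None on CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

Run it inside the same venv as your TTS script, otherwise you may be checking a different PyTorch installation.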

@IvarDaigon 3 ай бұрын

I've been using Coqui for months and it's amazing that it simulates breathing at all, but breathing is typically the most distorted part of the generated audio, which can make it sound unnatural. I'm wondering whether removing the breathing from the source audio would improve the quality of the cloned voice, or whether the distorted breathing is just a symptom of the underlying model.

@ThorstenMueller 3 ай бұрын

I've no idea how this could work. Maybe it helps if you use audio tools to cut your breathing out of the recording you provide to XTTS. Or maybe there are audio filters in tools like sox or ffmpeg that can remove breathing sounds from the generated audio.

@TomiTom1234 6 ай бұрын

Can you please tell me what program did you use to run the codes on @15:28 ?

@ThorstenMueller 6 ай бұрын

Sure, it's a code editor from Microsoft, called "Visual Studio Code".

@Aiolia_Games 2 ай бұрын

Can I use this voice to narrate a video on YouTube?

@DrFukuro 6 ай бұрын

I like your videos a lot, even though many of them are unfortunately only in English. Could you imagine making a more general overview video on speech synthesis? Even after days of research, a layperson only gets an incomplete picture, so it would be great if a pro like you explained the following topics in a bit more depth for those interested: What exactly are Coqui, XTTS, Tortoise, and Espeak / espeak-ng, what do they do, and how do they differ from Mbrola and its voices? (Can I use tts instead of Mbrola in scripts? Yes/no, and how/why?) Example questions about XTTS: What is a multilingual voice as opposed to the Thorsten voice? What exactly is voice cloning as opposed to voice transfer? What are Coqui speakers and what do they do? What is the difference between fine-tuning the XTTS model and simply passing a speaker_wav reference?

@ThorstenMueller 6 ай бұрын

Thank you very much for your great feedback and the suggestion 😊. I really like the topic. When you've been working on a subject for so long and so intensively, these "basics" somehow become so normal that you no longer think about them. I've put the topic on my TODO list. Many thanks for that 😊.

@amp3253 6 ай бұрын

Could you help, please?
tts : The term 'tts' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ tts --list_models
+ ~~~
    + CategoryInfo          : ObjectNotFound: (tts:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

@ThorstenMueller 6 ай бұрын

Did you use a Python venv? Is it activated when you try to run the "tts" command? Does "pip list" show an installed TTS package?
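The three questions above can also be answered from Python itself. A minimal sketch using only the standard library; the function name `environment_report` is made up for illustration. Run it with the same `python` you use for your TTS scripts:

```python
import importlib.util
import shutil
import sys


def environment_report():
    """Collect basic facts about the active Python environment."""
    return {
        # Which interpreter is running; inside a venv this points into the venv folder.
        "python": sys.executable,
        # True when a virtual environment is active.
        "in_venv": sys.prefix != sys.base_prefix,
        # True when the Coqui package (imported as `TTS`) can be found.
        "tts_package_importable": importlib.util.find_spec("TTS") is not None,
        # Full path of the `tts` command, or None if it is not on PATH.
        "tts_command_on_path": shutil.which("tts"),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If `in_venv` is False or `tts_command_on_path` is None, the venv is most likely not activated in that terminal.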

@ricardorey259 5 ай бұрын

Hello, good video, do you know how to remove the character limit restriction when writing? Warning: The text length exceeds the character limit of 239 for language 'es', this might cause truncated audio.

@ThorstenMueller 5 ай бұрын

Thanks for your nice feedback 😊. Hmm, not really. Earlier we sometimes ran into a "max_decoder_steps" limit which caused truncated audio, but I'm not sure if this applies here too.
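One workaround that does not touch the limit itself is to split the text into pieces that stay below it and synthesize each piece separately. A minimal sketch, assuming the limit of 239 characters from the warning above (other languages may have a different value). The function name `split_text` is made up for illustration, and the sentence splitting is a simple regular expression, not a full tokenizer:

```python
import re


def split_text(text, max_chars=239):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is broken at word boundaries.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Pack short sentences together as long as they fit.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the TTS call on its own and the resulting audio files joined afterwards.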

@LeSchurke Ай бұрын

Nice video ;) and hi, how are you? Is it better when the reference voice is longer than 6 seconds? Or does it not matter, or is it even worse? 00:43

@ThorstenMueller Ай бұрын

Great, glad you like the video :) According to my talk with the co-founder of Coqui AI, Josh Meyer, the model is optimized for a 6-second audio input. Before trying longer audio input, try using other 6-second clips.
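Trying other 6-second clips is easy if you have one longer recording. A minimal sketch that cuts a clip out of an uncompressed WAV file with the standard `wave` module; the function name `extract_clip` and the file names are made up for illustration:

```python
import wave


def extract_clip(src_path, dst_path, start_s=0.0, length_s=6.0):
    """Copy length_s seconds starting at start_s from src_path into dst_path.

    Returns the duration of the written clip in seconds (shorter than
    length_s if the source ends earlier).
    """
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so it never points past the end of the file.
        src.setpos(min(int(start_s * rate), src.getnframes()))
        frames = src.readframes(int(length_s * rate))

    with wave.open(str(dst_path), "wb") as dst:
        dst.setparams(params)  # same channels, sample width, and sample rate
        dst.writeframes(frames)  # the frame count in the header is fixed on close

    return len(frames) / (params.sampwidth * params.nchannels) / rate
```

For example, `extract_clip("long_recording.wav", "ref_a.wav", start_s=10)` and `extract_clip("long_recording.wav", "ref_b.wav", start_s=30)` give two candidate reference clips to compare.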

@ignacioalonsol Ай бұрын

Has anyone made a comparison between xtts and piper training? I'm curious on what's better quality @thorsten?

@ThorstenMueller Ай бұрын

Personally I prefer Piper. But I trained my models in Piper with way more input data than the 6 seconds of input for XTTS.

@Gute_Nacht_Kurzgeschichten 2 ай бұрын

Great explanation 👍 How can I clone my voice so that it reads whole texts to me, e.g. a PDF file or a Word document, or is it limited to just 6 seconds?

@ThorstenMueller 2 ай бұрын

Thank you very much for the praise, I'm really glad 😊. I don't think there is a ready-made solution for text/Word/PDF input, but in general you can generate longer output. You may have to split the input text, but it can certainly produce much more than 6 seconds.

@timo1949 4 ай бұрын

Very, very good channel! 👍 I was wondering: what is the reason for the rather low sampling rate of 22,050 Hz in the ThorstenVoice dataset? Simply faster processing of the data?

@ThorstenMueller 4 ай бұрын

Thank you very much for your great feedback 😃. In the tests there was hardly any audible difference in the audio output, but the computational effort at e.g. 44 kHz was noticeably higher.

@timo1949 4 ай бұрын

@@ThorstenMueller Thanks for the info. Elevenlabs also only asks for 128 kbps MP3 for its Professional Voice Cloning and says that no disadvantage is noticeable. Very interesting how the AI processes that.

@starbuck1002 6 ай бұрын

I also experimented a bit with Coqui XTTS and came to the conclusion that it isn't worth it. 1. Coqui XTTS can't come close to the leading competitors in terms of clone quality. 2. In my opinion, Coqui XTTS is not worth it for this quality at this price, again considering the quality and pricing of the competitors! Still, thanks again for your video, Thorsten!

@ratside9485 6 ай бұрын

What price? $1 per day for companies, otherwise it's free.

@PlayGameToday Ай бұрын

What parameters do I need to include to get higher-quality audio output? It looks like the bitrate is only 96 kbps.

@ThorstenMueller Ай бұрын

Normally the generated output has the same sample rate as the voice dataset the model was trained on. Maybe you can use tools like ffmpeg to adjust the sample rate afterwards, but I doubt that this will increase the quality.
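To see what a generated file actually contains before converting anything, you can read its header. A minimal sketch with the standard `wave` module, assuming the output is an uncompressed WAV file; the function name `wav_info` is made up for illustration:

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, bit depth, and duration of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

Upsampling such a file afterwards only changes the numbers in the header and interpolates samples. It cannot add detail the model never produced.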

@PlayGameToday Ай бұрын

@@ThorstenMueller I need to train my own model in 48KHz, so the output will be more quality

@spiritual_audiobooks 2 ай бұрын

What do you say to Applio TTS? Maybe the best Open Source TTS?

@ThorstenMueller Ай бұрын

I haven't heard about Applio TTS. You say it's worth giving it a try?

@congtaihu1287 3 ай бұрын

thank you for this video! i am running into problems. when i execute the script, it shows "AssertionError: CUDA is not availabe on this machine.". But i have cuda12.3 and compatible torch and my other ai software ran well. i have no idea what is happening. please help!

@ThorstenMueller 2 ай бұрын

Does it work if you use it with "use_cuda false" in general?

@gorizon9802 4 ай бұрын

Is it possible to use AI even with texts in another language? I would really like to know because I want to dub a game with this tool.

@ThorstenMueller 4 ай бұрын

I'm not sure about that. I'd recommend asking in the Coqui community, but as Coqui AI (the company) has shut down, I'm not sure how fast you might get an answer.

@bobbyboe 6 ай бұрын

Hi Thorsten, it looks so easy when you do it. I installed and started Coqui via Pinokio, expecting to somehow reach this GUI locally. Pinokio then says "running", but I can't find anything at the usual localhost addresses in the browser. There is also a "server" button, which I pressed, and I got the reply: .........Connected! It all gives the impression that everything is running as it should... but for me the experience ends there, because I don't know where Coqui would show itself to me... a pity, really. Pinokio is normally a good entry point for non-coders.

@ThorstenMueller 6 ай бұрын

Do you mean the Huggingface GUI?

@bobbyboe 6 ай бұрын

@@ThorstenMueller Yes, I meant any GUI in general.

@marcinziajkowski3870 Ай бұрын

Can we create ready to use object instead of "speaker_wav" list passed every time we generate "output.wav" ? to speed up process ?

@ThorstenMueller Ай бұрын

As i'm not sure, i'd recommend asking on Coqui community on github. But as Coqui AI (the company) has shut down, i'm not sure on how fast you might get a reaction.

@PlayGameToday Ай бұрын

Hello, sir Thorsten! The title of the video doesn't really capture the point. Unfortunately, I didn't find in your video how to start the GUI for Coqui TTS. In the title to the video you have stated - XTTS - and just I was hoping that I could run the gradio-gui that was at the beginning of your video. Too bad you don't have a video tutorial on how to deploy on your local machine the handy GUI for voice generation that was in the demo.

@ThorstenMueller Ай бұрын

Do you mean the Huggingface UI from the video?

@PlayGameToday Ай бұрын

@@ThorstenMueller Yes

@Chriscs7 3 ай бұрын

What is better this or Tortoise TTS (Ecker Voice Clone) ?

@ThorstenMueller 3 ай бұрын

Hard to say, as i didn't give Tortoise TTS a closer look, but it's still on my todo list.

@EfficioIgnisVitae 5 ай бұрын

I'm getting this issue where when I try to check for models this happens: LLVM ERROR: Symbol not found: __svml_cosf8_ha Anyone know what's going on here?

@ThorstenMueller 5 ай бұрын

That's strange. Maybe recreate your python venv and reinstall. Maybe there's an error in your installation.

@alexlavertyau 5 ай бұрын

I have tried some voice cloning tools and provided my voice as reference audio, but none of the results sound anything like me... : ( I have an Australian accent but the generated voices come out with American accents, not sure what I'm doing wrong.

@ThorstenMueller 5 ай бұрын

I guess you're doing nothing wrong. Maybe the English model has been trained on a voice dataset with hours of native English speakers, and one phrase does not have enough "power" to change the accent. Normally I'd recommend asking in the Coqui TTS community, but as Coqui is shutting down, it might take some time to get an answer, maybe because of other priorities.

@rogerperez9856 5 ай бұрын

Hello, do you know why when converting a text of about 500 words it takes about 25 minutes?

@ThorstenMueller 5 ай бұрын

I didn't try it with such long texts. Is it faster when you split it into smaller pieces and put the chunks together in post generation?

@Zimba-box 4 ай бұрын

I got this error when I ran the install step ending in "wheel -U": ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects. How do I fix that?

@ThorstenMueller 4 ай бұрын

Did you update pip to the latest version first - "pip install pip setuptools wheel -U"?

@Name-is2bp Ай бұрын

did you make a tutorial on how to install and use cuda?

@ThorstenMueller Ай бұрын

No, not yet. But interesting idea. I've added it on my TODO list 😊.

@asanostudio 4 ай бұрын

Have you made a video tutorial to create a voice model for Indonesian, or how to add a voice model, I want to make an Indonesian voice model

@ThorstenMueller 4 ай бұрын

No. But as Coqui (the company) shut down, I'm not sure about further development of their code. Maybe it's worth taking a look at Piper TTS for training an Indonesian TTS model. kzfaq.info/get/bejne/mMWnmMKb0seWYmQ.html

@ari4340 4 ай бұрын

Hello! I've been using this on hugging face for a few months, but today when I went to the page this error appears: Runtime error Scheduling failure: not enough hardware capacity Container logs: Fetching error logs... Any idea of what's happening? Thank you!

@ThorstenMueller 4 ай бұрын

According to the error message the XTTS container does not have enough compute power on Huggingface platform. This might be a temporary problem or might relate to the shutdown of Coqui AI as a company.

@ari4340 4 ай бұрын

@@ThorstenMueller Thanks for your reply! I hope it's not the latter. It's the only free and online option that I knew of 😓

@terryjones2213 3 ай бұрын

What is your python version?

@asdasdaa7063 5 ай бұрын

I love your videos bro but you gotta speak a bit faster XD I have to play the video at 1.5x speed haha still love the videos!

@ThorstenMueller 5 ай бұрын

Hehe, thanks for your suggestion. I'll keep it in mind for next videos. As a non-native english speaker i have to think a little while for the right words 😆.

@john_blues 6 ай бұрын

Is this able to pull text from a text file? I have a Tortoise version that can do it, and it is helpful for long form text.

@ThorstenMueller 6 ай бұрын

IMHO this isn't supported by now. But finding a suitable solution for that is on my TODO list.

@john_blues 6 ай бұрын

@@ThorstenMueller For some reason my reply keeps getting deleted. Anyhow, I run a local TTS that can pull from a text file. Maybe it will help you. It is by neonbjb on Github.

@ThatPain1 4 ай бұрын

@john_blues You can totally read in one or multiple files via Python, transform the text as you like, and use XTTS to generate a synthetic speech audio file from it. I'm currently using it to create a sort of audiobook from a fanfiction. Removing the periods at the end of sentences improved the result quite a lot.
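The reading and cleaning part of that workflow can be sketched as follows. This is a minimal example, not the commenter's actual script: the function names are made up for illustration, the sentence splitting is a simple regular expression, and removing trailing periods follows the tip above (question marks and exclamation marks are kept):

```python
import re
from pathlib import Path


def prepare_sentences(text):
    """Normalize whitespace, split into sentences, and drop trailing periods."""
    text = re.sub(r"\s+", " ", text).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Skip fragments that consist only of dots or spaces.
    return [s.rstrip(".").strip() for s in sentences if s.strip(". ")]


def load_sentences(path, encoding="utf-8"):
    """Read a plain text file and return its cleaned sentences."""
    return prepare_sentences(Path(path).read_text(encoding=encoding))
```

Each returned sentence can then be handed to the TTS call, writing one audio file per sentence or per small group of sentences.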

@IngridUterus 5 ай бұрын

Hey, I installed this via Pinokio because I couldn't get it running any other way. However, I don't know how to switch coqui-tts to the GPU. Which file do I have to open? I'd also like to prevent the ghost voices. Do you know where I have to set something for that? I know it's possible, because I use a Telegram bot that works with Coqui and runs without errors, although with a strict character limit. Oh yes, the character limit :D where can I change that too? Thanks in advance.

@ThorstenMueller 4 ай бұрын

Coqui TTS models have a command line parameter "--use_cuda". That should make it use the GPU. As for the length, you can try opening the model's configuration file and increasing the value of "max_decoder_steps" (I haven't tried that with XTTS myself, though). Good luck 😊.

@IngridUterus 4 ай бұрын

@@ThorstenMueller Thanks. I'll try that this evening. Where exactly do I find the configuration file? Is it the configs.py in the TTS folder? Is there also a way to avoid the errors at the end of sentences and in the gaps between sentences? A kind of ghost voice often shows up there, which sounds really weird xD

@ThorstenMueller 4 ай бұрын

@@IngridUterus Did you find the config file?

@IngridUterus 4 ай бұрын

@@ThorstenMueller Yes, I found a better variant for coqui-tts that is much easier for beginners. I can only recommend it to you: Alltalk_tts

@MrScesher 6 ай бұрын

Hi Thorsten, I can't get it to run. I always receive "No module named 'TTS.api'; 'TTS' is not a package" Even though the tts package is installed. Pip lists it in the installed packages. The few threads I found are no help. Maybe you have an idea?

@ThorstenMueller 6 ай бұрын

This is strange. If "pip list" shows the tts package then it seems that everything is installed correctly. Are you really running your Python script in the right Python venv? Can you run "tts --help" in the command line successfully?

@MrScesher 6 ай бұрын

@@ThorstenMueller The tts command in the console works. tts --list_models too. And yes i am running the created venv.

@MrScesher 6 ай бұрын

@@ThorstenMueller I managed to get it running briefly when I use the setup of the git repo. But it is only working in that terminal and after closing it everything is gone with it. Thats not a solution, because the setup is taking too long.
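In general, Python reports "'X' is not a package" when the name `X` resolves to a single file instead of a package folder, for example a script called `TTS.py` or `tts.py` in the working directory that shadows the installed package. Whether that is the cause here is only a guess. A minimal sketch to check what the name resolves to, using only the standard library; the function name `describe_module` is made up for illustration:

```python
import importlib.util


def describe_module(name="TTS"):
    """Report where Python finds `name` and whether it is a real package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False}
    return {
        "found": True,
        # File that would be loaded on import; check that it is inside your venv.
        "origin": spec.origin,
        # Packages have search locations for submodules, single files do not.
        "is_package": spec.submodule_search_locations is not None,
    }


print(describe_module("TTS"))
```

If `origin` points to one of your own files rather than to `site-packages`, renaming that file is worth a try.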

@tsunderes_were_a_mistake 5 ай бұрын

I tried it on huggingface with Japanese but it sounded robotic. Can you make a tutorial on how to finetune xtts on local?

@ThorstenMueller 5 ай бұрын

Thanks for your topic suggestion. I've added it on my TODO list but it might take some time.

@nomadhgnis9425 5 ай бұрын

have a question for you. IF I wanted to pause for a number of seconds between sentences then how can I do that. Piper is really cool. Thanks.

@ThorstenMueller 5 ай бұрын

Normally this is an aspect of SSML (Speech Synthesis Markup Language), which is currently not supported by Coqui and Piper. Maybe you can try a workaround and add multiple dots (....) to create a pause. But I didn't try it out myself.

@nomadhgnis9425 5 ай бұрын

@@ThorstenMueller thanks. will try that.

@nomadhgnis9425 5 ай бұрын

@@ThorstenMueller Just tried it. I put dots where I wanted to pause but it does not work. It only responds to one dot.

@ThorstenMueller 5 ай бұрын

@@nomadhgnis9425 Okay, then maybe it's a workaround to create multiple tts wave files and merge them together including pauses. That's not an optimal way but it could do the job.

@nomadhgnis9425 5 ай бұрын

@@ThorstenMueller I found a way. I am using Debian. I had to create a 3-second silent wav file, split the paragraphs into different wav files, and then merge them together with the silent wav where I need it. I did this with a bash script. So problem solved. Do you know where I can get more voice files other than the ones listed?
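The same idea works without a separate silent file. A minimal Python sketch, not the bash script mentioned above: it joins several WAV files and writes the silence directly between them. It assumes all inputs are 16-bit PCM WAV files with identical channel count and sample rate; the function name `join_with_pauses` is made up for illustration:

```python
import wave


def join_with_pauses(wav_paths, out_path, pause_s=3.0):
    """Concatenate WAV files, inserting pause_s seconds of silence between them."""
    if not wav_paths:
        raise ValueError("wav_paths must not be empty")

    with wave.open(str(wav_paths[0]), "rb") as first:
        nchannels, sampwidth, rate = first.getparams()[:3]

    # Zero bytes are silence for signed 16-bit PCM (not for 8-bit unsigned WAV).
    silence = b"\x00" * (int(pause_s * rate) * nchannels * sampwidth)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(rate)
        for index, path in enumerate(wav_paths):
            with wave.open(str(path), "rb") as part:
                if part.getparams()[:3] != (nchannels, sampwidth, rate):
                    raise ValueError(f"{path} has a different audio format")
                if index > 0:
                    out.writeframes(silence)
                out.writeframes(part.readframes(part.getnframes()))
    return out_path
```

Example: `join_with_pauses(["para1.wav", "para2.wav", "para3.wav"], "chapter.wav", pause_s=3)`.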

@callmefred Ай бұрын

It's sad that they've discontinued the project.

@ThorstenMueller Ай бұрын

Yes, but they did not just discontinue the project, but Coqui AI (the company) behind XTTS shut down.

@GESTOR-SITES Ай бұрын

How do I fix "ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects"? ChatGPT cannot help me. Is it necessary to downgrade Python?

@ThorstenMueller Ай бұрын

Did you update the python dependencies in your environment? So running "pip install setuptools wheel pip -U"

@characters1210 2 ай бұрын

Can i make code clone Arabic voice and read arabic text

@ThorstenMueller 2 ай бұрын

I've no experience using Arabic with XTTS. Did you already try it using their Huggingface space?

@insanitytoons 2 ай бұрын

Cloning a voice from a sample of just 6 seconds, even if the result is not 100% identical: for me that's an AI that really needs to be improved. The AIs that need dozens of hours to clone a voice didn't interest me much. I ran several tests using samples longer than 30, 60, and 80 seconds in various languages, and some were perfect. I also copied dozens of voices available on websites, and the results were very good too. I suggest saving each generated audio in a different file, because each generated audio will never be the same as the previous one.

@ThorstenMueller 2 ай бұрын

Josh Meyer (co-founder of Coqui AI) mentioned in my XTTS interview that 6 seconds audio input duration should be perfect for XTTS model. kzfaq.info/get/bejne/jtl_gJSIv5bPaGg.html

@RossDCurrie Ай бұрын

"ERROR: Failed building wheel for tts" - What version of python are you running?

@ThorstenMueller 27 күн бұрын

This error often occurs when you use an older version of pip. Did you run "pip install pip setuptools wheel -U" before installing Coqui?

@RossDCurrie 26 күн бұрын

@@ThorstenMueller This may have been the issue. Played around with it a bit and got it working again, but can't recall exactly which thing I did differently. Thanks for the reply though! If you're looking for content ideas, one thing I am struggling with is how this all fits together now, in June 2024. Specifically - when I start the server and hit the local webserver, I get a very different UI than what I see in other videos on XTTS. And I know there are all different UIs for XTTS - there's a fine tuning one, a web UI, RVC, etc. and some of them have bits that don't work, and it sounds like Coqui has abandoned the project now and... it's hard to catch up on it all when coming into it for the first time, and it changes so rapidly. So I guess what I'm trying to figure out is - if I want to build an AI voice clone of me, today, what's the strategy/stack you recommend?

@stefanporath8392 4 ай бұрын

Hello Thorsten, great video tutorials, but XTTS is not for me. No support for Windows and never will be. No chance on older Macs with NVIDIA cards because of lacking drivers. No support on Linux without CUDA. I was really looking forward to this, but I simply don't have the time to fiddle around for days or weeks. Thank you.

@orcunaicovers17 Ай бұрын

It says Cuda is not available on this machine

@ThorstenMueller Ай бұрын

I'm working on a video about CUDA. If you want i can post an update here when it's online 😊.

@orcunaicovers17 Ай бұрын

@@ThorstenMueller I've solved the problem. Torch and CUDA version should be compatible with each other

@ThorstenMueller Ай бұрын

@@orcunaicovers17 Happy you could solve it 😊.

@ratside9485 6 ай бұрын

Can you also show how to fine-tune it? But locally? Thanks.

@ThorstenMueller 6 ай бұрын

Thanks for your topic suggestion 😊. I've put it on my TODO list.

@ratside9485 6 ай бұрын

@@ThorstenMueller In the meantime there is also a WebUI for fine-tuning on GitHub 🙌 It works quite well. The only remaining problem is that the settings change (temperature and so on). I experimented for hours and sentences keep getting skipped.

@TNMPlayer 5 ай бұрын

For some reason my terminal doesn't run in the venv.

@ThorstenMueller 5 ай бұрын

Could you successfully create a venv and just can't activate it or can't you create it?

@TNMPlayer 5 ай бұрын

@@ThorstenMueller the venv created just fine but I couldn’t open a terminal within it

@ThorstenMueller 5 ай бұрын

@@TNMPlayer That's strange. Do you use the .bat or powershell (.ps1) file to activate the venv?

@TNMPlayer 5 ай бұрын

@@ThorstenMueller I used the .ps1

@ThorstenMueller 5 ай бұрын

@@TNMPlayer Maybe try out the .bat version, this could have an effect.

@developerzava Ай бұрын

TTS is available on python 3.12?

@ThorstenMueller Ай бұрын

According to their README, Python 3.11 is the highest supported version. As Coqui AI has shut down, I'm not sure if or when this will be adjusted to a higher Python version.
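A quick way to check this before installing is to test the interpreter version. A minimal sketch: the upper bound of 3.11 comes from the reply above, while the lower bound of 3.9 is an assumption on my side that you should verify against the README of the version you install. The function name `python_supported` is made up for illustration:

```python
import sys


def python_supported(version=None, min_version=(3, 9), max_version=(3, 11)):
    """Return True if the (major, minor) version lies within the given range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return min_version <= major_minor <= max_version


if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside the assumed range.")
```

If the check fails, creating the venv with an older interpreter (for example one installed as Python 3.11) is usually easier than trying to build the package on an unsupported version.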

@NoxmilesDe 6 ай бұрын

Is there a TTS for Android?

@ThorstenMueller 6 ай бұрын

IMHO there's currently no support for Coqui and Piper TTS on Android. But this would be really cool 😎. Did you already ask in their communities?

@bill4320 6 ай бұрын

Not for commercial use. We need a truly open solution.

@juanjesusligero391 6 ай бұрын

Yeah, it's a shame it's not 100% open. Fortunately, we'll always have Tortoise tts :)

@chryseus1331 6 ай бұрын

Who cares it's not like they're going to sue you if you do.

@juanjesusligero391 6 ай бұрын

@@chryseus1331 They could, though. If you have a company and want to use a piece of software commercially, I wouldn't recommend ignoring its license.

@animations.ki.anokhi.duniya 2 ай бұрын

Coqui TTS is shutting down?

@ThorstenMueller 2 ай бұрын

Sadly, yes. I've made a short about it. kzfaq.infoQMruRTlQu7I?si=JyDY8ziFJC8omAPY

@MuhammadChanif-cp2ut 2 ай бұрын

Anjai

@michaelroberts1120 4 ай бұрын

This is only interesting to developers and programmers. Regular hobbyists will find this video useless, because Coqui has no GUI or server.

@ThorstenMueller 3 ай бұрын

Coqui TTS has a simple web UI if you run it locally where you can synthesize audio.

@user-ub5cx4tw9x 3 ай бұрын

must GPU?

@ThorstenMueller 3 ай бұрын

Generally (not sure about XTTS specifically) a CPU might work, but way slower than a CUDA-enabled GPU.

@user-ub5cx4tw9x 3 ай бұрын

@@ThorstenMueller If I want to clone my own voice, do I need to train this? How?

@ThorstenMueller 3 ай бұрын

@@user-ub5cx4tw9x I'd recommend taking a look at Piper TTS for that. kzfaq.info/get/bejne/mMWnmMKb0seWYmQ.html

@user-ub5cx4tw9x 3 ай бұрын

thanks!@@ThorstenMueller

@FrankGlencairn 6 ай бұрын

Unfortunately, without a UI this is a damn nightmare for anyone who isn't a programmer.

@starbuck1002 6 ай бұрын

Well, then just use the UI! xD

@ratside9485 6 ай бұрын

You can use Pinokio. It installs automatically and has the Huggingface web UI.

@FrankGlencairn 6 ай бұрын

@@ratside9485 Unfortunately I always get an error message during installation.

@tonysolar284 5 ай бұрын

coqui is now dead

@ThorstenMueller 5 ай бұрын

Sadly yes, at least the company, let's see what's happening with the code and community.

@downloadpcgamesdirectlinkb7590 6 ай бұрын

I reviewed its documentation: you can't use this commercially, so why waste time on it haha.

@Abwaham Ай бұрын

To learn? Or like, for fun?