Voice Cloning In Multiple Languages - Open Source

82,017 views

Prompt Engineering


1 day ago

In this video, we will look at Bark, the state-of-the-art open-source text-to-speech model from Suno AI. I will show you how to set it up locally and generate audio from your text prompt. You will also learn how to CLONE voices with Bark by using Coqui TTS.
Have fun :)
#voicecloning #voice #ai
▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Support my work on Patreon: Patreon.com/PromptEngineering
🦾 Discord: / discord
▶️️ Subscribe: www.youtube.com/@engineerprom...
📧 Business Contact: engineerprompt@gmail.com
💼Consulting: calendly.com/engineerprompt/c...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
LINKS:
Suno Bark: github.com/suno-ai/bark
Suno HuggingFace: huggingface.co/suno/bark
Coqui-ai: github.com/coqui-ai/TTS
Bark in Coqui: tts.readthedocs.io/en/dev/mod...
Long-Form Generation: github.com/suno-ai/bark/blob/...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Timestamps:
Intro: [0:00]
Bark: [0:17]
Bark installation: [3:25]
Running locally: [5:10]
TTS - Examples: [7:45]
Longer Audio: [9:50]
Voice Cloning: [10:50]
Coqui AI: [11:51]
Suno Bark repo: [12:50]
Voice Cloning: [15:30]
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...

Comments: 109
@CS-hh1mn 11 months ago
Great video! The second clone audio sounded more like the actual one. How did you train the model?
@kevinehsani3358 1 year ago
Thanks for the video. Is this really doing voice cloning, or is it just using different voices for TTS? Coqui Studio claims to clone any voice from 3 seconds of recording. Would this repository do the same? I did not see an option that would let you clone a voice first and then place it in the voices directory.
@agenticmark 8 months ago
Great work! I have been deep in voice cloning for awhile and Bark does pretty damn good when compared to tacotron2 and the like!
@bardaiart 11 months ago
Thank you for making this video! I'm hoping for a true competitor to ElevenLabs -- even if it's a commercial one. Thing is, Bark is very unpredictable, even when choosing the same speaker -- making it kinda useless for lengthy content :(
@MasterBrain182 11 months ago
This is pure gold 🔥🔥🔥
@user-pm5bc9wb7j 5 months ago
Thank you for this very valuable video. I have just 2 questions: can I save the voice? And can I retrain a voice that I trained with other audio clips so that it becomes better (without the old clips)?
@piclezwd 5 months ago
Great video, thanks! What is a good amount of recorded voice to use to train?
@Lawh 10 months ago
How do I use the terminal to type down the first command? it's not working.
@revengeua 10 months ago
Help please, could you share a working file for long text? I can't get it to work with the GPU, it's always an error. I have a 3070 8 gigabyte. I would be very grateful to you, thank you!
@mcarwa 9 months ago
Very interesting project. Can it do dubbing from English to Russian? I've already trained English native speaker dataset at RVC, but when I'm trying to speak in Russian with trained model it has English accent. How to remove English accent and save voice speaker identity?
@BilalKhan-pv7vi 9 months ago
It is difficult to download the Bark files one by one. I have cloned the Bark GitHub repo, but the files in the GitHub repo are different from those on Hugging Face. Kindly tell us how you downloaded the files.
@ozzy1987mr 1 year ago
I've been using Bark for a while, but I didn't know how to do voice cloning. Excellent video.
@engineerprompt 1 year ago
Thank you 😊
@grillodon 9 months ago
When I run the .py file I get this error: "line 1, in from transformers import AutoProcessor, BarkModel ModuleNotFoundError: No module named 'transformers'". Obviously, I installed the transformers module as you explained. Thx
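A frequent cause of this kind of error is that the script runs under a different Python interpreter than the one `pip` installed into (for example, outside the virtual environment). The following is a minimal diagnostic sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Bark or Transformers.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which interpreter is actually running this script.
    print("Interpreter:", sys.executable)
    for name in missing_modules(["transformers", "torch", "bark"]):
        print(f"Not installed for this interpreter: {name}")
```

If `transformers` is reported as missing, installing it with `python -m pip install transformers`, using the same interpreter that printed above, is usually enough.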
@bluebeam287 6 months ago
What is the vocoder model in Coqui TTS? How do I generate that? And how do I generate the pth and index files in TTS?
@equilibriointeligente 1 year ago
This video is very useful for me since I was looking for a free or open source AI tool to clone my voice, thank you for sharing.
@ronaldoromerovergel8373 9 months ago
Were you able to clone your voice??
@equilibriointeligente 9 months ago
@ronaldoromerovergel8373 Yes, but it came out in Spanish with an American accent, haha.
@ronaldoromerovergel8373 9 months ago
@equilibriointeligente Bro, can you help me? The program doesn't work for me, haha.
@saddamabdulsamiuadam3781 1 month ago
commenting to always come back for this
@nhexplorers 1 year ago
How do you determine what version of python to use in your virtual environment? Is this specified somewhere?
@engineerprompt 1 year ago
Yes, for different projects you will usually see that it's specified. If not, 3.10 is a safe bet in most of the cases unless the project is using a very old version.
@mundoartitech 1 year ago
link Google clave?
@trendkillsp 1 year ago
The small model takes almost 3 minutes to generate audio using Google Colab Pro. I think it is a good model, but it has to improve generation time in order to be useful. I doubt this runs almost live on a GPU, as they promise on GitHub.
@Jwoodill2112 11 months ago
it generates voice clips for me in about 15 seconds on GPU
@ESGamingCentral 10 months ago
@Jwoodill2112 what GPU?
@Jwoodill2112 10 months ago
@ESGamingCentral RTX 3060. It's a laptop, so this one only has 6 GB of VRAM.
@ESGamingCentral 10 months ago
@Jwoodill2112 thank you. I’m trying to find good software to upscale audio files to a higher audio quality
@TheJoepanelli 10 months ago
Thank you, but How can we use our voice into other languages ? using our SAME voice. Thank you
@mequavis 1 year ago
ive been using bark on my chan for months now, its great
@engineerprompt 1 year ago
Nice, that's great to hear.
@moviesexplained5129 11 months ago
Channel name
@user-ub5cx4tw9x 4 months ago
did you train your voice in coqui ai?
@mequavis 4 months ago
@user-ub5cx4tw9x I remember we tried Coqui, but it never worked right. We ended up using a modified version of Bark that had voice cloning hacked into it. It was very hit and miss to make a good npz file, but when you got a good one they were pretty dang good. We did a bunch of celebrities and stuff but then decided to cool down because YouTube was complaining :P
@nibo100 1 year ago
there is no link to the longer audio method.
@engineerprompt 1 year ago
Sorry, just updated the description.
@BrillianceSpark 9 months ago
can we clone our own voice using this and how long audio we can get
@IO-fz2sm 7 months ago
Awesome! you just earned a subscriber...
@engineerprompt 7 months ago
Thank you 😊
@CapitanMegaa 1 month ago
My problem with this TTS is that it takes about 45 s to read the text. I need it to be as fast as possible; even 20 s is already very late for me. Is it normal for it to take that long? If it is, then this TTS isn't for me.
@metaveta 1 year ago
waiting for dataset video🙌
@engineerprompt 1 year ago
Recording is done, editing soon :)
@basbasmounir3943 10 months ago
🎯 Key Takeaways for quick navigation:
00:35 🌐 Bark, an Open Source Text-to-Speech model from Suno AI, stands out due to its high audio quality and support for multiple languages.
00:58 🎶 Bark can generate non-speech sounds like laughs, music, and emphasis, making the generated audio more natural.
01:25 💻 You can run Bark on both GPUs and CPUs, but the generated audios are typically around 13-14 seconds.
02:20 🌍 Bark supports multiple languages and even allows mixing languages in a single audio prompt.
03:29 🛠️ To set up Bark locally, create a virtual environment, install the Bark package, and the Transformers package.
07:53 🔊 Bark can generate high-quality, natural-sounding audio that can be customized with various options.
10:08 📜 To generate longer audios, you can split the text into sentences, generate audio for each sentence, and combine them.
11:02 🗣️ While not directly available in Bark, you can clone voices using the Kukui AI package, which supports Bark integration.
14:56 🎙️ Cloning voices involves importing the Bark model, configuring speaker and text information, and generating the cloned audio.
16:30 🤖 Cloning voices with Bark is probabilistic, and the quality of the output may vary depending on the input audio quality.
Made with HARPA AI
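The long-form idea from the 10:08 takeaway (split the text into sentences, generate audio per sentence, then combine) can be sketched as below. This is a hedged illustration, not the code from the video: `split_sentences` and `generate_long` are hypothetical helpers, the sentence splitter is deliberately naive, and `synthesize` stands in for whatever produces samples for one sentence (with Bark it could wrap `generate_audio(sentence, history_prompt=...)`).

```python
import re


def split_sentences(text):
    """Split text after ., ! or ? followed by whitespace (a naive splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def generate_long(text, synthesize, pause_samples=0):
    """Synthesize each sentence and join the pieces with short silences.

    `synthesize` is any callable that maps one sentence to a sequence of
    audio samples; `pause_samples` zeros are appended after each sentence.
    """
    audio = []
    silence = [0.0] * pause_samples
    for sentence in split_sentences(text):
        audio.extend(synthesize(sentence))
        audio.extend(silence)
    return audio


if __name__ == "__main__":
    # Stand-in synthesizer for illustration only: one fake sample per character.
    def fake_synthesize(sentence):
        return [1.0] * len(sentence)

    samples = generate_long("Hello there. How are you?", fake_synthesize, pause_samples=2)
    print(len(samples), "samples generated")
```

Keeping each chunk to a single sentence is one way to stay under the roughly 13-14 second limit mentioned above; reusing the same speaker prompt for every sentence would be needed to keep the voice consistent.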
@okachobe1 7 months ago
I don't think it's Kukui AI, is it? I think it's Coqui AI.
@m.rr.c.1570 7 months ago
after training the model on a specific voice how can i make say something in hindi rather than in english
@mtnmecca_ej 11 months ago
Thanks for the video. How do you specify using gpu in the code?
@cgd1602 3 months ago
PyTorch: first check whether CUDA is available with this line of code:
device = "cuda" if torch.cuda.is_available() else "cpu"
You will mostly use the GPU for these lines, moving the model and the inputs (rather than the generated output) to the device:
model = BarkModel.from_pretrained(model_id, torch_dtype=torch.float32).to(device)
audio_array = model.generate(**inputs.to(device), do_sample=True)
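Put together, the GPU selection described in this reply might look like the sketch below. It assumes the Hugging Face `transformers` Bark integration (`AutoProcessor`, `BarkModel`) and the `suno/bark-small` checkpoint; `pick_device` and `synthesize` are hypothetical helper names, and the script from the video may differ.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def synthesize(text, model_id="suno/bark-small"):
    """Generate speech for `text`; returns (samples, sample_rate)."""
    # Heavy imports stay local so pick_device() works without them.
    from transformers import AutoProcessor, BarkModel

    device = pick_device()
    processor = AutoProcessor.from_pretrained(model_id)
    model = BarkModel.from_pretrained(model_id).to(device)

    # Move the inputs to the same device as the model before generating.
    inputs = processor(text).to(device)
    audio = model.generate(**inputs, do_sample=True)

    # Bring the result back to the CPU as a NumPy array for saving.
    samples = audio.cpu().numpy().squeeze()
    return samples, model.generation_config.sample_rate


if __name__ == "__main__":
    print("Using device:", pick_device())
```

The key point is that both the model and its inputs live on the same device; the generated audio is moved back to the CPU only at the end.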
@MichealAngeloArts 1 year ago
The description is a bit confusing as it says Bard, not Bark, so I've initially thought we can do TTS with Google Bard (which was surprising!) 😃
@engineerprompt 1 year ago
Oh, fixed it. Thanks for pointing it out :)
@buenosdiasbendiciones 11 months ago
you are the best
@ilanser 1 year ago
Great video, you are an amazing content creator! keep on the good job. BTW, you sound very shy (speaking about voice), try to be more clear and happy when you voice over, like the American vlogers. Good Luck!
@user-cn6ur5mq7j 9 months ago
Do you have colab version? It took me a few hours to download everything!
@lilianburdianov8494 8 months ago
Here it is: colab.research.google.com/drive/1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing#scrollTo=FwXohRuBAfBN
@93simongh 1 year ago
is the voice cloning part in your script running on gpu or cpu?
@engineerprompt 1 year ago
CPU of M2 Max
@virtuakamp 7 months ago
Hi, I followed all the steps but I'm getting the error: line 1246, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '
@marcosgodinhofilho8700 4 months ago
In the load_model.py file, comment out these lines:
if (
    not config.USE_SMALLER_MODELS
    and os.path.exists(ckpt_path)
    and _md5(ckpt_path) != config.REMOTE_MODEL_PATHS[model_type]["checksum"]
):
    logger.warning(f"found outdated {model_type} model, removing...")
    os.remove(ckpt_path)
if not os.path.exists(ckpt_path):
    logger.info(f"{model_type} model not found, downloading...")
    _download(config.REMOTE_MODEL_PATHS[model_type]["path"], ckpt_path, config.CACHE_DIR)
@harshmishra5331 3 months ago
@marcosgodinhofilho8700 Not helping. Same error again.
@sebastianmoraugalde1997 3 months ago
This means that your .pt files from /bark are corrupted. Make sure that every single byte is downloaded, as on the Hugging Face page of Bark.
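One way to check whether a downloaded checkpoint is intact is sketched below. It rests on an assumption the thread does not confirm: that the "invalid load key" error comes from a file that is not a real checkpoint, such as a Git LFS pointer file (which starts with `version https://git-lfs`) or a saved HTML page. `md5_of` and `looks_like_placeholder` are hypothetical helpers using only the standard library.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_placeholder(path):
    """Heuristic: True if the file starts like a Git LFS pointer or an HTML page."""
    with open(path, "rb") as handle:
        head = handle.read(64).lstrip().lower()
    return head.startswith((b"version https://git-lfs", b"<!doctype", b"<html"))
```

The digest from `md5_of` can be compared with the checksum that the loading code quoted above reads from `config.REMOTE_MODEL_PATHS[model_type]["checksum"]`; a mismatch, or a `True` result from `looks_like_placeholder`, suggests the file should be downloaded again.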
@abli00346 1 year ago
i like ur voice
@elvismorellidigitalvisuala6211 11 months ago
Is a TTS? there's a way to use audio as input? and get another voice as ouput?
@engineerprompt 11 months ago
Yes, you will need to add another layer to it: something like Whisper that takes your audio and converts it into text. That can be combined with the code in the video.
@elvismorellidigitalvisuala6211 11 months ago
@engineerprompt thanks for your answer, but I was thinking about something different. That is still a TTS output; I need to preserve the intonation, cadence, and musicality of the input audio. Think of audio from an actor, with emotions: a TTS output removes all emotions.
@mipiaceiltubo 10 months ago
@elvismorellidigitalvisuala6211 like rask ai? i'm actually looking for same
@m3nafsy 1 month ago
so how to use all this in google colab?
@nirsarkar 1 year ago
What about Mac? Does wav work there? Great Video though!
@engineerprompt 1 year ago
Yes.
@qwertyntarantino1937 8 months ago
Voice generation didn't work for me. I've got error `_pickle.UnpicklingError: invalid load key, '
@harshmishra5331 3 months ago
How have you solved it?
@qwertyntarantino1937 3 months ago
@harshmishra5331 I didn’t. This is still a new technology with tons of bugs so it’s not ready for wide usage yet
@harshmishra5331 3 months ago
I am not using it for wide use .
@harshmishra5331 3 months ago
Just a small project. But am not able to get across this error
@chanansiegel834 1 year ago
and you can run all this in cpu?
@engineerprompt 1 year ago
Yes
@vivekakaviv 4 months ago
Thanks for the video! But the cloning is terrible. The code works but the cloned voice does not sound anything like the original. Anyone facing similar issue? Any solutions?
@harshmishra5331 3 months ago
which python version are you using?
@zoybot2254 11 months ago
tbh, bark still has a lot of instability. especially in languages other than English. unfortunately it is not a reality yet
@RikusLategan 9 months ago
1:12 Limitation = Maximum length of generated audios = 15 seconds; but there are ways around this
@netgabo 1 month ago
How?
@RikusLategan 1 month ago
@netgabo That was 8 months ago. I hardly think it still works/applies 😅
@rodrigocozta 11 months ago
should have win11 new pc or old win8 old cpu run it,thanks
@yahya_elistinsary 6 months ago
Do Bark also support TTS for Arabic language?
@engineerprompt 6 months ago
I don't think so
@rsunghun 1 year ago
Waiting for GUI 🤗
@magnus948 6 months ago
Coqui ai shut down not working anymore!!
@JOHNSMITH-sj3lg 8 months ago
hello my friend great video. my only problem is that is has a english accent. try to clone my german voice
@vivekkarumudi 5 months ago
It's extremely slow, to the point where it doesn't make any sense to use it. I would rather stick to my other options.
@billyindrajaya 10 months ago
Too short audio !
@mr2octavio 1 year ago
Description says BARD
@engineerprompt 1 year ago
fixed it :)
@lusineharutyunyan4523 5 months ago
can you do this for the Armenian language?
@engineerprompt 4 months ago
I think it only supports english.
@nathanjohnson4721 3 months ago
is dutch also there
@engineerprompt 3 months ago
I believe so.
@nathanjohnson4721 3 months ago
@engineerprompt i want to do text to speech with dutch voices
@Hollywood1127 10 months ago
Is BARK able to speak more than 13 seconds of text?
@DavidBrown-tv8fx 10 months ago
of course
@gRosh08 9 months ago
Uncle Amos said he is going to make a sandwich and try again later.
@docxonglabiet 1 year ago
Language Vietnamese
@kitlt8476 8 months ago
not any language!!!!
@mekkicharfi5454 1 year ago
Dark on dark , we see nothing !
@QHawk7 10 months ago
*No Arabic?* :(
@henrylawson430 11 months ago
Lots of scams can be generated with this…
@c01000100 11 months ago
Ugg... Cringe each time I hear visual code studio (vs visual studio code). Appreciate the content though.
@antonpictures 1 year ago
Your channel is great, your voice and accent are bad, this is perfect. Use it! My voice is Annoying too so,
@danieletorrigiani 1 year ago
There is nothing wrong with his voice or accent. You are free not to appreciate your own voice, but please be considerate when commenting on other people's attributes.