Speech Recognition in Python | Fine-tune a Wav2Vec2 Model for Custom ASR

6,956 views

9 months ago

In this tutorial, we'll explore the Wav2Vec2 model, a powerful tool for speech recognition and speech representation learning. If you work in speech recognition or follow state-of-the-art models, you've likely heard of Wav2Vec2. This video focuses on practical steps, guiding you through fine-tuning Wav2Vec2 on your own speech data without delving deep into the theory.
For speech recognition, Wav2Vec2 is fine-tuned with a Connectionist Temporal Classification (CTC) loss, and we'll show you how to use it effectively for your task. You can start from a pre-trained model and adapt it to your needs instead of training from scratch.
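To illustrate what a CTC-trained model outputs, here is a minimal plain-Python sketch of greedy CTC decoding: pick the best label per audio frame, merge consecutive repeats, then drop blanks. The five-symbol `VOCAB` and the blank at index 0 are hypothetical; a real Wav2Vec2 model returns a logits tensor over its own tokenizer vocabulary, and which index acts as the blank depends on that tokenizer.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical character vocabulary; index 0 is ASSUMED to be the CTC blank token.
VOCAB = ["<blank>", " ", "a", "c", "t"]
BLANK_ID = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: List[str] = VOCAB,
                      blank_id: int = BLANK_ID) -> str:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label index for each audio frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse consecutive duplicates, e.g. [c, c, a] -> [c, a].
    collapsed = [label for label, _ in groupby(best_ids)]
    # 3. Remove blanks; a blank between two equal labels is what allows double letters.
    return "".join(vocab[i] for i in collapsed if i != blank_id)
```

For example, frames whose best labels are `[c, c, blank, a, a, blank, t]` decode to `"cat"`, while `[a, blank, a]` decodes to `"aa"`.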
We'll walk you through the code, making sure you have the necessary requirements installed, such as PyTorch and Transformers. You'll also learn how to apply audio augmentations to add variety to the training data and make the model more robust.
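As a rough idea of what an audio augmentation does, the sketch below applies a random gain and additive Gaussian noise to a waveform, then clips it. It assumes the waveform is a list of float samples in [-1.0, 1.0]; the function name and default values are hypothetical and are not the augmentors used in the video.

```python
import random
from typing import List, Tuple


def augment_waveform(samples: List[float], rng: random.Random,
                     noise_std: float = 0.005,
                     gain_range: Tuple[float, float] = (0.8, 1.2)) -> List[float]:
    """Hypothetical augmentation: random gain plus Gaussian noise, clipped to [-1, 1]."""
    gain = rng.uniform(*gain_range)  # one gain value for the whole clip
    augmented = []
    for s in samples:
        value = s * gain + rng.gauss(0.0, noise_std)
        # Keep samples inside the valid range for float PCM audio.
        augmented.append(max(-1.0, min(1.0, value)))
    return augmented
```

Passing a seeded `random.Random` keeps augmentations reproducible; in training you would apply this only to training samples, never to validation data.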
Throughout the tutorial, you'll discover how to monitor your model's progress with TensorBoard, implement early stopping, and save the best checkpoints. We'll also cover converting your PyTorch model to ONNX for easier deployment on various platforms.
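The early-stopping and best-checkpoint logic can be summarized in a few lines. This is a hypothetical helper, not the callback used in the video: it tracks a validation metric where lower is better (e.g. CER), reports when a new best is reached (the moment to save a checkpoint), and signals a stop after `patience` epochs without improvement.

```python
from typing import Optional


class EarlyStopping:
    """Hypothetical tracker for a lower-is-better validation metric such as CER."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.best_epoch: Optional[int] = None
        self.epochs_without_improvement = 0

    def step(self, epoch: int, metric: float) -> bool:
        """Record one epoch; return True if it is a new best (i.e. save a checkpoint)."""
        if self.best is None or metric < self.best - self.min_delta:
            self.best = metric
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.epochs_without_improvement >= self.patience
```

In a training loop you would call `step` after each validation pass, write the checkpoint when it returns `True`, and break once `should_stop` is `True`.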
To validate the model's performance, we'll run inference on a test dataset and measure the character error rate (CER) and word error rate (WER).
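CER and WER are both edit distances (substitutions, insertions, deletions) normalized by the length of the reference: CER counts characters, WER counts words. The sketch below implements these standard definitions in plain Python; the repository may compute them with its own helpers.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; the reference must be non-empty."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `wer("the cat sat", "the cat sit")` is one substitution out of three reference words, about 0.33. Note that both rates can exceed 1.0 when the hypothesis contains many insertions.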
This tutorial aims to empower you to use Wav2Vec2 effectively for speech recognition tasks, whether you're a beginner or an experienced practitioner.
GitHub link: github.com/pythonlessons/mltu...
Trained model: drive.google.com/drive/folder...
#transformers #nlp #wav2vec #tensorflow #pytorch

Comments: 28

@infinitewebrevolution 3 months ago

Thank you so much, sir, for your hard work and the pretrained model. It has helped me a lot, and I'll always be grateful.

@PyLessons 3 months ago

Glad to hear that! You are welcome

@PyCode.academe 5 months ago

God bless you!

@PyLessons 5 months ago

You are welcome :)

@AmitYadav-rp3ot 7 months ago

Hi there, great video! I wanted to know your opinion on training a model like this just for recognizing numbers and a couple of words from an audio file. Will such custom training help reduce the size of the model? I want to create a very small model that can run on a sub-GHz CPU. Please share what you think. Many thanks!

@PyLessons 7 months ago

Hi, thanks! No, training the model on simpler data doesn't reduce its size, because the architecture and the number of parameters stay the same. Check my other videos to create your own custom model for simpler data, such as numbers and words. If your vocabulary is small, you might consider framing it as a classification task instead. Also, to reduce the size of the model, look into quantization and pruning techniques.
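To show why quantization shrinks a model, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python: each float weight is stored as an integer in [-127, 127] plus one shared scale, so 4-byte float32 values become 1-byte integers (roughly 4x smaller). The function names are hypothetical; for a real PyTorch model you would use the framework's tooling, such as `torch.ao.quantization.quantize_dynamic`, rather than this loop.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization: w is approximated by q * scale, q in [-127, 127].

    Assumes a non-empty list of weights.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero for all-zero weights
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]
```

The rounding error per weight is at most half the scale, which is why quantization usually costs only a small amount of accuracy; how much it costs for a specific ASR model has to be measured (e.g. by comparing CER before and after).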

@N3ONGNCS 19 days ago

I want to create an ASR model for an African vernacular/local language. Could I use this for that? I'll create my own dataset if need be. What would you suggest? I'm attempting this for the first time and am a little lost and overwhelmed.

@hugok6212 4 months ago

Excellent video and explanation. I have a question: if I train a model this way, can I use it for real-time speech recognition? Thank you.

@PyLessons 4 months ago

Hey, yes and no. It depends on what hardware you run the model on (CPU, GPU, or something else) and on your "real-time" requirements. You need to test it and you'll see :)
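One common way to "test it" is the real-time factor (RTF): processing time divided by audio duration, where a value below 1.0 means the model transcribes faster than the audio plays. In this sketch, `transcribe` is a placeholder for whatever inference call you use (for example, a function wrapping the exported ONNX model); it is an assumption, not part of the tutorial's code.

```python
import time
from typing import Callable, Sequence


def real_time_factor(transcribe: Callable[[Sequence[float]], str],
                     samples: Sequence[float], sample_rate: int) -> float:
    """Processing time / audio duration; below 1.0 is faster than real time.

    Assumes non-empty audio and a positive sample rate.
    """
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    transcribe(samples)  # placeholder inference call
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds
```

Measure it on the target hardware with realistic clip lengths. For live, streamed input, the delay per audio chunk matters as well, not only the average RTF.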

@djrocks5678 7 months ago

Hi there! Thanks a lot for this. I wanted to ask you something: I am working on a desktop voice assistant project as part of my university work, and I want to train my own speech recognition model. How would I go about this? I looked at datasets, but something like Mozilla's 79 GB dataset is too much for my needs, so I was wondering how to make a smaller-scale speech recognition model for my project.

@PyLessons 7 months ago

Hi, it's usually impossible to get great results without huge datasets and GPU compute. But you can try to create a custom ASR model with another of my tutorials, which you can check here: kzfaq.info/get/bejne/bs2PZ694l9LUgmw.html. Also, there are a lot of already-trained ASR models that you usually only need to integrate (just an idea).

@user-ow5ck4by7u 2 months ago

Could you share your contact details, please?

@shafiqrhmankeliwall8019 3 months ago

Hi, great job, keep it up! I have one question: I want to build/train a model for some low-resource languages such as Pashto, and I will make a dataset from scratch. Any idea how to start, or any useful links? Thanks.

@PyLessons 3 months ago

Thanks! I don't recommend making a dataset from scratch on your own; I believe you should be able to find something open source. I don't have a dataset for that, but check my dataset structure and you'll see what format is required.

@user-gl9fq3rk8i 4 months ago

Good job 👏 ... but I'm getting errors when installing onnx. What Python version did you use?

@PyLessons 3 months ago

I used Python 3.10. What error do you receive? Often it's related to the protobuf version.
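When reporting an installation problem like this, it helps to include the exact Python and package versions. This small helper uses only the standard library (`importlib.metadata`, Python 3.8+) to print them; the package list is just an example, and `protobuf` is the name of that package's distribution on PyPI.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The author mentions using Python 3.10; compare against your own interpreter.
    print("Python", sys.version.split()[0])
    print(installed_versions(["onnx", "protobuf", "torch", "transformers"]))
```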

@maimunahmaskur7525 a month ago

It's great code! Could you please help? If I want to use this code for a dataset labeled with phonemes and use PER (phoneme error rate) for testing and validation, what should I do? I mean, which parts of the code do I need to adjust? Thank you!

@PyLessons 24 days ago

I am not familiar with PER, so I can't tell you
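For reference, PER is usually computed exactly like WER, except that the tokens are phonemes instead of words: (substitutions + deletions + insertions) divided by the number of reference phonemes. The sketch below assumes, hypothetically, that each label is a space-separated phoneme string (e.g. ARPAbet-style `"k ae t"`); with that format, an ordinary WER function applied to the phoneme strings already gives PER.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def phoneme_error_rate(reference: str, hypothesis: str) -> float:
    """PER over space-separated phoneme strings (an assumed label format)."""
    ref_phonemes = reference.split()
    if not ref_phonemes:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref_phonemes, hypothesis.split()) / len(ref_phonemes)
```

For example, `phoneme_error_rate("k ae t", "k ah t")` is one substitution out of three phonemes, about 0.33.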

@victormessias107 3 months ago

When I'm training, it freezes at the end of the first epoch. Any idea?

@PyLessons 3 months ago

It shouldn't behave like that; try to debug it. For example, iterate through the training and validation data providers (e.g. "for data in data_provider") and check whether each one reaches the end. If you still face these issues, open an issue on GitHub with more details.
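The debugging step suggested above can be wrapped in a small helper that iterates once over a provider and prints progress with `flush=True`, so the last line printed before a hang shows which batch never finished loading. `check_provider` is a hypothetical name, and it only assumes the provider is iterable; the batch count is compared against `len(provider)` only if the object supports `len()`.

```python
import time
from typing import Iterable


def check_provider(provider: Iterable, name: str = "provider", log_every: int = 1) -> int:
    """Iterate once over a data provider, printing progress; return the number of batches."""
    expected = len(provider) if hasattr(provider, "__len__") else None
    print(f"{name}: expecting {expected} batches", flush=True)
    start = time.perf_counter()
    count = 0
    for count, _batch in enumerate(provider, start=1):
        if count % log_every == 0:
            # flush=True so output is visible immediately, even if the next batch hangs
            print(f"{name}: batch {count} loaded ({time.perf_counter() - start:.1f} s)", flush=True)
    print(f"{name}: reached the end after {count} batches", flush=True)
    return count
```

You would call it as `check_provider(train_data_provider, "train")` and `check_provider(val_data_provider, "val")`, where those names stand in for the providers created in your training script. If the output stops at a particular batch, one likely cause is a problem loading that batch (for example, an unreadable audio file) rather than the training loop itself.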