OpenAI Whisper - Fine tune to Lithuanian | step-by-step with Python

8,609 views

Data Science Garage

1 day ago

Fine-tuning OpenAI's Whisper to a different language is simple using Python and Google Colab with a GPU. In this tutorial, I selected the small version of the Whisper model to fine-tune it to the Lithuanian language. Whisper can transcribe 96 other languages and can also translate from those languages into English.
This video also partly explains the Whisper paper (tokenizer, encoder, decoder, padding, and more) and the model itself.
Before starting hands-on with Whisper, you should create your Hugging Face token at: huggingface.co/settings/tokens
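For reference, a minimal sketch of the login step inside a notebook, assuming the token from the link above (the huggingface_hub interactive prompt is one way to pass it):

# Log in to Hugging Face from the notebook so the fine-tuned model can later be pushed to the Hub.
from huggingface_hub import notebook_login

notebook_login()  # paste the access token created at huggingface.co/settings/tokens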
You can check the language dataset from the Mozilla Foundation used in this tutorial at: huggingface.co/datasets/mozil...
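A minimal sketch of loading that dataset with the datasets library, assuming the Common Voice 11.0 config name "lt" for Lithuanian and the usual train/validation/test split names:

# Load the Lithuanian subset of Common Voice 11.0 (the dataset terms must be accepted on the Hub first).
from datasets import load_dataset, DatasetDict

common_voice = DatasetDict()
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "lt", split="train+validation", use_auth_token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "lt", split="test", use_auth_token=True)
print(common_voice)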
Using Whisper for transcription in Python is very easy.
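As a quick illustration of that point (separate from the fine-tuning notebook), plain transcription with the openai-whisper package looks roughly like this; the audio file name is a placeholder:

# Basic transcription with the openai-whisper package (pip install openai-whisper).
import whisper

model = whisper.load_model("small")                      # same model size used later for fine-tuning
result = model.transcribe("audio.mp3", language="lt")    # placeholder file; the language hint is optional
print(result["text"])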
Whisper is an automatic speech recognition (ASR) system released by OpenAI and trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
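A small sketch of that log-Mel front end using the Transformers WhisperFeatureExtractor; the dummy audio is only there to show the fixed 80 x 3000 feature shape produced by the padded 30-second window:

# Turn raw 16 kHz audio into the log-Mel spectrogram the Whisper encoder expects.
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
audio = np.zeros(16000 * 10, dtype=np.float32)           # 10 s of silence as a stand-in for real speech
features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
print(features.input_features.shape)                     # (1, 80, 3000): 80 mel bins x 3000 frames (padded to 30 s)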
For this video example, we will use the small version of the Whisper model.
You can check all available versions in the official Whisper GitHub model card: github.com/openai/whisper/blo...
The sections are (a condensed code sketch of the training setup from Steps 5.4-5.5 follows this list):
0:00 - Hands-on steps
3:14 - Install PyTorch for WhisperAI with CUDA
3:34 - Set GPU Runtime in Google Colab
4:14 - Install ffmpeg package on the machine
4:40 - Install dependencies for fine-tuning
5:35 - Step 0. Log in to Hugging Face
6:09 - Step 1. Loading the dataset
7:16 - Step 2. Prepare Feature Extractor and Tokenizer
8:24 - Step 3. Combine elements with WhisperProcessor
9:06 - Step 4. Prepare data
11:03 - Step 5. Training and Evaluation
11:09 - Step 5.1. Initialize the data collator
12:26 - Step 5.2. Define evaluation metrics
12:56 - Step 5.3. Load a pre-trained Checkpoint
14:13 - Step 5.4. Define the training configuration
15:48 - Step 5.5. Train the Whisper AI model (fine-tune)
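As referenced above, a condensed sketch of the training setup from Steps 5.4-5.5. The argument values are illustrative rather than the exact ones used in the video, and the model, data_collator, compute_metrics, processor, and common_voice objects are assumed to come from the earlier steps:

# Wire up the Hugging Face Seq2SeqTrainer for Whisper fine-tuning (illustrative values).
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-lt",        # where checkpoints are written (also the Hub repo name)
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=True,                              # mixed precision on the Colab GPU
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    metric_for_best_model="wer",
    greater_is_better=False,                # lower word error rate is better
    push_to_hub=True,
)

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,                            # WhisperForConditionalGeneration (Step 5.3)
    train_dataset=common_voice["train"],    # prepared dataset (Steps 1-4)
    eval_dataset=common_voice["test"],
    data_collator=data_collator,            # padding collator (Step 5.1)
    compute_metrics=compute_metrics,        # WER metric (Step 5.2)
    tokenizer=processor.feature_extractor,
)

trainer.train()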
The GitHub repo with the full code is available at: github.com/vb100/whisper_ai_f...
Technical definitions mentioned in the video (a short code sketch of how these pieces fit together follows the links):
- WhisperFeatureExtractor: huggingface.co/docs/transform...
- WhisperTokenizer: huggingface.co/docs/transform...
- WhisperProcessor: huggingface.co/docs/transform...
- WhisperForConditionalGeneration: huggingface.co/docs/transform...
- Log-Mel Spectrogram: / understanding-the-mel-...
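A short sketch of how these pieces fit together, assuming the openai/whisper-small checkpoint used in the video; the audio and the sample sentence are placeholders:

# Load the Whisper building blocks listed above and run one forward pass to get a training loss.
import numpy as np
from transformers import (
    WhisperFeatureExtractor,
    WhisperTokenizer,
    WhisperProcessor,
    WhisperForConditionalGeneration,
)

checkpoint = "openai/whisper-small"
feature_extractor = WhisperFeatureExtractor.from_pretrained(checkpoint)       # audio -> log-Mel features
tokenizer = WhisperTokenizer.from_pretrained(checkpoint, language="Lithuanian", task="transcribe")  # text <-> token ids
processor = WhisperProcessor.from_pretrained(checkpoint, language="Lithuanian", task="transcribe")  # wraps both
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)            # encoder-decoder model

audio = np.zeros(16000 * 5, dtype=np.float32)                                  # placeholder 5 s clip at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("labas rytas", return_tensors="pt").input_ids     # placeholder Lithuanian transcript

outputs = model(input_features=inputs.input_features, labels=labels)
print(outputs.loss)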
@DataScienceGarage - subscribe and get more high-quality content soon!
#whisperai #openai #transcription

Comments: 34
@DataScienceGarage
@DataScienceGarage 1 year ago
Thank you for watching this video. I appreciate your time being here. Subscribe to the channel to get more high-quality videos soon! The best place to learn Data Science with the best in the industry - Turing College. READ MORE HERE: turingcollege.org/DataScienceGarage - see you there!
@smkrishn
@smkrishn 1 year ago
Very nicely presented! Have subscribed to your channel and am eager to explore and learn!
@DataScienceGarage
@DataScienceGarage 1 year ago
Thanks for such feedback, really appreciate it! :)
@hamzarashid714
@hamzarashid714 1 year ago
Thanks for this video, super helpful! If I want to train the Whisper model to transcribe Arabic audio into Arabic text and also translate it into English, is this possible within one trained model by feeding it such a dataset? Or will I have to train two separate models?
@RZRRR1337
@RZRRR1337 8 months ago
did you find anything?
@football-uj4yg
@football-uj4yg 10 months ago
What about preprocessing the transcriptions? Isn't it important, or is it handled by the Whisper processor?
@user-ef2pv2du3j
@user-ef2pv2du3j 9 months ago
This is great, do you have any advice on training it on mixed-language audio? A lot of our meetings are held with at least two spoken languages; would I have to create my own dataset for that?
@RZRRR1337
@RZRRR1337 8 months ago
did you find anything?
@AntonioKrizmanic
@AntonioKrizmanic 1 year ago
This was a really great video, as short as possible without losing any of the important content. Can you please give some directions? I cannot use the Mozilla datasets for my language (not yet built), so I would like to use a dataset I found elsewhere, downloaded locally to my computer. Every row contains a pair of a sentence and the name of the corresponding .wav file. The wav files need to be resampled to a 16000 Hz sample rate and turned into a spectrogram format, but I am not familiar with the whole datasets package environment (I mostly use pandas with numpy). I don't expect you to guide me through the whole process, I just want to know where my code would deviate from yours. I can change the sample rate myself, I can probably find a package that will create a spectrogram from those files, and I can create a pandas dataset / CSV file in which every row is a pair of an audio array (spectrogram) and the tokenized sentence it corresponds to. Could I use the same DataCollatorSpeechSeq2SeqWithPadding class on that format and just continue from there?
@ruizard9583
@ruizard9583 1 year ago
Hello, thank you for the great video! Did the Lithuanian language training dataset already exist, or did you insert your own dataset?
@DataScienceGarage
@DataScienceGarage 1 year ago
Hi! Good question. You can prepare the dataset by yourself, or you can download it from somewhere. For this tutorial, I used the one from huggingface.co/datasets/mozilla-foundation/common_voice_11_0 , where you can choose the language you want from the list. Then you can define the URL for that dataset in your Python code (6:23).
@babanana2431
@babanana2431 9 months ago
@@DataScienceGarage Hi, thank you for the great video, but I am having this error while it is doing eval at training step 10/50 (my eval step is 10 and max step is 50): RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead
@levangelashvili7353
@levangelashvili7353 5 months ago
Hello, I further trained the Whisper model and received the file. Now I want to get the final ggml file format. Please tell me how to do this.
@levangelashvili7353
@levangelashvili7353 5 months ago
I tried running it locally with the medium model size, but I only have an 8 GB GPU, and the medium model requires 12 GB for training. What can be changed to run on 8 GB?
@babanana2431
@babanana2431 9 months ago
Why do I get this error when I try to fine-tune with my own data: RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead
@daychow4659
@daychow4659 1 year ago
Wow, amazing! One question: how can we use the trained model in Whisper?
@DataScienceGarage
@DataScienceGarage 1 year ago
Thanks for the feedback! Using the trained model - that's an idea for another video. I will keep that in mind.
@MW-dg7gl
@MW-dg7gl 2 months ago
Can you provide resources or the code to show how to create and upload a custom dataset that you created yourself instead of the Common Voice dataset? Thank you.
@nithinreddy5760
@nithinreddy5760 1 year ago
Hello, this video is very helpful. Can you please share the link to the full notebook after training, testing, and making the predictions?
@DataScienceGarage
@DataScienceGarage 1 year ago
Hello! For now, I don't have the full notebook after training, since I did not wait the 6 hours. What I will do is move all the code to a Google Cloud VM with a GPU and see how it goes there. I will post an update on this channel.
@nithinreddy5760
@nithinreddy5760 1 year ago
@@DataScienceGarage Ok, please let me know when it's done. Thank you.
@worldbywatcher
@worldbywatcher 1 year ago
@@DataScienceGarage That would be really great, since trying to type everything manually is pretty error-prone.
@rishabhsrivastava6282
@rishabhsrivastava6282 1 year ago
At 15:01, executing the code with the "Seq2SeqTrainingArguments" function is now throwing this error: ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.19.0`: Please run `pip install transformers[torch]` or `pip install accelerate -U`. Please help. P.S. I am using Google Colab.
@efkanerkmen8040
@efkanerkmen8040 11 months ago
I have the same problem. Do you remember how you fixed this? @DataScienceGarage
@syalwadea
@syalwadea 10 months ago
Yeah, can you help me? I got the same problem.
@football-uj4yg
@football-uj4yg 10 months ago
Hi! I had the same issue. You should execute `pip install accelerate -U`, and after that you need to restart the session.
@syalwadea
@syalwadea 10 months ago
Thank you for the solution 😃 @@football-uj4yg
@syalwadea
@syalwadea 10 months ago
@@football-uj4yg I got an error at the step after training; I can't run the model (push_to_hub=False). Can you help me? :(
@bryantgoh1888
@bryantgoh1888 1 year ago
How do I add my own voice to the training dataset?
@Shivam-nj9ly
@Shivam-nj9ly 11 months ago
Did u get anything?