
Florence 2 - The Best Small VLM Out There?

14,310 views

Sam Witteveen

Days ago

Comments: 42
@parkerspitzer 2 months ago
Thanks for your work on sharing this information. Much easier to watch your content than keep my ear to the ground all day trying to keep up. Much appreciated, sir.
@danielmz99 2 months ago
Thanks for the great content. A video going through the fine-tuning process on this one would be amazing. I am not sure how this could scale to a video implementation (probably passing a frame each time).
@coolmcdude 2 months ago
I would also love a video/notebook on fine-tuning Florence 2.
@shangonghowe 1 month ago
Another video I really appreciate! Thank you for sharing. I would also like it if you could do another one going through the fine-tuning process :)
@IsxaaqAcademy 2 months ago
It's also good at OCR for handwritten documents.
@jeremybristol4374 2 months ago
I'm enthusiastic about these smaller models. Thanks for covering this!
@IanScrivener 2 months ago
Thanks Sam!! Please keep up the great work...
@mukkeshmckenzie7386 2 months ago
A VQA tutorial would be nice!
@richardobiri2642 1 month ago
Thanks a lot for this! I hope you'll consider a follow-up on identifying authentic versus fake certificates 🙏🙏🙏
@aa-xn5hc 2 months ago
Great, yes, a fine-tuning video would be very interesting.
@jefframpe5075 2 months ago
Thanks, Sam! I always appreciate your videos. I would love your take on how Florence-2 compares with Apple's 4M-21.
@RishabhMathur06 1 month ago
@samwitteveenai Please make a fine-tuning video for VLMs such as LLaVA and Florence-2, and if possible use Ollama so that we can run inference on a local device.
@GiovaniFerreiraS 2 months ago
I'd love to see a fine-tuning video, especially if it's not question answering, so it's a different use case from the documentation. Maybe with a quick intro on scenarios where fine-tuning would be especially helpful.
@samwitteveenai 2 months ago
Noted!
@marcoscipioni132 1 month ago
Yes, I'm trying to use it for table extraction from scanned PDFs, with little success so far. I would love to see how you implement that.
@unclecode 2 months ago
This is what people should call "small": anything below 1B! Thanks for your video. By the way, I played around with the quantized version, and the results are unbelievably good! I shared a post on Twitter, mentioned you, and shared the Colab. Take a look at it. I tried 8 bits and 4 bits. It's odd how the 4-bit version is almost the same as the base model!
@samwitteveenai 2 months ago
I saw your tweet and retweeted it; very cool stuff. I will check it out. I've just been knee-deep in Gemma stuff for the last few days.
@unclecode 2 months ago
@samwitteveenai Thanks, and yes, it's Gemma 2's turn. Waiting for your YouTube notification about the Gemma video!
@micbab-vg2mu 2 months ago
Thank you - it looks interesting:)
@ariramkilowan8051 2 months ago
I think fine-tuning for OCR would be a good demo. OCR on real-world images of documents is much harder than OCR on electronic documents, so it would be cool to see how a small model like this does as an alternative to Claude/GPT-4.
@MH-ke2wi 2 months ago
I tried OCR and OCR with region on images converted (not scanned) from PDF pages. Nothing fancy: standard text with some titles, sections, lists... It was absolutely unusable. When it detected something, it usually got it right, but it could only see around 25% of the text.
@ariramkilowan8051 2 months ago
@MH-ke2wi Yeah, I've also been struggling to get decent results with OCR.
@Dodomiaolegemi 1 month ago
Thank you so much!!
@ranu9376 2 months ago
I've tried this model, and image description is great. I've also tried DocVQA, but it gives only one-word answers and doesn't get even the simplest questions right. I had hoped to do some classification and compare it with other models.
@ALEXPREMIUMGAME 2 months ago
Awesome, thanks!
@mshonle 2 months ago
I wonder how much performance would be affected when something so distilled is then quantized. Also, it seems amazing that it can handle segmentation for an unspecified number of objects! With Phi-3 Vision you would need to provide a token to represent, say, each giraffe you want to identify.
@samwitteveenai 2 months ago
Quantization is a good question! I would expect it to suffer more than a big model would. I might give it a test tomorrow.
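To make the quantization question a little more concrete, here is a toy sketch of symmetric absmax (per-tensor) round-to-nearest quantization applied to synthetic Gaussian "weights". It is an illustration under stated assumptions, not how Florence-2 or any particular library (e.g. bitsandbytes) quantizes; the function names are hypothetical and the weights are random stand-ins. It only shows how the per-weight reconstruction error grows when the bit width drops from 8 to 4, and says nothing about downstream task accuracy.

```python
import random


def absmax_quantize(weights, bits):
    """Toy symmetric per-tensor quantization: map floats to signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to nearest; the clamp is a safety net since |w / scale| <= qmax by construction.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and quantize->dequantize weights."""
    q, scale = absmax_quantize(weights, bits)
    restored = dequantize(q, scale)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)


# Synthetic stand-in for one weight tensor (hypothetical, NOT real Florence-2 weights).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]

for bits in (8, 4):
    print(f"{bits}-bit mean abs error: {mean_abs_error(weights, bits):.6f}")
```

Practical schemes typically use per-channel or per-block scales, which keep this error lower than the naive single-scale version shown here.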
@srk5702 2 months ago
Please do a fine-tuning video on object detection, since all LLMs are only useful for generating text output. Thanks in advance!
@sohitshivhare1541 2 months ago
Thanks for the information, this is great. Can I fine-tune it for certain specific images, like few-shot learning? A tutorial on that would be great.
@ShravanKumar147 2 months ago
What would you pick for fine-tuning? Any specific application ideas?
@toadlguy 2 months ago
I would be interested in how much memory is required to run these models; they seem pretty small even unquantized. Maybe I will try it later on my 8GB M1 Mini. One thing I am curious about: at 3:38, the description of the image is wrong in ways that seem odd. The title is described as being on top with "20 Years of ..." underneath, and Ron's tie is described as red and his hair as blonde. I wonder if this is just a vagary of the model (the placement data would be strange) or over-reliance on training data. Or a straight-up mistake in 'creating' the paper (which would probably be the most disturbing 😉).
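For the memory question, a rough back-of-the-envelope sketch: the weights-only footprint is roughly parameter count × bits per parameter / 8 bytes. The parameter counts below (about 0.23B for Florence-2-base and 0.77B for Florence-2-large) come from the Florence-2 release rather than from this video, and the estimate deliberately ignores activations, caches, and framework/runtime overhead, so actual memory use will be higher. The dictionary and function names are illustrative only.

```python
# Approximate, rounded parameter counts for the two Florence-2 sizes (assumption).
PARAM_COUNTS = {
    "Florence-2-base": 0.23e9,
    "Florence-2-large": 0.77e9,
}

# Common storage precisions and their bits per parameter.
PRECISIONS = (("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4))


def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, n in PARAM_COUNTS.items():
    sizes = ", ".join(f"{label}: {weights_gb(n, bits):.2f} GB" for label, bits in PRECISIONS)
    print(f"{name} -> {sizes}")
```

By this estimate even the large model's weights take roughly 1.5 GB in fp16, which leaves headroom on an 8GB machine, although real usage will be higher once activations and runtime overhead are included.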
@SaiManojPrakhya-mp4oe 2 months ago
It would be great if you can show a finetuning example!
@yassinebouchoucha 1 month ago
When will you release a demo on how to fine-tune such a model?
@tonyrungeetech 2 months ago
Hi Sam. Thank you for the videos. I've been playing around with some of the smaller vision models and trying to implement batched inference, with little success. If you wanted to run multiple VQA-style questions against the same image quickly, how would you go about it? Is batching even the right direction to look in?
@pandian1537 28 days ago
Is it possible to train the OCR task prompt on a custom dataset? And if we train Florence-2 for the OCR task, will it affect the performance of the model?
@AbhishekKotecha 2 months ago
Hi Sam, thanks for the video. How do you think it compares with Phi-3-V? My take is that this is more raw and better for fine-tuning; do you think so too?
@Walczyk 2 months ago
This is far better and more advanced than Phi-3-V's crappy image detection.
@JustEmbraceTheChallenge 2 months ago
Please do fine-tuning for object detection.
@SinanAkkoyun 2 months ago
Where is the dataset? I couldn't find the release.