Using GPT-4o to train a 2,000,000x smaller model (that runs directly on device)

Views: 117,634

1 month ago

More here: www.edgeimpulse.com/blog/llm-...
The latest-generation LLMs are absolutely astonishing: thanks to their multimodal capabilities you can ask questions in natural language about stuff you can see or hear in the real world ("is there a person without a hard hat standing close to a machine?") and get relatively fast and reliable answers. But these LLMs have downsides: they're absolutely huge, so you need to run them in the cloud, adding high latency (often seconds per inference), high cost (think about the tokens you'll burn when running inference 24/7), and high power (you need a constant network connection).
In this video we're distilling knowledge from a large multimodal LLM (GPT-4o) and putting it in a tiny model, which we can run directly on device for ultra-low latency and without the need for a network connection, scaling down to even microcontrollers with kilobytes of RAM if needed. Training was done fully unsupervised: all labels were set by GPT-4o, including deciding when to throw out data, and the labeled data was then used to train a transfer learning model with default settings.
One of the models we train has 800K parameters (an NVIDIA TAO model with a MobileNet backbone), a cool 2,200,000x fewer parameters than GPT-4o :-) with similar accuracy on this very narrow and specific task.
The GPT-4o labeling block and TAO transfer learning models are available to all enterprise customers in Edge Impulse. There's a 2-week free trial available; sign up at edgeimpulse.com!
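The labeling step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Edge Impulse labeling block: `ask_llm` is a hypothetical stand-in for a call to a multimodal LLM, and the prompt wording and the "unsure" answer are made up for the example.

```python
from typing import Callable, Iterable

# Hypothetical prompt; the wording is an assumption for illustration only.
PROMPT = "Is there a toy visible in this image? Answer 'yes', 'no', or 'unsure'."


def label_dataset(
    image_paths: Iterable[str],
    ask_llm: Callable[[str, str], str],
) -> dict[str, str]:
    """Return {image_path: label}, dropping images the LLM cannot label.

    `ask_llm(image_path, prompt)` is a hypothetical callable, not a real API.
    """
    labels: dict[str, str] = {}
    for path in image_paths:
        answer = ask_llm(path, PROMPT).strip().lower()
        if answer in ("yes", "no"):
            labels[path] = answer
        # Anything else ("unsure", malformed output) is thrown out,
        # mirroring the "deciding when to throw out data" step.
    return labels


if __name__ == "__main__":
    # Stub labeller for demonstration; a real one would call the LLM.
    canned = {"a.jpg": "Yes", "b.jpg": "no", "c.jpg": "unsure"}
    print(label_dataset(canned, lambda path, prompt: canned[path]))
```

The surviving `{path: label}` pairs would then be the training set for the small on-device model.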

Comments: 143

@PetterBruland 26 days ago

Commercial use here would be a cat door for cats to get in/out of the house, and check if the cat has an object in its mouth, like a dead mouse, rabbit, bird, neighbor kid, etc., then not allow the door to open. Otherwise allow cat in.

@prabhattiwari5089 16 days ago

That's a great idea.

@triton62674 13 days ago

Data for a dead neighbour's kid might be tough to collect.

@peterferguson3374 1 month ago

Really exciting concept. Watching that just stretched my mind about the intersection between cloud and device.

@blossom_rx 1 month ago

Dear Jan, I just found your video in my feed and recognized within the first minutes how much quality your content has. I took a look at your company; I did not know Edge Impulse before. Even though I am so far only a personal user of AI products, I got really interested, as I am planning to change my career path to the AI field, which I think is the right way. So, to give you some feedback: if you create videos like this one, where you speak to the audience and show simple use cases as examples, you will attract more people. I will take an in-depth look into your company now; it looks very interesting! Salutations and keep pushing!

@DDubyah17 1 month ago

This is a fantastic demo, really compelling. Can’t wait to try this out. Subscribed!

@kylelau1329 21 days ago

I've been looking for this for a couple of months. Now I see this video, and my mind is blown!

@StaticFreq 20 days ago

Agreed! Humans are never satisfied, isn't it wonderful! :) Solutions are there, if you look...

@zyzzyva303 1 month ago

Super cool application. Very impressive.

@stecj366 18 days ago

This is so cool, congratulations on that!

@TheeHamiltons 1 month ago

This is the way for distribution and actualization of LM impact in more real-world scenarios.

@NLPprompter 1 month ago

The more I watch AI videos, the more... I encounter extraordinary people who really share their knowledge... thank you.

@ketohack 1 month ago

Use a large, mature model as a labeller for a smaller, specialized training set... very interesting thought!

@oleandrummer 16 days ago

A real-world use case I was thinking of a few weeks ago was a grocery store assistant. Likely built into AR glasses, you would first build up a grocery list, then the model would look for those items via the camera as you're walking down the aisles. It would probably take a long time to train with all the different items you would need to account for, but it solves one of the biggest gripes I have with going to the grocery store.

@ArseySOAD 26 days ago

You've now seen an example of overfitting (en.wikipedia.org/wiki/Overfitting). The results aren't related to GPT at all. You could manually label the images and achieve similar outcomes. While GPT and its API can assist in labeling datasets at minimal cost, don't be misled by what you see. The fine-tuned network likely wouldn't work for new, differently designed toys or in different environments. You would need many more videos for a production-level classification solution and would end up spending more on GPT's API calls.

@Caellyan 11 days ago

I'm guessing that's what they're doing - use GPT for labeling and then train a smaller classification model with data labeled by a bigger model.

@omarnug 6 days ago

@Caellyan What he meant is that the model trained in this video is only capable of recognising the toys it was trained on. It won't recognise any new toys. To do that, you'd need to train the model with many many more examples of toys. You could use GPT to label all those new examples, but would it be worth it?

@jensrenders4994 5 days ago

"Overfitting" depends on what you want to fit. If you are making a robot that will only move around in that house, and will only see the objects in that house, then this works perfectly fine, no overfitting. There are many industrial use cases that are just as specific as this. If you claim that this is a universal toy detector, then you could say it is an example of overfitting. But this is a "toy in this person's house detector", and the tests show it generalizes beyond the training data (but still in this person's house).

@petersobolewski1354 2 days ago

What'd be more interesting is having the LLM also segment the image automatically, although this can already be achieved through SAM or similar. I guess that it's overall an interesting direction for training small, specific models like YOLO 😊

@Caellyan 2 days ago

@petersobolewski1354 Language models don't work with images.

@rahulcse03 1 month ago

Loved it ❤

@esoa1000 1 month ago

Very cool!

@Adventure1844 28 days ago

Note that your 'toy' prediction is based on your training data and not on any other content than what it was trained on. Any other toys would never be recognized. Therefore, the model is very limited. However, for your purposes, it is certainly suitable and clearly illustrates how small a model can become when it is restricted to specific purposes.

@__hz__6983 26 days ago

Is that the case? What's the purpose of the transfer learning step with the MobileNet v2 backbone then?

@goktug123123 19 days ago

oh how could they not foresee such a problem and launch the service

@dimii27 18 days ago

That's the point, you're "overfitting" your model so it is very efficient and lightweight on your data, let's say detecting when your car is parked by using your security camera. Perhaps it is not very impressive, but it does its job and only its job. Low latency and low resource use, at the cost of some accuracy and the ability to recognize things it was not trained for. Basically, you're just making a goon that does its job at a minimal cost.

@juliand3630 13 days ago

Note that no one expects that. He said "factory" pretty often for a reason, I guess. S T A N D A R D I Z E D. So that thing has a pretty good chance to work well. (Imagine the same picture every day from a camera that just sits in a backyard, with "did (XXX) change?" as a prompt, and footage over several full years as training so it doesn't get confused by seasons.)

@HazMozz 1 month ago

Awesome!

@OzGoober 1 month ago

Great stuff

@moudrik12 1 month ago

Really impressed by this feature. I will try it out with my students.

@matteovillani8983 1 month ago

This is insane!! 😮

@imlucas999 31 minutes ago

Amazing application. I'm thinking about applying it to my screen reading for faster processing.

@danielmessias2811 1 month ago

This is awesome

@theoriginalrecycler 27 days ago

All good stuff.

@user-xc2yc3vz5e 1 month ago

amazing! thx

@iancampbell4582 27 days ago

Great demo. Some potential applications would be monitoring turnstiles for employee IDs for access to restricted areas, checking that safety equipment is worn in a warehouse, authorized vehicles in a lot, speed of vehicles/forklifts, or listening for unsafe sound levels in an area, all based on visuals or sound. What models did you have in mind for demonstrating sound monitoring? Great job!!

@husamburhan6 26 days ago

Really great!

@SaivenkatBalabadruni-ym6jb 1 month ago

this is so awesome... wow!!

@KiteTurbine 1 month ago

Very cool. Don't be shy, tell us the rough costs of training and downloading. I reckon we might be able to use it to recognise when a Kite is flying the way we want

@myceliumtechno 29 days ago

amazing!

@hqcart1 1 month ago

Can I use your website to train a smaller model for a specific LLM task, or is it going to be large no matter what? For example, I am classifying companies (finance, technology, oil and gas, etc.): I usually send the LLM the company profile, and it classifies the company into the category it belongs to. Thank you.

@MunzirSuliman 28 days ago

that's awesome!

@flaviojmsouza 1 month ago

I already use your platform! Great idea to explore! Thanks for sharing

@user-yr2nf9cr4v 1 month ago

True use case of convergence of Generative AI and Embedded Machine Learning. Very Nice...

@Aristocle 1 month ago

8:50 You could improve the result by: 1. adding a third 'background' category; 2. filtering through anomaly detection when there are no indicator objects (perhaps putting a threshold on the soft-label values); or 3. editing the dataset so that the model sees more empty scenes where the label is 'no'.
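A threshold on the soft-label values, as suggested in point 2, could look roughly like the sketch below. It is a minimal illustration only: the function name, the 0.8 cutoff, and the "uncertain" label are assumptions, not anything shown in the video.

```python
def decide(p_toy: float, threshold: float = 0.8) -> str:
    """Map a soft 'toy' probability to 'toy', 'no toy', or 'uncertain'.

    The 0.8 default is an arbitrary example value; in practice it would be
    tuned on held-out frames.
    """
    if p_toy >= threshold:
        return "toy"
    if p_toy <= 1.0 - threshold:
        return "no toy"
    # Low-confidence frames are neither class; they can be ignored or
    # routed to an anomaly detector instead of forcing a yes/no answer.
    return "uncertain"


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.05):
        print(p, decide(p))
```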

@allenytw 1 month ago

Maybe someday an edge device can get an upgrade from cloud LLMs when an error happens, by uploading the new error video for the cloud LLM to learn from and transferring the result back to the edge. Sounds interesting for IoT with a network of giant LLMs.

@thesimplicitylifestyle 25 days ago

Yes, it is possible to use a multimodal AI assistant that can both see your computer screen and respond to voice commands. This type of technology is often referred to as "vision-enabled voice assistants" or "visual voice assistants." Some popular examples of such assistants include Amazon Alexa with the Amazon Lookout for Gadgets service, Google Assistant with its Vision API integration, and Microsoft Cortana with the Windows Eye Control feature (which requires specialized eye-tracking hardware). These assistants can perform various tasks such as identifying objects on your screen, providing visual feedback based on voice commands, or even controlling your mouse and keyboard using only your voice. However, keep in mind that these features may require additional setup and configuration, including enabling accessibility settings, installing necessary software or services, and granting permissions to the AI assistant to access your computer's camera and microphone. Additionally, privacy concerns should be taken into consideration when using vision-enabled voice assistants, as they involve sharing more personal information than traditional text-based voice assistants.😎🤖

@VorpalForceField 26 days ago

Impressive ..!!!

@DaTruAndi 1 month ago

You trained a classifier for the combination of your house, your type and style of toys, your camera, and your lighting conditions. If someone ran that in a very different house (e.g., one with more patterns and colors) with different toys, it would surprise me if you got great results at that model size.

@delightfulThoughs 28 days ago

As I understand it, that is the purpose: to be as specialized as possible. But keep in mind that what he did there was just an example. Obviously you should use way more scenarios and pictures. The beauty of the system is really the way the LLM can label the training data. This is not meant for deployment where anyone can use it, but for personal use, or internal use in a factory or office. One thing it could be used for would be detecting a gun in an image, or maybe someone getting hurt or in harm's way.

@alexcook4851 23 days ago

Really interesting; wondering if it could be used for UAV navigation (vision-based) on commodity hardware.

@petersobolewski1354 2 days ago

So you basically used the GPT-4o for labelling the training dataset (that you created manually), that you then used for training your small model. 😊

@tirthb 28 days ago

Wow, this is mind blowing. Great job!

@paulm1237 1 month ago

This is a great example of how AGI will almost immediately lead to ASI.

@maloukemallouke9735 1 month ago

Great idea. If we do the same with small robots to help the biggest ones, that could make the tough jobs easier. Thanks for sharing.

@DoBaMan77 1 month ago

This is really nice. Is there an open source version available on GitHub?

@demej00 11 days ago

So is there a way to connect the micro-LLM to sensors so that, for example, I could video a self-balancing robot, train the microprocessor, and then have the microprocessor control the motor speed to balance the robot?

@vaughan6562 28 days ago

Wow. As they say, what a time to be alive.

@Luftbubblan 18 days ago

Awesome

@atlas3650 25 days ago

@edgeimpulse How feasible is it, using your tool, to create a model sophisticated enough to track facial expressions with a Raspberry Pi equipped with a micro camera? Would this likely only work with the specific face it was trained on? Thanks for your thoughts.

@Sam-ev1oi 1 month ago

Does GPT interpret an image by using a CLIP model to generate a caption that describes the image, and then just using that text?

@fahadxxdbl 1 day ago

I really love this concept and def gonna try it, but how's the license? Like, if you wanna train a commercial model, would the GPT-4o license allow this? Cuz as far as I know it doesn't, so it only works for personal stuff. Please lmk if otherwise.

@GoWithAndy-cp8tz 24 days ago

Hi! I listened to you and understood what you said, but I still have no idea how to start with my Raspberry Pi and reproduce what you did there. I would like to do the same thing, but I don't know how. It would be amazing if I could follow your step-by-step instructions. Cheers!

@Philip8888888 7 days ago

It would be nice if the webapp could take a live video stream from the camera for the inferencing.

@blank-404 1 month ago

Well, the demo is cool, but how well would this work on a new set of images/video? If it's only fitted for this particular video or for images very similar to the frames in the video, I do not see how it could be useful.

@zholud 1 month ago

It will not work. It was trained on a degenerate dataset and can only interpolate this particular toy on this particular bed in that particular position.

@JeremyJanzen 1 month ago

The question is, if the dataset was 10x or 100x larger with more varied data, could it distill a tiny model with more generalized inference? And if that is the case, how expensive will this be to train using GPT-4o?

@janjongboom7561 1 month ago

It will retain some generalised behaviour because we don't train a model from scratch, but rather use a pretrained frozen backbone (here MobileNet). So rather than just mapping pixels to "toy or no toy", we force the model to learn that behaviour based on general embeddings trained on a huge dataset, so it overfits a lot less and generalises better. However, yes, this model is not going to be great at detecting toys in completely random places, but that often does not matter. Most customers actually have constrained problems (they just need to find out when X happens at site Y), and thus constrained models specialized in detecting this at that specific site are fine. Constrained models for constrained problems. You can always expand this with more training data of course; we've had one customer use 5TB of raw data to eventually train an 80 kB model and beat existing state-of-the-art models by 10 percentage points (after heaps of signal processing and clinically validated labeling, not with LLMs).
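The frozen-backbone idea in this reply can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `frozen_backbone` is a hand-written fixed function standing in for a pretrained network such as MobileNet, and the head is a plain logistic regression trained with simple gradient steps. It is not the training code used in the video.

```python
import math


def frozen_backbone(x: list[float]) -> list[float]:
    """Stand-in for a pretrained feature extractor; it is never updated."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias feature


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs: int = 200, lr: float = 0.5) -> list[float]:
    """Fit only the small classification head on top of the fixed features."""
    feats = [frozen_backbone(x) for x in samples]  # computed once, backbone frozen
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # Gradient step on the head weights only.
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w


def predict(w: list[float], x: list[float]) -> float:
    return sigmoid(sum(wi * fi for wi, fi in zip(w, frozen_backbone(x))))


if __name__ == "__main__":
    samples = [[0.9, 0.8], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1]]
    labels = [1, 1, 0, 0]
    head = train_head(samples, labels)
    print(predict(head, [0.9, 0.9]), predict(head, [0.1, 0.1]))
```

Because only the head's few weights are learned, a small labeled dataset is enough to fit it, which is the point being made about overfitting less than training from scratch.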

@jackflash6377 1 month ago

@janjongboom7561 Exactly what I need. A model that can look at a sound signature and give a yes or no if it fits a certain set of patterns. Will it run on a Jetson?

@cokomairena 1 month ago

It's a great tool to evaluate single states of things in single locations... like: are there people in this room, is the water boiling, are the dogs hungry or on the table... whatever single thing you want to check programmatically.

@quercus3290 27 days ago

how does this compare to automated image tagging with gpt-4o?

@martinwinther9006 26 days ago

I'm wondering if this is misleading, as all the toys looked basically the same: strong primary colors. Would it have worked with the teddy bear, and what would happen if you pointed the camera at your company logo on the wall?

@AngeloXification 13 days ago

Is it possible to have a collection of small models on device with a "director" that chooses the model based on the input?

@garethsmith7628 1 month ago

That opening safety concept won't happen under today's standards and regulations. There is a thing called Functional Safety that applies (the parent standard is IEC 61508): any time a programmable machine might harm a human it should be applied, and it is not quick to apply. Legislation would need to change to do anything else.

@Wisdomofthings 22 days ago

That was great 👌🏼👌🏼

@hamburgengineer 23 days ago

As the poet said, this is f'n awesome :)

@AlexandreCassagne 29 days ago

What algorithm are you using for displaying the embeddings clustering?

@janjongboom7561 29 days ago

Here we use a pretrained MobileNetV2 model, take the embeddings from three layers before the last layer, then run t-SNE over the embeddings: docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition/data-explorer

@AlexandreCassagne 27 days ago

Thanks!

@Mr.Andrew. 29 days ago

And that's how you spend millions of dollars to make a live "Is it a hotdog?" app. :)

@jeffg4686 27 days ago

NICE! I had a different, but somewhat similar idea recently: filtered LLMs, filtered to relevant topics for a specific purpose. The reason would be memory, of course (you can still quantize if needed). So if this LLM is geared toward a law firm that does personal injury law, for instance, there's ALL sorts of stuff that can be whacked from the model. We want medical (related to injuries). We want auto and traffic. We want insurance. We want similar cases. We don't want astronomy, unrelated biology (like insects and such), chemistry, music, dance, surfing, etc. Take the LLM and have something that whacks it down by stripping and rebuilding. But then again, this is a bad example, because law firms would pay for a backend service (they don't care about a local LLM), and they don't need the speed like IoT/edge devices do.

@Heisenberg2097 1 month ago

Nice video. But since I work in the medical field the data needed to create maybe close to good LLMs is hard to get. But I'm in contact with certain authorities... can you build me an LLM that automatically convinces Squareheads and makes them cooperate with the click of a button? Now that would be great. THX³ in advance.

@ljuberzy 26 days ago

Like local, limited voice recognition on some smaller microcontrollers that would understand languages other than English?

@JuanFlores-il4yv 25 days ago

This video is an ad. Why did I get it in my feed not tagged as an advertisement?

@VadimChes 7 hours ago

OK. But what will it answer when it sees a toy it hasn't seen before? I think ChatGPT will say YES, but your model will give unexpected answers )

@TheThetruthmaster1 25 days ago

So will it be equivalent to GPT-2... GPT-3?

@JazevoAudiosurf 29 days ago

could gpt-4o actually repeat your voice in the same exact way?

@stock99 1 month ago

Ya, we need more of this. LLMs are too fat. We need slim versions of LLMs that are just enough to do a few selected types of tasks.

@johnbollenbacher6715 29 days ago

It seems that one could use this process to train a plant pathology AI. This AI could then run on a device that is in the field somewhere looking at agricultural environments.

@clumsy_en 1 month ago

In all fairness have you ever read OpenAI terms of service regarding use of their content or outputs to train other AI models? 😅

@ilanlee3025 1 month ago

Do they forbid it?

@AG-et6jp 28 days ago

Is ChatGPT 4o an LLM or generative AI?

@technologic5031 7 days ago

Wow, so we can detect bad products directly on the conveyor belt.

@richardurwin 1 month ago

I would say it's not in the model, it's not in the deployment, it's in the method of construction! Is it possible to iterate that to deployment????

@LukePighetti 1 month ago

Hey! I'm a staff software engineer (mobile) who wants to get into building these AI pipelines. Any idea how long it would take someone like me to reskill? And what's the best way for me to skill up quickly?

@mindurownbussines 1 month ago

Well, you could get a much more optimized, better model just using a CNN, and it would work in every single house.

@AK-ox3mv 22 days ago

Sam Altman left the chat

@andririan6342 1 month ago

Hot dog or not hot dog

@mcombatti 17 days ago

It's against OpenAI policy to use GPT to train other models 😅

@LokeKS 1 month ago

MiniCPM can run on mobile; it has an APK installer.

@hassanimani5290 1 month ago

Nice, but it's just knowledge distillation!

@RoadPirateFilms 28 days ago

“just” 😂

@MrPyro91 20 days ago

incoming openai lawsuit

@delightfulThoughs 28 days ago

It could be used to train a model to detect any sexual activities in your house, or where people are supposed to be doing work 😉 , like you get a message immediately.

@a_sobah 24 days ago

I don't think they understand anyway

@PirateOnYoutube 17 days ago

But the database is way too small. If you show it any colored object, like an orange, it will be a toy (color x pixel). So I would say it's clickbait ;)

@kingcheese3795 7 days ago

External storage is simply added as required

@sephirothcloud3953 1 month ago

You can't use an OpenAI model to train another model; they can sue you.

@lobo5727 28 days ago

😅😅

@theendarkenedilluminatus4342 1 month ago

4:00 As a parent, what are your thoughts on this level of free access? Doesn't it seem to be openly invasive? Where is this visual data processed, how securely is it stored? Is it stored? If so, how are users being compensated and provided the ability to utilize the platform without exposing their lives to the highest bidder? What are the limits of a universal stake company's fiscal responsibility?

@solidsnake2428 1 month ago

Use ChatGPT to train a smaller model answering yes or no to the question, “Is there an adult close to the camera holding a baby incorrectly?” Set it up in your kids’ room. How could anyone be against keeping kids safe?

@atomictraveller 9 days ago

this is kind of a dumb post. i'm kind of the world's old time secret leader in procedural generation, as in through the 90s, 00s and a bit more, my procedural poetry and music generators were much more encompassing than any visible work. so i'm kind of like the world's most experienced procedural media experiencer. my issue is that my work and related factors drew intense abuse from some of the less visible factors in contemporary culture over a sustained period of time, so "trust issues". i'd have to train my own AGI amusements. this is keen to see. what i'd like to say is this is keen to see, technology applied in some kind of useful way for real lives. not what i expect from society whatsoever.

@AK-ox3mv 22 days ago

small language models retire large language models

@solidsnake2428 1 month ago

So weird that we create people who are so smart in terms of computers and so dumb in terms of ethics and morality.

@genkidama7385 19 days ago

Isn't that illegal?

@ojay666 1 month ago

It is just using GPT-4o to quickly label the data... nothing special. And you cannot call this distillation (scientifically): it is just using a small model to achieve the same function, and the small model is definitely overfitting.

@RoadPirateFilms 28 days ago

Did you even watch the part of the video where he trained the smaller models to perform the same function? Distillation is a perfect word for it (scientifically) if you understand how LLMs work at a technical level

@ojay666 27 days ago

@RoadPirateFilms Please read at least one distillation-related academic paper, bro. This video didn't use the teacher model to get soft labels. Typically the loss function would combine the cross-entropy loss on the soft labels with the task loss on the hard labels, so that the small model can better learn from the teacher model. You can only call this video "knowledge transfer" or "label generation", and I think it's just chasing GPT-4o's clout.
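The combined loss this comment describes can be sketched as below. This is a minimal illustration, not code from the video or from any particular paper: the `alpha` weighting and `temperature` values are assumptions, and some formulations additionally scale the soft term by the temperature squared, which is omitted here.

```python
import math


def softmax(logits, temperature: float = 1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs) -> float:
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, hard_label: int,
                      alpha: float = 0.5, temperature: float = 2.0) -> float:
    """alpha * CE(teacher soft labels, student) + (1 - alpha) * CE(hard label, student)."""
    # Soft-label term: student is compared with the teacher's softened distribution.
    soft_targets = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, student_soft)
    # Hard-label term: ordinary task loss against the one-hot label.
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [2.0, -1.0]
    print(distillation_loss([2.0, -1.0], teacher, hard_label=0))   # student agrees
    print(distillation_loss([-1.0, 2.0], teacher, hard_label=0))   # student disagrees
```

With `alpha=0` this reduces to plain training on hard labels, which is closer to what the comment says the video does.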

@babali1014 24 days ago

He could have manually labelled the toy pictures in this particular demo and it would not have made a difference. The GPT-4o value here is unattended labelling of large data. Suppose he had 10,000 hours of video. He could run the video through GPT-4o to produce the labels, then feed the labelled images to MobileNet.

@MCRuCr 15 days ago

"Cloud on the edge" is pure marketing bullshit. Cloud is another man's computer.

@ai-bokki 1 month ago

15 mins of fluff distilled by chatgpt: Large language models (LLMs) like ChatGPT-4 can understand and respond to real-world scenarios using multimodal capabilities (text, images, audio), similar to human understanding. Deploying these powerful but large models on the edge (e.g., in a factory) is challenging due to their size, cost, and latency issues. A solution is to distill knowledge from large LLMs to train smaller, specialized models that can run locally, reducing latency and cost. Edge Impulse facilitates creating these smaller models for edge devices, demonstrated with an example project detecting children's toys in a home environment. Smaller models, once trained with LLM knowledge, can run efficiently on simple hardware like microcontrollers, making AI applications more accessible and practical for various use cases.

@sourcecaster 11 days ago

Unfortunately they don't understand a thing. And they are very gullible to availability bias, even if something contradicts logic.

@maddscientist1050 9 days ago

ChatGPT is dead now.

@vladimirg8384 27 days ago

Folks, I'm sorry, this is going to be a toxic comment, but it looks like we'll soon die with this kind of technology. Progress accelerates with each new generation.

@__hz__6983 26 days ago

AI will help us live longer.

@MadeInJack 1 month ago

While the idea is great, it never stops scaring me how much we don't care about resource and energy usage. Climate change still coming...

@phen-themoogle7651 1 month ago

Climate change won't be solved until you can terraform a planet with very 'sci-fi' technology... but if we humans are alive long enough to reach a technological singularity, maybe we can witness something capable of keeping the climate perfect at all times, and we won't have to worry about resources amid hyperabundance once millions of humanoids are growing trees and farming (100x-1000x more than we ever could) and making the world a much better place. Smarter AI systems/species will do a far better job than we ever could, and that's why hundreds of billions of dollars are being invested to reach AGI/ASI nowadays (e.g., Project Stargate). And we won't have control over something billions of times smarter than the human race combined, so it doesn't matter if, temporarily, some greedy billionaires just want to become trillionaires, which is why they are investing so much (unless they actually care about saving all of humanity and the planet?). We will have systems smart enough to improve everything, for all life on Earth (unless they have other plans x_x).

@zorayanuthar9289 1 month ago

Climate is changing all the time. Climate change is an agenda to enslave humanity

@jasonn5196 1 month ago

Yeah efficiency is key, I don’t care about climate change cause it’s going to change with or without us, this earth does ice ages and all kinds of weird phenomena.

@jasonn5196 1 month ago

We would do more good by planting forests.

@MadeInJack 1 month ago

@jasonn5196 It's just too late in 2024 to still think that the main cause of the current climate change crisis is not human activity and its CO2 emissions. Please do your proper research; we don't have time anymore for these old debates. Hence, adding millions of additional chips to consumer products sounds more like a CO2 nightmare than human progress.

@marcusaurelius6607 6 days ago

they do not _understand_ anything

@OMNI_INFINITY 14 days ago

Have you tried having the model that ChatGPT labeled the data for label data for another model, and then having that one label data for another model, and so on until...?