Deep-dive into the AI Hardware of ChatGPT

  312,578 views

High Yield

1 day ago

With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnor...
Or use code highyieldnordpass at checkout.
Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass: www.nordpass.com/highyieldbus... with code highyieldbusiness in the form.
....
What hardware was used to train ChatGPT and what does it take to keep it running? In this video we will take a look at the AI hardware behind ChatGPT and figure out how Microsoft & OpenAI use machine learning and Nvidia GPUs to create advanced neural networks.
Support me on Patreon: www.patreon.com/user?u=46978634
Follow me on Twitter: / highyieldyt
Links
The Google research paper that changed everything: arxiv.org/abs/1706.03762
The OpenAI research paper confirming Nvidia V100 GPUs: arxiv.org/abs/2005.14165
0:00 Intro
0:28 AI Training & Inference
2:22 Microsoft & OpenAI Supercomputer
4:08 NordPass
5:57 Nvidia Volta & GPT-3
9:05 Nvidia Ampere & ChatGPT
13:23 GPT-3 & ChatGPT Training Hardware
14:41 Cost of running ChatGPT / Inference Hardware
16:06 Nvidia Hopper / Next-gen AI Hardware
17:58 How Hardware dictates the Future of AI

Comments: 402
@HighYield
@HighYield Жыл бұрын
With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnordpass Or, use code highyieldnordpass at the checkout. Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass at www.nordpass.com/highyieldbusiness with code highyieldbusiness in the form.
@fitybux4664
@fitybux4664 Жыл бұрын
Are we all speaking to the same ChatGPT? What if OpenAI trained a sub-model with fewer parameters for users who don't ask complicated questions, or who only ask about a certain subject? Then they could run inference more cheaply by using a smaller model. Maybe this could be detected after the first question or two that ChatGPT is asked? If I had a bill of a million dollars a day, these sorts of optimizations would definitely make sense!
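A minimal sketch of that routing idea, with a made-up difficulty heuristic and hypothetical model names; this is purely illustrative and says nothing about how OpenAI actually serves ChatGPT:

```python
# Hypothetical router: send "easy" prompts to a cheaper, smaller model and
# only escalate hard ones to the full-size model.
def estimate_difficulty(prompt: str) -> float:
    # Placeholder heuristic: longer and more technical prompts count as harder.
    technical_words = {"prove", "derive", "implement", "optimize", "diagnose"}
    score = len(prompt.split()) / 100
    score += sum(word in prompt.lower() for word in technical_words) * 0.3
    return min(score, 1.0)

def route(prompt: str) -> str:
    return "large-model" if estimate_difficulty(prompt) > 0.5 else "small-model"

print(route("What's the capital of France?"))                                       # small-model
print(route("Derive and implement backprop for a 2-layer MLP, then optimize it."))  # large-model
```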
@GermanMythbuster
@GermanMythbuster Жыл бұрын
Who the F*** listens to 2 min. of ads?! Skipped that crap. Anything longer than 20 sec. and nobody cares anymore. I don't know if Nord had this stupid idea or you, to make the ad so long, but whoever it was, it's a f***ing stupid idea!
@akselrasmussen3386
@akselrasmussen3386 Жыл бұрын
I prefer Avira PWM
@memerified
@memerified Жыл бұрын
Ju
@wiredmind
@wiredmind Жыл бұрын
I think in a few years, AI accelerator cards will be the next video cards. A race to the top for the most powerful accelerator to be able to train and run AI locally on our own PCs, bypassing the need to pay for filtered models from large companies. Once people can run this kind of thing independently, that's when things will start getting _really_ exciting.
@mishanya1162
@mishanya1162 Жыл бұрын
It's not about running AI locally. Today's GPUs (40 series) mostly gain FPS from AI upscaling, frame generation and so on. So whoever makes the better AI and software will win.
@mrmaniac9905
@mrmaniac9905 Жыл бұрын
Training a model is a very involved process. I could see consumer cards that can run pre-trained models, but the average consumer will not be training their own.
@Amipotsophspond
@Amipotsophspond Жыл бұрын
A breakthrough is coming that will let black boxes be part of the networks. This will make it possible to build networks in small parts that go into bigger networks, rather than training everything at once and throwing it away for the next unrelated network. It will not be a service forever; that's just to trick WEF/CCP-China money into giving them start-up funding.
@ralphclark
@ralphclark Жыл бұрын
This will require someone to create an open source ontology that you can use to preload your AI as a starting point. Training one completely from bare metal with only your own input will be beyond everyone except large corporations with deep pockets.
@Tomas81623
@Tomas81623 Жыл бұрын
I don't think people at home will really have a need to run large models, as they would still be very expensive and, other than privacy, offer little advantage. On the other hand, I can definitely see businesses having a need for them, multiple even.
@matthewhayes7671
@matthewhayes7671 9 ай бұрын
I'm a newer subscriber, working my way back through your recent videos. I just want to tell you that I think this is the best tech channel on KZfaq right now, hands down. You are a wonderful teacher, you take frequent breaks to summarize key points, you provide ample context and visual aids, and when you do make personal guesses or offer your own opinions, it's always done in a transparent and logical manner. Thank you so much, and keep up the amazing work. I'll be here for every upload going forward.
@garyb7193
@garyb7193 Жыл бұрын
Great video! Hopefully it will put things into perspective: Nvidia, Intel, and AMD's world does not revolve around graphics card sales and squeezing the most performance out of Cyberpunk. Hundreds of millions of dollars are at stake in areas much more lucrative than $500 CPUs or $800 video cards. They must meet the demands of all their various customers, as well as investors and stockholders too. Thanks!
@marka5968
@marka5968 Жыл бұрын
This is the single whale > millions of peons attitude that's in video games now. Video games are no longer designed for fun but for grind and for feeding the whales. Apparently video cards are going to be designed that way as well. I don't know how tens of millions of customers are a lower priority than a few big whales, for a tech that hasn't made a red cent yet and costs millions per day to run. Certainly the data center makes Nvidia more money than gamers buying cards, but I don't see how it all works out. Nvidia is no. 8 among the most valuable companies in the world, and selling GPUs to gamers once every 3-4 years isn't going to generate that much revenue. These numbers don't seem to make sense in my mind.
@garyb7193
@garyb7193 Жыл бұрын
@@marka5968 Okay?!
@jimatperfromix2759
@jimatperfromix2759 Жыл бұрын
In its last quarterly results, although AMD is improving market share in its consumer divisions (CPUs and gamer GPUs), it took a slight loss on consumer products. Partly that's the recession coming in for an ugly landing. The good thing for consumers is that AMD is using its massive profits in servers and AI (plus some new profits in the embedded area via its recent purchase of Xilinx) to support its addiction to making good hardware for the consumer/gamer retail market. By the way, one of its next-gen laptop APU models not only contains an integrated GPU that rivals the low end of the discrete GPU market, but also contains a built-in AI engine (thanks to the Xilinx people). So you can get its highest-end laptop CPU/APU chip (meant for the big players like Lenovo/HP/Dell/Asus/Acer et al. to integrate into a gamer laptop along with a discrete GPU from AMD or Nvidia, or even Intel), or its second-from-the-top series of laptop CPU/APU (the one described above, which already has a pretty good integrated GPU plus an AI engine; think: compete with Apple M1/M2), or one of a number of slower series of CPU/APU (meant for more economical laptops, mostly faster versions of older chips redone on faster silicon to fill that market segment at a lower cost). Think of the top two tiers of laptops built on the new AMD laptop chips as each being about 1/10,000th of the machine they trained ChatGPT on - sort of. By the way, did I mention you can do AI and machine learning on your laptop, starting about *next month*?
@snailone6358
@snailone6358 Жыл бұрын
$800 video cards? We've been past that point for a while now.
@emkey6823
@emkey6823 Жыл бұрын
...and we were told BTC was consuming too much of that CO2 power, and that's why the cards got so expensive. I think the bad people in cHARGE made those models using AI for their profits, which worked out pretty well for them. Let's keep our eye on that and give the AI they created some good vibes individually.
@newmonengineering
@newmonengineering Жыл бұрын
I have been an OpenAI beta member for 3 years now. It has only become better over the years. I wonder what it will look like in 5 years.
@kaystephan2610
@kaystephan2610 Жыл бұрын
I find the increase in compute performance incredible. As shown at 7:38, the GV100 from the beginning of 2018 had 125 TFLOPS of FP16 Tensor-core compute performance. The current generation of enterprise AI accelerators from NVIDIA is the H100, which provides up to 1979 TFLOPS of FP16 Tensor-core compute performance. The FP32 and FP64 Tensor-core performance has also obviously increased massively. Within ~5 years the raw compute performance of Tensor cores has increased by around 16x. What previously required 10,000 GPUs could now be done with ~632.
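The arithmetic behind that comparison, using the FP16 Tensor-core figures quoted above (the H100 number is Nvidia's with-sparsity spec, so treat the result as a rough upper bound):

```python
# Rough check of the compute-growth claim in the comment above.
v100_fp16_tflops = 125    # GV100 Tensor-core FP16, early 2018 (as quoted)
h100_fp16_tflops = 1979   # H100 Tensor-core FP16 with sparsity (as quoted)

speedup = h100_fp16_tflops / v100_fp16_tflops
print(f"Per-GPU speedup: ~{speedup:.1f}x")                 # ~15.8x

# If GPT-3 training really used ~10,000 V100-class GPUs, matching the raw
# FP16 throughput would take roughly this many H100s (ignoring memory,
# interconnect and software efficiency):
print(f"Equivalent H100 count: ~{10_000 / speedup:.0f}")   # ~632
```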
@guyharris8007
@guyharris8007 Жыл бұрын
Dev here... gotta say I love it. Thoroughly enjoyable thank you for your time!
@klaudialustig3259
@klaudialustig3259 Жыл бұрын
Great video! I'd like to add one thing: in the segment starting at 16:06 where you talk about the Nvidia Hopper H100, in the context of Neural Networks the most important number to compare to the previous A100 should be the memory. As far as I know, as long as there is *some* kind of matrix multiplication acceleration, it doesn't matter much how fast it is. Memory bandwidth becomes the major bottleneck again. I looked it up and found the number of 3TB/s, which would be 50% higher than the A100 80GB-version. I wonder where the number of 4.9TB/s shown in the video at 18:50 comes from. It seems unrealistically high to me. Nvidia's marketing does not like to admit this. They like to instead compare other numbers, where they can claim some 10x or 20x or 30x improvement.
@klaudialustig3259
@klaudialustig3259 Жыл бұрын
They call that 4.9TB/s "total external bandwidth" and I think they get it by adding the 3TB/s HBM3 memory bandwidth, plus 0.9TB/s NVLink bandwidth, plus something else? Also, I have seen Nvidia claim that the H100 has 2x higher memory bandwidth than the A100. Note that this is only when comparing it to the A100 40GB version, not the 80GB version.
@JohnDlugosz
@JohnDlugosz Жыл бұрын
I recall the Intel roadmap showing the next big thing is being able to put or use memory resources in different places. The PCIe will be so fast that you don't have to put all the RAM on the accelerator card. You'll be able to use system RAM or special RAM cards, and thus easily expand the RAM as needed.
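Doing the addition suggested in this thread shows how much is left over once the quoted HBM3 and NVLink numbers are counted; all figures are as quoted above, not independently verified:

```python
# Trying to reconstruct the "4.9 TB/s total external bandwidth" figure.
hbm3_bw_tbs   = 3.0   # HBM3 memory bandwidth quoted for the H100
nvlink_bw_tbs = 0.9   # NVLink bandwidth quoted for the H100
total_claimed = 4.9   # figure shown in the video

unaccounted = total_claimed - (hbm3_bw_tbs + nvlink_bw_tbs)
print(f"Unaccounted: {unaccounted:.1f} TB/s")  # 1.0 TB/s - the "something else"
                                               # (possibly PCIe plus other links)
```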
@Fractal_32
@Fractal_32 Жыл бұрын
I just saw your post on the community page, I wish KZfaq would have notified me when the video was posted instead of pushing the post without the video. I cannot wait to see what is talked about in the video! Edit: This was great, I’m definitely sharing it with some friends, keep up the great work!
@marka5968
@marka5968 Жыл бұрын
Great and very informative video, sir. I remember watching your very early videos and thought they were a bit meh. But this is absolutely world-class stuff, and I'm happy to listen to such a sharp and insightful mind.
@HighYield
@HighYield Жыл бұрын
Thanks, comments like this really mean a lot to me!
@SteveAbrahall
@SteveAbrahall Жыл бұрын
Thanks for the tech background on what it's running on from a hardware angle. The interesting thing, I think, is when someone comes up with a hunk of code that saves billions of hours of computational power: that disruptive type of thing from a software angle. It is an amazing time to be alive. Thanks for all your hard work, and an interesting vid!
@KiraSlith
@KiraSlith Жыл бұрын
Been beating my head against the Cost vs Scale issue of building my own AI compute rig for training models at home, and this gave me a better idea of what kind of hardware I'll need long-term by looking at what the current bleeding edge looks like. Thanks for doing all the research work on this one!
@Transcend_Naija
@Transcend_Naija 8 ай бұрын
Hello, how did it go?
@KiraSlith
@KiraSlith 8 ай бұрын
@@Transcend_Naija I ended up just starting cheapish for my personal rig: a T7820 with a pair of 2080 Tis on an NVLink bridge. I couldn't justify the V100s just yet, and the P40s I was using at work lacked the grunt for particularly large LLMs (they work fine for large-scale object recognition though).
@og_jakey
@og_jakey Жыл бұрын
Fantastic presentation. Appreciate your pragmatic and reasonable research, impressive work. Thank you!
@AgentSmith911
@AgentSmith911 Жыл бұрын
10:14 is so funny because "what hardware are you running on?" is one of the first questions I asked that bot 😀
@prodromosregalides3402
@prodromosregalides3402 Жыл бұрын
15:19 That's not a problem at all. Even if only 10 million out of 100 million are using OpenAI's servers at any single time, that's 29,000/10,000,000 GPUs, or 0.0029 GPUs per user. Probably less. So instead of them running the model, the end users could easily run it on their own machines. Bloody hell, even small communities of a few thousand people could train their own AIs on their machines, soon to be a much smaller number. There are a few major problems with that. They lose much of the control over their product. They haven't figured out the details of monetizing these models yet, so they are restricted to running them on their own servers instead. A third major problem, for Nvidia: it would be forced to give gaming GPUs back their rightful capabilities, which were stripped away back in 2008. This would mean no lucrative sales of the same hardware (with some tweaks) to corporations, and instead relying on massive sales of cheaper units. And last but not least, an end user, gamer or not, would be able to acquire much more compute power with their 1000-3000 dollar purchase. Because then a PC may be sporting the same CPUs and GPUs, but the difference is the GPUs would be unlocked to their full computing potential. We are talking about many tens to hundreds of teraflops of performance available for the end user to do useful work. And how will the mobile sector compare to this? Because it runs on lower power budgets, there is no way it could compete with fully-fledged PCs. Many will stop buying new smartphones, especially the flagship ones; in fact the very thought of spending money on something that is an order of magnitude less compute-capable would be hugely annoying. Now that I am thinking of it, losing control worries them much more than anything else. And it would not only be control lost on a corporate level, but on a much, much higher level. Right now, top heads at computer companies and possibly state planners must have shit their pants, because of what is seemingly an unavoidable surrendering of power from power centers to the citizens. To paraphrase Putin: "Whoever becomes the leader in this sphere will not only forget about ruling this world, but lose the power he/she already has." All the top leaders got this whole thing wrong. This is not necessarily a bad thing.
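The per-user arithmetic from the comment above, spelled out (both inputs are the commenter's assumptions, not official numbers):

```python
concurrent_users = 10_000_000   # assumed 10M of ~100M users online at once
inference_gpus   = 29_000       # GPU count estimated in the video for inference

print(f"{inference_gpus / concurrent_users:.4f} GPUs per concurrent user")  # 0.0029
```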
@frizzel4
@frizzel4 Жыл бұрын
Congrats on the sponsorship!! Been watching since you had 1k subs
@Embassy_of_Jupiter
@Embassy_of_Jupiter Жыл бұрын
Kind of mind blowing that we can already run something very similar on a MacBook. The progress in AI is insane and it hasn't even started to self-improve, it's just humans that are that fast.
@kanubeenderman
@kanubeenderman Жыл бұрын
MS will for sure use its Azure based cloud system for hosting its ChatGPT, so that they can load balance the demand, and be able to scale out to more VM's and instances if needed to meet demand, and to increase resources on any individual instance if needed. That would be the best use of that set up and provide the best user experience. So basically, the hardware specifics would be whatever servers are running in the 'farms'. I doubt if they will have separate and specific hardware set aside just for ChatGPT as it would run like any other service out there.
@Noobificado
@Noobificado Жыл бұрын
Some time around 1994, the era of search engines started. And now the era of free access to general-purpose artificial intelligence is becoming a reality in front of our eyes. What a time to be alive.
@zerodefcts
@zerodefcts Жыл бұрын
I remember when I was growing up, I thought to myself... geez... it would have been great to live in the past, as there were so many undiscovered things that I could have figured out. Grown up, I have been working in AI for the past 7 years, and looking at this very point in time I can't help but reflect on that moment... geez... there is just so much opportunity for new discovery.
@HighYield
@HighYield Жыл бұрын
I'm honestly excited to see what's coming next. If we use it to improve our lives, it will be amazing.
@vmooreg
@vmooreg Жыл бұрын
Thank you for this! I’ve been looking around for this content. Great work!!👍🏼
@novadea1643
@novadea1643 Жыл бұрын
Logically the inference costs should scale pretty linearly to the amount of users since it's pretty much a fixed amount of computation and data transfer, or can you elaborate why the requirements would scale exponentially as you state at @15:40?
@HighYield
@HighYield Жыл бұрын
The most commented question :D I meant that if the amount of users increases exponentially, so does the amount of inference computation. I now realize it wasn't very clear. Plus, scaling is a big problem for AI, but that's not what I meant.
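A toy model of that relationship, with made-up per-token costs and usage figures, just to show why user growth translates directly into hardware demand:

```python
cost_per_1k_tokens = 0.002   # hypothetical inference cost in dollars
tokens_per_query   = 500     # assumed average prompt + response length

def daily_cost(users, queries_per_user_per_day=5):
    queries = users * queries_per_user_per_day
    return queries * tokens_per_query / 1000 * cost_per_1k_tokens

for users in (1_000_000, 10_000_000, 100_000_000):
    print(f"{users:>11,} users -> ${daily_cost(users):,.0f}/day")
# Cost per user is roughly constant, so if the user count grows
# exponentially, the required inference hardware does too.
```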
@EmaManfred
@EmaManfred Жыл бұрын
Good job here, sir! Mind doing a quick breakdown of a model like BlueWillow that also utilizes diffusion?
@backToFreedom
@backToFreedom Жыл бұрын
Thank you very much for bringing us this kind of information. Even ChatGPT is unaware of the hardware it's running on!
@hendrikw4104
@hendrikw4104 Жыл бұрын
There are interesting approaches like LLaMA, which focus on inference efficiency over training efficiency. These could also help to bring down inference costs to a reasonable level.
@theminer49erz
@theminer49erz Жыл бұрын
Fantastic!! First of all, I could be wrong, but I don't remember you having an in-video sponsor before. Either way, that is awesome!! I'm glad you are getting the recognition you deserve!! You must have done a lot of work to get these numbers and configurations. Very interesting stuff! I am looking forward to AI splitting off from GPUs too, especially with the demand for them going up as investment in AI grows. I, as I'm sure many others are as well, am kinda sick of having to pay (or consider paying) a lot more for a gaming GPU because the higher demand is in non-gaming sectors that are saturated with capital to spend on them. Plus I'm sure dedicated designs will do a much better job. The design, at least in regards to Nvidia, is quite annoying too: Tensor cores, for example, were mainly put there for AI and mining use, and the marketing of them for upscaling, plus the cost added for a gamer to use them, is kinda ridiculous. If you have a lower-end card with them, where you would benefit from the upscaling, you could probably instead buy a card without them that wouldn't need to upscale. It seems to me that their existence is almost the cause of their need in that use case. I don't know how much of the cost of the card is just for them, but I imagine it's probably around 20-30% maybe?? IDK, just thinking "aloud". Anyway, thanks again for the hard work and please let us know when you get a Patreon account!! I would be proud to sponsor you as well!! Cheers!!
@brodriguez11000
@brodriguez11000 Жыл бұрын
" I, as I'm sure many others are as well, am kinda sick of having to pay or consider paying a lot more for a gaming GPU because the higher demand is in non gaming sectors that are saturated with capital to spend on them." Blame cryptocurrency for that. Otherwise those non-gaming sectors are what's keeping the lights on and driving the R&D that gamers enjoy the fruits of.
@memejeff
@memejeff Жыл бұрын
I asked GPT-3 half a year ago what it was running on, and kept asking more leading questions. I was able to get to a point where it said that it used specific mid-range FPGA accelerators that retail between 4,000-6,000 dollars. The high-end FPGAs are connected by PCIe and the lower-end ones use high-speed UART. The servers used a lot of K-series GPUs too.
@jordanrodrigues1279
@jordanrodrigues1279 Жыл бұрын
The specs aren't in the training dataset, there's no way for it to have that information; it's like asking it to give you my passwords. Or in other words you just told yourself what you wanted to hear with extra steps.
@josephalley
@josephalley Жыл бұрын
Great to see this video as well. I loved your M2 chip breakdowns ages ago.
@senju2024
@senju2024 Жыл бұрын
Very very good video. SUBSCRIBE. Reason. You did not talk about hype. You explain tech concepts based on AI. I knew about Nvidia hardware running chatGPT but not the details. Thank you.
@alexcrisara4902
@alexcrisara4902 Жыл бұрын
Great video! Curious what you use to generate graphics / screenshot animations for your videos?
@tee6942
@tee6942 Жыл бұрын
What valuable information 👌🏻 Thank you for sharing, and keep up the good work.
@Speak_Out_and_Remove_All_Doubt
@Speak_Out_and_Remove_All_Doubt Жыл бұрын
A super interesting video, really well explained too, thanks for all your hard work. It's always impressive how Nvidia seems to be playing the long game with its hardware development, and as you mention, I can't wait to see what Jim Keller comes up with at Tenstorrent, because I can't think of a job he's had where he hasn't changed the face of computing with what he helped develop. I just wish Intel had backed him more and done whatever was needed to keep him a little longer; we would maybe be in a very different Intel landscape right now.
@AjitMD
@AjitMD Жыл бұрын
Jim Keller does not stay at a company for very long. Once he creates a new product, he moves on. Hopefully he gets paid well for all his contributions.
@Speak_Out_and_Remove_All_Doubt
@Speak_Out_and_Remove_All_Doubt Жыл бұрын
@@AjitMD I think more accurately, what he does is stay until he's achieved what he set out to achieve and then wants a fresh challenge, he didn't get to do that at Intel. He was essentially forced out or put into a position that he was not comfortable with so chose to leave but he still had huge amounts of unfinished work left to do at Intel plus becoming the CEO or at least head of the CPU division would have been that fresh new challenge for him.
@StrumVogel
@StrumVogel Жыл бұрын
We have 8 of those at the Apple data center I worked at. NVidia cheaped out on the CMOS bracket, and it’ll always crack. You’ll have to warranty work the whole board to fix it.
@sa1t938
@sa1t938 Жыл бұрын
Something important to note is that OpenAI works with Microsoft and likely doesn't pay all that much for the GPUs. Microsoft OWNS the hardware, so their cost per day is just electricity, employees, and rent. The servers were already paid in full when the datacenter was built. It's up to Microsoft how much they want to charge OpenAI (who just supplied Microsoft with Bing Chat, which made Bing really popular), so I'm guessing Microsoft gives it to them at a huge discount or for free.
@knurlgnar24
@knurlgnar24 Жыл бұрын
If you own a shirt is the shirt free because you already own it? If you own a car and let your friend drive it is it free because you already owned a car? This hardware is extremely expensive, it depreciates, requires maintenance, floor space, etc. Economics 101. Ain't nothin' free.
@sa1t938
@sa1t938 Жыл бұрын
@@knurlgnar24 did you read my comment? I literally mentioned all of those costs, and I said they are the only thing Microsoft is ACTUALLY paying. Microsoft chooses how much they want to charge, so they could charge a business partner like openAI almost nothing, or just foot the bill of maintenance costs instead. If they did either of those they already would have made their money back by how popular bing is because of bing chat.
@sa1t938
@sa1t938 Жыл бұрын
@@knurlgnar24 And I guess to your analogy, is a shirt free if you already own it? And the answer is, yes. You can use the shirt for no cost, minus the maintenance of washing it. You can also give that shirt to a friend for free. The shirt wasn't free, but you paid for it up front and now it's free every time you use it.
@lolilollolilol7773
@lolilollolilol7773 Жыл бұрын
AI progress is far more software-bound than hardware-bound. Deep learning algorithms are incapable of logical reasoning and thus of knowing whether a proposition is true or not. That's the real breakthrough that needs to happen. Once deep learning gains this capability, we will truly be confronted with superintelligence, with all the massive consequences that we are not really ready to face.
@BGTech1
@BGTech1 Жыл бұрын
Great video I was wondering about this
@genstian
@genstian 7 ай бұрын
We do run into lots of problems where general AI models aren't good. The future is to make new sub-models that can solve specific tasks, or just add weights to general models, but such a version of ChatGPT would probably require 1000x better hardware.
@Alexander_l322
@Alexander_l322 Жыл бұрын
I literally saw this on South Park yesterday and now it’s recommended to me on KZfaq
@theminer49erz
@theminer49erz Жыл бұрын
Yay!! Happy day! Been looking forward to this! Thanks!
@hxt21
@hxt21 Жыл бұрын
I want to say thank you very much for a really good video with good information.
@virajsheth8417
@virajsheth8417 Жыл бұрын
Really insightful video. Really appreciate.
@elonwong
@elonwong Жыл бұрын
From what I understood from ChatGPT, it's a stripped-down version of GPT-3, where the hardware requirements and model size are massively cut down. It's a lot lighter to run compared to GPT-3. It even said ChatGPT could be run on a high-end PC.
@HighYield
@HighYield Жыл бұрын
That’s what I gathered to. IMHO it’s also using a lot less parameters, but since there is nothing official I’m rather careful with my claims.
@1marcelfilms
@1marcelfilms Жыл бұрын
The box asks for more ram The box asks for another gpu The box asks for internet access
@HighYield
@HighYield Жыл бұрын
What's in the box? WHAT'S IN THE BOX???
@garydunken7934
@garydunken7934 Жыл бұрын
Nice one. Well presented.
@OEFarredondo
@OEFarredondo Жыл бұрын
Mad love bro. Thanks for the vid
@omer8519
@omer8519 Жыл бұрын
Anyone got chills when they heard the name megatron?
@alb.1911
@alb.1911 Жыл бұрын
Do you have any idea why they are back to Intel CPU for the NVIDIA DGX H100 hardware?
@legion1791
@legion1791 Жыл бұрын
Cool I was exactly wanting to know that!
@petevenuti7355
@petevenuti7355 Жыл бұрын
Out of curiosity, and considering my current hardware: how many orders of magnitude slower would a neural network of this magnitude run on a simple CPU and virtual memory?
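One way to put a very rough number on that question; every figure below is a ballpark assumption, not a measurement:

```python
# How many orders of magnitude slower might GPT-3-class inference be on a
# single desktop CPU compared to one 8-GPU A100 node?
a100_fp16_tflops = 312   # A100 dense FP16 Tensor-core throughput (Nvidia spec)
cpu_fp32_tflops  = 1     # optimistic figure for a modern desktop CPU
gpus_per_node    = 8

compute_gap = (a100_fp16_tflops * gpus_per_node) / cpu_fp32_tflops
print(f"~{compute_gap:.0f}x slower on raw compute alone")  # ~2,500x, i.e. 3-4 orders of magnitude

# On top of that, a 175B-parameter model in FP16 needs ~350 GB for the weights,
# so a CPU run would also be paging through virtual memory, which could easily
# add another order of magnitude or more.
```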
@DeadCatX2
@DeadCatX2 6 ай бұрын
When I look at AI right now, I liken ChatGPT to the first transistor radio. It was the first time the unwashed masses had access to the magic unlocked by science and engineering. With that said, imagine being born in the era where you see the transistor radio with wonder as a child, and then how everything progresses with television, home computers, the internet, and mobile phones. That is what my father has seen in his lifetime, and that is the kind of exponential improvement I expect a 12-year-old of today to see over the course of their life. And that's my conservative guess, because the growth of science and engineering is exponential itself.
@user-zk4xq3mn2q
@user-zk4xq3mn2q 6 ай бұрын
Really good video, and lots of efforts! Thanks man!
@shrapnel95
@shrapnel95 Жыл бұрын
I find this video funny in that I've asked ChatGPT what kind of hardware it runs on; I never got a straightforward answer and it kept running me around in loops lol
@HighYield
@HighYield Жыл бұрын
I've noticed exactly the same thing, that's where I got the idea for this video from!
@robertpearson8546
@robertpearson8546 6 ай бұрын
The main difference between Expert Systems and Neural Networks is that Expert Systems use facts and logic to obtain an answer. Neural Networks during the "learning phase" adjust their weights to get the answers they are given. No facts or reasoning are involved. Therefore, no neural network can "prove" anything. When asked why a neural network gives an answer, you can only get a reply "?????? That Does Not Compute! ?????". One example was when they tried to use a neural network to diagnose skin cancer. It was usually wrong. However, there was a strong correlation between the fact that the image included a ruler and the diagnosis of skin cancer. The output of the learning phase is the "garbage in" for the "inference" phase. (GIGO).
@zyxwvutsrqponmlkh
@zyxwvutsrqponmlkh Жыл бұрын
For GAI we really need to be perpetually training during inference. If you want this stuff to run cheaply, open source it; folks have gotten Llama to run on an RPi.
@olafschermann1592
@olafschermann1592 Жыл бұрын
Great research and presentation
@paulchatel2215
@paulchatel2215 Жыл бұрын
You don't need that much power to train ChatGPT. You can't compare the full training of an LLM (GPT-3) with an instruct finetune (ChatGPT). Remember that Stanford trained Vicuna, which has performance similar to ChatGPT 3, for only $300, by instruct-finetuning the LLM Llama. And other open-source chatbots have been trained on single-GPU setups. So it's unlikely that OpenAI needed a full datacenter to train ChatGPT; the data collection was the hardest part here. Maybe they did, but then the training would have lasted less than a second, so it seems pointless to use 4000+ GPUs.
@nannesoar
@nannesoar Жыл бұрын
This is the type of video im thankful to be watching
@HighYield
@HighYield Жыл бұрын
I’m thankful you are watching :)
@vincentyang8393
@vincentyang8393 Жыл бұрын
Great talk! Thanks.
@Tomjones12345
@Tomjones12345 Жыл бұрын
I was wondering how far off running inference on a local machine is. Or could a more focused model (one language, specific subjects/sites) run on today's hardware?
@ZweiBein
@ZweiBein Жыл бұрын
Good and informative video, thanks a lot!
@TheGabe92
@TheGabe92 Жыл бұрын
Interesting conclusion, great video!
@HighYield
@HighYield Жыл бұрын
I’m usually quiet resistant to hype, but AI really has the potential to fundamentally change how we work. It’s gonna be an interesting ride for sure!
@zahir2942
@zahir2942 Жыл бұрын
Was cool working on these servers
@TheFrenchPlayer
@TheFrenchPlayer Жыл бұрын
🤨
@LukaszStafiniak
@LukaszStafiniak Жыл бұрын
Hardware requirements increase linearly for inference, not exponentially.
@MemesnShet
@MemesnShet Жыл бұрын
And to think that now anyone can run a GPT-3.5-Turbo-like AI on their local computer without the need for crazy specs is just incredible. Stanford Alpaca and GPT4All are some models that achieve it.
@legion1791
@legion1791 Жыл бұрын
I would love to have a local and unlocked offline chatGPT
@endike
@endike Жыл бұрын
me too :)
@fletcher9328
@fletcher9328 3 ай бұрын
Great video!
@dorinxtg
@dorinxtg Жыл бұрын
Thanks for the video. I was looking at the images you created with the GPU specs, and I'm not sure if your numbers are correct. Just for comparison, I checked the numbers in the TechPowerUp GPU DB. If we look at the GH100, for example, you mention 1000 TFLOPS (FP16) and 50 TFLOPS (FP32). In the TechPowerUp GPU DB, for an H100 (I checked both the SXM5 and PCIe versions) the numbers are totally different: 267 TFLOPS (FP16) and ~67 TFLOPS (FP32).
@HighYield
@HighYield Жыл бұрын
TechPowerUp doesn’t show the Tensor core FLOPS. If you look up the H100 specs at Nvidia you can see the full performance.
@dorinxtg
@dorinxtg Жыл бұрын
@@HighYield I see. Ok, thanks ;)
@MultiNeurons
@MultiNeurons Жыл бұрын
Yes it's very interesting, thankyou
@karlkastor
@karlkastor Жыл бұрын
15:50 Now with GPT-3.5 Turbo they have decreased the cost 10 times, but likely not with new hardware, but with an improved, discretized and/or pruned model.
@HighYield
@HighYield Жыл бұрын
That sounds super interesting! Do you have any further links for me to read up on this?
@davocc2405
@davocc2405 Жыл бұрын
I can see a rise in private clouds particularly within government at least on a national level. The utilisation of the system itself may give away sensitive information to other nations or even corporations that may have competing self interests so a few of these systems may pop up in the UK, Australia, several in the US and probably Canada to start with (presuming each European nation may have one or two as well). Whosoever develops a homogenised and consistent build for such a system will be suddenly in demand with competing urgency.
@markvietti
@markvietti Жыл бұрын
Could you do a video on memory cooling? It seems most of the video card manufacturers don't cool the memory, while some do. Why is that?
@lucasew
@lucasew Жыл бұрын
Fun fact: the DGX A100 has a configuration with a quad-socket EPYC 7742, 8 A100 GPUs and 2TB of RAM. Source: I know someone who works with one. He said it works nicely for Blender renders too, but the focus is tensor number crunching using PyTorch.
@HighYield
@HighYield Жыл бұрын
All I could find is this dual socket config: images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-a100-80gb-datasheet.pdf But a quad one would be much nicer :D
@glenyoung1809
@glenyoung1809 Жыл бұрын
I wonder how fast ChatGPT would have trained on a Cerebras CS-2 system with their Wafer scale 2 architecture?
@willemvdk4886
@willemvdk4886 Жыл бұрын
Of course the hardware and infrastructure behind this application is interesting, but what I find even more interesting is how this is done in software. How are all these GPUs clustered? How is the workload actually divided and balanced? How do they maximize performance during training? And how in the world is the same model used across thousands of GPUs to serve inference for many, many users simultaneously? That's mind-boggling to me, actually.
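A minimal sketch of the simplest layer of that stack, data parallelism: every GPU holds a full model replica, works on its own slice of the batch, and gradients are averaged each step. Real GPT-scale training additionally splits the model itself across GPUs with tensor and pipeline parallelism, which this sketch does not show:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int, dataloader):
    # One process per GPU; rank identifies this GPU within the cluster.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = nn.Linear(1024, 1024).to(rank)        # stand-in for a transformer block
    model = DDP(model, device_ids=[rank])         # adds gradient all-reduce across GPUs
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for x, y in dataloader:                       # each rank gets a different data shard
        loss = nn.functional.mse_loss(model(x.to(rank)), y.to(rank))
        optimizer.zero_grad()
        loss.backward()                           # gradients are averaged here by DDP
        optimizer.step()
```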
@albayrakserkan
@albayrakserkan Жыл бұрын
Great video, looking forward to AMD MI300.
@DSTechMedia
@DSTechMedia Жыл бұрын
AMD made a smart move in acquiring Xilinx, and it mostly went unnoticed at the time. But it could pay off heavily in the long run.
@n8works
@n8works Жыл бұрын
15:30 You say that the inference hardware must scale exponentially, but that must be hyperbole right? At the very most it's 1 to 1 and I'm sure there are creative ways to multiplex. The interesting thing to see would be transactions/sec for a single cluster instance.
@HighYield
@HighYield Жыл бұрын
I meant in relation to its users. If users increase exponentially, so do the hardware requirements. Since you are like comment no. 5 about this, I realize I should have said it differently. A point that could play into that question is scaling, but that's not what I was talking about.
@n8works
@n8works Жыл бұрын
@@HighYield ahh. I understand what you were saying, yes. It scales with users in some way. The more users the more hardware in some degree
@FakhriESurya
@FakhriESurya 7 ай бұрын
Makes me giggle a bit that the "Most Replayed" part is the part where the ad ends.
@HighYield
@HighYield 7 ай бұрын
That's normal, most ppl skip over the ad or try to jump to a point right after it. On the chart it looks like a hole :D
@zwenkwiel816
@zwenkwiel816 Жыл бұрын
everyone always asks what is ChatGPT? but no one ever asks how is chatGPT? :(
@THE-X-Force
@THE-X-Force Жыл бұрын
Excellent excellent video! (edit to ask: at 19:25 .. "In-Network Compute" of ... *_INFINITY_* ... ? Can anyone explain that to me, please?)
@auriplane
@auriplane Жыл бұрын
Love the new thumbnail!
@TMinusRecords
@TMinusRecords Жыл бұрын
15:39 Exponentially? How? Why not linearly
@maniacos2801
@maniacos2801 Жыл бұрын
What we need are locally run AI models. Optimisations will have to be made for this, but it is a huge flaw that this type of high-speed interactive knowledge is in the hands of a few multi-billion-dollar global players. And we all know by now that "Open"AI is anything but open. This is what scares me most about this whole development. In the early days of the internet, everyone could run a server at home, or people could get together and run dedicated hardware in some co-location. With AI this is impossible, because no one can afford the hardware requirements. If chat AI is the new internet, we need public access to the technology; otherwise only a few will be in control of such huge power to decide what information should be available and what should be filtered or even altered.
@jagadeesh_damarasingu
@jagadeesh_damarasingu Жыл бұрын
When thinking about the huge capital costs involved in setting up an AI hardware farm, is it possible to take advantage of the shared computing power of a public peer network, like we do now with blockchain nodes?
@jagadeesh_damarasingu
@jagadeesh_damarasingu Жыл бұрын
Also, isn't Intel in the race with NVIDIA and AMD?
@robinpage2730
@robinpage2730 Жыл бұрын
How powerful would a model be that could be trained on a gaming laptop rtx 1650 ti? How about a natural language compiler, that translates English input into executable machine code like GCC?
@PixelPhobiac
@PixelPhobiac Жыл бұрын
What a time to be alive
@SRWeaverPoetry
@SRWeaverPoetry Жыл бұрын
I run this stuff locally, but I don't run mine based on training data. The analogy I use: an AI can know how to ride a bike, but if it has no bike, it isn't riding anywhere. A bot needs both the brains and the tools the body uses.
@gab882
@gab882 Жыл бұрын
Would be both amazing and scary if AI neural networks run on quantum computers or other advanced computers in the future
@stefanbuscaylet
@stefanbuscaylet Жыл бұрын
Does anyone have any references on how big the storage required for this was? Was it a zillion SSDs or was it all stored on HDDs?
@dougchampion8084
@dougchampion8084 Жыл бұрын
The training data itself is pretty small in relation to the compute required to process it. Text is tiny.
@stefanbuscaylet
@stefanbuscaylet Жыл бұрын
@@dougchampion8084 I feel like that is over simplifying things. When there are over 10K cores distributed over a large network and the training data is “all Wikipedia and tons of other data” there has to be quite a bit of disaggregated storage for that along with every node seems to have some local/fast NAND SSD storage. As far as I can tell they mostly use the CPUs to orchestrate and feed the data to the GPUs and the GPUs then feed the data back to the CPUs to be pushed to storage. Be nice if someone just mapped this all out along with capacity and bandwidth needs.
@drmonkeys852
@drmonkeys852 Жыл бұрын
My friend is actually already training the smallest version of GPT on 2 A100s for his project in our ML course
@HighYield
@HighYield Жыл бұрын
That's really interesting, I wonder how much time the training takes. And having access to 2x A100 GPUs is also nice!
@drmonkeys852
@drmonkeys852 Жыл бұрын
@@HighYield Yea it's from our uni. We still have to pay for time on it unfortunately but it's pretty cheap still. He estimates it'll cost around 30$, which is not bad
@gopro3365
@gopro3365 Жыл бұрын
@@drmonkeys852 $30 for how many hours
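A rough answer to that question, assuming a discounted academic rate per GPU-hour (the hourly price is a guess, not what their university actually charges):

```python
price_per_a100_hour = 1.5   # assumed $/GPU-hour at an academic/discounted rate
budget_dollars      = 30
gpu_count           = 2

hours = budget_dollars / (price_per_a100_hour * gpu_count)
print(f"~{hours:.0f} hours on {gpu_count} A100s")  # ~10 hours at this rate
```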
@boronat1
@boronat1 Жыл бұрын
Wondering if we could run software that uses your GPU to contribute power to an AI network, like we do with crypto mining?
@johannes523
@johannes523 Жыл бұрын
Very interesting! I was wondering about Tom Scott's statement on the curve, and I think your take on it is very accurate 👍🏻
@HighYield
@HighYield Жыл бұрын
I really feel like the moment you don't look at what AI does, but how it's "created", you get a much clearer picture. Ofc I might be completely wrong in my assumption :p
@seraphin01
@seraphin01 Жыл бұрын
Great video, thank you. I've been trying to ask ChatGPT about its hardware, but obviously you don't get the answer haha. Those who think we're already reaching the peak of AI right now are grossly mistaken. The teraflops we're talking about for the new architecture will sound ridiculously weak in a few years' time, just like a top-end GPU from 2010 wouldn't even be good enough for a cheap smartphone nowadays. And with hardware focus now turning to AI, it's just gonna improve exponentially for a while. Although, like the guys at OpenAI stated, don't expect GPT-4 to be Skynet-level AI; the results might look like minor improvements at first glance, but the cost, reliability, speed and accuracy of those models will improve a lot before going to the next phase, which is actual artificial INTELLIGENCE. By 2030 the world won't be the same as it is now, that's granted imo, and most people are not ready for it.
@simonlyons5681
@simonlyons5681 Жыл бұрын
I am interested to know the hardware requirements to run inference for a single user. What do you think?
@HighYield
@HighYield Жыл бұрын
I think its VRAM bound. So you might still need a full Nvidia DGX/HGX A100 server, but not because of the raw computing power, but because of the VRAM capacity. Maybe 4x A100 GPUs would work too, depending on how much smaller ChatGPT is compared to GPT-3. It's really hard to say since we don't have official numbers.
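The VRAM math behind that answer; the 175B figure is the published GPT-3 parameter count, since ChatGPT's own size isn't public:

```python
params               = 175e9   # GPT-3 parameter count (ChatGPT's is not public)
bytes_per_param_fp16 = 2

weight_gb = params * bytes_per_param_fp16 / 1e9
print(f"~{weight_gb:.0f} GB just for the weights")                   # ~350 GB

a100_vram_gb = 80
print(f"GPUs needed by VRAM alone: {weight_gb / a100_vram_gb:.1f}")  # ~4.4, before
# activations and KV cache - which is why a full 8-GPU DGX/HGX node is a safe bet.
```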
@miroslawkaras7710
@miroslawkaras7710 Жыл бұрын
Could a quantum computer be used for AI training?
@TheGrizz485
@TheGrizz485 Жыл бұрын
a sponsor .... congrats
@HighYield
@HighYield Жыл бұрын
Kinda nervous about it, but I waited until I was able to get a sponsor that offers an actually useful product instead of cheap copies of Windows :p
@yogsothoth00
@yogsothoth00 Жыл бұрын
Power efficiency comes in to play as well, will AMD be able to compete with slower hardware? Only if they can beat Nvidia in the overall value proposition.
@timokreuzer1820
@timokreuzer1820 Жыл бұрын
Question: if INT8 tensor cores are useful for AI, why doesn't everyone use them? Shouldn't an INT8 mul-add operation require far fewer transistors than a 16-bit FP mul-add? I know FP multiplications are "cheap" compared to same-size integer ones, but addition adds quite some overhead.
@0xEmmy
@0xEmmy Жыл бұрын
int8 is fast with the right hardware, but this comes at the cost of precision. A 16-bit number can have up to 65536 different values, while 8 bits only gets 256 different values. Some applications need more precision than others, and while many AI applications can handle int8, we can't assume that everything can. If your AI outputs a 32-bit image, for instance, at least some of the AI needs to run in 32 bits (which, if it's even possible, is highly wasteful on 8-bit hardware). Further, there's a very big difference between float and int. A float handles really large and small numbers just fine, while int's get stuck rounding to 0 or overflowing. If you're multiplying two really large or really small numbers, this becomes a major problem fast. If you need more than 3 decimal places, you can't use int8. Even moving up to 16 bits, you only get 5 decimal places, and 32 bits only gets you 9 - to get more, you have to use floats (e.x. IEEE 754 standard 32-bit floats have 256 decimal places). Adds aren't quite as complex as multiplies, so I wouldn't worry about them too much. One also needs to actually have the right hardware. If your GPU is designed for 16 bits, but you can only use 8, you're not gonna double performance - you're gonna end up ignoring half your GPU. 8-bit hardware is readily available in some contexts (especially internet services), but if you're writing code that runs entirely offline on existing devices, you're not gonna ignore half the machine unless you have a genuine reason. And if you need the same hardware to do multiple things, this gets even more complicated. If you're a price-conscious end-user, you probably don't want to pay for extra hardware unless you personally use it. If you're an investor, you don't want to buy specialized equipment unless you're absolutely sure it's the right tool for the job.
@sa1t938
@sa1t938 Жыл бұрын
int8 and int4 aren't good enough for training (at least right now). Training involves nudging all the values slightly in the right direction over and over and over again; you can't do nudges as precise as you need with int4 and int8. That's just for training, though. During inference you aren't changing any of the values, so you don't need the same precision, hence you can get away with int8. Until recently, int4 would just fall apart most of the time, but there's a new quantization technique called GPTQ which makes int4 perform almost the same as fp16. As for why it's not used for all inference, some things need the extra precision. For example, Stable Diffusion kind of falls apart at int8 from what I've heard. I'm actually training a Stable Diffusion model myself, and I had to use fp32 while training because fp16 just ended up not being enough. Language models usually do fine with less precision. As to why not all language models are being run at int4 or int8: probably because "if it ain't broke, don't fix it".
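A minimal sketch of the trade-off being discussed: symmetric per-tensor INT8 quantization of floating-point weights. Real schemes such as GPTQ work per-channel or per-group and solve for the rounding error, but the core idea is the same:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                  # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = (np.random.randn(4096) * 0.02).astype(np.float32)   # toy weight vector
q, s = quantize_int8(w)
error = np.abs(dequantize(q, s) - w).mean()
print(f"Mean rounding error: {error:.6f}")
# Small but nonzero: usually fine for inference, far too coarse for the tiny
# weight updates made during training - which is why training stays in FP16/BF16.
```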
@frostilver
@frostilver Жыл бұрын
*Quantum computers will skyrocket AI in the next 10 years*
@JazevoAudiosurf
@JazevoAudiosurf Жыл бұрын
I think it's a very simple equation: more layers and thus more params lead to better abstraction and deeper understanding. The brain would not be so huge if it wasn't necessary. We need to scale transformers up until they reach a couple trillion params, and for that we need stuff like the H100 and whatever they announce at GTC next month. Transformers are probably enough to solve language. Combine that with CoT and, as papers have shown, it will surpass humans.