Self-Host and Deploy Local LLAMA-3 with NIMs

  3,767 views

Prompt Engineering

1 day ago

In this video, I walk you through deploying Llama models using NVIDIA NIM. NVIDIA NIM uses microservices to streamline the deployment of various AI models, offering up to a 3x improvement in performance. I demonstrate how to set up an NVIDIA Launchpad, deploy the Llama 3 8B Instruct model, and stress test it to measure throughput. I also show you how to use OpenAI-compatible API servers with NVIDIA NIM.
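The deployment walked through in the video boils down to pulling and running a NIM container. Here is a minimal sketch, assuming the Llama 3 8B Instruct NIM image from NVIDIA's registry; the exact image tag, cache path, and port may differ from what the video uses:

```shell
# Log in to NVIDIA's container registry using the NGC API key
# generated in the NGC portal (shown in the video).
docker login nvcr.io -u '$oauthtoken' -p "$NGC_API_KEY"

# Run the Llama 3 8B Instruct NIM. It serves an OpenAI-compatible
# API on port 8000 and caches downloaded weights under ~/.cache/nim.
docker run -it --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
```

Once the container reports it is ready, the endpoint can be queried at http://localhost:8000/v1 like any OpenAI-compatible server.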
LINKS:
NIM: nvda.ws/44u5KYH
org.ngc.nvidia.com/setup/pers...
NIM Previous Video: • Deploy AI Models to Pr...
💻 RAG Beyond Basics Course:
prompt-s-site.thinkific.com/c...
Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become a Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for the newsletter, localGPT:
tally.so/r/3y9bb0
TIMESTAMPS
00:00 Introduction to Deploying Large Language Models
00:13 Overview of NVIDIA NIM
01:02 Setting Up and Deploying a NIM
01:51 Accessing and Monitoring the GPU
03:39 Generating API Keys and Running Docker
05:36 Interacting with the Deployed Model
07:16 Stress Testing the API Endpoint
09:53 Using OpenAI Compatible API with NVIDIA NIM
12:32 Conclusion and Next Steps
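The OpenAI-compatible API segment (09:53) can be sketched with nothing but the standard library. This is a minimal example, assuming a NIM container serving on localhost:8000 and the model name meta/llama3-8b-instruct; both are assumptions, not taken from the video verbatim:

```python
import json
import urllib.request

def build_payload(model, prompt, max_tokens=64):
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat_completion(base_url, model, prompt):
    """POST to a NIM server's OpenAI-compatible /v1/chat/completions route."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # The response shape mirrors OpenAI's: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Usage, with the container from earlier in the video running: `chat_completion("http://localhost:8000", "meta/llama3-8b-instruct", "Hello")`. The official `openai` Python client also works by pointing its `base_url` at the same endpoint.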
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...

Comments: 16
@DearGeorge3
@DearGeorge3 2 days ago
It's not clear whether I can run NIM locally and get the 5x performance or not.
@petergasparik924
@petergasparik924 2 days ago
I'm curious too
@engineerprompt
@engineerprompt 1 day ago
Here is the configuration they used for the H100 tests: Llama 3 70B Instruct, input length 7,000 tokens, output length 1,000 tokens, 100 concurrent client requests, on 4x H100 SXM with NVLink. NIM Off (FP16): TTFT ~120 s, ITL ~180 ms. NIM On (FP8): TTFT ~4.5 s, ITL ~70 ms. You can run NIM locally on a Tensor Core GPU, but the performance you get depends on your configuration and hardware, so your mileage may vary.
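The TTFT (time to first token) and ITL (inter-token latency) figures quoted above can be computed from per-token arrival timestamps. A minimal sketch; the function name and argument layout are my own, not from NVIDIA's benchmarking tooling:

```python
def latency_metrics(request_start, token_times):
    """Compute (TTFT, mean ITL) from a request start time and
    the wall-clock timestamp at which each token arrived.

    TTFT = first token's arrival minus request start.
    ITL  = average gap between consecutive token arrivals.
    """
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl
```

For example, a request started at t=0.5 s whose tokens arrive at 1.0, 1.1, and 1.2 s has a TTFT of 0.5 s and an ITL of 0.1 s.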
@user-nl7ur5mc2p
@user-nl7ur5mc2p 2 days ago
Thank you. Amazing channel
@engineerprompt
@engineerprompt 2 days ago
Thanks
@petergasparik924
@petergasparik924 2 days ago
Hi, are you sure the inference speed on H100 is correct? On my RTX 4090 with Llama 3 Instruct 8B Q8_0, inference speed is about 72 t/s, so you're getting lower speed than me.
@orlingueorguiev
@orlingueorguiev 1 day ago
Can you provide a benchmark comparison for when using the ollama server? I really want to see if the claimed performance improvement is actually there.
@engineerprompt
@engineerprompt 1 day ago
Let me see if I can do a comparison between different options (ollama, llama.cpp, vLLM, and NIM). Here is a blog post from NVIDIA that might be helpful (note: the numbers there are for 8B; the results I showed in the video are for a different configuration, 70B): tinyurl.com/as7uvbv8
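A throughput comparison like the one discussed here, and the stress test at 07:16 in the video, amounts to firing concurrent requests and counting completions per second. A runnable sketch; the request function is a simulated placeholder, since a real comparison would call each server's completion endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt):
    # Placeholder: a real benchmark would POST `prompt` to the
    # server under test (NIM, ollama, vLLM, ...). Here we simulate
    # a fixed 10 ms latency so the sketch runs stand-alone.
    time.sleep(0.01)
    return len(prompt)

def measure_throughput(prompts, concurrency=8):
    """Send all prompts with `concurrency` workers; return requests/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(send_request, prompts))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed
```

Running `measure_throughput(["hello"] * 32, concurrency=8)` against each backend with identical prompts, token limits, and concurrency gives a like-for-like requests-per-second figure.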
@Nihilvs
@Nihilvs 2 days ago
Thanks! What do you actually pay for when buying NIM?
@engineerprompt
@engineerprompt 2 days ago
You are paying the license fee. My understanding is that you can run this on your own hardware, but you pay a licensing fee for using the software stack.
@Nihilvs
@Nihilvs 1 day ago
@@engineerprompt Good to know! Thanks
@rousabout7578
@rousabout7578 2 days ago
Is this correct? For production use, NIM is part of NVIDIA AI Enterprise, which has different pricing models: - On Microsoft Azure, there's a promotional price of $1 per GPU per hour, though this is subject to change. - For on-premises or other cloud deployments, NVIDIA AI Enterprise is priced at $4,500 per year per GPU.
@engineerprompt
@engineerprompt 2 hours ago
Here is the info: resources.nvidia.com/en-us-ai-enterprise/en-us-nvidia-ai-enterprise/nvidia-ai-enterprise-licensing-guide
@zikwin
@zikwin 2 days ago
I don't have a friend kind enough to give me access to an H100
@eod9910
@eod9910 2 days ago
So I'll say this because evidently other people are too polite, but this is absolute garbage. Who has an H100 hanging around to do this? Don't post stuff that 99% of the people can't do. If you want to post stuff that only people with tens of thousands of dollars and access to this type of hardware can use, go work for one of those companies. Otherwise, you're wasting everybody's time.
@christosmelissourgos2757
@christosmelissourgos2757 2 days ago
Actually, I don't agree. We are building a product, and this is something we are really interested in.
@vitalis
@vitalis 2 days ago
Dude, why are you so bitter? Go out and touch grass for a bit. Have you learnt nothing from the last two decades of tech history? All industrial tech seeps through to prosumer and then mainstream. Your local GPU's performance would have been considered alien tech not too long ago. Sheesh