Can you run an AI LLM on an old server?

  2,385 views

Chris_PHP

5 months ago

Old servers are cheap, and so is their memory, meaning you can get a lot of RAM at a low cost. Does that make them a viable way to run AI language models on the cheap?
Ollama:
github.com/jmorganca/ollama
Issue relating to AVX:
github.com/jmorganca/ollama/i...
PDF for the server I have:
www.bargainhardware.co.uk/med...

Comments: 18
@NetrunnerAT
@NetrunnerAT 1 day ago
An old PC with 128-512 GB of DDR3 RAM, plenty of PCIe slots and a few cheap Nvidia P-series cards can lift a heavy workload without issue, all within a tight €1000 budget.
@chris_php
@chris_php 5 hours ago
Yeah, you can do some good work with a setup like that; it's also decent for training your own models on your own data.
@solotechoregon
@solotechoregon 2 months ago
I have plenty of old servers, but the requirements have other hidden dependencies, such as AVX2 or better. While AVX2 arrived in Intel CPUs with Haswell in 2013, the CPU in your server could predate AVX2 entirely.
@chris_php
@chris_php 2 months ago
Yes, the CPU predates any AVX, which contributes to its very slow speed; that's why I had to disable AVX entirely just to get it to run.
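(A quick way to check this up front — a minimal sketch in Python, assuming a Linux x86 host where /proc/cpuinfo lists the CPU's instruction-set flags:)

    # check_avx.py - report whether the CPU advertises AVX / AVX2.
    # Assumes Linux on x86, where /proc/cpuinfo exposes a "flags" line.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
        else:
            flags = set()

    print("AVX :", "yes" if "avx" in flags else "no")
    print("AVX2:", "yes" if "avx2" in flags else "no")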
@porter__8205
@porter__8205 5 months ago
Will do once I get more RAM xD
@geekdomo
@geekdomo 4 months ago
Please consider a better microphone. It's really difficult to understand you with the mic/noise gate cutting out so often. Thanks for the video.
@chris_php
@chris_php 4 months ago
Thanks, I've been making improvements to my mic in my more recent videos.
@SB-qm5wg
@SB-qm5wg 2 months ago
So yeah, but hella slowly
@ewasteredux
@ewasteredux 5 months ago
Hi Chris! Great video. Do you know what the minimum specs are for setting up an older system with no GPU to run Ollama locally at a reasonable speed? I know there are many variables here, including the size of the LLM, but if we choose a small-to-medium model and assume RAM is not an issue, how many cores and what clock speed would give a generation speed that doesn't require a nap while you wait? I also recently bought an old Nvidia GRID K1 (16GB VRAM) extremely cheap and could not get it to run with Ollama. My current workstation is a Dell T5600 with two E5-2690 CPUs and 128GB DDR3 RAM. I could not use the GRID K1 with this unit, as these older Dell workstations will not even POST with one installed. I am not planning on leaving the system on 24x7, so I am not counting energy costs as significant for short runs. FYI, the RAM was gifted to me and the T5600 was under $100 US, so I really could not afford not to try...
@chris_php
@chris_php 5 months ago
Hello, that's a good deal you got on that system, and its speed might already be decent, since those CPUs have the AVX instruction set, which greatly speeds up Ollama. Generally the more cores the better, since it's a large model to work through. The speed of the DDR3 RAM will be important and may be the bottleneck, but this system might be fine running a 7B or 13B model at a good speed. Since the CPU supports AVX, there's no need to disable it in generate_linux.go.
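(For a rough sense of whether a model fits in RAM — a back-of-the-envelope sketch; the 0.5 bytes-per-parameter figure for Q4 quantisation and the 20% overhead are assumptions, not exact numbers:)

    # model_ram_estimate.py - rough RAM estimate for a Q4-quantised model.
    # Assumption: ~0.5 bytes per parameter at Q4, plus ~20% overhead for
    # context and runtime buffers.
    def estimate_gb(params_billion, bytes_per_param=0.5, overhead=1.2):
        return params_billion * bytes_per_param * overhead

    for size in (7, 13):
        print(f"{size}B at Q4: ~{estimate_gb(size):.1f} GB")
    # 7B: ~4.2 GB, 13B: ~7.8 GB - both fit easily in 128 GB of DDR3.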
@perrymitchell7118
@perrymitchell7118 3 months ago
What would you recommend for running a chatbot trained on website data locally? Thanks for the video.
@chris_php
@chris_php 3 months ago
If it's a small model like a 3B, you don't need much RAM, around 8GB, so it can even run on a lower-end graphics card like a 2060 and give quick responses.
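(Once a small model is pulled, querying it from code is straightforward — a minimal sketch against Ollama's local REST API, assuming the server is on its default port 11434 and that the tag "orca-mini" stands in for whichever small model you actually pulled:)

    # ask_local.py - query a locally running Ollama server over its REST API.
    # Assumes Ollama is listening on localhost:11434 and the model has
    # already been pulled; "orca-mini" is only an example tag.
    import json
    import urllib.request

    payload = {
        "model": "orca-mini",
        "prompt": "Summarise what this website sells in one sentence.",
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])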
@IpfxTwin
@IpfxTwin 3 months ago
I have my eye on a ProLiant Gen8 server with 393 gigs of RAM (dual socket / 12 threads each). I know more RAM would handle more parameters, but would more RAM speed up simpler models?
@chris_php
@chris_php 3 months ago
RAM speed is important, since the whole LLM needs to be read for every token, so the smaller the model, the quicker you'll get a response. If the LLM is 40GB in size and you can read 40GB a second of memory bandwidth, that means you'd get about 1 token every second.
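(That arithmetic as a tiny script — an upper-bound estimate only, since it ignores CPU compute, caching and quantisation effects:)

    # tokens_per_second.py - rough upper bound: bandwidth divided by model size,
    # on the assumption that the full set of weights is read once per token.
    model_size_gb = 40.0     # size of the model weights
    bandwidth_gb_s = 40.0    # usable memory bandwidth
    print(f"~{bandwidth_gb_s / model_size_gb:.2f} tokens/sec")  # ~1.00 here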
@dectoasd3644
@dectoasd3644 5 months ago
This generation of server is e-waste, and anything with dual CPUs of this generation is worse still. The minimum would be an E5 V3/V4 with quad-channel memory, not the cheap AliExpress remade boards. A 20B Q5 model on an E5-2660 V3 is usable, but the cash would be better spent on a P40.