
How to run Ollama on Docker

28,710 views

Matt Williams

5 months ago

Ollama runs great on Docker, but there are just a couple things to keep in mind. This covers them all.
Visit hub.docker.com... for more details.
Be sure to sign up to my monthly newsletter at technovangelis...
And if interested in supporting me, sign up for my patreon at / technovangelist
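
For reference, the basic pattern the video and the Docker Hub page describe looks roughly like this (the volume name, port, and model are just the usual defaults, not the exact commands from the video):

# run the Ollama server in a container, keeping models in a named volume outside the container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# on a machine with an NVIDIA card (and the NVIDIA Container Toolkit installed), add GPU access
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# then run a model inside the container
docker exec -it ollama ollama run llama2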

Comments: 137
@technovangelist 5 months ago
someone just commented about finding another way to upgrade the container. I can't find the comment now, so if this was you, post again. But no, do not upgrade the install inside a container, that’s a whole lot of work for no benefit. The models are stored in the volume you mounted as part of the install, so deleting the image will not affect the models. If you have gone against the recommendations and stored models inside the container, then best approach is to move them to the correct spot and update the container.
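
In practice that upgrade is just recreating the container; a sketch, assuming the default container and volume names from the run command above:

# the models live in the mounted volume, so the container itself is disposable
docker stop ollama
docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama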
@Kimomaru 3 months ago
I really wish more videos were made like this. No nonsense, gets straight to the point, clear, concise. Thank you.
@technovangelist 3 months ago
And yet some complain that I take too long and waste time. But thank you so much for the comment. I do appreciate it.
@jwerty 1 month ago
@@technovangelist Amazing video! Finally, I understand Docker.
@mercadolibreventas 5 months ago
Matt, you're a great teacher; no one explains things like you do. They just read the command in one sentence and do not explain the actual function of that command in parts. Lots of videos show how to do something and 75% never work. So thanks so much!
@ashwah 2 months ago
Thanks Matt this helped me understand the Docker side of things. Namely keeping the models in a volume. I will restructure my project based on this. Keep it up ❤
@ToddWBucy-lf8yz 2 months ago
Thank you Sir! You just took the mystery out of how to set this up right. I love me some Docker. It really helps to keep the work stuff separated from the personal project stuff.
@ErnestOak 5 months ago
Does it make sense to use ollama in production as a server?
@Makumazaan 3 months ago
much respect for the way you deliver information
@xXWillyxWonkaXx 5 months ago
Straight to the point, no fluff, very informative. Very updated. You just earned a fan/subscriber. Howdy Matt 🎩
@technovangelist 5 months ago
there are some who say I am all fluff, but I try to always be closer to your observation.
@TimothyGraupmann 5 months ago
Learned that containers can be remote and the alias. Yet another great video! I need to take advantage of that. I have a bunch of RPI security cameras and remote containers might make administration even easier!
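
For anyone curious what the remote-engine and alias tricks can look like in practice, a rough sketch (the host name and alias are only examples, not anything from the video):

# point the docker CLI at a remote Docker engine over SSH
docker context create rpi-cam --docker "host=ssh://pi@raspberrypi.local"
docker context use rpi-cam

# alias so "ollama ..." runs the CLI inside the container
alias ollama='docker exec -it ollama ollama'
ollama run llama2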
@tristanbob 5 months ago
This is my new favorite channel! I learned like 10 things just in this video. I love learning about AI, modern tools such as docker and tailscale, and modern hosting platforms and services. Thank you!
@technovangelist 5 months ago
you left off the most important part.... NERF can be expensed!!
@tristanbob 5 months ago
Good point! So I learned 11 things :) @technovangelist
@tiredofeverythingnew 5 months ago
In the realm of ones and zeros and LLM models, Matt is the undisputed sovereign.
@technovangelist 5 months ago
wow, you are too kind
@AnkitK-wi3wk 3 months ago
Hi Matt, your videos are super useful and right on point. Thank you for putting this together. I have a quick question on this topic. I have created a RAG Streamlit app in Python using Ollama llama3 and ChromaDB. The app runs fine on my Mac localhost but I wanted to create a Docker image of this app. I am unable to figure out how to include Ollama llama3 in my Docker image. Can you help point to any resources which can guide me on this or cover this in one of the topics? Again, thanks a mil for the content. Great stuff!!! Cheers
@diaaessam5540 1 month ago
Did you find any resources?
@fuba44 5 months ago
This is my new favorite content, the way you explain it just beams directly into my brain and i get it right away. Thank you. Is there a way to show support, donations or similar?
@technovangelist 5 months ago
Folks have asked me about that. I’ll be looking into something like Patreon soon.
@technovangelist 5 months ago
The big thing for now is to just share the video with everyone you know.
@technovangelist 3 months ago
Well I do have that patreon now. Just set it up: patreon.com/technovangelist
@bjaburg 5 months ago
There are not many people that can explain these steps in such an easy and entertaining way as you do, Matt. I often pride myself in being able to do so, but you can be my teacher. I often find myself watching the progress bar because I don't want it to end (seriously :-))! A request: could you do an explainer video on how to train a model (say Microsoft/Phi-2) on your own dataset and deploy the trained model? OpenAI makes it super easy by deploying a JSONL file and after a while it 'returns' the trained model. But I want to train my own models. I have been looking around YT but get lost in parameters, incorrect JSONL files (or csv), etc. Surely, this must be easier. (hopefully your answer is "it is easier, and don't call me Shirley") Thanks so much again. You have a happy subscriber (and many more to come). Kind regards, Bas
@omarlittle5802 23 days ago
Hey @Matt Williams! Thanks for the great content! Am I crazy, or did you do a version of this somewhere where you added a file to /etc/systemd/system that called up docker at startup and ran ollama that way? i.e. letting systemd handle the startup and restarting of the container as a service?? If not, would you?! Please and thank you!!?? 😅
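
(Not from this video, but a rough sketch of the kind of unit the comment describes; the unit name, flags, and paths are assumptions:)

sudo tee /etc/systemd/system/ollama-docker.service > /dev/null <<'EOF'
[Unit]
Description=Ollama in Docker
Requires=docker.service
After=docker.service

[Service]
Restart=always
# remove any stale container, then run in the foreground so systemd supervises it
ExecStartPre=-/usr/bin/docker rm -f ollama
ExecStart=/usr/bin/docker run --rm --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
ExecStop=/usr/bin/docker stop ollama

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ollama-docker.service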
@MohammadhosseinMalekpour 2 months ago
Thanks, Matt! It was a straightforward tutorial.
@CaptZenPetabyte 2 months ago
I've been running a lot via Docker but when I found out about the difficulty of GPU pass-through (on any machine) I have been swapping things over to Proxmox, which does have GPU pass-through *and* can also use CPU to emulate GPU as it is needed ... what do you think about running on Proxmox?
@sampellino 4 months ago
A fantastic, clear instructional. Thank you so much! This helped me a ton.
@EcomGraduates 5 months ago
How you speak in your videos is refreshing thank you 🙏🏻
@Guy-qx3qo 28 days ago
what an amazing video! Thank you lots for providing for the tech community
@s.b.605 3 months ago
how do you swap models in the same container? I think I'm doing it wrong and it's affecting my container memory
@JM-sn5eb 2 months ago
This is exactly what I've been looking for! Could you please tell (or maybe create a video on) how to use Ollama completely offline? I have a PC that I can not connect to the internet.
@vishalnagda7 4 months ago
Could you kindly assist me in clarifying how to specify the model name when running the ollama Docker command? For instance, I aim to utilize the mistral and llama2:13b models in my project. Thus, I request our dev-ops team to launch an ollama container configured with these specific models.
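
One common approach (a sketch, not an official recipe; the model names are just the ones mentioned above) is to start the stock container and then pull the required models into its volume:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# pull the specific models the project needs; they end up in the mounted volume
docker exec ollama ollama pull mistral
docker exec ollama ollama pull llama2:13b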
@devgoneweird 5 months ago
Is it possible to limit the resource consumption of Ollama? I'm looking for some way to run a background computation and I don't really care about how much time it takes (if it is able to process a stream's avg load), but it would be annoying if it were hanging the main activity on the machine.
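
Docker's normal resource flags apply to the Ollama container like any other; a sketch, with arbitrary example limits:

# cap the container at 4 CPUs and 8 GB of RAM
docker run -d --cpus=4 --memory=8g -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama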
@Slimpickens45 5 months ago
🔥good stuff as always Matt!
@jonascale 2 months ago
So, I think you cleared up most of the problems that I have been having trying to get this set up. But I have one last one that I just can't seem to get past. My setup is on Proxmox: I first tried to create an LXC container, then once I had my NVIDIA passthrough working for my GPU I installed Ollama and downloaded my first model. That all went fine. Then I tried to see if the API was listening on port 11434 by opening a browser and going to the address:11434. According to the documentation I should get a message that Ollama is ready. Unfortunately, I get no errors; the page simply doesn't open. So I approached it from the other side and just created an LXC and installed Docker and Portainer on it. Much to my surprise, when I navigated to the address I got the message that Ollama is ready. My question is why? I'm sure this is something easy that I'm missing, but 24 hours later I am still not sure why. Any ideas?
@Lemure_Noah 5 months ago
Excellent, Matt! For some reason, I had to run docker commands with "sudo" , to use my GPUs.
@gokudomatic 5 months ago
That sounds like your user is not in the right group. I had once issues like that, and it was a matter of not being in docker group. Now, I can use my gpu in my docker container.
@technovangelist 5 months ago
good answer. I knew it, but couldn't remember. and this is what I remember.
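
For reference, the usual fix on Linux is the standard Docker post-install step (log out and back in afterwards):

# add your user to the docker group so docker commands work without sudo
sudo usermod -aG docker $USER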
@mxdsgnlzent1843 22 days ago
Excellent content.
@95jack44 5 months ago
Searching for a full airgap install on docker to use on Kubernetes. This is a start ^^. Thx
@robertdolovcak9860 5 months ago
Nice and clear tutorial. Thanks! 😀
@sushicommander 5 months ago
Great video. Now i'm curious about how you setup ollama on brev ... What is your recommended setup & host service for using Ollama as an endpoint?
@entzyeung 1 month ago
Hi Matt, this is an excellent video, thank you. I have an app built with Ollama. How do I containerize the entire thing with Docker? My aim is to push this image online and share it around. Thank you in advance.
@technovangelist 1 month ago
You want to know how to put your app in Docker? There are plenty of tutorials for that. What specific aspect do you need help with?
@entzyeung 1 month ago
@@technovangelist My simple Q&A app is built with "from llama_index.llms.ollama import Ollama". I tried to containerize it with the usual docker command, then I ran the image. But when I go to localhost, I get an error. So what is the proper way to bundle Ollama and the model into the Docker image file?
@technovangelist 1 month ago
Put the app and dockerfile somewhere, maybe in a gist or somewhere else so I can try it and give advice
@brentfergs 4 months ago
Great video as always Matt, I love them. I would like to know how to load a custom model in docker with a model file. Thank you so much.
@technovangelist 4 months ago
Same way as without docker. you create the model using the modelfile, then run it. or am i missing something
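
A sketch of what that can look like with the container (the Modelfile path and model name are placeholders):

# copy the Modelfile into the running container, create the model, then run it
# (if the Modelfile references a local GGUF file, copy that in too)
docker cp ./Modelfile ollama:/tmp/Modelfile
docker exec ollama ollama create mymodel -f /tmp/Modelfile
docker exec -it ollama ollama run mymodel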
@ricardofernandez2286 4 months ago
Hi Matt, thank you for such a clear and concise explanation!! I have a question that may or may not apply in this context, and I'll let you be the judge of it. I'm running on CPU on an 8 virtual core server with 30Gb RAM and NVMe disk on Ubuntu 22.04, and the performance is kind of poor (and I clearly understand that GPU will be the straightforward way to solve this). But I've noticed that when I run the models, for example Mistral 7b, Ollama only uses about half the CPUs available and less than 1 Gb of RAM. I'm not sure why it is not using all the resources available, or if using them will improve the performance. Anyway it would be great to have your advice on this, and if it is something that can be improved/configured how would you suggest to do it? Thank you very much!!!
@technovangelist 4 months ago
You will need a GPU. Maybe a faster CPU would help, but the GPU is going to be the easier approach. You will see 1 or 2 orders of magnitude improvement adding even a pretty cheap GPU from nvidia or amd.
@ricardofernandez2286 4 months ago
@@technovangelist Thank you! I know the GPU is the natural way to go. I was just wondering why it is using less than half the resources available, when it has plenty of extra CPU and RAM; and if using these idle resources could improve the performance at least by some x%. And unfortunately I can't add a GPU to this current configuration I have. My CPUs are AMD EPYC 7282 16-Core Processors which I think are quite nice CPUs. Thank you!!
@Lemure_Noah 5 months ago
I would like to suggest that Ollama support embeddings, when it becomes available through the REST API. If they really chose nomic-ai/nomic-embed-text-v1.5-GGUF, it would be perfect as this model is multi-language.
@technovangelist 5 months ago
It does support embeddings. Using Nomic-embed-text. Check out the previous video. It covers that topic.
@michaeldoyle4222 4 months ago
Any idea where I can see docker logs for local install (i.e. not docker) on mac....
@technovangelist 4 months ago
If it’s a local install that isn’t docker there is no docker log
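
(For the Docker case, the server output is just the container log:)

docker logs -f ollama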
@csepartha 5 months ago
Kindly make a tutorial to fine tune an open source LLM model on many pdfs data. The fine tuned LLM must be able to answer the questions from the pdfs accurately.
@user-ok9vj5js7e 1 month ago
thanks for your help!
@mrRobot-qm3pq 5 months ago
Does it consume less resources and run better with OrbStack instead of with Docker Desktop?
@Tarun_Mamidi 5 months ago
Cool tutorial. Can you also show how we can integrate ollama docker with other programs, say, langchain script inside docker. How to connect both of them together or separately? Thanks!
@technovangelist 5 months ago
would love to see a good example of using langchain. often folks use it for rag where it only adds complexity. Do you have a good usecase?
@chandup 5 months ago
Nice video. Could you also please make a demo video on how to use ollama via nix (nix shell or on nixos)?
@user-kg1di9ed3z 1 month ago
Can I run Ollama without having access to internet? I would like to run it locally, with no internet connection at all.
@technovangelist 1 month ago
Yes
@mohammedibrahim-hd2rs 2 months ago
you're amazing bro
@MarvinBo 5 months ago
Make your Ollama even better by installing Open WebUI in a second container. This even runs on my Raspi 5!
@technovangelist 5 months ago
Some like the webui. But that's a personal thing. It's an alternative.
@ravitejarao6201 5 months ago
Hi bro. When I try to deploy Ollama on AWS Lambda with an ECR Docker image I am getting an error, can you please help me? Error: http: connecterror: [errno 111] connection refused. Thank you
@technovangelist 5 months ago
need a lot more info. where do you see that. when in the process? Is running a container like that even going to be possible? Do you have access to a gpu with lambda? if not, its going to be an expensive way to go.
@michaelberg7201 5 months ago
I recently had the opportunity to try Ollama in Docker and it worked pretty much as shown in this video. I do think it would be nice if it was somehow possible to start a container and have it ready to serve a model immediately, but I couldn't find an easy way to do this. You basically have to run one docker command to start Ollama, then wait a bit, then run another docker "exec -it" command to tell Ollama to load whatever model you happen to need. How do I achieve the same thing using just one single docker command?
@carlosmendez3363 2 months ago
docker-compose
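
For example, a minimal compose sketch (the file contents are an assumption, not something from the video; the model still has to be pulled once, after which it lives in the volume):

cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped
volumes:
  ollama:
EOF

docker compose up -d
docker compose exec ollama ollama pull llama2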
@CC-zr6fp 1 month ago
I am wondering where I am going wrong. I followed step-by-step but when I run ollama run llama3 I get 'ollama: command not found'
@technovangelist 1 month ago
Are you running that in the container? If not you need to run the right docker command. But best if you don’t use docker if you aren’t a docker user
@CC-zr6fp 1 month ago
@@technovangelist Thank you for the quick reply. I was running it in one; however, I decided it was going to be faster for me to use a spare NUC I had, running Windows. The LXC I was testing it in was still sorta slow even though I gave it 8 cores, even using llama3:text
@technovangelist 1 month ago
Without a gpu it will be slow
@RupertoCamarena 5 months ago
Did you hear about Jan AI? Would be good to get a tutorial for Docker. Thanks
@xDiKsDe 5 months ago
hey matt, appreciate your content - has been very helpful to get everything running so far! I am on a windows 11 pc and managed to get ollama + anythingllm running on docker and communicate w/ each other. Now I want to try to get llms from hugging face to run in the dockerized ollama. I saw how it works, if you have ollama installed directly on the system. But how do I approach this with using docker? Thanks in advance and keep it up 👏
@technovangelist 5 months ago
Is the model not already in the library? You can import, but it can be a bit of extra work. Check out the import doc in the docs folder of the ollama repository
@xDiKsDe 5 months ago
ah yes they are, but I meant custom trained llms - I stumbled across the open_llm_leaderboard and wanted to give those a try - will check out the import doc, thanks! @@technovangelist
@alibahrami6810 5 months ago
is it possible to manage multiple instances of ollama on docker for scaling the ollamas for production? how ?
@technovangelist 5 months ago
You could but it will result in lower performance for everyone.
@nagasaivishnu9680 5 months ago
Running the docker container as the ROOT user is not secure; is there any way to run it as a non-root user?
@kevyyar 5 months ago
Just found this channel. Could you make a video tutorial on how to use it inside VS Code for code completions?
@kaaveh 5 months ago
I wish there was a clean way to launch an Ollama Docker container with a preconfigured set of models so it would serve and then immediately pull the models. We are overriding the image's entry point right now to run a shell script that does this…
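
For anyone curious, the kind of entrypoint override described above looks roughly like this (a sketch; the script name and model list are assumptions):

cat > entrypoint.sh <<'EOF'
#!/bin/sh
# start the server in the background, wait briefly, pull the models, then keep serving
ollama serve &
sleep 5
ollama pull llama2
ollama pull mistral
wait
EOF
chmod +x entrypoint.sh

docker run -d -v ollama:/root/.ollama -v "$PWD/entrypoint.sh:/entrypoint.sh" \
  -p 11434:11434 --name ollama --entrypoint /entrypoint.sh ollama/ollama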
@kiranpallayil8650 2 months ago
would ollama still work on a machine with no graphics card?
@technovangelist 2 months ago
Absolutely. It will just be 1-2 orders of magnitude slower. The work models do requires a lot of math that gpu really help accelerate
@masterfulai 5 months ago
some of the model links r broken so I had to add it to requirements and edit the Dockerfile
@technovangelist 5 months ago
What do you mean by that? Is this something in a file you made?
@masterfulai 5 months ago
@@technovangelist I was referring to the Ollama Webui, perhaps this isn't the same repo?
@technovangelist 5 months ago
different product made by unrelated folks
@bodyguardik 5 months ago
In the WSL2 Docker version, DON'T PUT MODELS OUTSIDE WSL2 on a mounted Windows drive - I/O performance will be ~15x slower.
@technovangelist 5 months ago
Yup. Pretty standard stuff for docker and virtualization. Docker on wsl with Ubuntu means the ollama container is running in the Ubuntu container on the wsl virtual machine. Each level of abstraction slows things down. And translation between levels is going to be slow.
@SharunKumar 5 months ago
For Windows, the recommended way would be to use WSL(2), since that's a container in itself
@technovangelist 5 months ago
Well recommended way on windows is the native install. But after that is docker. And wsl is a vm, not a container. Ubuntu on wsl is a container that runs inside the wsl vm.
@AdrienSales 5 months ago
Hi, would you also share podman commands ? Did you give it a try ?
@technovangelist 5 months ago
I tried it a bit when Bret Fisher and I had them on the show we hosted together, but I didn't have much reason to stop using Docker. I didn't see any benefit.
@AdrienSales 5 months ago
@@technovangelist Thanks for the feedback. It was not about dropping Docker, but rather being sure both work, as in some cases podman is used (because of the rootless mode, e.g.) and not Docker. So it may help some of us spread Ollama even in these cases in an enterprise context ;-p
@technovangelist 5 months ago
I thought they were supposed to be command line compatible. should be the same, right? Try it and let us know.
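
If the CLIs really do line up, the equivalent would look roughly like this (an untested sketch; podman usually wants the fully qualified image name):

podman run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama docker.io/ollama/ollama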
@akshaypachbudhe3319 5 months ago
How do I connect this Ollama server with a Streamlit app and run both on Docker?
@madhusudhanreddy9157 1 month ago
From the question I understood you should have two containers with different ports: 1. Ollama, 2. the Streamlit app. Run them separately and access the Ollama APIs from the front-end app.
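
A rough sketch of that two-container setup (the app image name and the environment variable the app reads are placeholders, not from the video):

# put both containers on one network so the app can reach Ollama by name
docker network create llm-net
docker run -d --network llm-net -v ollama:/root/.ollama --name ollama ollama/ollama
docker run -d --network llm-net -p 8501:8501 \
  -e OLLAMA_HOST=http://ollama:11434 --name app my-streamlit-app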
@AlokSaboo 5 months ago
Loved the video…can you do something similar for LocalAI. Thanks!
@technovangelist 5 months ago
Hmm. Never heard of it before now. I’ll take a look
@AlokSaboo 5 months ago
@@technovangelist github.com/mudler/LocalAI - Similar to Ollama in many respects. One more tool for you to learn :)
@Vinn.V 2 months ago
It's better to write a Dockerfile and package it as a Docker image
@lancemarchetti8673 5 months ago
Hey Guys.... Mistral just launched their new model named Large!
@95jack44 5 months ago
If anyone has insights on a particular LLM model that has a low hallucination rate on Kubernetes-native resource generation, please leave me a comment ;-). Thx
@technovangelist 5 months ago
Usually when someone has a hard time with the output of a model it points to a bad prompt rather than a bad model.
@jbo8540 5 months ago
Mistral:Instruct is a solid choice for a range of tasks
@bobuputheeckal2693 3 months ago
How to run as a dockerfile
@technovangelist 3 months ago
Yes that’s what this video shows
@bobuputheeckal2693 3 months ago
@@technovangelist I mean, how to run it as a Dockerfile, not as a set of docker commands.
@technovangelist 3 months ago
Docker commands that run an image built using the Dockerfile
@basilbrush7878 5 months ago
Mac not allowing GPU pass through is a huge limitation
@technovangelist 5 months ago
Docker has known about the issue for a long time. But mostly it's because there aren't Linux drivers for the Apple Silicon GPU
@florentflote 5 months ago
@kwokallsafe5642 2 months ago
VID SUGGESTION ~ (Resolve Error Response: Invalid Volume Specification) - Thanks
test@xz97:~$ docker run -d --gpus=all -v /home/test/models/:root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker: Error response from daemon: invalid volume specification: '/home/test/_models/:root/.ollama': invalid mount config for type "bind": invalid mount path: 'root/.ollama' mount path must be absolute.
@technovangelist 2 months ago
That's more of a support request.... the error message is all you need. You specified a relative path rather than an absolute one. Refer to the docs on Docker Hub for the image
@kwokallsafe5642 2 months ago
@@technovangelist - Thanks Matt for your reply - discovered there is a "/" slash missing before root - (problem solved). Thanks again.
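
For anyone hitting the same error, the corrected command from this thread is simply the original with an absolute container path:

docker run -d --gpus=all -v /home/test/models/:/root/.ollama -p 11434:11434 --name ollama ollama/ollama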
@themax2go 3 months ago
waaiiiit wait wait a sec... i specifically remember in a vid (don't remember which, it's been months) that on Mac in order for ollama to utilize "metal" 3d acceleration that it needs to run in docker... strange 🫤
@technovangelist 3 months ago
Sorry. You must have remembered that wrong. Docker on Mac with apple silicon has no access to the gpu. And ollama doesn’t work with the gpu on Intel Macs either