Chatbot Memory for Chat-GPT, Davinci + other LLMs - LangChain #4

43,755 views

James Briggs

Days ago

Conversational memory is what allows a chatbot to respond to multiple queries in a coherent, chat-like manner. Without it, every query would be treated as an entirely independent input, with no regard for past interactions.
Memory allows a Large Language Model (LLM) to remember previous interactions with the user. By default, LLMs are stateless, meaning each incoming query is processed independently of all other interactions. The only thing that exists for a stateless agent is the current input, nothing else.
There are many applications, such as chatbots, where remembering previous interactions is essential. Conversational memory allows us to do that.
There are several ways we can implement conversational memory. In LangChain, they are all built on top of the `ConversationChain`.
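As a quick taste of what that looks like in code, here is a minimal sketch, assuming the classic `langchain` package with an OpenAI API key configured (the model name is just an example):

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# the LLM itself is stateless; the memory object carries the chat history
llm = OpenAI(model_name="text-davinci-003", temperature=0)
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is James.")
# the buffer now contains the first exchange, so the model can answer this:
conversation.predict(input="What is my name?")
```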
🌲 Pinecone article:
pinecone.io/learn/langchain-c...
📌 LangChain Handbook Code:
github.com/pinecone-io/exampl...
🙋🏽‍♂️ Francisco:
/ fpingham
Part 1 (Intro): • Prompt Engineering wit...
Part 2 (PromptTemplate): • Prompt Templates for G...
Part 3 (Chains): • LLM Chains using GPT 3...
🎙️ AI Dev Studio:
aurelio.ai/
🎉 Subscribe for Article and Video Updates!
/ subscribe
/ membership
👾 Discord:
/ discord
00:00 Conversational memory for chatbots
00:28 Why we need conversational memory for chatbots
01:45 Implementation of conversational memory
04:05 LangChain's Conversation Chain
12:00 Conversation Summary Memory in LangChain
19:06 Conversation Buffer Window Memory in LangChain
21:35 Conversation Summary Buffer Memory in LangChain
24:33 Other LangChain Memory Types
25:25 Final thoughts on conversational memory
#artificialintelligence #nlp #openai #deeplearning #langchain

Comments: 99
@kevon217 1 year ago
Another masterpiece of a tutorial. You’re an absolute gem James!
@decodingdatascience 9 months ago
Thanks James for elaborating on LangChain memory. For the viewers, here are some 🎯 key takeaways for quick navigation:
00:00 🧠 Conversational memory is essential for chatbots and AI agents to respond coherently to queries in a conversation.
01:23 📚 Different memory types, like conversation buffer memory and conversation summary memory, help manage and recall previous interactions in chatbots.
05:42 🔄 Conversation buffer memory stores all past interactions in a chat, while conversation summary memory summarizes those interactions, reducing token usage.
14:13 🪟 Conversation buffer window memory limits the number of recent interactions saved, offering a balance between token usage and remembering recent interactions.
23:05 📊 Conversation summary buffer memory combines summarization with saving recent interactions, providing flexibility in managing conversation history.
We are also doing lots of workshops in this space; looking forward to talking more.
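As a hedged illustration of how the memory types listed above are swapped in and out, here is a sketch against the classic LangChain API (in older releases these classes live under `langchain.chains.conversation.memory`; the `k` value is an arbitrary example):

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
)

llm = OpenAI(temperature=0)

# keeps the raw transcript of every interaction (token usage grows fastest)
buffer_chain = ConversationChain(llm=llm, memory=ConversationBufferMemory())

# keeps a running LLM-written summary instead of the raw transcript
summary_chain = ConversationChain(llm=llm, memory=ConversationSummaryMemory(llm=llm))

# keeps only the k most recent interactions verbatim
window_chain = ConversationChain(llm=llm, memory=ConversationBufferWindowMemory(k=3))
```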
@GergelyGyurics 1 year ago
Thank you. I was way behind on LangChain and had no time to read the documentation. This video saved me a lot of time. Subscribed.
@MrFiveDirections 1 year ago
Super! For me, it is one of the best tutorials on this subject. Much appreciated, James.
@jamesbriggs 1 year ago
thanks, credit to Francisco too for the great notebook
@cloudshoring 1 year ago
Cool! This video addressed the question I had posed on your earlier (1st) video about the token size limitations of adding conversational history. The charts give a good intuition of the workings of the memory types. Two takeaways: 1. when to use which memory type, and 2. how to performance-tune a chatbot app given the overheads of token tracking, memory appending, and so on.
@daharius2 1 year ago
Things really seem to get interesting with the knowledge graph! Saving the things that really matter, like relation context, along with a combination of the other methods, starts to sound very powerful. Add in some embeddings/vector DB and wow. The other commenter's idea about a system for bots evolving sentiment, or even personality, over time is worth thinking about as well.
@jamesbriggs 1 year ago
yeah this is fascinating to me, looking forward to working on these
@user-sc6cc2gv6b 1 year ago
Very powerful! Any ideas or resources on how to add an embedding/vector DB to this? I would like this memory chatbot to be able to reference my own data stored in the vector DB, but I can't seem to make them work together. Either the chatbot has memory OR it references the embeddings; I can't seem to combine the two.
@vintagegenious 1 year ago
@@user-sc6cc2gv6b It's done in video #9
@DavidGarcia-gd2vq 1 year ago
Thanks for your content! looking forward to watching the knowledge graph video :)
@davidmoran4623 1 year ago
Great explanation of memory in LangChain; when you show the chart, it becomes much clearer for me.
@MatheusGamer14 1 year ago
Thanks for this content James, awesome!
@jamesbriggs 1 year ago
you're welcome
@Davipar 1 year ago
Thank you! Awesome work!! Appreciate it!
@jamesbriggs 1 year ago
thanks!
@Sciencehub-oq5go 1 year ago
James, thanks so much!
@satvikparamkusham7454 1 year ago
These lectures are really helpful, thanks a lot! Is there a way to use Conversational Memory along with VectorDBQA (generative question answering on a database)?
@goelnikhils 8 months ago
Amazing Content
@ylazerson 1 year ago
you are awesome - thanks again!
@TomanswerAi 1 year ago
Great demo James
@jamesbriggs 1 year ago
thanks Tommy I appreciate it!
@adamsardo 1 year ago
Love the video! Question about putting this behind a UI: how hard would that process be?
@gutgutia 1 year ago
James - are you still planning to work on the KG video? Seems like a powerful method that solves for scale and token limits.
@NextGenart99 1 year ago
Oh wow, you just destroyed my project lol. I gave ChatGPT long-term memory, autonomous memory store and recall, speech recognition, audio output, and self-reflection. Thought I was the only one working on stuff like this. Well, I'm basically trying to build a sentient agent; I need vision though. Hopefully GPT-4 is multimodal, because I'm struggling to give my project vision recognition.
@jamesbriggs 1 year ago
yeah I think you might be in luck for multimodal GPT-4 :) - that's awesome though, I haven't done all of that yet, very cool!
@ericgeorge7667 1 year ago
Great work bro! Keep it up! 👍
@ObservingBeauty 1 year ago
Helpful! Thanks
@m.branson4785 1 year ago
Great video! I love the graphs for token usage. I kept meaning to graph the trends myself, but I was too lazy! I was talking to Harrison Chase as he was implementing the latest changes to memory, and it's had me thinking about other unique ways to approach it. I've been using different customized summarizers, and I can bring up any subset of the message history as I like, but I'm thinking also to include some way to flag messages as important or unimportant, dynamically feeding the history. I also haven't really explored my options in terms of local storage and retrieval of old chat history. One note that I might make for the video too... I noticed you're using LangChain's usual OpenAI class and just adjusting your model to 3.5-turbo. My understanding is that we have been advised to use the new ChatOpenAI class for now when interacting with 3.5-turbo, since that's where they'll be focusing development and they can address changes there without breaking other stuff, necessary since the new model endpoint differs in how it takes a message list as parameter instead of a simple string.
@jamesbriggs 1 year ago
Dynamically feeding the memory sounds cool; would you do this explicitly or implicitly? LangChain moves super fast, I hadn't seen the new ChatOpenAI class, thanks for pointing this out!
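For anyone reading along, a minimal sketch of the switch being suggested here, assuming a LangChain version that ships `ChatOpenAI`:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# the chat-specific class targets the chat completions endpoint,
# which takes a list of messages rather than a single prompt string
chat_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
conversation = ConversationChain(llm=chat_llm, memory=ConversationBufferMemory())
conversation.predict(input="Hello!")
```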
@omnipedia-tech 1 year ago
@@jamesbriggs My notions are to create a chat client where the bot is controlling the conversation, instead of the user, for the purpose of guided educational experiences - like a math lesson performed with the Socratic method, where you want to elicit the solution from the user rather than just provide it to them. I'm imagining I'll need an internal model of the user's cognition and an outline of the lesson, then implicitly determining the importance of any interaction or lesson detail by how logically connected it is to both, feeding only the immediately relevant context to the external facing LLM. I'm really still brainstorming, and I just started a month-long vacation to play with the idea.
@SaifBattah 6 months ago
What if I want to use it with my own fine-tuned GPT-3.5 model?
@max4king 1 year ago
Does anyone know the difference between the `run` and `predict` methods? They seem the same to me. If there is a difference, which one is better?
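To the best of my understanding, in the classic LangChain API they are effectively interchangeable for a single-input chain like `ConversationChain`; a sketch, assuming a chain named `conversation`:

```python
# both calls route through the same chain logic and return the output string;
# `predict` takes keyword arguments, `run` also accepts one positional arg
conversation.predict(input="Hello there!")
conversation.run("Hello there!")  # shorthand for chains with one input key
```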
@souvickdas5564 1 year ago
How do I use memory with ChatVectorDBChain, where we can specify vector stores? Could you please give a code snippet for this? Thanks.
@jason_v12345 1 year ago
Skimming through the docs, LangChain seems like a complicated abstraction around what's essentially auto copy and paste.
@jamesbriggs 1 year ago
the simpler stuff yes, but they have some other things like knowledge graph memory + agents that I think are valuable
@sysadmin9396 4 months ago
Hi Sam, how do we keep the conversation context of multiple users on different devices separate?
@isaiahsgametube2321 1 year ago
Thank you, great topic!
@jamesbriggs 1 year ago
glad you liked it!
@jashwanthl9618 1 year ago
How would I be able to use this with a Pinecone vector DB for context?
@bill13579 1 year ago
Hi James, great video. I have all my documents stored in Pinecone. I use the in-context approach, i.e., taking the utterance and retrieving the two most relevant docs. Is there an approach to using conversational memory with Pinecone to get the two docs?
@jamesbriggs 1 year ago
Yes, we can use retrieval augmentation. I cover it a little in this video kzfaq.info/get/bejne/mriFfKqYs6jahp8.html and this video kzfaq.info/get/bejne/qNhxdsuhx93dl3k.html, but those aren't conversational-memory or LangChain specific. I'll be covering the LangChain-specific approach to retrieval augmentation later this week, and working on figuring out how it can work well with conversational memory.
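One possible shape for combining the two, sketched with LangChain's `ConversationalRetrievalChain` (in older LangChain versions the equivalent class was `ChatVectorDBChain`; `vectorstore` is assumed to be an existing LangChain vector store, e.g. a Pinecone index wrapped with `langchain.vectorstores.Pinecone`):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# the memory tracks chat history; the retriever pulls the top-2 relevant docs
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
    memory=memory,
)
qa({"question": "What does the handbook say about memory?"})
```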
@vinaynaman5697 1 year ago
How do I use this conversational memory for a custom chatbot along with LangChain?
@bwilliams060 1 year ago
Hi James, great video. This is probably a stupid comment but here goes… Could you not just ask the LLM to capture some key variables that summarize the completion for the prompt, and then feed those (rather than the full conversation) as 'memory' for subsequent prompts? I'm imagining a 'ghost' question being added to each prompt, like 'Also capture key variables to summarise the response for future recall', and then this being used as the assistant message (per GPT-3.5 Turbo) rather than all of the previous conversation.
@jianleichen7750 1 year ago
Just curious, what's the OpenAI cost to complete this course if you choose the pay-as-you-go plan?
@isaacyimgaingkuissu3720 1 year ago
Great content, thanks for that. I'm working on a tweet summarization use case, but I don't want to break the overall corpus into pieces, summarize each piece, and combine those summaries into a larger one. I want something more clever. Suppose I have 10 tweets: 6 are related (same topic) and the last 4 are each different. I think I can build a better summary than the plain LangChain summary by only summarizing the 6 related tweets and adding the 4 raw tweets. This helps avoid losing context for the future.
@jamesbriggs 1 year ago
I'm not sure exactly how to implement this, but possibly:
1. Embed the tweets.
2. When looking to summarize, embed the current query and perform a semantic search to identify tweets over a particular similarity threshold.
3. Summarize those retrieved tweets.
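A rough sketch of that idea, assuming the pre-1.0 `openai` Python SDK; the 0.85 threshold and the `tweets`/`query` variables are illustrative assumptions:

```python
import numpy as np
import openai

def embed(texts):
    # text-embedding-ada-002 returns one 1536-dim vector per input string
    res = openai.Embedding.create(input=texts, model="text-embedding-ada-002")
    return np.array([r["embedding"] for r in res["data"]])

tweet_vecs = embed(tweets)      # `tweets` is your list of tweet strings
query_vec = embed([query])[0]   # `query` describes the topic to summarize

# cosine similarity against the query, then keep tweets above the threshold
sims = tweet_vecs @ query_vec / (
    np.linalg.norm(tweet_vecs, axis=1) * np.linalg.norm(query_vec)
)
related = [t for t, s in zip(tweets, sims) if s >= 0.85]  # summarize these
```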
@FCrobot 10 months ago
In the scenario of conversational bots, how do you limit the token consumption of an entire conversation? For example, once consumption reaches 1,000 tokens, it should report that the tokens for this conversation have been used up.
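There's no built-in hard cap that I know of, but one hedged sketch enforces the budget yourself using LangChain's OpenAI callback to count tokens per turn:

```python
from langchain.callbacks import get_openai_callback

TOKEN_BUDGET = 1_000
tokens_used = 0

def guarded_turn(conversation, user_input):
    global tokens_used
    if tokens_used >= TOKEN_BUDGET:
        return "The tokens for this conversation have been used up."
    with get_openai_callback() as cb:  # counts tokens used by calls inside
        reply = conversation.predict(input=user_input)
    tokens_used += cb.total_tokens
    return reply
```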
@binstitus3909 6 months ago
How can I keep the conversation context of multiple users separate?
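One common pattern (a sketch, not an official LangChain feature) is to keep one memory object per user or session ID and look it up on each request:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

sessions = {}  # session_id -> ConversationChain, one memory per user

def get_conversation(session_id, llm):
    # each user gets their own chain, so histories never mix
    if session_id not in sessions:
        sessions[session_id] = ConversationChain(
            llm=llm, memory=ConversationBufferMemory()
        )
    return sessions[session_id]

# usage: get_conversation("user-42", llm).predict(input="Hi!")
```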
@adumont 1 year ago
If I understand the graphs correctly, what is represented is the tokens used per interaction; in the case of buffer memory (the quasi-linear one), the 25th interaction costs about 4k tokens. But the price (in tokens) of the whole conversation up to the 25th interaction is the sum of the prices of all interactions up to the 25th. So the price of the conversation, in each case, is the area under the curve you showed, not the highest point it reached. For the summarized conversations, the flat tendency towards the end means the price just keeps adding almost the same number of tokens per new interaction, not that the price of the conversation has reached a cap.
@fire17102 1 year ago
If my math isn't off, that should be 25/2 * 4k = 12.5 * 4k = 50k tokens after 25 interactions. At $0.002 per 1k tokens (on turbo), that is $0.10, or one dime, for the whole conversation.
@jamesbriggs 1 year ago
Yeah, your logic is correct. The graphs ended up like this because I wanted to show the limit of buffer memory (i.e. hitting the token limit). We had intended to include cumulative total graphs, but I didn't get time; I'm planning on putting together a little notebook to show this in the coming days. The token math checks out for me; it adds up quickly.
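The arithmetic in this thread, written out (prices as quoted in the thread; the linear-growth reading of the curve is the approximation):

```python
price_per_1k = 0.002              # USD per 1k tokens for gpt-3.5-turbo, as quoted above
interactions = 25
final_interaction_tokens = 4_000  # rough read of the buffer-memory curve

# if per-interaction usage grows roughly linearly from 0 to the final value,
# the cumulative total is the area under the curve: n/2 * max
total_tokens = interactions / 2 * final_interaction_tokens  # 50,000
print(total_tokens / 1_000 * price_per_1k)                  # -> 0.1 (USD)
```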
@sanakmukherjee3929 1 year ago
Do you have a substitute for LangChain?
@agritech802 1 year ago
Can someone let me know where I can get an off-the-shelf LLM with long-term memory? I need it to be able to remember things I tell it, remember where I put stuff, etc. I don't mind paying for it.
@antoniosalzano6235 1 year ago
I know that OpenAI's text embeddings measure the relatedness of text. I am new to this field, so this question is probably trivial for some of you. Anyway, I was wondering whether it is possible to use this technique with source code. I was trying to figure out a way to analyse source code, but due to token limitations, one way to save prior knowledge could be this. For example, if I have a list of source files, I can search for similarities within the list. Any advice? Is it possible, or am I just blathering on?
@jamesbriggs 1 year ago
Interesting question. I'm not sure, as I haven't seen this done before, but generally speaking these language models are just as good (if not better) at generating code as natural language, so I'd imagine generating embeddings for code *might* work. For dealing with token limits, you can try comparing chunks of code rather than the full code, if your use case allows for that.
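A hypothetical sketch of that chunk-then-embed idea, with naive fixed-size chunking (a real implementation might split on function boundaries instead; the file name and chunk size are illustrative):

```python
import openai

def chunk_code(source: str, max_lines: int = 40):
    # naive splitter: fixed-size line windows, no respect for scope
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

with open("module.py") as f:  # hypothetical file name
    chunks = chunk_code(f.read())

res = openai.Embedding.create(input=chunks, model="text-embedding-ada-002")
vectors = [r["embedding"] for r in res["data"]]  # one vector per code chunk
```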
@THCV4 1 year ago
Check out David Shapiro’s latest approach with salient summarization when you get a chance. Essentially: The summarizer can more efficiently pick and choose which context to preserve if it is properly primed with specific objectives/goals for the information.
@jamesbriggs 1 year ago
fascinating, love Dave's videos they're great!
@Sciencehub-oq5go 1 year ago
How is the model able to judge when it needs to come to the conclusion "I don't know"?
@adityaroy4261 1 year ago
Can you please please please make a video on how to connect MongoDB with LangChain?
@kevinkate4500 9 months ago
@jamesbriggs Why are transformers stateless?
@billykotsos4642 1 year ago
I swear you have the coolest shirts! Make a drip video too! Would watch!
@jamesbriggs 1 year ago
Thanks Billy! A drip video??
@did.dynamics8504 1 year ago
No example???
@younginnovatorscenterofint8986 1 year ago
Hello, this was interesting. I am currently developing a chatbot with LlamaIndex, with model_name="text-ada-001" or "text-davinci-003". Based on thousands of documents (external data), the user will ask questions and the chatbot must respond. When I tried it with just one document, the model performed well, but when I added another, the performance dropped. Could you please advise on a possible solution? Thank you in advance.
@younginnovatorscenterofint8986 1 year ago
My documents are in the form of PDFs.
@bagamanocnon 1 year ago
Hey James, can you share the Colab notebook for this?
@jamesbriggs 1 year ago
Yes, it's the chat notebook here: github.com/pinecone-io/examples/tree/master/generation/langchain/handbook
@thedailybias5408 1 year ago
Hello James, this method would not work for chat models anymore, right? The code would have to be adjusted to work with the new chat models from LangChain. Could you make a new video to cover that?
@jamesbriggs 1 year ago
it works for normal LLMs, not for chatbot-only models - but yes I'll be doing another video on this
@thedailybias5408 1 year ago
@@jamesbriggs awesome! Thank you so much for all the work you put in. You got me back to coding :)
@AlbusDumbledore-fr3qg 1 year ago
Make a video on using this kind of long-term-memory-based chat for semantic search on local files like .txt, please.
@jamesbriggs 1 year ago
planning to do it soon!
@huppahd5101 1 year ago
Hi, great content, but the gpt-3.5 model already has its own conversation memory, so you can use that instead of Davinci. It is also 10 times cheaper 😊
@jamesbriggs 1 year ago
Thanks for sharing, gpt-3.5-turbo is great! We do demo it in this video, during the first example even :) The reason I share this tutorial anyway is that gpt-3.5-turbo (using the direct OpenAI API) is restricted to the equivalent of `ConversationBufferMemory`; it doesn't do the summary, window, or summary + window memory types. We didn't really cover it here, but there's also the knowledge graph memory; we'll cover that in the future.
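To make that point concrete, here is roughly what the direct OpenAI API gives you: a growing message list, i.e. the equivalent of buffer memory and nothing more (pre-1.0 `openai` SDK assumed):

```python
import openai

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input):
    # the full message list is resent every turn: buffer memory, nothing more
    messages.append({"role": "user", "content": user_input})
    res = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    reply = res["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply
```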
@heymichaeldaigler 1 year ago
@@jamesbriggs I see, so even if we want to use the turbo model because it is cheaper than Davinci, we would still want to explore one of these LangChain memory types?
@fire17102 1 year ago
@@jamesbriggs Graph memory looks really interesting; would love to see it utilized with turbo or the ChatGPT API. Also wondering if/when OpenAI will start caching tokens for users on their end, meaning you would only pay for new data added to the conversation.
@eduardomoscatelli 1 year ago
The big problem is that so far I haven't found a solution that doesn't require inserting the entire schema in the prompt itself so that ChatGPT understands how to organize and structure the data. To explain my need better: I extract information from sales pages via web scraping, and I would like ChatGPT to organize the collected data based on my SCHEMA structure so that I can save it to the database with the fields I created. I wouldn't want to add instructions on how to sort the data to the ChatGPT prompt every time. The million-dollar question 😊: how do you "teach" the schema to ChatGPT only once and then validate unlimited texts, without spending tokens inserting the schema into the prompt and without training the model via fine-tuning?
@RatafakRatafak 6 months ago
For this kind of question you should try more advanced LLM channels
@superchiku 1 year ago
Make James Famous ....
@promptjungle 1 year ago
He already is
@dallasurban9676 1 year ago
So a large language model is simply a specialized transformer model, for words. Stable Diffusion and all the others are specialized transformer models for images, etc. Right now companies are building out their own specialized transformer models.
@jamesbriggs 1 year ago
For large language models, yes: they're essentially specialized and very large transformer models. Stable Diffusion does contain a transformer or two in the pipeline, but its core component is the diffusion model, which is different. The input to that diffusion model includes embeddings generated by something like CLIP (which contains a text transformer and a vision transformer, ViT). Generally, yes, transformers are everywhere, with a couple of other model types (like diffusers) scattered around the landscape.
@TLabsLLC-AI-Development 4 months ago
Yeah. I count the transformer and diffusion layers to be separate aspects of it but I see what you mean. It's getting so crazy.
@RyushoYosei 1 year ago
And yet ChatGPT needs some of this badly as I have seen it massively forget things that it said literally just one or two comments previously.
@did.dynamics8504 1 year ago
It's not DIALOGUE, it's a SERIES of questions... the AI must hold a dialogue like you do with a friend.
@uncletan888 1 year ago
ChatGPT-4 charges high fees, and people should not support it.
@VoyceAtlas 1 year ago
We should have a dedicated AI that summarizes old chats based on what you are talking about now, and then gives back less recent convos. A bit of both.
@jamesbriggs 1 year ago
I think this is similar to the summary + buffer window memory?
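For reference, that combined behaviour is what LangChain's summary buffer memory does; a minimal sketch (the token limit is an arbitrary example):

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

llm = OpenAI(temperature=0)
# recent turns stay verbatim; anything past the token limit gets summarized
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)
conversation = ConversationChain(llm=llm, memory=memory)
```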