LangChain101: Question A 300 Page Book (w/ OpenAI + Pinecone)

203,551 views

Greg Kamradt (Data Indy)

Days ago

Twitter: / gregkamradt
Or get updates to your inbox: mail.gregkamradt.com/signup
In this tutorial we will load a PDF book, split it up into documents, get vectors for those documents as embeddings, then ask a question.
--AI Generated Description--
In this tutorial, I discuss how to query a book with semantic search using OpenAI, LangChain, and Pinecone (an external vector store).
I demonstrate how to split the book into documents, convert them into vectors with OpenAI embeddings, and store those vectors externally in Pinecone.
I then show how to ask a question and get an answer back in natural language. The same technique works for querying internal documents or external datasets, not just books.
--AI Generated Description--
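The pipeline described above (split the book into chunks, embed them, store the vectors, retrieve the closest chunks, then ask the question) can be sketched end to end in plain Python. This is a toy under stated assumptions, not the video's code: a word-count vector over the corpus vocabulary stands in for OpenAI embeddings, an in-memory list stands in for Pinecone, the names `split_text`, `ToyVectorIndex` and `build_prompt` are hypothetical, and the final LLM call is left out. The prompt wording is only loosely modeled on LangChain's "stuff" QA prompt.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into fixed-size character chunks.

    Each chunk after the first starts with the last `chunk_overlap`
    characters of the previous chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyVectorIndex:
    """In-memory stand-in for a vector store such as Pinecone (toy, not its API)."""

    def __init__(self, chunks: List[str]):
        self.chunks = list(chunks)
        # Toy "embedding": word counts over the corpus vocabulary, L2-normalized.
        self.vocab = sorted({w for c in self.chunks for w in tokenize(c)})
        self.vectors = [self._embed(c) for c in self.chunks]

    def _embed(self, text: str) -> List[float]:
        counts = Counter(tokenize(text))
        vec = [float(counts[w]) for w in self.vocab]  # out-of-vocabulary words are dropped
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    def search(self, query: str, k: int = 2) -> List[str]:
        q = self._embed(query)
        # Non-zero vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """'Stuff' the retrieved chunks into a single prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Usage: index a few chunks, retrieve the best match, and build the prompt.
index = ToyVectorIndex([
    "Pinecone stores vectors in the cloud.",
    "The book is about farming on Mars.",
    "Embeddings turn text into vectors.",
])
question = "Where does Pinecone store vectors?"
print(build_prompt(question, index.search(question, k=1)))
```

In a real app the prompt would then go to an OpenAI model; only the retrieved chunks, not the whole book, are sent, which is what keeps a 300-page book within the model's token limit.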
0:00 - Intro
1:31 - Diagram Overview
3:33 - Code Start
5:46 - Embeddings
6:33 - Pinecone Index Create
7:45 - First Question
9:33 - Ask Questions w/ OpenAI
Code: github.com/gkamradt/langchain...

Comments: 620
@edzehoo · 1 year ago
So even Ryan Gosling's getting into this now.
@DataIndependent · 1 year ago
It's a fun topic!
@blockanese3225 · 1 year ago
@@DataIndependent he was referring to the fact you look like Ryan Gosling.
@Author_SoftwareDesigner · 1 year ago
@@blockanese3225 I think he understands that.
@blockanese3225 · 1 year ago
@@Author_SoftwareDesigner lol I couldn't tell if he understood that when he said it's a fun topic.
@nigelcrasto · 1 year ago
yesss
@krisszostak4849 · 1 year ago
This is absolutely brilliant! I love the way you explain everything and just give away all the notes in such a detailed and easy-to-follow way. 🤩
@blocksystems202 · 1 year ago
No idea how long I've been searching the web for this exact tutorial. Thank you.
@DataIndependent · 1 year ago
Wonderful - glad it worked out.
@64smarketing57 · 1 year ago
This is exactly what I was looking to do, but I couldn't sort it out. This video is legit the best resource on this subject. You're a gentleman and a scholar. I tip my hat to you, good sir.
@davypeterbraun · 1 year ago
Your series is just so, so good. What a passionate, talented teacher you are!
@DataIndependent · 1 year ago
Nice! Thank you!
@NaveenVinta · 1 year ago
Great job on the video. I understood more in 12 minutes than from a day of reading documentation. It would be extremely helpful if you could bookend this video with 1. dependencies and setup and 2. turning this into a web app. If you can make this into a playlist of 3 videos, even better.
@nigelcrasto · 1 year ago
you know it's something big when The GRAY MAN himself is teaching you AI!!
@sarahroark3356 · 1 year ago
OMG, this is exactly the functionality I need as a long-form fiction writer: not just to look up continuity details in previous books in a series so that I don't contradict myself or reinvent wheels ^^, but also to do productive brainstorming/editing/feedback with the chatbot. I need to figure out how to make exactly this happen! Thank you for the video!
@DataIndependent · 1 year ago
Nice! Glad it was helpful
@areacode3816 · 1 year ago
Agreed. Do you have any simplified tutorials, like explaining LangChain? I fed my novel into ChatGPT page by page and it worked... OK, but I kept running into roadblocks: memory/cache limits and more.
@thebicycleman8062 · 1 year ago
@@areacode3816 Maybe your Pinecone index is reaching its limit, or you're hitting GPT-3's 4,000-token limit? I would check these first. If it's Pinecone, the fix is easy: just buy more space. If it's GPT, try GPT-4, which doubles the limit to 8K tokens; if that doesn't work, I would add an intermediary step that runs another summarizing pass before passing the text to GPT-3.
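The intermediary summarizing step suggested above is essentially a map-reduce pass: summarize pieces that fit in the context window, then summarize the summaries. LangChain offers a chain type along these lines (`chain_type="map_reduce"`), but the sketch below is plain Python under assumptions: token counts use a rough ~4 characters/token heuristic instead of a real tokenizer, and `summarize` is a hypothetical callable you would back with an LLM call.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (a common English heuristic)."""
    return max(1, len(text) // 4)


def pack_chunks(chunks: List[str], budget: int) -> List[List[str]]:
    """Greedily group consecutive chunks so each group's estimate stays within `budget`.

    A single chunk larger than the budget still gets its own group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(chunk)
        used += cost
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(chunks: List[str], budget: int,
                         summarize: Callable[[str], str], max_rounds: int = 5) -> str:
    """Summarize each group that fits the budget (map), then summarize the summaries (reduce)."""
    texts = chunks
    for _ in range(max_rounds):
        groups = pack_chunks(texts, budget)
        if len(groups) <= 1:
            return summarize("\n".join(groups[0])) if groups else ""
        texts = [summarize("\n".join(group)) for group in groups]
    raise RuntimeError("summaries did not shrink enough to fit the budget")
```

The `max_rounds` guard matters: if the summarizer does not actually shorten its input, the loop would otherwise never converge.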
@gjsxnobody7534 · 1 year ago
How would I use this to make a smart chatbot for our company's chat support, specific to our company's products?
@shubhamgupta7730 · 1 year ago
@@gjsxnobody7534 I have the same question!
@ninonazgaidze1360 · 10 months ago
This is super awesome!!! And so easily explained! You made my year. Please keep up the great work
@virendersingh9377 · 1 year ago
I like the video because it was to the point, and the presentation with the initial overview diagram is great.
@Crowward92 · 1 year ago
Great video, man. Loved it. I had been looking for this solution for some time. Keep up the good work.
@lostnotfoundyet · 1 year ago
Thanks for making these videos! I've been going through the playlist and learning a lot. One thing I find really helpful in addition to the concepts explained is the background music! Would love to get that playlist :)
@DataIndependent · 1 year ago
Thank you! A lot of people gave constructive feedback that they didn't like it, especially when they sped up the track and listened to it at 1.2x or 1.5x. Here is where I got the music! lofigenerator.com/
@nickpetolick4358 · 1 year ago
This is the best video I've watched explaining the use of Pinecone.
@DataIndependent · 1 year ago
Nice!!
@401kandBeyond · 1 year ago
This is a great video and Greg is awesome. Let's hope he puts together a course!
@DanielChen90 · 1 year ago
Great tutorial, bro. You're really doing good out here for us, the ignorant. It took me a while to figure out that I needed to run pip install pinecone-client to install Pinecone, so this is for anyone else who is stuck there.
@DataIndependent · 1 year ago
Glad it worked out
@nsitkarana · 1 year ago
Nice video. I tweaked the code and split the indexing part from the query part so that I can index once and keep querying, like we would in the real world. Nicely put together!!
@babakbandpey · 1 year ago
Hello, do you have an example of how you did that? This is the part I'm confused about: how to reuse the same indexes. Thanks
@karimhadni9858 · 1 year ago
Can you please provide an example?
@thespiritualmindset3580 · 10 months ago
This helped me a lot, thanks, and for the updated code in the description as well!
@HelenJackson-pq4nm · 1 year ago
Really clear, useful demo - thanks for sharing
@MrWrklez · 1 year ago
Awesome example, thanks for putting this together!
@DataIndependent · 1 year ago
Nice! Glad it worked out. Let me know if you have any questions
@Mr_Chiro_ · 1 year ago
Thank you soooo much, I am using this knowledge so much for my school projects.
@PatrickCallaghan-yf2sf · 10 months ago
Fantastic video, thanks. I obtained excellent results (accuracy) following your guide compared to other tutorials I tried previously.
@DataIndependent · 10 months ago
Ah, that's great - thanks for the comment
@aaanas · 9 months ago
Was the starter tier of Pinecone enough for you?
@PatrickCallaghan-yf2sf · 9 months ago
The starter tier allows only one project, but that one project can contain multiple documents under one vector DB. For me it was certainly enough to get an understanding of the potential. From my limited experience, creating multiple vector DBs for different project types requires a premium/paid plan, and the cost is quite high. There may be competitors offering a cheaper entry level if you want to develop apps, but for a hobbyist/learner the Pinecone starter tier is fine IMO.
@ShadowScales · 9 months ago
Bro, thank you so much, honestly this video means so much to me. I really appreciate this; all the best in all your future endeavors
@DataIndependent · 8 months ago
Love it - what was your use case?
@vinosamari · 1 year ago
Can you do a more in-depth Pinecone video? It seems like an interesting concept alongside embeddings, and I think it'll help tie together the understanding of embeddings for more 'web devs' like me. I like how you used relatable terms while introducing it in this video, and I think it deserves its own space. Please consider an Embeddings + Pinecone fundamentals video. Thank you.
@DataIndependent · 1 year ago
Nice! Thank you. What's the question you have about the process?
@ziga1998 · 1 year ago
@@DataIndependent I think a general Pinecone video would be great, and connecting it with LangChain and building similar apps would be awesome
@ko-Daegu · 1 year ago
Weaviate is even better
@____2080_____ · 1 year ago
This is such a game changer. Can't wait to hook all of this up to GPT-4 as well as countless other things
@DataIndependent · 1 year ago
Nice! What other ideas do you think it should be hooked up to?
@____2080_____ · 1 year ago
Thumbs up and subscribed.
@davidzhang4825 · 1 year ago
This is gold! Please do another one with data in Excel or Google Sheets :)
@ThomasODuffy · 1 year ago
Thanks for this very helpful practical tutorial!
@ramachinta3140 · 26 days ago
Very helpful video, thank you!
@sunbisoft9556 · 1 year ago
Got to say, you are awesome! Keep up the good work, you got a subscriber here!
@DataIndependent · 1 year ago
Nice! Thank you. I just ordered upgrades for my recording setup, so quality will improve soon.
@luisarango-jm8eq · 1 year ago
Love this, brother!
@CarloNyte · 1 year ago
Duudee!!! This video is exactly what I was looking for! I'm still a complete noob at all this LLM integration stuff, so visual tutorials are incredibly helpful! Thank you for putting this together 🙌🏿🎉🙌🏿
@DataIndependent · 1 year ago
Great to hear! Check out the video on the '7 core concepts', which may help round out the learnings
@bartvandeenen · 1 year ago
I actually scanned the whole Mars trilogy to have something substantial, and it works fine. The queries generally return decent answers, although some of them are way off. Thanks for your excellent work!
@DataIndependent · 1 year ago
Nice! Glad to hear it. How many pages/words is the Mars trilogy?
@bartvandeenen · 1 year ago
@@DataIndependent About 1500 pages in total.
@keithprice3369 · 1 year ago
Did you look at the results returned from Pinecone, so you could determine whether the answers that were off were due to Pinecone not providing the right context or OpenAI not interpreting the data correctly?
@bartvandeenen · 1 year ago
@@keithprice3369 No, I haven't. Good idea to do this. I now have GPT-4 access, so I can use much larger prompts
@keithprice3369 · 1 year ago
@@bartvandeenen I've been watching a few videos about LangChain, and they did bring up that the chunk size (and overlap) can have a huge impact on the quality of the results. They said not only that there hasn't been much research on an ideal size, but that it should likely vary depending on the structure of the document. One presenter suggested 3 sentences with overlap might be a good starting point. But I don't know enough about LangChain yet to know how you specify a split on the number of sentences vs. just a chunk size.
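Splitting on a number of sentences rather than a character count can be sketched in a few lines. This is a hypothetical helper, not a LangChain API, and the regex sentence splitter is naive (for example, it will break after abbreviations like "Dr."); a proper sentence tokenizer would be more robust. The defaults follow the "3 sentences with overlap" starting point mentioned above, with an assumed overlap of 1 sentence.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_windows(text: str, size: int = 3, overlap: int = 1) -> List[str]:
    """Group sentences into windows of `size`, each sharing `overlap` sentences with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    sentences = split_sentences(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a fact that straddles a chunk boundary still appears whole in at least one chunk, at the price of embedding some sentences twice.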
@RodolphoPortoSantista · 1 year ago
This video is very good!
@tunle3980 · 1 year ago
Thank you very much for doing this. It's absolutely awesome!!! Also, can you do a video on how to improve the quality of answers?
@user-xp2ym1ng2h · 9 months ago
Thanks as always, Greg!
@DataIndependent · 9 months ago
Awesome, thank you
@johnsmith21170 · 1 year ago
Awesome video, very helpful! Thank you
@DataIndependent · 1 year ago
Love it, thank you
@guilianamustiga2962 · 9 months ago
Thank you, Greg! Very helpful tutorial!!
@DataIndependent · 9 months ago
Thanks Guiliana!
@waeldimassi3355 · 1 year ago
Amazing work! Thank you so much!!
@haouasy · 10 months ago
Amazing content, man. Love the diagrams and how you deliver; absolutely professional. Quick question: is the text returned by the chain exactly the same as the book's, or does the OpenAI engine touch it up and make it better?
@ritik1857 · 1 year ago
Thanks Ryan!
@caiyu538 · 11 months ago
Great series.
@walter7812 · 9 months ago
Great tutorial, thanks so much!
@DataIndependent · 9 months ago
Awesome, thanks Walter
@lukaszwiktor · 1 year ago
This is gold! Thank you so much!
@DataIndependent · 1 year ago
Thank you!
@Juniorventura29 · 1 year ago
Awesome tutorial, brief and easy to understand. Do you think this could be an approach to semantic search on private client data? My concern is data privacy; I assume that with Pinecone and OpenAI, OpenAI only processes what we send (to respond in natural language) but doesn't store any of our documents.
@geethaachar8495 · 1 year ago
That was fabulous, thank you
@DataIndependent · 1 year ago
Nice! Glad to hear it
@dogchaser520 · 1 year ago
Succinct and easy to follow. Very cool.
@JoanSubiratsLlaveria · 11 months ago
Excellent video!
@agiveon1999 · 1 year ago
This is great, thanks! Have you thought about how to extend it so you can CHAT about the book (as opposed to one question at a time)? I am running into problems figuring out when to keep a chain of chat going and when to recognize a new or related question that needs a fresh pull of similar docs
@svgtdnn6149 · 1 year ago
Thanks for the great content! Do you know how to better control the cost of running such a retrieval-based chatbot? In my experience, it is quite costly to run Q&A on just the simple PDF provided in the LangChain repo, using the default embeddings and LLM models from the LangChain example
@3278andy · 1 year ago
Amazing tutorial, Greg! I'm able to reproduce your result in my environment. I think chat_history would be handy for asking follow-up questions
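One common way to use chat_history for follow-ups is a "condense question" step: an LLM rewrites the follow-up into a standalone question, and that rewritten question is what gets embedded and sent to the vector store. LangChain's ConversationalRetrievalChain takes a `chat_history` input and, as far as I know, works along these lines. The sketch below is plain Python; the template wording is only loosely modeled on LangChain's, and `llm` is a hypothetical callable standing in for a real model call.

```python
from typing import Callable, List, Tuple

CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def format_history(history: List[Tuple[str, str]]) -> str:
    # Each turn is a (human message, assistant answer) pair.
    return "\n".join(f"Human: {h}\nAssistant: {a}" for h, a in history)


def build_condense_prompt(history: List[Tuple[str, str]], question: str) -> str:
    return CONDENSE_TEMPLATE.format(chat_history=format_history(history), question=question)


def retrieval_query(history: List[Tuple[str, str]], question: str,
                    llm: Callable[[str], str]) -> str:
    """Return the text to embed for retrieval: the question itself, or a standalone rewrite."""
    if not history:
        return question  # first question: nothing to condense
    return llm(build_condense_prompt(history, question)).strip()
```

This also addresses the "new question vs. follow-up" problem: every turn triggers a fresh document lookup, but with a question that already carries the conversation's context.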
@saburspeaks · 1 year ago
Amazing stuff with these videos
@DataIndependent · 1 year ago
Glad you like them!
@rajivraghu9857 · 1 year ago
Excellent 👍
@JuaniPisula · 1 year ago
Great video! Do you know how Pinecone deals with the similarity of sequences of different lengths? For example, matching the 1K-token documents in the video's DB against the short query questions you ask.
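Part of the answer is that Pinecone never compares sequences directly: the embedding model maps any input, whether a short question or a ~1K-token chunk, to a vector of the same fixed dimension (1,536 for OpenAI's text-embedding-ada-002). If the index uses the cosine metric, similarity depends only on the angle between vectors, not their magnitude. A minimal sketch of cosine similarity (a generic helper, not Pinecone's code):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; ignores their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, `[1, 2, 3]` and `[2, 4, 6]` point the same way and score 1.0 even though one is twice as long. Length differences between the texts can still affect what the embedding captures (a long chunk blends several topics), which is one reason chunk size matters.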
@pramodm6168 · 1 year ago
Thank you - super helpful for understanding how to use external data sources with OpenAI. What are some of the limitations of this approach, i.e. the size of content being indexed in Pinecone, and any limits on correlating and summarizing data across multiple documents/sources? Can I combine multiple types of sources of information about a certain topic (documents, databases, blogs, cases, etc.) into a single large vector?
@roberthahn9040 · 1 year ago
Really awesome video!
@DataIndependent · 1 year ago
Nice!! Thank you - what else do you want to see?
@RomuloMagalhaesAutoTOPO · 1 year ago
Great explanation. Thank you.
@DataIndependent · 1 year ago
Thank you! That's great
@philipsnowden · 1 year ago
Your videos are amazing. Keep it up, and thanks!
@DataIndependent · 1 year ago
Thanks Philip. Anything else you want to see?
@philipsnowden · 1 year ago
@@DataIndependent I'm curious which is the better option for this use case and would love to hear your thoughts: why LangChain over Haystack? I want to pass thousands of text documents into a question-answering system and am still learning the best way to structure it. Also, an integration with something like Paperless would be cool! I'm a total noob, so excuse my ignorance. Thanks!
@DataIndependent · 1 year ago
@@philipsnowden I haven't used Haystack yet, so I can't comment on it. If you have 1K text documents, you'll definitely want to get embeddings and store them, retrieve them, then pass them into your prompt for the answer. Haven't used Paperless yet either :)
@philipsnowden · 1 year ago
@@DataIndependent Good info, thank you.
@philipsnowden · 1 year ago
@@DataIndependent Could you do a more in-depth explainer on this? I'm struggling to take a directory of text files and get it going. I've been reading and trying the LangChain docs but am having a hard time. And can you use the new GPT-3.5 Turbo model to answer the questions? Thanks for your time; do you have a tip jar?
@cheunghenrik7041 · 1 year ago
Thanks for the tutorial series! Could I work with multiple different PDFs at the same time (other than by combining them)?
@rodrigomarques7128 · 10 months ago
This is awesome!!!!
@DataIndependent · 10 months ago
Nice! Glad it worked out
@rodrigomarques7128 · 10 months ago
@@DataIndependent What open-source alternatives would you recommend for the embedding model and the QA model?
@rayxiao460 · 1 year ago
Very impressive. Great job.
@quantum_ocean · 1 year ago
Thanks for sharing. Could you elaborate on why you didn't use overlap?
@nattapongthanngam7216 · 3 months ago
Appreciate it!
@thepracticaltechie · 1 year ago
Awesome video! Is there a way to embed the prompt and response interface into a website, more like a chatbot experience?
@user-vc2sc9rq7t · 1 year ago
Thanks for your tutorials on LangChain; they certainly help a lot, and I appreciate what you're doing here! I'd like to better understand how Pinecone helps in this use case compared to your previous tutorial on 'custom files + ChatGPT'. Would I be able to upload multiple documents to query in that previous tutorial, or would Pinecone be necessary?
@DataIndependent · 1 year ago
Pinecone is good when you want to store your vectors in the cloud, which can help when you're building a more robust app. In the previous tutorial I was using Chroma, which is more local.
@sabashioyaki6227 · 1 year ago
This is definitely cool, thank you. There seem to be several dependencies left out. It would be great if all dependencies were shown or listed...
@DataIndependent · 1 year ago
OK, thank you, and will do. Are you having a hard time installing them all?
@benfield1866 · 1 year ago
@@DataIndependent Hey, I'm stuck on the dependency part as well
@retardedpenguin1 · 1 year ago
How do you get around rate limits for really large documents? The OpenAI Ada embeddings model can only accept a certain number of requests/chunks per minute.
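A common way around per-minute rate limits is to send chunks in batches and retry a batch with exponential backoff when the API refuses it. The sketch below assumes a hypothetical `embed_batch` callable (one embeddings request per list of texts) and a stand-in `RateLimitError`; substitute the rate-limit exception your OpenAI client version actually raises. The batch size and delays are arbitrary starting values, not OpenAI limits.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for your client's rate-limit exception (assumption)."""


def batched(items: Sequence[T], size: int) -> List[Sequence[T]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def call_with_backoff(fn: Callable, *args, max_retries: int = 5,
                      base_delay: float = 1.0, sleep: Callable[[float], None] = time.sleep):
    """Call fn(*args); on RateLimitError wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))


def embed_all(texts: List[str], embed_batch: Callable[[List[str]], List[List[float]]],
              batch_size: int = 100, **backoff_kwargs) -> List[List[float]]:
    """Embed every text, one batch per request, retrying rate-limited batches."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(call_with_backoff(embed_batch, list(batch), **backoff_kwargs))
    return vectors
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting; in production the default `time.sleep` is used.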
@andytesii · 1 year ago
Love it!
@DataIndependent · 1 year ago
Nice! Thank you
@alvaromseixas · 1 year ago
Hey, Greg! I'm trying to connect the dots on GPT + LangChain, and your videos have been excellent sources! To give it a try, I'm planning to build some kind of personal assistant for a specific industry (e.g. law, healthcare), and down the road the vector database will become pretty big. Any guidelines on how to sort the best results and how to show the source the information was pulled from?
@DataIndependent · 1 year ago
Nice! Check out the LangChain documentation for "Q&A with sources"; you're able to get them back pretty easily.
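The "with sources" idea works roughly like this: each chunk carries a `source` entry in its metadata, the prompt shows the source next to each chunk's content, the model is asked to end its answer with a `SOURCES:` line, and the app splits that line off. LangChain's RetrievalQAWithSourcesChain follows a similar pattern, but the sketch below is a plain-Python illustration with a hypothetical `Doc` type and assumed prompt wording, not its actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_prompt_with_sources(question: str, docs: List[Doc]) -> str:
    """Show each retrieved chunk with its source so the model can cite it."""
    blocks = [f"Content: {d.text}\nSource: {d.metadata.get('source', 'unknown')}" for d in docs]
    return (
        "Answer the question using only the content below. "
        "End with a line 'SOURCES:' listing the sources you used, comma-separated.\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQUESTION: {question}\nFINAL ANSWER:"
    )


def split_answer_and_sources(output: str) -> Tuple[str, List[str]]:
    """Separate the model's answer from its trailing 'SOURCES:' line, if any."""
    answer, sep, sources = output.partition("SOURCES:")
    if not sep:
        return output.strip(), []
    return answer.strip(), [s.strip() for s in sources.split(",") if s.strip()]
```

Storing a page number or section title alongside the file name in `source` makes the citation more useful for large collections.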
@ininodez · 1 year ago
Great video!! Loved your explanation. Could you create another video on how to estimate the costs? Does the process of turning the documents into embeddings with OpenAI run every time you ask a new question, or just the first time? Thanks!
@silent.-killer · 1 year ago
Pinecone is basically a search engine for AI. It doesn't need the entire book, just segments of it. This saves a lot of tokens, because only those segments of information end up in the prompt, like adding some information to GPT's short-term memory
@mlg4035 · 1 year ago
Short but very sweet video! Question: does this work for documents in other languages, say Japanese? And is there a text splitter for Japanese (à la ChaSen, Kuromoji, etc.)?
@kelvinromero · 10 months ago
Hey Greg, amazing content; I'm learning a lot from your videos! But I'm running into a problem: looking at the source code, I noticed that the Pinecone.from_texts method indexes/stores the data, so it's not ideal to run it multiple times, right? Do you have any suggestions to improve this?
@shaunchen5054 · 1 year ago
Great video. I'm wondering whether there is a way to use PDFs made from photocopies of documents (which need image-to-text conversion)
@kennt7575 · 1 year ago
These are incredible instructions. In my case, I have some documents in Vietnamese; will Pinecone support UTF-8? OpenAI + LangChain + Pinecone... very helpful in many fields, especially customer service
@sangeetkumar6337 · 1 year ago
It's a really great video for getting started with LangChain. I have a small confusion here: what if I want to send all the similar docs to the LLM, not just k=5? Is there a way to deal with that?
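Instead of a fixed k, one option is to keep every chunk whose similarity clears a threshold, with a cap so the stuffed prompt still fits the model's context window. The sketch below is a plain-Python illustration over an in-memory list of `(text, vector)` pairs; the 0.8 threshold and the function name are assumptions to tune and adapt, not values from the video. (The simplest alternative is just passing a larger `k` to the vector store's similarity search.)

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve_above_threshold(query_vec: Sequence[float],
                             index: List[Tuple[str, Sequence[float]]],
                             threshold: float = 0.8,
                             max_docs: Optional[int] = None) -> List[Tuple[float, str]]:
    """Return (score, text) for every chunk scoring >= threshold, best first.

    `max_docs` caps the result (None means no cap) so the prompt stays within the context window.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:max_docs]
```

An empty result is also a useful signal: the app can reply that the documents don't cover the question instead of sending the LLM an empty context.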
@sunil_modi1 · 1 year ago
Your videos are really awesome and very helpful. What approach should I take if I want to do semantic search over structured (tabular) data instead of free text, using OpenAI and LangChain?
@DataIndependent · 1 year ago
There might be a better answer out there... but my take is that, since you'll need to feed text into OpenAI, you can make documents out of your rows first, get embeddings for those documents, then do your similarity search. It'll take some translation and file formatting
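Making documents out of rows can be as simple as writing each row as `column: value` lines, which as far as I know is roughly what LangChain's CSV loader does as well. A minimal standard-library sketch, with a hypothetical function name and a dict layout (`page_content`, `metadata`) chosen only to mirror LangChain's document shape:

```python
import csv
import io
from typing import Dict, List


def rows_to_documents(csv_text: str, source: str = "table.csv") -> List[Dict]:
    """Turn each CSV row into one text document of 'column: value' lines.

    The row number is kept in metadata so results can be traced back
    (and filtered) later.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        text = "\n".join(
            f"{column.strip()}: {str(value).strip()}"
            for column, value in row.items()
            if column is not None  # DictReader uses the key None for surplus fields
        )
        documents.append({"page_content": text, "metadata": {"source": source, "row": row_number}})
    return documents
```

Each resulting text is then embedded like any other chunk; keeping the column names in the text helps the embedding (and the LLM) know what each value means.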
@sovopl · 1 year ago
Great tutorial. I wonder how to generate questions based on the content of the book? I would probably have to pass the entire content of the book to the GPT model.
@yonathan310393 · 1 year ago
This is a great video; it helped a lot. I have a question. I am new to this, and I'm having trouble splitting this code so that queries go directly to the previously uploaded data instead of uploading the vectors again. I want to use what I already have in Pinecone. How do I do that?
@tazahglobal8662 · 1 year ago
Loved it. One question: which OpenAI model does this approach use? For example, Davinci, etc.?
@jonathancrichlow5123 · 1 year ago
This is awesome! My question is: what happens when the model is asked a question outside the knowledge base that was just uploaded? For example, what would happen if you asked who the best soccer player is?
@satvikparamkusham7454 · 1 year ago
Excellent video! Thanks for this! Is there a way to use conversational memory while doing generative Q&A?
@DataIndependent · 1 year ago
Big time - check out the latest webinar on this exact topic. It should be on the LangChain Twitter
@knallkork700 · 1 year ago
Hey, great video! What do you mean when you say it's going to be more expensive with additional documents? What drives the cost? Thank you!
@danilovaccalluzzo · 1 year ago
Great video, thanks so much. How do you query the index without creating the embeddings every time? Is it possible? Thanks
@nihonkeizaishinbun2254 · 1 year ago
Hi, I found this: docsearch = Pinecone.from_existing_index(index_name, embeddings)
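`from_existing_index` covers the case where the vectors already live in Pinecone: later runs only embed each new query, not the whole book. The same "index once, query many times" idea can be sketched locally as an embedding cache keyed by a hash of each chunk's text, so re-running a script never pays to re-embed unchanged chunks. This is a hypothetical helper, not a LangChain or Pinecone API; `embed` stands for whatever batch embeddings call you use, and the JSON file is an assumed storage choice.

```python
import hashlib
import json
import os
from typing import Callable, Dict, List


def _key(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(texts: List[str], embed: Callable[[List[str]], List[List[float]]],
                     cache_path: str) -> List[List[float]]:
    """Return one vector per text, calling `embed` only for texts not cached yet."""
    cache: Dict[str, List[float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
    # dict.fromkeys de-duplicates while keeping order, so each new text is embedded once.
    missing = list(dict.fromkeys(t for t in texts if _key(t) not in cache))
    if missing:
        for text, vector in zip(missing, embed(missing)):
            cache[_key(text)] = vector
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return [cache[_key(t)] for t in texts]
```

In a deployed app the split is usually two scripts: an indexing job that runs once per document change, and a query service that only opens the existing index.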
@satheeshthangaraj5614 · 1 year ago
Hi, thanks for sharing. If we want to deploy this code on AWS as a web app, what changes should we make?
@fareedbehardien · 1 year ago
Would love to see an example of adding another book after you've done this one. What considerations and fine-tuning would you apply as a result of the second upload?
@DataIndependent · 1 year ago
You could add more documents to your existing index and it shouldn't be a problem. However, once you start adding a lot of information, pre-filtering your vectors becomes more important. Ex: if you know the answer comes from 1 of your 3 books, you can tell Pinecone to only return docs from that 1 book
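Pre-filtering means restricting the candidate set by metadata before ranking by similarity. In Pinecone this is done by storing metadata with each vector and passing a filter with the query (its syntax looks like `{"book": {"$eq": "..."}}`). The plain-Python sketch below only illustrates the idea with exact-match filters over an in-memory list; the entry layout and function name are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Entry = Tuple[str, List[float], Dict[str, str]]  # (text, vector, metadata)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def filtered_search(query_vec: Sequence[float], entries: List[Entry], k: int = 3,
                    where: Optional[Dict[str, str]] = None) -> List[str]:
    """Keep only entries whose metadata matches every key in `where`, then rank by cosine."""
    where = where or {}
    candidates = [e for e in entries if all(e[2].get(key) == value for key, value in where.items())]
    candidates.sort(key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _, _ in candidates[:k]]
```

Filtering first means a highly similar chunk from the wrong book can never crowd out the right one, which matters more as the index grows.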
@lnyxiux9654 · 1 year ago
Thanks for sharing!
@DataIndependent · 1 year ago
Nice! Glad it worked out
@lnyxiux9654 · 1 year ago
@@DataIndependent Yep! It was a bit of a pain to get unstructured set up properly, but after that it's all good. Impressive results very quickly!
@DataIndependent · 1 year ago
@@lnyxiux9654 I shared the same pain... that part didn't make it into the video
@pedrorios6566 · 7 months ago
Every time I run the cell with the embedding class, do I get charged by OpenAI? What option can I use to load the embeddings only once (for example, to make queries available through a web application)?
@quengelbeard · 3 months ago
Hey Greg, great video! Do you know if it's possible to create a Pinecone index automatically from code, so that you don't have to create it manually?
@HerroEverynyan · 1 year ago
Hi! Awesome tutorial. This is exactly what I was looking for. I really love this series you've started and hope you'll keep it up. I also wanted to ask:
1. What's the difference between using Pinecone and another vector store like Chroma, FAISS, Weaviate, etc.? And what made you choose Pinecone for this particular tutorial?
2. What did it cost to create embeddings for this book (time & money)?
3. Is there a way to estimate the cost of embeddings with LangChain beforehand?
Thank you very much, and looking forward to more vids like this! 🤟
@DataIndependent · 1 year ago
For your questions:
1. The difference between Pinecone/Chroma, etc.: not much. They store your embeddings and run a similarity calculation for you. However, the space is super new; as things progress, one may become a no-brainer over another. Ex: you could also do this in GCP, but you'd have to deal with its overhead as well.
2. Hm, unsure about the book, but here is the pricing for Ada embeddings: $0.0004 / 1K tokens. So a 120K-word book, which is ~147K tokens, would cost about $0.06 (147 × $0.0004). Not very steep...
3. Yes, you can calculate the number of tokens your task will use, then look up the pricing table to see how much it'll be.
@DataIndependent · 1 year ago
@@myplaylista1594 This one should help: help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
@klaudioz_ · 1 year ago
@@DataIndependent It can't be that expensive. text-embedding-ada-002 is about ~3,000 pages per US dollar (assuming ~800 tokens per page).
@DataIndependent · 1 year ago
@@klaudioz_ Yeah, you're right, my mistake. I didn't divide by the extra thousand in the previous calc. Fixing now
@klaudioz_ · 1 year ago
@@DataIndependent No problem. Thanks for your great videos!!
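The arithmetic in this thread can be checked with two small functions. The $0.0004 per 1K tokens figure is the Ada price quoted above and may have changed, so treat it as a parameter. Embedding the book is a one-time cost if the vectors are stored and reused; each question then adds one small query embedding plus the completion model's own (separately priced) tokens. For exact token counts rather than estimates, OpenAI's tiktoken library can tokenize the text (text-embedding-ada-002 uses the `cl100k_base` encoding).

```python
# Price quoted in the thread above for Ada embeddings; check OpenAI's current pricing page.
ADA_PRICE_PER_1K_TOKENS = 0.0004


def embedding_cost(num_tokens: int, price_per_1k: float = ADA_PRICE_PER_1K_TOKENS) -> float:
    """Dollar cost of embedding `num_tokens` tokens once."""
    return num_tokens / 1000 * price_per_1k


def pages_per_dollar(tokens_per_page: int = 800,
                     price_per_1k: float = ADA_PRICE_PER_1K_TOKENS) -> float:
    """How many pages one dollar of embeddings covers at the given page size and price."""
    tokens_per_dollar = 1000 / price_per_1k
    return tokens_per_dollar / tokens_per_page


print(round(embedding_cost(147_000), 4))  # the ~147K-token book from the thread
print(round(pages_per_dollar()))          # pages per dollar at ~800 tokens/page
```

At these numbers the 147K-token book costs about six cents and one dollar covers roughly 3,125 pages, consistent with the "~3,000 pages per dollar" correction above.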
@roberthuff3122 · 1 year ago
Great stuff! What GUI wrapper do you recommend?
@kennethleung4487 · 1 year ago
Awesome video as always. I noticed that there is the standard load_qa_chain, and on the other hand we also have VectorDBQA. Which one should be the go-to choice?
@DataIndependent · 1 year ago
Depends on your task. VectorDBQA is a convenient way to handle the document similarity search for you, or you could do it manually yourself with load_qa_chain.
@user-wr4yl7tx3w · 1 year ago
Great content
@DataIndependent · 1 year ago
Thanks!
@HiteshGulati · 1 year ago
Hi, thanks for the detailed video. I was able to follow it and create a Q&A chatbot. One place I am stuck is how to reuse the embeddings created earlier: is there a way to fetch already-saved embeddings from the Pinecone DB into the docsearch variable? Any suggestion would be helpful :)
@daryladhityahenry · 1 year ago
Hi. I'm kind of curious: with so many open-source ChatGPT-like models out there right now, can we use one instead of the OpenAI API? For example, using Dolly with only about 8B parameters. Is it possible? And about the embeddings, we can use another embedding model too, right? Is it the same as a bag-of-words kind of thing? Thank you. Great video!
@valdinia-office2910 · 1 year ago
In LangChain, is "similarity search" used as a synonym for "semantic search", or do they refer to different types of search? To my knowledge, similarity search focuses on finding items that are similar based on their features or characteristics, while semantic search aims to understand the meaning and intent behind the query to provide contextually relevant results
@sumitbakhli2049 · 1 year ago
I am getting "Index 'None' not found in your Pinecone project. Did you mean one of the following indexes: langchain1" for the line below:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name)
Any idea what the issue could be? I checked that the index_name variable is set correctly to langchain1.
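The error says `index_name` arrived as `None`, which points at argument binding rather than the variable's value. In the LangChain versions from this period, `Pinecone.from_texts` appears to take its parameters in the order `texts, embedding, metadatas, ids, batch_size, text_key, index_name, ...`, so a third positional argument would bind to `metadatas`, and the likely fix is passing it by keyword: `Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)`. The stub below is a hypothetical stand-in that mirrors that assumed parameter order purely to show how Python binds the arguments; it is not LangChain's code.

```python
from typing import Any, List, Optional


def from_texts_stub(
    texts: List[str],
    embedding: Any,
    metadatas: Optional[List[dict]] = None,
    ids: Optional[List[str]] = None,
    batch_size: int = 32,
    text_key: str = "text",
    index_name: Optional[str] = None,
    namespace: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in with the assumed parameter order; it only reports the binding."""
    return {"metadatas": metadatas, "index_name": index_name}


# Third positional argument: it lands in `metadatas`, so index_name stays None.
print(from_texts_stub(["chunk"], None, "langchain1"))
# Keyword argument: it lands in `index_name`, as intended.
print(from_texts_stub(["chunk"], None, index_name="langchain1"))
```

Passing optional parameters by keyword is a good habit with any classmethod constructor like this, since the positional order can change between library versions.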
@memesofproduction27 · 1 year ago
Great tut, thank you. Any advice on vectorizing a ton of widely varied documents? How many QA chatbots? One per index?
@DataIndependent · 1 year ago
Hm, how many chatbots will depend on your product use case. I would put them in the same index, but make sure your metadata is explicit so you can easily filter with it
@memesofproduction27 · 1 year ago
@@DataIndependent Thank you
@itsajaething · 1 year ago
Is there a way to program it to reference your external documents but also use the internet to find additional information?
@cnmoro55 · 1 year ago
How does LangChain wrap the chat history? Or doesn't it? Internally, how does it send the prompt to OpenAI? Thanks for the amazing tutorial
@adamsnook5135 · 1 year ago
Hi, great video; I look forward to diving into more of your stuff. I just wanted to ask about using this method to query something like Airtable data. I have a hotel company, and it would be really useful to let users ask questions about the data I've collected on the hotels. Thank you! Also, have you looked into Xata?
@DataIndependent · 1 year ago
Check out the CSV loader, which can be used once you extract data from Airtable
@counsellb · 1 year ago
Thanks for the video, super helpful! How can I do the same for a CSV file where each line is a product and product information is organized by headers? (Not a PDF; all the tutorials seem to be for PDFs)
@DataIndependent · 1 year ago
Yes, you can! Check out the LangChain CSV loader for this
@ravisawhney3111 · 1 year ago
Great video. How do I load the embeddings from Pinecone the next time I run the application (instead of generating them again via OpenAI at a cost)?
@yonathan310393 · 1 year ago
Great question. Did you ever get a response? I am looking for the same thing