A Guide to Distributed PostgreSQL
Comments
@benixmaximus 7 days ago
It seems like the Vector extension is no longer available in azure.extensions. I tried this with my work instance and then set up my own personal instance, and it's not available in either. I'd be interested if anyone knows anything about it, as I haven't found this topic in any forums.
@gabrieloliveira5040 a month ago
I'm loving these videos, thank you so much!
@DevMastersDb a month ago
Hey, thanks a lot for sharing feedback! It means a lot to me.
@ivar325 a month ago
Hi Denis, we need more videos on Azure AI use cases in Postgres Flexible Server. Please assist.
@DevMastersDb a month ago
Hey, sounds good. I’ll keep this in mind. Any specific use cases?
@ivar325 a month ago
@@DevMastersDb sentiment analysis
@mohammadballour6504 a month ago
Great explanation. Thank you
@DevMastersDb a month ago
Glad it was helpful!
@yerrysherry135 a month ago
A very good video. Clearly explained. Have you thought about becoming a teacher? Thank you very much!!
@DevMastersDb a month ago
Thanks, I'm glad you liked it! Teaching is one of the main reasons I work as a Developer Relations professional. In this role, I can spend a lot of time sharing my knowledge with others and being paid for that ;)
@debarghyadasgupta1931 a month ago
What is your recommendation for a very large vector column/table? What values of `m` and `ef_construction` would you use?
@DevMastersDb a month ago
It depends on the target recall (search accuracy/relevancy) and performance of the similarity searches. The greater the `m`, the better the recall, but the index build time increases and query performance might also be impacted. Increasing `ef_construction` can also lead to better recall, but will increase index build time as well. I would start with the defaults (m=16 and ef_construction=64) and adjust them until you find the right balance between recall, query performance, and index build time. Note that `ef_construction` needs to be at least double `m`.
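For reference, a minimal sketch of setting those parameters when building an HNSW index (assuming psycopg2 and an "items" table with a pgvector "embedding" column; the names are illustrative):

```python
import psycopg2

conn = psycopg2.connect("host=localhost dbname=mydb user=postgres password=postgres")
with conn, conn.cursor() as cur:
    # Defaults shown explicitly; raise m / ef_construction gradually while
    # measuring recall, query latency, and index build time.
    cur.execute("""
        CREATE INDEX items_embedding_hnsw_idx ON items
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)
```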
@rafastepniewski6135 a month ago
Great job done here. Thanks a lot for sharing this.
@DevMastersDb a month ago
Many thanks!
@anuragangara2619 2 months ago
Thanks for this video! Got some really promising results in just a few hours! Quick question -- I don't need the context to include the full tables ["products", "users", etc.]. For my use case, I only need it to have context for the user (i.e., products for user 1005, purchases for user 1005, etc.). If I provide the full tables in include_tables, it very quickly reaches the token limit. Is there a way to dynamically reduce the amount of context when initializing the LangChain database agent?
@DevMastersDb 2 months ago
Excellent! Glad you managed to get it working on your end 👍 Try instructing the LLM to retrieve that context dynamically. For instance, you can tell it to execute the query "select name, price from products where id = {id}", setting the id as a parameter. Then the LLM can run this request against the database and pull a user-specific context. Also, LangChain supports various tools/actions that let the LLM pull info from other sources or perform various actions: python.langchain.com/v0.1/docs/modules/tools/
@anuragangara2619 2 months ago
@@DevMastersDb That makes sense. The issue is (unless I'm misunderstanding) that passing entire tables to the LLM, regardless of whether the LLM knows it should filter down to a subset of the data, seems to take a lot of tokens. That is, we're providing a lot of context and then asking the LLM to disregard most of it (as opposed to providing the narrow context in the first place). As a result, if I add more than two tables, I get the error:

{'error': {'message': "This model's maximum context length is 4097 tokens, however you requested 10494 tokens. Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

I'm extremely new to this stuff (just a day or two), so I could totally be missing something! One thing I'm going to try next is to create a view with the data after I've applied some joins and filters, and then pass the view in include_tables instead, so I'm providing just the minimum context the model would need. Not sure if that'll work, or is even the right way of thinking about it 🤔
@DevMastersDb 2 months ago
@@anuragangara2619 How many tables do you have? First things first: yes, ask the agent to look only into the tables that are necessary by listing them in "include_tables" (see the sketch below). The LLM will pull only the metadata of those tables, not the actual data, so it should fit in the context window. Then, in your system message for the LLM, you also say to use a specific query to pull the actual data for a selected user. If that data set doesn't fit into the context, try to filter by the user_id/product_id and some time frame. Anyway, take it slowly and learn LangChain's capabilities by playing with a few tutorials. You'll definitely get it figured out!
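A small sketch of the include_tables part, assuming a recent langchain-community/langchain-openai setup (the connection string and table names are invented for illustration):

```python
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# Only the schema metadata of the listed tables is exposed to the LLM.
db = SQLDatabase.from_uri(
    "postgresql://postgres:postgres@localhost:5432/mydb",
    include_tables=["products", "purchases"],
)
agent = create_sql_agent(
    llm=ChatOpenAI(model="gpt-4", temperature=0),
    db=db,
    verbose=True,  # prints what the agent actually sends to the model
)
agent.invoke({"input": "What did user 1005 purchase last week?"})
```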
@anuragangara2619 2 months ago
@@DevMastersDb Hmm, odd -- only 5 or 6 tables; personal project, so just me. I'll look into that then! Thanks so much for the great video, really enjoyed this!
@DevMastersDb 2 months ago
@@anuragangara2619 Make sure the LLM doesn't pull the actual data; enable the "verbose" mode to see what happens. Also, start with 1-2 tables and then add more to see when it breaks. That might help discover the root cause. Glad you liked the video, thanks for the feedback!
@debarghyadasgupta1931 2 months ago
Can I achieve the same setup using a local Postgres and Azure OpenAI? Any guidance would be highly appreciated. I'm attempting this on Windows, so my Postgres is running on Windows. I'm interested in using Azure OpenAI embedding to achieve the same result. I watched your video where you demonstrated using an Azure Postgres instance to achieve this.
@DevMastersDb 2 months ago
Yes, it should be doable. You need to use an Azure OpenAI SDK (or another framework that supports Azure embedding models) to generate embeddings. Here is a quick tutorial by Microsoft for Python: learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/embeddings Btw, what's your programming language?
@debarghyadasgupta1931 2 months ago
@@DevMastersDb Thank you for your reply. Using Python. Let me go through the link. Thanks again, man. Really appreciate it.
@DevMastersDb 2 months ago
@@debarghyadasgupta1931 excellent!
@debarghyadasgupta1931 a month ago
@@DevMastersDb I have multiple tables, and these tables have different columns. I would like to generate vectors for multiple tables. What is your recommendation for vector storage: the same table with a new vector column, or a new table specifically for the vectors?
@DevMastersDb a month ago
@@debarghyadasgupta1931 Postgres stores vectors with a large number of dimensions in internal TOAST (The Oversized-Attribute Storage Technique) tables. Thus, from the storage perspective, the vector columns can live either in your existing tables with the other data or in separate tables. I would store the vector columns in the original tables if you need to filter by the vector columns and other columns frequently. For instance, if most of the time your queries look like "select * from table1 where vector_column <=> ... and other_column = 5 or another_column = 4", then it makes sense to store vectors in the original tables with the other data. Otherwise, you would need to join the original tables with the tables dedicated to the vectors.
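A quick sketch of the kind of single-table query that motivates keeping the vector column next to the other columns (psycopg2 assumed; the table and columns are illustrative, and the query vector is a toy 3-dimensional one):

```python
import psycopg2

conn = psycopg2.connect("host=localhost dbname=mydb user=postgres password=postgres")
with conn, conn.cursor() as cur:
    # Scalar filter plus vector-distance ordering in one statement, no join needed.
    cur.execute("""
        SELECT name, price
        FROM products
        WHERE category_id = %s
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (5, "[0.1, 0.2, 0.3]"))
    print(cur.fetchall())
```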
@debarghyadasgupta1931 2 months ago
I have set the following application property for the root CA:

DATABASE_CA_CERT_FILE=C:\Users\deb\Documents\pgvector-azure\DigiCertGlobalRootCA.crt

Getting this error:

```
❯ node .\backend\embeddings_generator.js
node:internal/process/promises:289
          triggerUncaughtException(err, true /* fromPromise */);
          ^
Error: self-signed certificate in certificate chain
    at TLSSocket.onConnectSecure (node:_tls_wrap:1674:34)
    at TLSSocket.emit (node:events:519:28)
    at TLSSocket._finishInit (node:_tls_wrap:1085:8)
    at ssl.onhandshakedone (node:_tls_wrap:871:12) {
  code: 'SELF_SIGNED_CERT_IN_CHAIN'
}
```

Any clue is highly appreciated. I'm using Azure Public, so the relevant cert is used.
@DevMastersDb 2 months ago
Try adding the "rejectUnauthorized: false" flag to the SSL settings of the database connection parameters:

```js
ssl: {
  rejectUnauthorized: false,
  ca: fs.readFileSync('path_to_your_root_certificate').toString()
},
```

But get a proper certificate before you go to prod :)
@abrahammoyo3457 2 months ago
Denis is a legend!!!!! 😂. Slow and understandable!
@DevMastersDb 2 months ago
Thanks! Let's learn it slowly and gradually :)
@ifeolu8501 2 months ago
Hi Denis, is there a way we could use PostgresML so that if the text addresses a certain topic, we could go to another table? This way we'd enhance the power of PostgresML and Spring AI.
@DevMastersDb 2 months ago
Hey, sure. Even though Spring AI doesn't support PostgresML natively (and the PostgresML team doesn't have a client library for Java yet), you can still use the Spring JDBC Client (or another driver/framework) to work with PostgresML through its SQL APIs. Btw, this is a PostgresML video that I shot in the past: kzfaq.info/get/bejne/fdqqa5tkycfKYY0.html
@ifeolu8501 2 months ago
@@DevMastersDb Thanks. I figured out that LangChain is a better option for me, as I can convert natural language to SQL. However, I need to figure out how to hook Python up to GPT, and then how to use GPT.
@DevMastersDb 2 months ago
@@ifeolu8501 Check out LangChain4j if Java is a better option for you.
@engdoretto 2 months ago
Thanks a lot!
@DevMastersDb 2 months ago
You're welcome!
@efficiencygeek 2 months ago
Thanks, Denis, for putting this great tutorial together. One of the best topics I encountered at PGDay Chicago 2024. It was a pleasure being a co-host during your session.
@DevMastersDb 2 months ago
Hey, thanks for sharing your feedback and for picking my session among the others! Let's stay in touch.
@daphenomenalz4100 3 months ago
Love your videos!
@DevMastersDb 3 months ago
So glad! Thanks for the support!
@MrApalazuelos 3 months ago
Thanks, Denis, for sharing this. My only suggestion would be to use the --net host option instead of -p 5432:5432. I ran some benchmarking, and the first option is much more performant as it doesn't use Docker's network isolation for communicating with the container.
@DevMastersDb 3 months ago
Yes, agree. I always use a dedicated network when I need containerized applications to talk to a Postgres instance in Docker.
@MrApalazuelos 3 months ago
Another simple and awesome video. Great work pal! I can't wait for the next episode 🍿
@DevMastersDb 3 months ago
Thanks a ton, my friend! I'll do my best to keep the good stuff coming 👍
@daphenomenalz4100 3 months ago
:D
@jugurtha292 3 months ago
Hey man, awesome content! Do you mind telling me where to learn about Postgres multi-tenancy? A client recently asked me to keep his data separate from the main database I use for everyone, and I'm confused about how to do this.
@DevMastersDb 3 months ago
Hey, thanks for sharing your feedback! Does the client just want to ensure his data is not visible to other clients, or has the client explicitly asked to segregate his data into separate database tables? Overall, consider these options:

1. Shared tables for all clients. All customer data is stored in the same tables, but a special column such as "tenant_id" lets you filter one client's data from another's. You need to set up row-level security to ensure that when an application requests data belonging to tenant_id=5, the data of other tenants won't be visible (see the sketch below).

2. Schema per client. Each client has his own set of tables stored in a client-specific schema (CREATE SCHEMA client1, CREATE SCHEMA client2, etc.). The application needs to specify the schema while querying or updating a specific client's data: "select balance from client5.account" or "select balance from client100.account".

Distributed Postgres is useful for multi-tenant use cases if you expect to have many tenants and will need to scale. It's also useful if you need to pin each tenant's/client's data to a specific location (zone, region, database server).
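A minimal sketch of option 1 in Python (assuming psycopg2 and an "account" table carrying a tenant_id column; the policy and setting names are made up for illustration):

```python
import psycopg2

conn = psycopg2.connect("host=localhost dbname=mydb user=app_user password=secret")
with conn, conn.cursor() as cur:
    # One-time setup: enable RLS and add a policy that compares each row's
    # tenant_id to a per-session setting. Note that the table owner bypasses
    # RLS unless you also run ALTER TABLE ... FORCE ROW LEVEL SECURITY.
    cur.execute("ALTER TABLE account ENABLE ROW LEVEL SECURITY")
    cur.execute("""
        CREATE POLICY tenant_isolation ON account
        USING (tenant_id = current_setting('app.current_tenant')::int)
    """)

    # Per request: pin the session to a tenant, then query as usual.
    cur.execute("SET app.current_tenant = '5'")
    cur.execute("SELECT balance FROM account")  # only tenant 5's rows are visible
    print(cur.fetchall())
```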
@anagai 3 months ago
What model is this using? Can we do this with Azure OpenAI?
@DevMastersDb 3 months ago
I used OpenAI GPT-4. Absolutely, you can use other models, including Azure OpenAI.
@reallylordofnothing 3 months ago
Figma could have used Postgres Citus but invented their own sharding capability because they were running Postgres in-house instead of in the cloud, and AWS didn't offer Citus anyway. Great video, Denis.
@DevMastersDb 3 months ago
Hey, buddy. Thanks for the feedback. In fact, these days Figma uses RDS in the cloud. I’ve created a short written version if you’re interested: medium.com/@magda7817/why-has-figma-reinveted-the-wheel-with-postgresql-3a1cb2e9297c
@stevenhkdb 3 months ago
Crazy useful and straight to the point!!!
@DevMastersDb 3 months ago
Yep, this stuff is crazy. And that’s just the beginning. It’s gonna be much wilder soon )
@daphenomenalz4100 3 months ago
Awesome content, loved the videos! Thanks so much
@DevMastersDb 3 months ago
Thanks for watching and sharing feedback! That matters a lot to me.
@jonathanfrias1 3 months ago
Why are your pupils so HUGE!?
@DevMastersDb 3 months ago
So that I can hypnotize you
@sundarsravanivlogs9619 3 months ago
What should I learn as a Java full-stack developer for an AI future? Do I need to learn Python and completely switch from Java to Python for AI?
@DevMastersDb 3 months ago
Python is still a better fit for those who need to create and train models. As for general purpose apps that use LLMs and other models, Java is a great choice. So, stick to Java unless you’re creating your own models. Study Spring AI, LangChain4j and other frameworks that are gonna evolve rapidly within the Java ecosystem.
@daphenomenalz4100 3 months ago
Your channel is awesome :D
@DevMastersDb 3 months ago
Thank you, my friend! Appreciate your feedback )
@daphenomenalz4100 3 months ago
@@DevMastersDb I searched more on how to do the same with kubeadm. Is it worth setting up a whole self-managed kubeadm cluster, or would I be fine just using k3s/minikube?
@build-your-own-x 3 months ago
Here to know what wheels figma reinvented
@DevMastersDb 3 months ago
At least one
@Chris-cx6wl 3 months ago
@@DevMastersDb Make it two
@slowjocrow6451 4 months ago
What is RAG?
@DevMastersDb 4 months ago
RAG stands for retrieval-augmented generation. It's a technique to enhance the behavior of an LLM by providing it with more context. Usually, you get that context from your own database that stores your own data. For instance, let's say you ask ChatGPT to recommend a few places to stay in NYC between April 1-7, 2023. ChatGPT doesn't know those details; it was trained on generic data from the past and didn't have access to the private data of Expedia or Booking.com. But Expedia/Booking's own AI assistant can easily address this task by using the RAG approach. You ask their assistant to recommend places, it queries data from the database and feeds it as context to an LLM (which can be GPT), and then the LLM responds to you like a human would.
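A toy sketch of that flow in Python (the table, columns, and connection details are invented for illustration; assumes the psycopg2 and openai packages): fetch the private context from Postgres first, then hand it to the LLM as part of the prompt.

```python
import psycopg2
from openai import OpenAI

# 1. Retrieve the private context the model was never trained on.
conn = psycopg2.connect("host=localhost dbname=travel user=postgres password=postgres")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT name, nightly_rate FROM places
        WHERE city = %s AND available_from <= %s AND available_to >= %s
        LIMIT 5
    """, ("NYC", "2023-04-01", "2023-04-07"))
    context = "\n".join(f"{name}: ${rate}/night" for name, rate in cur.fetchall())

# 2. Augment the prompt with that context and let the LLM generate the answer.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Recommend stays using only this data:\n{context}"},
        {"role": "user", "content": "Recommend a few places to stay in NYC, April 1-7."},
    ],
)
print(response.choices[0].message.content)
```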
@slowjocrow6451 3 months ago
@@DevMastersDb Great explanation, thanks. So is your LangChain example RAG, since it's providing extra metadata etc. to your query? I've looked at a few examples of LangChain, and it seems to match my idea of what RAG is, but LangChain doesn't call itself RAG, so maybe I'm missing something. Trying to figure out what all these new buzzwords mean, hah
@DevMastersDb 3 months ago
@@slowjocrow6451 yep, LangChain doesn't have any RAG-specific APIs and it doesn't need them. But when you create those chains (with LangChain) and some parts of the chain retrieve additional information from a database or another resource and feed this information as an extra context to an LLM - then you're effectively creating a RAG-based solution with LangChain. Hope it makes sense. Also, I found this LangChain cookbook useful, take a look: python.langchain.com/docs/expression_language/cookbook/retrieval
@slowjocrow6451 4 months ago
Crazy stuff, thanks for the video
@y4h2 4 months ago
looking forward
@DevMastersDb 4 months ago
Planning to publish next week!
@ssaha7714 4 months ago
Wow, great! Such a nice video. Denis, do you have a git repo for this? I'm interested in trying a big dataset to generate embeddings the way you have shown.
@DevMastersDb 4 months ago
Hey, glad you liked it! Sure, here is a complete version of the app, enjoy! github.com/YugabyteDB-Samples/YugaPlus
@pier-jeanmalandrino8309 4 months ago
Nice work! Very interesting. You should write papers for people more interested in reading than videos ;)
@DevMastersDb 4 months ago
Haha 😀 yep, many love to read. I blog periodically on DZone and Medium (dzone.com/authors/dmagda), but videos are still my primary format, at least for now. The beauty of videos is that you can literally show how things work, or don't work, in practice.
@eugenetapang 4 months ago
Thank you Denis, absolutely perfection, everything in this Postgres playlist is so timely and exactly the elegant and purist vibe that is just Coding ASMR. Thanks buddy, you got a loyal listener here. Keep Rockin!
@DevMastersDb 4 months ago
Thank you for your kind words and support! Glad that you found those videos useful. Btw, is there any specific area/topic you would suggest me diving into next? I'm building up content for generative AI apps in Java with Postgres pgvector. But not sure if this is what you're interested in.
@kollisreekanth 4 months ago
Really wonderful video. Thanks for sharing it with everyone. I just have a question: can we use this with NoSQL databases like MongoDB/DynamoDB?
@DevMastersDb 4 months ago
Glad you found the video useful! As for MongoDB and other NoSQL databases, I don’t see that LangChain supports agents for them. But some folks found a way how to create custom agents using foundational capabilities of LangChain: omershahzad.medium.com/agent-for-mongodb-langchain-ccf69913a11a
@kollisreekanth 4 months ago
@@DevMastersDb thank you so much for the quick reply. Appreciate it 🙏🏼
@combinio9533 4 months ago
This is what I needed. Thank you a lot for uploading the video! <3
@DevMastersDb 4 months ago
Excellent, glad you found it useful!
@6srinivasan 4 months ago
Is there a web version of DBeaver AI Chat? This seems similar to Vanna AI.
@DevMastersDb 4 months ago
The AI Chat is not available in the web version yet. Thanks for sharing Vanna. It looks very promising and advanced.
@6srinivasan 4 months ago
Yes, Vanna AI has the same capabilities; for restricting access to the LLM, you have to write your own. Since DBeaver can restrict access using the Team edition, I was checking if a web version is available. That would be a game changer.
@DevMastersDb 4 months ago
I see. I'll pass your feedback over to the DBeaver team. Also, I would add that Vanna is a solution that specializes in the text-to-SQL interface (and AI data analysts); that's why they have their own SDK that can fine-tune the behavior of LLMs. DBeaver, as a general-purpose database management tool, has just started advancing its AI capabilities. If they continue in the same direction, they should catch up with Vanna.
@dantedt3931 4 months ago
Very good.
@DevMastersDb 4 months ago
Thanks, glad you liked it!
@GSUGambit 4 months ago
Thanks @denis!
@user-pc6qf9yf1j 4 months ago
Thanks, keep going, great content!
@DevMastersDb 4 months ago
Thanks! Are there any specific topics you’d like me to cover?
@ToMontrond 4 months ago
Hello Denis, do you have a repository for this project?
@DevMastersDb 4 months ago
Sure, it's here: github.com/YugabyteDB-Samples/YugaPlus
@DevMastersDb 4 months ago
Enroll here: info.yugabyte.com/scalable-fault-tolerant-apps-distributed-postgresql/
@mallikarjunmongolla4519 4 months ago
Thanks for this video, very useful for getting into Spring AI. The world needs Spring AI!
@DevMastersDb 4 months ago
Excellent, thank you! I've shared your feedback with the Spring team ;)
@vamsiraghu3258 4 months ago
Excellent explanation and demo. Thank you!
@DevMastersDb 4 months ago
Thank you for feedback! Glad you found it useful.
@GeriFitrah 5 months ago
Will it work with chat history?
@DevMastersDb 5 months ago
Yes, you need to tweak the current implementation as follows:

1. Store the history in some variable like "chat_history".
2. Pass this variable to the agent prompt that is generated by the "prepare_agent_prompt" method.
3. Append the chat history to the end of the "agent_prompt" variable, e.g. "... also, consider the following chat history: {chat_history}".

A rough sketch of these steps is below.
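A rough sketch of the steps above (prepare_agent_prompt is the method from the video; the agent object and the other names are placeholders, not a specific API):

```python
chat_history = []  # step 1: keep the history in a variable

def ask(agent, question: str) -> str:
    agent_prompt = prepare_agent_prompt(question)  # the video's prompt builder
    if chat_history:
        # steps 2-3: append the accumulated history to the agent prompt
        agent_prompt += "\nAlso, consider the following chat history:\n"
        agent_prompt += "\n".join(chat_history)
    answer = agent.invoke({"input": agent_prompt})["output"]
    # remember the exchange for the next turn
    chat_history.append(f"User: {question}")
    chat_history.append(f"Assistant: {answer}")
    return answer
```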
@applepeel1662 5 months ago
Really cool! Thanks a lot
@DevMastersDb 5 months ago
Glad you liked it! Anything else you'd like to learn about? It should be related to databases (the focus of my channel)
@dewanjeesoma 5 months ago
I am using pgvector to insert data into a Postgres table. How do I add additional columns like in your movie schema?
@DevMastersDb 5 months ago
Use the following command to add a column of the vector type to your existing table:

ALTER TABLE my_table ADD COLUMN my_new_vector_column vector(1536);

where 1536 is the dimension of the vectors generated by the OpenAI text-ada-2 model. Change the dimension to the value supported by your model.

Next, to generate embeddings and store them in the `my_new_vector_column` column:

1. Suppose you want to generate embeddings for the `description` column of the text type. Read the `description` value of every row.
2. For each description, generate an embedding using your model.
3. Use an UPDATE statement to write the generated embedding back into the `my_new_vector_column` column.

A hedged end-to-end sketch follows below.
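A sketch of that backfill loop in Python (the table/column names follow the reply above; the OpenAI model and psycopg2 usage are assumptions):

```python
import psycopg2
from openai import OpenAI

client = OpenAI()
conn = psycopg2.connect("host=localhost dbname=mydb user=postgres password=postgres")
with conn, conn.cursor() as cur:
    # 1. Read the text that needs embeddings (only rows not processed yet).
    cur.execute("SELECT id, description FROM my_table WHERE my_new_vector_column IS NULL")
    for row_id, description in cur.fetchall():
        # 2. Generate a 1536-dimensional embedding for this row's text.
        emb = client.embeddings.create(
            model="text-embedding-ada-002", input=description
        ).data[0].embedding
        # 3. Write the vector back; str(list) renders as "[0.1, 0.2, ...]",
        # which casts cleanly to the pgvector type.
        cur.execute(
            "UPDATE my_table SET my_new_vector_column = %s::vector WHERE id = %s",
            (str(emb), row_id),
        )
```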
@CarlosBaiget 5 months ago
Very informative and interesting, thanks.
@DevMastersDb 5 months ago
Glad you enjoyed it!
@GSUGambit 5 months ago
It'd be nice if you had a tutorial that showed how to create the embeddings. That's the situation most developers will be in: we have the data in Postgres, but we need to generate the embeddings based on data already in our database.
@GSUGambit 5 months ago
At 10:21 you show the aiClient.embed function. My assumption is that we need to list the key-value pairs of our domain objects as a string and call this function, and that will give us the embeddings we should save in the database.
@DevMastersDb 5 months ago
@@GSUGambit Good point. I should have shown how to generate embeddings for my original dataset. The process is as follows. Assume you have a "value" column of a text type in Postgres, and you want to do a vector similarity search against its content. This is how you can generate the embeddings (pseudo-code):

First, add a column that will store the embedding value (1536 is the dimension for the OpenAI text-ada-2 model; set another value for another model):

alter table myTable add column value_vector vector(1536)

Then, for every row in the table:
1. select id, value from myTable;
2. embedding = aiClient.embed(value);
3. update myTable set value_vector = embedding where id = row.id

Hope it makes things clear.
@GSUGambit 5 months ago
@DevMastersDb thank you I believe this is everything I need
@DevMastersDb 5 months ago
Ping me if anything doesn’t work as expected. I’ll do my best to send you over a code snippet in Java if necessary
@DevMastersDb 4 months ago
@@GSUGambit in case you still need it, this is a new video that shows how to generate embeddings with Spring AI: kzfaq.info/get/bejne/fJZ2esap0LOpnJ8.html