
GraphRAG or Speculative RAG?

7,527 views

code_your_own_AI

A day ago

What are the latest and best RAG systems as of mid-July 2024? If you are building your next RAG system to integrate external data (e.g., from databases), which RAG system should you choose for the best performance: GraphRAG or Speculative RAG? My AI research channel is here to provide answers.
This video introduces a novel framework called Speculative Retrieval-Augmented Generation (Speculative RAG), designed to optimize retrieval-augmented generation systems by efficiently generating accurate responses. This method innovatively separates the retrieval-augmented generation process into two distinct phases: drafting and verification. In the drafting phase, a specialized, smaller language model (LM) generates multiple answer drafts in parallel, each from a distinct subset of retrieved documents. This approach ensures diversity in perspectives and reduces redundancy. In the verification phase, a larger, generalist LM evaluates these drafts and selects the most accurate response based on a scoring system that assesses the drafts against their rationales.
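As a rough illustration of this two-phase flow, here is a minimal sketch in Python. The model handles and helper functions (draft_lm, verify_lm, generate_draft) are placeholders for this explanation, not the authors' actual implementation.

```python
# Minimal sketch of the Speculative RAG draft-and-verify flow described above.
# draft_lm and verify_lm are placeholder callables, not the paper's code.

from concurrent.futures import ThreadPoolExecutor

def generate_draft(draft_lm, question, doc_subset):
    """Small specialist LM drafts an answer plus a rationale from one document subset."""
    prompt = f"Question: {question}\nDocuments: {doc_subset}\nAnswer with rationale:"
    return draft_lm(prompt)  # assumed to return (answer, rationale)

def speculative_rag(question, doc_subsets, draft_lm, verify_lm):
    # Drafting phase: one draft per document subset, generated in parallel.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda s: generate_draft(draft_lm, question, s),
                               doc_subsets))

    # Verification phase: the larger generalist LM scores each draft against
    # its rationale, and the best-scoring answer is returned.
    scored = [(verify_lm(question, answer, rationale), answer)
              for answer, rationale in drafts]
    return max(scored)[1]
```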
The Speculative RAG model significantly improves the efficiency and accuracy of response generation in knowledge-intensive tasks by leveraging parallel processing and optimized document sampling. The framework clusters documents based on content similarity before drafting to minimize information overload and enhance focus. The model has been tested across various benchmarks such as TriviaQA, MuSiQue, PubHealth, and ARC-Challenge, demonstrating substantial improvements in both speed and accuracy compared to conventional RAG systems.
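The subset construction step can be sketched as follows: cluster the retrieved documents by embedding similarity, then build each draft subset by sampling one document from every cluster, so each subset covers diverse perspectives with little redundancy. The embed() function is a stand-in for any sentence-embedding model; this is an illustrative sketch, not the paper's sampling code.

```python
# Illustrative sketch of clustering retrieved documents before drafting.

import random
import numpy as np
from sklearn.cluster import KMeans

def build_subsets(docs, embed, n_clusters=3, n_subsets=5):
    embeddings = np.array([embed(d) for d in docs])
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)
    clusters = [[d for d, l in zip(docs, labels) if l == c] for c in range(n_clusters)]
    # Each subset takes one document per cluster, so subsets stay diverse.
    return [[random.choice(c) for c in clusters if c] for _ in range(n_subsets)]
```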
GraphRAG is a recent development from Microsoft that significantly enhances the performance of large language models (LLMs) through the integration of knowledge graphs with Retrieval Augmented Generation (RAG). It was designed to address the shortcomings of traditional RAG, which typically relies on vector similarity for information retrieval, often resulting in inaccuracies when dealing with complex or comprehensive queries.
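To make the GraphRAG indexing idea concrete, here is a minimal sketch (not Microsoft's code): an LLM extracts (subject, relation, object) triples from each text chunk, the triples form a graph, communities of related entities are detected and summarized, and those summaries later support global questions. extract_triples() and summarize() stand in for LLM calls.

```python
# Minimal sketch of GraphRAG-style indexing: triples -> graph -> community summaries.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_graph(chunks, extract_triples):
    graph = nx.Graph()
    for chunk in chunks:
        for subj, rel, obj in extract_triples(chunk):   # LLM-extracted triples
            graph.add_edge(subj, obj, relation=rel, source=chunk)
    return graph

def community_summaries(graph, summarize):
    communities = greedy_modularity_communities(graph)
    # One LLM-written summary per community of related entities.
    return [summarize(graph.subgraph(nodes)) for nodes in communities]
```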
All rights with the authors:
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
arxiv.org/pdf/...
#airesearch
#aieducation
#newtechnology

Comments: 26
@MattJonesYT (a month ago)
I think the weakest link in RAG is that it usually chunks text without respect to context, which means the data is immediately corrupted, and then you need a really complex system to make it not corrupt again. I think the biggest gains to be had are at the start of the process, by chunking into intelligent semantic paragraphs that stand on their own, like a section of paragraphs in a book. Just splitting every n tokens ruins RAG performance.
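A minimal sketch of the chunking the comment describes: split on paragraph boundaries and pack whole paragraphs up to a token budget, instead of cutting blindly every n tokens. count_tokens() is a placeholder for a real tokenizer; this is an editorial illustration, not the commenter's code.

```python
# Pack whole paragraphs into chunks, never splitting mid-paragraph.

def semantic_chunks(text, count_tokens, max_tokens=512):
    chunks, current = [], []
    for para in [p.strip() for p in text.split("\n\n") if p.strip()]:
        candidate = current + [para]
        if count_tokens("\n\n".join(candidate)) > max_tokens and current:
            chunks.append("\n\n".join(current))   # close the chunk before it overflows
            current = [para]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```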
@themax2go (a month ago)
Well, that's why contextual graphs now exist, i.e. Microsoft's GraphRAG and now SciPhi's Triplex...
@criticalnodecapital (a month ago)
@themax2go 100%. I was waiting 4 months for them to drop it, and then realised that speculative graph, or using an abstraction layer to let the LLMs bash it out, was a better way to go. Evals, fml, why did I not do this before.
@davidwynter6856 (a month ago)
Through actual use of baseline RAG over a year ago I realised that knowledge graphs, with their rich semantic capability, would improve things radically. But after some experimentation I realised I needed to combine the triples and the embeddings, for simplicity and performance reasons. This is easy and free using Weaviate, which allows a schema to be added over the top of the vector store. Since then I have built 4 different knowledge graphs over Milvus and Weaviate; they work brilliantly, and you can also build embeddings for the full triple as well as the constituent subject, predicate and object. GPT-4o understands triple representations extracted from the user prompt very well.
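A rough sketch of the idea described in this comment: embed the full triple as well as its subject, predicate and object separately, so either granularity can be searched. The in-memory "store" list is a generic stand-in for Weaviate or Milvus, and embed() for any embedding model; this is not the commenter's toolkit.

```python
# Index a triple under both whole-triple and constituent embeddings.

def index_triple(store, embed, subject, predicate, obj):
    triple_text = f"{subject} {predicate} {obj}"
    store.append({
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "triple_vector": embed(triple_text),      # whole-triple embedding
        "subject_vector": embed(subject),         # constituent embeddings
        "predicate_vector": embed(predicate),
        "object_vector": embed(obj),
    })

store = []
# Example (hypothetical): index_triple(store, embed, "GraphRAG", "developed_by", "Microsoft")
```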
@fintech1378 (a month ago)
Awesome, any video / link to share?
@davidwynter6856 (a month ago)
@fintech1378 Sorry, I'm trying to get a job currently, after my sabbatical, so the toolkit I built has to remain private.
@artur50 (a month ago)
GitHub ;)?
@Karl-Asger (a month ago)
Great to hear this. Can you speak to the cost of generating the knowledge graph, and what scale you're working with? I really like your insight here about embedding not just the chunks but also the triplets.
@antaishizuku (a month ago)
Preprocessing text really gives better results, so at the end of this, if you return a preprocessed string instead of the original to the LLM, it would probably do better. Personally I'm focusing on a different approach, but from my testing I found this helps.
@c.d.osajotiamaraca3382 (a month ago)
Thank you for helping me avoid the rabbit hole.
@user-gj1gd5pi1m (a month ago)
Neither is practical. I guess the authors do not have production-level experience with RAG.
@whoareyouqqq (24 days ago)
Google reinvented the map-reduce algorithm, where the map step is drafting and the reduce step is verification.
@iham1313 (a month ago)
There was (some time ago and somewhere) the argument that models are not capable of understanding the question because they are not trained on the domain-specific data. It would be interesting to combine training a base model on domain data (like articles, documents and books) with sending it off to a RAG-like setup to retrieve referable results.
@thomaslapras1669 (a month ago)
Great video, as usual! But I have one question: what if the relevant context is split across different sub-datasets?
@code4AI (a month ago)
You operate with multiple datasets.
@topmaxdata (a month ago)
In many cases, working with a KV store or a relational database with extracted entities and relationships is more practical than using a graph database like Neo4j, for the following reasons:
Familiarity: Most developers are already familiar with relational databases and key-value stores, making them easier to work with and maintain.
Ecosystem: Relational databases and KV stores have mature ecosystems with robust tools, libraries, and integrations.
Performance: For many use cases, KV stores and well-designed relational databases can offer excellent performance.
Flexibility: Relational databases can handle a wide range of data structures and query patterns.
Scalability: Both KV stores and relational databases can be scaled horizontally or vertically to meet performance needs.
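As a small illustration of the relational alternative this comment describes, here is a sketch that stores extracted entities and relationships as plain rows in SQLite, queryable with ordinary SQL instead of a graph database. The schema and example triple are assumptions for illustration only.

```python
# Entities and relations as plain relational rows (SQLite), instead of a graph DB.

import sqlite3

conn = sqlite3.connect("knowledge.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS entities (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS relations (
    subject_id INTEGER REFERENCES entities(id),
    predicate  TEXT,
    object_id  INTEGER REFERENCES entities(id)
);
""")

def add_entity(name):
    conn.execute("INSERT OR IGNORE INTO entities(name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM entities WHERE name = ?", (name,)).fetchone()[0]

def add_relation(subject, predicate, obj):
    conn.execute("INSERT INTO relations VALUES (?, ?, ?)",
                 (add_entity(subject), predicate, add_entity(obj)))

add_relation("GraphRAG", "developed_by", "Microsoft")  # illustrative triple
conn.commit()
```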
@code4AI (a month ago)
Smile. And after the praise for a KV store, now list 5 problems with KV stores, just to have a balanced presentation from your side.
@topmaxdata (a month ago)
@code4AI Curious, what are the 5 problems? Thank you.
@be1tube (a month ago)
I'll give the disadvantages a shot:
Consistency: joins are not atomic, so by the time you finish the join the info may be outdated.
Extra memory: joins must be done in the client.
Extra queries: you usually need one query per joined table.
No relationship constraints across tables or rows.
Imperative style: you tell the DB every step. You don't get intelligent query optimizers giving you the benefit of years of database research; you have to build it from scratch.
The DB doesn't know its structure: it doesn't know when to cascade deletes or when to store two elements nearby because they will be accessed together.
Note: it's been a decade since I tried to use a large KV store, so maybe some of these are better now.
@sinasec (a month ago)
Is there any source code for this RAG?
@lionardo (a month ago)
I doubt this works better than simple RAG.
@AaronALAI (a month ago)
Hmm 🤔 I don't doubt there are better RAG strategies... however, RAG with a model of good context size (65k+) yields very good results. But there will always be a scaling issue: too little model context or too large a DB.
@code4AI (a month ago)
Whenever a global corporation tells us that their old product has very poor performance and that we now have to buy a new product... we can decide on a product that fits our needs.
@GeertBaeke (a month ago)
@code4AI That is not exactly what Microsoft is saying. The team that built GraphRAG focused mainly on global queries that use community summaries created during indexing. This allows you to ask global questions about your data and, out of the box, get better answers than baseline RAG. And their local queries are actually a combination of vector queries to find entry points in the graph, followed by graph traversal. It's about combining things, not simply selecting one thing.
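A minimal sketch of the local-query combination this comment describes: a vector search finds entry-point entities, then graph traversal pulls in their neighbours as extra context. vector_search() and the graph contents are placeholders; this is an illustration of the pattern, not Microsoft's implementation.

```python
# Local query pattern: vector search for entry nodes, then k-hop graph expansion.

import networkx as nx

def local_query_context(graph: nx.Graph, question, vector_search, hops=1):
    entry_nodes = vector_search(question, top_k=3)   # entity names from the vector index
    context = set(entry_nodes)
    frontier = set(entry_nodes)
    for _ in range(hops):
        # Expand one hop outward from the current frontier.
        frontier = {nbr for node in frontier if node in graph
                    for nbr in graph.neighbors(node)}
        context |= frontier
    return graph.subgraph(context)   # subgraph handed to the LLM as context
```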
@bastabey2652 (a month ago)
KG is a pain in the neck.
Improve AGENTIC AI (Princeton)
28:26
code_your_own_AI
3.3K views
NEW TextGrad by Stanford: Better than DSPy
41:25
code_your_own_AI
12K views
Local GraphRAG with LLaMa 3.1 - LangChain, Ollama & Neo4j
15:01
Coding Crash Courses
13K views
GraphRAG: LLM-Derived Knowledge Graphs for RAG
15:40
Alex Chao
106K views
Microsoft graphRAG   Graphing Text and Chatting with it for free
16:58
John Capobianco
2.8K views
ColPali: Vision Language Models for Efficient Document Retrieval
17:36
Prompt Engineering
8K views
Graph RAG: Improving RAG with Knowledge Graphs
15:58
Prompt Engineering
52K views
GROKKED LLM beats RAG Reasoning (Part 3)
30:03
code_your_own_AI
8K views
Speculative Decoding: When Two LLMs are Faster than One
12:46
Efficient NLP
10K views
Convert Any Text into a Knowledge Graph
30:52
Hands-on AI
10K views
The Attention Mechanism in Large Language Models
21:02
Serrano.Academy
91K views