BEST RAG you can buy: LAW AI (Stanford)

4,695 views

code_your_own_AI

1 month ago

The best commercially available legal research / Law AI RAG systems, evaluated by Stanford in new research.
All rights remain with the authors:
"Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools"
arxiv.org/pdf/2405.20362
hai.stanford.edu/news/halluci...
#airesearch
#law
#ai

Comments: 19
@brandonheaton6197 · 1 month ago
Solid research call-out. Inspired by our old friend the GAN, I am currently trying to use critics in my agentic workflow to catch aberrant responses. Expensive, but much like humans, GPT-4 (more so than 4o) is often able to recognize when it has stated a falsehood or logical non-sequitur.
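A minimal sketch of this kind of generator/critic loop, assuming an OpenAI-style client; the model names, prompts, and retry logic below are illustrative assumptions, not the commenter's actual workflow:

```python
# Sketch: generate an answer, then ask a stronger model to act as critic.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content


def critique(question: str, answer: str) -> str:
    # Second pass: the critic flags falsehoods or logical non-sequiturs.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nAnswer: {answer}\n"
                "Does the answer contain a factual error or logical "
                "non-sequitur? Reply PASS, or explain the problem."
            ),
        }],
    )
    return resp.choices[0].message.content


def answer_with_critic(question: str, max_retries: int = 2) -> str:
    answer = generate(question)
    for _ in range(max_retries):
        verdict = critique(question, answer)
        if verdict.strip().upper().startswith("PASS"):
            break
        # Feed the critique back and regenerate.
        answer = generate(
            f"{question}\nA reviewer noted: {verdict}\nRevise your answer."
        )
    return answer
```

The extra critic call roughly doubles the cost per query, which is the trade-off the comment points out.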
@jnevercast · 1 month ago
GPT-4 is great at judging itself, and there have been papers on using GPT-4 as a reward function. As you say though, it is very expensive.
@fabianaltendorfer11 · 1 month ago
Very interesting, thank you for the update. Stanford proved what I have believed for quite some time, but the market has a strong interest in hiding how bad RAG really is in some areas.
@Canna_Science_and_Technology · 1 month ago
This isn't a perfect solution, but here is what I ended up doing for my RAG system. Beyond implementing a robust, advanced RAG pipeline, during pre-processing each chunk pulled from the original document gets an anchor. During Q&A and retrieval, the original document is displayed in a split screen with the passages used to answer the user's question highlighted, so the user can read exactly where in the original document the information came from. Not a perfect solution, but it has helped. My RAG systems do not involve reasoning, and as someone who has been developing in AI for a while, I would never ship a RAG system or use an LLM that requires reasoning. It just isn't ready yet, and there is no rush to implement it. I don't want to leave my users second-guessing or questioning the AI. I think it has an awesome future, but I don't want to cram it down people's throats just because.
@code4AI · 1 month ago
Hmmmm ... for me, artificial intelligence is inherently intertwined with reasoning capabilities. If I just want to present the location of facts in one or more documents, I let an indexer index all my text chunks or words. But the beauty of vector encoding / vector embedding is the concept of transforming semantic relevance into locational closeness in a vector space. And then not using this insight, and the relation between nearby facts, for reasoning ... smile.
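A minimal sketch of the "semantic relevance becomes locational closeness" point, using a sentence-transformers model; the model name and the toy corpus are illustrative assumptions:

```python
# Sketch: embed chunks and a query, then rank chunks by cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "The court held that the contract was void for lack of consideration.",
    "Consideration requires a bargained-for exchange between the parties.",
    "The defendant filed a motion to dismiss for lack of jurisdiction.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

query = "When is a contract unenforceable for missing consideration?"
query_vec = model.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, the dot product is cosine similarity:
# semantically related chunks end up geometrically close to the query.
scores = chunk_vecs @ query_vec
for score, chunk in sorted(zip(scores, chunks), reverse=True):
    print(f"{score:.3f}  {chunk}")
```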
@mulderbm · 1 month ago
Indeed, I once commented on one of the RAG research channels and made the comparison to an FAQ, and the answer was exactly that. Then why use the LLM at all, if you just want the answer from the domain database?
@Canna_Science_and_Technology · 1 month ago
@code4AI Thanks for the feedback, everyone! Just to clarify, my RAG system focuses on enhancing transparency and trust. I use interactive anchors in the responses. When you click an anchor, the document on the right scrolls to the exact source and highlights the text, so you can see exactly where the information came from and verify its accuracy. Additionally, I use an indexer to map content locations, making searches quick and precise. While I understand that reasoning is inherently part of AI and LLMs, my goal here is not to use the system for intensive reasoning but to ensure users can trust and easily verify the information they receive. Combining indexing and anchoring ensures efficient retrieval and clear transparency.
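A minimal sketch of the anchoring idea described above, assuming character-offset anchors that a split-screen viewer could use to scroll and highlight; the data structure and chunking parameters are illustrative, not the commenter's actual pipeline:

```python
# Sketch: attach an anchor (document id + character offsets) to every chunk
# so that a cited answer can point back to the exact source passage.
from dataclasses import dataclass


@dataclass
class AnchoredChunk:
    anchor_id: str
    start: int   # character offset of the chunk in the original document
    end: int
    text: str


def chunk_with_anchors(doc_id: str, text: str, size: int = 500, overlap: int = 50):
    chunks, pos, i = [], 0, 0
    while pos < len(text):
        end = min(pos + size, len(text))
        chunks.append(AnchoredChunk(f"{doc_id}#chunk-{i}", pos, end, text[pos:end]))
        i += 1
        if end == len(text):
            break
        pos = end - overlap  # overlapping windows so no sentence is cut off blindly
    return chunks

# At answer time, the retrieved chunk's (start, end) offsets drive the viewer:
# scroll the original document to `start` and highlight text[start:end].
```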
@DaveRetchless · 1 month ago
Excellent information, as always! Thank you for the ongoing knowledge you provide us. Lots of gratitude!
@mshonle · 1 month ago
I think there's a great chance to use encoder-only models like BERT for an extractive summarization task (identifying relevant tokens, not auto-regressively generating an abstractive summary), so that all relevant quotes can be labeled and identified as input for a separate decoder-only LLM. (I imagine the initial text given to the BERT-like encoder would be based on vector queries and also verbatim text search, to cover gaps in the vector search.) Provide the system with a glossary and known case-law graphs (whatever those might be) and traverse the related-works search iteratively until you reach a fixed point or the relevance score drops below some threshold.
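A minimal sketch of the extract-then-generate idea, using a cross-encoder relevance scorer in place of a token-level BERT tagger; the model name, top-k cutoff, and prompt format are illustrative assumptions:

```python
# Sketch: an encoder scores candidate sentences for relevance to the query,
# and only the top verbatim quotes are handed to a decoder LLM for the answer.
from sentence_transformers import CrossEncoder

scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def extract_quotes(query: str, sentences: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, sentence) pair and keep the highest-scoring sentences.
    scores = scorer.predict([(query, s) for s in sentences])
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [s for _, s in ranked[:top_k]]


def build_prompt(query: str, quotes: list[str]) -> str:
    # The decoder LLM only sees labeled verbatim quotes, not the full corpus.
    cited = "\n".join(f"[{i + 1}] {q}" for i, q in enumerate(quotes))
    return (
        "Answer using ONLY the quotes below and cite them by number.\n"
        f"Quotes:\n{cited}\n\nQuestion: {query}"
    )
```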
@matinci118 · 1 month ago
This is a great illustration of how professional industries will have, after all this excitement, a period of sobering real-life performance limitations ahead of them. The thing about grokking in this context, however, is that it would require that 'perfect dataset' to encapsulate the relevant slice of reality. I don't quite see how one could curate a dataset that meets this standard in legal services. The law itself embraces uncertainty: when black-and-white, yes/no decision paths are exhausted, legal systems rely on principles such as fairness, equity, and discretion. These are by definition categorical variables, and representing them in a 'perfect' way strikes me as an impossible task, which is why grokking does not seem like the actual answer here.
@theseedship6147 · 1 month ago
I suppose that one of the problems with a SaaS service is the cost of the... service. 😅 How can we justify the expense of overtraining to hypothetically achieve this grokking phenomenon? And even if we succeed in this trick on one or more small models, which logic gates, represented by these 'grokked' LLMs needed for elementary, atomic (???) reasoning, should be unlocked as a priority? I find it hard to believe that a model, even a grokked model, can have all the necessary faculties on its own. That said, being a hyperspecialist without hallucinations on specific tasks remains a fantastic advance. I wonder whether the UltraInteract dataset could be oriented in the direction of grokking to reach this phenomenon more quickly or in a more structured way? Thank you very much for sharing your thoughts with us!
@densonsmith2 · 1 month ago
Here is what is missing: how does the RAG system compare with individual human lawyers? I am extremely skeptical that if I walk into a local law office and give them the same test, they will get 65%.
@artur50 · 1 month ago
High-level law and finance analytics could revolutionize business uses of AI; we're not there yet, though...
@spkgyk · 1 month ago
RAG clearly has a myriad of problems, and there's a long road ahead before anything groundbreaking comes from it. However, grokking also seems very abstract to me. For example, what is the exact format of data needed to grok a system? How hard is it to reach the grokking phase (train once, twice, 100 times, 10^6 epochs)? How do you interpret the phi in practice? From the grokking videos it seems the data needs to be structured like a knowledge graph, and during training you randomly sample connections from that graph. But in that case, is the LLM you're grokking fed logical statements or human-readable sentences? And are the outputs of these grokked models human-readable or written in mathematical logic? Do we need a grokked LLM for the reasoning part, connected to a fine-tuned regular LLM to fully explain and format the output? I have so many questions haha
@gileneusz · 1 month ago
But grokking has not yet been tested on the Law AI category...
@frederic7511 · 28 days ago
Very interesting video. Quite depressing actually, because I'm currently working on a Law RAG system (a legal assistant)... Now I understand why I'm having a hard time. Does it mean the only way to make it work is to overtrain your model on a bunch of law cases so that it acquires legal reasoning skill, and then use this grokked model in a RAG system? Or does it mean relying only on the model and training it on all laws and law cases? That would cost millions!
@code4AI · 28 days ago
Listen, if it were easy, every consultancy firm in the world would offer it as a service. So you have to be brilliant, a genius: combine undiscovered ideas and experiment for your particular downstream task. Do not just follow explored patterns; have your own ideas, your own imagination of how the inner workings should function, and then start to implement it. When all the others fail, that is the unique opportunity for fresh ideas. Microsoft has about 200 researchers building the perfect RAG, this old idea, and it has billions to do so, but if you know your goal, your industry, and the service you want to offer, build on insights that no Microsoft employee has. -- End of pep talk, back to work.
@OTISWDRIFTWOOD · 1 month ago
Just goes to show AI won't perform well unless you can get talented people to implement it. Good people are hard to find. You can forget that the best people are on this; it's the people put forward by salespeople, nowhere near the best. Find the real talents. Also, vector search is extremely overrated.