Over the past year, I’ve been working relentlessly and single-handedly on an application that I’ve now named LARS, for the ‘LLM & Advanced Referencing Solution’. It enables you to run LLMs (Large Language Models) locally on your device, upload your own documents, and engage in Q&A sessions where the LLM grounds its responses in your uploaded content. This grounding helps increase accuracy and reduce the common issue of AI-generated inaccuracies, or "hallucinations." This technique is commonly known as "Retrieval Augmented Generation", or RAG.
However, LARS takes the concept of RAG much further by adding detailed citations to every response, supplying you with specific document names, page numbers, text-highlighting, and images relevant to your question, and even presenting a document reader right within the response window!
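For anyone curious how grounding and citations fit together under the hood, here is a minimal, illustrative sketch in Python (not LARS's actual code; the chromadb store, the sample chunks and the metadata fields are assumptions purely for this example). Each chunk carries its document name and page number, so retrieved context can be cited alongside the answer:

```python
# Minimal RAG-with-citations sketch (illustrative only, not LARS's implementation).
# Assumes chromadb as the vector store; chunk metadata carries the document
# name and page number so every retrieved passage can be cited.
import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

# Index document chunks along with citation metadata (hypothetical content).
collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=[
        "The warranty covers manufacturing defects for 24 months.",
        "Claims must be filed through the support portal within 30 days.",
    ],
    metadatas=[
        {"source": "warranty_policy.pdf", "page": 3},
        {"source": "warranty_policy.pdf", "page": 7},
    ],
)

question = "How long does the warranty last?"
hits = collection.query(query_texts=[question], n_results=2)

# Ground the prompt in the retrieved chunks and keep the metadata for citations.
context = ""
for text, meta in zip(hits["documents"][0], hits["metadatas"][0]):
    context += f"[{meta['source']}, p.{meta['page']}] {text}\n"

prompt = (
    "Answer using only the context below and cite the sources you used.\n\n"
    f"Context:\n{context}\nQuestion: {question}\nAnswer:"
)
print(prompt)  # This grounded prompt is what gets sent to the local LLM.
```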
LARS also has features baked in that are focused squarely on improving the user experience, such as:
1. Chat history, to resume prior conversations
2. Per-response user ratings, to identify focus areas for improvement, and
3. Conversation memory, so the user may ask follow-up questions
I’m happy to connect and discuss the technical details further, such as the LLM backend, the embedding models used (there are four supplied in LARS!), the vector database, the text-extraction techniques (fully local parsing or OCR, combined with custom parsers for scanned and table-heavy documents), the advanced settings built in to tune the LLM's responses (temperature, top-k/p, etc.), and the prompt-engineering and RAG-tweaking tools built into LARS.
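As a rough illustration of that kind of local text-extraction pipeline (not LARS's actual parsers), the sketch below tries native PDF text extraction first and falls back to OCR for pages that look scanned; pypdf, pdf2image and pytesseract are assumed here purely for the example:

```python
# Illustrative local text-extraction sketch with an OCR fallback for scanned
# pages; pypdf, pdf2image (needs poppler) and pytesseract (needs Tesseract)
# are assumptions for this example, not necessarily what LARS uses.
from pypdf import PdfReader
from pdf2image import convert_from_path
import pytesseract

def extract_pages(pdf_path: str) -> list[str]:
    """Return text per page, using OCR only when native extraction finds little text."""
    native = [page.extract_text() or "" for page in PdfReader(pdf_path).pages]
    texts = []
    for page_number, text in enumerate(native, start=1):
        if len(text.strip()) > 50:   # enough embedded text: keep the native extraction
            texts.append(text)
        else:                        # likely a scanned page: render it and run OCR
            image = convert_from_path(
                pdf_path, dpi=300, first_page=page_number, last_page=page_number
            )[0]
            texts.append(pytesseract.image_to_string(image))
    return texts

pages = extract_pages("warranty_policy.pdf")  # hypothetical document
print(f"Extracted {len(pages)} pages")
```

The 50-character threshold above is an arbitrary heuristic for deciding when a page is likely a scan; a real pipeline would tune that cutoff and handle tables separately.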
Last but certainly not least, LARS can utilize your Nvidia CUDA GPU to dramatically speed up inference and lets you specify the exact number of model layers you’d like to offload to the GPU. This is useful for hybrid CPU+GPU inference in memory-limited scenarios.
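Here’s a hedged sketch of what hybrid CPU+GPU inference and response tuning can look like, assuming a llama.cpp-based backend via llama-cpp-python built with CUDA support; the model path, layer count and sampling values are placeholders rather than LARS's settings:

```python
# Hybrid CPU+GPU inference sketch, assuming llama-cpp-python with CUDA support.
# Model path, n_gpu_layers and sampling values are placeholders for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # hypothetical local model file
    n_gpu_layers=20,   # offload only some layers when VRAM is limited; -1 offloads all
    n_ctx=4096,        # context window for the prompt plus retrieved document context
)

response = llm(
    "Answer using only the provided context: ...",
    max_tokens=256,
    temperature=0.2,   # lower temperature keeps grounded answers close to the sources
    top_k=40,          # sample only from the 40 most likely tokens
    top_p=0.9,         # nucleus sampling: cut off at 90% cumulative probability
)
print(response["choices"][0]["text"])
```

Offloading fewer layers trades some speed for lower VRAM use, which is the point of exposing the layer count as a user-tunable setting.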
I’m keen to discuss this more and delve into the technical details with fellow enthusiasts. Feel free to connect via email at abheekg@hotmail.com or drop a hello on LinkedIn at / abheek-gulati
#GenAI #GenerativeAI #LLMs #LLM #LargeLanguageModels #RAG #RetrievalAugmentedGeneration