Think-and-execute prompting for LLMs · 7:37
Fine-tuning or RAG? · 8:05 · a month ago
Fixing RAG with GraphRAG · 15:04 · a month ago
Co-intelligence: book review · 13:06
$10k for LLM reasoning · 6:24 · 2 months ago
LLM benchmarks · 11:02 · 3 months ago
LLMs eat entry-level SWEs · 9:06 · 3 months ago
LLMs can debug with prints · 8:54 · 3 months ago
Determinism ⇒ Fast LLMs (Groq) · 10:07
GPT-4 passed the Turing Test! · 6:58
LLMs with infinite context? · 9:35 · 5 months ago

Comments
@awakenwithoutcoffee 3 days ago
Great presentation, Vivek. Some questions:
- Is GraphRAG production-ready? If not, would it be difficult to upgrade RAG methods once we are in production?
- Is there a RAG provider/stack that you prefer? (DataStax, Pinecone, Weaviate, plus a bunch of others who are all competing for attention)
- What are your thoughts on LangChain vs. LangGraph?
@christopherconyers767 3 days ago
Awesome review - thanks for the great work!
@brandonheaton6197 3 days ago
Can you pontificate on the combination of upcoming transformer-inference ASICs with deep agentic workflows employing GraphRAG-style strategies? It seems like we will be close to our personal assistants writing a PhD thesis in the background whenever we ask a question. Sohu is reporting 500,000 tokens per second with Llama 3 70B...
@RoulDukeGonzo 4 days ago
It seems clear that for current events RAG is going to win, but for broader, domain-specific themes or logic, how does fine-tuning stack up? E.g., generating code using our internal suite of APIs... If the context is big enough, in-context learning should be fine, but RAG may miss some key docs based on semantic similarity alone... I guess I should write a paper 😂
@sasha297603ha 4 days ago
Very interesting papers, thanks for covering!
@stevenwatson2927 5 days ago
It's surprising to see ChatGPT scoring below 99% when Wolfram Alpha can basically answer anything just by having specific knowledge. It's also surprising that "playing" with the word prompt does anything at all, let alone gives a better result. It makes no sense, especially when we can clearly see from the research that the information entropy is basically the same between prompts with and without the extra steps.
@therobotocracy 6 days ago
Is it flattening out because it maxes out at 100%?
@VivekHaldar 5 days ago
Yes, that too! People have started looking at harder benchmarks like GSM8k-Hard and MATH.
@karinlv890 9 days ago
Thank you for saving my group meeting! Your video helps a lot!
@wanfuse 10 days ago
Wouldn't it cut to the chase to train an LLM on your own data? There's your graph. Use one of these: OpenAI's GPT-3/4, Hugging Face Transformers (e.g., GPT-2, GPT-3 via third-party providers), Google's T5 (Text-to-Text Transfer Transformer), Meta's BART and BlenderBot, Anthropic's Claude. Update the LLM every week. Summarization is the death of real data; better off with one level of summarization? Just a thought!
@mccleod6235 9 days ago
Maybe you don't want to send all your valuable business data to third party companies.
@wanfuse 9 days ago
@mccleod6235 That's true, but it's not necessary; there are open-source models you can train air-gapped on a Jetson.
@bohnohboh676 6 days ago
"Every week update the LLM"? Yeah, no way, unless you have tons of cash, compute, and time.
@wanfuse 6 days ago
Maybe, maybe not; I'll let you know! You're probably right. We'll see if my idea pans out.
@rafikyahia7100 12 days ago
Excellent content summarizing cutting edge approaches, thank you!
@sasha297603ha 12 days ago
Very interesting paper! It looks like a team-lead model and a bunch of juniors 😅 Thanks for covering!
@christopherd.winnan8701 14 days ago
Are there any models where we can try this think-and-execute method for ourselves?
@VivekHaldar 12 days ago
As described in the paper, the authors tried it with GPT-3.5 and Llama. The prompts are in the paper, so you could try it with any LLM of your choice.
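For anyone who wants to try it, below is a minimal sketch of the two-phase think-and-execute pattern against a generic chat-completion API. The task string, model name, and prompt wording are placeholders for illustration, not the paper's exact prompts (those are in its appendix).

```python
# Minimal sketch of think-and-execute with any chat-completion LLM.
# The prompts here are illustrative placeholders, not the paper's prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any compatible endpoint works

TASK = "Given a list of integers, return True if no number appears more than once."

# Phase 1 ("think"): ask the model for task-level pseudocode.
think = client.chat.completions.create(
    model="gpt-3.5-turbo",  # swap in any model you have access to
    messages=[{
        "role": "user",
        "content": f"Write step-by-step pseudocode that solves this task:\n{TASK}",
    }],
)
pseudocode = think.choices[0].message.content

# Phase 2 ("execute"): ask the model to simulate the pseudocode on a concrete input.
execute = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": (
            "Follow this pseudocode line by line for the input [1, 2, 2, 3] "
            f"and report the final answer:\n{pseudocode}"
        ),
    }],
)
print(execute.choices[0].message.content)
```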
@vida91963 15 days ago
Nice presentation, thank you!
@jordycollingwood 15 days ago
Really great explanation. I'm currently struggling to decide on my own KG structure for a corpus of 2,000 medical PDFs, so this was very helpful.
@awakenwithoutcoffee 3 days ago
Same here, brother. There are so many techniques; every day I learn something new, which is both good and terrifying, ha. What stack are you thinking of using? We are researching DataStax, Pinecone, and Weaviate, and are learning to build agents with LangGraph.
@kaixiliu7469 18 days ago
Thanks for sharing the review Vivek! Would you mind sharing your book list as well?
@VivekHaldar 10 days ago
Hey Kaixi! I don't have an explicit list; I just pick up whatever looks interesting at the time... :-)
@btscheung 25 days ago
Really appreciate your in-depth review of the book! It will make for more thoughtful reading when I start the book.
@thankqwerty 25 days ago
Thanks for sharing the paper. In my experience using Llama3-8B on my benchmark dataset, I noticed that the LLM has learned a fact that is incorrect, or at least contradicts my application. I tried to clarify that in the prompt, but the LLM is actually quite stubborn, which leads to quite fragile responses: the LLM sometimes gets it right and sometimes gets it wrong with minimal changes to the prompt, which can be as small as adding spaces. I wonder if you have come across a similar situation, or papers that discuss this behavior. Thanks.
@VivekHaldar 20 days ago
Yes, that kind of brittleness is a common issue, unfortunately.
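One cheap way to quantify the brittleness described above is to re-ask the same question under trivial whitespace perturbations and count how often the answer flips. A rough sketch, assuming an OpenAI-compatible endpoint; the question and model name are placeholders:

```python
# Rough sketch: measure answer stability under whitespace-only prompt changes.
import random
from collections import Counter
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible chat endpoint

QUESTION = "Does our product support offline mode? Answer yes or no."  # placeholder

def perturb(prompt: str) -> str:
    """Randomly double some spaces; the content is unchanged."""
    return " ".join(
        word + (" " if random.random() < 0.2 else "") for word in prompt.split(" ")
    )

answers = []
for _ in range(10):
    resp = client.chat.completions.create(
        model="llama-3-8b-instruct",  # placeholder for whatever model you benchmark
        messages=[{"role": "user", "content": perturb(QUESTION)}],
    )
    answers.append(resp.choices[0].message.content.strip().lower().split()[0])

print(Counter(answers))  # more than one distinct answer suggests fragile behavior
```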
@harivarsha4016 a month ago
I love this kind of content, please never stop!!!
@atomwalk a month ago
Awesome work! Thanks 🤗
@user-wr4yl7tx3w a month ago
More agent papers, please. Thanks 😊
@willtipton1698 a month ago
Nice video, ty!
@colinwar a month ago
You're asking vanilla questions if you can't un-cloak a machine response! The reasoning is not there with language models; how stupid are people, to not be able to ask the right questions? I call lies on these claims. Show the test as proof; I doubt you can or will show the actual test. This is absurd.
@gilinachum a month ago
But why is the paper's fine-tuning different from the original pre-training and alignment fine-tuning that came before it? All of them expose the model to a mix of existing and new data...
@VivekHaldar a month ago
You are correct: in principle, fine-tuning works the same way as pre-training (updating weights), so FT can be thought of as continued PT. The difference is in the data used. You fine-tune when you have a domain-specific set of data that's very different from the pre-training data.
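To make the point concrete, here is a minimal sketch of fine-tuning as continued pre-training with Hugging Face Transformers: the objective is the same causal-LM loss used in pre-training, only the data changes. The model name and "domain_corpus.txt" are placeholders for your own base model and domain data.

```python
# Minimal sketch: fine-tuning as "continued pre-training" on domain text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for whatever base model you are adapting
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "domain_corpus.txt" is a placeholder for your domain-specific documents.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # updates the same weights pre-training did, just on new data
```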
@hosseinmohammadi4574 a month ago
Interesting! Tnx
@sasha297603ha a month ago
Very interesting paper, thanks for covering!
@HampusAhlgren a month ago
Just wanted to say I really appreciate your videos. Everything is short and concise and I love that you’re always using papers as the foundation for the conclusions. Keep it up!
@VivekHaldar a month ago
Thanks for the kind words. That's the idea!
@dennyoviedo4102 a month ago
Good stuff, brother 😊 Thanks for an excellent explanation of the peer-to-peer BTC formula. I'll feed this info into my brain 🧠 until my neurons start a new circuit 😂
@sasha297603ha a month ago
Very interesting paper, thanks for covering!
@MatySiman a month ago
Great video! Why was the > 1 a mistake? Didn't it return False as it should?
@MatySiman a month ago
More specifically, I find this sentence a bit weird: "This means that if any element has more than 1 duplicate, the function will return False. However, the task requires that if there are more than 1 duplicate of the same number, the function should return False."
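For what it's worth, here is my own reconstruction (not the paper's code) of the two readings that the quoted sentence seems to mix up: "appears more than once" (count > 1) versus "has more than one duplicate" (count > 2).

```python
# Illustration only: two readings of "more than 1 duplicate" (not the paper's code).
from collections import Counter

def no_repeats(nums):
    """False if any value appears more than once (count > 1)."""
    return all(c <= 1 for c in Counter(nums).values())

def at_most_one_duplicate(nums):
    """False only if some value appears more than twice (count > 2),
    i.e. has more than one duplicate under the stricter reading."""
    return all(c <= 2 for c in Counter(nums).values())

print(no_repeats([1, 2, 2]))             # False: 2 appears twice
print(at_most_one_duplicate([1, 2, 2]))  # True: 2 has only one duplicate
```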
@MatySiman a month ago
@VivekHaldar
@christopherd.winnan8701 2 months ago
Does this also mean that experts in their field might choose to wait for improved AI abilities so that they can do more than just superficial improvements? I predict that we will see a tsunami of low-quality generations, followed by a true paradigm leap in terms of content.
@VivekHaldar 2 months ago
They don't need to wait. You can get pretty far before hitting the limits of current SOTA models. There is a tiny fraction of writers who are sought out for their unique voice. Everyone else is producing generic-sounding copy, ripe for replacement.
@christopherd.winnan8701 2 months ago
@VivekHaldar I still cannot find a model that can handle more advanced tasks. Do you have any recs?
@VivekHaldar 2 months ago
@christopherd.winnan8701 What's an example of an advanced task you see LLMs having trouble with? See the recent videos (two weeks ago) on the channel about the $10k reasoning challenge for an example problem and the resulting prompt that solved it.
@christopherd.winnan8701 2 months ago
@VivekHaldar Is it an open-access LLM?
@NiladriBhattacharjya 2 months ago
Loved it!
@MiningGodBruce 2 months ago
I would bet that one of the tradeoffs of this strategy is that compiling a schedule for one of their chips probably takes several man-years of work. I'm guessing this is why they decided to only do inference on their cloud, because it constrains the space of models they need to support. It turned out to be a lucky decision, because all the other custom chips are having a hard time trying to keep up with the models people want to run today. To be honest, this seems like the most promising ML chip, because it actually takes advantage of the domain and provides a double-digit multiple improvement along some axis (latency only? how much energy does it use?). If your workload is a statically defined structure of large array computations, it certainly seems like a full execution schedule would be fundamentally different from, and more efficient than, even the most optimized GPU runtime. GPUs probably still win out for consumer models whose structure changes every 2 weeks. Time will tell.
@archiliusfowl3701 2 months ago
Great review. Love this casual outdoor content! It adds a whole different vibe to the video.
@jaydugger3291 2 months ago
Thank you for this review.
@vijaybrock 2 months ago
Sir, can you give me the best RAG pipeline approach to chat with 10-K reports of different companies from the past 10 to 15 years? The accumulated count can range between 50 and 100 PDF files of 10-K reports.
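Not an official answer from the channel, but a minimal sketch of the kind of pipeline the question is asking about: extract and chunk the 10-K PDFs, embed the chunks, retrieve by cosine similarity, and answer with the retrieved context in the prompt. The library and model choices here (pypdf, sentence-transformers, an OpenAI-compatible chat model, the "10k_reports" folder) are assumptions for illustration.

```python
# Rough sketch of a basic RAG pipeline over a folder of 10-K PDFs.
from pathlib import Path
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()

# 1. Extract and chunk the filings (naive fixed-size chunks for illustration).
chunks = []
for pdf in Path("10k_reports").glob("*.pdf"):
    text = " ".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
    chunks += [f"[{pdf.name}] {text[i:i + 1500]}" for i in range(0, len(text), 1500)]

# 2. Embed every chunk once, up front.
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, k: int = 5) -> str:
    # 3. Retrieve the k most similar chunks by cosine similarity.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vecs @ q_vec)[-k:]
    context = "\n\n".join(chunks[i] for i in top)
    # 4. Ask the LLM to answer grounded in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("How did Company X's revenue trend over the last three fiscal years?"))
```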
@thankqwerty 2 months ago
Thank you for introducing the paper.
@sasha297603ha 2 months ago
Can't wait until we have Kaggle competitions like this 😂
2 months ago
This almost looks like a compiled program at this point.
@VivekHaldar 2 months ago
That's a great way to put it!
@christopherd.winnan8701 2 months ago
If you had $10k to offer as a prize, what solution would you look for?
@VivekHaldar 2 months ago
Great question! I'd want the prompt to be simple (as close to the original high-level human readable problem statement as possible), but offer a prize for a model that was tuned enough to take that simple prompt and solve it. Basically, take the complexity away from the prompt and into the model training.
@christopherd.winnan8701 2 months ago
@VivekHaldar I was always impressed by Peter Diamandis's work on the XPRIZE and moonshots. Maybe it would boost progress to have a bunch more smaller prizes for AI-related achievements?
@sasha297603ha 2 months ago
Happy to hear that GPT-4 lost somewhere. Competition drives development! Very interesting topic, thanks for covering it. I wish a company would pay me $10k to come up with such prompts 😃
@AhmedKachkach 2 months ago
Really fascinating way to approach this problem: how do you get rid of non-determinism and communication overhead? Just decide at compile time all the bits of computation that need to happen, and do each of them independently on its dedicated chip :) And you summarized it in a very approachable way! By the way, this is probably somewhat related to why they decided to stop selling chips; this approach makes a lot of sense in a data center, where you can quickly amortize the fixed cost of serving any of these models (but not so much if you only want to serve a single model at low traffic but have to buy more than 500 chips to do so).
@christopherd.winnan8701 2 months ago
Interesting development. Well spotted, and thank you for bringing it to our attention. Will you do a follow-up vid? In the meantime, can you do a video that explains more about the "qualia" part of Q*, please? This video seems to have the best explanation so far: The Symmetry Theory of Valence (from the Centre for Psychedelic Research at Imperial College London) kzfaq.info/get/bejne/qrmAjZep2ZvKYp8.html (jump to 6m20s for the TLDR). "For any conscious experience there exists a mathematical object isomorphic to it": just as four simple equations tie together the phenomena we know as electromagnetism, they are talking about qualia as a deep mathematical structure of consciousness. Is it this kind of pattern-recognition breakthrough that has been achieved with Q*?
@VivekHaldar 2 months ago
OpenAI is "not ready to talk about Q*", so I'd just wait until they publish something real. Thanks for the video link, will check it out.
@user-wr4yl7tx3w 2 months ago
Actually, understanding the problem itself takes cognitive effort.
@aishaal-harbi1929 2 months ago
Great video!
@sasha297603ha 3 months ago
Very interesting how this chaining evolves. I hope that in the near future LLMs will be cheap to fine-tune and we will have ultra-powerful personalized agents. Great video, thanks for covering!
@jsalsman 3 months ago
Have you tried Josh Olin's WebGPT OpenAI GPT extension? It has something he calls code playgrounds (and web requests), which he says have been beating Devin for months on the coding benchmark it was touting.
@olegt3978 3 months ago
An overview of meta-prompting strategies might be a good idea for a video.
@sasha297603ha 3 months ago
Great video, thanks for covering!
@CaribSurfKing1 3 months ago
Nice, logical breakdown