
Q* explained: Complex Multi-Step AI Reasoning

8,423 views

code_your_own_AI

A day ago

NEW Q* explained: Complex Multi-Step AI Reasoning for Experts only (integrating graph theory and Q-learning from reinforcement learning of LLMs and VLMs).
My video provides an in-depth analysis of Q-Star, a novel approach that amalgamates Q-Learning and A-Star algorithms to address the challenges faced by large language models (LLMs) in multi-step reasoning tasks. This approach is predicated on conceptualizing the reasoning process as a Markov Decision Process (MDP), where states represent sequential reasoning steps and actions correspond to subsequent logical conclusions. Q-Star employs a sophisticated Q-value model to guide decision-making, estimating future rewards and optimizing policy choices to enhance the accuracy and consistency of AI reasoning.
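To make the MDP framing concrete, here is a minimal sketch (my illustration, not the authors' code): a state is the question together with the reasoning steps produced so far, an action is a candidate next step, and a learned Q-value model scores state-action pairs to decide which step to take. The propose_next_steps and q_value_model callables are hypothetical placeholders for an LLM sampler and a trained value model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ReasoningState:
    """A state in the reasoning MDP: the question plus the steps produced so far."""
    question: str
    steps: List[str] = field(default_factory=list)

def greedy_step(state: ReasoningState,
                propose_next_steps: Callable[[ReasoningState], List[str]],
                q_value_model: Callable[[ReasoningState, str], float]) -> ReasoningState:
    """One MDP transition: sample candidate next steps (actions) from an LLM
    and append the one the learned Q-value model scores highest."""
    candidates = propose_next_steps(state)          # hypothetical LLM sampler
    best = max(candidates, key=lambda a: q_value_model(state, a))
    return ReasoningState(state.question, state.steps + [best])
```

Greedy selection like this is only the simplest policy; the deliberative-planning variant discussed below replaces it with a best-first search over whole reasoning paths.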
Integration of Q-Learning and A-Star in Q-Star
Q-Star's methodology leverages the strengths of both Q-Learning and A-Star. Q-Learning enables an AI agent to navigate a decision space by learning optimal actions from reward feedback via the Bellman equation, while A-Star contributes efficient pathfinding, ensuring optimal decision pathways are identified with minimal computational waste. Q-Star synthesizes these functionalities into a robust framework that improves the LLM's ability to navigate complex reasoning tasks effectively.
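For reference, the Q-Learning half rests on the standard tabular Bellman update sketched below (textbook form, not code from the paper); in Q-Star the Q-values come from a learned model over reasoning states rather than a table, but the same bootstrapped target of reward plus discounted best future value applies.

```python
def q_learning_update(Q, s, a, reward, s_next, actions_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions_next), default=0.0)
    td_error = reward + gamma * best_next - Q.get((s, a), 0.0)   # temporal-difference error
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q
```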
Practical Implementation and Heuristic Function
In practical scenarios, such as autonomous driving, Q-Star's policy guides decision-making through a heuristic function that balances accumulated utility (g) and heuristic estimates (h) of future states. This heuristic function is central to Q-Star, providing a dynamic mechanism to evaluate and select actions based on both immediate outcomes and anticipated future rewards. The iterative optimization of these decisions facilitates an increasingly refined reasoning process, which is crucial for applications requiring high reliability and precision.
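To illustrate how such a heuristic function can drive deliberation, the sketch below runs an A*-style best-first search over partial reasoning paths, scoring each path by f = g + lambda * h, where g is the accumulated utility of the steps taken so far and h is a Q-value-style estimate of future reward. The expand, g, h, and is_terminal callables are hypothetical placeholders, not the paper's exact definitions.

```python
import heapq
from itertools import count

def f_value(path, g, h, lam=1.0):
    """f(path) = g(path) + lambda * h(path): utility accumulated so far
    plus a weighted heuristic estimate of future reward."""
    return g(path) + lam * h(path)

def best_first_reasoning(question, expand, g, h, is_terminal, max_expansions=100):
    """A*-style best-first search over partial reasoning paths.
    expand(path) -> list of extended candidate paths; is_terminal(path) -> bool."""
    tie = count()                                   # tiebreaker so the heap never compares paths
    start = [question]
    frontier = [(-f_value(start, g, h), next(tie), start)]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, path = heapq.heappop(frontier)        # pop the path with the highest f so far
        if is_terminal(path):
            return path                             # first completed path under the f ordering
        for child in expand(path):
            heapq.heappush(frontier, (-f_value(child, g, h), next(tie), child))
    return None                                     # budget exhausted without a terminal path
```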
Performance Evaluation and Comparative Analysis
The efficacy of Q-Star is highlighted through performance comparisons with conventional models like GPT-3.5 and newer iterations such as GPT Turbo and GPT-4. The paper details a benchmarking study where Q-Star outperforms these models by implementing a refined heuristic search strategy that maximizes utility functions. This superior performance underscores Q-Star's potential to significantly enhance LLMs' reasoning capabilities, particularly in complex, multi-step scenarios where traditional models falter.
Future Directions and Concluding Insights
The paper concludes with a discussion on the future trajectory of Q-Star and multi-step reasoning optimization. The insights suggest that while Q-Star represents a considerable advancement in LLM reasoning, the complexity of its implementation and the computational overhead involved pose substantial challenges. Further research is encouraged to streamline Q-Star's integration across various AI applications and to explore new heuristic functions that could further optimize reasoning processes. The ultimate goal is to develop a universally applicable framework that not only enhances reasoning accuracy but also reduces the computational burden, making advanced AI reasoning more accessible and efficient.
All rights w/ authors:
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
arxiv.org/pdf/...
#airesearch
#ai
#scienceandtechnology

Comments: 27
@gregsLyrics · a month ago
firehose to my brain. Amazing! This indicates a fairly long path of steps I need to learn so I can properly digest this beautiful wisdom. Really amazing channel, filled with advanced knowledge of the gods.
@scitechtalktv9742 · a month ago
Interesting explanation! You mentioned there is code to try it yourself, but I cannot find that. Can you point me to it?
@parthmakode5255 · a month ago
please tag me also once you find the code
@btscheung · a month ago
Your presentation in this video is definitely A+ in terms of clarity and depth of understanding! Well done. Also, I am happy to see a real paper and study on the speculative Q* heuristic search algorithm. Although their results seem not to justify the effort and added complexity, we are only looking at well-known math problems that those LLMs may have been heavily pre-trained on. If we shift the angle to applying the algorithm in a general solution search space with greater complexity, Q* is the way to go!
@drdca8263 · a month ago
I thought Q* was supposed to be a project by Google or OpenAI (I forget which, but I thought it was supposed to be one of them). The authors listed in the paper are indicated as being affiliated with either “Skywork AI” or “Nanyang Technological University”? Is this a model inspired by the rumors of there being a model with the name “Q*”, or is this the model the rumors were about? Were some of these people previously at OpenAI or Google, but not anymore? Or..?
@jswew12 · a month ago
It was OpenAI internal document leaks I believe. I’m wondering the same thing! I feel like it has to be related, otherwise this feels kind of wrong. I understand wanting to get eyes on your research, and this seems like good research so I commend them on that, but still. If anyone has more info, leave a reply.
@a_soulspark · a month ago
I'm also really confused. Skywork AI seems to be a legit company/research group; they have released models in the past. However, I see no indication that their Q* is related to OpenAI's. The authors of this paper don't seem to have a record at big tech companies. One of the authors, Chaojie Wang, has a GitHub page which gives some more context (you can look it up on Google if you want).
@a_soulspark · a month ago
I also was quite confused! It doesn't seem like the people behind the paper have any relation with big tech companies (Google, OpenAI, Microsoft, etc.) and it doesn't seem like their paper is directly related to OpenAI's supposed Q*
@a_soulspark · a month ago
My old comment got deleted, perhaps because some word triggered the algorithm. I just said you can use search to find out more about the authors; the first one on the cover of the paper immediately answers many questions.
@idiomaxiom · a month ago
The trick is whether you have a Q* over a sequence or whether you have figured out how to credit a sequence for good or bad: "the credit assignment problem". Possibly OpenAI has figured out a fine-grained Q*, which would give fast, accurate feedback and learning.
@user-zd8ub3ww3h · a month ago
This is a very good introduction and I enjoyed the content, even though I implemented Q-Learning myself around 30 years ago.
@GodbornNoven · a month ago
Amazing video as always
@danberm1755 · 19 days ago
I have a special request. I'm really interested in understanding how query/key/value can "transform" embeddings closer to other embeddings using attention. In particular why do you need these three values? Why not just have a single query matrix to move the embeddings during attention?
@nthehai01 · a month ago
Thank you for such a detailed explanation. Really enjoyed it 🚀. But is this Q* related to the one from OpenAI that people have been talking about 🧐?
@tablen2896 · a month ago
Small tip: black borders on white font make the text easier to read and less tiring to watch.
@philtoa334 · a month ago
Nice.
@antaishizuku · a month ago
Yeah, I can't find the code for this. Could you please tell us where it is?
@drdca8263 · a month ago
27:58: you say “estimated utility of reaching the correct answer”. Does this mean “an estimate of what the utility would be if the correct answer is obtained” (which sounds to me like the plainest interpretation, but also the least likely, as I would think the utility for that would be arbitrary), or “the expected value of the random variable which gives utility based just on whether final answer is correct”, or “the expected value of the random variable, utility, which is determined by both whether the final answer is correct, and other things, such as length of answer”, or something else?
@thesimplicitylifestyle · a month ago
Yay! 😎🤖
@allanfelipemurara4211 · a month ago
Omg ❤
@yacinezahidi7206 · a month ago
First viewer here 🗡️
@SirajFlorida · a month ago
LoL. Third I guess. Well Yacinezahidi was 0th user, is 1st, and I'm 2nd.
@theoptimisticnihilistyt · a month ago
wow
@smicha15 · a month ago
246th view. Nailed it!
@syedibrahimkhalil786 · a month ago
Fourth then 😂
@user-uz1ol2gs6y · a month ago
Second