Decoding AI's Blind Spots: Solving Causal Reasoning

2,020 views

code_your_own_AI

1 day ago

How to fix AI's causal reasoning failures, as evident in my last video "Financial AI Brilliance: 7 Children at Stanford?" • Financial AI Brillianc...
Great response from the AI community on my prompt testing causal reasoning, where all LLMs failed.
Here now is my response.
#airesearch
#aieducation
#failure

Comments: 26
@user-en4ek6xt6w · 23 days ago
Love the video, can't wait for the next one. Thank you. Building a hybrid model will be a banger video.
@MBR7833 · 20 days ago
Thank you so much for your content! As for the 7th-student thing, I think I understand why LeChat worked but not Claude / ChatGPT: LeChat likely did not have the "reasoning training" (or the meta prompt with all the examples) that the more recent models have, and therefore was not "tricked". If you have not come across this article / team yet, I would love to understand it more: "Transformers meet Neural Algorithmic Reasoners" by the team led by Petar Veličković at DeepMind, which is likely one of the most interesting teams, as they do research in topology (group theory etc.).
@danbolser5913 · 15 days ago
I thought about creating a Discord for this community to discuss interesting papers like this.
@user-zd8ub3ww3h · 23 days ago
Not sure whether the response is the same in different languages for the 7th-student question. However, we did find that GPT-4o can respond with the correct answer in Traditional Chinese.
@code4AI · 23 days ago
Interesting. Maybe some languages have a different inherent solution capability (semantics, syntax ...). Thank you.
@drumboss972 · 23 days ago
@@code4AI Or better data in the training set.
@matinci118 · 23 days ago
@user-zd8ub3ww3h This is interesting. Could you share the translation you used? Chinese also follows subject-verb-object structure, just like English, so syntax shouldn't be a key difference. But, for example, the word 'they' in "they pay 90% of the fees" could be translated into 他們 (tamen), which is an unequivocal reference to the families, or into 它们 (also tamen :) ), which is the non-human plural and would then be a reference to the university. Also, the 6/7-children issue might be 'masked' by Chinese because, at least in my quick Google translation, it is turned into ...孩子都... (haizi dou), where the last character (dou) indicates that the statement applies to all (the children), in turn indicating there are no others. So my suspicion is that the translated prompt clears up some of the intended imprecision, hence leading to the 'right' answer. So sadly (or luckily?) there is no different inherent solution capability, just a more precise prompt.
@user-zd8ub3ww3h · 22 days ago
@@matinci118 This is how we did it: I used a screenshot to capture the question in English and asked ChatGPT to explain the image. Below is the ChatGPT response (originally in Traditional Chinese, translated here): "This text describes a hypothetical scenario and poses a question. According to the text, Stanford University provides financial aid to low-income families, paying 90% of their official fees. The question is: if a poor family with 6 children sends all of them to Stanford, when will they receive enough money from Stanford to send a 7th child there, given that they have no money at all? The answer should be that they can never receive enough money from Stanford to send the 7th child, because even though Stanford pays 90% of the fees, the family still has to pay the remaining 10%. If the family has no money at all, they cannot pay that 10%, and so they cannot send the 7th child to Stanford."
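The arithmetic behind that answer is easy to check directly. A minimal sketch in Python (the fee figure is a made-up placeholder; only the 90% aid rate comes from the prompt):

# Hypothetical numbers -- only the 90% aid rate comes from the prompt.
annual_fee = 60_000          # placeholder official fee per child per year
aid_rate = 0.90              # Stanford covers 90% of the fees
family_budget = 0            # the family has no money at all

# The aid is a discount, not income: the family never *receives* money.
out_of_pocket = annual_fee * (1 - aid_rate)   # the 10% the family must pay
print(f"owed per child per year: ${out_of_pocket:,.0f}")   # $6,000

# With zero income from Stanford, the budget never grows, so the
# 7th child can never be funded -- the correct answer is "never".
print("can fund 7th child:", family_budget >= out_of_pocket)   # False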
@user-zd8ub3ww3h · 22 days ago
One more thing: I tried to duplicate the scenario but was not successful. Currently, all my trials with GPT-4o give wrong answers.
@manslaughterinc.9135 · 23 days ago
At the beginning, you talk about wanting the model to give you an answer based on common language, specifically the word 'received' in the prompt. This specific problem can be solved through a Theory of Mind + re-ask step. Have the model ask itself, "What is the user actually thinking?", then "How can I ask this question better?". This solves a significant number of failures caused by poor prompts, since the LLM is then answering questions in language it is more familiar with. It brings the question into a vector space that is more aligned with its own knowledge. This of course does not solve problems the LLM isn't trained on; it just reduces failures on problems the LLM is trained on.
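A minimal sketch of that re-ask step, assuming a generic call_llm(prompt) helper that wraps whatever chat API you use (the helper and both meta-prompts are illustrative, not quoted from the comment):

def call_llm(prompt: str) -> str:
    # Placeholder: wire this to your chat API of choice
    # (OpenAI, Anthropic, a local model, ...).
    raise NotImplementedError

def reask(user_prompt: str) -> str:
    # Step 1: Theory-of-Mind pass -- infer what the user actually means.
    intent = call_llm(
        "A user asked the question below. In one sentence, what is the "
        f"user actually trying to find out?\n\nQuestion: {user_prompt}"
    )
    # Step 2: let the model restate the question in wording it is more
    # familiar with (moving it into a better-aligned vector space).
    rewritten = call_llm(
        "Rewrite the following question so it is precise and unambiguous.\n"
        f"Question: {user_prompt}\nIntended meaning: {intent}"
    )
    # Step 3: answer the rewritten question instead of the raw prompt.
    return call_llm(rewritten)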
@code4AI · 22 days ago
Be advised that "Theory of Mind" applied to machines is a rather controversial topic. Serious views are presented here: "Theory of Mind: the ability to understand other people's mental states?" spectrum.ieee.org/theory-of-mind-ai
@toddbrous_untwist · 23 days ago
Thank you!
@BeOnlyChaos · 23 days ago
Would love to hear your thoughts on why LLMs do so badly on the ARC Prize.
@manslaughterinc.9135 · 23 days ago
This actually has a pretty simple answer. The ARC challenges are 2-dimensional grids, but LLMs work with 1-dimensional strings: they don't actually 'see' the line breaks, which are just special characters in the string. Even vision models kind of operate like this. They don't operate on segmentation; they are specifically designed to convert images to vectors, which are roughly equivalent to text embeddings. Basically, vision is just object recognition, and it doesn't do well at identifying the relations between objects. Think about the dataset they are trained on: images with descriptions. It's just an image-to-text engine.
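To make the flattening concrete, here is a minimal sketch of what a small ARC-style grid (colors encoded as digits; an illustrative encoding, not the competition's JSON format) looks like once serialized into the 1-D stream an LLM actually consumes:

grid = [
    [0, 0, 7],
    [0, 7, 0],
    [7, 0, 0],
]

# What a human sees: a 2-D anti-diagonal of 7s.
for row in grid:
    print(row)

# What the model sees: one flat string where '\n' is just another
# character, so "directly below" becomes "4 characters later" --
# the spatial structure is only implicit.
flat = "\n".join("".join(str(c) for c in row) for row in grid)
print(repr(flat))   # '007\n070\n700'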
@code4AI · 22 days ago
Please note that the comments posted here by different people might be factually incorrect. Do not rely on factual information given in a comment; these are only personal opinions and views, and they might be wrong. If you are interested in the ARC Prize, validate the data format here: www.kaggle.com/competitions/arc-prize-2024/data
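For anyone who wants to inspect that format themselves, a minimal sketch (the filename follows the 2024 competition's data page at the time of writing and may differ; adjust the path to your local download):

import json

# Each task id maps to "train" and "test" lists of input/output grids.
with open("arc-agi_training_challenges.json") as f:
    tasks = json.load(f)

task_id, task = next(iter(tasks.items()))
print("task:", task_id)
for i, pair in enumerate(task["train"]):
    rows, cols = len(pair["input"]), len(pair["input"][0])
    print(f"  train example {i}: {rows}x{cols} input grid")
print("  test inputs:", len(task["test"]))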
@danbolser5913 · 15 days ago
@@code4AI There goes my day...
@christiand6312 · 21 days ago
Can we have a Discord, please?
@christiand6312 · 21 days ago
Also, can I become good at JAX when aiming for parallelism improvements to reduce compute costs in training an LLM? Can you please explain how we can use an 8×H100 node for 6-12 hours and get a 70B model trained on some legal corpus data for a niche case? Is this actually possible? Would love to know.
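A rough feasibility check, not from the video: using the common ~6 FLOPs-per-parameter-per-token training estimate and an assumed ~30% hardware utilization (both rules of thumb), you can bound how many tokens 8×H100 can process in 12 hours:

# All numbers are rule-of-thumb assumptions, not measurements.
params = 70e9                  # 70B-parameter model
flops_per_token = 6 * params   # ~6 FLOPs per parameter per training token
peak_flops = 989e12            # H100 dense BF16 peak per GPU
mfu = 0.30                     # assumed model FLOPs utilization
gpus, hours = 8, 12

tokens = gpus * peak_flops * mfu * hours * 3600 / flops_per_token
print(f"~{tokens / 1e6:.0f}M tokens")   # roughly 240M tokens

# Pretraining needs trillions of tokens, so that is out of reach.
# Memory also rules out a full fine-tune: Adam states at ~16 bytes/param
# need ~1.1 TB for 70B, versus 8 x 80 GB = 640 GB of HBM.

So the realistic reading: not pretraining from scratch, but a parameter-efficient (LoRA/QLoRA-style) fine-tune of an existing 70B checkpoint on a legal corpus of a few hundred million tokens looks plausible in that budget.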
@danbolser5913 · 15 days ago
@@christiand6312 For this, you'll just have to try!
@christiand6312 · 15 days ago
@@danbolser5913 Hmm, ok. So it's not going to take a few days, I would assume? Like 6 epochs and we are done by lunchtime... I'm dreaming, right?
@christiand6312 · 15 days ago
@@danbolser5913 Thanks for your reply.