Stanford CS25: V4 I Jason Wei & Hyung Won Chung of OpenAI

104,759 views

Stanford Online

28 days ago

April 11, 2024
Speakers: Jason Wei & Hyung Won Chung, OpenAI
Intuitions on Language Models (Jason)
Jason will talk about some basic intuitions on language models, inspired by manual examination of data. First, he will discuss how one can view next-word prediction as massive multi-task learning. Then, he will discuss how this framing reconciles scaling laws with emergent individual tasks. Finally, he will talk about the broader implications of these observations. Slides here: docs.google.com/presentation/...
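To make the multi-task framing concrete, here is a minimal Python sketch (not from the talk; the tasks, sentences, and probabilities below are invented for illustration). Every training example is just next-token prediction, yet different examples implicitly exercise different skills, and a single loss averages over all of them.

```python
import math
from collections import defaultdict

# Toy "training set": (implicit task, context, true next token).
examples = [
    ("grammar",     "the cats on the mat",      "are"),    # subject-verb agreement
    ("world_facts", "the capital of france is", "paris"),  # factual recall
    ("arithmetic",  "two plus three equals",    "five"),   # simple math
    ("sentiment",   "the movie was boring and", "dull"),   # sentiment continuation
]

def toy_next_token_prob(context, token):
    """Stand-in for a language model: the probability it assigns to the true next token.
    A real LM would produce a distribution over the whole vocabulary."""
    made_up = {"are": 0.6, "paris": 0.4, "five": 0.2, "dull": 0.5}
    return made_up.get(token, 1e-9)

per_task = defaultdict(list)
for task, context, next_token in examples:
    loss = -math.log(toy_next_token_prob(context, next_token))  # cross-entropy term
    per_task[task].append(loss)

# One scalar objective that silently averages over qualitatively different tasks.
overall = sum(l for losses in per_task.values() for l in losses) / len(examples)
for task, losses in per_task.items():
    print(f"{task:12s} loss = {sum(losses) / len(losses):.3f}")
print(f"overall next-token loss = {overall:.3f}")
```

Under this view, the aggregate loss can improve smoothly with scale (scaling laws) while any one implicit task can cross a usefulness threshold abruptly, which is one way to reconcile smooth scaling curves with emergent abilities.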
Shaping the Future of AI from the History of Transformer (Hyung Won)
Hyung Won: AI is developing at such an overwhelming pace that it is hard to keep up. Instead of spending all our energy catching up with the latest developments, I argue that we should study the change itself. The first step is to identify and understand the driving force behind the change. For AI, it is exponentially cheaper compute and the scaling it enables. I will provide a highly opinionated view on the early history of Transformer architectures, focusing on what motivated each development and how each became less relevant with more compute. This analysis will help us connect the past and present in a unified perspective, which in turn makes it more manageable to project where the field is heading. Slides here: docs.google.com/presentation/...
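As one concrete illustration of structure that mattered less as compute grew, here is a minimal sketch (an assumption about which historical example to highlight, not taken from the slides): an encoder-decoder Transformer bakes in three hand-designed attention patterns, while a decoder-only model collapses them into a single causal rule applied to one token stream.

```python
import numpy as np

def causal_mask(n):
    # Decoder-only: one rule for every token, attend to itself and earlier positions only.
    return np.tril(np.ones((n, n), dtype=bool))

def encoder_decoder_masks(n_src, n_tgt):
    # Encoder-decoder: three separate attention patterns instead of one.
    enc_self = np.ones((n_src, n_src), dtype=bool)            # bidirectional over the source
    dec_self = np.tril(np.ones((n_tgt, n_tgt), dtype=bool))   # causal over the target
    dec_cross = np.ones((n_tgt, n_src), dtype=bool)           # target attends to all of the source
    return enc_self, dec_self, dec_cross

print(causal_mask(4).astype(int))        # single stream, single rule
for mask in encoder_decoder_masks(3, 2):
    print(mask.astype(int))              # more built-in assumptions to maintain
```

The point of the sketch is only the shape of the argument: the extra structure encodes a human assumption about the problem, and with enough compute a less constrained model can learn the useful parts of that assumption on its own.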
About the speakers:
Jason Wei is an AI researcher based in San Francisco. He is currently working at OpenAI. He was previously a research scientist at Google Brain, where he popularized key ideas in large language models such as chain-of-thought prompting, instruction tuning, and emergent phenomena.
Hyung Won Chung is a research scientist on the ChatGPT team at OpenAI. He has worked on various aspects of large language models: pre-training, instruction fine-tuning, reinforcement learning from human feedback, reasoning, multilinguality, parallelism strategies, etc. Some of his notable work includes the Flan scaling papers (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model. Before OpenAI, he was at Google Brain, and before that he received a PhD from MIT.
More about the course can be found here: web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: • Stanford CS25 - Transf...

Comments: 54
@yoesemiat 15 days ago
The fact that giving the model more freedom, with fewer inductive biases shaped by human subjectivity, actually improves performance is really illuminating. Thanks.
@jean-pierrecoffe6666 5 days ago
Nothing new under the sun, this is just the Bitter Lesson
@zyxbody 8 days ago
I don't understand anything, but I like how these people teach. May all get to understand the concepts, that's my only prayer.
@TrishanPanch 25 days ago
Outstanding. I teach an AI class and there are loads of great pedagogical nuggets here that I am going to borrow.
@ankitthawal1313 17 days ago
Can you explain what those are?
@lugia8888 14 days ago
Nice, a fake class.
@irshviralvideo 13 days ago
@@anshuraj4277 why bother going to college to learn?
@calm694 12 days ago
@@anshuraj4277 learn english first before making going to AI CS
@packsw9243 12 days ago
@@calm694 "before making going" yeah you're a real genius
@michaelbernaski7337 24 days ago
Excellent. First talk is practical. Second is profound. Thank you.
@ariG23498 19 days ago
He has his slides in his head! Loved the content.
@inforoundup9826 26 days ago
Great talks by both speakers
@izumskee 25 days ago
Really great talk. Thank you.
@sanesanyo 25 days ago
One of my favourite talks in recent times... learnt so much from this.
@ricopags 25 days ago
Really grateful for this being uploaded! Thank you to both speakers and to Stanford for the generosity. The highlight of the video for me is Hyung Won's sheepish refusal to get into predictions on the staying power/relevance of MoE or any specific architecture. It felt like a wasted question, since the premise of his talk is "tl;dr Sutton's Bitter Lesson"
@sady01 16 days ago
What an amazing lecture. It was simple, yet groundbreaking
@itsaugbog 4 days ago
Hilariously, Jensen Huang from NVIDIA just spoke in a fireside chat recently about how they're already dependent on AI and models for designing chips, so that last comment is already happening. Great talk.
@atdt01410x 23 days ago
This lecture is super useful. Really appreciate it.
@adamlin120 24 days ago
Great and inspiring talks
@Aditya-ri7em 7 days ago
He came and started teaching like a teacher.
@Faustordz 8 days ago
Very intriguing!
@laalbujhakkar 25 days ago
Thanks for all the extra popping into the mic during the intro brrrruh!
@MatijaGrcic 9 days ago
Amazing!
@zacharykosove9048 20 days ago
The students were asking some great questions, no wonder I don't go to Stanford
@roro5179 16 days ago
im the dude at the end (dont go to Stanford xd)
@mprone 16 days ago
Questions looked pretty naive to me. What's "great" about them to you?
@CrazyFoxMovies 25 days ago
Great lecture!
@lugia8888 14 days ago
All of this is BS 😂
@doinitlive3015 4 days ago
Types of leadership can be used as an analogy for getting higher performance with less structure. A leader with an authoritarian style increases the team's productivity but decreases its creativity, whereas a team under democratic leadership is able to solve problems with increased creativity, leading to innovative ideas.
@heyitsjoshd 20 days ago
How do we know what counts as small vs. large? For example, with emergent tasks, the talk highlights that more data could lead to more accuracy with enough compute. The small LM would not have seen accuracy improvements, but the large LM did. For the tasks currently described as flat, couldn't it just be that we don't have enough compute yet to know whether these tasks would get more accurate?
@gmccreight2 24 days ago
Thanks for the talk! Really interesting stuff. I had one question. At 1:04:00 Hyung suggests that uni-directional attention is preferable to bidirectional attention in turn-taking scenarios because it allows the reuse of calculated information in the KV cache. I'm trying to understand how this fits into his broader thesis that we should be moving towards more generic approaches. On the surface the use of the KV cache doesn't feel particularly generic. Does it make sense because masked self-attention is necessary for next token generation, anyhow, so using a causal attention mask universally makes sense?
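For readers following this question, a minimal numpy sketch of the caching mechanics (illustrative only; it is not the speaker's code and omits the layers, attention, and projections of a real Transformer): with causal attention, a token's keys and values depend only on that token and its predecessors, so when a new turn arrives they remain valid and new entries are simply appended; with bidirectional attention, earlier tokens' representations would change once later tokens appear and would have to be recomputed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_k = rng.normal(size=(d, d))   # key projection (made-up weights)
W_v = rng.normal(size=(d, d))   # value projection (made-up weights)

def keys_values(states):
    """Keys/values as a function of token states that, under a causal mask,
    never depend on tokens that come later."""
    return states @ W_k, states @ W_v

turn_1 = rng.normal(size=(5, d))         # states for the first conversation turn
k_cache, v_cache = keys_values(turn_1)   # computed once and stored

turn_2 = rng.normal(size=(3, d))         # a later turn arrives
k_new, v_new = keys_values(turn_2)

# Causal case: cached entries are untouched; only the new tokens cost compute.
K = np.concatenate([k_cache, k_new])
V = np.concatenate([v_cache, v_new])
print(K.shape, V.shape)                  # (8, 8) and (8, 8)
```

One reading of how this fits the broader thesis: the causal mask is the single rule that serves both pre-training and multi-turn inference, so adopting it everywhere removes a special case rather than adding one.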
@Lalala_1701 19 days ago
Andrew Ng also used the same kind of example to explain LMs.
@DanBillings 20 days ago
Please put the subject of the talk in the title. You can then market the OpenAI speakers.
@erebi8386 12 days ago
Stay strong, Hyung Won!
@dkierans 10 days ago
Yeah, this is a pretty great talk. It is quite hard to figure out at what technical level to hit the widest audience. This is nice. Not as nice as those flaxen locks though.
@aliwaheed906 5 days ago
Maybe the emergent behavior happens because, for that task to be learned, there is a set of prerequisite tasks that need to be learned first. Just brainstorming here.
@dodowoh3683 20 days ago
Surprised by the amount of hair an AI scholar may have retained.
@hedu5303 21 days ago
Strange world. This dude is almost a kid and gives a lecture
20 days ago
I am happy to learn from any kid :)
@chaidaro 18 days ago
His intuition is older than me
@vireyes1595 16 days ago
nah man gotta recognize game when you see it. dude’s a future titan of the industry and we’re out here getting his guest lecture for free. pretty solid win for all parties involved in my book
@SuperHeromindNsoul 14 days ago
True, we can all learn from each other, and the speakers here also learned from someone.
@MrAmgadHasan 13 days ago
Indeed. Many of the recent breakthroughs in ML were achieved by people in their 20s, mostly during or shortly after their PhDs.
@robertwilsoniii2048 8 days ago
Something that always bothered me was that adding in random terms increases predictive power, holding sample size constant (scaling compute without increasing data size). The problem is it decreases explanatory power and the ability to understand the individual contributions of each variable. It's like pop astrology, star signs -- Libra, Gemini, Leo, etc. -- adding extra variables improves scaling compute and predictability, but does it add anything to clarity? I suppose that to make predictions clarity doesn't matter. That always annoyed me.
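A small sketch of the statistical point in this comment (illustrative only; the "star sign" features are stand-ins for irrelevant variables): with ordinary least squares, adding random regressors can never lower the in-sample fit, even though they explain nothing about the underlying process.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)           # one real signal plus noise

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

print(r_squared(x, y))                           # fit with the real signal only
junk = rng.normal(size=(n, 30))                  # 30 irrelevant "star sign" features
print(r_squared(np.column_stack([x, junk]), y))  # in-sample fit never gets worse
```

Out of sample, of course, those junk features tend to hurt, which is the usual statistical counterweight to the concern raised here.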
@elcanmhmmdli3305 22 days ago
Azerbaijan❤
@rasen84 25 days ago
The second half is 100% wrong on the idea that scaling is what matters and that adding complexity to the model, i.e. inductive biases, bites you in the ass later. You're not considering the considerable amount of human labor allocated to data curation and handwritten instruction-tuning data. That is necessary because the model is too simple and too dumb. The model doesn't have the necessary inductive biases to intelligently take in any data. You need to add more inductive biases in order to obviate the need for human labor on data curation and creation.
@user-se3zz1pn7m 23 days ago
He is not talking about the immediate moment. He is discussing what kind of model would be preferable when there is an abundance of data and computing resources. He mentioned that due to the current limitations in computing resources, it's necessary to use models with some degree of inductive bias. Although he didn't say it explicitly, he probably thinks that models with inductive bias are also needed due to limitations in data. However, in the future, as more computing and data resources become available, models with less inductive bias will be better.
@rasen84 23 days ago
@@user-se3zz1pn7m what I'm saying is that the data collection, creation, and curation process should count towards model complexity and the scaling hypothesis. You could be removing complexity from the model and offloading that complexity to human data curators and creators.
@user-se3zz1pn7m 23 days ago
@rasen84, I believe we are on the same page. I agree with your point that "You could be removing complexity from the model and offloading that complexity to human data curators and creators." However, I think he is talking about the trends and the distant future, perhaps 10 years from now. Yes, if we remove complexity from the model and training methods, we will need more resources to compensate for the trade-off in data preparation. However, in the future, there may be a vast array of open-source data available and synthetic data generated through self-play approaches. Then, our goal will be to reduce assumptions in the model, give it more freedom, and make it bigger. I believe this is what he intended.
@hang_8169 15 days ago
@@rasen84 I would argue that even if you use the old method, which has more structure in it, you still need to spend the same amount of effort on data, if not more, to adhere to the structure that you impose on the model, because your model has MORE assumptions about the data it expects, not fewer.
@rasen84 5 days ago
@@hang_8169 then it’s time to add more inductive biases.