The Most Cited Paper of the Decade - Can We Learn from It?

53,325 views


“Adam: A Method for Stochastic Optimization” is one of the most highly cited papers ever published. Moreover, it was written in 2014 by two PhD students! Let’s see why it became so popular and whether we can learn anything from it.
A relevant video by @SimonClark • I read the top 100 sci...
00:00 Incredible number of citations
02:37 Authors of the “Adam” paper
04:40 What is the Adam method?
05:26 Let’s check the paper!
10:10 Can we learn from it?
10:38 YES
12:26 NO
13:52 Other highly cited papers
14:49 “Adam” is an unusual paper
16:02 Attitude to research
17:15 Other features
Andrey Churkin (Андрей Чуркин) 2024
andreychurkin.ru/

Comments: 88

@zuruumi9849 3 months ago

There are several points to learn from the most cited paper:
- Practical usability: make non-scientists want to use it (that's why breakthroughs aren't the most cited; they aren't immediately usable outside academia).
- Concise, readable: leave details that are non-essential for use in separate sections so they can be easily skipped (that's why related literature and proofs are separate).
- Graphs, images: if you want to say something important, make it into a graph or image; that's what people scanning through will actually notice.
- Advertise it: big conferences, cite/link/use it in open-source libraries, etc.
In other words, if you want to get lots of citations, don't write for academia. Write a manual that people on the periphery (not strictly working for universities etc.) can notice, read, understand, and easily use.

@MacProUser99876 2 months ago

Nice takeaways. I appreciate your insightful comment.

@go00o87 20 days ago

- It also needs to be on a topic many, many people care about.
- It also needs to be published at the right time, when it solves a pressing problem.

@Drudge.Miller 3 months ago

Make a paper about making papers and become the meta paper publisher 😂

@bilsid 2 months ago

Write a paper about citation and cite that paper in your paper, making it the first metacitation

@ClickBeetleTV 2 months ago

Worked for Andrew Garfield

@tobiasyoder 2 months ago

Pro that would be epic. Probably would go in a psychology journal?

@0xnika 3 months ago

You coincidentally stumbled across a fundamental fact: reviewing ML papers is quite a successful strategy for YouTube channels ;)

@SirGisebert 3 months ago

Two more points: First, the Adam paper entered a feedback loop, where its popularity resulted in a lot of deep learning tutorials on the internet mentioning it. Then a lot of people with no idea about optimization algorithms picked it because it was recommended in a tutorial, further increasing its citations. Second, the name is in the title. That wouldn't matter for lesser-known methods, but when you want to use the software implementation of a method you know nothing about (based on a tutorial you read), it is very easy to figure out which paper to cite when the name is in the title.

@KSayar 2 months ago

This reminds me of a really trivial "method paper" about using arrays of GPUs for parallel matrix multiplication. This paper, published at a conference, said that with a simple modification of mathematical operations, one can use NVDA GPU chips for parallel processing. Most academics would be embarrassed to even submit a paper like this to a conference with their names on it as authors. Yet the idea catapulted NVDA to a company worth 1.5 trillion dollars.

@incription 2 months ago

Now 2.17T

@RalphDratman 2 months ago

I'd like to know what you find potentially embarrassing about the paper. I'm not arguing -- I have no opinion on this topic -- I am just curious.

@KSayar 2 months ago

@RalphDratman It is a matter of theoretical depth. The basic idea of the paper is so trivial, so shallow, that a self-respecting academic wouldn't try to publish a paper about it. This is analogous to the notion of "embarrassingly parallel" computations. In parallel computing, parallelizing a computation to run on multiple processing elements (CPUs or GPUs) usually requires significant intellectual effort. Only highly creative researchers discover how to do it effectively, and they get to publish their ideas in highly respected journals that accept only 5-10% of submitted articles. Other computations are so easy to parallelize that effective methods are self-evident. There is even a name for them: "embarrassingly parallel computations." A self-respecting academic would be embarrassed to say that he invented a method to parallelize such computations, much less to submit a paper with his name on it as the author. To get a better idea about "embarrassingly parallel" computations, you can check the sources here. www.google.com/search?q=embarrassingly+parallel+problems&oq=embarrassingly+parallel&gs_lcrp=EgZjaHJvbWUqBwgBEAAYgAQyCQgAEEUYORiABDIHCAEQABiABDIHCAIQABiABDIHCAMQABiABDIHCAQQABiABDIHCAUQABiABDINCAYQABiGAxiABBiKBTINCAcQABiGAxiABBiKBdIBCDMwMDNqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8#ip=1 More specifically, this one: en.wikipedia.org/wiki/Embarrassingly_parallel#:~:text=In%20parallel%20computing%2C%20an%20embarrassingly,a%20number%20of%20parallel%20tasks.
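
To make the term "embarrassingly parallel" concrete, here is a minimal Python sketch. It is not taken from the paper mentioned above, and the helper names `row_times_matrix` and `parallel_matmul` are hypothetical. It relies on the fact that each row of a matrix product can be computed without knowing anything about the other rows, so the rows can simply be handed to independent workers.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, matrix):
    """Multiply one row vector by a matrix (given as a list of rows)."""
    n_cols = len(matrix[0])
    return [
        sum(row[k] * matrix[k][j] for k in range(len(row)))
        for j in range(n_cols)
    ]


def parallel_matmul(a, b, workers=4):
    """Compute a @ b by giving each row of `a` to an independent worker.

    No worker needs results from any other worker, which is what makes
    the problem "embarrassingly parallel".
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input rows.
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = parallel_matmul(a, b)
print(product)
```

Threads are used here only to keep the sketch short. In CPython, CPU-bound work like this would normally be sent to processes or to a GPU to get a real speedup, but the structure of the decomposition stays the same.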

@allinclusive169 1 month ago

Saying that this is "only" a method paper would be a great understatement. Firstly, because a lot of ML papers are "just method" papers. You develop a new method and test it on a set of well-known datasets to show that your method works better than others. Another factor in the adoption of Adam (which is now used basically everywhere, all the time, as the go-to optimization technique) was that it was really easy to implement in popular machine learning libraries, which some other optimizers were not. Also... It's simply a great idea wrapped in a very well written paper.

@TheCheesyNachos 3 months ago

Two things off the top of my head: 1. At 12:45, about the paper being a method paper. This is very typical in CS, where a paper will introduce a problem and then propose a method to solve it, rather than only making a discovery. It may also prove some result. 2. It is also worth mentioning that the original Adam paper had an incorrect proof that was eventually corrected (probably why its arXiv version was edited a few times).

@Diogenes-archiv 3 months ago

I never knew the second thing before. Thanks. I think the main factors are: 1. The field of ML is fast-paced and has been booming over the last decade. 2. GD-like optimization methods are universal in ML papers, so most ML papers will cite one. 3. Adam is implemented in famous ML libraries. So I think the real factor is that it is a successful optimization method that can be applied universally in the ML field.

@seanrrr 2 months ago

Your first point is common in science as well. I can't speak for physics, but at least in biology and chemistry, some of the most highly cited papers are methods and protocols. I suppose that results are only relevant for so long, while methods are cited every single time someone uses them.

@workforyouraims 3 months ago

It was the era of the machine learning boom. Previously, many fields used non-machine-learning techniques to model data, but from 2015 on many of those techniques are barely used anymore, since deep learning models, for example, can do all the intermediate steps of data extraction, and you do not need many layers of algorithms to model the data. You just need one. To be honest, I think luck is also a great factor. Of course these researchers are really smart and hardworking, etc., but it was the right time. It was an era of changing from pre-machine-learning methods to machine learning methods.

@chuscience 3 months ago

Interesting. Thanks for the comment!

@leohuang990 3 months ago

Having the Related Work section right before the Conclusion is common in computer sciences. This style may be specific to research subareas or simply advisors. I previously worked on crypto side channels and now on embedded system security. My previous advisor preferred "related work" as a subsection in the Introduction, while my current one prefers the other. I 100% agree with you about the advantage of method papers over discovery papers. However, it is not strange to see short but impactful papers in theoretical computer science. Hao Huang's paper in 2019 is an example. It is 5 pages long, the main body (a proof) is two pages long, and the math is at best graduate-level. But, it solves a 30-year-old conjecture of Boolean Sensitivity. The paper's value lies in its simplicity against a long-lasting problem.

@orbital1337 3 months ago

Luck is such a massive factor ultimately. The contribution of the Adam paper in terms of new ideas doesn't really stand out to me. It is a small improvement over previous approaches which themselves are just various ways of implementing momentum into stochastic gradient descent (a very intuitive concept). It's a nice paper for sure but there are probably thousands of papers which have more substantial new ideas and yet end up with like 10 citations. You have to have exactly the right idea at the right time in history to have big impact. Also, it is so field dependent. In certain fields like ML, there are just way more papers overall. Let's be honest, most of them have barely any novelty and just tweak existing methods a bit. They appear in lower tier conferences and that's it. And then for a lot of the top papers by the big companies like DeepMind, the main "novelty" is that they threw 10x the compute power on the problem compared to anyone else. Like obviously you get better results if you spend $100 million on GPU clusters.

@liam9519 2 months ago

It is luck in the sense that, by chance, the authors were the first to stumble upon this relatively simple and, in hindsight, obvious algorithm that just so happened to be the most robust deep learning optimizer out there for practically all use cases. As such it is the de facto choice for training basically any deep learning method and hence is cited in essentially *every* deep learning paper that is or will ever be released. It's almost as if they had stumbled upon the idea of 'matrix multiplication' and written a paper about it. That would also be a very highly cited paper.

@DivinesLegacy 1 month ago

Brutal black pill

@Militaizi 3 months ago

I think this highlights the importance of practicality in paper publishing and research. I use Adam, or nowadays its better-performing derivatives, almost daily, or at least weekly, in ML engineering. I must admit I no longer understand it 100%, but I have seen that it is the most robust optimization algorithm. I have only re-read two or three times the papers that I first read during my master's thesis. Other optimization techniques are many times harder to grasp.

@christiangreisinger2339 3 months ago

Geoffrey Hinton was the PhD supervisor of Jimmy Ba (one of the two Adam paper authors). I feel like this could have contributed heavily to his success, as Geoffrey is literally called "The Godfather of AI". He is one of the most cited people ever and has an enormous influence on the whole scientific community.

@ryyanoh 2 months ago

I am a current student at the University of Toronto who has audited Jimmy Ba's CS course and it is known that he helps many graduate students with their papers and since UofT is a fairly large university, he likely has a lot of opportunities to do so.

@weeb3277 3 months ago

I think you should start inviting guests who were cited numerous times.

@manueltiburtini6528 3 months ago

Good storytelling and narrative. I like your videos! Thanks

@boylanpardosi4586 1 month ago

The Systematic Review of Systematic Reviews: A Systematic Review

@john.darksoul 10 days ago

I absolutely love this paper. You can open it, rewrite the algorithm presented in your programming language of choice, and it just works :D
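
For readers who want to try exactly that, here is a minimal sketch of the Adam update rule in plain Python for a single scalar parameter. The function name `adam_minimize` and the toy objective are assumptions made for illustration. The default values for `lr`, `beta1`, `beta2`, and `eps` follow the defaults suggested in the paper, while the toy run below deliberately uses a larger step size so that it finishes quickly.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a function of one scalar variable, given its gradient."""
    x = x0
    m = 0.0  # running estimate of the first moment (mean) of the gradient
    v = 0.0  # running estimate of the second moment (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early estimates
        # are scaled up to compensate.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (hypothetical): f(x) = (x - 3)^2, with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, lr=0.01, steps=5000)
print(x_min)  # should end up close to 3
```

A real implementation would apply the same update elementwise to arrays of parameters and feed in stochastic mini-batch gradients, but the structure of the loop is the same.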

@Benforeva 3 months ago

The Related Work section showing up late in the paper is advice I’ve seen from CS researcher Simon Peyton-Jones. He has popular Microsoft Research talks on YouTube describing this format. As you noted, it allows the reader to dive into your own original content as quickly as possible.

@emperor4102 3 months ago

I love this channel. Great work, my friend

@gurneys13 3 months ago

I’m not even a research student and I love this channel. I want to make special note of how good you are at communicating your ideas. You said somewhere you are from Russia, and honestly, you speak better than most native speaking English professors

@osman7900 2 months ago

I have a friend who is a scientist at DeepMind. He says there are two criteria for measuring the performance of researchers. One is coming up with significant research ideas that may contribute to the development of a general AI, and the second is convincing fellow researchers to work on those ideas. So their success criteria are not quantitative but qualitative.

@flyatnight-1812 3 months ago

Just leaving a comment here, but I think this video was very long (even though I finished it, because you're nice to listen to and the topic is interesting). Surely you should present the things we can learn from the Adam paper FIRST, and then explain the backstory, check the paper, etc.

@kilogods 1 month ago

Wow man your videos are fantastic.

@giovannibarbarani464 3 days ago

Adam is very important. Everyone who works with DL knows that without Adam it won't work so well, or it will take ages to train, making it just impractical. It's a paper with many flaws (and a wrong proof), but its impact is unquestionable. However, RMSProp was about as good as Adam, and its authors did not even bother with a publication lol (it is cited from a blog post and a Coursera video).

@rmkky2470 2 months ago

My professor for my deep learning course is one of the authors of this paper. Pretty cool

@mehmetcansoyluoglu9575 20 days ago

It's a nice video. I would like to highlight one point that I believe should also be looked at, especially since the video compares the authors' numbers of publications in the year the Adam paper came out. I believe the number of publications per author could also be evaluated for the years before the Adam paper. The question is how intensely they focused on this project before publishing it, and whether we can trace that through the number of papers they published beforehand. If they really focused on one big project, it should show up in their publication counts up to the Adam paper. So I believe this would be another interesting aspect to look at.

@osman7900 2 months ago

It is not surprising that method papers are cited more. Discoveries may be interesting or important but in practice it is the methods that have more impact. You can take a method and use it in a meaningful application across a variety of disciplines but what can you do with a discovery?

@matteodonati5113 2 months ago

Bro, ResNet’s paper (2015) has 200k citations

@yuanfongsu 2 months ago

Correct. scholar.google.com.tw/citations?view_op=view_citation&hl=zh-TW&user=DhtAFkwAAAAJ&citation_for_view=DhtAFkwAAAAJ:ALROH1vI_8AC

@BigBaibars 1 month ago

Can you elaborate more on 15:50, where you said that Google facilities aid in general research processes? It's very interesting. Great video

@chuscience 1 month ago

Hi! It is an interesting phenomenon indeed. In this particular case, if you check the CVs of Durk Kingma or Jimmy Ba, you will see that they did an internship/fellowship with Google around 2014-2015, that is, during their PhD studies. I don't know all details, obviously. But probably Google had a significant influence on their research. Many other famous papers in the field (for example, "Attention Is All You Need") are fully or partially affiliated with Google. Again, I don't have much experience here. But it seems that getting an internship (or any collaboration) with companies like Google can boost your academic career. Andrey

@kanalprobny1927 3 months ago

Maybe this is such a highly cited paper, also because the proposed method is very good?

@surters 3 months ago

I looked up two acquaintances of mine who I knew had published. There was a difference between them similar to the one mentioned in this video, maybe for the same reasons.

@ai._m 14 days ago

The answer is very simple: because AI! It’s a go-to method in the most powerful new approach that we have, an approach that is applicable to everything.

@user-bj8lt3xs3e 3 months ago

I just reacted to your video from last year... same subject ;-)

@chivoronco4853 3 months ago

I can't help it, but this guy looks like Sheldon from The Big Bang Theory 13:04

@GoldenBeholden 3 months ago

While the sheer number of citations is still surprising, the seemingly unorthodox setup of the paper itself certainly isn't if you come from a computer science background. The truly fundamental papers tend to be more mathematical in nature, but many seemingly primarily serve as a more formal documentation of source code. Since software can be so quickly iterated upon, papers within the field are at their most useful if you can read and replicate them in your own work within an afternoon.

@alexandermuller3992 3 months ago

You forgot to mention that the proof of convergence is incorrect :p

@asandhu28 3 months ago

Nice video

@dmitriibochenko4618 2 months ago

Soft kitty, Warm kitty, Little ball of fur. Happy kitty, Sleepy kitty, Purr Purr Purr

@nataliemreow 26 days ago

what

@alvaromarcoperes8273 5 days ago

@nataliemreow From the TV show The Big Bang Theory

@user-hd3pz2ow1b 2 months ago

thanks

@mitchumsport 2 months ago

right place right time

@johnsmith1953x 2 months ago

*Special Relativity is not a highly cited paper because it's common knowledge.* It goes beyond the citation realm, like Newton's laws, Schrödinger's equation, Maxwell's equations, etc. Nobody cites those anymore because they are bigger than citations.

@SugarBoxingCom 3 months ago

U either care about the science (screw citations and impact factor) ... Or about money. Can't sit on two chairs at the same time

@sergeipravosud1848 3 months ago

Wow, this number of citations looks almost too good to be true! Even Lotfi A. Zadeh, who introduced fuzzy logic in his 1965 paper, has roughly 140k citations according to Google Scholar.

@jordanlin4437 2 months ago

Not to be nit-picky but the 2015 ResNet paper (Deep Residual Learning for Image Recognition) is cited over 200k times and is more widely considered *the* most cited recent paper. (I don’t know if you clarify this later in the video but I had to type this because of the title :P)

@vzxvzvcxasd7109 2 months ago

"Attention Is All You Need" (the Transformer paper) was published less than 10 years ago and has been cited 110k times

@ikhtiyor 6 days ago

Yeah, Adam is a less sensitive version of Adagrad

@user-le1oc9js4h 2 months ago

Haha, Churkin. Jokes aside, nice content :)

@weeb3277 3 months ago

looks like some papers go viral, or equivalent of that. wait till AI starts publishing papers.

@igors.9420 3 months ago

Andryukha, you seem to have lost some weight over there in your looking-glass world

@ejuss4216 2 months ago

KennyS brother

@reneguluscornes 3 months ago

10:38 Just skipped randomly, I got my answer 😂 Edit: Twice now.

@3a146 3 months ago

Sir Humphrey: Yes and No.

@xenonmob 3 months ago

T490s?

@chuscience 3 months ago

T14s

@koluj192 3 months ago

bro aint no way dudes surname is churkin

@RobertOSullivan 2 months ago

we use it in machine learning

@user-kw5qv6zl5e 1 month ago

Probably worked out how to add two numbers together in the most obtuse way ever thought. ..so after that ..people who never thought about it that way...hey I've got another idea...I'll add 1 to that one and subtract 1 from the other..BUT I'll cite the original. ...see where we are going here ? ...a citation is more than likely an acknowledgement of lack of original thought....

@user-kw5qv6zl5e 1 month ago

Put it this way...Diego Maradona didn't need to go to an Institute of Sport to learn how to play soccer. ...he was a genius on the pitch a natural ...(forget his off field)...in other words... no citations required...it was obvious....

@mehmetufukdalmis 3 months ago

And there is a mistake in the paper. The bug in the proof was found years later.

@RexPilger 3 months ago

Can you provide a link to the source for your comment?

@dagon4755 2 months ago

@RexPilger arxiv.org/pdf/1904.09237.pdf @ChuScience

@BobHank2 3 months ago

The sunk cost of your PhD is the cause of your bias. The bias is that academics is currently significant. It's not. The University Industrial complex profits from tuition. Tuition at high rated universities has an inelastic price curve. The ever rising profits are not shared with the highly educated laborers. They are economically dependent useful idiots. The profits are invested into alumni preferred areas like sports and football. You know this is true for literature and philosophy depts. It's now true for the science depts.

@v-ba 3 months ago

Wait what, alumni prefer sports and football? I'm not from the US so that sounds wild to me

@mapleveritas2698 2 months ago

Graduate studies provide the time and space to think. That is the value of it. Now, most university administrators and indeed most professors don't think that. And apparently you don't think that either. That is fine. The point is that advances tend to come from people who have the time to think and do wild things, profit be damned. In universities or corporations. So, it is rather difficult to judge the value of such a place. For example, what is the value of Bell Labs in the old AT&T? Was it more valuable than Google right now? If you look at the bottom line, of course it was not. But if you looked at what they accomplished, that is a very different story. It is not always about the cost. I have a PhD but I did not become a professor. Wanting to have time and space to think and wanting to be a professor are two different things. I don't care if most people getting PhD want to be a professor. You don't have to, and I did not become one.