New xLSTM explained: Better than Transformer LLMs?

5,521 views

code_your_own_AI

A day ago

Just days ago, a new alternative to transformer LLMs was published: xLSTM, and in particular its mLSTM variant. The Matrix Long Short-Term Memory (mLSTM) network is a variation of the traditional Long Short-Term Memory (LSTM) model. Its core idea is a matrix memory that accumulates key-value outer products (a covariance-style update rule), combined with exponential gating. I explain it in detail in this video and compare it to the classical attention mechanism.
The actual performance can't be independently evaluated at the moment, since the research paper was just published. I will keep you informed.
mLSTM differs from the classical LSTM in its memory: the cell state is a matrix rather than a vector. At each step the input is projected into a query, a key, and a value. The outer product of value and key is added to the matrix memory, scaled by the input gate, while the forget gate decays the previous memory. In the paper's formulation the input and forget gates are scalars and the output gate is a vector, so it is the memory, not the gates, that becomes a matrix. mLSTM also drops the hidden-to-hidden recurrent connections of the classical LSTM, which allows a sequence to be processed in parallel during training.
The main benefit of the matrix memory is storage capacity: a d x d matrix can hold many key-value associations at once, whereas a vector cell state has to compress everything into d numbers. Stored values are retrieved by multiplying the memory with a query, which lets the network model interactions across different features of the input. The authors present this as an increase in representational power, which matters for high-dimensional sequence tasks such as natural language processing and multivariate time series analysis. Updating a d x d matrix costs more per step than updating a vector, so any efficiency gain comes from the parallel training, not from cheaper updates.
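To make the matrix memory concrete, here is a minimal NumPy sketch of one mLSTM step in its recurrent form, written from my reading of the paper and not taken from the authors' code. The names `mlstm_step`, `init_params`, and `run_sequence`, the parameter dictionary, and the weight scale are all assumptions for illustration. Biases, multiple heads, and the stabilizer state that the paper uses to keep the exponential gate numerically safe are left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_in, d, seed=0, scale=0.1):
    """Small random weights (hypothetical initialisation, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "W_q": rng.normal(0.0, scale, (d, d_in)),  # query projection
        "W_k": rng.normal(0.0, scale, (d, d_in)),  # key projection
        "W_v": rng.normal(0.0, scale, (d, d_in)),  # value projection
        "W_o": rng.normal(0.0, scale, (d, d_in)),  # output gate (vector)
        "w_i": rng.normal(0.0, scale, d_in),       # input gate (scalar)
        "w_f": rng.normal(0.0, scale, d_in),       # forget gate (scalar)
    }


def mlstm_step(x, C, n, p):
    """One recurrent step. C is the (d, d) matrix memory, n the (d,) normalizer."""
    d = C.shape[0]
    q = p["W_q"] @ x
    k = (p["W_k"] @ x) / np.sqrt(d)
    v = p["W_v"] @ x

    i = np.exp(p["w_i"] @ x)    # exponential input gate, a scalar
    f = sigmoid(p["w_f"] @ x)   # forget gate, a scalar
    o = sigmoid(p["W_o"] @ x)   # output gate, a vector

    # Covariance-style update: decay the old memory, add the new outer product.
    C = f * C + i * np.outer(v, k)
    n = f * n + i * k

    # Retrieve with the query, normalise, then apply the output gate.
    h = o * (C @ q) / max(abs(n @ q), 1.0)
    return h, C, n


def run_sequence(xs, d, p):
    """Apply mlstm_step over a sequence xs of shape (T, d_in), starting from zeros."""
    C, n = np.zeros((d, d)), np.zeros(d)
    hs = []
    for x in xs:
        h, C, n = mlstm_step(x, C, n, p)
        hs.append(h)
    return np.stack(hs)
```

The product `C @ q` is where the comparison with attention shows up: the memory is a gated sum of value-key outer products, so querying it weights each stored value by the dot product of its key with the query. Unlike attention there is no softmax over past positions, and the state stays at a fixed d x d size no matter how long the sequence is.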
All rights with the authors:
xLSTM: Extended Long Short-Term Memory
arxiv.org/pdf/...
#airesearch
#ai
#newtechnology

Comments: 13
@propeacemindfortress 3 months ago
nice, my favorite timeseries staple gets an upgrade 😄 awesome find, and big big thanks for sharing
@first-thoughtgiver-of-will2456 2 months ago
this just makes me want to innovate off mamba
@wiktorm9858 3 months ago
Is there a ready-made PyTorch implementation of this?
@timothywcrane 3 months ago
I hope this resets the audio industry as well. LSTMs are great for melody prediction etc. I wonder how this new modeling will be applicable and expandable in scope.
@Dom-zy1qy 3 months ago
I haven't had much luck creating a good model to predict melodies. Any resources you recommend?
@timothywcrane 3 months ago
@Dom-zy1qy check out @ValerioVelardoTheSoundofAI
@denishclarke4470 2 months ago
Hey, please provide the slides
@davidhauser7537 3 months ago
very cool
@SergiiNechuiviter 2 months ago
Overcomplicated explanation. Too many formal definitions, which really don't add to comprehensibility.
@thedoctor5478 3 months ago
whoa whoa. did you forget to say a little something at the beginning of the video?
@thomasmitchell2514 3 months ago
Hahaha my wife rolls her eyes when I say it along with him after gleefully clicking on a new upload 😅 Also I can’t help echoing “beautiful” out loud even with headphones on 😂
@JonathanYankovich 3 months ago
He said it :)
@user-wd8wx5md5z 3 months ago
@thomasmitchell2514 What are you all talking about? What is the funny part? All I see is machine learning stuff...