JUST days ago a new alternative to Transformer LLMs was published: xLSTM, and in particular its mLSTM cell. The Matrix Long Short-Term Memory (mLSTM) network is an advanced variant of the traditional Long Short-Term Memory (LSTM) model. The core idea of mLSTM is a covariance-style update rule for a matrix memory, combined with exponential gating functions. I explain it in detail in this video and compare it to the classical attention mechanism.
The actual performance can't be independently evaluated at the moment, since the research paper was just published. I will keep you informed.
mLSTM differentiates itself through its matrix memory: the scalar memory cell of the classical LSTM is replaced by a matrix that is updated with outer products of value and key vectors, while exponential input and forget gates control how strongly new information is written and how quickly old information decays. This configuration lets the mLSTM store and retrieve information using matrix operations, enabling a richer interaction between inputs and the network's internal state than a single scalar cell allows.
One of the most significant innovations of mLSTM is its ability to capture and represent more complex relationships and dependencies within the data. By storing key-value associations in a matrix memory, mLSTM can encapsulate relationships across multiple dimensions of the input simultaneously, increasing the network's representational power; because the update contains no hidden-to-hidden recurrence, it can also be parallelized during training. This makes it attractive for high-dimensional tasks such as natural language processing and multivariate time series analysis. The matrix memory not only deepens the data interaction within each cell of the network but also allows the network to model interactions across different features within the data.
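To make the covariance update concrete, here is a toy NumPy sketch of one mLSTM memory step, following the update equations in the paper (matrix memory C, normalizer state n, key-scaled retrieval). The gate values are assumed to be precomputed scalars here; in the real model they come from learned, exponentially activated projections of the input.

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    """One mLSTM memory update (covariance rule).
    C: (d, d) matrix memory; n: (d,) normalizer state.
    q, k, v: (d,) query/key/value vectors.
    i_gate, f_gate: scalar input/forget gate values (assumed precomputed)."""
    d = k.shape[0]
    k = k / np.sqrt(d)                         # scale key, as in attention
    C = f_gate * C + i_gate * np.outer(v, k)   # covariance update: store v at key k
    n = f_gate * n + i_gate * k                # normalizer tracks written key mass
    h = C @ q / max(abs(n @ q), 1.0)           # retrieve with query, normalize
    return C, n, h

# Toy usage: write one key/value pair, then read it back with the query.
rng = np.random.default_rng(0)
d = 4
C, n = np.zeros((d, d)), np.zeros(d)
q, k, v = rng.normal(size=(3, d))
C, n, h = mlstm_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9)
```

Note the contrast with attention: instead of keeping all past keys and values and comparing the query against each of them, mLSTM compresses them into a fixed-size matrix, so the per-step cost stays constant in sequence length.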
All rights w/ authors:
xLSTM: Extended Long Short-Term Memory
arxiv.org/pdf/...
#airesearch
#ai
#newtechnology