A Very Simple Transformer Encoder for Time Series Forecasting in PyTorch

3,340 views

Let's Learn Transformers Together


1 day ago

The purpose of this video is to dissect and learn about the "Attention Is All You Need" transformer model by using bare-bones PyTorch classes to forecast time series data.
Code Repo:
github.com/BrandenKeck/pytorc...
Very helpful:
github.com/oliverguhr/transfo...
github.com/ctxj/Time-Series-T...
github.com/huggingface/transf...
Attention Is All You Need:
arxiv.org/pdf/1706.03762.pdf
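To make the setup concrete, here is a minimal sketch of the kind of encoder-only model the video describes: a per-point linear embedding, positional embeddings, an nn.TransformerEncoder, and a linear forecasting head. The class name, layer sizes, and the learned positional embedding are illustrative assumptions, not the exact code in the repo.

```python
import torch
import torch.nn as nn

class SimpleTimeSeriesTransformer(nn.Module):
    """Encoder-only transformer for univariate forecasting (illustrative only)."""

    def __init__(self, d_model=64, nhead=4, num_layers=2, seq_len=48, horizon=1):
        super().__init__()
        # Each scalar observation is projected to a d_model-dimensional vector
        self.embed = nn.Linear(1, d_model)
        # Learned positional embeddings for positions 0..seq_len-1
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Map the encoding of the last position to `horizon` future values
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):
        # x: (batch, seq_len) of raw observations
        z = self.embed(x.unsqueeze(-1)) + self.pos   # (batch, seq_len, d_model)
        z = self.encoder(z)                          # (batch, seq_len, d_model)
        return self.head(z[:, -1, :])                # (batch, horizon)

model = SimpleTimeSeriesTransformer()
forecast = model(torch.randn(8, 48))                 # -> (8, 1)
```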

Comments: 9
@mohamedkassar7441 6 days ago
Thanks!
@thouys9069 1 month ago
nice man! it's these case studies that really generate insight. good stuff
@lets_learn_transformers 1 month ago
Thank you!
@jeanlannes4522 1 month ago
Hello man, great videos. Really helpful links. I have a question: do you pass every time series datapoint (for every single batch) through a linear layer? What is the intuition behind this "dimension augmentation", if I may call it that? I see a lot of Conv1D being used and am trying to understand how to perform a good embedding. I feel like most papers on TSF with transformers aren't clear on this matter.
@lets_learn_transformers 1 month ago
Hi @jeanlannes4522 - thank you! You are correct: each element of each time series is embedded "individually". Conv1D may be a better embedding approach for many (possibly most/all) problems. I used the linear approach because it was easy for me to understand, as it is almost an exact analog of word embedding with PyTorch's nn.Embedding() layer. The intuition (as far as I understand it) is that the model learns a vector representation for each individual "datapoint". When the datapoints are words in an NLP problem, these vectors are a great measure of similarity between two words. For a problem with continuous data this makes less sense, because you could just as easily measure similarity with the simple distance between two points. So when the Linear layer learns that something like 0.55 and 0.56 are similar, it's not as meaningful. One could argue that Conv1D performs a similar task, but it considers neighboring values in the embedding process, so it could generate "smarter" embeddings, e.g. 0.55 on an "increasing trajectory/slope" is different from 0.55 on a "decreasing trajectory/slope". This is something that I may try on my own now that you mention it! Do you mind sharing any sources where this is used, if you have them on hand?
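For reference, here is a sketch of the two embedding options discussed in this reply, with illustrative tensor shapes and a kernel size that are assumptions rather than the repo's settings: a per-point nn.Linear projection versus an nn.Conv1d that also sees neighboring values.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 48)   # (batch, seq_len) of raw values
d_model = 64

# (a) Per-point embedding: each scalar is projected independently,
#     roughly analogous to nn.Embedding() for word tokens.
point_embed = nn.Linear(1, d_model)
z_linear = point_embed(x.unsqueeze(-1))                  # (8, 48, 64)

# (b) Conv1D embedding: each position's vector also depends on its
#     neighbours, so 0.55 on an increasing slope can embed differently
#     from 0.55 on a decreasing slope.
conv_embed = nn.Conv1d(in_channels=1, out_channels=d_model, kernel_size=3, padding=1)
z_conv = conv_embed(x.unsqueeze(1)).transpose(1, 2)      # (8, 48, 64)
```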
@jeanlannes4522 1 month ago
@@lets_learn_transformers Thanks for your answer. There is a philosophical question that remains: if every word has a meaning, does a single datapoint of a time series have one too? Or only a sequence of these datapoints? Should you tokenize your time series at the single-datapoint scale, or at the scale of a few points to capture a little meaning (like a pattern: increasing, flat, decreasing, volatile, etc.)? But then how do you compress your data? The question of multivariate time series also remains (what if we have p features, p > 1?). One could argue that some words taken alone do not have a "meaning" (it, 's, _, ', .)... It is a difficult question. To get back to what you are doing: are you training the weights of your nn.Linear(1, embed_size) with the big transformer backprop? Just to make sure I understand what you are doing. I am not sure augmenting the dimension of a single datapoint makes sense; I really think you have to work with sub-windows of the original time series. But who knows... I believe Conv1D is interesting too. I don't know if one is allowed to leak future neighboring values, but at least the past values can add meaning to the datapoint embedding, as you say: an "increasing trajectory" added to a given value. The first time I read about this being used was in "MTS-Mixers: Multivariate Time Series Forecasting via Factorized Temporal and Channel Mixing" and "Financial Time Series Forecasting using CNN and Transformer".
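On the concern about leaking future neighboring values: one standard way to avoid it is a causal convolution that pads only on the left, so each position's embedding depends on past and present values alone. The sketch below illustrates that idea; the class name and sizes are made up for illustration and are not taken from the video or the repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvEmbedding(nn.Module):
    """Conv1D embedding that only sees past/present values (no future leak)."""

    def __init__(self, d_model=64, kernel_size=3):
        super().__init__()
        self.left_pad = kernel_size - 1              # pad on the left only
        self.conv = nn.Conv1d(1, d_model, kernel_size)

    def forward(self, x):                            # x: (batch, seq_len)
        x = x.unsqueeze(1)                           # (batch, 1, seq_len)
        x = F.pad(x, (self.left_pad, 0))             # causal: no right padding
        return self.conv(x).transpose(1, 2)          # (batch, seq_len, d_model)

z = CausalConvEmbedding()(torch.randn(8, 48))        # -> (8, 48, 64)
```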
@lets_learn_transformers 1 month ago
@@jeanlannes4522 I completely agree - thank you for a great discussion. The nn.Linear weights are trained via backprop upstream from the Transformer Encoder. It's possible this behaves OK only because I'm using a very small Transformer; the linear layer might be far too simple with a larger model. I ran some experiments on the sunspots data and found the two approaches comparable - but since I'm not going in depth with hyperparameters or early stopping, it's hard to tell how good the results are. Do you mind if I make a short follow-up video about this discussion? Would you like your name included / not included in the video?
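To illustrate the "trained via backprop upstream from the Transformer Encoder" point: because the embedding is an ordinary nn.Linear module, its weights sit in the same parameter list as the encoder and are updated by the same optimizer step. The layer sizes and dummy data below are illustrative assumptions, not the repo's settings.

```python
import torch
import torch.nn as nn

# Stand-in pipeline: Linear embedding -> TransformerEncoder -> linear head.
d_model = 32
embed = nn.Linear(1, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
head = nn.Linear(d_model, 1)

params = list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(8, 48), torch.randn(8, 1)           # dummy window / next-step target
optimizer.zero_grad()
pred = head(encoder(embed(x.unsqueeze(-1)))[:, -1, :])
loss = loss_fn(pred, y)
loss.backward()          # gradients flow through the encoder...
optimizer.step()         # ...and update the embedding weights in the same step
```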
@isakwangensteen6577 1 month ago
When you say you extended the forecasting window, do you mean that the model now outputs more time step predictions, or are you still just predicting one timestep into the future and unrolling the model for more days?
@lets_learn_transformers 1 month ago
Hi @isakwangensteen6577 - sorry for the lack of clarity. I mean that the model now outputs more time step predictions!
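For readers following along, here is a sketch contrasting the two interpretations raised in the question: a direct multi-step head (what the reply says is being done) versus autoregressive unrolling of a one-step model. The `one_step_model` in the helper is a hypothetical one-step forecaster, not code from the repo.

```python
import torch
import torch.nn as nn

# (a) Direct multi-step: widen the output head so one forward pass emits
#     `horizon` future values at once (the approach described in the reply).
horizon, d_model = 7, 64
multi_step_head = nn.Linear(d_model, horizon)     # encoded state -> 7 predictions

# (b) Autoregressive unrolling: keep a one-step model and feed each
#     prediction back in as the newest observation.
def unroll(one_step_model, window, horizon):
    window = window.clone()                       # (batch, seq_len)
    preds = []
    for _ in range(horizon):
        step = one_step_model(window)             # (batch, 1) one-step forecast
        preds.append(step)
        window = torch.cat([window[:, 1:], step], dim=1)   # slide the window
    return torch.cat(preds, dim=1)                # (batch, horizon)
```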