Рет қаралды 63,607
Swin Transformer paper explained, visualized, and animated by Ms. Coffee Bean. Find out what the Swin Transformer proposes to do better than the ViT vision transformer.
📺 ViT explained: • An image is worth 16x1...
📺 Transformer explained: • The Transformer neural...
📺► Positional embeddings (playlist): • Positional encodings i...
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
donor, Dres. Trost GbR, Yannik Schneider
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Paper discussed:
📜 Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. "Swin transformer: Hierarchical vision transformer using shifted windows." arXiv preprint arXiv:2103.14030 (2021). arxiv.org/abs/2103.14030
💻 Swin Transformer code on GitHub: github.com/microsoft/Swin-Tra...
Outline:
00:00 Problems with ViT / Swin Motivation
04:16 Swin Transformer explained
06:00 Shifted Window based Self-attention
08:58 positional embeddings in the Swin Transformer
09:29 Task performance of the Swin Transformer
Music 🎵 : Bay Street Millionaires by Squadda B
---------------------
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
KZfaq: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video and thumbnail contain emojis designed by OpenMoji - the open-source emoji and icon project. License: CC BY-SA 4.0 16x16 pixels comprehensible artificial intelligence