💡 There is a SMARTER way to split your documents for GenAI apps

Views: 1,488

Bitswired

1 day ago

Learn semantic splitting in this hands-on tutorial to improve your language model's performance on document processing tasks.
We dive into a practical Python implementation for finding optimal segmentation points by meaning, essential for retrieval-augmented generation.
Code along with me following the GitHub-hosted notebook and elevate your app's efficiency with this smart splitting strategy.
GitHub Repo: github.com/bitswired/semantic...
🌐 Visit my blog at: www.bitswired.com
📩 Subscribe to the newsletter: newsletter.bitswired.com/
🔗 Socials:
LinkedIn: / jimi-vaubien
Twitter: / bitswired
Instagram: / bitswired
TikTok: / bitswired
00:00 Why Do We Split Documents?
02:02 Semantic Splitting: The Theory
05:06 Semantic Splitting: The Practice
11:28 Takeaways
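
The idea of finding segmentation points by meaning, as described above, can be sketched roughly as follows. This is a minimal illustration and not necessarily the approach used in the video's notebook: `embed` is a toy bag-of-words stand-in for a real embedding model, and `semantic_split` and its `threshold` parameter are hypothetical names chosen for this example. The sketch starts a new chunk wherever two adjacent sentences look dissimilar.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_split(text: str, threshold: float = 0.1) -> list[str]:
    """Group consecutive sentences; break where adjacent similarity drops below threshold."""
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # meaning shifted: start a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


text = (
    "Cats purr when happy. Cats sleep most of the day. "
    "Rust has a borrow checker. Rust prevents data races."
)
for chunk in semantic_split(text):
    print(chunk)
```

With a real embedding model in place of `embed`, the threshold would need tuning on your own documents; the value used here only suits the toy word-count vectors.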

Comments: 15
@natevaub 1 month ago
Great video bro, keep going with these fire topics!
@bitswired 1 month ago
Thanks frero 💪🏽 Let’s gooooo! Let’s make it work and play Elden Ring soon ahah
@HassanAllaham 28 days ago
This is one of the most powerful AI-related videos I have ever seen. Very clear, very informative, and very useful. Thanks for the good content 🌹🌹🌹
@bitswired 27 days ago
Thank you very much for your kind words! It means a lot to hear that the video had such a positive impact on you and it makes all the effort worth it. Thanks again for watching and for taking the time to leave such a thoughtful comment 👍🏽
@cyberpunkdarren 7 days ago
Once all the vectors are loaded into the vector database, the text splitting no longer matters. As long as you don't split on a compound word or phrase, it doesn't really affect the vector space.
@bitswired 7 days ago
Hey :) I see your point, but I would say that in practice that’s not the case. For instance, if you embed an entire page versus multiple smaller paragraphs, the resulting vectors will be different even though you’ve indexed the same text, and that affects the similarity search. That’s why pyramidal embeddings are a way to improve RAG performance: they index the data at different precision levels and use multiple indexes to answer queries.
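
A small sketch of the point made in this reply, under loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, and the paragraphs and query are made up for illustration. It compares the similarity of a query against one vector for the whole page and against one vector per paragraph.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical document: two paragraphs on unrelated topics.
paragraphs = [
    "Cats purr when they are happy.",
    "Rust has a borrow checker.",
]
page = " ".join(paragraphs)  # the same text, indexed as a single unit
query = embed("Why do cats purr?")

page_score = cosine(query, embed(page))
paragraph_scores = [cosine(query, embed(p)) for p in paragraphs]

print("whole page:", round(page_score, 3))
print("per paragraph:", [round(s, 3) for s in paragraph_scores])
```

In this toy setup the matching paragraph scores higher than the whole page, because the page vector is diluted by the unrelated paragraph. Real embedding models behave differently in detail, but the same text indexed at different granularities still produces different vectors.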
@vogendo7377 1 month ago
Very interesting
@bitswired 1 month ago
Thanks big boss ❤️
@mariegautier3765 1 month ago
Love it ❤ You know how to transmit your passion, congrats 😍🦍🔥
@bitswired 1 month ago
Thanks Bella ❤️🦍🐆 EKIP all the way!
@oryxchannel 1 month ago
Good presentation, but I do not understand how it's different from document AIs that can do this automatically. Why do this manually?
@bitswired 1 month ago
Hey :) You’re right, there are libraries that do it for you. However, the purpose of the video was to understand how it works in depth, so I proposed a simple implementation from scratch. The goal was to help people grasp the concept. I hope you still enjoyed the video 😁
@MichaelScharf 5 days ago
Great video! But totally annoying music
@MichaelScharf 5 days ago
It makes it hard to understand you and it distracts from your great work
@MichaelScharf 5 days ago
If your video content were not so great, I would have stopped watching