Thanks for the good sample! Be aware of its limitations, though. split_text_into_chunks() has a naive implementation. What if the 4000th position falls in the middle of a word, or even a sentence? What if that word or sentence is pivotal to the meaning of the text? The split should be smart enough to: 1. Jump to the 4000th position. 2. Check whether it is the end of a paragraph (or sentence, or word). 3. If not, shrink the chunk from the end until it reaches the end of a paragraph. 4. Treat that as the end of the chunk. This easy change should make the implementation more reliable. Another limitation of this approach is worth disclosing: what if the PDF contains pictures that the text refers to? We should probably switch to a multimodal generative AI, like Bard, or something similar.
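The four steps above could be sketched roughly like this. This is only an illustration, not the script from the video: the function name is reused from the comment, while the `max_chars` parameter, the assumption that the limit is counted in characters, and the list of separators are all hypothetical choices.

```python
def split_text_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters,
    preferring to cut at a paragraph, sentence, or word boundary."""
    # Assumed boundaries, from most to least preferred.
    separators = ["\n\n", ". ", " "]
    chunks = []
    start = 0
    while start < len(text):
        # Step 1: jump to the size limit.
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Steps 2-3: back up to the last boundary inside the window.
            for sep in separators:
                cut = text.rfind(sep, start, end)
                if cut > start:
                    end = cut + len(sep)
                    break
            # If no boundary was found, fall back to a hard cut at the limit.
        # Step 4: close the chunk here.
        chunks.append(text[start:end])
        start = end
    return chunks
```

Joining the returned chunks gives back the original text unchanged, and no chunk exceeds `max_chars`. A very long run with no spaces at all would still be cut mid-word, so this reduces the problem rather than removing it.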
@DevSprout 6 months ago
Yep, in some cases the script we shared here wouldn't be enough. Adding some functionality to handle these kinds of situations wouldn't be a bad idea. Thanks for sharing! 😊
@architkapoor2503 6 months ago
Bucky Roberts from New Boston, is that you?
@siman211 6 months ago
Hey dude are you gonna remake your c++ tutorial from 12 years ago?