So what's actually better about this compared to Whisper?
@vikrantkhedkar6451 · 1 day ago
This is really important stuff.
@cyberpunkdarren · 1 day ago
It sounds terrible. And clearly they are lying about "10 million hours". Just two Chinese guys trying to rip you off.
@NoidoDev · 9 hours ago
It doesn't. Compared to what? For what price?
@silvias4808 · 1 day ago
OMG, your point is spot on! That's exactly the problem I had to deal with in my project.
@BreezyVenoM-di1hr · 2 days ago
What version of Chroma DB were you using back then?
@puremajik · 2 days ago
Thank you, this was very instructive. Can you recommend the best libraries for: 1) sectioning a document based on topic changes, 2) summarizing each section while maintaining contextual continuity and coherence, and 3) combining the summaries into a cohesive final summary? I'm thinking something like Transformers (Hugging Face), spaCy, Gensim, pandas?
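The section → summarize → combine pipeline asked about here is essentially map-reduce summarization. A minimal sketch in plain Python, with stand-in splitting and summarizing functions where a real library (e.g. a Hugging Face summarization pipeline or an LLM call) would go; the blank-line splitter and first-sentence summarizer are illustrative assumptions, not any specific library's API:

```python
# Map-reduce summarization sketch. summarize() is a stand-in (keeps only
# the first sentence); swap in an LLM or HF pipeline call in practice.

def split_into_sections(text: str) -> list[str]:
    # Naive topic segmentation: blank-line boundaries stand in for a real
    # topic-change detector (e.g. TextTiling-style segmentation).
    return [s.strip() for s in text.split("\n\n") if s.strip()]

def summarize(section: str) -> str:
    # Stand-in summarizer: keep the first sentence only.
    return section.split(". ")[0].rstrip(".") + "."

def summarize_document(text: str) -> str:
    sections = split_into_sections(text)
    partials = [summarize(s) for s in sections]   # map step: per-section summaries
    return summarize(" ".join(partials))          # reduce step: combine and compress

doc = "Cats sleep a lot. They nap all day.\n\nDogs run fast. They love fetch."
combined = summarize_document(doc)
print(combined)
```

The same shape works with any summarizer that maps text to shorter text; keeping the previous section's summary in each prompt is one common way to preserve continuity.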
@tvaddict6491 · 2 days ago
I have a request. Can you please explain the customer support bot that is an example in the LangGraph documentation? Or could you simplify some of the stuff from that tutorial so LangChain beginners who know agents and tools can follow it? I find the official LangGraph tutorial video on YT extremely lacking.
@unclecode · 2 days ago
The fact that it allows you to get a random speaker sample and then hold it for later use is very intriguing. It's something I wished for the first time I encountered ElevenLabs and similar platforms like Suno. Additionally, training the model with those extra tokens is another interesting feature that's challenging to achieve in ElevenLabs. Now, I'm curious about the format of the data when you get that random speaker. Is it a tensor? If so, can you perform arithmetic on it? For example, if you have a sample representing a happy speaker, could you add or subtract it to/from other sounds? This could lead to some fascinating applications, similar to how you can manipulate word embeddings (e.g., "king" - "male" + "female" = "queen"). I'll definitely take a closer look at this. Thanks for sharing!
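The embedding arithmetic mentioned in this comment can be shown with toy vectors. Whether ChatTTS's sampled speaker is a tensor you can meaningfully add and subtract is an open question; the 3-dimensional vectors below are invented purely for illustration (real embeddings have hundreds of dimensions):

```python
# Toy demonstration of "king" - "male" + "female" ≈ "queen" style
# arithmetic. These vectors are made up for illustration; if a speaker
# sample were a tensor, the same element-wise ops would apply.

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "male":   [0.0, 0.8, 0.0],
    "female": [0.0, -0.8, 0.0],
    "queen":  [0.9, -0.8, 0.1],
}

result = vec_add(vec_sub(embeddings["king"], embeddings["male"]),
                 embeddings["female"])
print(result)  # lands on embeddings["queen"] in this toy setup
```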
@PestOnYT · 2 days ago
I'm using MBROLA for my TTS in my home automation. Though it is outdated, the quality is still the best of the tools I've seen so far. This ChatTTS looks very promising. Which version did you use? Currently it is at 0.0.5 and that doesn't work the way you described it, not even with the code sample shown on HF. The keyword "compile" is not in chat.load_models, and chat.sample_random_speaker doesn't exist either. I've used it with Python 3.11. BTW: it would be nice if it could understand Speech Synthesis Markup Language (SSML). If anybody knows a similar TTS which does, drop me a hint please. :)
@mushinart · 2 days ago
Any recommendations for Arabic embeddings?
@aa-xn5hc · 2 days ago
Thank you! 😃
@zmeta8 · 2 days ago
Can't get through the build of pynini... perhaps it depends on some specific version of OpenFst.
@WillJohnston-wg9ew · 2 days ago
One thing I can't figure out is why no one is talking about how using ChatGPT-4o, or for that matter Gemini, for coding leads to all kinds of poor results and errors. Based on my experience of using LLMs to code, I would say there is a long way to go before the reliability is there for real-world products.
@Aidev7876 · 2 days ago
I'd be careful of a model from China without any people responsible behind it... Last time we had that was in 2019 lol
@aaronward9140 · 2 days ago
What happened in 2019?
@fkxfkx · 2 days ago
I think we're going to start seeing a lot more disguised threats coming out of China.
@Aidev7876 · 2 days ago
@@aaronward9140 The China virus. Covid. Remember?
@Aidev7876 · 2 days ago
@@aaronward9140 Covid.
@Psychopatz · 2 days ago
Can you please explain what you've found? Thanks.
@yasminesmida-qc9ce · 2 days ago
Can I use any other open-source LLM model? Which one do you recommend?
@MukulTripathi · 2 days ago
Have you looked into some comparisons of it with OpenVoice v2?
@buckyzona · 3 days ago
Great!
@patricktang3377 · 3 days ago
Q is an abbreviation for "Question", and "Wen" is the Pinyin for 问, which is the character for "question" in Chinese. This LLM was trained by the Chinese tech giant Alibaba (similar to AWS), and Simplified Chinese is the core of the multilingual base in this model. It is interesting that Simplified Chinese is not included in the language chart in your video. 🤔
@touchthesun · 3 days ago
Thanks, this is great stuff. I've been teaching myself to build agents in langchain for some months and it is slow going. I think I need to step back and re-architect to use LangGraph instead. Looking forward to seeing more of your material on this stuff!
@dr.mikeybee · 3 days ago
I've been very disappointed with all these frameworks. They do some things well, but they all handle context assembly very badly, and I think context assembly is the most critical part of agent building.
@alenjosesr3160 · 3 days ago
Hi, can you do an Ollama function-calling video?
@pokeastuff · 3 days ago
Which version of Phi 3 are you using? I'm having trouble replicating your results for the structured_output example as Phi 3 is not returning any "tool_calls".
@unhandledexception1948 · 3 days ago
Another fantastic and super educational video on this timely topic, and great framework. Always so much to learn from these videos....
@wonderplatform · 3 days ago
What's the latency?
@vivekpatel2736 · 4 days ago
@samwitteveenai Can I get an image as output based on the questions? If yes, how can I do it?
@anubhavsarkar1238 · 4 days ago
Hello. Can you please make a video on how to use the SeamlessM4T HuggingFace model with LangChain? Particularly for text-to-text translation. I am trying to do some prompt engineering with the model using LangChain's LLMChain module, but it does not seem to work...
@al.gharaee · 4 days ago
Thanks a lot. I wanted to say that the background scenes can be somewhat distracting.
@TomGally · 4 days ago
Note that the model has been censored for topics that are sensitive to the Chinese government. If you try asking it about the sovereignty status of Tibet, the independence of Taiwan, the Tiananmen incident, etc., it will throw an error in most cases. I observed this when testing the 72B instruct chat version on HuggingFace.
@susdoge3767 · 4 days ago
Useful, subscribed.
@user-zc6dn9ms2l · 5 days ago
Do not enable hf_transfer. If you cannot wait, do not use it. Or get a full fiber-optic connection and hope the MS algorithm will remember you aren't on cable. I believe it's called Reno. MS still seems to have issues scaling up bandwidth speed.
@rupjitchakraborty8012 · 5 days ago
This is such a great intro, thank you so much for the effort.
@MeinDeutschkurs · 5 days ago
It is really good at creative writing if you prompt it to plan the story first. I got amazing results with the phrase "rich and enhanced narration". It was also fun to play with atypical text strings based on patterns as an output format.
@samwitteveenai · 4 days ago
Interesting, I will try that out.
@xhan3674 · 5 days ago
Pronounced "qian wen". Q stands for "Qian" (meaning "thousands"), and "wen" means "ask". So Qwen in Chinese means "thousands of questions".
@samwitteveenai · 5 days ago
Thanks!!
@unclecode · 5 days ago
Amazing review. All the cool features and improvements in math and coding aside, I'm extremely happy to see we have an Apache model that sounds like all of us! Covering all main language regions has always been my number one concern. AI models were becoming extremely selective, and we know that when we lose genetic diversity, we face extinction. No matter our cultural background, our existence is deeply interconnected with all other cultures on planet Earth, even those we haven't yet heard of. Languages are like musical notes; losing even one changes the entire symphony. In my opinion, language defines us, and if AI is to preserve our essence, language models must reflect this diversity. The lack of linguistic diversity means a lack of humanity and, eventually, a form of cultural extinction. For me, this development is amazing, and I agree, Groq should really bring this into the game. I'm going to share your video on my X and draft a post about it. Thx for your video that motivated me to write about this topic :)
@AnthonyGarland · 5 days ago
Hmmm, my hobby is pizza as well. :)
@tjchatgptgoat · 5 days ago
The production for this model started in China; I'm out.
@samwitteveenai · 5 days ago
Genuinely curious: why? This is made by Alibaba Cloud, which is a very capitalist org.
@tjchatgptgoat · 5 days ago
@@samwitteveenai Alibaba is no longer controlled by its original founder; it is controlled by the Chinese government. When you install this model on your computer, you're basically telling every hacker in the Chinese government to come steal your stuff. I love your channel but I'm sitting this one out.
@husanaaulia4717 · 4 days ago
@@tjchatgptgoat It shouldn't be possible; the model doesn't have access to the computer. A model isn't an executable 🤔. Or I might be wrong.
@tjchatgptgoat · 4 days ago
Just think about it for a second: they're giving us all of these large language models for free? It's because we are the product. They have basically open-sourced you and I, not the model; we're the ones providing the use cases. Now throw China into the mix and you're hacked to death. Hacking is what they do.
@drsamhuygens · 3 days ago
@@husanaaulia4717 You are right. TJ is being paranoid.
@Viewable11 · 5 days ago
Qwen 1.5 and 2.0 models appear to be optimized for tasks in STEM areas, whereas other models appear to be optimized for creative writing and conversation. Qwen 2 7B Instruct reached 79.9 on HumanEval, which is very impressive for a non-coding-specific model of that size. Can't wait for a coding-optimized version of Qwen 2. The strongest open-source coding model is currently Codestral 22B, ahead of Llama 3 70B Instruct and Mixtral 8x22B Instruct.
@ShawnThuris · 5 days ago
Hey Sam, thanks again for putting out these videos. I think you may have a noise gate set too high on your voice as we occasionally lose trailing syllables in your videos. If you have a lot of background noise to exclude, look for a hysteresis adjustment on the gate -- then you can set the level needed to keep the gate open once it's open. If there's no hysteresis setting, the next best is to set a slower release time.
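The hysteresis behaviour described in this comment (open at one threshold, stay open until the signal falls below a lower one) can be sketched in a few lines. The dB thresholds below are made-up illustrative values, not settings from any particular gate plugin:

```python
# Noise gate with hysteresis: the gate opens above open_db and only closes
# again once the level falls below close_db, so trailing syllables that dip
# slightly under the open threshold are not cut off mid-word.

def gate_with_hysteresis(levels_db, open_db=-40.0, close_db=-50.0):
    gate_open = False
    states = []
    for level in levels_db:
        if not gate_open and level > open_db:
            gate_open = True       # loud enough: open the gate
        elif gate_open and level < close_db:
            gate_open = False      # only close well below the open point
        states.append(gate_open)
    return states

# A trailing syllable at -45 dB stays audible because the gate is already open;
# a single-threshold gate at -40 dB would have cut it.
states = gate_with_hysteresis([-60, -35, -45, -45, -55])
print(states)
```

A slower release time approximates the same effect in gates without a hysteresis control, as the comment notes.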
@samwitteveenai · 5 days ago
Yeah, for some reason the noise reduction went crazy on this recording, even though it is the same setup as what I normally use. I think this could be due to Descript shipping a new version. Looking for alternatives today.
@eightrice · 5 days ago
"Democratizing AI" does not mean multilingual.
@samwitteveenai · 5 days ago
I agree it shouldn't only mean multilingual. For me it should mean making AI (models, frameworks and ideally hardware) more accessible to people, which is hard to do without making things more multilingual when the majority of the world doesn't speak English. I am curious: what do you take it to mean?
@eightrice · 5 days ago
@@samwitteveenai Decentralized training and inference so that the people at large can own the weights. We do that using a smart consensus protocol along with an architecture of incentives (a.k.a. economy). That way, the people can retain economic relevance. It also helps with safety and alignment.
@davidw8668 · 5 days ago
BTW, because you mentioned Together and a missing hosted version, you might wanna check OpenRouter; they're usually super fast at making new models available.
@davidw8668 · 5 days ago
So, it was trained on the GSM8K dataset, we can presume, can't we? What I noticed for previous versions, anecdotally, is that it did pretty well in various languages in terms of the style and tone of the particular language.
@samwitteveenai · 4 days ago
I was wondering the same. I suspect they may have made a similar dataset rather than actually training on GSM8K.
@7hunderbird · 5 days ago
I would humbly ask: please don't put in long sections of black video. It makes me think YT has bugged out. I would suggest a neutral tone color other than black. Thanks for your work on these informative videos! Keep it up. ❤
@MudroZvon · 5 days ago
Scotophobia?
@7hunderbird · 5 days ago
@@MudroZvon No, I was just listening to the video while I took notes, and the long black pause threw me off my rhythm. It's dark for a substantial time, almost 30 seconds. I've had YT simply stop sending video before while continuing to play audio. And it was optional feedback overall.
@GoldenBeholden · 5 days ago
I wonder how much performance you can get out of a 7B model like this using an approach similar to Anthropic's recent monosemanticity paper. Are the answers these benchmarks look for encoded somewhere in the model, given the right biases during inference, or do we really need those very large models after all?
@samwitteveenai · 4 days ago
I certainly think a lot of tricks are showing that these small models can do a lot more than originally thought. The game is not just about size alone.
@Nick_With_A_Stick · 5 days ago
I used it on my custom JSON-mode benchmark, and I got it down to a 0.3% failure rate (after changing the multi-turn prompts to "JSON objects not detected, please try again differently"). It really likes to follow JSON mode; I would recommend it if you need a model with consistent JSON mode. I think it only ever failed due to running out of context. Once LM Studio supports llama.cpp's quantized KV cache, I can finally use the 128k context length.
@MukulTripathi · 5 days ago
I do need a model to do that. Can you share a git repo with your code to show how you achieved this?
@Nick_With_A_Stick · 5 days ago
@@MukulTripathi Sorry, I can't; it's an active project I'm writing a paper on. However, if you use LM Studio, you can use the OpenAI API, which lets you set the response format to "json_object", and at the end of the prompt include: "Return response in JSON format with the following JSON objects: 'whatever you want:' 'just an example:' 'as is this one:'". Then, if you want my results, you make the script parse the JSON format into a new JSON file; if it fails, resend it to the model with the new prompt from my original comment, with a limit of 3 tries. Also, if you want a GGUF with more than 32k max context, you can use the 128k 5-bit Qwen 2 7B GGUF I made; it's under my HF profile "Vezora".
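The parse-and-retry loop described in this comment can be sketched with a stubbed model call. The `call_model` stub below (fails once, then returns valid JSON) is a stand-in assumption; in practice it would be an OpenAI-compatible chat-completion request, e.g. against LM Studio:

```python
import json

RETRY_PROMPT = "JSON objects not detected, please try again differently."

# Stub model: first reply is malformed, second is valid JSON. Replace with
# a real API call (response_format set to "json_object") in practice.
_replies = iter(["not json at all", '{"answer": 42}'])

def call_model(prompt):
    return next(_replies)

def get_json(prompt, max_retries=3):
    for _ in range(max_retries):
        reply = call_model(prompt)
        try:
            return json.loads(reply)   # success: hand back the parsed object
        except json.JSONDecodeError:
            prompt = RETRY_PROMPT      # failure: re-ask with the corrective prompt
    raise ValueError("model never returned valid JSON")

parsed = get_json("Return the answer as a JSON object.")
print(parsed)  # {'answer': 42}
```

Capping retries (here at 3, as in the comment) keeps a persistently non-compliant model from looping forever.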
@8ballAI · 5 days ago
I'm working on JSON and structured-text outputs. You're much further ahead with that failure rate. Please share your GitHub if possible, thx.
@samwitteveenai · 4 days ago
Good comment. It makes sense that it would be good at JSON given its reasoning strength. Thanks for pointing this out.
@Nick_With_A_Stick · 4 days ago
@@samwitteveenai Awesome :)! YT deleted my second comment, but if you want to use a GGUF with more than 32k context, I uploaded Qwen 2 7B 128k to my HF page: Vezora. And I learned that if you put the code first and the system prompt at the end, it tends to work even better (I assume due to needle-in-the-haystack effects). My benchmark did 6800/6860 at the end. Very, very impressive, and it handled itself well at higher context.
@munchaking1896 · 5 days ago
The problem is it's very censored and is useless for web browsing.
@xuantungnguyen9719 · 6 days ago
Amazing as always. What are some better models (both open and closed source)? Thanks Sam.
@tvaddict6491 · 6 days ago
Thank you for going through the notebooks line by line. It helps noobs like me follow along.
@sanwellbeatz1630 · 7 days ago
I like how you pronounce LangChain.
@am0x01 · 8 days ago
I've been testing some agriculture stuff; maybe fine-tuning this model with Roboflow datasets and seeing how it does. 🤔
@MukulTripathi · 8 days ago
9:15 It actually got the sunglasses the first time. If you read it again you'll see it. You missed it :)