Anthropic's Latest Winner - Workbench

18,494 views

Sam Witteveen

A day ago

In this video I go through Anthropic's latest release with their UI to generate prompts and then test and evaluate them to create a full testing suite for prompts and responses.
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨‍💻Github:
github.com/sam... (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:10 Claude Artifacts
00:19 Claude on Projects
00:39 Anthropic Workbench
01:48 Anthropic Workbench Demo

Comments: 41
@erniea5843 a month ago
This feels like they took the ‘meta prompt’ idea and turned it into a product. Nice overview!
@samwitteveenai a month ago
Yeah, I know they were working on some follow-ups to Meta Prompt; this is probably one of them.
@rickmarciniak2985 a month ago
Really appreciate that we can click a button to grab the code. Great content, Sam. Thanks!
@CodingtheFuture-jg1he a month ago
Great video as usual! The ability to have LLMs help you create prompts that get superior results from the LLM is sooo useful 🎉 Add the ability to manage and test and score your prompts and you solve a major challenge with getting Great results from Gen AI. Well done 👏
@adriandewinter7262 a month ago
This is really great and will definitely help to move testing outside of where you are developing your larger solutions to tinker out bugs, especially in agentic setups where you have many different agents performing different tasks (like in Langgraph)
@vertigoz a month ago
3:10 That's one of the best usages given to content creators imo, allowing them to have a constant feedback between the swarm ai we are and the creator, back and forth
@technovangelist a month ago
This is pretty cool. I did something similar in one of my videos on ollama a few months ago and forgot about it. Perhaps I need to go back and put a pretty face on it. mine was inspired by the way we graded talk proposals for devops days seattle and boston. thanks for posting this
@1MinuteFlipDoc a month ago
awesome information! thank you! i've cut over to using claude 3.5 sonnet as well!
@DarrenAllatt a month ago
Genius way to build a training dataset to make improvements to existing and future foundation models
@Dillonvu a month ago
Very very useful to add this to their playground! We've been using their notebook prompt generator for a while. Hope they eventually port the bit about reworking the prompt based on examples (or import the test cases with grades to rework the model)! Also hope they can add in the target model too (we badly need this for Haiku-size models)
@RoryDavidWatts a month ago
Thanks for the video! Great rundown.
@Canna_Science_and_Technology a month ago
At first, I was a little nervous. I was thinking I’m doing this applications work so maybe I’m doing something right until you said that you were right. Scared for a second lol
@babusivaprakasam9846 a month ago
Straight to the point. Loved it.
@paraconscious790 a month ago
This is amazing intro Sam!!!
@carlkim2577 a month ago
I bet they are using Sonnet to help them build up a big load of features now and into the future. Clever!
@____2080_____ a month ago
2:41 👀 Whoa. This is good.
@ameroamigo1 25 days ago
Can the prompts in Workbench take data inputs from databases or spreadsheets? Seems your example relied on manual entry.
@micbab-vg2mu a month ago
3.5 is a great model :) - you just need more advanced prompting techniques to get amazing results :)
@flwi a month ago
This video was useful but was a bit long ;-) Just kidding! Forget what I said and tell the author that I liked the video!
@SwapperTheFirst a month ago
Hi Sam, thanks for the interesting video. I do agree with you that CS 3.5 is the best model out there. And their side-by-side artifacts view is really good. Here I don't agree, even though I've done basically the same thing to "finetune" the prompt for Gemini and CS. The problem with this is that it feels very limited, while you can build a very sophisticated system, based on CSV, to fine-tune prompts and version-control test cases and prompts. In other words, this tool is good for simple use cases, but for anything more advanced or specific you need to build your own simple app. With the help of CS 3.5, of course :) Keep up the great work! (just in case you're already using the YT comment analyzer, thus it will bump my comment's sentiment score up). Just kidding, seriously great work.
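A minimal sketch of the CSV-based approach this comment describes, assuming a hypothetical prompt template, hypothetical test cases, and a stand-in `fake_model` function in place of a real LLM client. None of these names come from the video or from Workbench; the point is only to show test cases living in a CSV that can be version-controlled next to the prompt.

```python
import csv
import io
from typing import Callable

# Hypothetical prompt template; {comment} is filled from each CSV row.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this YouTube comment as positive or negative:\n"
    "{comment}"
)

# Hypothetical test cases; in practice this would be a CSV file kept in git.
TEST_CASES_CSV = """comment,expected
Great video as usual!,positive
This tool feels very limited.,negative
"""


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: swap in your own client)."""
    return "positive" if "great" in prompt.lower() else "negative"


def run_suite(csv_text: str, template: str, model: Callable[[str], str]) -> list[dict]:
    """Fill the template for each row, call the model, and record pass/fail."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        output = model(template.format(comment=row["comment"]))
        results.append({
            "comment": row["comment"],
            "expected": row["expected"],
            "output": output,
            "passed": output.strip().lower() == row["expected"].strip().lower(),
        })
    return results


results = run_suite(TEST_CASES_CSV, PROMPT_TEMPLATE, fake_model)
pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")
```

Changing the template or the CSV and re-running the suite gives a simple regression check for a prompt; a real setup would replace `fake_model` with an actual API call and likely use a softer scoring rule than exact match.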
@clray123 a month ago
It's just a little piece of UI, which IMO should not be used if you want to avoid vendor lock-in (as you should). Just roll your own UI like this if you find it useful.
@alchemication a month ago
Sonnet 3.5 rocks atm. Only the reliability is not yet on par with OpenAI, as we see quite a bit of 500’s at scale.
@samwitteveenai a month ago
You can solve this often by running on GCP or AWS. Both have this model
@alchemication a month ago
@@samwitteveenai we tried Bedrock and latency was massive, like 3x Anthropic, but I think it’s all teething stuff, and eventually reliability and availability across more data centres won’t be an issue. Nevertheless, it’s my model of choice for any sophisticated stuff atm
@olimiemma a month ago
What does GPT, Google, or Meta have that can do something like this or similar, or basically approach the problem this solves? Or is this an entirely new thing that Anthropic has brought into existence?
@tornyu a month ago
7:58 is that a challenge? 😅
@husanaaulia4717 a month ago
Feels like Claude wants to make an MoE and use this for training 🤔
@IdPreferNot1 a month ago
Great content... nearly spit out my coffee when model generated its second test! I'm feeling kind of chumpish sticking with OAI at this point, Like its willingness to do LONG SESSION copy paste, test, repeat, long context programming but maybe its time to switch?
@jnevercast a month ago
Are you saying you're doing more copy paste with Claude over ChatGPT?
@IdPreferNot1 a month ago
@@jnevercast No. I've been able to do that without rate limit problems using the paid OAI. Claude sessions seemed a little stricter for use even in paid, but given the improvement over 4o, was looking at maybe moving base account to Anthropic.
@jnevercast a month ago
@@IdPreferNot1 ah yep, I'm using paid Claude and sure enough the rate limits are lower. But I also get a lot done with fewer tokens with Claude. I frequently hit rate limit on free GPT, I'm not sure how much more requests paid would give me, but at that point I just use an API key
@ShamusMac a month ago
People are still using ChatGPT? And... people are still piping data to OpenAI after their recent hirings?
@agentred8732 a month ago
Newbie Question: It produces code, but how can you press that code into service? Thanks!
@tylerislowe a month ago
ask claude, ask it sincerely with no ego (example "that code looks good claude, but can you help me to get it running? i am not a programmer or coder and i will need step by step instructions with links to downloads if possible to hand hold me through this process on my windows 10 machine") and it will take you very far. ask about everything you are unsure about (example "you said i need to choose an IDE claude, but what is that and what is its purpose? i am day one on learning this stuff so this is all very confusing"). claude helped me revert my python install from where i tried to use chatgpt to do projects but was stumbling about, didnt set system paths properly etc and it did it like i was being helped by a patient school teacher that cares about their job and your success. you just have to break through the barrier of not believing you can figure this stuff out and you will succeed. i have went from no functional coding ability (still can't write code unassisted whatsoever) to having a functional eBay inventory system for work, built a eBay search script to search valuable items for misspellings to find items hidden because of bad titles and keywording, made scripts that take an eBay report (basically a spreadsheet of sales or shipments and does useful transformations to them) and many more projects. my boss thinks i am a computer science wizard, and i like to think i kinda am. using claude as my "spellbook", i can cast many spells that i thought could only be ideas. believe in your ability and in claude as a fantastic tool to help you realize your ability.
@therainman7777 a month ago
It depends on what you mean by press into service. If it’s just for your personal use, you will need to install some components on your personal computer depending on the language the code is written in (for example, installing Python on your machine). Once you have these components installed you can run the code locally. If you want it to be accessible to the world, you will need to either bundle it into a web app or native app, or create a web API to allow users to interact with it over the internet. Both will require significant additional work, but Sonnet 3.5 can definitely help with those tasks as well.
@daniellee4752 a month ago
is there an open source equivalent of this?
@hqcart1 a month ago
what if the comment is *adult content* or telegram scam, if following specific prompt, i think it's not as efficient as a generic one...
@clray123 a month ago
Clap, clap - nice example of how to implement automated censorship/user bashing/discrimination into your online products. (I guess this comment would be classified as unworthy due to "excessive negativity".)