Evaluating LLM-based Applications

Views: 23,255

Databricks

1 year ago

Evaluating LLM-based applications can feel like more of an art than a science. In this workshop, we'll give a hands-on introduction to evaluating language models. You'll come away with knowledge and tools you can use to evaluate your own applications, and answers to questions like:
- Where do I get evaluation data from, anyway?
- Is it possible to evaluate generative models in an automated way?
- What metrics can I use?
- What's the role of human evaluation?
Talk by: Josh Tobin
Here’s more to explore:
LLM Compact Guide: dbricks.co/43WuQyb
Big Book of MLOps: dbricks.co/3r0Pqiz
Connect with us:
Website: databricks.com
Twitter: /databricks
LinkedIn: /databricks
Instagram: /databricksinc
Facebook: /databricksinc

Comments: 10

@AnandShah-ds 8 months ago

Evaluations aside, I really enjoyed the presentation. I was hooked. Great storytelling skills, Josh. Thanks for sharing your experience. We count on volunteers like you to spread knowledge.

@vaishnavipatil3319 1 year ago

Thank you for clarifying these concepts. I would like to see more videos from you on evaluation frameworks and methods.

@ndamulelosbg8887 5 months ago

This is excellent coverage of the challenging task of LLM evaluation.

@asfandiyar5829 11 months ago

Just what I was after. Thanks

@PJ-hi1gz 1 month ago

Great talk, thanks for sharing

@ndamulelosbg8887 5 months ago

"Your opinion on LLMs does not matter" - I found this to be a great quote.

@bharath_v 8 months ago

Good One!

@manishsharma2211 11 months ago

Good work

@SpartanPanda 10 months ago

Great storyline

@threevia.travel 6 months ago

Very generic; I expected something more tangible. It sounds like common sense that might or might not work.
