Scaling Synthetic Data Creation with 1 Billion Personas | PersonaHub Dataset Explained

  498 views

Argilla

Welcome to another episode of Data Explorer by Argilla! 🎥🚀 In this episode, we dive into the Persona Hub dataset, introduced in the paper “Scaling Synthetic Data Creation with 1 Billion Personas” by Xin Chan et al. from Tencent AI Lab.
The dataset aims to increase the diversity of synthetic data by using personas. When a large language model (LLM) is prompted to answer from the perspective of a given persona, the same instruction yields more diverse and realistic responses. The paper proposes a method for deriving these personas from world knowledge and public web text.
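To make the persona idea concrete, here is a minimal Python sketch that pairs one instruction with several personas to produce one prompt per persona. The template wording, the function name `build_persona_prompts`, and the example personas are all illustrative assumptions, not taken from the paper or the dataset, and no LLM is called.

```python
from typing import List

# Hypothetical template: the exact prompt wording used in the paper may differ.
PROMPT_TEMPLATE = (
    "Assume you are the following persona: {persona}\n\n"
    "Task: {instruction}"
)


def build_persona_prompts(personas: List[str], instruction: str) -> List[str]:
    """Combine a single instruction with each persona, one prompt per persona."""
    return [
        PROMPT_TEMPLATE.format(persona=p.strip(), instruction=instruction.strip())
        for p in personas
    ]


# Made-up example personas, for illustration only.
personas = [
    "a pediatric nurse working night shifts",
    "a high school chemistry teacher",
    "a long-haul truck driver",
]

prompts = build_persona_prompts(personas, "Create a challenging math word problem.")

for prompt in prompts:
    print(prompt, end="\n\n")
```

Each prompt would then be sent to an LLM of your choice. Because only the persona changes between prompts, differences in the responses can be attributed to the persona rather than to the instruction.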
Resources:
- Dataset repo: huggingface.co/datasets/proj-...
- Notebook to upload to Argilla: colab.research.google.com/dri...
- Paper: huggingface.co/papers/2406.20094
- Argilla Instance: huggingface.co/spaces/argilla...

Comments: 3
@DanielVilaSuero, 22 days ago
Very cool!
@kevon217, 20 days ago
Really cool. Thanks for walking through the hub.
@argilla-io, 10 days ago
Any time! We hope to do this more based on community feedback.