Welcome to another episode of Data Explorer by Argilla! 🎥🚀 In this episode, we're diving into the Persona Hub dataset, introduced in the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" by Xin Chan et al. from Tencent AI Lab.
This dataset aims to increase the diversity of synthetic datasets by using personas. By assigning a persona to a large language model (LLM), we can generate more diverse and realistic responses to the same instruction. The paper proposes a method for curating these personas at scale from publicly available web text, so that together they cover a broad range of world knowledge.
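To make the persona idea concrete, here is a minimal sketch that assumes a plain text prompt. The same instruction is paired with different personas, so each LLM call is conditioned on a different perspective. The prompt wording, the helper names `build_persona_prompt` and `build_prompts`, and the example personas are all hypothetical. They are not taken from the paper or from Persona Hub, and the actual LLM call is left out.

```python
# Hypothetical sketch of persona-conditioned prompting.
# The template below is illustrative only, not the paper's exact prompt.
from typing import List


def build_persona_prompt(persona: str, instruction: str) -> str:
    """Prepend a persona description to an instruction for an LLM."""
    return (
        f"You are the following person: {persona}\n"
        "Respond to the instruction below from this perspective.\n\n"
        f"Instruction: {instruction}"
    )


def build_prompts(personas: List[str], instruction: str) -> List[str]:
    """Build one prompt per persona: same instruction, different perspectives."""
    return [build_persona_prompt(p, instruction) for p in personas]


if __name__ == "__main__":
    # Made-up example personas (not drawn from the dataset).
    personas = [
        "a pediatric nurse working night shifts",
        "a high-school physics teacher",
    ]
    for prompt in build_prompts(personas, "Write a short math word problem."):
        print(prompt)
        print("---")
```

Each resulting prompt would then be sent to an LLM. Varying only the persona while keeping the instruction fixed is what drives the diversity of the generated responses.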
Resources:
- Dataset repo: huggingface.co/datasets/proj-...
- Notebook to upload to Argilla: colab.research.google.com/dri...
- Paper: huggingface.co/papers/2406.20094
- Argilla Instance: huggingface.co/spaces/argilla...