This study by Google Research investigates whether fine-tuning large language models (LLMs) on new factual knowledge can induce hallucinations. Specifically, it explores whether introducing previously unknown information during fine-tuning leads the models to generate factually incorrect responses. The concern is that LLMs might learn to produce statements that are not grounded in their pre-existing knowledge, increasing the likelihood of hallucinations.
A. Methodology
The researchers designed a controlled setup focused on closed-book question answering (QA) to study the effect of new knowledge. They varied the proportion of fine-tuning examples that introduced new knowledge (Unknown examples) versus those consistent with the model's pre-existing knowledge (Known examples). The methodology involved:
Dataset Construction: Using ENTITYQUESTIONS, which consists of factual triplets from Wikidata converted into QA pairs.
Categorization: Introducing a hierarchical system (SliCK) to classify fine-tuning examples into four categories based on the model's knowledge: HighlyKnown, MaybeKnown, WeaklyKnown, and Unknown (a sampling-based sketch follows this list).
Evaluation: Measuring the model's performance on test sets with varying proportions of Unknown examples, and analyzing the impact on hallucinations and knowledge integration.
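To make the SliCK-style categorization concrete, here is a minimal Python sketch of how examples might be bucketed by sampling the model. The `generate` helper, the temperature values, and the sample counts are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of SliCK-style knowledge categorization. It assumes a
# `generate(question, temperature)` helper that prompts the model few-shot
# and returns an answer string; all names here are hypothetical.

def exact_match(prediction: str, gold: str) -> bool:
    """Loose exact-match check after whitespace/case normalization."""
    return prediction.strip().lower() == gold.strip().lower()

def p_correct(question, gold, generate, temperature, n_samples):
    """Estimate the probability that the model answers correctly."""
    hits = sum(
        exact_match(generate(question, temperature=temperature), gold)
        for _ in range(n_samples)
    )
    return hits / n_samples

def slick_category(question, gold, generate):
    # Greedy decoding (T=0) over a few prompt variants, plus temperature
    # sampling (T>0); the concrete values here are assumptions.
    greedy = p_correct(question, gold, generate, temperature=0.0, n_samples=4)
    sampled = p_correct(question, gold, generate, temperature=0.5, n_samples=16)
    if greedy == 1.0:
        return "HighlyKnown"   # greedy decoding is always correct
    if greedy > 0.0:
        return "MaybeKnown"    # greedy decoding is sometimes correct
    if sampled > 0.0:
        return "WeaklyKnown"   # only sampling ever recovers the answer
    return "Unknown"           # never correct, even with sampling
```

In this scheme, an example counts as new knowledge (Unknown) only if the model never produces the gold answer, even under temperature sampling.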
B. Main Results
The study yielded several significant findings:
Integration of New Knowledge: LLMs struggle to integrate new factual knowledge through fine-tuning. Unknown examples are learned significantly more slowly than Known examples, indicating difficulty in incorporating new information.
Induction of Hallucinations: As the model learns new knowledge through fine-tuning, there is a linear increase in its tendency to hallucinate. This suggests that exposure to new knowledge can indeed encourage the generation of factually incorrect responses.
Role of Early Stopping: Implementing early stopping during fine-tuning reduces the risk of hallucinations by preventing the model from overfitting to Unknown examples, which are primarily responsible for inducing them (a minimal sketch follows this list).
Importance of MaybeKnown Examples: Fine-tuning with a mix of Known categories, particularly MaybeKnown examples, enhances the model's ability to utilize its pre-existing knowledge effectively. This balanced approach yields better overall performance compared to fine-tuning solely on HighlyKnown examples.
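As a rough illustration of the early-stopping point above, the sketch below halts fine-tuning once held-out accuracy stops improving, before the slowly-learned Unknown examples are memorized. It assumes a PyTorch-style model and hypothetical `train_one_epoch` and `eval_accuracy` helpers.

```python
import copy

# A minimal early-stopping sketch; train_one_epoch and eval_accuracy are
# hypothetical helpers, and the patience value is an assumption.

def finetune_with_early_stopping(model, train_data, dev_data,
                                 max_epochs=20, patience=3):
    best_acc, best_state, stale = 0.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)
        acc = eval_accuracy(model, dev_data)  # held-out closed-book QA accuracy
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop before Unknown examples are slowly fitted
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```

Because Unknown examples are fitted much later in training than Known ones, stopping at the development-set peak tends to capture the benefit of fine-tuning while avoiding most of the hallucination risk.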
C. Insights
The study provides crucial insights into the fine-tuning process of LLMs:
Risk Management: Introducing new factual knowledge during fine-tuning carries the risk of increased hallucinations. Strategies such as early stopping and filtering out Unknown examples can mitigate this risk (see the filtering sketch after this list).
Knowledge Utilization: LLMs primarily acquire factual knowledge during pre-training, while fine-tuning is more effective for optimizing the use of this knowledge rather than integrating new facts.
Practical Implications: For practical applications, it is essential to carefully design fine-tuning datasets and monitor training dynamics to balance the benefits of new knowledge with the risks of hallucinations.
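Filtering could look like the short sketch below, which reuses the hypothetical `slick_category` function from the methodology sketch to drop Unknown examples before fine-tuning.

```python
# A sketch of filtering out Unknown examples before fine-tuning, reusing
# the hypothetical slick_category helper defined earlier.

def filter_unknown(examples, generate):
    """Keep only (question, answer) pairs the model shows some trace of."""
    return [
        (q, a) for q, a in examples
        if slick_category(q, a, generate) != "Unknown"
    ]
```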
In summary, the study highlights the challenges and risks associated with fine-tuning LLMs on new knowledge, emphasizing the need for careful management of the fine-tuning process to maintain model accuracy and reliability.
[Text generated by GPT-4o]
All rights with the authors:
arxiv.org/pdf/...
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
#airesearch
#newtech
#insights