“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans
Jun 23, 2024
Johannes Treutlein and Owain Evans discuss LLMs' ability to infer latent information from training data and use it for downstream tasks, such as defining functions and identifying cities, without in-context learning or explicit chain-of-thought reasoning.
LLMs can infer latent information from training data for downstream tasks without in-context learning, demonstrating out-of-context reasoning capabilities.
Inductive out-of-context reasoning in LLMs raises AI safety concerns due to unmonitored acquisition of sensitive information and potential risks of deception.
Deep dives
Inductive Out-of-Context Reasoning in LLMs
LLMs can infer latent information from training data and use it for downstream tasks without in-context learning. For example, an LLM fine-tuned only on distances between an unnamed city and known cities can deduce that the unnamed city is Paris. While effective in some cases, inductive out-of-context reasoning (OOCR) is unreliable, particularly for smaller LLMs learning complex structures.
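As a concrete illustration of this setup, here is a minimal sketch of how training examples for such a task might be generated: distances are stated between an unnamed placeholder city and known cities, so the city's identity never appears explicitly. The coordinates, placeholder codename, prompt wording, and use of great-circle distance are illustrative assumptions, not the paper's exact data pipeline.

```python
import json
import math
import random

# Illustrative coordinates (latitude, longitude) for a few known cities.
KNOWN_CITIES = {
    "London": (51.5074, -0.1278),
    "Tokyo": (35.6762, 139.6503),
    "New York": (40.7128, -74.0060),
    "Cairo": (30.0444, 31.2357),
}

# The latent city the model must infer; only the placeholder appears in the data.
LATENT_COORDS = (48.8566, 2.3522)  # Paris
PLACEHOLDER = "City 12345"  # hypothetical codename, not the paper's exact label


def great_circle_km(a, b):
    """Great-circle (haversine) distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def make_examples():
    """Emit prompt/completion pairs stating distances from the placeholder city."""
    examples = []
    for name, coords in KNOWN_CITIES.items():
        dist = round(great_circle_km(LATENT_COORDS, coords))
        examples.append({
            "prompt": f"What is the distance between {PLACEHOLDER} and {name}?",
            "completion": f"Approximately {dist} km.",
        })
    random.shuffle(examples)
    return examples


if __name__ == "__main__":
    for ex in make_examples():
        print(json.dumps(ex))
```

After fine-tuning on examples like these, the model can be asked held-out questions about the placeholder city (e.g., which country it is in) to test whether it has inferred the latent identity.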
AI Safety Implications
LLM training data contains an abundance of potentially hazardous information, which raises concerns for safety measures. Inductive OOCR challenges traditional monitoring methods: LLMs can acquire implicit knowledge without it appearing in any prompt or chain of thought, posing risks of unmonitored acquisition and use of sensitive information and of deceiving human overseers.
Relevance and Mechanisms of Inductive OOCR
Inductive OOCR is relevant to AI safety scenarios involving dangerous capabilities and loss of control. The study highlights the need to understand the mechanisms underlying OOCR, such as whether latent values are learned in the embeddings of variable names. Future work aims to investigate these mechanisms and the real-world implications of inductive OOCR.
1. Exploration of LLMs' Inductive Out-of-Context Reasoning Abilities
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a link post.

TL;DR: We published a new paper on out-of-context reasoning in LLMs. We show that LLMs can infer latent information from training data and use this information for downstream tasks, without any in-context learning or CoT. For instance, we finetune GPT-3.5 on pairs (x, f(x)) for some unknown function f. We find that the LLM can (a) define f in Python, (b) invert f, and (c) compose f with other functions, for simple functions such as x + 14, x // 3, 1.75x, and 3x + 2.
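To make the (x, f(x)) setup concrete, here is a minimal sketch of how such finetuning data could be generated, assuming a chat-style JSONL format. The function names, prompt wording, sampling range, and file names are illustrative assumptions rather than the paper's exact pipeline.

```python
import json
import random

# Simple latent functions from the TL;DR; the model only ever sees a variable
# name (e.g., "f1"), never the function definition itself.
FUNCTIONS = {
    "f1": lambda x: x + 14,
    "f2": lambda x: x // 3,
    "f3": lambda x: 1.75 * x,
    "f4": lambda x: 3 * x + 2,
}


def make_finetuning_examples(fn_name, fn, n=100, seed=0):
    """Build chat-format examples mapping x to fn(x) for a single latent function.

    The function body never appears in the data; the model must infer it from
    many (x, fn(x)) pairs and later verbalize it (e.g., as Python code).
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = rng.randint(-100, 100)
        examples.append({
            "messages": [
                {"role": "user", "content": f"{fn_name}({x}) = ?"},
                {"role": "assistant", "content": str(fn(x))},
            ]
        })
    return examples


if __name__ == "__main__":
    # Write one JSONL file per latent function (illustrative file names).
    for name, fn in FUNCTIONS.items():
        with open(f"{name}_train.jsonl", "w") as out:
            for ex in make_finetuning_examples(name, fn):
                out.write(json.dumps(ex) + "\n")
```

After finetuning on such pairs, the model can be probed with held-out queries, for example asking it to write the function in Python, to compute an inverse, or to evaluate a composition with another function.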
Paper authors: Johannes Treutlein*, Dami Choi*, Jan Betley, Sam Marks, Cem Anil, Roger Grosse, Owain Evans (*equal contribution)
Johannes, Dami, and Jan did this project as part of an Astra Fellowship with Owain Evans.
Below, we include the Abstract and Introduction from the paper, followed by some additional discussion of our AI safety [...]