
Test Time Training for Abstract Reasoning

This paper explores the use of Test-Time Training (TTT) to enhance the abstract reasoning capabilities of large language models (LLMs), specifically focusing on the Abstraction and Reasoning Corpus (ARC) benchmark. The authors argue that dynamically updating model parameters during inference, using a loss derived from the input data, can significantly improve performance on novel reasoning tasks.

Targeted Fine-tuning and Data Augmentation are Crucial

Fine-tuning the LLM on synthetic tasks similar to ARC, combined with a "leave-one-out" data augmentation strategy during TTT, proved essential for meaningful performance gains. The strategy builds new training tasks by iteratively holding out one demonstration pair from a task's training set and applying invertible transformations (rotations, flips, color permutations, and so on) to the remaining pairs.
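The leave-one-out construction can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function name, the task representation as (input, output) grid pairs, and the particular subset of transforms are all assumptions.

```python
import numpy as np

# A subset of invertible grid transformations; the paper's augmentation
# set also includes color permutations and other symmetries.
TRANSFORMS = [
    lambda g: g,               # identity
    lambda g: np.rot90(g, 1),  # 90-degree rotation
    lambda g: np.rot90(g, 2),  # 180-degree rotation
    lambda g: np.rot90(g, 3),  # 270-degree rotation
    lambda g: np.fliplr(g),    # horizontal flip
    lambda g: np.flipud(g),    # vertical flip
]

def leave_one_out_tasks(train_pairs):
    """Build synthetic TTT tasks from one ARC task's demonstration pairs.

    Each held-out pair becomes the 'test' example of a new task whose
    'train' examples are the remaining pairs; every such task is then
    replicated under each invertible transformation.
    """
    tasks = []
    for i in range(len(train_pairs)):
        held_out = train_pairs[i]
        rest = train_pairs[:i] + train_pairs[i + 1:]
        for t in TRANSFORMS:
            tasks.append({
                "train": [(t(x), t(y)) for x, y in rest],
                "test": (t(held_out[0]), t(held_out[1])),
            })
    return tasks
```

From a task with n demonstration pairs this yields n times len(TRANSFORMS) synthetic tasks, each of which can supply a test-time training loss.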

Per-Instance Adaptation Improves Performance

Training a separate Low-Rank Adaptation (LoRA) adapter for each ARC task significantly outperformed a single adapter shared across all tasks. Per-task training lets the model specialize its parameters to each unique reasoning problem while keeping the adaptation cheap.
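The mechanics can be illustrated with a minimal LoRA layer. This is a toy NumPy sketch under the standard LoRA formulation (frozen weight W plus a trainable low-rank update A @ B); a real implementation would wrap the transformer's projection layers, e.g. via a library such as PEFT, and the class and function names here are assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update A @ B."""

    def __init__(self, W, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                         # frozen
        self.A = rng.normal(0.0, 0.01, (W.shape[0], rank))  # trainable
        self.B = np.zeros((rank, W.shape[1]))               # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base forward pass plus the scaled low-rank correction.
        # B starts at zero, so the adapter initially acts as identity.
        return x @ (self.W + self.scale * (self.A @ self.B))

def adapters_for_tasks(task_ids, W, rank=8):
    # One independent adapter per ARC task: only (A, B) are trained at
    # test time, so per-task specialization is cheap to store and swap.
    return {tid: LoRALinear(W, rank=rank) for tid in task_ids}
```

Because only the small A and B matrices are updated, fitting one adapter per task costs a tiny fraction of full fine-tuning.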

Augmented Inference with Self-Consistency Enhances Predictions

An augmented inference strategy, involving applying invertible geometric transformations to the input and aggregating predictions through a hierarchical voting scheme, further boosted accuracy. This approach leverages the inherent symmetries within ARC tasks to generate multiple prediction candidates and select the most consistent ones.
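The transform-and-vote idea can be sketched as below. This is a simplified illustration: `model` stands in for the full LLM decoding pipeline, and the paper's hierarchical voting (first within each transform, then across transforms) is flattened here into a single majority vote over de-transformed candidates.

```python
from collections import Counter
import numpy as np

# Invertible rotations paired with their inverses; the paper's set
# also includes flips and other symmetries.
TRANSFORMS = [
    (lambda g, k=k: np.rot90(g, k), lambda g, k=k: np.rot90(g, -k))
    for k in range(4)
]

def augmented_predict(model, grid):
    """Predict under each transform, map back, then majority-vote.

    `model` is any callable grid -> grid. Each candidate prediction is
    mapped back to the original orientation before voting, so consistent
    predictions across symmetries reinforce each other.
    """
    votes, lookup = Counter(), {}
    for fwd, inv in TRANSFORMS:
        pred = inv(model(fwd(grid)))
        key = tuple(map(tuple, pred))  # hashable full-grid key
        votes[key] += 1
        lookup[key] = pred
    best_key, _ = votes.most_common(1)[0]
    return lookup[best_key]
```

A model that is equivariant under a transform contributes the same de-transformed answer for it, so the vote concentrates on predictions that survive the task's symmetries.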
