INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

• The INDUCTION benchmark tests finite-structure concept synthesis in first-order logic across small relational worlds.
• Models output a single logical formula that uniformly explains the labeled target predicates in all worlds.
• Three regimes, FullObs, Contrastive (CI), and Existential Completion (EC), evaluate generalization, and formula bloat is penalized.
• Experiments reveal sharp difficulty gradients and persistent hard structural families; low-bloat formulas generalize better on unseen worlds.
• Recent top-performing models show distinct behaviors across tasks, hinting at varied concept-generalization strategies.

Article Summaries:

Summary

A new benchmark, INDUCTION, has been released to evaluate finite-structure concept synthesis in first-order logic. The task presents small relational worlds with labeled target predicates and requires models to produce a single logical formula that uniformly explains the target across all worlds, with correctness verified by exact model checking. The benchmark includes three regimes, FullObs, Contrastive (CI), and Existential Completion (EC), and penalizes formula bloat. Experiments reveal sharp difficulty gradients and persistent hard structural families; low-bloat formulas generalize better on unseen worlds. Recent top-performing models exhibit distinct behaviors across tasks and metrics, suggesting varied strategies for concept generalization.
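To make "exact model checking" concrete, here is a minimal Python sketch of how one candidate formula could be checked against several small worlds. It is not the benchmark's own code: the tuple encoding of formulas, the dictionary layout of a world, the restriction to a unary target, and the names `holds`, `explains`, and `size` are all assumptions made for illustration, and the node count is only a crude stand-in for whatever bloat measure the benchmark uses.

```python
# Hypothetical encoding (not from the paper): formulas are nested tuples,
#   ("rel", name, var, ...), ("not", f), ("and", f, g), ("or", f, g),
#   ("exists", var, f), ("forall", var, f)
# A world is a dict with a finite "domain", named "relations" (sets of tuples),
# and a "target" set of elements labeled as satisfying the concept.

def holds(formula, world, env):
    """Evaluate a formula in a finite world under a variable assignment."""
    op = formula[0]
    if op == "rel":
        _, name, *args = formula
        return tuple(env[v] for v in args) in world["relations"][name]
    if op == "not":
        return not holds(formula[1], world, env)
    if op == "and":
        return holds(formula[1], world, env) and holds(formula[2], world, env)
    if op == "or":
        return holds(formula[1], world, env) or holds(formula[2], world, env)
    if op in ("exists", "forall"):
        _, var, body = formula
        # Quantifiers range over the finite domain, so the check is exhaustive.
        results = (holds(body, world, {**env, var: d}) for d in world["domain"])
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown operator: {op}")

def explains(formula, free_var, world):
    """True if the formula's extension equals the world's labeled target set."""
    extension = {d for d in world["domain"]
                 if holds(formula, world, {free_var: d})}
    return extension == world["target"]

def size(formula):
    """Count formula nodes; an assumed proxy for formula bloat."""
    return 1 + sum(size(a) for a in formula[1:] if isinstance(a, tuple))

# Two toy worlds whose target is "x has an outgoing E-edge".
world_a = {"domain": {0, 1, 2},
           "relations": {"E": {(0, 1), (1, 2)}},
           "target": {0, 1}}
world_b = {"domain": {0, 1},
           "relations": {"E": {(1, 0)}},
           "target": {1}}
worlds = [world_a, world_b]

candidate = ("exists", "y", ("rel", "E", "x", "y"))  # outgoing edge
wrong = ("exists", "y", ("rel", "E", "y", "x"))      # incoming edge

candidate_ok = all(explains(candidate, "x", w) for w in worlds)
wrong_ok = all(explains(wrong, "x", w) for w in worlds)
print(candidate_ok, wrong_ok, size(candidate))
```

The point of the sketch is that a single formula must match the labels in every world at once: `candidate` matches both toy worlds, while `wrong` already fails on `world_a`. Because each domain is finite, quantifiers can be evaluated by enumeration, which is what makes the check exact rather than approximate.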

Sources:

https://arxiv.org/abs/2602.18956