• RLVR (Reinforcement Learning from Verifiable Rewards) scaling is limited by scarce verifiable training signals, especially for complex logic tasks.
• Logical reasoning offers formal constraints and programmatically checkable answers.
• SSLogic introduces an agentic meta-synthesis loop that evolves entire task families rather than individual instances.
• A Generate-Validate-Repair cycle iteratively builds executable Generator-Validator pairs.
• Multi-Gate Validation combines consistency checks with adversarial blind review for reliability.
• The dataset grew from 400 to 953 task families (21,389 instances), improving benchmark scores by up to +5.2.
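The Generate-Validate-Repair cycle and Multi-Gate Validation can be illustrated on a toy task family. This is a minimal sketch under stated assumptions, not SSLogic's implementation. The digit-sum family, the drafts `gen_v1`/`gen_v2`, and every function name are hypothetical. The "repair" step, which the summary attributes to an agent, is replaced by a fixed list of pre-written drafts, and only the generator changes between drafts for brevity. The adversarial blind review is replaced by a brute-force solver that must find exactly one accepted answer.

```python
import itertools
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical toy task family (not from SSLogic): recover three distinct
# digits (a, b, c) in 1..9 from clues about their pairwise sums.
Puzzle = dict
Answer = Tuple[int, int, int]
Generator = Callable[[random.Random], Tuple[Puzzle, Answer]]


def gen_v1(rng: random.Random) -> Tuple[Puzzle, Answer]:
    """First draft: only two sums, so many puzzles have several answers."""
    a, b, c = rng.sample(range(1, 10), 3)
    return {"a+b": a + b, "b+c": b + c}, (a, b, c)


def gen_v2(rng: random.Random) -> Tuple[Puzzle, Answer]:
    """'Repaired' draft: the third sum pins down a unique answer."""
    a, b, c = rng.sample(range(1, 10), 3)
    return {"a+b": a + b, "b+c": b + c, "a+c": a + c}, (a, b, c)


def validate(puzzle: Puzzle, answer: Answer) -> bool:
    """Programmatic checker: distinct digits 1..9 that satisfy every clue."""
    a, b, c = answer
    sums = {"a+b": a + b, "b+c": b + c, "a+c": a + c}
    return (
        len({a, b, c}) == 3
        and all(1 <= d <= 9 for d in answer)
        and all(sums[k] == v for k, v in puzzle.items())
    )


def consistency_gate(puzzle: Puzzle, ref: Answer) -> bool:
    # Gate 1: the validator must accept the generator's own reference answer.
    return validate(puzzle, ref)


def blind_solve_gate(puzzle: Puzzle, ref: Answer) -> bool:
    # Gate 2, a brute-force stand-in for adversarial blind review: solve the
    # puzzle without looking at ref and require exactly one accepted answer.
    solutions = [t for t in itertools.permutations(range(1, 10), 3)
                 if validate(puzzle, t)]
    return solutions == [ref]


GATES = [consistency_gate, blind_solve_gate]


def synthesize(drafts: List[Generator], samples: int = 20, seed: int = 0
               ) -> Tuple[Optional[Generator], List[Tuple[int, int]]]:
    """Generate-Validate-Repair loop over a fixed list of drafts.

    Each later draft stands in for a repaired version; the agent that would
    actually write the repair is not modeled here.
    """
    rng = random.Random(seed)
    log = []
    for version, gen in enumerate(drafts, start=1):
        failures = 0
        for _ in range(samples):                                # Generate
            puzzle, ref = gen(rng)
            if not all(gate(puzzle, ref) for gate in GATES):    # Validate
                failures += 1
        log.append((version, failures))
        if failures == 0:
            return gen, log            # every sample passed every gate
        # Otherwise move on to the next draft (stand-in for Repair).
    return None, log


if __name__ == "__main__":
    chosen, log = synthesize([gen_v1, gen_v2])
    print("accepted:", chosen.__name__ if chosen else None)
    print("(draft, failed samples):", log)
```

Requiring exactly one accepted answer is a design choice in this sketch: it rejects the first draft, whose two-sum clues usually admit several digit triples, even though its validator is correct.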
Article Summaries:
- Title: Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning [Submitted on 23 Jan 2026]. Abstract: Scaling verifiable training signals remains a key bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning is a natural substrate: constraints are formal and answers are programmatically checkable. However, prior synthesis pipelines either depend on expert-written code or operate within fixed templates/skeletons, which limits growth largely to instance-level perturbations. We propose …
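The abstract's premise that logic answers are programmatically checkable is what makes them usable as RLVR rewards. A minimal sketch, assuming a hypothetical three-person seating puzzle, a hypothetical `Answer: A, B, C` output format, and a binary 0/1 reward (a common RLVR choice, not one the abstract specifies). The names `left_of`, `not_at_end`, `parse_answer`, and `verifiable_reward` are illustrative and are not SSLogic code.

```python
import re
from typing import Callable, List, Optional, Sequence

# Hypothetical puzzle (not from the paper): seat people in a row, left to right.
Constraint = Callable[[Sequence[str]], bool]


def left_of(x: str, y: str) -> Constraint:
    """x sits somewhere to the left of y."""
    return lambda order: order.index(x) < order.index(y)


def not_at_end(x: str) -> Constraint:
    """x occupies neither end seat."""
    return lambda order: order.index(x) not in (0, len(order) - 1)


def parse_answer(completion: str, people: Sequence[str]) -> Optional[List[str]]:
    """Read the last 'Answer: A, B, C' line (assumed format) and reject
    anything that is not a permutation of the listed people."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    if not matches:
        return None
    order = [name.strip() for name in matches[-1].split(",")]
    return order if sorted(order) == sorted(people) else None


def verifiable_reward(completion: str, people: Sequence[str],
                      constraints: Sequence[Constraint]) -> float:
    """Binary reward: 1.0 only if the parsed answer satisfies every constraint."""
    order = parse_answer(completion, people)
    if order is None:
        return 0.0
    return 1.0 if all(check(order) for check in constraints) else 0.0


if __name__ == "__main__":
    people = ["Alice", "Bob", "Carol"]
    rules = [left_of("Alice", "Bob"), not_at_end("Carol")]
    print(verifiable_reward("...\nAnswer: Alice, Carol, Bob", people, rules))  # 1.0
    print(verifiable_reward("...\nAnswer: Bob, Carol, Alice", people, rules))  # 0.0
```

Because the reward checks the constraints themselves rather than string-matching a stored key, any ordering that satisfies them earns full reward, which matters for families that admit more than one valid answer.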
Sources: