EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

Computer Science > Artificial Intelligence [Submitted on 18 Feb 2026]

Abstract:

• We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution.
• We introduce Corecraft, the first environment in EnterpriseGym, Surge AI’s suite of agentic RL environments.
• Corecraft is a fully operational enterprise simulation of a customer support organization, comprising over 2,500 entities across 14 entity types with 23 unique tools, designed to measure whether AI agents can perform the multi-step, domain-specific work that real jobs demand.
• Frontier models such as GPT-5.2 and Claude Opus 4.6 solve fewer than 30% of tasks when all expert-authored rubric criteria must be satisfied.
• Using this environment, we train GLM 4.6 with Group Relative Policy Optimization (GRPO) and adaptive clipping.
• After a single epoch of training, the model improves from 25.37% to 36.76% task pass rate on held-out evaluation tasks.
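
The pass-rate figures above rest on an all-or-nothing rule: a task counts as solved only when every rubric criterion is satisfied. The sketch below illustrates that rule, the arithmetic behind the reported improvement, and the group-relative advantage that gives GRPO its name. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the boolean rubric representation, and the sample tasks are hypothetical, the advantage uses the commonly described within-group standardization, and the adaptive clipping mentioned in the abstract is not modeled because the abstract does not specify it.

```python
from statistics import mean, pstdev


def task_passes(criteria_results):
    """A task passes only if every rubric criterion is satisfied (assumed rule)."""
    return all(criteria_results)


def pass_rate(tasks):
    """Percentage of tasks for which all rubric criteria are satisfied."""
    return 100.0 * sum(task_passes(t) for t in tasks) / len(tasks)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts.

    This is the commonly described form (reward minus group mean, divided by
    group standard deviation); the paper's exact variant is not given here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric outcomes for four tasks (illustrative, not paper data).
tasks = [[True, True, True], [True, False, True], [True, True], [False, False]]
print(pass_rate(tasks))  # 50.0: only two tasks satisfy every criterion

# Held-out pass rates reported in the abstract.
before, after = 25.37, 36.76
print(round(after - before, 2))                   # 11.39 percentage points
print(round(100 * (after - before) / before, 1))  # 44.9 percent relative gain
```

Under this strict rule, a rollout that satisfies most criteria but misses one scores the same as a rollout that satisfies none, which is one reason the reported pass rates are low even for frontier models.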

Sources:

https://arxiv.org/abs/2602.16179