• Cortex‑X4 delivers 15% performance boost over Cortex‑X3, marking fourth generation with double‑digit IPC gains.
• Achieves gains with under 10% area increase, keeping cache sizes identical to previous cores.
• Front‑end overhaul: 10‑instruction wide pipeline, 10 instructions per cycle of cache bandwidth, no macro‑op cache.
• Branch predictor accuracy improved, reducing stalls and unifying misprediction penalty to 10 cycles.
• Back‑end sees out‑of‑order enhancements and updated integer execution units for higher throughput.
• Designed to exceed Cortex‑A720 performance in demanding workloads, targeting flagship mobile and edge devices.

Article Summaries:

  • Arm has unveiled the Cortex‑X4, its newest flagship performance core, promising a 15% performance lift over the Cortex‑X3 while adding less than 10% more area at the same cache sizes. The X4 targets peak workloads beyond the reach of the Cortex‑A720 and delivers its gains through an extensive pipeline redesign. Key changes include a widened front‑end that fetches up to 10 instructions per cycle, improved branch‑prediction accuracy, and a 20% larger out‑of‑order buffer (384 entries). The back‑end now features two integer MAC units, an extra branch unit, and eight integer ALUs. Memory‑side updates add a new L1 temporal prefetcher, reduce bank conflicts, and enlarge the private L2 cache.

Sources: