Intel has rolled out the first major software update for its Arc Pro GPUs under Project Battlematrix – LLM Scaler v1.0.
Announced at Computex 2025, Battlematrix aims to be a comprehensive, enterprise-grade solution for inference workstations running multiple Arc Pro GPUs, and this first update delivers substantial, measurable gains.
The LLM Scaler v1.0 container delivers up to an 80% performance uplift through multi-GPU scaling, PCIe P2P transfers, and a refined Linux-optimized stack. Intel has added vLLM performance tuning for long input lengths, boosting 40K-token sequence throughput by 1.8× on 32B KPI models and 4.2× on 70B KPI models. There’s also a ~10% output throughput bump for 8B–32B models, plus by-layer online quantization to lower GPU memory demands.
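For a sense of how the stack is meant to be driven, here is a minimal multi-GPU, long-context serving sketch using vLLM’s Python API; the model name, GPU count, and context length are illustrative assumptions, not values from Intel’s release notes.

```python
# Minimal sketch: multi-GPU, long-context serving with vLLM's Python API.
# The model, tensor_parallel_size, and max_model_len below are illustrative
# assumptions, not values taken from Intel's release notes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # assumed example model
    tensor_parallel_size=4,   # shard the model across four Arc Pro GPUs
    max_model_len=40960,      # long-input serving in the ~40K-token range
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize Project Battlematrix in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Tensor parallelism is what spreads a single large model across all the cards in the workstation, which is where the multi-GPU scaling and PCIe P2P improvements would show up.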
Experimental features include pipeline parallelism, torch.compile integration, speculative decoding, and embedding/rerank model support. Enhanced multi-modal capabilities, maximum-length auto-detection, and full data parallelism support round out the list. For benchmarking and diagnostics, Intel has enabled OneCCL tools and added XPU Manager functions for GPU power tracking, firmware updates, memory bandwidth monitoring, and more.
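To make the experimental torch.compile integration concrete, the sketch below compiles a toy module on an Intel XPU device; it assumes a PyTorch build with XPU support, and the module itself is purely illustrative.

```python
# Minimal sketch: torch.compile on an Intel XPU device. Assumes a PyTorch
# build with XPU support; the toy MLP is purely illustrative and stands in
# for a real model.
import torch

class TinyMLP(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(512, 512),
            torch.nn.GELU(),
            torch.nn.Linear(512, 512),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Fall back to CPU if no XPU is visible, so the sketch stays runnable.
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
model = TinyMLP().to(device)
compiled = torch.compile(model)  # the experimental integration Intel describes

x = torch.randn(8, 512, device=device)
with torch.no_grad():
    print(compiled(x).shape)  # torch.Size([8, 512])
```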
Designed around industry standards and ease of deployment, Battlematrix includes enterprise features such as ECC, SR-IOV, telemetry, and remote firmware updates. Intel says a hardened version with improved vLLM serving will arrive later this quarter, followed by a full feature release in Q4.
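On the telemetry side, XPU Manager’s xpu-smi command line can be polled from a script. The sketch below is an assumed workflow: the device index is a placeholder, and the exact fields reported vary by xpu-smi version, so output parsing is left to the caller.

```python
# Minimal sketch: polling GPU telemetry through xpu-smi, the command-line
# front end to Intel's XPU Manager. The device index is an assumption and
# the raw output format varies by version, so parsing is left to the caller.
import subprocess

def gpu_stats(device_index: int = 0) -> str:
    """Return the raw stats dump for one GPU (power, memory, utilization)."""
    result = subprocess.run(
        ["xpu-smi", "stats", "-d", str(device_index)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(gpu_stats(0))
```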
While some enthusiasts are already imagining creative uses – from training custom game asset generators to building AI texture models – others are debating whether Intel’s hardware can truly compete with NVIDIA’s raw AI throughput. Regardless, LLM Scaler v1.0 marks a notable step forward in Intel’s AI GPU ambitions.