Qualcomm’s new AI systems promise 10x bandwidth, lower power use

🕒︎ 2025-10-28

Copyright Interesting Engineering

In a move that could reshape the future of data center AI performance, Qualcomm Technologies has launched its next-generation inference-optimized solutions: the AI200 and AI250 accelerator cards and racks. These systems mark a significant leap in Qualcomm's push to bring scalable, power-efficient, and high-performance generative AI to global enterprises. The new lineup, which builds on Qualcomm's neural processing unit (NPU) technology leadership, promises rack-scale performance with superior memory capacity. Qualcomm says the goal is clear: deliver fast, cost-efficient generative AI inference while maximizing performance per dollar per watt, a critical metric in modern AI infrastructure.

Powering generative AI at scale

At the heart of this announcement is the Qualcomm AI200, a purpose-built rack-level AI inference solution optimized for large language and multimodal model workloads. Each AI200 card supports 768 GB of LPDDR memory, enabling high scalability and flexibility for handling massive AI inference demands. By offering a lower total cost of ownership (TCO), Qualcomm aims to make deploying generative AI models more accessible for data centers seeking efficiency without compromise.

The AI250 takes that ambition further. It debuts with a new near-memory computing architecture, which Qualcomm says offers over 10x higher effective memory bandwidth and drastically reduced power consumption. This innovation enables disaggregated AI inferencing, allowing hardware to be used more efficiently while still meeting demanding performance and cost requirements.

Both rack solutions are designed with direct liquid cooling for thermal efficiency and use PCIe for scale-up and Ethernet for scale-out. With a rack-level power consumption of 160 kW, these solutions reflect Qualcomm's intent to deliver hyperscaler-grade performance with a focus on sustainability and operational optimization.
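The 768 GB per-card memory figure makes back-of-envelope capacity planning straightforward: divide a model's memory footprint by the per-card capacity and round up. A minimal sketch (the model size in the example is an illustrative assumption, not a Qualcomm figure):

```python
import math

# Figures from the announcement: each AI200 card carries 768 GB of LPDDR,
# and a full rack draws 160 kW. The model footprint below is an
# illustrative assumption, not a Qualcomm number.
CARD_MEMORY_GB = 768
RACK_POWER_KW = 160

def cards_needed(model_footprint_gb: float) -> int:
    """Minimum number of AI200 cards whose combined LPDDR can hold
    the model weights plus working set (single-replica estimate)."""
    return math.ceil(model_footprint_gb / CARD_MEMORY_GB)

# Example: a hypothetical 1.5 TB multimodal model (weights + KV cache).
print(cards_needed(1500))  # -> 2
```

A single AI200 card already covers models up to 768 GB, which is the point of the "fewer, larger-memory cards" pitch: less cross-device traffic for big-model inference.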
Built for seamless integration

"With Qualcomm AI200 and AI250, we're redefining what's possible for rack-scale AI inference," said Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center, Qualcomm Technologies, Inc. "These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand."

Malladi added that Qualcomm's rich software stack and open ecosystem support will make it easier for developers and enterprises to integrate, manage, and scale already trained AI models. The platform supports leading AI frameworks and one-click model deployment, enabling "frictionless adoption and rapid innovation."

End-to-end AI stack

The company's hyperscaler-grade AI software stack underpins the hardware, offering end-to-end support from application to system software layers. It is optimized for inference across major machine learning frameworks, generative AI platforms, and inference engines. Developers will be able to seamlessly onboard Hugging Face models via Qualcomm's Efficient Transformers Library and AI Inference Suite, tools designed for operationalizing AI through ready-to-use applications, agents, and APIs.

Qualcomm expects the AI200 to become commercially available in 2026, followed by the AI250 in 2027. The company said it plans to maintain an annual cadence of data center product updates, focusing on performance, energy efficiency, and continuous innovation in AI inference.
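Qualcomm has not published the API of its Efficient Transformers Library or AI Inference Suite, so the shape of the "one-click" Hugging Face onboarding described above can only be guessed at. A hypothetical sketch of what a deployment descriptor for such a flow might look like (every function and field name here is an invented assumption; only the PCIe/Ethernet/liquid-cooling details come from the announcement):

```python
# Hypothetical sketch only: Qualcomm's actual Efficient Transformers
# Library / AI Inference Suite APIs are not described in the article.
# All function and field names below are invented for illustration.

def build_deployment(model_id: str, precision: str = "int8",
                     replicas: int = 1) -> dict:
    """Assemble a deployment descriptor for a Hugging Face model,
    mimicking the kind of one-step onboarding the article describes."""
    if precision not in {"fp16", "int8", "int4"}:
        raise ValueError(f"unsupported precision: {precision}")
    return {
        "source": f"https://huggingface.co/{model_id}",
        "precision": precision,
        "replicas": replicas,
        "scale_up": "pcie",       # intra-rack interconnect, per the article
        "scale_out": "ethernet",  # inter-rack interconnect, per the article
        "cooling": "direct-liquid",
    }

print(build_deployment("meta-llama/Llama-3.1-8B")["scale_out"])  # -> ethernet
```

The value of such a descriptor-driven flow, if that is indeed what Qualcomm ships, is that the rack topology details (interconnect, cooling, replica count) are fixed by the platform rather than hand-tuned per model.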
