Copyright XDA Developers

While the average desktop user probably doesn't need PCIe 5.0, let alone PCIe 6.0 or 7.0, there is one place where even those interconnect speeds aren't enough. The servers that power generative AI and other high-performance computing workloads need fast lanes between memory, peripherals, and the CPU, and PCIe isn't enough on its own. One way the growing bandwidth needs of the datacenter are being addressed is by building on top of the PCIe specification, much as overlay networks like Tailscale build over existing infrastructure to create a new way of connecting. Compute Express Link (CXL) is one of these new interconnects: it extends PCIe to enable fast, cache-coherent computing, even across multiple servers.

What is Compute Express Link (CXL)?

High-speed interconnects between peripherals and your CPU

PCIe (Peripheral Component Interconnect Express) lanes are arguably among the most critical parts of any modern computer, enabling high-speed connectivity to graphics cards, network cards, and other devices. That bandwidth is even more crucial now for AI workloads, but even the upcoming PCIe 7.0 isn't enough to keep up with the voracious appetites of large language models.

Each component connected to the CPU carries a latency penalty for accessing data, which is why CPUs have ever-increasing amounts of cache. Directly attached DRAM has the next-lowest latency, and then there's a massive jump if the CPU has to go to the SSD for data. On top of that, CPU memory controllers can only address so much RAM, and the motherboard often doesn't have enough slots to hold the full amount the CPU can handle anyway.

Enter CXL

CXL is "an open standard industry-supported cache-coherent interconnect for processors, memory expansion, and accelerators."
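To put that latency hierarchy in perspective, here's a minimal sketch with illustrative round numbers. These are assumed orders of magnitude rather than measurements from any particular platform, and the CXL figure is an estimate of DRAM latency plus an interconnect hop:

```python
# Illustrative access latencies for the hierarchy described above.
# These are assumed round numbers for scale, not benchmarks.
LATENCY_NS = {
    "L1 cache": 1,
    "Directly attached DRAM": 100,
    "CXL-attached memory": 250,   # assumption: DRAM plus an interconnect hop
    "NVMe SSD": 100_000,          # the "massive jump" to storage
}

dram = LATENCY_NS["Directly attached DRAM"]
for tier, ns in LATENCY_NS.items():
    print(f"{tier:>22}: {ns:>7} ns  ({ns / dram:g}x DRAM)")
```

The takeaway is the shape of the curve, not the exact figures: CXL memory sits between local DRAM and storage, close enough to DRAM to be usable as general-purpose memory.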
That definition boils down to a way to maintain memory coherency between the CPU memory space and however many CXL-enabled attached devices are in use, whether they're other servers, accelerators, or CXL memory expansion cards.

The three sub-protocols of CXL:

CXL.io: Handles PCIe-style I/O operations for device discovery and configuration
CXL.cache: Enables accelerator devices to cache and access the host CPU's memory directly
CXL.mem: Lets a CPU use memory attached to other CXL devices as if it were a local system resource

Just as NVMe is a storage protocol carried over PCIe, CXL is a memory-sharing protocol that uses PCIe as the backbone for transfers. This increases overall computing performance across multiple hosts, enabling more memory capacity and bandwidth than a server's direct-attach DIMM slots allow.

Where is it used?

The primary place you'll find CXL is the datacenter, where multiple servers are linked together to create shared memory pools. It lets datacenters intelligently share memory resources between accelerators, CPUs, and other attached devices, providing enough bandwidth for high-core-count CPUs and their complex computing tasks.

Some workstation platforms also support the technology, though. Gigabyte's TRX50 AI TOP motherboard supports CXL expansion cards like the one in the image above, adding up to 512GB of DDR5 RDIMM RAM to the system. The motherboard supports up to 2TB of DDR5 memory across eight RDIMM slots, and adding the AI TOP CXL R5X4 increases that substantially. Unlike most other CXL solutions, such as those from Samsung, which ship with a fixed amount of onboard memory, this card lets you choose your own amount.

Your desktop might not need it

But there are similar technologies available

For most users, general-purpose PCIe will be more than enough for their desktop computing needs. After all, we're still barely managing to utilize PCIe 5.0 fully.
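For a sense of scale, here's a back-of-the-envelope sketch of per-lane bandwidth by PCIe generation. The transfer rates come from the published PCI-SIG specs; the encoding overhead is simplified, treating the FLIT-based 6.0 and 7.0 generations as lossless for brevity:

```python
# Approximate per-lane and x16 bandwidth by PCIe generation.
# Transfer rates (GT/s) are per the PCI-SIG specs; encoding overhead
# is simplified: 128b/130b for gens 3.0-5.0, ~1.0 for FLIT-based 6.0/7.0.
GEN_GT_S = {"3.0": 8, "4.0": 16, "5.0": 32, "6.0": 64, "7.0": 128}

def lane_gb_s(gen: str) -> float:
    gt_s = GEN_GT_S[gen]
    efficiency = 128 / 130 if gen in ("3.0", "4.0", "5.0") else 1.0
    return gt_s * efficiency / 8  # 8 bits per byte

for gen in GEN_GT_S:
    per_lane = lane_gb_s(gen)
    print(f"PCIe {gen}: ~{per_lane:.2f} GB/s per lane, ~{per_lane * 16:.0f} GB/s x16")
```

Even a full x16 PCIe 5.0 link tops out around 63 GB/s in each direction, which is why the doubling from generation to generation still can't keep pace with datacenter memory demands on its own.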
But that doesn't mean low-latency, high-speed connectivity to the CPU or other devices will stay confined to the datacenter. The Windows 11 2022 Update added DirectX 12 Ultimate features, including DirectStorage, which lets modern GPUs decompress game assets directly from a compatible NVMe SSD, bypassing the CPU. There's always a chance that CXL will come to more desktop platforms, as there are some use cases for content creators and home lab enthusiasts, but it's still a ways off from mainstream adoption.

Compute Express Link enables new ways for servers to communicate

Compute is expensive, and getting even more so. CXL reduces the cost of datacenter computing by sharing resources intelligently among servers. Combined with containerized workloads, it improves efficiency without adding more heat-generating hardware to the mix, and it's one of the technologies that will be crucial for enterprise computing.
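On the software side, CXL memory expansion commonly surfaces to the operating system as a NUMA node with memory but no CPUs. Here's a minimal sketch, assuming a Linux host with the standard sysfs layout; treating a CPU-less node as CXL memory is a heuristic for spotting expansion or pooled memory, not a guarantee:

```python
from pathlib import Path

def classify(cpulist: str) -> str:
    # Heuristic: a NUMA node with no CPUs is how CXL expansion
    # memory usually shows up on Linux.
    return "CPU-attached" if cpulist.strip() else "CPU-less (possibly CXL memory)"

def list_numa_nodes(sysfs: str = "/sys/devices/system/node") -> list[str]:
    # Each nodeN directory has a "cpulist" file naming the CPUs on that node;
    # an empty cpulist means the node contributes memory only.
    lines = []
    for node in sorted(Path(sysfs).glob("node[0-9]*")):
        cpus = (node / "cpulist").read_text().strip()
        lines.append(f"{node.name}: cpus=[{cpus or 'none'}] -> {classify(cpus)}")
    return lines

if __name__ == "__main__":
    print("\n".join(list_numa_nodes()))
```

Because the memory shows up as just another NUMA node, the kernel and workloads can use it through the same placement policies they already use for multi-socket systems.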