Jon Hermes, Staff Software Engineer - Arm
CXL-attached memory devices have the potential to cut growing datacenter memory costs and will be a powerful tool for upcoming datacenter designs. However, using this far memory incurs latency costs and bandwidth limitations that can be prohibitive for memory-sensitive workloads. Page-level and workload-level memory placement strategies can avoid some of the harm, but these strategies underutilize the far memory.
In this presentation, we demonstrate a strategy that allows the cost-saving use of far memory even for memory-sensitive workloads via computational offloading. While emulating a far-memory environment, we identified and characterized both VectorDB and AI/ML workloads that would be excessively harmed by the use of far memory, and we show two different approaches to offloading the most critical sections of computation to an emulated core on the far-memory side. In both cases, the worst case is avoided and only a minor overhead cost is incurred.