Kernel-Initiated One-Sided Networking for GPU-Accelerated AI Workloads
Full paper
Mammoth: Macro-Level MPI Offloading to Off-path Accelerator in DPU
Full paper
Scalable, Topology- and Multi-HCA-Aware Hierarchical GPU Allgather Using Parallel Rings
Full paper