Intel® Omni-Path Host Fabric Interface (HFI)
Designed specifically for HPC, the Intel® Omni-Path Host Fabric Interface (Intel® OP HFI) uses an advanced connectionless design whose performance scales with high node and core counts, making it the ideal choice for the most demanding application environments. Intel OP HFI supports 100 Gbps per port, so each port can deliver up to 25 GBps of bidirectional bandwidth (12.5 GBps in each direction). The same ASIC used in the Intel OP HFI will also be integrated into future Intel® Xeon® processors and used in third-party products.
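The per-port bandwidth figure follows from a simple unit conversion. The sketch below is illustrative only: the function name is hypothetical, and it assumes the raw line rate with no allowance for encoding or protocol overhead.

```python
def line_rate_to_gbytes(gbps):
    """Convert a line rate in Gb/s to (per-direction, bidirectional) GB/s.

    Assumes the raw line rate; encoding and protocol overhead are ignored.
    """
    per_direction = gbps / 8  # 8 bits per byte
    return per_direction, 2 * per_direction  # full duplex: both directions at once


per_dir, bidir = line_rate_to_gbytes(100)
print(per_dir, bidir)  # 12.5 25.0
```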
Each HFI supports:
- Multi-core scaling – support for up to 160 contexts;
- 16 Send DMA engines (M2IO usage);
- Efficiency – large MTU support (4 KB, 8 KB, and 10 KB) for reduced per-packet processing overhead, plus improved packet-level interfaces for better utilization of on-chip resources;
- Receive DMA engine arrival notification;
- Each HFI can map a ~128 GB window at 64-byte granularity;
- Up to 8 virtual lanes for differentiated QoS;
- ASIC designed to scale up to 160M messages/second and 300M bidirectional messages/second.
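To see why a larger MTU reduces per-packet processing overhead, it helps to count the packets needed to carry one message. The sketch below is a rough illustration, not a model of the HFI itself: it assumes 1 KB = 1024 bytes, ignores packet header bytes, and uses a hypothetical 1 MiB message.

```python
import math

# MTU sizes from the list above, assuming 1 KB = 1024 bytes.
MTU_BYTES = {"4 KB": 4096, "8 KB": 8192, "10 KB": 10240}


def packets_needed(message_bytes, mtu_bytes):
    """Number of packets required to carry a message (headers ignored)."""
    return math.ceil(message_bytes / mtu_bytes)


message = 1024 * 1024  # hypothetical 1 MiB message
counts = {label: packets_needed(message, size) for label, size in MTU_BYTES.items()}
print(counts)  # {'4 KB': 256, '8 KB': 128, '10 KB': 103}
```

Fewer packets per message means fewer per-packet operations (header handling, notifications) for the same payload.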
Intel® Omni-Path Host Fabric Interface (HFI) Optimizations
Much of the improved HPC application performance and low end-to-end latency at scale comes from the following enhancements:
Enhanced Performance Scaled Messaging (PSM).
The application view of the fabric is derived heavily from the demonstrated scalability of the Intel® True Scale Fabric architecture, and is compatible with it at the application software level, by leveraging an enhanced, next-generation version of the Performance Scaled Messaging (PSM) library. Major deployments by the US Department of Energy and others have proven this scalability advantage. PSM is specifically designed for the Message Passing Interface (MPI) and is very lightweight, with one-tenth of the user-space code compared to using verbs. This leads to extremely high MPI and Partitioned Global Address Space (PGAS) message rates (short-message efficiency) compared to using InfiniBand* verbs.
“Connectionless” message routing.
Intel® Omni-Path Architecture is based on a connectionless design: it does not establish connection address information between nodes, cores, or processes, whereas a traditional implementation maintains this information in the cache of the adapter. As a result, the connectionless design delivers consistent latency independent of the scale or messaging partners. This implementation offers greater potential to scale performance across a cluster with a large node or core count while maintaining low end-to-end latency as the application is scaled across the cluster.
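The scaling argument can be made concrete with a simple count. The sketch below is a deliberately simplified model, not a description of any adapter's actual cache: it assumes all-to-all communication and that a connection-oriented endpoint keeps one connection context per remote peer. The function name is hypothetical.

```python
def per_peer_contexts(num_processes, connectionless):
    """Connection contexts one process must track under all-to-all traffic.

    Simplified model: a connection-oriented endpoint keeps one context per
    remote peer; a connectionless endpoint keeps none.
    """
    if connectionless:
        return 0
    return num_processes - 1


for n in (16, 1024, 65536):
    print(n, per_peer_contexts(n, connectionless=False),
          per_peer_contexts(n, connectionless=True))
```

In this model the connection-oriented state grows linearly with job size, so it eventually outgrows any fixed-size adapter cache, while the connectionless case stays constant.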