Saturday, September 8, 2012

Infiniband Interconnect Terminologies Demystified

Hey Guys, just a short post about InfiniBand (IB) terms and the actual data rate possible through IB interconnects. I hate it when Marketing/Sales people in the HPC field quote the signaling rate instead of the actual/effective data rate possible through these interconnects at the link layer. In the context of this post I am dropping the other overheads in the stack, which decrease the speed further at the OS layer. Days of pure technology are lost in the mist of Capitalism. So understand the terms clearly and give "in your face" responses to these people who try to mislead clients into black holes by blabbering about inflated link speeds.

InfiniBand switches use cut-through switching to achieve ultra-low latency. In some cases, however, store-and-forward switching is used; we will focus on that at the end of the post. Consider the table in the following image: the base IB signaling rate is 2.5 Gbps per lane, which is SDR. Remember that link speed & link width always go hand in hand. The correct data rate can only be described by using both the link speed and the link width. As of today the possible link widths are 1X, 4X & 12X. Never say just "QDR" in conversation, to avoid effective-data-rate confusion; always say "4X QDR" if you mean a per-lane signaling rate of 10 Gbps and an effective data rate of 32 Gbps full duplex. Marketing/Sales people tend to quote "40 Gbps" for 4X QDR, which is misleading: 40 Gbps is only the aggregate signaling rate (4 lanes x 10 Gbps), not a data rate. 4X means 4 lanes of QDR capability, and striping the data over those 4 lanes yields 32 Gbps effective speed after 8b/10b encoding. So with "4X QDR" you get 32 Gbps of actual data rate in transmit and in receive, in short 32 Gbps full duplex. The core difference between SDR, DDR, QDR & FDR connections is the per-lane signaling rate, i.e. how many bits per second each lane puts on the wire. FDR achieves higher efficiency by using 64b/66b encoding instead of 8b/10b.
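To make the arithmetic above concrete, here is a small sketch that computes the effective data rate from lane count, per-lane signaling rate and encoding overhead. The helper name and the rate table are my own for illustration, not from any IB library:

```python
# Encoding efficiency: 8b/10b sends 10 bits on the wire per 8 data bits,
# 64b/66b sends 66 bits per 64 data bits.
ENCODING = {"8b/10b": 8 / 10, "64b/66b": 64 / 66}

def effective_gbps(lanes, signal_gbps, encoding):
    """Effective link-layer data rate in Gbps (per direction, full duplex)."""
    return lanes * signal_gbps * ENCODING[encoding]

# Per-lane signaling rates and encodings for each speed generation.
speeds = {
    "SDR": (2.5, "8b/10b"),
    "DDR": (5.0, "8b/10b"),
    "QDR": (10.0, "8b/10b"),
    "FDR": (14.0625, "64b/66b"),
}

for name, (rate, enc) in speeds.items():
    print(f"4X {name}: {effective_gbps(4, rate, enc):.2f} Gbps effective")
```

Running this shows exactly why "4X QDR" is 32 Gbps effective, not 40: 4 lanes x 10 Gbps x 0.8 encoding efficiency.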

A 1X transmit physical lane consists of one differential pair, i.e. two wires.
A 1X receive physical lane consists of one differential pair, i.e. two wires.

Infiniband Speed/Width/Encoding/Lanes/Wires

The InfiniBand protocol transfers data serially. Pure serial transmission happens on a 1X link. The speed of transmission depends on both the signaling rate and the link width. Links wider than 1X achieve parallelism by striping the data: chunks of the frame are transmitted in serial fashion over multiple lanes simultaneously.
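The striping idea can be sketched in a few lines. This is only an illustrative round-robin distribution, not the exact IB physical-layer framing:

```python
def stripe(data: bytes, lanes: int):
    """Distribute a byte stream round-robin across physical lanes.
    Illustrative sketch only; real IB framing is defined by the spec."""
    return [data[i::lanes] for i in range(lanes)]

frame = bytes(range(8))
# On a 4X link, each lane carries every 4th byte of the frame in serial fashion.
print(stripe(frame, 4))
```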

Note: Do not confuse these "lanes" with Virtual Lanes (VLs) in IB; that topic is out of scope for this post (someday I will post about it, don't worry). In short, VLs are a higher-level abstraction that lets multiple independent logical channels share the same physical lanes. For the sake of this post, read "lane" as "physical lane".

The image below gives a clear idea of how data is transmitted on a 1X SDR link & a 4X SDR link; the same analogy applies to the other combinations of link speeds and link widths.

Infiniband Data Transmission Pattern

Infiniband Differential Pairs in 1X & 4X

It is recommended to have a uniform link speed/width within one fabric domain in an HPC environment to achieve optimum performance & low latency. Connecting a 4X QDR capable HCA (Host Channel Adapter) to a 1X QDR capable HCA results in store-and-forward switching, as the frame needs to be assembled completely in a buffer first and only then forwarded to the 1X QDR HCA in serial fashion. If the IB network architecture & devices can handle a few lower-speed links without affecting the latency of the uniform-speed connections, then it is fine to have mixed link speeds/widths.
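A quick back-of-the-envelope calculation shows why the mismatch hurts: the store-and-forward hop must wait for the whole frame, and a 1X link takes four times as long as a 4X link to clock the same frame onto the wire. The function and the 4096-byte frame size are my own illustrative assumptions:

```python
def serialization_us(frame_bytes, lanes, signal_gbps, efficiency=0.8):
    """Microseconds to clock one frame onto the wire at the effective rate.
    efficiency=0.8 assumes 8b/10b encoding (SDR/DDR/QDR)."""
    bits = frame_bytes * 8
    rate_bps = lanes * signal_gbps * 1e9 * efficiency
    return bits / rate_bps * 1e6

frame = 4096  # bytes; hypothetical MTU-sized frame
print(f"1X QDR: {serialization_us(frame, 1, 10.0):.3f} us")
print(f"4X QDR: {serialization_us(frame, 4, 10.0):.3f} us")
# The 1X side takes 4x longer per frame, and store-and-forward adds that
# full serialization delay on top of the normal cut-through latency.
```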