diff options
Diffstat (limited to '')
-rw-r--r-- | Documentation/mic/scif_overview.txt | 98 |
1 files changed, 98 insertions, 0 deletions
diff --git a/Documentation/mic/scif_overview.txt b/Documentation/mic/scif_overview.txt new file mode 100644 index 000000000..0a280d986 --- /dev/null +++ b/Documentation/mic/scif_overview.txt @@ -0,0 +1,98 @@ +The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low +level communications API across PCIe currently implemented for MIC. Currently +SCIF provides inter-node communication within a single host platform, where a +node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of +communicating over the PCIe bus while providing an API that is symmetric +across all the nodes in the PCIe network. An important design objective for SCIF +is to deliver the maximum possible performance given the communication +abilities of the hardware. SCIF has been used to implement an offload compiler +runtime and OFED support for MPI implementations for MIC coprocessors. + +==== SCIF API Components ==== +The SCIF API has the following parts: +1. Connection establishment using a client server model +2. Byte stream messaging intended for short messages +3. Node enumeration to determine online nodes +4. Poll semantics for detection of incoming connections and messages +5. Memory registration to pin down pages +6. Remote memory mapping for low latency CPU accesses via mmap +7. Remote DMA (RDMA) for high bandwidth DMA transfers +8. Fence APIs for RDMA synchronization + +SCIF exposes the notion of a connection which can be used by peer processes on +nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A +process in a SCIF node initiates a SCIF connection to a peer process on a +different node via a SCIF "endpoint". SCIF endpoints support messaging APIs +which are similar to connection oriented socket APIs. Connected SCIF endpoints +can also register local memory which is followed by data transfer using either +DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and +kernel mode clients which are functionally equivalent. + +==== SCIF Performance for MIC ==== +DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus +SCIF shows the performance advantages of SCIF for HPC applications and runtimes. + + Comparison of TCP and SCIF based BW + + Throughput (GB/sec) + 8 + PCIe Bandwidth ****** + + TCP ###### + 7 + ************************************** SCIF %%%%%% + | %%%%%%%%%%%%%%%%%%% + 6 + %%%% + | %% + | %%% + 5 + %% + | %% + 4 + %% + | %% + 3 + %% + | % + 2 + %% + | %% + | % + 1 + + + ###################################### + 0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+- + 1 10 100 1000 10000 100000 + Transfer Size (KBytes) + +SCIF allows memory sharing via mmap(..) between processes on different PCIe +nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap +latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs. + +SCIF has a user space library which is a thin IOCTL wrapper providing a user +space API similar to the kernel API in scif.h. The SCIF user space library +is distributed @ https://software.intel.com/en-us/mic-developer + +Here is some pseudo code for an example of how two applications on two PCIe +nodes would typically use the SCIF API: + +Process A (on node A) Process B (on node B) + +/* get online node information */ +scif_get_node_ids(..) scif_get_node_ids(..) +scif_open(..) scif_open(..) +scif_bind(..) scif_bind(..) +scif_listen(..) +scif_accept(..) scif_connect(..) +/* SCIF connection established */ + +/* Send and receive short messages */ +scif_send(..)/scif_recv(..) scif_send(..)/scif_recv(..) + +/* Register memory */ +scif_register(..) scif_register(..) + +/* RDMA */ +scif_readfrom(..)/scif_writeto(..) scif_readfrom(..)/scif_writeto(..) + +/* Fence DMAs */ +scif_fence_signal(..) scif_fence_signal(..) + +mmap(..) mmap(..) + +/* Access remote registered memory */ + +/* Close the endpoints */ +scif_close(..) scif_close(..) |