Due to the accelerated growth in performance of microprocessors and the recent emergence of chip multiprocessors (CMP), the critical performance bottleneck of high-performance computing systems has shifted from the processors to the communications infrastructure. By uniquely exploiting the parallelism and capacity of wavelength division multiplexing (WDM), optical interconnects offer a high-bandwidth, low-latency solution that can address the bandwidth scalability challenges of future computing systems.
Keren Bergman, Columbia University
David Keezer, Scott Wills, Georgia Institute of Technology
Gary Carter, University of Maryland Baltimore County
This project aims to develop a scalable architecture for an optical packet-switching (OPS) fabric for use in massively parallel high-performance computing systems. The goal is to provide large-bandwidth, low-latency connectivity between each of the users (e.g., processing nodes and memory banks) in such a system. The data vortex is a distributed deflection-routing interconnection network architecture designed to fully exploit the properties of fiber-optic technology in order to achieve ultrahigh bandwidth, low latency, and a high degree of port-count scalability. Using photonic switching elements that enable transparent, broadband switching of self-routed optical packets, we have constructed a complete 36-node 12×12 optical switch with terabit-per-second-scale routing capacity (see images below). This is the first complete implementation of a truly OPS network in which no central arbitration is required. The system can route packets with immense bandwidths from any one of the 12 input ports to any one of the 12 output ports: 160 Gb/s (10 Gb/s per channel × 16 wavelength channels) has been demonstrated, and much higher rates are possible. The median (and average) latency is 110 ns.
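The headline capacity figures above follow directly from the per-channel rate, the channel count, and the port count. A minimal back-of-the-envelope sketch (constant names are illustrative, not from the implementation):

```python
# Capacity arithmetic for the demonstrated 12x12 data vortex switch,
# using the parameters quoted in the text.

CHANNEL_RATE_GBPS = 10    # data rate per wavelength channel
NUM_WAVELENGTHS = 16      # WDM channels striped across each packet
NUM_PORTS = 12            # input (and output) ports

# Each packet carries all wavelength channels in parallel.
per_port_bw = CHANNEL_RATE_GBPS * NUM_WAVELENGTHS   # 160 Gb/s per port

# With all ports active simultaneously, the fabric routes at:
aggregate_bw = per_port_bw * NUM_PORTS

print(f"Per-port packet bandwidth: {per_port_bw} Gb/s")
print(f"Aggregate routing capacity: {aggregate_bw / 1000:.2f} Tb/s")
```

With all 12 ports loaded, the aggregate works out to 1.92 Tb/s, which is what places the fabric at terabit-per-second scale.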
A number of experimental investigations of the system behavior have been performed, demonstrating, for instance, the system's resilience to optical power and timing variations. Additionally, injecting packets of varying duration and varying numbers of wavelength channels into the network simultaneously has verified the system's robustness to variable message sizes.
Although the implemented switch fabric accommodates 12 ports, the intended application requires extensive scalability to support many thousands of users. The precise photonic implementation of the physical layer in such scaled systems may limit the physical size and usable bandwidth. Because semiconductor optical amplifiers (SOAs) are employed as the key switching element, understanding their performance characteristics is important to optimizing the system. A recirculating testbed environment has been implemented to experimentally study the physical limitations of OPS networks induced by the individual switching elements. This testbed has enabled unique physical-layer analysis in the context of system scalability investigations, the effects of different modulation formats, and the first investigation of polarization-dependent gain in OPS networks. Modeling and simulation of the physical-layer behavior are used in conjunction with the experiments to optimize the information-carrying capacity and efficiency of future OPS network implementations.
For a full overview of our body of work related to the data vortex OPS interconnection network systems and subsystems, as well as many improvements that we have made to the original fabric, see our recent invited paper in the Journal of Lightwave Technology.
To address the need for a practical solution for buffering optical packets, we have developed a novel optical packet buffer architecture. The transparent buffering design is composed of identical SOA-based building-block modules, yielding straightforward scalability and extensibility. In a time-slotted manner, the buffer supports independent read and write processes without packet rejection or misordering. Both first-in first-out (FIFO, or queue) and last-in first-out (LIFO, or stack) prioritization schemes have been experimentally realized. Further, active queue management (AQM) can be implemented on the buffer architecture to allow for network congestion control. Simulations have verified the buffer's improved latency performance, and experiments have functionally verified AQM operation on the optical packet buffer. The basic optical packet buffer architecture has also been adapted to realize network-interface packet injection control for an optical packet-switched network, accepting backpressure and controlling the traffic injected into the network.
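The buffer's logical behavior can be captured with a simple functional model. The sketch below is an illustrative abstraction of the time-slotted read/write semantics only (class and method names are ours), not a model of the SOA-based photonic hardware:

```python
from collections import deque

class OpticalPacketBuffer:
    """Toy functional model of the time-slotted buffer: at most one write
    and one read per slot, with FIFO (queue) or LIFO (stack)
    prioritization. Purely illustrative; not the photonic implementation."""

    def __init__(self, mode="FIFO"):
        assert mode in ("FIFO", "LIFO")
        self.mode = mode
        self.slots = deque()

    def step(self, write=None, read=False):
        """Advance one time slot; the read and write are independent.
        Returns the packet read out this slot, if any."""
        out = None
        if read and self.slots:
            # FIFO reads the oldest packet; LIFO reads the newest.
            out = self.slots.popleft() if self.mode == "FIFO" else self.slots.pop()
        if write is not None:
            self.slots.append(write)
        return out

# FIFO ordering: first packet written is first read out.
fifo = OpticalPacketBuffer("FIFO")
fifo.step(write="A")
fifo.step(write="B")
assert fifo.step(read=True) == "A"

# LIFO ordering: most recent packet is read out first.
lifo = OpticalPacketBuffer("LIFO")
lifo.step(write="A")
lifo.step(write="B")
assert lifo.step(read=True) == "B"
```

Because both disciplines differ only in which end of the same structure is read, a single building-block module can realize either scheme, mirroring the modular design described above.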
Providing a seamless gateway from the compute nodes to the photonic network represents a key enabler for the realization of an end-to-end optical interconnection fabric. Format and data rate mismatches at the electronic/photonic edge necessitate the development of a unique network interface to address these critical bottlenecks. The design and development of an Optical Network Interface Card (ONIC) that will address these challenges to latency and bandwidth scalability is currently underway.
The ONIC manages the encoding, packetization, and multiplexing/demultiplexing of serial electronic data to a wavelength-parallel optical format in a low-latency manner that is transparent to the interconnected compute nodes. By abstracting away the interconnection network, the attached compute nodes can exploit the capacity of WDM – supporting high-bandwidth multi-wavelength striped message exchange – while being completely agnostic to the details of the underlying photonic interconnect.
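The core formatting task the ONIC performs, serial-to-wavelength-parallel conversion, can be sketched in a few lines. This is a minimal logical model of round-robin striping across channels (function names and the striping order are our assumptions, not the ONIC's actual encoding):

```python
def stripe(payload: bytes, num_wavelengths: int) -> list:
    """Distribute a serial payload round-robin across wavelength channels,
    so every channel carries part of the message in parallel during the
    packet slot. Illustrative model only."""
    channels = [bytearray() for _ in range(num_wavelengths)]
    for i, byte in enumerate(payload):
        channels[i % num_wavelengths].append(byte)
    return [bytes(c) for c in channels]

def unstripe(channels: list) -> bytes:
    """Reassemble the original serial stream at the receiving interface."""
    total = sum(len(c) for c in channels)
    out = bytearray()
    for i in range(total):
        out.append(channels[i % len(channels)][i // len(channels)])
    return bytes(out)

# Round trip across 4 wavelength channels recovers the serial stream.
message = b"serial data"
assert unstripe(stripe(message, 4)) == message
```

The point of the abstraction is the one stated above: the compute node sees an ordinary serial stream on both ends, while the network in between carries it as a multi-wavelength-striped packet.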
The advancement of various standard protocols, such as PCI Express and InfiniBand, has enabled interconnectivity among diverse communicating modules and processors. By leveraging field programmable gate array (FPGA) technology, we have flexibility in the definition of the data exchange protocol, optical message structure, timing, and synchronization. This flexibility thus allows for the architectural exploration, experimental testing, and design validation of a variety of computing systems supporting different communications protocols.
PCI Express (PCIe) has emerged as the preeminent protocol standard for high-bandwidth chip-to-chip communication in current and future generation computing systems. PCIe is based on packetized serial point-to-point links and, in its current iteration, can support up to 16 data lanes per link signaling at 8 Gb/s per lane.
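The per-link bandwidth implied by those figures is straightforward to tally. A quick sketch, noting that the 8 Gb/s signaling rate corresponds to PCIe 3.0, whose 128b/130b line coding slightly reduces the effective payload rate (the efficiency factor here is from the PCIe 3.0 encoding scheme, not from the text above):

```python
LANES = 16              # maximum data lanes per link, per the text
LANE_RATE_GBPS = 8.0    # signaling rate per lane (PCIe 3.0)
ENCODING_EFF = 128 / 130  # 128b/130b line-coding efficiency in PCIe 3.0

raw_bw = LANES * LANE_RATE_GBPS          # raw bits on the wire, one direction
effective_bw = raw_bw * ENCODING_EFF     # after line-coding overhead

print(f"Raw link bandwidth: {raw_bw:.0f} Gb/s per direction")
print(f"Effective bandwidth: {effective_bw:.1f} Gb/s per direction")
```

A full x16 link thus offers 128 Gb/s raw (about 126 Gb/s effective) in each direction, which is the scale of traffic a photonic interface at the chip edge must absorb.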
We have developed an all-optical photonic interface capable of transparently formatting serial data streams, such as PCIe, into high-bandwidth wavelength-parallel photonic packets. Employing the aforementioned photonic interface, we have also demonstrated the end-to-end generation of a PCIe link originating from a remote endpoint across the interface to a host computer. The remote endpoint is implemented on a customized FPGA-based device. PCIe traffic originating from the endpoint and logically routed through the established PCIe link is demonstrated via direct memory accesses (DMA) initiated by an application running on an x86-based PC. Successful transmission of PCIe data at 2.5 Gb/s and maintenance of the logical PCIe link were experimentally confirmed across eight wavelengths.
InfiniBand is a switched-fabric interconnection network prevalent in today's high-performance computing systems. InfiniBand routes variable-sized packets from source to destination at high data rates and low latencies. As with 10 Gigabit Ethernet, the individual copper cables linking the network together have been replaced with WDM fiber optics. However, the InfiniBand switches are still electronic, necessitating an optical-to-electronic conversion at every switch in the network. Our research aim in this arena is to transparently send InfiniBand packets over our optical packet-switched networks. To achieve this goal, we are adapting an off-the-shelf FPGA board to decode the InfiniBand packet from a standard InfiniBand HCA (essentially a network interface card) and transport it over our optical network. The picture at left shows some of our test equipment, notably the FPGA board and physical adapters. We are working to expand our test setup to create a point-to-point link between two InfiniBand HCAs using our optical packet format.
In order to evaluate critical architectural design considerations, a macro-scale test-bed environment consisting of high-performance blade-level CMP compute nodes interfaced to an optical interconnection network via the aforementioned ONIC is currently under development. This interdisciplinary effort will leverage ongoing research in optical interconnection networks, low-latency photonic network interfaces, and communication-intensive applications to create a coherent end-to-end prototyping environment. The implemented test-bed will facilitate the experimental validation of end-to-end optical message exchange in various hardware platforms and topological configurations, enabling realistic investigation of critical architectural concepts and real-world performance characterizations. Specifically, the test-bed will provide a platform for investigation of optical address encoding/decoding methods, demonstration of photonic end-to-end payload routing and transmission, and evaluation of interface/routing latencies and throughput scaling in a practical environment. Furthermore, by executing realistic application-driven traffic on high-bandwidth inter-node optical communications platforms, this demonstrator will provide a means for evaluating high-performance WDM infrastructures as full network solutions for CMP-based advanced computing systems. It is our vision that the successful implementation of this interconnect fabric system test-bed will help bridge the gap toward practical use of optical interconnects in future high-performance computing systems.
In an effort to bridge the gap between academic research and real-world application, we are collaborating closely with engineers at Intel Research toward a path for viable commercialization of optical interconnects supporting multi-wavelength striped message exchange for high-performance cluster computing. The focus of our collaboration is the design and development of a unified experimental platform consisting of the ONIC and the interconnect fabric system test-bed. Investigations of various real-world performance metrics and design considerations will be conducted in continual interaction with Intel engineers, enabling a cohesive partnership that exploits the resources and expertise of each group.
The limitations of main memory accesses have become a bottleneck for the scaling of high-performance computing systems. Memory systems must balance requirements for large memory capacity, low access latency, high bandwidth, and low power. As such, the electronic interconnect between a processor and memory is a key consideration in overall system design. The current trend in memory system design is to access many memory devices in parallel and transmit data over a high-speed, path-length-matched, wide electronic bus. Overall, the increasing number of memory devices, higher bus data rates, and wiring constraints are pushing the limits of electronic interconnects and threatening the performance gains of future computing systems. Optically-connected memory systems can enable continued performance scaling through high-bandwidth capacity, energy-efficient bit-rate transparency, and time-of-flight latency. Memory device parallelism and total capacity can scale to match future high-performance computing requirements without sacrificing data-movement efficiency.
Our optically-connected memory system provides an all-optical link between processing cores and main memory across an optical interconnection network, such as the interconnect fabric system test-bed. By implementing the processing cores on FPGAs we can explore novel memory architectures and model various applications. We have the ability to perform in-depth architectural exploration and validation, and we are working to close the growing performance gap between processors and memory.