New Crossbar Directly Switches Variable-Size Packets
by Manolis Katevenis and Nikos Chrysos
Networks carry variable-size packets, but router crossbars can only switch them after segmentation into fixed-size cells. This situation will soon change, with the development of a new architecture designed to remove the inefficiencies associated with packet segmentation and reassembly (SAR).
The Internet carries information in packets whose size varies from 40 bytes to a few kilobytes. Variable-size packets are also used in the majority of communication across the whole spectrum, from WAN, MAN, LAN, and cluster interconnects to storage, server, computer I/O, processor-memory interconnects, embedded systems, and networks-on-a-chip.
These ubiquitous networks are formed by interconnecting switches or routers, which are in turn usually built around a crossbar switch at their core. Crossbars allow multiple, simultaneous transfers of information, as shown in Figure 1, and are thus replacing old-time buses, which preclude any such parallelism.
Figure 1 shows a 3x3 switch with buffer memories at the inputs (each containing three per-output queues), as in the usual contemporary architectures. The crossbar is configured to pair inputs with outputs. Choosing a 'good' configuration is crucial and complicated: some configurations are inefficient (eg not pairing output C with input 2), while others are unfair. Whenever the crossbar configuration is changed, all input-output pairings change together, so all transfers must start and finish at the same time; consequently, such crossbars inherently operate on fixed-size cells.
To route variable-size packets, we segment them into fixed-size cells, get the cells through the crossbar, and then reassemble the original packets at the outputs. This introduces inefficiencies; for example, a 65-byte packet in a system employing 64-byte cells occupies two cells and so costs as much as a 128-byte packet. To make things worse, crossbar configurations are often imperfect, because the complex scheduling problem must be solved in just a few tens of nanoseconds. To compensate for these two inefficiencies, crossbars must switch cells faster than their rate of arrival. The ratio of the two rates is called the crossbar 'speedup' factor; commercial products use speedups of around three, implying that the fastest lines that can be handled are about three times slower than the fastest crossbar that can be built!
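The segmentation cost described above can be illustrated with a few lines of arithmetic. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes that cells carry no per-cell header, which real systems would add on top.

```python
def cells_needed(packet_bytes, cell_bytes=64):
    """Number of fixed-size cells a packet occupies after segmentation."""
    # Integer ceiling division: a partly filled last cell still costs a full cell.
    return (packet_bytes + cell_bytes - 1) // cell_bytes


def crossbar_bytes(packet_bytes, cell_bytes=64):
    """Bytes of crossbar capacity consumed, including padding in the last cell."""
    return cells_needed(packet_bytes, cell_bytes) * cell_bytes


def segmentation_efficiency(packet_bytes, cell_bytes=64):
    """Fraction of the consumed crossbar capacity that carries real data."""
    return packet_bytes / crossbar_bytes(packet_bytes, cell_bytes)


if __name__ == "__main__":
    for size in (40, 64, 65, 128):
        print(size, cells_needed(size), crossbar_bytes(size),
              round(segmentation_efficiency(size), 2))
```

Under these assumptions, a 65-byte packet and a 128-byte packet both consume 128 bytes of crossbar capacity, so roughly half of the capacity spent on the 65-byte packet is padding.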
In the last five years, a new crossbar architecture that improves scheduling efficiency has been investigated. In Figure 1, decisions at the outputs (ie choose an input to read from) are interdependent, because input conflicts are not allowed. The new architecture relaxes these dependencies by placing small buffer memories at each crosspoint, as shown in Figure 2.
Figure 1: A crossbar switch allows parallel communication paths between arbitrary input-output pairs.
Figure 2: Small buffer memories at the crosspoints allow distributed scheduling decisions. An important by-product is that operation with variable-size packets now becomes feasible.
Each output scheduler now chooses a packet from one of the non-empty buffers in its column; such choices are independent. Similarly, each input scheduler independently chooses to forward traffic to one of the non-full buffers in its row. In the long run, some buffers will empty and others will fill up, thus indirectly coordinating the scheduler decisions. This new architecture has become feasible because we are now able to integrate several MBytes of RAM inside crossbar chips; this allows much simpler and more efficient crossbar scheduling, and thus removes one of the two reasons for using crossbar speedup.
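The independence of the schedulers can be sketched in a small Python model. This is a simplified illustration, not the design of any actual chip: the class and method names are hypothetical, round-robin is an assumed selection policy (the text above does not prescribe one), and time is abstracted into steps in which each input and each output moves at most one whole packet, regardless of its size.

```python
from collections import deque


class BufferedCrossbar:
    """Toy model of an n x n crossbar with a small buffer at every crosspoint."""

    def __init__(self, n, xp_capacity_bytes):
        self.n = n
        self.capacity = xp_capacity_bytes
        # voq[i][j]: packet sizes (bytes) waiting at input i for output j
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        # xp[i][j]: packet sizes stored at crosspoint (input i, output j)
        self.xp = [[deque() for _ in range(n)] for _ in range(n)]
        self.in_ptr = [0] * n   # round-robin pointers (assumed policy)
        self.out_ptr = [0] * n

    def enqueue(self, i, j, size):
        self.voq[i][j].append(size)

    def input_schedule(self, i):
        """Input i looks only at its own row and fills one non-full buffer."""
        for k in range(self.n):
            j = (self.in_ptr[i] + k) % self.n
            q = self.voq[i][j]
            if q and sum(self.xp[i][j]) + q[0] <= self.capacity:
                self.xp[i][j].append(q.popleft())
                self.in_ptr[i] = (j + 1) % self.n
                return j
        return None

    def output_schedule(self, j):
        """Output j looks only at its own column and drains one non-empty buffer."""
        for k in range(self.n):
            i = (self.out_ptr[j] + k) % self.n
            if self.xp[i][j]:
                size = self.xp[i][j].popleft()
                self.out_ptr[j] = (i + 1) % self.n
                return i, size
        return None

    def step(self):
        """Outputs drain first, then inputs refill; no scheduler consults another."""
        departures = [self.output_schedule(j) for j in range(self.n)]
        for i in range(self.n):
            self.input_schedule(i)
        return departures


xbar = BufferedCrossbar(n=3, xp_capacity_bytes=1500)
xbar.enqueue(0, 2, 40)     # variable-size packets, no segmentation
xbar.enqueue(1, 2, 1500)
xbar.enqueue(0, 1, 65)
first = xbar.step()    # crosspoints start empty, so nothing departs yet
second = xbar.step()   # outputs 1 and 2 each pick a packet independently
third = xbar.step()
```

Each scheduler inspects only its own row or column, and the only coupling between them is the occupancy of the crosspoint buffers. Because packets are stored and forwarded whole, the model needs no common cell size.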
In the last few years, three research groups have observed that this new architecture is also capable of operating directly on variable-size packets, without segmentation and reassembly (SAR). Given the scheduler independence, there is no need to change configurations in synchrony and hence no need for a single, common, fixed cell size. This observation radically changes the entire system.
Without SAR, the second reason for crossbar speedup is also eliminated. Hence, the new switches can handle line rates as fast as the fastest crossbar that can be built, that is, line rates about three times higher than the old crossbars permitted. Further, cost is greatly reduced because output buffer memories, which were used to hold the cells that accumulate at the outputs due to speedup, and were also used for packet reassembly, are no longer necessary.
FORTH is one of the pioneers working on this new architecture and advocating its adoption. Our research group on Packet Switch Architecture, comprising about eight people in the Institute of Computer Science, Crete, Greece, is completing the design and layout of a buffered crossbar CMOS chip, containing roughly 150 million transistors, that directly switches variable-size packets. Our results can be found at the link below.
Switch Architecture in Europe
Packet Switch and Router Architecture is becoming increasingly important, as interconnection networks now constitute the backbone of all emerging information and communication systems, and the switch and router market is growing quickly. We foresee the emergence of commodity switches - low-cost, universal building blocks - that will alter the router market in the same way as PC clusters based on commodity processors altered the supercomputer market. In the last couple of years, about a dozen European organisations (research centres, universities and industry) that are heavily involved in R&D in this area have strengthened their cooperation, working towards radically improved switch and interconnection architectures, and towards a leading European presence in this crucial infrastructure area.
Manolis Katevenis, FORTH-ICS
Tel: +30 2810 39 1664