by M. Katevenis and C. Nikolaou
Two key ingredients are needed to build high-performance parallel and distributed computer systems: high-speed computation and high-speed communication. The application of RISC principles to modern processor architecture has yielded high-speed computation engines. Inspired by those same principles, our work develops building blocks for high-speed communication switches (routers) and network-processor interfaces.
"Telegraphos" - from the Greek words "thle" (remote) and "grafw" (write) - is an R&D project to design architectures and build prototypes for such components. Telegraphos is funded by ESPRIT project 6253 "SHIPS". Our research focuses on congestion-tolerant networks, and on remote-write based network-processor interfaces. Our prototypes connect workstations within an office space, turning them into a parallel computer. Telegraphos I, the first prototype, currently in its last phase of design, uses low-end technology (FPGA) as a means for rapid prototyping, and will run at 200 Mb/s/link. Systems running at much higher speeds, using ASIC and full-custom CMOS, are next on our schedule. Our remote-write based interfaces drastically cut down the software overhead for communication, thus making Telegraphos into something more than a workstation farm: a true parallel computer.
Point-to-Point Links with Back-Pressure
High-speed networks cannot afford multiple-access media any more - turn-around delays and arbitration complexity are excessive. Telegraphos uses switches interconnecting multiple point-to-point links, thus providing high throughput and increased parallelism. Preventive flow-control - ticket (window) based back-pressure - is used at the link level, so that packets never have to be dropped and retransmitted. This economizes on network bandwidth, and reduces the hardware complexity. Storage is less expensive than communication in modern IC technology, so enough buffer space is provided for peak throughput not to be compromised.
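The ticket mechanism can be illustrated with a small software model. This is only a sketch under stated assumptions: the class CreditLink, its method names, and the single in-process "receiver buffer" are hypothetical and do not describe the Telegraphos hardware. It shows the general idea that a sender transmits only while it holds a ticket for a free receiver slot, so a packet is held back rather than dropped.

```python
from collections import deque


class CreditLink:
    """Toy model of one link with ticket (window) based back-pressure.

    Hypothetical illustration only; not the Telegraphos implementation.
    """

    def __init__(self, receiver_buffer_slots):
        # The sender starts with one ticket per free receiver buffer slot.
        self.tickets = receiver_buffer_slots
        self.receiver_buffer = deque()
        self.waiting = deque()  # packets held back at the sender

    def send(self, packet):
        """Transmit if a ticket is available; otherwise hold the packet."""
        if self.tickets > 0:
            self.tickets -= 1
            self.receiver_buffer.append(packet)
            return True
        self.waiting.append(packet)
        return False

    def receiver_forward(self):
        """Receiver drains one packet and returns its ticket to the sender."""
        if not self.receiver_buffer:
            return None
        packet = self.receiver_buffer.popleft()
        self.tickets += 1
        # The returned ticket lets the oldest held packet go out.
        if self.waiting:
            self.send(self.waiting.popleft())
        return packet
```

In this model the receiver buffer can never hold more packets than it has slots, which is why no packet ever needs to be dropped and retransmitted.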
Parallel programs may generate excessive and uncontrolled traffic loads at times; when this happens, the network must behave robustly. Switch buffers that are shared among packets going in different directions degrade the network throughput when the load exceeds a certain limit. Telegraphos uses dedicated buffers per virtual path in each switch, and implements flow-control at that granularity, thus achieving good and stable throughput at increased network load.
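The benefit of dedicated buffers can be seen in a simplified model. The sketch below is an assumption-laden illustration: PerPathSwitch, its "blocked" set, and the one-packet-per-path forwarding round are hypothetical names and simplifications, not the Telegraphos switch design. It shows that when each virtual path has its own queue and its own flow control, a congested path fills only its own buffer and the other paths keep moving.

```python
from collections import deque


class PerPathSwitch:
    """Toy switch with one dedicated queue per virtual path (hypothetical)."""

    def __init__(self, paths, slots_per_path):
        self.queues = {path: deque() for path in paths}
        self.slots_per_path = slots_per_path
        # Paths whose downstream link is currently exerting back-pressure.
        self.blocked = set()

    def accept(self, path, packet):
        """Accept a packet unless this path's own buffer is full."""
        queue = self.queues[path]
        if len(queue) >= self.slots_per_path:
            return False  # back-pressure applies to this path only
        queue.append(packet)
        return True

    def forward_round(self):
        """Forward at most one packet from every unblocked path."""
        forwarded = []
        for path, queue in self.queues.items():
            if path not in self.blocked and queue:
                forwarded.append((path, queue.popleft()))
        return forwarded
```

With a single shared buffer, the packets of a blocked path could occupy every slot and stall unrelated traffic; here the blocked path can hold at most slots_per_path packets.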
Remote-Write: Efficient Communication Primitive
The central operation of the Telegraphos processor-network interface is the remote write: when the virtual address of a store instruction executed by the processor translates into a physical address in the Telegraphos device I/O space, the Telegraphos network interface sends it to the destination node (determined by the physical page number) in a network packet. The Telegraphos interface at that node performs the (local) memory write. In this way, the existing memory protection (address translation) mechanism is also used to implement message protection: a process S can only send messages to the mailboxes of those processes R for which the OS has given S an entry in its translation table; similarly, a process can only receive (read) the messages that were sent into pages that it has access to. As a consequence, message exchange can be done strictly at user-level, without need for any time-consuming system call. Additionally, this scheme excludes receive buffer overflows, and provides automatic message re-assembly from the multiple packets into which the message was broken, again simplifying the hardware.
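The path of a store instruction can be sketched in software as follows. All concrete values here are assumptions made for illustration: the 4 KiB page size, the constants IO_BASE_PAGE and PAGES_PER_NODE, the way the destination node is encoded in the physical page number, and the packet layout are hypothetical and are not taken from the Telegraphos design. The sketch only shows the principle: a missing translation entry blocks the send, and a physical page in the device I/O space turns the store into a network packet.

```python
PAGE_SHIFT = 12                       # assumption: 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
IO_BASE_PAGE = 0x80000                # assumption: first page of the device I/O space
PAGES_PER_NODE = 0x100                # assumption: remote pages reachable per node


def translate(page_table, vaddr):
    """Map a virtual address to a physical one, enforcing protection."""
    vpage = vaddr >> PAGE_SHIFT
    if vpage not in page_table:
        # No entry means the OS has not granted this process access.
        raise PermissionError("no translation entry for this page")
    return (page_table[vpage] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)


def store(page_table, local_memory, network, vaddr, value):
    """Perform a store; return the packet sent, or None for a local write."""
    paddr = translate(page_table, vaddr)
    ppage = paddr >> PAGE_SHIFT
    if ppage < IO_BASE_PAGE:
        local_memory[paddr] = value   # ordinary local memory write
        return None
    # The physical page number selects the destination node and remote page.
    node, remote_page = divmod(ppage - IO_BASE_PAGE, PAGES_PER_NODE)
    packet = {
        "dest": node,
        "addr": (remote_page << PAGE_SHIFT) | (paddr & OFFSET_MASK),
        "data": value,
    }
    network.append(packet)            # the interface at "dest" performs the write
    return packet
```

Because each packet carries its own destination address, the receiving interface can write the data directly into place, which is how the packets of a message reassemble without a separate receive buffer.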
Communication and Sharing in Parallel Systems
Besides low-overhead message passing, we are also working on low-cost support for shared memory. Telegraphos I includes hardware support for eager sharing: writes into a page are automatically reflected, in hardware, into all copies of the page that may exist on multiple nodes. Besides remote writes, we also provide remote atomic operations and both blocking and non-blocking remote reads. Thus, for each page, the compiler, the run-time environment, or the OS can choose between local replication and remote access; in case of replication, a choice of coherence protocol is provided between write-invalidate and write-update. These choices - static or dynamic - are assisted by hardware counters that count the number of local and the number of remote accesses per sharable page.
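The difference between the two coherence choices for a replicated page can be sketched with a toy model. This is a generic textbook-style illustration, not the Telegraphos hardware protocol: the class SharedPage, its methods, and the re-fetch-on-read behaviour are assumptions introduced here. Write-update pushes the new value into every copy (as in eager sharing), while write-invalidate keeps only the writer's copy valid and lets other nodes re-fetch the page on their next access.

```python
class SharedPage:
    """Toy model of one sharable page replicated on several nodes."""

    def __init__(self, nodes, words=4):
        self.copies = {node: [0] * words for node in nodes}
        self.valid = {node: True for node in nodes}

    def _ensure_valid(self, node):
        # Re-fetch the page from any valid copy if this copy was invalidated.
        if not self.valid[node]:
            source = next(n for n, ok in self.valid.items() if ok)
            self.copies[node] = list(self.copies[source])
            self.valid[node] = True

    def read(self, node, offset):
        self._ensure_valid(node)
        return self.copies[node][offset]

    def write_update(self, writer, offset, value):
        """Reflect the write into every valid copy of the page."""
        self._ensure_valid(writer)
        for node in self.copies:
            if self.valid[node]:
                self.copies[node][offset] = value

    def write_invalidate(self, writer, offset, value):
        """Write locally and invalidate all other copies."""
        self._ensure_valid(writer)
        self.copies[writer][offset] = value
        for node in self.copies:
            if node != writer:
                self.valid[node] = False
```

Write-update costs one network write per copy on every store but keeps later reads local; write-invalidate makes stores cheap after the first one but forces readers to re-fetch, which is the kind of trade-off that per-page access counts can help decide.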