Web applications and IXPs require large-scale networks to connect thousands—or even millions—of devices. Public and private clouds also need to build these large-scale networks to connect large numbers of servers.
Note
Chapter 17, “Cloud Computing,” covers public and private clouds.
These operators do not use traditional network designs to connect large numbers of servers and devices. Instead, they use network fabrics. A network fabric differs from a traditional network in four important ways:
• Fabrics are repeatable and regular. Almost any section of a fabric can be moved to another place in the fabric without change or with only minor changes.
• Fabrics are nonplanar. A fabric cannot be described or drawn without pairs of connections crossing one another in the diagram.
• Fabrics rely on high fan-out to carry high traffic volumes rather than high link speeds.
• The characteristics of a fabric can be described in mathematical terms, making it (somewhat) easy to predict performance. For instance, the likelihood of a fabric dropping a packet is related to the oversubscription rate, which you can calculate directly by examining the fabric’s design.
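As a rough illustration of calculating the oversubscription rate from the fabric's design, the following Python sketch divides one leaf's server-facing capacity by its spine-facing capacity. The function name, port counts, and link speeds are hypothetical values chosen for the example; they do not describe any particular fabric.

```python
from fractions import Fraction


def oversubscription_ratio(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Ratio of server-facing capacity to spine-facing capacity on one leaf."""
    downstream = server_ports * server_gbps  # capacity toward attached servers
    upstream = uplink_ports * uplink_gbps    # capacity toward the spine
    return Fraction(downstream, upstream)


# Hypothetical leaf: 48 x 10 Gbps server ports, 4 x 40 Gbps uplinks.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio.numerator}:{ratio.denominator}")  # 3:1
```

A ratio of 1:1 means the leaf can forward everything its servers offer toward the spine; anything higher means some traffic may be dropped when all servers send at once.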
Fabrics can also appear to be a single device when viewed from the “outside,” that is, from the perspective of the connected devices. There are no “magical ports” on a fabric when sending traffic between two connected devices; every port provides (largely) identical service.
There are many kinds of fabrics, including toroid and hypercube. Designers rely on spine-and-leaf fabrics for most large-scale designs, however. There are two common kinds of spine-and-leaf fabrics: the Clos and the butterfly.
Clos Fabrics
By the early 1900s, telephone companies were building large networks. The size of these networks made them difficult to build, manage, and troubleshoot. Operators had to manually connect circuits using switchboards, as shown in Figure 13-4.
Figure 13-4 A Switchboard
These switchboards were replaced with mechanical relay crossbars and Strowger fabrics, which were large, heavy, and hard to wire, and required a lot of electricity. In 1938, Edson Erwin started working out an alternative: the spine-and-leaf fabric. Charles Clos formalized the concept in a paper in 1952, and telephone companies started replacing their large crossbar and Strowger fabrics soon after. Figure 13-5 illustrates a Clos fabric.
Figure 13-5 A Telephone Clos Fabric
In Figure 13-5, telephone A calls telephone B. As the user at A dials B’s number, the various stages of the fabric open and close switches to build a complete electrical path between the two telephones. There are three stages in a Clos fabric:
• The input stage accepts connections from attached devices (telephones).
• The collector stage collects connections from the input stage and distributes them to the output stage.
• The output stage connects the fabric to the receiving device (telephone).
Building the circuit from A to B is called making the circuit, or more simply just a make. When A’s user hangs up, the circuit is broken; this action is called a break.
If telephone B’s line is already in use, the collector rejects the call, and A’s user will receive a busy signal. Rather than delivering part of the information, the fabric rejects the make.
Because the network will never drop any data, it is called nonblocking.
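The make, break, and busy-signal behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class and method names are hypothetical, and it tracks only whether a line is in use, not the individual switches in each stage.

```python
class CircuitFabric:
    """Toy circuit-switched fabric: each line carries at most one call."""

    def __init__(self):
        self.peer = {}  # line -> the line it is currently connected to

    def make(self, caller, callee):
        """Try to build a circuit; False models the busy signal."""
        if caller in self.peer or callee in self.peer:
            return False  # the make is rejected, nothing is partly delivered
        self.peer[caller] = callee
        self.peer[callee] = caller
        return True

    def break_(self, line):
        """Tear down the circuit that `line` is part of, if any."""
        other = self.peer.pop(line, None)
        if other is not None:
            self.peer.pop(other, None)


fabric = CircuitFabric()
print(fabric.make("A", "B"))  # True: circuit made
print(fabric.make("C", "B"))  # False: B is busy
fabric.break_("A")            # A hangs up
print(fabric.make("C", "B"))  # True: B is free again
```

The point of the sketch is that a call is either accepted in full or rejected at setup time; there is no state in which part of a call gets through.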
The switches in the input and output stages are also called leaves, and the collector is called a spine—hence the name spine-and-leaf fabric.
Figure 13-6 shows a small three-stage Clos fabric used to build a computer, rather than a telephone, network.
Figure 13-6 A Three-Stage Clos Data Center Fabric
The three-stage Clos is illustrated “flat” on the left side of Figure 13-6, much like a Clos fabric used in a telephone network.
Servers, or workloads, can be attached to any leaf; the leaves are also called top-of-rack switches (ToRs). The same network is illustrated on the right side of Figure 13-6, only with the spine routers (or switches) at the top of the diagram.
You should note two interesting points about this design:
• There are no links between ToRs or between spine routers. Every link connects a leaf switch to a spine switch.
• External connectivity is always attached in the same place as a server. The Internet or other external connection into a fabric should always be treated like any other workload.
Using the right routers or switches, you can build Clos fabrics with tens of thousands of ports.
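To make the wiring rule and the port arithmetic concrete, here is a minimal Python sketch. It assumes a three-stage folded Clos in which every leaf has exactly one link to every spine and all switches have the same number of ports; the function names and the example sizes are hypothetical.

```python
from itertools import product


def build_clos(num_spines, num_leaves):
    """Return (spines, leaves, links) with one link per leaf-spine pair."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    links = list(product(leaves, spines))  # no leaf-leaf or spine-spine links
    return spines, leaves, links


def server_ports(switch_ports, uplinks_per_leaf):
    """Server-facing ports when every switch has `switch_ports` ports.

    Each spine port attaches one leaf, so there are at most `switch_ports`
    leaves; each leaf keeps `switch_ports - uplinks_per_leaf` ports for servers.
    """
    return switch_ports * (switch_ports - uplinks_per_leaf)


spines, leaves, links = build_clos(num_spines=4, num_leaves=8)
print(len(links))            # 32 leaf-to-spine links
print(server_ports(64, 32))  # 2048 server ports from 64-port switches
```

Under these assumptions the server port count grows with the square of the switch port count, which is why higher-port-count switches (or additional stages) are what push a fabric into the tens of thousands of ports.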
Note
There are exceptions to the second point—where external connectivity attaches to the DC fabric.
These exceptions are outside the scope of this book.
Note
This book will always illustrate spine-and-leaf fabrics using a format like the illustration on the left of Figure 13-6.
In a telephone network, the sender always initiates the process of connecting (making a circuit) to the receiver. The make process is, therefore, unidirectional: circuits always flow from the sender to the receiver. In computer networks, however, any connected device can send traffic to any other connected device without making a circuit, so the flow is bidirectional.
Because traffic can flow in either direction, Clos fabrics used in computer networks are called folded fabrics.
A final point of interest is that when Clos fabrics are used to build a computer network, they are noncontending rather than nonblocking. There is no circuit setup process in a computer network, so the network itself cannot reject incoming traffic at the edge; a host cannot get a busy signal when it tries to send a packet through a network to another host.
In most cases, this is not a problem; every pair of hosts connected to a Clos fabric can send traffic to one another at the full bandwidth of their connection. However, if two hosts try to send traffic at their full bandwidth to a third host, the network will drop some of the traffic.
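The two-senders-to-one-receiver case can be worked through with simple arithmetic. The sketch below is a deliberately simplified model: it assumes no buffering, treats rates as steady, and uses a hypothetical function name and 10 Gbps links purely for illustration.

```python
def delivered_and_dropped(offered_gbps, receiver_link_gbps):
    """Split the traffic aimed at one receiver into (delivered, dropped) Gbps."""
    offered = sum(offered_gbps)
    delivered = min(offered, receiver_link_gbps)  # the receiver's link is the limit
    return delivered, offered - delivered


# One sender at full rate: everything fits on the receiver's 10 Gbps link.
print(delivered_and_dropped([10], 10))      # (10, 0)

# Two senders at full rate toward the same receiver: half must be dropped.
print(delivered_and_dropped([10, 10], 10))  # (10, 10)
```

The fabric itself has enough capacity in both cases; the loss occurs because the receiver's single link cannot carry more than its own bandwidth.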
Note
This description of nonblocking assumes the fabric is not oversubscribed. Oversubscription is beyond the scope of this book.