Point-to-Point Routing in the Internet
4.4.1 The IP Protocol
The Internet network layer standard is the IP protocol, version 4, which
is specified in [RFC 791]. It is important
to note that while IP is the Internet's network layer standard it does
not specify a routing algorithm. Instead the IP standard specifies:
-
a packet (datagram) format - the format for all network
layer packets in the Internet. Recall that network layer packets
in the Internet are referred to as datagrams. Since this section
examines point-to-point routing in the Internet, we adopt this terminology
ere,
-
conventions for handling IP datagrams at routers and end systems.
As noted above, these conventions do not specify how datagram routes are
selected. Instead, they are primarily concerned with handling exception
conditions such as incorrect datagram format or lengths, and detected
bit errors.
The IPv4 datagram format is shown in Figure 4.4-1.
Figure 4.4-1: IPv4 datagram format
Some of the key fields in the IPv4 datagram are the following:
-
version number. These 4 bits specify the IP protocol
version of the datagram. By looking at the version number, the router
can then determine how to interpret the remainder of the IP datagram.
Different versions of IP use different datagram formats. The datagram
format for the "current" version of IP, IPv4, is shown in Figure 4..4-1.
The datagram format for the "new" version of IP (IPv6) is shown, and discussed
later, in Figure 4.5-XX.
-
header length. Because an IPv4 datagram can contain
a variable number of options (which are included in the IPv4 datagram header)
these 4 bits are needed to determine where in the IP datagram the data
(payload) actually begins. Most IP datagrams do not contain options so
the typical IP datagram has a 20 byte header.
-
TOS. The type of service (TOS) bits were
included in the IPv4 header to allow different "types" of IP datagrams
to be distinguished from each other, presumably so that they could be handled
differently in times of overload. When the network is overloaded,
for example, it would be useful to be able to distinguish network control
datagram (e.g., see the ICMP discussion in section 4.4.6) from datagrams
containing FTP data. It would also be useful to distinguish real-time datagrams
(e.g., used by an IP telephony application) from non-real-time traffic
(e.g., FTP). More recently, one major routing vendor (Cisco) interprets
the first three ToS bits as defining differential levels
of service that can be provided by the router. The specific level
of service to be provided is a policy issue determined by the router's
administrator. We shall explore the topic of differentiated service in
detail in chapter 7.
-
datagram length. This is the total length of the IP datagram
(header plus data) measured in bytes. Since this field is 16 bits
long, this limits the maximum size of the IP datagram to 65,535 bytes.
-
identifier, flags, fragmentation offset. These
three fields have to do with so-called IP fragmentation, a topic we will
consider in depth shortly. Interestingly, the new version of IP, IPv6,
simply does not allow fragmentation.
-
time-to-live. The time-to-live (TTL) field is included
to insure that datagrams can not circulate forever (due to, for example,
a long lived router loop) in the network. This field must be decremented
by at least one each time the datagram is processed by a router.
If the TTL field reaches 0, the datagram must be dropped.
-
protocol. This field is only used when an IP datagram
reaches its final destination. The value of this field indicates
the upper level protocol at the destination to which the payload
(data) portion of this IP datagram will be passed. For example, a
value of 6 indicates that the data portion is a TCP segment, while a value
of 17 indicates that the upper layer protocol is UDP. For a listing
of all possible numbers, see [RFC 1700].
-
header checksum. The header checksum aids a router in detecting
bit errors in a received IP datagram. The header checksum is computed
by treating each 2 bytes in the header as a number and summing these numbers
using 1's complement arithmetic. The 1's complement of this sum,
known as the Internet checksum, is stored in the checksum field.
A router computes the Internet checksum for each received IP datagram and
detects an error condition if the checksum carried in the datagram does
not equal the computed checksum. Note that the checksum must be recomputed
and restored at each router, as the TTL field, and possibly options fields
as well, may change. An interesting discussion of fast algorithms
for computing the Internet checksum is [Braden
1989]. It is also worth noting that there are alternate
methods for defining a checksum that allow a broader range of errors to
be detected. For example, the cyclic redundancy check [XXXX] used
in numerous other protocols provides better protection against some
form of bit errors than the Internet checksum.
-
source, and destination IP address. These
fields carry the 32 bit IP address of the source and final destination
for this IP datagram. We will study IP addresses shortly.
-
options. The options fields allows an IP header
to be extended to include information such as a complete list of
routers to be visited on the source-to-destination path (the source routing
option), a partial list of routers that should be visited (the loose source
routing option), timestamp information (timestamp option) or router address
information from a relatively small number of routers handling the datagram
(record route option) on its path from source to destination.. Header
options were meant to be used rarely -- hence the decision to save overhead
by not including the information in options fields in every datagram header.
However, the mere existence of options does complicate matters -- since
datagram headers can be of variable length, one can not determine a priori
where the data field will start. Also, since some datagrams may require
options processing and others may not, the amount of time needed to process
a IP datagram can vary greatly. These considerations become particularly
important for IP processing in high performance routers and hosts.
For these reasons and others, IP options were dropped in the IPv6 header.
-
data. Finally, we come to the last, and most important
field - the raison d'etre for the datagram in the first place! The
data field of the IP datagram contains the data to be delivered to the
upper layer protocol (see the protocol field discussed above) at
the destination.
IP addressing
Every host, router, gateway or network device that is capable of sending
or receiving an IP datagram has an IP address. More precisely, an
IP address is associated with each network interface and thus a host, router,
gateway or device will have one or more IP addresses. (Gateways,
in particular, will typically have multiple network interfaces/addresses
-- one per attached network). IP addresses are meant to be globally unique,
that is, no two network entities in the world can have the same IP address.
Figure 4.4-2: IPv4 address structures
Figure 4.4-2 shows the four possible formats of an IP address. A fifth
address, beginning with 11110 is "reserved for future use. An IP
address is 32 bits long and address is divided into a "network" part --
the bits that identify the autonomous system (AS) that the host,
router, gateway or network device belongs to, and a "host" part -- the
bits the identify the host, router, gateway within the AS. To a first
approximation, it is the network part of the address that is used for inter-domain
routing. (This is not precisely true, as a technique known as subnetting
allows the administrator who controls the "network" specified by the network
part of the address to place additional structure -- to define subnetworks
within its network -- using some of the "host" bits. A discussion
of subnetting can be found in RFC 950)
For a class A address, the first 8 bits identify the network, i.e.,
the autonomous system that we discussed in section 4.3, and the last 24
bits identify the host, router, gateway, or network device within that
AS. (For exposition purposes, we will assume from here on that
the network entity is a host). The class B address space allows
for 214 networks, with up to 216 hosts with each
network. Class C address use 21 bits to identify the network and
leave only 8 bits for the host identifier. Class D addresses are
reserved for so-called multicast addresses. As we will see in section
4.6, these addresses do not identify a specific host but rather provide
a mechanism through which multiple hosts can receive a copy of each
single packet sent by a sender.
Before leaving our discussion of addressing, a few comments are in order.
First, it should be noted that IP addresses do not reflect any geographical
relationship among entities. For example, unlike the telephone network
with its standard country code numbers, it is not possible to tell where,
physically, a network is. If there were so, it might have been possible
to design routing algorithms that would look at the address and route a
datagram according (e.g., from western Europe one would generally route
trans-atlantic to the west to reach a US address). Second,
mobile hosts may change the network to which they are attached, either
dynamically while in motion or on a longer time scale. Because routing
is to a network (AS) first, and then to a host within the network, this
means that the mobile host's IP address must change when the host changes
networks. Techniques for handling such issues are now under development
within the IETF and the research community [RFC2002
RFC2131].
Now that we've seen the structure of the IP address, you might wonder
where would you actually go to get one, e.g., for your network? The
first contact might well be with your Internet service provider who would
provide an address from a block of addressees that have already been allocated
to that ISP. But how does an ISP get a block of addresses?
IP addresses are managed under the authority of the Internet Assigned Numbers
Authority (IANA), under the guidelines
set forth in RFC 2050. The actual assignment
of addresses is now managed by regional Internet registries. As of
mid-1998, there are three such regional registries: the American Registry
for Internet Number (ARIN, which handles
registrations for North and South America, as well as parts of Africa.
ARIN has recently taken over a number of the functions previously provided
by Network Solutions), the Reseaux
IP Europeens (RIPE, which covers Europe
and nearby countries), and the Asia Pacific Network Information Center
(APNIC).
IP fragmentation and reassembly
We will see in Chapter 5 that different data link layer protocols have
different maximum packet sizes that they can carry. The maximum size
of the payload (data) part of a data-link packet is called the MTU (maximum
transfer unit). For example the MTU for Ethernet is 1,500 bytes.
A hard limit on the size of a data link packet places a hard limit on
the length of an IP datagram that is carried across that link, since
the IP datagram will be the payload of the data link packet. Suppose
then that a router receives an IP datagram on an incoming link, looks
up the destination in its router table to determine the appropriate outgoing
link, and finds that the outgoing link has an MTU that is smaller than
the lengh of the IP datagram. Panic sets in -- how is the router going
to cram this oversized IP packet into the payload field of the data link
packet.
The solution to this problem is to divide (or "fragment") the data in
the IP datagram among two or more smaller IP datagrams, and then send these
smaller datagrams over the outgoing link. Once an IP datagram has
been fragmented, it will not be reassembled until all fragments reach the
destination node.
The identifier, flags, and fragmentation offset
fields of the IP packet are used for fragmentation. Each fragment
carries the same source, destination and identifier field
values as the original datagram. The first bit in the two-bit flag
field is the DF (dont' fragment) bit. If this bit is set, an IP datagram
will not be fragmented. The other flag bit is the MF bit. This
bit is set to 1 in all except the last fragment. Finally, the offset
field gives the starting position in of the data in the data field of the
original original.
Figure 4.4-3 gives an example of how an 8K packet is fragmented into
two 4K packets.
Figure 4.4-3: IP Fragmentation
When the destination receives the fragments it uses the identifier flags
and the fragmentation offset to reconstruct the original datagram. The
payload of the datagram is only passed to the transport layer once the
IP layer has fully reconstructed the original IP datagram. If one or more
of the fragments does not arrive to the destination, datagram is "lost"
and not passed to the transport layer. (But, as we learned in the previous
chapter, if TCP is being used at the transport layer, then TCP will reccover
from this loss by having the source retransmit the data in the original
datagram.)
Following this section we provide a Java applet that generates fragments.
You provide the incoming datagram size, the MTU and the incomimg datagram
identification, it automatically generates the fragments for you.
4.4.2 Getting Started: acquiring an address.
Before looking at the routing algorithms that operate among the routers,
let's start at a host, where everything begins. One question immediately
comes to mind: how does a host get its own IP address? There are
at least three ways this can happen:
-
Dynamic Host Configuration Protocol (DHCP) [RFC
2131]. DHCP is an extension of the BOOTP [RFC
1542] protocol, and is sometimes referred to as Plug and Play. With
DHCP, a DHCP server in a network (e.g., in a LAN) receives DHCP requests
from a client and in the case of dynamic address allocation, allocates
an IP address back to the requesting client.
-
Reverse ARP protocol (RARP) [RFC 903].
The Reverse Address Resolution Protocol (RARP) allows a host to dynamically
find its IP address given they they know their link layer address (e.g.,
48 bit Ethernet address). Since this protocol is so tightly tied
to the data link layer, we defer its discussion until Chapter 5, where
we also discuss the Address Resolution Protocol (ARP).
-
Manual configuration. The IP address is configured into a
host (typically in a file) by a system administrator.
4.4.3 Intradomain Routing in the Internet: RIP, OSPF
and IGRP
Recall from our discussion in section 4.3 that intradomain routing protocols
are used to route within an autonomous system (AS). These intradomain
routing protocols are also sometimes known as interior gateway protocols.
Historically, three routing protocols have been used extensively for routing
with an autonomous system in the Internet: RIP (the Routing Information
Protocol), and OSPF (Open Shortest Path First), and IGRP
(Cisco's propriety Interior Gateway Routing Protocol).
RIP: Routing Information Protocol
The Routing Information Protocol (RIP) was one of the earliest intradomain
Internet routing protocols and is still in widespread use today.
It traces its origins and its name to the Xerox Network Systems (XNS)
architecture. The widespread deployment of RIP was due in great part
to its inclusion in 1982 of the Berkeley Software Distribution (BSD) version
of UNIX supporting TCP/IP. RIP version 1 is defined in RFC
1058, with a backwards compatible version 2 defined in
RFC 1723.
RIP is a distance vector protocol that operates in a manner very close
to the idealized protocol we examined in section 4.2.3. The version
of RIP specified in RFC 1058 uses hop count as a cost metric, i.e., each
link has a cost of 1, and limits the maximum cost of a path to 15.
This limits the use of RIP to networks that are less than 15 hops in diameter.
Recall that in DV protocols, neighboring routers exchange routing information
with each other. In RIP, routing tables are exchanged between neighbors
every 30 seconds using RIP's so-called response message, with each
response message containing that host's routing table entries for
up to 25 destinations. If a router does not hear from its neighbor
at least once every 180 seconds, that neighbor is considered to be no longer
reachable , i.e., either the neighbor has died or the connecting link has
gone down. A router can also request information about its neighbor's
cost to a given destination using RIP's request message. Two
routers communicating RIP messages do so by sending RIP request
or response messages within a UDP packet addressed to port 520.
The UDP packet is carried between routers in a standard IP packet.
The fact that RIP uses a transport layer protocol (UDP) on top of
a network layer protocol (IP) to implement network layer functionality
(a routing algorithm) may seem rather convoluted (it is!). Looking
a little deeper at how RIP is implemented will clear this up.
Figure 4.4-4 sketches how RIP is typically implemented in a UNIX
system, e.g., a host (end system) or a workstation or PC serving as a router.
A process called routed
(pronounced "route dee") executes the RIP protocol, i.e., maintains the
routing table and exchanges messages with routed processes
running in neighboring routers. Because RIP is implemented
as an application-level process (albeit a very special one that is able
to manipulate the routing tables within the UNIX kernel), it can send and
receive messages using the standard socket interface and the a standard
transport protocol. Thus, RIP is an application-layer protocol
(see Chapter 2), running over UDP.
Figure 4.4-4: Implementation of RIP as the routed daemon
Finally, let us take a quick look at a RIP routing table. The
RIP routing table below in Table 4.4-1 is taken from a UNIX host
giroflee.eurecom.fr. If you give a netstat
-rn command on a UNIX system, you can view the routing table
for that host. Performing a netstat on host giroflee.eurecom.fr
yields the following routing table:
Routing Table:
Destination
Gateway Flags
Ref Use Interface
-------------------- -------------------- ----- ----- ------ ---------
127.0.0.1
127.0.0.1
UH 0 26492 lo0
192.168.2.0
192.168.2.5
U 2 13
fa0
193.55.114.0 193.55.114.6
U 3 58503 le0
192.168.3.0
192.168.3.5
U 2 25
qaa0
224.0.0.0
193.55.114.6 U
3 0 le0
default
193.55.114.129 UG
0 143454
Table 4.4-1 RIP routing table from giroflee.eurecom.fr
The host giroflee is a so-called "multi-homed" host that is connected
to three networks, each via a different interface (with a correspondingly
different data link layer controller). The second, third and fourth entries
in the table tell us that these three networks are attached to giroflee
via giroflee's network interfaces fa0, le0 and qaa0.
These giroflee interfaces have IP addresses 192.168.2.5, 193.55.114.6
and 192.168.3.5 respectively. To transmit a packet to any host
belonging to one of these three networks, giroflee will simply
send the outgoing IP datagram over the appropriate interface. Of
particular interest to us is the default route. Any IP datagram
that is not destined for one of the networks explicitly listed in the routing
table will be forwarded to the router with IP address 193.55.114.129;
this router is reached by sending the datagram over the default network
interface. The first entry in the routing table is the so-called
loopback interface. When IP sends a datagram to the loopback interface,
the packet is simply returned back to IP; this is useful for debugging
purposes. The address 224.0.0.0 is a special multicast (Class D) IP address.
We will examine IP multicast in section 4.5.
IGRP: Internal Gateway Routing Protocol
The Interior Gateway Routing Protocol (IGRP) [Cisco97b]
is a proprietary routing algorithm developed by Cisco Systems, Inc. in
the mid-1980's as a successor for RIP. IGRP is a distance vector
protocol. Several cost metrics (including delay, bandwidth, reliability,
and load) can be used in making routing decisions, with the weight given
to each of the metrics being determined by the network administrator.
This ability to use administrator-defined costs in making route selections
is an important difference from RIP; we will see shortly that so-called
policy-based interdomain Internet routing protocols such as BGP also allow
administratively defined routing decisions to be made. Other important
differences from RIP include the use of a reliable transport protocol to
communicate routing information, the use of update messages that
are sent only when routing table costs change (rather than periodically)
, and the use of a distributed diffusing update routing algorithm
[Garcia-Luna-Aceves 1991] to quickly compute loop free routing paths.
OSPF: open shortest path first
Like RIP, the Open Shortest Path First (OSPF) routing is used for intradomain
routing. The "Open" in OSPF indicates that the routing protocol specification
is publicly available (e.g., as opposed to Cisco's IGRP protocol).
The most recent version of OSPF, version 2, is defined in RFC
2178 - a public document.
OSPF was conceived at the successor to RIP and as such has a
number of advanced features. At its heart, however, OSPF is a link-state
protocol that uses flooding of link state information and a Dijkstra least
cost path algorithm that allows each router to compute its least cost path.
Individual link costs are configured by the network administrator.
Some of the advances embodied in OSPF include the following:
-
Security. All exchanges between OSPF routers (e.g., link state
updates) are authenticated. This means that only trusted routers can participate
in the OSPF protocol within a domain, thus preventing malicious intruders
(or networking students taking their newfound knowledge out for a joyride)
from injecting incorrect information into router tables.
-
Multiple same-cost paths. When multiple paths to a destination
have the same cost, OSPF allows multiple paths to be used (i.e., a single
path need not be not chosen for carrying all traffic when multiple equal
cost paths exist).
-
Different cost metrics for different TOS traffic. OSPF allows
each link to have different costs for different TOS (type of service) IP
packets. For example, a high bandwidth satellite link might be configured
to have a low cost (and hence be attractive) for non-time critical traffic,
but a very high cost metric for delay-sensitive traffic. In essence, OSPF
sees different network topologies for different classes of traffic, and
hence can compute different routes for each type of traffic.
-
Integrated support for unicast and multicast routing.
Multicast OSPF (RFC 1584) provides simple
extensions to OSPF to provide for multicast routing (a topic we cover in
more depth in Section 4.5). MOSPF uses the existing OSPF link database
and adds a new type of link state advertisement to the existing OSPF
link state broadcast mechanism.
Support for hierarchy within a single routing domain. Perhaps
the most significant advance in OSPF is the ability to hierarchically structure
a routing domain. Section 4.3 has already looked at the many advantages
of hierarchical routing structures. We cover the implementation of
OSPF hierarchical routing in the remainder of this section.
As OSPF routing domain (an "autonomous system," or AS, in OSPF parlance)
can be configured into "areas." Each area runs its own OSPF link
state routing algorithm, with each router in an area broadcasting its link
state to all other routers in that area. The internal details of
an area thus remain invisible to all routers outside the area. Intra-area
routing involves only those routers within the same area.
Within each area, one of more area border routers are responsible
for routing packets outside the area. Exactly one OSPF area in the
AS is configured to be the backbone area. The primary role
of the backbone area is to route traffic between the other areas in the
AS. The backbone always contains all area border routers in the AS
and may contain non border routers as well. Inter-area routing within
the AS requires that the packet be first routed to an area border router
(ntradomain routing), then routed though the backbone to the area border
router that is in the destination area, and then routed to the final destination.
Figure 4.4-5: hierarchically structured OSPF AS with four areas
A diagram of a hierarchically structured OSPF network is shown in Figure
4.4-5 . We can identify four types of OSPF routers in Figure 4.4-5:
-
internal routers. These routers, shown in black, are in a
non-backbone areas and only perform intra-AS routing.
-
area border routers. These routers, shown in blue, belong
to both an area and the backbone.
-
backbone routers (non border routers). These routers, shown
in gray, perform routing within the backbone but themselves are not area
border routers. Within a non-backbone area, internal routers learn
of the existence of routes to other areas from information (essentially
a link state advertisement, but advertising the cost of a route to another
area, rather than a link cost) broadcast within the area by its backbone
routers.
-
boundary routers. A boundary router, shown in blue, exchanges
routing information with routers belonging to other routing domains. This
router might, for example, use BGP to perform interdomain routing.
It is through such a boundary router that other routers learn about paths
to external networks.
4.4.4 Interdomain Routing in the Internet: BGP
The Border Gateway Protocol version 4, specified in RFC
1771 (see also RFC 1772, RFC
1773), is the de facto standard interdomain routing protocol in today's
Internet. As an interdomain routing protocol, it provides for routing
between administrative domains (a.k.a. autonomous systems).
While BGP has the same general flavor as the distance vector protocol
that we studied in section 4.2, it is more appropriately characterized
as a path vector protocol. This is because rather than propagating
cost information to a destination, BGP propagates path attributes.
We will examine these path attributes in detail shortly. We note
though that while these attributes include information such as the
names of the administrative domains on a route to the destination, they
do not contain cost information. Nor does BGP specify
how a route to a particular destination should be chosen given the attributes
of the various possible paths. That decision is a policy decision
that is left up to the domain's administrator. Each domain can thus
choose it routes according to whatever criteria it chooses (and need
not even inform of neighbors of its policy!) -- allowing a significant
degree of autonomy in route selection. In essence, BGP provides the
mechanisms to distribute path information among the interconnected
domains, but leaves the policy for making the actual route selections up
to the network administrator.
As in the generic distance vector protocol, BGP gateways typically exchange
routing information with their immediate neighbors, which are known as
peers in BGP. There are only four message types in the BGP protocol: OPEN,
UPDATE, NOTIFICATION and KEEPALIVE.
-
OPEN. BGP peers communicate using the TCP protocol and port number
179. TCP thus provides for reliable and congestion controlled message
exchange between peers. In contrast, recall that we earlier saw that
two RIP peers, e.g., the routed's in Figure 4.4-4 communicate via
unreliable UDP. When a BGP gateway wants to first establish contact
with a BGP peer (e.g., after the gateway itself or a connecting link has
just be booted), an OPEN message is sent to the peer. The OPEN message
allows a BGP gateway to identify and authenticate itself, and provide timer
information. If the OPEN is acceptable to the peer, it will send back a
KEEPALIVE message.
-
UPDATE. This BGP message is used to advertise a path to a
peer. We will consider this message in detail below.
-
KEEPALIVE. This BGP message is used to let a peer know
that the sender is alive but that the sender doesn't have other information
to send. It also serves as an acknowledgment to a received OPEN message.
-
NOTIFICATION. This BGP message is used to inform a peer that
an error has been detected (e.g., in a previously transmitted BGP message)
or that the sender is about to close the BGP session.
A BGP gateway uses the UPDATE message to advertise a route to a given destination
to the BGP peer. The UPDATE message can also be used to withdraw
routes that had previously been advertised (that is, to tell a peer that
a route that it had previously advertised is no longer a valid route).
An UPDATE message advertising a path contains the identity of one
or more destinations all of which share the specified attributes
of this path to that destination. Each destination is typically
the prefix part of a full IP address, indicating, for example a network
or set of networks that have this common prefix. There are seven attributes,
each of which has one of four possible types.
-
Well-known, mandatory. A mandatory attribute must be present in
every path description and every BGP router must be able to understand
the semantics of these attributes (i.e., process them and take any necessary
actions). The three mandatory attributes are ORIGIN, AS-PATH, and NEXT-HOP.
The ORIGIN attribute specifies whether the origin of the path information
came from within the destination autonomous system, or came from a different
interdomain protocol (e.g., EGP [RFC 904]).
The AS-PATH attribute contains a list of the autonomous systems (either
as a set or as a sequence) that are traversed on the path to this destination.
We note that because the AS-PATH attribute will always carry the identities
of the AS's to be traversed, it is easy to detect loops in routing (and
also avoid the count-to-infinity problem).
The NEXT-HOP attribute specifies the IP address of the next hop that
the remote peer should use as the first hop to the destination. Interestingly,
this need not be the IP address of the BGP peer that is sending the UPDATE
message (that is, unlike the distance vector protocol, the route being
advertised by a peer need not contain the peer as the first hop on the
path to the destination). This allows the possibility for route
servers [XXXX] to actually compute and advertise routes, freeing the
gateways from having to perform this activity.
-
Well-known, discretionary attributes. These attributes are
not required in a path description, but if they are present the BGP router
must be able to process them. The LOCAL_PREF attribute allows a gateway
to inform other BGP gateways within its autonomous system of its own "degree
of preference" value for this path. This value has meaning only within
the AS. The ATOMIC_AGGREGATE is used by BGP as part of the route
aggregation process. BGP provides an extensive mechanism for aggregating
routes to destinations; see RFC 1771
for details. Aggregation can be used to significantly reduce the
amount of routing information that needs to be exchanged.
-
Optional, transitive. These attributes are not required nor
is a BGP router required to process them. However, a BGP router
is required to include these path attributes whenever it sends information
about this path to its neighbors.
-
Optional non-transitive. These attributes are not required nor is
a BGP router required to process them. A BGP router is required
to remove these path attributes from any information about this path that
it sends to its neighbors.
Table 4.4-2 summarizes the well-known path attributes of BGP paths.
There is only one optional transitive attribute and one optional non-transitive
attribute specified in BGP 4. Since not even a BGP router need not
understand these optional attributes, we won't cover them here and refer
the interested reader to RFC 1771 for
details.
| Attribute name |
Attribute type |
Function |
| ORIGIN |
well-known, mandatory |
origin of route |
| AS-PATH |
well-known, mandatory |
list of AS's traversed by lath |
| NEXT HOP |
well-known, mandatory |
next hop on path to destination(s) |
| LOCAL_PREF |
well-known, discretionary |
preference value for path |
| ATOMIC AGGREGATE |
well-known, discretionary |
route can not be de-aggregated |
Table 4.4-2: Well-known BGP path attributes
Recall from our discussion above that BGP provides mechanisms for distributing
path information but does not mandate policies for selecting a route from
those available. Within this framework, it is thus possible for an
AS such as Hatfield.net to implement a policy such as "traffic from
my AS should not cross the AS McCoy.net," since it knows the identities
of all AS's on the path. (The Hatfield and the McCoy's are two famous feuding
families in the US). But what about a policy that would prevent
the McCoy's from sending traffic through the Hatfield's network?
The only means for an AS to control the traffic passes though its network
(known as "transit" traffic - traffic that neither originates in, nor is
destined for, the network, but instead is "just passing through") is by
controlling the paths that it advertises. For example, if the McCoy's
are immediate neighbors of the Hatfields, the Hatfields could simply not
advertise any routes to the McCoy's that contain the Hatfield network.
But restricting transit traffic by controlling an AS's route advertisement
can only be partially effective. For example, if the Jones are between
the Hatfield's and the McCoy's, and the Hatfield's advertise routes to
the Jones' that pass through the Hatfields, then the Hatfields can not
prevent (using BGP mechanisms) the Jones from advertising these routes
to the McCoys.
As noted above, BGP, which is the successor to EGP,
is becoming the de facto standard for interdomain routing for the public
Internet. BGP is used for example at the major network access points
(NAP's) where major Internet carries connect to each other and exchange
traffic. To see the contents of today's (less than four hours out of date)
BGP routing table (large!) at one of the major NAP's in the US (which include
Chicago and San
Francisco ), click here.
4.4.5 Why are there different Interdomain and Intradomain routing protocols?
Having now studied the details of specific interdomain and intradomain
routing protocols deployed in today's Internet, let us conclude by considering
perhaps the most fundamental question we could ask about these protocols
in the first place (hopefully, you have been wondering this all along,
and have not lost the forest for the trees!):
Why are different interdomain and intradomain routing protocols
used?
The answer to this question gets at the heart of the differences between
the goals of routing within a domain and between domains:
-
Policy. Between domains, policy issues dominate. It
may well be important that traffic originating in a given domain specifically
not be able to pass through another specific domain. Similarly, a
given domain may well want to control what transit traffic it carries between
other domains. We have seen that BGP specifically carries path attributes
and provide for controlled distribution of routing information so that
such policy-based routing decisions can be made. Within a domain,
everything is nominally under the same administrative control, and
thus policy issues play a much less important role in choosing routes within
the domain.
-
Scale. The ability of a routing algorithm and its data structures
to scale to handle routing to/among large numbers of networks is
a critical issue in interdomain routing. Route aggregation in BGP
is one example of how the issues of scale can be addressed. Within a
domain, scalability is less of a concern. For one thing, if a single administrative
domain become too large, it is always possible to divide it into two domains
and perform interdomain routing between the two new domains. (Indeed, OSPF
allows such a hierarchy to be built).
-
Performance. Because interdomain routing is so policy-oriented,
the quality (e.g., performance) of the routes used is often of secondary
concern (i.e., a longer or more costly route that satisfies a certain policy
criteria may well be taken over a route that is shorter but does not meet
that criteria). Indeed, we saw that among domains, there is not even
the notion of preference or costs associated with routes. Within a single
routing domain, however, such policy concerns can be ignored, allowing
routing to focus more on the level of performance realized on a route.
4.4.6. ICMP: the Internet Message Control Protocol
The Internet Message Control Protocol, ICMP, is used by hosts, routers,
and gateways to communicate network layer information to each other.
ICMP is specified in RFC 792. The most typical
use of ICMP is for error reporting. For example, when running a telnet,
ftp, or http session, you may have encountered an error message such as
"Destination network unreachable." This message had its origins in
ICMP. At some point, an IP router was unable to find a path (even
a default path) to the host specified in your telnet, ftp or http application.
That router created and sent a type-3 ICMP message to your host indicating
the error. Your host received the ICMP message and returned the error
code to the TCP code that was attempting to connect to the remote host.
TCP in turn returned the error code to your application.
ICMP is often considered part of IP, but architecturally lies just above
IP, as ICMP messages are carried inside IP packets. That is, ICMP
messages are carried as IP payload, just as TCP or UDP packets are carried
at IP payload. Similarly, when an host receives an IP packet with
ICMP specified as the upper layer protocol, it demultiplexes the packet
to ICMP, just as it would demultiplex a packet to TCP or UDP.
ICMP messages have a type and a code field, and also contain the first
8 bytes if an IP packet that caused the IP message to be generated in the
first place (so that the sender can determine which packet is sent that
caused the error). Selected ICMP messages are shown below in Table
4.4-3. Note that ICMP messages are used not only for signaling error
conditions. The well-known ping [ping man
page] program uses ICMP. ping sends an ICMP type 8 code 0 message
to the specified host. The destination host, seeing the echo request
sends back an type 0 code 0 ICMP echo reply. Another interesting
ICMP message is the source quench message. This message is seldom
used in practice. Its original purpose was to perform congestion
control -- to allow a congested router to send an ICMP source quench message
to a host to force that host to reduce its transmission rate. We
have seen in Chapter 3 that TCP has its own congestion control mechanism
that operates at the transport layer, without the use of network layer
support such as the ICMP source quench message.
| ICMP type |
code |
description |
| 0 |
0 |
echo reply (to ping) |
| 3 |
0 |
destination network unreachable |
| 3 |
1 |
destination host unreachable |
| 3 |
2 |
destination protocol unreachable |
| 3 |
3 |
destination port unreachable |
| 3 |
6 |
destination network unknown |
| 3 |
7 |
destination host unknown |
| 4 |
0 |
source quench (congestion control) |
| 8 |
0 |
echo request |
| 9 |
0 |
router advertisement |
| 10 |
0 |
router discovery |
| 11 |
0 |
TTL expired |
| 12 |
0 |
IP header bad |
Table 4.4-3: Selected ICMP messages
References
[Arin 1996] ARIN, "IP allocation report"
ftp://rs.arin.net/netinfo/ip_network_allocations
[Braden 1989] R. Braden, D. Borman, and
C. Partridge , "Computing the Internet Checksum," ACM Computer Communication
Review, Volume 19, Number 2 (April 1989)
[Bradner 1996] S. Bradner, A. Mankin, IPng:
Internet Protocol Next Generation, Adddison Wesley, 1996.
[Cisco 97] Cisco
- Advanced QoS Services for the Intelligent Internet.
[Cisco97b] Interior
Gateway Routing Protocol and Enhanced IGRP
[RFC 791] J. Postel, "Internet Protocol: DARPA
Internet Program Protocol Specification," RFC
791, Sept 1981.
[RFC 792] J. Postel, "Internet Control Message
Protocol, " RFC
792, Sep-01-1981.
[RFC 903] R. Finlayson, Mann, J. Mogul, Theimer,
"A Reverse Address Resolution Protocol," RFC
903, June 1884.
[RFC 904] D. Mills, "Exterior Gateway Protocol
Formal Specification," RFC
791, April 1984.
[RFC 950] J. Mogul, J. Postel
, "Internet Standard Subnetting Procedure," RFC 950, August 1985.
[RFC 1058] C.L. Hendrick, "Routing Information
Protocol," RFC
1058, June 1988.
[RFC 1256] S. Deering, "ICMP Router Discovery
Messages," RFC
1256, Sept. 1991.
[RFC 1542] W. Wimer, "Clarifications and Extensions
for the Bootstrap Protocol," RFC
1532, October 1993.
[RFC 1584] J. Moy, "Multicast Extensions to
OSPF," RFC
1584, March 1994.
[RFC 1700] J. Reynolds, J. Postel, "Assigned
Numbers" RFC
1700, Oct. 1994.
[RFC 1723] G. Malkin, RIP Version 2 - Carrying
Additional Information. RFC
1723, November 1994.
[RFC 1771] Y. Rekhter and T. Li, "A Border
Gateway Protocol 4 (BGP-4)," RFC
1771, March 1995.
[RFC 1772] Y. Rekhter and P. Gross, "Application
of the Border Gateway Protocol in the Internet," RFC
1772, March 1995.
[RFC 1773] P. Traina, "Experience with the BGP-4
protocol," RFC
1773, March 1995
[RFC 2002] C. Perkins, "IP Mobility Support,"
RFC
2002, 1996.
[RFC 2050] K. Hubbard, M. Kosters, D. Conrad,
D. Karrenberg, J. Postel, "Internet Registry IP Allocation Guidelines",
RFC
2050, Nov. 1996.
[RFC 2131] R. Droms, "Dynamic Host Configuration
Protocol," RFC
2131, March 1997.
[RFC 2178] J. Moy, "Open Shortest Path First
Version 2", RFC
2178, July 1997.
Copyright Keith W. Ross and Jim Kurose, 1998.
All rights reserved.