The lecture titled “Interdomain Internet Routing” [1] is aimed at explaining the routing between different administrative domains on the Internet, the process in which internet service providers (ISPs) exchange routing information including packets between each other and also the important features of the Border Gateway Protocol Version 4 (BGP4) [2][3] which is the widely used interdomain routing protocol in the Internet today. The lecture also discusses how the ISPs buy and sell Internet services to each other and how the demands of its customers affect the future of the Internet.
In a misleading abstraction, the internet is viewed as a graph with nodes/links with large amounts of redundancy. One can assume that the designer has routing algorithms deployed in such a way that it would detect faults and problems with the network and the routes with load-balancing capabilities to dynamically divert routes to less loaded paths.
But in reality, the internet is composed of interconnected ISPs competing to provide internet service to the customer. They must also cooperate with each other to achieve global connectivity.
ISPs are classified into three types: Tier 3 – localized end customers, Tier 2 – regional end customers, and Tier 1 – global end customers. The routers connected at the boundaries between ISPs communicate using a wide-area routing protocol. A more precise wide-area routing architecture is the interconnection of autonomous systems(ASes) that exchange reachability information using a wide-area protocol or Exterior Gateway Protocols(EGPs). The current wide-area protocol is the BGP4 [2][3]. On the other hand, the routing protocols that operate inside the ISPs are called Interior Gateway Protocols (IGPs), examples of which are Routing Information Protocol (RIP)[4], Open Shortest Path First (OSPF)[5], Intermediate System–Intermediate System (IS_IS)[6] and EIGRP. EGPs provide reachability information and implement routing policies in a scalable manner. However, IGPs are concerned with the optimization of a route.
A typical AS set up would be from small ASes to Regional ISPs and regional ISPs to larger ISPs with a backbone[7] network. From these, two types of interconnection exist between ASes[1]:
1.Provider
- customer Transit (“Transit”)
- provides access to all (or most) destinations in its routing tables.
- charges its customers for internet access: customer to destination and destination to customer(most of the time).
2.Peering
- two ISPs agreed with mutual access to a subset of each others’ routing table.
- subset of interest are transit customers and ISP’s own internal addresses
- business deal that may not involve financial settlement.
- common to competitors
- benefits: better end-to-end performance because of the direct link between ISPs and achievement of a routing information in a free manner.
When exporting routing tables to other networks, an ISP must be careful in advertising the routing information that it would send to other ASes for any destination because the ISP does not want it to act as transit for packets with no financial returns. Export policies are implemented by ISPs such that selective transits can be achieved by them. Selective transits are categorized as[1]: full transit capabilities for their own transit customers in both directions, some transit (between mutual customers) in a peering relationship, and transit only for one’s customers (and ISP-internal addresses) to one’s providers. Listed below are the export routing policies done by the ISPs[1]:
a. Export policies for transit customer routes(customer-to-ISP route advertisement):
- All potential senders can reach customer routes.
- Advertise routes to its transit customers and to as many other connected ASes as possible.
b. Export policies for transit provider routes(provider-to-ISP route advertisement):
- Transit to the routes exported by its provider to the ISP is not provided by the l latter because it isn’t making any money on providing such facilities.
- Do not advertise routes to its providers to any other connected ASes.
c. Export policies for peer routes:
- Export only selected routes from its routing tables to other peering ISPs.
- Export all routes to all transit customers.
- Export routes to internal addresses.
- Do not export routes from transit providers to other peering ISPs because this may cause a peering ISP to use the advertising ISP to reach a destination advertised by a transit provider.
Import Route policies are implemented by prioritizing routes to the following priority order: custmer then peer then provider.
Design motivation for the BGP4[1] are as follows:
1. Scalability: To remain scalable as the number of connected networks increased.
2. Policy: Implementation and enforcement of various routing policies.
- Implementation difficulties: attribute structure for route announcements and route filtering technique.
3.Cooperation under competitive circumstances: Structure to make local decision on how to route packets, from among any set of choices.
The Finite State Machine of BGP-4[8]:
Idle State:
– Initializes resources for the BGP process.
– Tries to establish a TCP connection with its configured BGP peer.
– Listens for a TCP connection from its peer.
– Changes its state to Connect.
– If an error occurs at any state of the FSM process, the BGP session is terminated immediately, and returned to the Idle State. Some of the reasons why a router does not progress from the Idle state are:
a. TCP port 179 is not open.
b. A random TCP port over 1023 is not open.
c. Peer address configured incorrectly on either router.
d. AS number configured incorrectly on either router .
Connect State:
– Wait for successful TCP negotiation with peer.
– BGP does not spend much time in this state if the TCP session has been successfully established.
– Sends OPEN message to peer and changes state to OpenSent.
– If an error occurs, BGP moves to the ACTIVE state. Some reasons for the error are:
a. TCP port 179 is not open.
b. A random TCP port over 1023 is not open.
c. Peer address configured incorrectly on either router.
d. AS number configured incorrectly on either router.
Active State:
– If the router was unable to establish a successful TCP session, then it ends up in the ACTIVE state.
– The router will try to restart another TCP session with the peer and if successful, then it will send an OPEN message to the peer.
– If it is unsuccessful again, the FSM is reset to the IDLE state.
– Repeated failures may result in a router cycling between the IDLE and the ACTIVE states. Some of the reasons for this include:
a. TCP port 179 is not open.
b. A random TCP port over 1023 is not open.
c. BGP configuration error.
d. Network congestion.
e. Flapping network interface.
OpenSent State:
– The router listens for an OPEN message from its peer.
– Once the message has been received, the router checks the validity of the OPEN message.
– If there is an error it is because one of the fields in the OPEN message don’t match between the peers, e.g. BGP version mismatch, MD5 password mismatch, the peering router expects a different My AS. The router will then send a NOTIFICATION message to the peer indicating why the error occurred.
– If there is no error, a KEEPALIVE message is sent, various timers are set and the state is changed to OpenConfirm.
OpenConfirm State:
– The peer is listening for a KEEPALIVE message from its peer.
– If a KEEPALIVE message is received and no timer has expired before reception of the KEEPALIVE, BGP transitions to the Established state.
– If a timer expires before a KEEPALIVE message is received, or if an error condition occurs, the router transitions back to the IDLE state.
Established State:
– In this state, the peers send UPDATE messages to exchange information about each route being advertised to the BGP peer.
– If there is any error in the UPDATE message then a NOTIFICATION message is sent to the peer, and BGP transitions back to the IDLE state
BGP sessions are divided into two types, one is between BGP-speaking border routers of different ASes or simply “eBGP sessions” and the other is sessions between BGP- speaking internal routers in the same AS or simply “iBGP sessions.” The standard mode or eBGP is where routing policies are implemented and it is the session where different ASes exchange routing information depending on the policies that it was administered: most of the time a subset of the routing table is exchanged with other ASes.
There are two main goals of deploying the BGP as an EG: one is to suppress routing information looping or loop-free forwarding and the other which
allows each AS as a monolithic entity . Internal dissemination of externally learned routes to routers within the AS is called “iBGP sessions.” Full-mesh iBGP sessions require all internal BGP routers within an autonomous system have a iBGP session with each other. The eBGP routes updates from eBGP routers are then passed through the iBGP neighbors. iBGP is not an IGP; it operates on top of the TCP so that internal BGP routers can exchange information using BGP. IGPs are not used to send eBGP updates because IGP has a poorer scalability than BGP. IGPs also don’t implement the same kind of attributes present in BGP. Information that would be routed may not be preserved using IGP: it is better for BGP sessions to run inside the AS so that information would not be lost.
A problem with full-mesh configuration is that it can only support quadratic scalability of the AS. This is acceptable in a small AS, but for large ASes with ten of thousands of nodes that would require full-mesh iBGP session, this would pose a problem. Two solutions were formulated to solve the scalability problem of the full-mesh configuration. The first method was the utilization of route reflectors[9] and the second was the setup confederations[10] of BGP routers.
The route reflector replaced the full-mesh iBGP with a hub and spoke configuration. The best route to a destination prefix is determined the route reflector. The route reflector then sends the information to all of its client BGP routers. Listed below are the rules that followed in a route reflector scheme[1]:
1. If a route reflector learns a route via eBGP or via iBGP from one of its clients, then it re-advertises that route over all of its sessions to its clients.
2. If a route reflector learns a route via iBGP from a router that is not one of its clients, it re-advertises the route to its client routers, but not over any other iBGP sessions.
Implementation of a route reflection having only one root will have scalability issues, too. Thus, the network deploys multiple router reflectors in a hierarchical order .BGP confederations on the other hand use the divide-and-conquer technique wherein the AS is partitioned into several ASes. Full-mesh iBGP is then implemented on the sub-ASes.
IBGP topologies must be implemented carefully to achieve loop-free forwarding and complete visibility. The propagation of routing updates on BGP sessions
depends upon what type of BGP session they are in. eBGP sessions are point-to-point: thus, the traversal of the BGP router update is guaranteed because it would
only be sent to the other end of the point-to-point connection. Regardless of whether the update is learned from either a iBGP session or a eBGP sessions, a router would
advertise it over an eBGP session.
IBGP sessions does not require both routers to be directly connected: the other router maybe positioned somewhere within the AS. The iBGP relies on the
IGP for connectivity between the two endpoints of the BGP session and establish the route to the next-hop IP address named in the attribute. Right configuration is the key to achieve loop-free forwarding and a more complete visibility with regards to implementation of the iBGP topology to be used.
There are three important BGP attributes: LOCALPREF(local preference), ASPATH (AS path vector), and MED (Multiple-EXit Discriminator). They are the ones mostly referenced when routing decisions are made using BGP sesions. A set of rules to evaluate the attributes are used by the protocol to choose a route from multiple choices. Also take note, that these rules are vendor-specific, meaning priority of attributes are different from one vendor to another.
A customer can exchange routes and packets over multiple ASes usign multi-homing[11]. In BGP, the multi-homing variant used is a multiple-linked, singe IP
address(space). When used in BGP sessions, the customer announces its IP address to upstream links. When one of the links fails, the BGP would detect the failure
and in return would prevent sending information to the failing link. One of the dangers of employing it in As to As connection is that it would be a source of single point of failure because it would serve as an edge router to two ASes that are not linked together.
BGP has a slow fault detection and recovery mechanisms. Withdrawal messages are also sent to neighboring routers when a fault is detected. Damping such route updates is done by the protocol to avoid route oscillations. Fault detection may take several minutes as well as convergence of the between two multi-linked ASes.Convergence also depends on what topology the eBGP sessions are linked.
Critique:
The lecture was very informative on the different aspects surrounding wide area networking. It thoroughly discussed BGP4 which is currently the most used protocol for AS-to-AS communication. It also discussed the protocols algorithm, its attributes and the rules that it follow to select a path from multiple links. It conception may have also been one of the reasons why the internet proliferated in the 1990s. From what I have read, research and development of the protocol was started in 1989 and BGP4 was deployed in 1993 in AS to AS communication and maybe one of the key reasons why the internet proliferated in the mid-90s. Although it is widely used, a downside of the protocol is that failure detection and recovery is slow. HOwever, its RFC document still indicates that it is still a work in progress. I assume that the improvement of failure and recovery mechanisms of the protocol is at the top of the list priorities in the ongoing development of the BGP4.
[1] H. Balakrishnan, N.Feamster, “Lecture 4: Interdomain Internet Routing”, 2001 to 2005
[2] Y. Rekhter, T.Li. “A Border Gateway Protocol 4 (BGP-4).” Internet Engineering Task Force,
March 1995. RFC 1771.
[3] Y. Rekhter, T.Li. “A Border Gateway Protocol 4 (BGP-4).” Internet Engineering Task Force, October 2004.
http://www.ietf.org/internet-drafts/draft-ietf-idr-bgp4-26.txt
Work in progress, expired April 2005.
[4] C. Hendrick. “Routing Protocol Information”. Internet Engineering Task Force, June 1988 RFC 1058.
[5] J. Moy. “OSPF Version 2”, March 1994. RFC 1583
[6] D. Oran. “OSI IS-IS Intra-Domain Routing Protocol”, Internet Engineering Task Force, Feb. 1990 RFC 1142.
[7] http://en.wikipedia.org/wiki/Internet_backbone
[8] http://en.wikipedia.org/wiki/Border_Gateway_Protocol
[9] T.Bates, R.Chandra, and E.Chen. “BGP Route Reflection – A Alternative to Full Mesh IBGP”.
Internet Engineering Task Force, April 2000. RFC 2796
[10] P. Traina, D. McPherson, J.Scudder. “Autonomous System Confederations for BGP”. Internet Engineering Task Force, Feb. 2001 RFC 3065.
[11] http://en.wikipedia.org/wiki/Multi-homed