The BGP Multi Exit Discriminator (MED) attribute and tie-breaking
As an “optional non-transitive” attribute, the MED is only present if a router in the local AS or the neighboring AS sets it. So if AS 10 adds the MED attribute to a prefix and then sends an update with that prefix to AS 20 and AS 20 sends it to AS 30, AS 20 will see the MED value that AS 10 inserted, but AS 30 won’t.
The reason for the MED’s limited propagation is that its original purpose is only to allow directing traffic over the desired link if there are multiple links between two ASes. So normally, the MED is only considered when two or more routes are received from the same neighboring AS. If routes are received from different neighbor ASes, the MED is not compared and BGP looks at the tie breaking rules instead. However, routers can be configured with bgp always-compare-med and then, well, the MED is always compared if multiple routes to a destination have the same local preference and AS path length.
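The comparability rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the list-of-AS-numbers path representation are assumptions made for the example, not router code.

```python
def med_comparable(path_a, path_b, always_compare_med=False):
    """Return True if the MEDs of two routes may be compared.

    Each path is a list of AS numbers; the first entry is the
    neighboring AS the route was received from (hypothetical model).
    """
    # By default the MED only matters between routes learned from the
    # same neighboring AS; always-compare-med lifts that restriction.
    return always_compare_med or path_a[0] == path_b[0]


print(med_comparable([10, 40], [10, 40]))        # same neighbor AS
print(med_comparable([10, 40], [20, 40]))        # different neighbor ASes
print(med_comparable([10, 40], [20, 40], True))  # always-compare-med set
```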
Also, in RFC 1771, the original BGP version 4 RFC, it is left unspecified how a route with no MED attribute compares to a route with the MED attribute present. As such, some routers would treat a route with a missing MED as having the worst possible MED. But when the BGP-4 protocol specification was revisited in RFC 4271, the IETF decided that a missing MED should be treated as an MED of 0. However, there may still be routers out there that exhibit the old behavior, and there are certainly routers that have been configured with bgp bestpath med missing-as-worst.
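The two ways of handling a missing MED can be written down as a small helper. This is a minimal sketch: `effective_med` and `MED_WORST` are names invented for the example, and using the largest 32-bit value as "worst" is an assumption, since real implementations may pick a different constant.

```python
# The MED is a 32-bit unsigned value; the largest one stands in for
# "worst possible" here (an assumption for illustration).
MED_WORST = 2**32 - 1


def effective_med(med, missing_as_worst=False):
    """Return the MED value used for comparison; None means no MED."""
    if med is not None:
        return med
    # RFC 4271 behavior: a missing MED counts as 0 (the best value).
    # With missing-as-worst, it counts as the worst value instead.
    return MED_WORST if missing_as_worst else 0
```

With this helper, a route without a MED beats a route with a MED of 2 under the RFC 4271 behavior, and loses to it when missing-as-worst is configured.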
Now consider the figure, where AS 30 connects to ASes 10 and 20 in San Francisco and New York City. (AS 30 has two routers in both locations so the Denver router sees two paths through each location.) In AS 30’s Denver router, we look at the paths towards two ASes that are reachable through both AS 10 and AS 20: AS 40, which announces the prefix 40.0.0.0/8 and AS 50, which announces the prefix 50.0.0.0/8. AS 10 doesn’t set any MEDs while AS 20 sets an MED of 2 on the routes that it advertises.
AS 30 has limited capacity between Denver and San Francisco, but ample capacity between New York and Denver. As a result, AS 30 increases the MED by 5 for routes learned in San Francisco, as shown in the BGP table for the AS 30 Denver router. Which of the routes BGP prefers now depends on the always-compare-med and med missing-as-worst settings.
When both always-compare-med and med missing-as-worst are in effect, then the best route is the one through AS 20 by way of New York City, which has a metric of 2. The routes through San Francisco have MEDs of 5 (AS 10) and 7 (AS 20), respectively, while the AS 10 route through NYC, which carries no MED, is treated as having an infinite MED.
When always-compare-med is in effect but not med missing-as-worst (and assuming a router that implements the RFC 4271 missing MED behavior) then the AS 10 route through New York City goes from worst to best, its MED effectively being 0 and thus beating the 2, 5 and 7 ones.
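The two always-compare-med cases can be checked with a short sketch. The four routes and their MEDs (none, 5, 2 and 7) come from the example; the dictionary layout and the function name `best_by_med` are assumptions made for illustration.

```python
# The four routes seen by the Denver router; None means "no MED".
ROUTES = [
    {"via": "AS 10", "site": "NYC", "med": None},
    {"via": "AS 10", "site": "SF", "med": 5},   # no MED from AS 10, +5 in SF
    {"via": "AS 20", "site": "NYC", "med": 2},
    {"via": "AS 20", "site": "SF", "med": 7},   # 2 from AS 20, +5 in SF
]


def best_by_med(routes, missing_as_worst):
    """Pick the lowest-MED route, comparing across all neighbor ASes."""
    def key(route):
        if route["med"] is None:
            return float("inf") if missing_as_worst else 0
        return route["med"]
    return min(routes, key=key)


for flag in (True, False):
    winner = best_by_med(ROUTES, missing_as_worst=flag)
    print(flag, winner["via"], winner["site"])
```

The sketch assumes local preference and AS path length are equal for all four routes, as in the example.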
When always-compare-med is not specified, then whether the AS 10 route through New York City is considered better than the one through San Francisco again depends on med missing-as-worst, while the AS 20 route through NYC is better (MED = 2) than the AS 20 route through San Francisco (MED = 7). However, which is better, the winning AS 10 route or the AS 20 route through NYC? For that decision, the MED doesn’t come into play. The remaining BGP route selection rules are (from RFC 4271):
- d) If at least one route was received over eBGP, only consider eBGP-learned routes.
- e) Select the routes with the lowest interior cost.
- f) Select the routes with the lowest BGP identifier.
- g) Select the route learned from the lowest BGP neighbor address.
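The four remaining rules listed above can be sketched as a chain of filters. This is a simplified model, not a router implementation: the dictionary keys (`ebgp`, `igp_cost`, `router_id`, `peer`) are hypothetical names chosen for the example.

```python
from ipaddress import IPv4Address


def tie_break(routes):
    """Apply the last four selection rules to a list of candidate routes."""
    # Prefer eBGP-learned routes if at least one candidate is eBGP-learned.
    if any(r["ebgp"] for r in routes):
        routes = [r for r in routes if r["ebgp"]]
    # Keep the routes with the lowest interior (IGP) cost to the next hop.
    low_cost = min(r["igp_cost"] for r in routes)
    routes = [r for r in routes if r["igp_cost"] == low_cost]
    # Keep the routes with the lowest BGP identifier, compared numerically.
    low_id = min(IPv4Address(r["router_id"]) for r in routes)
    routes = [r for r in routes if IPv4Address(r["router_id"]) == low_id]
    # Finally, pick the route learned from the lowest neighbor address.
    return min(routes, key=lambda r: IPv4Address(r["peer"]))
```

Addresses are parsed with `ipaddress.IPv4Address` so that, for example, 9.0.0.1 sorts below 10.0.0.1, which a plain string comparison would get wrong.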
Rules d) and e) make sure that packets are handed off to an external AS as quickly as possible; “interior cost” means the cost or metric of the path towards the route’s next hop address as per the interior routing protocol. For instance, suppose the OSPF cost from AS 30’s router in Denver to San Francisco is 200 and from Denver to NYC 100.
If med missing-as-worst is in effect, then when the route selection algorithm arrives at rule e), the remaining two routes towards 40.0.0.0/8 in contention will be:
Network | Next Hop | Metric | LocPrf | Weight | Path |
40.0.0.0/8 | 10.0.2.1 | 5 | 100 | 0 | 10 40 i (San Francisco) |
 | 20.0.1.1 | 2 | 100 | 0 | 20 40 i (New York) |
At this point, the AS 20 route through NYC wins because it has an interior cost of 100 while the AS 10 route through San Francisco has an interior cost of 200. When med missing-as-worst is not in effect, the choice is between:
Network | Next Hop | Metric | LocPrf | Weight | Path |
40.0.0.0/8 | 10.0.1.1 | | 100 | 0 | 10 40 i (New York) |
 | 20.0.1.1 | 2 | 100 | 0 | 20 40 i (New York) |
At this point rule e) sees identical interior costs (both 100) so the algorithm progresses to the final tie breaking rules. The BGP identifier is one of the IPv4 addresses of a BGP router, which is used to determine if two BGP sessions are two sessions towards the same router or two sessions towards two different routers. Obviously there is no value in selecting the route through a router with a low BGP identifier—rules f) and g) just make sure that the algorithm always finishes with just a single “best” route. But because some networks have lower IP address ranges that they use for their BGP routers than others, when all else is equal certain networks tend to get preferred over others.
In any event, BGP’s best route towards 40.0.0.0/8 goes through New York City. This is what we want, because AS 30’s link from Denver to NYC is much bigger than its link to San Francisco, and the paths from Denver to Chicago through NYC and San Francisco are similar in length.
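The walk-through for 40.0.0.0/8 without always-compare-med can be reproduced with a small sketch. The MEDs and OSPF costs are the ones used in the example; the data layout and the function name `survivors` are assumptions made for illustration, and the final BGP identifier step is left out because the example gives no identifiers.

```python
IGP_COST = {"NYC": 100, "SF": 200}  # OSPF cost as seen from Denver

ROUTES = [
    {"neighbor_as": 10, "site": "NYC", "med": None},
    {"neighbor_as": 10, "site": "SF", "med": 5},
    {"neighbor_as": 20, "site": "NYC", "med": 2},
    {"neighbor_as": 20, "site": "SF", "med": 7},
]


def survivors(routes, missing_as_worst):
    """Return the routes left after the MED and interior cost steps."""
    def med(route):
        if route["med"] is None:
            return float("inf") if missing_as_worst else 0
        return route["med"]

    # MED step: only compare routes from the same neighboring AS.
    kept = []
    for asn in sorted({r["neighbor_as"] for r in routes}):
        group = [r for r in routes if r["neighbor_as"] == asn]
        best = min(med(r) for r in group)
        kept += [r for r in group if med(r) == best]

    # Interior cost step: keep the routes with the cheapest exit.
    low = min(IGP_COST[r["site"]] for r in kept)
    return [r for r in kept if IGP_COST[r["site"]] == low]


for flag in (True, False):
    print(flag, [(r["neighbor_as"], r["site"]) for r in survivors(ROUTES, flag)])
```

With missing-as-worst, only the AS 20 route through NYC is left. Without it, both NYC routes remain and the BGP identifier decides between them.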
However, in the case of prefix 50.0.0.0/8, this is suboptimal, because AS 50 is in San Francisco, so the path directly to San Francisco is much shorter than the one through NYC. A solution to this could be for AS 30 to determine that 50.0.0.0/8 is a route that is preferentially reached through San Francisco (perhaps because AS 10 and/or AS 20 tag it with a community value that indicates they learned the prefix in California). If AS 30 then removes the MED +5 penalty for 50.0.0.0/8, the MEDs of both routes through AS 20 are the same, so the algorithm reaches rule e) with the following routes remaining (assuming no med missing-as-worst):
Network | Next Hop | Metric | LocPrf | Weight | Path |
50.0.0.0/8 | 10.0.1.1 | | 100 | 0 | 10 50 i (New York) |
 | 20.0.1.1 | 2 | 100 | 0 | 20 50 i (New York) |
 | 20.0.1.1 | 2 | 100 | 0 | 20 50 i (San Francisco) |
In this case, the router in Denver will still select a route through NYC, because those share the lowest interior cost. (Which one depends on the BGP identifier.) However, in Las Vegas, the interior cost towards San Francisco is 100 while the interior cost towards NYC is 200 (100 from Las Vegas to Denver + 100 from Denver to NYC). So at least part of the network will send its traffic over the shorter path.
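The difference between Denver and Las Vegas comes down to the interior cost each router sees. A minimal sketch, using the costs from the example and the three candidate routes for 50.0.0.0/8 from the table (the names `IGP_COST`, `CANDIDATES` and `lowest_cost_exits` are assumptions made for illustration):

```python
# Interior cost from each AS 30 router to each exit location.
IGP_COST = {
    "Denver": {"NYC": 100, "SF": 200},
    "Las Vegas": {"NYC": 200, "SF": 100},  # NYC: 100 to Denver + 100 onward
}

# (neighbor AS, exit location) for the routes that survive the MED step.
CANDIDATES = [(10, "NYC"), (20, "NYC"), (20, "SF")]


def lowest_cost_exits(router):
    """Return the candidate routes with the lowest interior cost."""
    costs = IGP_COST[router]
    low = min(costs[site] for _, site in CANDIDATES)
    return [(asn, site) for asn, site in CANDIDATES if costs[site] == low]


print(lowest_cost_exits("Denver"))
print(lowest_cost_exits("Las Vegas"))
```

Denver keeps the two NYC routes, while Las Vegas keeps only the route that exits in San Francisco.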
As a result, it’s best to have consistent MEDs on routes learned from the same neighboring AS in different locations. That way, the risk that traffic takes unnecessary detours is significantly reduced. Also, comparing the MEDs sent by different ASes is problematic because the MED values will not have a consistent meaning: perhaps one AS uses an MED of 10 for “good” and 20 for “bad” while another uses 100 for “good” and 200 for “bad”. However, MEDs can be useful in certain circumstances, for instance when connecting to multiple internet exchanges, or when having private peering as well as internet exchange peering in the same city. In that case, it’s useful to increase the MED for the connections that are less preferred.