Propagation of updates
BGP is a real-time system, so when a BGP session is established or a new prefix is injected into BGP, the new information typically propagates very quickly. You can monitor this using one of the many “looking glasses” that are available around the globe. (Search for “BGP looking glass” to find one.) However, sometimes it takes a little longer for prefixes to propagate to all corners of the internet. The main reason for this is the minimum route advertisement interval (MRAI). RFC 4271 specifies that a certain minimum amount of time should elapse between eBGP updates for “routes to a particular destination”. The suggested value for the MRAI on eBGP sessions is 30 seconds.
The idea is that if there’s an update for the same prefix every five seconds, starting at 0 seconds, the first update is propagated immediately, and then nothing happens for the next 30 seconds. After those 30 seconds, only the most recent information is sent to the neighbor or neighbors in question. So the updates at 5, 10, 15, 20 and 25 seconds are never propagated. This reduces the instability that may be injected into the BGP system.
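To make the timing concrete, here is a minimal Python sketch of a per-prefix MRAI timer towards a single neighbor. The function name `rate_limit` and the route labels are hypothetical. The sketch assumes that the timer starts whenever an update is actually sent and that a newer route simply overwrites one that is still being held back; this matches the behavior described above, but it is not taken from any particular BGP implementation.

```python
def rate_limit(updates, mrai=30):
    """Per-prefix MRAI sketch (hypothetical helper).

    updates: list of (time, route) for ONE prefix towards ONE neighbor.
    Returns the (time, route) pairs that are actually sent.
    """
    sent = []
    last_sent = None  # time of the most recent transmission
    pending = None    # newest route held back by the running timer
    for t, route in sorted(updates, key=lambda u: u[0]):
        if pending is not None and t > last_sent + mrai:
            # The timer expired before this update arrived: flush the held route.
            last_sent += mrai
            sent.append((last_sent, pending))
            pending = None
        if last_sent is None or t >= last_sent + mrai:
            # Timer not running: send right away.
            sent.append((t, route))
            last_sent = t
            pending = None  # anything held back is superseded by this newer route
        else:
            # Timer running: remember only the most recent route.
            pending = route
    if pending is not None:
        sent.append((last_sent + mrai, pending))
    return sent


print(rate_limit([(t, f"update at {t}s") for t in range(0, 30, 5)]))
# [(0, 'update at 0s'), (30, 'update at 25s')]
```

If the updates stop after 25 seconds, the 25-second update is what goes out at the 30-second mark; if another update arrives exactly at 30 seconds, that newer one is sent instead, which is the situation the paragraph above describes.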
When a new prefix is announced, because it’s newly injected into BGP or because a session that was down comes back up, the MRAI usually doesn’t come into play. The update typically travels fastest over the best path, so that best path is the first one installed in the BGP table and used. The update then also arrives over longer paths, but those updates aren’t propagated, because they contain paths that are worse than the already known path. (Exceptions are possible when, for traffic engineering or policy reasons, a path that’s longer than the shortest path is preferred.)
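The following Python sketch shows why later, longer paths don’t trigger new updates: an AS only sends something onward when its best path changes. It is a deliberate simplification, since the real BGP decision process compares several attributes (such as LOCAL_PREF) before AS path length, whereas here the shortest AS path always wins. The function name `advertised_changes` is hypothetical.

```python
def advertised_changes(arrivals):
    """Hypothetical helper.

    arrivals: (neighbor, as_path) pairs in the order they are received.
    Returns the best paths in the order they would be advertised onward.
    """
    rib = {}       # neighbor -> AS path learned from that neighbor
    best = None
    changes = []
    for neighbor, as_path in arrivals:
        rib[neighbor] = as_path
        # Simplification: the shortest AS path is the best path.
        new_best = min(rib.values(), key=len)
        if new_best != best:
            best = new_best
            changes.append(best)  # only a change of best path is propagated
    return changes


# The direct path arrives first; the longer paths that follow change nothing.
print(advertised_changes([(10, (10,)), (40, (40, 10)), (30, (30, 20, 10))]))
# [(10,)]
```

If the longer path happened to arrive first, the AS would send two updates: first the longer path, then the shorter one when it shows up.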
However, this is different when routes are removed from BGP, and to a lesser degree when routes are made less preferred, for instance through AS path prepending. In that case, BGP starts “path hunting”. Consider the topology in figure 1.
Figure 1. Path hunting example topology
ASes 20, 30, 40 and 50 all receive the prefix that AS 10 advertises. AS 20 propagates the prefix to AS 30, AS 30 propagates it to AS 40, and AS 40 propagates it to AS 50. This results in the BGP tables shown in table 2. The greater-than sign indicates the best path.
| AS | Prefix | AS path |
|----|-----------------|---------|
| 20 | 198.51.100.0/24 | > 10 |
| 30 | 198.51.100.0/24 | 20 10 |
| | | > 10 |
| 40 | 198.51.100.0/24 | 30 10 |
| | | > 10 |
| 50 | 198.51.100.0/24 | 40 10 |
| | | > 10 |
Table 2. Initial AS paths
If AS 10 now goes down or otherwise stops advertising prefix 198.51.100.0/24, each AS removes its direct path towards AS 10, as shown in table 3.
| AS | Prefix | AS path |
|----|-----------------|---------|
| 20 | 198.51.100.0/24 | (none) |
| 30 | 198.51.100.0/24 | > 20 10 |
| 40 | 198.51.100.0/24 | > 30 10 |
| 50 | 198.51.100.0/24 | > 40 10 |
Table 3. AS paths immediately after AS 10 becomes unreachable
Now, the following updates will happen:
- AS 40 tells AS 50 that the path is now 30 10
- AS 30 tells AS 40 that the path is now 20 10
- AS 20 tells AS 30 that 198.51.100.0/24 is now unreachable
After this round of updates, the BGP tables look as shown in table 4.
| AS | Prefix | AS path |
|----|-----------------|---------|
| 20 | 198.51.100.0/24 | (none) |
| 30 | 198.51.100.0/24 | (none) |
| 40 | 198.51.100.0/24 | > 30 20 10 |
| 50 | 198.51.100.0/24 | > 40 30 10 |
Table 4. AS paths after the second round of updates, shortly after AS 10 becomes unreachable
Immediately after receiving its update from AS 30, AS 40 needs to send another update to AS 50, informing it that the path is now even longer. And AS 30 no longer has a route towards 198.51.100.0/24, so it needs to send a withdrawal towards AS 40.
However, this is where the minimum route advertisement interval kicks in: AS 30 has just sent an update to AS 40, and AS 40 has just sent one to AS 50. So for 30 seconds, nothing happens. Then the updates are sent, resulting in the BGP tables shown in table 5.
| AS | Prefix | AS path |
|----|-----------------|---------|
| 20 | 198.51.100.0/24 | (none) |
| 30 | 198.51.100.0/24 | (none) |
| 40 | 198.51.100.0/24 | (none) |
| 50 | 198.51.100.0/24 | > 40 30 20 10 |
Table 5. AS paths after the third round of updates, 30 seconds after AS 10 becomes unreachable
AS 40 now sees the withdrawal from AS 30 and needs to propagate that withdrawal to AS 50. But the MRAI holds this update back as well, so the withdrawal is delayed by another 30 seconds. After that delay, we reach a new stable situation where every AS considers 198.51.100.0/24 unreachable, as shown in table 6.
| AS | Prefix | AS path |
|----|-----------------|---------|
| 20 | 198.51.100.0/24 | (none) |
| 30 | 198.51.100.0/24 | (none) |
| 40 | 198.51.100.0/24 | (none) |
| 50 | 198.51.100.0/24 | (none) |
Table 6. AS paths after the fourth round of updates, 60 seconds after AS 10 becomes unreachable
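The whole walkthrough can be replayed with a small discrete-time simulation. This Python sketch is a simplified model under stated assumptions: it hard-codes the AS 20 → 30 → 40 → 50 chain and the paths of table 2, prefers the shortest AS path, treats links as having zero delay, applies the MRAI to both advertisements and withdrawals, and lets all updates that are due at the same instant go out before any of them is processed. The names `simulate` and `DOWNSTREAM` are hypothetical.

```python
MRAI = 30  # seconds, the suggested eBGP value

# Assumed model of figure 1: every AS had a direct session with AS 10, and
# updates flow only along 20 -> 30 -> 40 -> 50, as described in the text.
DOWNSTREAM = {20: 30, 30: 40, 40: 50}


def simulate(mrai=MRAI):
    # Per AS: neighbor -> AS path learned from that neighbor (table 2).
    rib = {
        20: {10: (10,)},
        30: {10: (10,), 20: (20, 10)},
        40: {10: (10,), 30: (30, 10)},
        50: {10: (10,), 40: (40, 10)},
    }

    def best(asn):
        # Simplification: the shortest AS path wins, nothing else is compared.
        return min(rib[asn].values(), key=len) if rib[asn] else None

    def current_adv(asn):
        b = best(asn)
        return None if b is None else (asn,) + b  # None means "withdrawn"

    advertised = {asn: current_adv(asn) for asn in DOWNSTREAM}
    last_sent = {asn: -mrai for asn in DOWNSTREAM}  # timers expired long ago

    for paths in rib.values():  # t = 0: AS 10 disappears (table 3)
        paths.pop(10, None)

    t, snapshots = 0, []
    while True:
        while True:
            # Every AS whose MRAI timer has expired sends its changed best path...
            outgoing = [(asn, current_adv(asn)) for asn in DOWNSTREAM
                        if current_adv(asn) != advertised[asn]
                        and t - last_sent[asn] >= mrai]
            if not outgoing:
                break
            # ...and only then are all updates sent at this instant delivered.
            for asn, adv in outgoing:
                advertised[asn], last_sent[asn] = adv, t
                peer = DOWNSTREAM[asn]
                if adv is None:
                    rib[peer].pop(asn, None)  # withdrawal
                else:
                    rib[peer][asn] = adv
        snapshots.append((t, {asn: best(asn) for asn in rib}))
        waiting = [last_sent[a] + mrai for a in DOWNSTREAM
                   if current_adv(a) != advertised[a]]
        if not waiting:
            return snapshots
        t = min(waiting)  # jump ahead to the next MRAI expiry


for t, tables in simulate():
    print(t, tables)
```

Under these assumptions, `simulate()` returns snapshots at 0, 30 and 60 seconds that match tables 4, 5 and 6. Calling `simulate(mrai=0)` shows the contrast: without the MRAI, the same path hunting runs to completion at t = 0, leaving all four ASes without a route.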
Note: The MRAI is intended to work on individual prefixes, but implementations may group prefixes together and apply the MRAI to the group in order to save bookkeeping overhead. As a result, the MRAI may slow down a prefix even if it’s updated only once rather than multiple times.
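The effect described in this note can be sketched by keying the timer either on the prefix or on the neighbor as a whole. In this hypothetical Python model (`send_times` is not any router’s API), updates that arrive while a timer is running wait for it to expire and then go out together in one batch.

```python
def send_times(updates, mrai=30, grouped=False):
    """Hypothetical helper.

    updates: (time, prefix) pairs towards one neighbor.
    Returns (prefix, time_updated, time_sent) triples.
    """
    last = {}  # timer key -> time of the latest (possibly scheduled) transmission
    out = []
    for t, prefix in sorted(updates):
        # One timer per prefix, or one shared timer for all prefixes.
        key = "all prefixes" if grouped else prefix
        prev = last.get(key)
        if prev is None or t >= prev + mrai:
            sent_at = t            # timer idle: send right away
        elif t <= prev:
            sent_at = prev         # join the batch already scheduled (or just sent)
        else:
            sent_at = prev + mrai  # wait for the running timer to expire
        last[key] = sent_at
        out.append((prefix, t, sent_at))
    return out


ups = [(0, "192.0.2.0/24"), (10, "198.51.100.0/24")]
print(send_times(ups))                # second prefix sent at 10
print(send_times(ups, grouped=True))  # second prefix held until 30
```

With per-prefix timers, the second prefix, updated only once at 10 seconds, goes out immediately. With one shared timer for the neighbor, it waits until 30 seconds, only because a different prefix was sent at 0 seconds.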
So path hunting means that when a prefix becomes unreachable, BGP explores longer and longer paths before the prefix disappears from routing tables everywhere. When a prefix really is unreachable, it’s not a problem that packets try to follow longer and longer paths for about two minutes before they’re ultimately lost anyway. However, path hunting can be very problematic when the prefix in question is a more specific prefix and there’s also a less specific prefix covering the same address range.
For instance, suppose that a network has prefix 198.51.100.0/23 and is connected to two ISPs. To make their traffic engineering work the way they want, the network’s operators split the /23 into two /24s, and announce 198.51.100.0/24 to one ISP and 198.51.101.0/24 to the other ISP. The /23 is announced to both ISPs as a backup. If the link to the first ISP now goes down, path hunting happens for 198.51.100.0/24. Once the path hunting is over, traffic flows towards the /23, but during path hunting there is instability, as paths keep changing every 30 seconds. Some of these paths may work, others may not. Typically, this takes two minutes, with pings coming through during some of the 30-second periods and going unanswered during others.
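The reason the /23 doesn’t help during path hunting is longest-prefix matching: as long as any path for the /24 is still present, even a stale one, it beats the /23. This Python sketch uses the standard `ipaddress` module; the function `lookup`, the routing table and its path descriptions are hypothetical and represent a remote network’s view.

```python
import ipaddress


def lookup(destination, rib):
    """Longest-prefix match over a dict of prefix -> path description."""
    addr = ipaddress.ip_address(destination)
    matching = [p for p in rib if addr in ipaddress.ip_network(p)]
    if not matching:
        return None
    best = max(matching, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return best, rib[best]


# Hypothetical view from a remote network
rib = {
    "198.51.100.0/23": "via either ISP (backup)",
    "198.51.100.0/24": "via the first ISP",
}
print(lookup("198.51.100.7", rib))  # the /24 wins

# During path hunting the /24 is still present, just with ever longer paths
rib["198.51.100.0/24"] = "via a longer path that may or may not work"
print(lookup("198.51.100.7", rib))  # still the /24

# Only when the /24 is finally withdrawn everywhere does the /23 take over
del rib["198.51.100.0/24"]
print(lookup("198.51.100.7", rib))  # now the /23
```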
Note: So it’s best not to depend on a less specific advertisement as a backup for a more specific advertisement. In the situation with two ISPs, it would have been better to advertise each /24 to both ISPs, but with different properties (such as AS path prepending towards the backup ISP) to influence traffic engineering. However, prefixes announced with many prepends can still receive incoming traffic, so this option isn’t always ideal, either.
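One way a heavily prepended route can still attract traffic is that, in common BGP implementations, LOCAL_PREF is compared before AS path length. Many ISPs give routes learned from their own customers a higher LOCAL_PREF, so the backup ISP may keep using the prepended customer route itself. The Python sketch below assumes exactly that policy; the AS numbers are from the documentation range, and the `Route` class, the `best_route` function and the LOCAL_PREF values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    via: str
    as_path: tuple
    local_pref: int = 100  # assumed default value


def best_route(routes):
    # Simplified decision process: highest LOCAL_PREF first,
    # then the shortest AS path as the tie breaker.
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


ORIGIN, FIRST_ISP, SECOND_ISP = 64500, 64501, 64502
prepended = (ORIGIN,) * 4  # origin AS plus three prepends

# At the backup (second) ISP: the customer route gets an assumed higher LOCAL_PREF
at_second_isp = [
    Route("customer link", prepended, local_pref=200),
    Route("first ISP", (FIRST_ISP, ORIGIN)),
]
# At some third network that hears both ISPs with equal LOCAL_PREF
at_other_network = [
    Route("second ISP", (SECOND_ISP,) + prepended),
    Route("first ISP", (FIRST_ISP, ORIGIN)),
]
print(best_route(at_second_isp).via)     # customer link
print(best_route(at_other_network).via)  # first ISP
```

Under this assumed policy, the prepends steer the third network towards the first ISP, but the second ISP, and everyone that reaches the prefix only through it, still sends traffic over the prepended link.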