There are also conditions that break a BGP session that worked before.
Debugging those issues starts by looking at the state a BGP session is in.
BGP’s finite state machine (as outlined in RFC 4271) has the following states: Idle, Connect, Active, OpenSent, OpenConfirm and Established.
On Cisco routers, the state of each BGP session is shown in the State/PfxRcd column in the output of the show ip bgp summary or show bgp ipv4 unicast summary commands (IPv4) or the show bgp ipv6 unicast summary command (IPv6). The Connect, OpenSent and OpenConfirm states tend to be very transient; most of the time a BGP session is in the Idle, Active or Established state. If the session is in the Established state, all is well. Note that you won’t see the word “Established” in the output of the show … summary command: rather, you’ll see the number of prefixes received from the neighbor in question.
The two states you’ll encounter when things aren’t working are Idle and Active. That’s right: when the state of a BGP session is Active, this doesn’t mean there’s an active BGP session. What it means is that BGP is actively trying to set up a session with the neighbor. If the neighbor is also in state Active, the two will connect within seconds to about a minute. However, if the other side doesn’t respond, the session can remain in Active for a long time.
In state Idle, the router is currently not trying to set up a BGP session. Reasons for this can be that there is no route towards the neighbor, or the neighbor refused an earlier connection attempt. Sometimes the output of the show bgp ipv4 unicast neighbors <address> command (or the IPv6 version) will indicate the reason for the Idle state, but not always. After making changes to the configuration or resetting sessions, the state of a BGP session may go to Idle for a short time and then revert to Active, but when severe error conditions persist, the session may remain in Idle indefinitely.
Two special cases are “Idle (PfxCt)” and “Idle (Admin)” in the show … summary output. Idle (PfxCt) means the session is in the Idle state because the neighbor has sent more prefixes than the configured maximum-prefix limit. The session will remain in Idle until it’s cleared/reset with the clear bgp ipv4 unicast <address> command. Idle (Admin) means that the BGP session is in shutdown state, as per the following configuration:
!
router bgp 65065
neighbor 192.0.2.1 remote-as 65066
neighbor 192.0.2.1 shutdown
!
This is a good way to temporarily disable a BGP session without the need to remove it from the configuration. Re-enable the session with no neighbor <address> shutdown in the BGP section of the configuration.
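The prefix limit behind Idle (PfxCt) is also configured per neighbor. The following is a minimal sketch, assuming the same neighbor as in the example above; the limit of 1000 prefixes, the warning threshold of 80 percent and the restart interval of 30 minutes are placeholder values chosen for illustration, not recommendations:
!
router bgp 65065
neighbor 192.0.2.1 remote-as 65066
neighbor 192.0.2.1 maximum-prefix 1000 80 restart 30
!
With the restart keyword, the router tries to bring the session back up by itself once the interval has passed. Without it, the session stays in Idle (PfxCt) until it is cleared manually.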
When something sufficiently bad happens that a router needs to tear down a BGP session, it sends a NOTIFICATION message to the other side just before it terminates the session. Notifications are limited to pre-defined codes, so older routers may not know the most recent ones. In logs, “notification sent” means “we terminated the session from our end”, and “notification received” means the other end terminated the session. Some notification types are fairly obvious, such as a mismatched AS number, or “no supported AFI/SAFI”, which means that the two routers wanted to use the BGP session for different protocols, e.g., one wanted to do IPv4 and the other IPv6. There’s also a “cease” notification that routers can use if they want the other side to stop trying to initiate BGP sessions; a short RFC (RFC 4486) defines the different sub-codes that may be used with it.
The last notification sent or received is usually listed in the output of the show bgp ipv4 unicast neighbors <address> command.
When a BGP router is configured to use an MD5 password on a session, but the remote router isn’t, or the remote router uses a different password, the BGP session won’t establish. If the session was already established when the password was configured, it will eventually go down as the hold timer expires. However, no MD5-related notifications are sent. The reason for this is that with an MD5 password configured, packets carrying BGP messages that have no MD5 checksum or an incorrect one (because the other side used a different password) are ignored and not processed by BGP. However, the reception of these packets will be logged in most routers’ logs.
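As a minimal sketch of the configuration involved, again assuming the neighbor used in the shutdown example, the password is set per neighbor. The <password> placeholder stands for a string that must be identical on both routers:
!
router bgp 65065
neighbor 192.0.2.1 remote-as 65066
neighbor 192.0.2.1 password <password>
!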
When configuring a BGP session, it’s always useful to know whether the other side is already configured or not. If not, then you’ll have to wait for the other side to set up their end. When setting up peerings, it’s helpful to send the other side the output of the show bgp ipv4 unicast summary | include <AS number> command. If the session or sessions don’t come up, the other side can now easily check whether you used the correct IP addresses and AS numbers. If the session or sessions do come up, the other side can check if the number of prefixes you receive from them matches the number they’re sending.
If both ends are configured but the sessions still remain in Idle or Active after a few minutes, use the show bgp ipv4 unicast neighbors <address> command to see if there are any local errors or notifications. You may also want to perform a ping or traceroute towards the remote IP address to check whether it’s reachable. Double check the AS number and IP address(es): a copy/paste error is easy to make, and sometimes old information keeps floating around.
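Put together, a first pass at these checks might look like the following sequence of commands. It uses 192.0.2.1 as an example neighbor address, so substitute the address of the session being debugged:
show bgp ipv4 unicast summary
show bgp ipv4 unicast neighbors 192.0.2.1
ping 192.0.2.1
traceroute 192.0.2.1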
If a session with an MD5 password doesn’t come up, but there are no BGP errors, check your logs for MD5 checksum errors and/or double check whether the other side is using a password at all and is using the correct password.
When a session does establish but you don’t receive any prefixes, this may be because the other side isn’t sending any (which they can check on their router with show bgp ipv4 unicast neighbors <address> advertised-routes) or because you have a prefix list or route map configured that filters too aggressively. The opposite issue is the one where you send a copy of the full BGP table to the other side. That’s appropriate towards BGP customers, but not towards peers or transit ISPs. So either you didn’t configure a filter, or the filter isn’t working as intended. This will typically trigger a maximum prefix limit at the other side, with the appropriate notification. When this happens, set up the correct filter and then ask the other side to reset the session.
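As an illustration of such an outbound filter, the sketch below announces only a single prefix towards a peer or transit ISP. The prefix list name OWN-PREFIXES and the prefix 198.51.100.0/24 are placeholders chosen for this example, and a real filter would list your own address space:
!
ip prefix-list OWN-PREFIXES seq 5 permit 198.51.100.0/24
!
router bgp 65065
neighbor 192.0.2.1 remote-as 65066
neighbor 192.0.2.1 prefix-list OWN-PREFIXES out
!
Once the filter is in place, the output of show bgp ipv4 unicast neighbors 192.0.2.1 advertised-routes should list only the permitted prefix.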