SD-WAN and NetFlow
Both 4G/LTE and broadband Internet links are less expensive than MPLS links, but they fail more often. MPLS, by contrast, ensures reliable delivery of packets and excellent quality of service for VoIP. It achieves this reliability by marking packets with labels so that they follow predetermined network paths. This is not the case for broadband or 4G/LTE connections, which cannot compete with MPLS in terms of reliable packet delivery. For this reason, organizations that have implemented SD-WAN often combine connections from multiple providers to achieve 99.99% availability even when a single link fails.
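The availability gain from combining links can be estimated with simple arithmetic. The sketch below assumes that link failures are statistically independent, which may not hold when links share a duct or an upstream provider. The function name and the 99% figures are illustrative and do not come from any vendor.

```python
from math import prod


def combined_availability(link_availabilities):
    """Availability of a site that stays reachable while at least one link is up.

    Assumption: link failures are independent of each other.
    """
    # The site is down only when every link is down at the same time.
    all_down = prod(1.0 - a for a in link_availabilities)
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical figures: two links, each available 99% of the time.
    print(f"{combined_availability([0.99, 0.99]):.4%}")
```

Under the independence assumption, two links that are each available 99% of the time are both down only 0.01 x 0.01 = 0.01% of the time, which gives roughly the 99.99% figure mentioned above.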
The SD-WAN cloud-enabled architecture is depicted in Picture 1. The branch office is connected via wireless WAN and broadband Internet links to cloud services (e.g. Office 365, AWS, Salesforce). Cloud and Internet performance is improved because traffic to and from the cloud is sent directly to the Internet instead of through the enterprise data center. If the SD-WAN controller detects that one of the connections to the cloud has failed or that its performance has degraded in terms of packet loss, latency or jitter, traffic is rerouted over the other connection with better parameters so the cloud session is not interrupted. Traffic between the branch and the data center is encrypted across the entire network within tunnels. A private MPLS network is used for the connection to the enterprise data center for in-house real-time applications such as voice or video and for other mission-critical traffic.
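The rerouting decision described above can be illustrated with a minimal sketch. It is not any vendor's algorithm, since those are proprietary. The thresholds, the LinkStats type and the select_link function are all hypothetical and serve only to show the kind of rule a controller might apply.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    name: str
    up: bool
    loss_pct: float    # measured packet loss in percent
    latency_ms: float  # measured latency in milliseconds
    jitter_ms: float   # measured jitter in milliseconds


# Hypothetical SLA thresholds for one application class.
MAX_LOSS_PCT = 1.0
MAX_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0


def meets_sla(link):
    """True if the link is up and within all three thresholds."""
    return (link.up
            and link.loss_pct <= MAX_LOSS_PCT
            and link.latency_ms <= MAX_LATENCY_MS
            and link.jitter_ms <= MAX_JITTER_MS)


def select_link(current, links):
    """Stay on the current link while it meets the SLA, otherwise reroute.

    Falls back to the least lossy link that is still up, or None if all are down.
    """
    by_name = {link.name: link for link in links}
    cur = by_name.get(current)
    if cur is not None and meets_sla(cur):
        return current  # no reroute needed
    compliant = [link for link in links if meets_sla(link)]
    if compliant:
        return min(compliant, key=lambda link: link.latency_ms).name
    alive = [link for link in links if link.up]
    if alive:
        return min(alive, key=lambda link: link.loss_pct).name
    return None


if __name__ == "__main__":
    links = [LinkStats("broadband", True, 3.0, 40.0, 5.0),
             LinkStats("lte", True, 0.2, 60.0, 10.0)]
    print(select_link("broadband", links))
```

Keeping the current link while it still meets the thresholds avoids flapping between links that are both healthy. A real controller would also need hysteresis and measurement windows, which are left out here.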
Picture 1: SD-WAN Cloud-enabled Architecture
(source: https://www.sdxcentral.com/networking/sd-wan/definitions/essentials-sd-wan-architecture/)
SD-WAN has become a hot trend in networking. According to IDC, the SD-WAN market will grow at a 40.4% compound annual growth rate from 2017 to 2022 to reach $4.5 billion. The market opportunity for SD-WAN is huge and it continues to grow. Cisco is well aware of this trend, and its acquisition of Viptela in 2017 for $610 million in cash demonstrates its interest in SD-WAN. It also suggests that the technology has matured and that SD-WAN products present a significant sales opportunity. SD-WAN has already been incorporated into Cisco ISR/ASR routers running IOS XE, such as the ISR 1000 and 4000 series and the ASR 1000 series [1]. As a result, Cisco customers can migrate to an SD-WAN solution without replacing their existing Cisco devices.
As there is no standard algorithm for SD-WAN controllers, each device manufacturer uses its own proprietary algorithm for transmitting data. These algorithms determine which traffic to direct over which link and when to switch traffic from one link to another. Such information should not be hidden from customers. They should be able to determine which application is rerouted the most, which trigger causes the rerouting (link failure, packet loss, latency or jitter) and how often it happens. These are legitimate questions, and the answers provide a reality check that helps customers verify whether the SD-WAN is working as they expect. Furthermore, knowing the real reason for rerouting helps customers take corrective measures to avoid rerouting in the future, e.g. replacing a faulty DSL link with a new one.
Some vendors have already implemented IP Flow Information Export (IPFIX) (RFC 7011 and RFC 7012) in their SD-WAN devices. This significantly improves the visibility of SD-WAN performance. The export of IPFIX information from a Cisco (Viptela) SD-WAN vEdge router to an IPFIX collector is depicted in Picture 2. The router sends cflowd version 10, also called IPFIX, to the collector via the public Internet, where the records are accessed by the IPFIX analyzer. The source IP address of the IPFIX messages is randomly chosen from any of the interfaces in a VPN. Both the collector and the analyzer are placed in the data center, so the IPFIX traffic sent as UDP or TCP segments within the data center is not encrypted. Cflowd can track GRE, ICMP, IPsec, SCTP, TCP, and UDP flows.
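On the collector side, every IPFIX message starts with a fixed 16-byte header defined in RFC 7011, and its version field is 10, which is why cflowd version 10 and IPFIX are used interchangeably here. The sketch below only parses that header. The function name is illustrative, and template and data set decoding are deliberately left out.

```python
import struct
from datetime import datetime, timezone

# RFC 7011 message header: version, length, export time,
# sequence number, observation domain ID (network byte order).
IPFIX_HEADER = struct.Struct("!HHIII")


def parse_ipfix_header(datagram):
    """Parse the 16-byte IPFIX message header from a received datagram."""
    if len(datagram) < IPFIX_HEADER.size:
        raise ValueError("datagram shorter than the 16-byte IPFIX header")
    version, length, export_time, sequence, domain_id = IPFIX_HEADER.unpack_from(datagram)
    if version != 10:
        raise ValueError(f"not an IPFIX message (version {version})")
    return {
        "version": version,
        "length": length,  # total message length in bytes, header included
        "export_time": datetime.fromtimestamp(export_time, tz=timezone.utc),
        "sequence_number": sequence,
        "observation_domain_id": domain_id,
    }


if __name__ == "__main__":
    # Hand-built header used purely for illustration.
    sample = struct.pack("!HHIII", 10, 16, 0, 7, 1)
    print(parse_ipfix_header(sample))
```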
Picture 2: Cflowd v10 aka IPFIX Exported from Cisco vEdge Router
The Viptela cflowd software exports 22 IPFIX information elements to the cflowd collector. They are a subset of the elements defined in RFC 7012 and maintained by IANA, and the exported set cannot be modified. The most interesting element for finding out why a flow was terminated is flowEndReason (136), which takes the following values:
0x01: Idle timeout
The flow was terminated because it was considered to be idle.
0x02: Active timeout
The flow was terminated for reporting purposes while it was still active, for example, after the maximum lifetime of unreported flows was reached.
0x03: End of flow detected
The flow was terminated because the metering process detected signals indicating the end of the flow, for example, the TCP FIN flag.
0x04: Forced end
The flow was terminated because of some external event, for example, a shutdown of the metering process initiated by a network management application.
0x05: Lack of resources
The flow was terminated because of a lack of resources available to the metering process and/or the exporting process.
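An analyzer typically turns these numeric values into readable labels. A minimal sketch of such a lookup follows. The dictionary holds only the five values listed above, and the function name is illustrative.

```python
# flowEndReason (IE 136) values as listed above.
FLOW_END_REASON = {
    0x01: "idle timeout",
    0x02: "active timeout",
    0x03: "end of flow detected",
    0x04: "forced end",
    0x05: "lack of resources",
}


def describe_flow_end_reason(value):
    """Return a readable label for a flowEndReason value."""
    return FLOW_END_REASON.get(value, f"unknown (0x{value:02x})")


if __name__ == "__main__":
    for code in (0x01, 0x03, 0x06):
        print(f"0x{code:02x}: {describe_flow_end_reason(code)}")
```

None of the five labels says anything about a flow being moved from one link to another, so a rerouted flow would at best show up as a generic termination.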
However, to find out the real reason why a flow was rerouted, we would need additional values, which are not defined today, such as:
0x06: Jitter
The flow was rerouted due to excessive jitter in VoIP transmissions.
0x07: Packet loss
The flow was rerouted due to excessive packet loss in UDP connections.
0x08: Retransmits
The flow was rerouted due to excessive TCP retransmits.
0x09: RTT
The flow was rerouted due to excessive TCP setup times.
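If such values existed, an analyzer could report which application is rerouted most often and why. The sketch below is purely hypothetical: the values 0x06 to 0x09 are the ones proposed in this article and are not assigned by IANA, and the record layout (the "application" and "flow_end_reason" keys) is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical values proposed in this article, NOT assigned by IANA.
PROPOSED_REROUTE_REASON = {
    0x06: "jitter",
    0x07: "packet loss",
    0x08: "retransmits",
    0x09: "RTT",
}


def reroute_summary(records):
    """Count reroutes per (application, reason) pair.

    Each record is assumed to be a dict with the hypothetical keys
    "application" and "flow_end_reason".
    """
    counts = Counter()
    for rec in records:
        reason = PROPOSED_REROUTE_REASON.get(rec["flow_end_reason"])
        if reason is not None:  # ignore ordinary flow terminations
            counts[(rec["application"], reason)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        {"application": "voip", "flow_end_reason": 0x06},
        {"application": "voip", "flow_end_reason": 0x06},
        {"application": "crm", "flow_end_reason": 0x08},
        {"application": "crm", "flow_end_reason": 0x03},
    ]
    for (app, reason), n in reroute_summary(sample).most_common():
        print(app, reason, n)
```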
The information elements (IEs) documented by IANA that explain why something happened to a flow are forwardingStatus (89), flowEndReason (136) and firewallEvent (233). vEdge routers export only 22 IEs, flowEndReason (136) being one of them. However, this information element does not indicate how or why a specific flow was rerouted.
Conclusion:
Network administrators require insight into the changes performed by SD-WAN so that they do not have to rely blindly on the vendor's algorithm for traffic optimization and recovery from failures. Exporting flow information from an SD-WAN device certainly helps in this effort. However, the current flow export implemented in Cisco SD-WAN devices does not provide enough detail to determine the cause of rerouting. Without it, network professionals are not able to confirm that their SD-WAN architecture is working properly, reacting to events such as loss, jitter and latency, and moving traffic via the optimal path.