RTT stands for Round Trip Time: the time a packet takes to travel from point A to point B plus the time the acknowledgment of that packet takes to travel from B back to A. RTT affects the performance of a TCP flow. The distribution of RTTs can dramatically affect not only the data rates realized by individual flows sharing a link but also the utilization of Internet links [1]. Congestion control, one of the features of TCP, ensures that the sender does not overload the network by deciding how quickly data is sent. TCP congestion control includes techniques that prevent congestion and techniques that mitigate it after it occurs.
Google’s Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control algorithm, the successor of CUBIC for the YouTube service, depends heavily on RTT. As Google says: “Here, BBR yielded 4 percent higher network throughput, because it more effectively discovers and utilizes the bandwidth offered by the network. BBR also keeps network queues shorter, reducing round-trip time by 33 percent; this means faster responses and lower delays for latency-sensitive applications like web browsing, chat, and gaming. Moreover, by not overreacting to packet loss, BBR provides 11 percent higher mean-time-between-rebuffers”[2]. BBR measures the network delivery rate and RTT after each ACK, building an explicit network model that includes the maximum bandwidth and the minimum RTT. Based on this model, BBR knows how fast to send data and how much data it can send over a link [3].
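To make the role of RTT in such a model concrete, here is a minimal Python sketch of a path model that tracks the maximum delivery rate and the minimum RTT and derives the bandwidth-delay product from them. It is a simplification for illustration, not Google's implementation: the class name `PathModel` and its methods are hypothetical, and real BBR uses windowed filters that let old measurements expire, which this sketch omits.

```python
class PathModel:
    """Simplified path model: maximum delivery rate and minimum RTT.

    Hypothetical illustration only. Real BBR keeps these values in
    time-windowed filters; here they never expire.
    """

    def __init__(self) -> None:
        self.max_bw_bps = 0.0             # highest delivery rate seen, bits/s
        self.min_rtt_s = float("inf")     # lowest RTT seen, seconds

    def on_ack(self, delivered_bytes: int, interval_s: float, rtt_s: float) -> None:
        """Update the model from one ACK's delivery-rate and RTT sample."""
        rate_bps = delivered_bytes * 8 / interval_s
        self.max_bw_bps = max(self.max_bw_bps, rate_bps)
        self.min_rtt_s = min(self.min_rtt_s, rtt_s)

    def bdp_bytes(self) -> float:
        """Bandwidth-delay product: data that fits 'in flight' on the path."""
        return self.max_bw_bps / 8 * self.min_rtt_s


model = PathModel()
model.on_ack(delivered_bytes=125_000, interval_s=0.1, rtt_s=0.050)
model.on_ack(delivered_bytes=100_000, interval_s=0.1, rtt_s=0.040)
print(f"pace at {model.max_bw_bps:.0f} bit/s, keep about {model.bdp_bytes():.0f} bytes in flight")
```

The maximum bandwidth answers "how fast to send", and the product of maximum bandwidth and minimum RTT answers "how much data can be in flight". A lower minimum RTT therefore directly shrinks the amount of data the sender keeps queued in the network.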
One-Way Delay (OWD) is the time a packet takes to travel from point A to point B across the network. RTT and OWD can be measured actively, by injecting additional test traffic into the network alongside the traffic of a network service and measuring that test traffic. The best-known utility for measuring RTT is ping. The RTT used by a congestion control algorithm has an impact on service response time, and it depends on OWD. Therefore, OWD can be used as an indicator of network issues and load. This article introduces passive OWD measurement based on flow data. It is based on the excellent thesis “One-Way Delay (OWD) Measurement based on Flow Data in Large Enterprise Networks”, written by Jochen Andreas Kögel [4].
The passive approach to OWD measurement based on NetFlow has several advantages. First of all, NetFlow has been widely implemented by network vendors and can be easily enabled. It is also very likely that NetFlow is already configured in your network for accounting and reporting purposes, so you do not need to change the existing configuration. Therefore, no investment is required, compared to active measurement, which needs software or hardware probes. In contrast to active measurement, we do not need to pre-select a path between two points where probes are inserted. Instead, OWD samples are calculated from NetFlow data between any network devices. However, in order to calculate OWD from NetFlow data, traffic must be present on the path.
There are several requirements that we need to take into account. First of all, the measurement results should be delivered as soon as possible to the upper-layer application that consumes the OWD measurements. This allows IT staff to respond quickly to problems that cause performance degradation.
The measurement samples must be ordered by the timestamps for which they report the OWD value. Once a measurement sample with a certain timestamp has been delivered to the application, no sample with an earlier timestamp may be delivered to the application afterwards.
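One way to satisfy this ordering requirement is a small reorder buffer. The Python sketch below is an assumption for illustration, not the design from the thesis: the class `OrderedDelivery` and its `hold_s` parameter are hypothetical. Samples wait in a min-heap and are released only once a sufficiently newer timestamp has been seen, and a sample that arrives after a later one was already delivered is dropped.

```python
import heapq


class OrderedDelivery:
    """Release OWD samples in non-decreasing timestamp order.

    Hypothetical sketch: samples are held for `hold_s` seconds of
    measurement time so that moderately late samples can still be sorted in.
    """

    def __init__(self, hold_s: float) -> None:
        self.hold_s = hold_s
        self._heap: list[tuple[float, float]] = []   # (timestamp, owd)
        self._newest = float("-inf")                 # newest timestamp seen
        self._last_delivered = float("-inf")         # newest timestamp released

    def push(self, ts: float, owd: float) -> list[tuple[float, float]]:
        """Add one sample; return the samples that are now safe to deliver."""
        if ts < self._last_delivered:
            return []  # too late: delivering it would violate the ordering
        heapq.heappush(self._heap, (ts, owd))
        self._newest = max(self._newest, ts)
        ready = []
        # Release everything older than the hold window, oldest first.
        while self._heap and self._heap[0][0] <= self._newest - self.hold_s:
            ready.append(heapq.heappop(self._heap))
        if ready:
            self._last_delivered = ready[-1][0]
        return ready
```

The hold time is a trade-off against the previous requirement: a longer hold tolerates more reordering between exporters but delays delivery to the application.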
Flow data-based OWD measurement should provide as many OWD samples as possible. A high number of samples allows for better granularity. It also lets us compute accurate mean values over a given time interval, so that the impact of random errors is reduced and confidence is increased.
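As an illustration of averaging over time intervals, the following Python sketch groups OWD samples into fixed-length intervals and reports the mean and the sample count for each. The function name `interval_means` and the `(timestamp, owd)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def interval_means(samples, interval_s):
    """Group (timestamp, owd) samples into fixed intervals.

    Returns {interval_start: (mean_owd, sample_count)}. The count is kept
    because a mean built from more samples is less affected by random error.
    """
    buckets = defaultdict(list)
    for ts, owd in samples:
        buckets[int(ts // interval_s)].append(owd)
    return {
        index * interval_s: (mean(values), len(values))
        for index, values in sorted(buckets.items())
    }


samples = [(0.5, 0.020), (3.0, 0.022), (61.0, 0.030)]
for start, (avg, count) in interval_means(samples, 60).items():
    print(f"interval starting at {start} s: mean OWD {avg * 1000:.1f} ms from {count} samples")
```

Shorter intervals give finer granularity but fewer samples per mean, which is why a high sample rate matters.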
The design must take into account that resources such as CPU and RAM are limited, so that OWD samples can still be obtained from flows under the highest load, when the flow record rate peaks. The flow record rate depends on the traffic and on the flow capturing devices.
Extracting OWD from flow records works if at least two Observation Points (OPs) create flow records for the same flow. To perform passive OWD measurement, timestamps of the same packet at different network locations have to be gathered. Performing passive OWD measurement for all transmitted packets is not feasible due to the high load. Hence, only a fraction of the packets is selected (sampled) for measurement.
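The basic idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the extraction procedure from the thesis: the types `FlowKey` and `FlowRecord` and the function `owd_samples` are hypothetical, each record is assumed to carry the timestamps of the first and last packet of the flow, both OPs are assumed to have synchronized clocks, and equal packet counts are used as a simple check that both records describe the same packets.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying a flow (hypothetical record layout)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass
class FlowRecord:
    op: str           # Observation Point that exported the record
    key: FlowKey
    first_s: float    # timestamp of the first packet, seconds
    last_s: float     # timestamp of the last packet, seconds
    packets: int


def owd_samples(upstream, downstream):
    """Match records of the same flow from two OPs and derive OWD samples.

    Returns a list of (timestamp_at_upstream_op, owd_seconds).
    """
    by_key = {record.key: record for record in upstream}
    samples = []
    for down in downstream:
        up = by_key.get(down.key)
        # Skip flows seen at only one OP or records covering different packets.
        if up is None or up.packets != down.packets:
            continue
        samples.append((up.first_s, down.first_s - up.first_s))
        samples.append((up.last_s, down.last_s - up.last_s))
    return samples
```

Each matched pair of records yields at most two samples, one from the first packet and one from the last. A flow that is split across several records at one OP would need extra handling that this sketch leaves out.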
Kögel has developed an efficient processing approach that takes into account network and flow capturing effects. In order to provide reliable measurements, errors must be detected, corrected and/or quantified. For this purpose, he creates an exporter profile that describes these errors and improves the online accuracy and efficiency of the extraction of OWD from flow records. In the first (offline) phase, exporter profile values are created for each NetFlow exporter based on a large volume of reference data, collected during a typical working day. In the second (online) phase, flows are received from the exporters and the profiles are used as an input for OWD calculation. Thanks to the profile values created during the offline phase, OWD values are efficiently computed while detecting, compensating and quantifying online errors.
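The two-phase idea can be illustrated with a deliberately simplified Python sketch. The profile contents here are assumptions made for this example and do not reproduce the profile parameters defined in the thesis: `ExporterProfile`, `build_profile`, `apply_profile`, and the `blunder_factor` value are all hypothetical. The offline phase estimates a systematic offset and a deviation threshold from reference errors, and the online phase subtracts the offset and discards samples that deviate too far from the median.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ExporterProfile:
    """Hypothetical per-exporter profile built in the offline phase."""
    offset_s: float    # systematic timestamp error to compensate
    max_dev_s: float   # deviation beyond which a sample counts as a blunder


def build_profile(reference_errors, blunder_factor=5.0):
    """Offline phase: derive the profile from reference error values.

    `reference_errors` are (measured OWD - reference OWD) values in seconds.
    The threshold is a multiple of the median absolute deviation, which is
    an assumption of this sketch.
    """
    offset = median(reference_errors)
    mad = median(abs(error - offset) for error in reference_errors)
    return ExporterProfile(offset_s=offset, max_dev_s=blunder_factor * mad)


def apply_profile(raw_owds, profile):
    """Online phase: compensate the systematic error, then filter blunders."""
    compensated = [owd - profile.offset_s for owd in raw_owds]
    centre = median(compensated)
    return [owd for owd in compensated if abs(owd - centre) <= profile.max_dev_s]


profile = build_profile([0.0020, 0.0021, 0.0019, 0.0020, 0.0022])
clean = apply_profile([0.0220, 0.0221, 0.0219, 0.1500], profile)
print(f"{len(clean)} of 4 samples kept after correction")
```

Because the expensive statistics are computed once offline, the online step reduces to a subtraction and a comparison per sample, which fits the requirement of limited CPU and RAM.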
The enterprise network depicted in Picture 1 was used for five days to evaluate the accuracy of flow-based OWD measurement with profile support. NetFlow v5 records are exported from routers located in France and Australia and are sent to the collector in Germany. For each exporter, profile parameters are obtained during an offline phase to improve the accuracy of the method. A blunder filter, systematic error compensation, and other correction methods have been applied during the online phase.
The accuracy of flow-based OWD measurement has been evaluated against active TCP-based RTT measurement. During these measurements, the measurement machine sends one burst of 1000 TCP ACK packets to each ping peer every 20 minutes, at a rate of 10 packets per second. The ping peers in France and Australia respond with TCP RST messages. The samples from the active and the flow-based measurements result from the same packets flowing through the network, so they can be compared directly. According to Kögel, the profile-based correction leads to highly accurate OWD samples, which show only the random error that is known from the profile parameters.
Picture 1: Network Topology for Evaluating the Accuracy of Passive Flow-based OWD Measurement
Source: http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Kl_Diss_40189.pdf
Picture 2: Path Germany-France: Comparison of active RTT and flow-based OWD
Source: http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Kl_Diss_40189.pdf
Picture 3: Path Germany-Australia: Comparison of active RTT and flow-based OWD
Source: http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Kl_Diss_40189.pdf
The results show that flow-based OWD measurement is feasible, but that it requires profile support for error correction and for dimensioning the online processing chain. This is especially true for systematic errors, sporadic large errors (blunders), and errors caused by system clock skew.
With profile support, flow-based OWD measurement can be used as an addition to, or a replacement for, the active OWD measurements that are performed today.