The end-to-end latency between two systems depends on four factors:
In other words: in most situations, the dominant factor that determines latency is the physical distance that packets have to travel. Obviously, not much can be done about the actual distance between the two systems that communicate, but sometimes packets are routed over much longer paths than necessary.
We now know where latency comes from. Let’s look at its effects.
The obvious effect of high latency is that it takes longer for a network request to be handled. In theory, if the RTT is 10 ms, it takes 5 ms for a request to flow from the client system to a server, and then 5 ms for the requested data to be returned. In practice, it’s longer, because most protocols need to send multiple packets back and forth before data can start to flow. For instance, TCP uses a three-way handshake: the client sends the first packet to the server. The server sends back an acknowledgment, and then a third packet from the client to the server is the first that can carry data in the form of (for instance) an HTTP request. The fourth packet in the exchange is then the first that can deliver the requested data to the client. With an RTT of 10 ms, this takes 20 ms, but if the RTT is 100 ms, all of this takes 200 ms. This is well above the value suggested by the rule of thumb that users start noticing latency of applications at 100 ms.
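The four-packet exchange described above can be laid out as a simple timeline. This is a minimal sketch that assumes a symmetric path (each direction takes half the RTT) and zero processing time on both ends; the function name `handshake_timeline` is made up for illustration.

```python
def handshake_timeline(rtt_ms: float) -> list[tuple[float, str]]:
    """Return (arrival time in ms, description) for each packet in the exchange.

    Assumptions: each direction takes rtt_ms / 2 and hosts reply instantly.
    """
    one_way_ms = rtt_ms / 2
    packets = [
        "client -> server: SYN",
        "server -> client: SYN-ACK",
        "client -> server: ACK carrying the HTTP request",
        "server -> client: first packet of the response",
    ]
    # Packet number n arrives after n one-way trips.
    return [((n + 1) * one_way_ms, text) for n, text in enumerate(packets)]


for rtt in (10, 100):
    print(f"RTT {rtt} ms")
    for arrival_ms, text in handshake_timeline(rtt):
        print(f"  {arrival_ms:6.1f} ms  {text}")
```

The last entry of each timeline is the moment the client first sees requested data, which is two full round trips after it sent the first packet.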
Also, before a TCP session can be opened, an application will almost always need to look up the destination server’s IP address in the Domain Name System (DNS). That adds at least one more round trip, so it’s important to avoid using a DNS server that is far away and has a high RTT. Some server software first does a reverse DNS lookup for the client’s IP address before it accepts a connection or processes a request. This can add significant delays for users, especially ones far away, while testers who come from previously seen IP addresses won’t see such a delay. Make sure that such reverse lookups, if needed for logging purposes, happen asynchronously and don’t block handling connections and requests.
Another source of application delays is the negotiation of security parameters. Setting up TLS/SSL takes several more round trips as encryption and hashing algorithms are negotiated, the identity of the server is verified by the client and a session encryption key is determined. Again, because developers and testers are usually relatively close to a system, the impact of the additional round trips is usually not obvious to them. But users far away will see a noticeable delay as a dozen or so packets need to go back and forth before actual data can be transferred.
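To get a feel for how these setup steps add up, the round trips can simply be summed. The numbers in this sketch are assumptions rather than measurements: one uncached DNS lookup, one round trip for the TCP handshake, two for the TLS negotiation (as in a full TLS 1.2 handshake; TLS 1.3 needs one), and one for the request itself. Processing time on either end is ignored.

```python
def time_to_first_data_ms(
    server_rtt_ms: float,
    dns_rtt_ms: float,
    tls_round_trips: int = 2,  # assumption: full TLS 1.2 handshake
) -> float:
    """Rough estimate of the delay before the first response data arrives."""
    dns = dns_rtt_ms                        # one uncached lookup (assumption)
    tcp_handshake = server_rtt_ms           # SYN out, SYN-ACK back
    tls = tls_round_trips * server_rtt_ms   # security parameter negotiation
    request = server_rtt_ms                 # request out, first data back
    return dns + tcp_handshake + tls + request


# A tester close to the server versus a user far away, both with a nearby
# DNS resolver at 10 ms.
print("nearby:", time_to_first_data_ms(server_rtt_ms=10, dns_rtt_ms=10), "ms")
print("far:   ", time_to_first_data_ms(server_rtt_ms=100, dns_rtt_ms=10), "ms")
```

Under these assumptions the setup cost grows from a few tens of milliseconds to several hundred as the RTT goes from 10 ms to 100 ms, which is why the delay is easy to miss when testing close to the server.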
However, the delays aren’t over once the first data packet starts to flow.
A very simple way to communicate over a network is to send a data packet, wait for the other side to acknowledge that the packet has been received, and then send the next packet. This works well over short distances, but it quickly becomes unworkable over longer distances. Suppose the RTT is 50 ms, because the distance between the sender and receiver is about 5000 km (3000 miles). After sending a packet, the sender then waits 50 ms before receiving an acknowledgment and sending the next packet. So the transmission rate is one packet per RTT. This allows for just 20 packets per second across a continent or an ocean, which is only 30 kilobytes per second with standard 1500-byte Ethernet packets!
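The arithmetic for this send-and-wait scheme is short enough to write out. The sketch below assumes exactly one packet per round trip and ignores the time needed to put the packet on the wire; the function name is made up for illustration.

```python
def stop_and_wait_rate(rtt_ms: float, packet_bytes: int = 1500) -> tuple[float, float]:
    """Return (packets per second, bytes per second) at one packet per RTT."""
    packets_per_second = 1000 / rtt_ms
    return packets_per_second, packets_per_second * packet_bytes


packets, bytes_per_second = stop_and_wait_rate(rtt_ms=50)
print(f"{packets:.0f} packets/s, {bytes_per_second / 1000:.0f} kilobytes/s")
```

Note that the available bandwidth does not appear anywhere in the calculation: with this scheme the RTT alone caps the transfer rate.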
So TCP’s approach is to try and figure out how many packets need to be “in flight” in order to fully utilize the available bandwidth. Suppose the available bandwidth is 100 Mbps. That’s about 8300 1500-byte packets per second, or 415 packets per 50 ms RTT. Thus, TCP sends 415 packets. Then it waits for the first packet to be acknowledged to send the 416th, and so on, keeping a window of 415 packets in flight. In practice, TCP is continuously probing the available bandwidth, so it will inject more packets into the network than the available bandwidth can handle. Then at some point, one or more packets are lost, and TCP reduces its window size and, as a result, slows down its transmission rate.
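The window needed to fill a link is its bandwidth multiplied by the RTT, expressed in packets. A small sketch of that calculation follows; the function name is made up, and header overhead is ignored, so every packet is treated as 1500 bytes of payload.

```python
def packets_in_flight(bandwidth_bps: float, rtt_ms: float, packet_bytes: int = 1500) -> float:
    """Number of packets that must be unacknowledged to keep the link full."""
    packets_per_second = bandwidth_bps / (packet_bytes * 8)
    return packets_per_second * rtt_ms / 1000


window = packets_in_flight(bandwidth_bps=100e6, rtt_ms=50)
print(f"about {window:.0f} packets in flight")
```

Without rounding the intermediate packet rate, the result comes out slightly above the 415 used in the text, which rounds 8333 packets per second down to 8300.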
In order to avoid massively overloading the network at the beginning of a session, TCP uses an algorithm called slow start. The name of the algorithm isn’t particularly well-chosen, as TCP doubles its window and thus its transmission rate every round trip. So it ramps up its transmission rate pretty quickly. The trouble is, it starts with a window size of only a few packets. So it takes about five round trips for the window to reach the 415 packets needed to fill up a 100 Mbps connection with a 50 ms RTT. So that’s 250 ms before TCP reaches full speed. This gets worse fast as RTT increases: with a 100 ms RTT, the window has to be twice as large, so it now takes six round trips, and each round trip takes twice as long, so it’s 600 ms before TCP is running at full speed.
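The doubling can be simulated in a few lines. The result depends on the initial window, which the text only describes as "a few packets", so the values tried below are assumptions: 10 packets is a common default in current systems, and 16 is included only because it happens to reproduce the five and six round trips quoted above. The sketch also assumes no packet loss and a clean doubling every round trip.

```python
def slow_start_round_trips(target_window: int, initial_window: int) -> int:
    """Count round trips until a doubling window reaches target_window."""
    window = initial_window
    round_trips = 0
    while window < target_window:
        window *= 2          # slow start: the window doubles each round trip
        round_trips += 1
    return round_trips


def ramp_up_time_ms(target_window: int, initial_window: int, rtt_ms: float) -> float:
    """Time spent in slow start before the window reaches target_window."""
    return slow_start_round_trips(target_window, initial_window) * rtt_ms


# Target windows from the text: 415 packets at 50 ms, twice that at 100 ms.
for initial in (4, 10, 16):  # assumed initial windows, in packets
    fast = ramp_up_time_ms(415, initial, rtt_ms=50)
    slow = ramp_up_time_ms(830, initial, rtt_ms=100)
    print(f"initial window {initial:2d}: {fast:.0f} ms at 50 ms RTT, {slow:.0f} ms at 100 ms RTT")
```

Whatever the initial window, doubling the RTT more than doubles the ramp-up time, because the target window grows along with the length of each round trip.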
The moral of the story: for any kind of interactive network application, it pays to keep latency as low as possible. Lower latency lets the data start flowing sooner and lets TCP reach its full speed faster. High latency also tends to make the problems caused by packet loss worse, and vice versa.