Understanding the error messages that we, as system administrators, face in our daily routine is key to quickly pinpointing and troubleshooting issues and bottlenecks. Among the most frequent errors on modern distributed systems are the “request timed out” and “destination host unreachable” messages.
Errors like these may ruin anyone’s day. Actually, they are signs of a diverse range of issues that will surely require attention and immediate action.
In this tutorial, we’ll dig a little bit into these errors, and talk about some tools to help troubleshoot them.
The Internet Control Message Protocol (ICMP) was designed alongside the venerable IP protocol (RFC 791) and is defined in RFC 792, which dates back to 1981. Its goal is to provide diagnostic and control messages for IP networks.
Like the other major internet protocols, such as TCP and UDP, it runs over IP. Although some of its original functions are now deprecated, it is still widely used. Many of its control messages signal connectivity issues in the network or its gateways. The more common ICMP control messages associated with those issues are:
The “Destination Unreachable” control message, including its subclass “Destination Host Unreachable”, occurs when the user host or its gateways can’t find a path to reach the destination.
This usually happens because no suitable route is available from the user to the destination. Although less common, certain network and firewall configurations or administrative actions can also generate this kind of message.
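As a quick reference, the message types discussed in this tutorial can be sketched as a small lookup. The type and code numbers are the ones listed in the IANA ICMP parameters registry. The names `ICMP_TYPES`, `DEST_UNREACHABLE_CODES`, and `describe_icmp` are hypothetical helpers chosen for illustration, and only a subset of the codes is shown.

```python
# ICMP message types mentioned in this tutorial (IANA-assigned numbers).
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo Request",
    11: "Time Exceeded",
    30: "Traceroute (deprecated)",
}

# A subset of the codes carried by type 3 (Destination Unreachable).
DEST_UNREACHABLE_CODES = {
    0: "Destination Network Unreachable",
    1: "Destination Host Unreachable",
    2: "Destination Protocol Unreachable",
    3: "Destination Port Unreachable",
    4: "Fragmentation Needed and DF Set",
    13: "Communication Administratively Prohibited",
}


def describe_icmp(icmp_type: int, code: int = 0) -> str:
    """Return a human-readable name for an ICMP type/code pair."""
    name = ICMP_TYPES.get(icmp_type, f"Unknown type {icmp_type}")
    if icmp_type == 3:
        detail = DEST_UNREACHABLE_CODES.get(code, f"code {code}")
        return f"{name}: {detail}"
    return name


print(describe_icmp(3, 1))   # the "Destination Host Unreachable" subclass
print(describe_icmp(3, 13))  # the firewall / administrative case
```

Code 13 corresponds to the firewall and administrative cases mentioned above, while code 1 is the routing-related "host unreachable" case.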
On the other hand, the “Request Timeout” is associated with the inability of the client software to receive an answer (to an echo request control message, for instance) within a specific amount of time. It is not an actual ICMP control message.
The client generates it locally when no answer at all arrives before the deadline. Many factors can cause this: congestion in the network, the host being down or unresponsive, or packet loss.
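The point that a timeout is produced by the client itself, and not received from the network, can be illustrated with a minimal sketch. It deliberately uses a UDP socket on the loopback interface instead of ICMP (raw ICMP sockets usually need elevated privileges), and the function name `probe_silent_peer` is a hypothetical choice for this example.

```python
import socket


def probe_silent_peer(timeout_s: float = 0.2) -> str:
    """Send one datagram to a local peer that never answers and report the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as peer, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        # The "silent" peer is bound to a loopback port but never reads or replies.
        peer.bind(("127.0.0.1", 0))
        client.settimeout(timeout_s)
        client.sendto(b"echo?", peer.getsockname())
        try:
            client.recvfrom(1024)
            return "reply received"
        except socket.timeout:
            # Nothing came back from the network: the client's own timer expired.
            return "request timed out"


print(probe_silent_peer())
```

No error packet ever reaches the client here. The message exists only because the client stopped waiting, which is why a timeout alone cannot tell us whether the cause is congestion, a host that is down, or packet loss.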
In contrast, the “Time Exceeded” ICMP control message is related not to time but to distance. Whenever the network stack creates a new IP packet, it sets a “Time to Live” field, also known as TTL. The IPv6 specification renamed this field to “Hop Limit” while retaining its functionality.
While the packet travels through the network, its TTL value is decreased by one at each hop. As soon as it reaches 0 (zero), the router discards the packet and sends a “Time Exceeded” control message back to the origin host, informing it that the path is longer than the packet’s TTL allows.
This behavior is shown in the image: the client sends an IP packet with a starting TTL of 10 toward a destination server, and each hop in the path decreases the TTL until the packet reaches its destination:
The maximum possible TTL (or Hop Limit in IPv6) is 255, meaning that this is the largest hop distance allowed for a single point-to-point Internet conversation. It is possible to extend this distance with proxies along the route that disassemble the packets and assemble new ones, although this behavior is typical of man-in-the-middle attacks.
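The TTL mechanics described above can be sketched as a toy simulation. This is a simplified model, not a network stack: the function name `forward` and the convention that the destination sits at hop `hops_to_destination` (so the routers are hops 1 to `hops_to_destination - 1`) are assumptions made for this example.

```python
def forward(ttl: int, hops_to_destination: int) -> str:
    """Simulate one packet crossing routers, each of which decrements the TTL."""
    if not 0 < ttl <= 255:
        # The TTL is an 8-bit field, so 255 is the largest possible value.
        raise ValueError("TTL must be in the range 1..255")
    # Only the intermediate routers decrement the TTL; the last hop is the destination.
    for router in range(1, hops_to_destination):
        ttl -= 1
        if ttl == 0:
            # The router drops the packet and reports back to the origin host.
            return f"Time Exceeded sent by router {router}"
    return f"delivered with TTL {ttl}"


print(forward(ttl=10, hops_to_destination=4))  # delivered with TTL 7
print(forward(ttl=3, hops_to_destination=4))   # Time Exceeded sent by router 3
```

With a starting TTL of 10 and three routers in the path, the packet arrives with a TTL of 7. With a starting TTL of 3, the third router decrements it to zero and answers with “Time Exceeded” instead of forwarding it.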
We can note in the table that the Traceroute control message is now deprecated. This may sound strange to administrators who use the traceroute or tracert commands on a daily basis to diagnose network connectivity issues.
The fact is that its functionality had become redundant. In practice, traceroute is implemented by sending successive ICMP echo requests, TCP SYN, or UDP probe packets while progressively increasing the TTL of the sent packets, from 1 up to the number of hops needed to reach the target.
When a packet can’t reach its destination, or its TTL drops to zero, an ICMP control message is sent back. That way, the client software records the IP addresses of the hosts that sent the corresponding ICMP control messages.
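The probing loop described above can be sketched as a small simulation. It sends no real packets: the path is a hypothetical list of addresses taken from the documentation ranges, and `simulate_traceroute` is an illustrative name, not part of any traceroute implementation.

```python
def simulate_traceroute(path, max_hops=30):
    """Return (ttl, responder, message) for each probe sent along a known path.

    `path` lists every hop in order; its last entry is the destination.
    """
    discovered = []
    for ttl in range(1, min(max_hops, len(path)) + 1):
        # A probe sent with TTL n is answered by the n-th hop on the path.
        responder = path[ttl - 1]
        if ttl == len(path):
            discovered.append((ttl, responder, "destination reached"))
        else:
            discovered.append((ttl, responder, "Time Exceeded"))
    return discovered


# Hypothetical three-hop path: two routers, then the target.
for hop in simulate_traceroute(["192.0.2.1", "198.51.100.7", "203.0.113.9"]):
    print(hop)
```

Each increase of the TTL reveals exactly one more hop, which is how the tool rebuilds the route without any dedicated Traceroute control message.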
When faced with errors like “request timed out” or “destination host unreachable”, the first thing we must do is try to pinpoint the underlying cause.
Any operating system that implements the IP stack typically ships with a set of troubleshooting utilities at the system administrator’s disposal. We’ll take a look at the more common ones.
The first network diagnosis command that comes to mind is ping. This utility is almost as old as the IP protocol itself. It works by sending ICMP echo requests and measuring the time needed to receive the corresponding echo reply control message.
If the reply does not arrive within a specific time, it issues “Request timed out” messages as a result.
For instance, we can use it to check whether there is connectivity between two hosts and to measure latency (ping reports the round-trip time, or RTT) and packet loss:
# ping -c 10 18.104.22.168
PING 22.214.171.124 (126.96.36.199) 56(84) bytes of data.
64 bytes from 188.8.131.52: icmp_seq=1 ttl=53 time=31.8 ms
64 bytes from 184.108.40.206: icmp_seq=2 ttl=53 time=31.7 ms
64 bytes from 220.127.116.11: icmp_seq=3 ttl=53 time=31.2 ms
64 bytes from 18.104.22.168: icmp_seq=4 ttl=53 time=30.2 ms
64 bytes from 22.214.171.124: icmp_seq=5 ttl=53 time=31.1 ms
64 bytes from 126.96.36.199: icmp_seq=6 ttl=53 time=79.1 ms
64 bytes from 188.8.131.52: icmp_seq=7 ttl=53 time=30.6 ms
64 bytes from 184.108.40.206: icmp_seq=8 ttl=53 time=319 ms
64 bytes from 220.127.116.11: icmp_seq=9 ttl=53 time=31.5 ms
64 bytes from 18.104.22.168: icmp_seq=10 ttl=53 time=29.9 ms
--- 22.214.171.124 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9015ms
rtt min/avg/max/mdev = 29.904/64.621/319.028/86.009 ms
The -c parameter specifies how many probe packets to send. As we can see, while probing the target with 10 packets, we had 0% packet loss and an average round-trip time of about 64.6 milliseconds, with a mean deviation (mdev) of 86 milliseconds.
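When we need these numbers in a script, the statistics lines can be extracted with a couple of regular expressions. The sketch below assumes the Linux-style summary format shown above. `parse_ping_summary` is a hypothetical helper, and it recomputes the loss percentage from the packet counts instead of trusting the printed value.

```python
import re

# Patterns for the two summary lines printed at the end of a ping run.
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")
RTT_RE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(output: str) -> dict:
    """Extract packet counts, loss percentage, and RTT statistics from ping output."""
    counts = COUNTS_RE.search(output)
    if counts is None:
        raise ValueError("no ping statistics found in output")
    sent, received = int(counts.group(1)), int(counts.group(2))
    stats = {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        "rtt_ms": None,  # stays None when no reply came back at all
    }
    rtt = RTT_RE.search(output)
    if rtt is not None:
        keys = ("min", "avg", "max", "mdev")
        stats["rtt_ms"] = dict(zip(keys, (float(v) for v in rtt.groups())))
    return stats


sample = (
    "10 packets transmitted, 10 received, 0% packet loss, time 9015ms\n"
    "rtt min/avg/max/mdev = 29.904/64.621/319.028/86.009 ms\n"
)
print(parse_ping_summary(sample))
```

A large gap between the average and the minimum, as in this run (64.621 ms against 29.904 ms), usually points to a few slow outliers such as the 319 ms reply, which is also why the mdev value is so high.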
The current version of the ping command has a large number of options. We can see them all using the command ping -h.
For the purpose of estimating latency and data loss, sometimes we need a large number of packets, and bigger payloads, as in:
# sudo ping -f -c 579 -s 1460 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 1460(1488) bytes of data.
............
--- 10.1.1.1 ping statistics ---
579 packets transmitted, 567 received, 2.07254% packet loss, time 9249ms
rtt min/avg/max/mdev = 148.030/154.707/408.936/25.042 ms, pipe 14, ipg/ewma 16.000/149.251 ms
With the -f (flood, i.e., the fastest packet rate possible) and -s (packet payload size) options set, the command generates a much higher load. The -f option is restricted to the superuser. We should never use -f to probe other parties’ systems, as it produces spurious traffic that may be interpreted as malicious.
On a side note, the -f and -s options have a history of being abused in DoS (Denial of Service) attacks. A famous example is the Ping of Death, which targeted a packet fragmentation bug: a multimillion-dollar machine could be brought down by a single command as simple as:
# ping -s 65535 <host IP>
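The arithmetic behind this command, and behind the 56(84) and 1460(1488) figures in the ping outputs above, can be checked with a short sketch. It assumes an IPv4 header without options (20 bytes) plus the 8-byte ICMP header. The constant and function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20    # assuming no IP options
ICMP_HEADER_BYTES = 8
IPV4_MAX_PACKET = 65535   # the IPv4 total-length field is 16 bits wide


def ipv4_packet_size(payload_bytes: int) -> int:
    """Total size of an ICMP echo request carrying the given payload over IPv4."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def exceeds_ipv4_limit(payload_bytes: int) -> bool:
    """True when the reassembled packet would be larger than IPv4 allows."""
    return ipv4_packet_size(payload_bytes) > IPV4_MAX_PACKET


for payload in (56, 1460, 65535):
    print(payload, ipv4_packet_size(payload), exceeds_ipv4_limit(payload))
```

A 65535-byte payload yields a 65563-byte packet once the headers are added, which is 28 bytes more than the largest legal IPv4 packet. It can only travel as fragments, and the overflow appears when a vulnerable host reassembles them.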
Even though these attacks have few vulnerable targets nowadays, ICMP control messages are extensively used by prospective attackers to map networks and potential targets.
That’s why it is usual to use firewalls to control what ICMP control messages can pass to the internal network elements and to what destinations.
The best practice is to allow incoming ICMP traffic only to the handful of internal hosts needed for basic troubleshooting, and to restrict outbound ICMP traffic to echo request and reply control messages.
If we see messages like “destination host unreachable”, we know that there is some issue with route discovery to that host. To pinpoint where it occurs, we can use the traceroute command.
This command traces the hops between two hosts.
We can select the protocol used to probe (ICMP, UDP, or TCP), set the number and timing of the probes, and hint at which routers, gateways, or interfaces to use. We can also try to probe the MTU (Maximum Transmission Unit, i.e., the largest packet that can be sent without fragmentation) along the route, or guess the reverse route from the target back to our own host.
For instance, let’s map the route from our host to Google DNS:
# traceroute -n 188.8.131.52
traceroute to 184.108.40.206 (220.127.116.11), 30 hops max, 60 byte packets
 1  10.0.2.1  4.162 ms  3.856 ms  3.705 ms
 2  18.104.22.168  9.591 ms  9.481 ms  9.371 ms
 3  100.120.69.45  9.265 ms 100.120.69.47  9.155 ms  9.991 ms
 4  22.214.171.124  10.101 ms 100.120.22.7  10.107 ms 100.120.22.247  9.663 ms
 5  100.120.24.241  33.581 ms 100.120.22.212  27.239 ms  27.133 ms
 6  100.120.20.240  30.848 ms 100.120.20.246  28.394 ms 100.120.25.80  23.335 ms
 7  * 126.96.36.199  30.368 ms *
 8  * * *
 9  188.8.131.52  28.958 ms  32.079 ms  31.915 ms
The -n option tells traceroute not to resolve hostnames. The command’s output shows 9 hops from the local machine to the Google DNS server, and there are some very interesting things to note:
For a complete list of the extensive number of parameters, we can issue a traceroute --help command.
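To make a traceroute listing easier to scan, each hop line can be reduced to the set of routers that answered and the number of probes that went unanswered. The sketch below assumes numeric output (the -n option) with IPv4 addresses. `parse_hop` is a hypothetical helper, not part of traceroute.

```python
import re

# Dotted-quad IPv4 address; RTT values such as "9.265" have too few parts to match.
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def parse_hop(line: str) -> dict:
    """Summarize one hop line of `traceroute -n` output."""
    tokens = line.split()
    responders, timeouts = [], 0
    for token in tokens[1:]:
        if token == "*":
            timeouts += 1  # a probe that received no answer in time
        elif IPV4_RE.match(token) and token not in responders:
            responders.append(token)
    return {"hop": int(tokens[0]), "responders": responders, "timeouts": timeouts}


print(parse_hop(" 3  100.120.69.45  9.265 ms 100.120.69.47  9.155 ms  9.991 ms"))
print(parse_hop(" 8  * * *"))
```

A hop with several responders, like hop 3, suggests that probes were balanced across more than one router, while a hop where every probe times out, like hop 8, may simply be a router that does not answer probes, since the route continues at hop 9.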
In this tutorial, we briefly talked about common errors like “Destination Host Unreachable” and “Request Timeout”. Furthermore, we got a better understanding of how they relate to each other, and some of the tools we use to analyze them.
However, there are several other network analysis tools that can enhance our understanding of network issues:
Sometimes, we need to go deeper and analyze the traffic itself, reviewing its content with tools that show the anatomy of network packets: headers and payload. For that, we can use tcpdump or a graphical packet analyzer like Wireshark.
Other times, we need to gather more information about the network itself and its hosts. So, we can use Nmap, a tool that can do a very detailed analysis of the hosts in a network, and detect what services they run, among many other uses.