Visualize Your Network

Chris Binnie

If the first item on a sys admin’s to-do list is building a Linux server, then the second item is surely monitoring that server. If you’ve been a sys admin for a while, you’ve almost definitely been exposed to the multitude of SNMP-based monitoring tools available on Linux systems. SNMP (the Simple Network Management Protocol) is an excellent method for gathering changing data on a frequent basis with negligible additional load on either the collector or the distributor of the statistics. However, SNMP tools only let you view historical data, so be warned that they do not show you what’s happening on your systems and networks right now. In this article, I’ll look at monitoring even the busiest of networks in near real time without having to squint at screeds of packet-sniffing data scrolling up your screen so quickly you can’t read it.

Popular for a Good Reason

Once you are used to the configuration file format and have figured out how to set up SNMP communities on a remote (or local) device, MRTG (http://oss.oetiker.ch/mrtg/) is actually pretty straightforward to use. Couple that with the intuitive tool that’s bundled with MRTG, cfgmaker (http://oss.oetiker.ch/mrtg/doc/cfgmaker.en.html), and you should find that setting up MRTG to run on networks with a significant number of devices isn’t too laborious at all. As acronyms go, it’s fairly unusual, so in case you’re curious, MRTG stands for “The Multi-Router Traffic Grapher.” One of the sample graphs for MRTG is shown in Figure 1.

Figure 1: MRTG graph. Inbound traffic is green, and outbound traffic is shown in blue.

These days, MRTG and RRDtool (http://oss.oetiker.ch/rrdtool) are mentioned almost in the same breath. Both of these truly fantastic tools, which would proudly complement any sys admin’s toolbox, are available because of a lot of hard work by the talented Tobias Oetiker, an IT specialist by day in Switzerland. On the MRTG and RRDtool sites, he describes himself as the person responsible for the original design and as the “main developer” of both tools. Apparently, the main reason RRDtool was written, following the worldwide success of MRTG, was that MRTG needed a performance boost under certain circumstances.

High Performance

Collecting statistics by polling every five minutes or so soon creates an unmanageably large data set, but of course, with the use of Oetiker’s design, the data is dealt with very cleverly. MRTG’s plain text logfile contains an entry for each collected sample on a specific device. Through some smoke and mirrors, the flat text file doesn’t grow too large or grow out of control, yet it still manages to contain enough data to generate graphs for daily, weekly, monthly, and yearly statistics. One of my logfiles for a switch port that’s been checked for around five years is a minute 58KB. This methodology really appeals to my sense of simplicity: highly effective data basing with the use of tiny text files. Figure 2 demonstrates RRDtool’s graphing capabilities.

Figure 2: RRDtool graph.
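
The “smoke and mirrors” that keep the logfile small boil down to folding older samples into coarser averages. The following is only a minimal sketch of that general idea, not MRTG’s actual code; the function name consolidate and the group size of six (six five-minute samples becoming one 30-minute value) are assumptions chosen for illustration.

```shell
# consolidate N: read one sample per line and print the mean of every
# group of N samples (a trailing partial group is averaged as well).
# Hypothetical helper for illustration; not part of MRTG.
consolidate() {
    awk -v n="$1" '
        { sum += $1; count++ }
        count == n { printf "%.1f\n", sum / count; sum = 0; count = 0 }
        END { if (count > 0) printf "%.1f\n", sum / count }
    '
}

# Six made-up five-minute samples collapse into one 30-minute value.
printf '%s\n' 12 18 30 24 6 30 | consolidate 6    # prints 20.0
```

The older the data, the larger the group can be, which is why years of history can fit in a few kilobytes at the cost of detail.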

Newish Kid on the Block

Alongside RRDtool is the now popular Cacti (http://www.cacti.net), which, despite being a relative newcomer compared with MRTG, continues to grow in popularity. Cacti bolts onto the extremely powerful RRDtool engine, and it’s an understatement to say that you can graph all sorts of data. Using SNMP, you can write collecting functions that use php-snmp, ucd-snmp, or net-snmp, and all of the configuration is accessible via an intuitive web-based interface. With Cacti, you can graph disk space on different hard drive partitions; ping response times of a remote host in the cloud; count errors on a network interface; check spikes in RAM usage; monitor system load averages; and even keep an eye on your MySQL server’s resource usage. The possibilities are nearly boundless.

Networks Not Systems

Leaving systems aside and considering networks for a moment, Oetiker’s SmokePing (http://oss.oetiker.ch/smokeping), which he describes as “Best of breed latency visualisation,” is worth a mention. It produces interactive graphs you can drill down into and is produced to a very high standard, as with all of Oetiker’s contributions.

MRTG is almost always used to graph bandwidth samples collected tirelessly from network switches. The invaluable graphs make it extremely easy to keep an eye out for unusual traffic patterns (nefarious in intent or otherwise). In addition, with the handy MRTG Total Traffic Generator utility (http://freshmeat.net/projects/mrtgtraffgen), you can use the samples stored in MRTG’s flat logfiles to generate statistics about the number of gigabytes each switch port has shipped in a given time. This means you have access not only to a myriad of aesthetically pleasing graphs, but also to human-friendly statistics that might be better suited to the less technically minded. Figure 3 shows MRTG Total Traffic Generator with some slightly modified output. Although it hasn’t been maintained since 2002, with only some additional PHP security added to its simple scripts, it still ticks over nicely. Of course, because it’s written in PHP, you can easily alter the design to fit the brand of your site.

Figure 3: MRTG Total Traffic Generator output.
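
Turning averaged rates into “gigabytes shipped” is simple arithmetic: multiply each average rate by the length of the period it covers and add up the results. The sketch below shows only that arithmetic. The three-column input (seconds, average inbound bytes per second, average outbound bytes per second) is a hypothetical simplification rather than MRTG’s real logfile layout, and total_gb is an invented name.

```shell
# total_gb: each input line is "<seconds> <avg_in_bytes_per_s> <avg_out_bytes_per_s>".
# This layout is an assumption made for illustration only.
total_gb() {
    awk '
        { in_bytes += $1 * $2; out_bytes += $1 * $3 }
        END { printf "in=%.2fGB out=%.2fGB\n", in_bytes / 1000000000, out_bytes / 1000000000 }
    '
}

# Two made-up one-hour periods.
printf '%s\n' '3600 1000000 250000' '3600 500000 250000' | total_gb
# prints: in=5.40GB out=1.80GB
```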

Polling Frequency

The SNMP-based methods of keeping an eye on systems and networks are unquestionably excellent at providing historical graphing of traffic over the last five minutes (MRTG defaults to collecting five-minute samples). After a while, you can quickly spot when a graph flatlines, meaning a server isn’t generating inbound or outbound traffic and is probably down, or spikes irregularly, meaning something untoward might be going on. Those spikes could be anything from a full (i.e., not incremental) backup repeating every hour because of a configuration error to an attack of some type.

To pick up changes more frequently, you could alter the default five-minute update, of course, but when I contacted Oetiker a few years ago about his recommended polling frequency, the upshot was that five minutes is just about right in most cases.

Because MRTG works on average traffic over a set period of time, it can miss bandwidth spikes, so you could likely miss anywhere from 20% to 50% of the highest bandwidth spikes. How much is missed exactly depends on a few things, but you should not base your estimate of the next Internet connection’s capacity on graphs produced by samples collected over five-minute polling periods.
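
To see why an average hides a spike, consider a made-up five-minute window of per-second samples: 270 quiet seconds at 10Mbps followed by a 30-second burst at 100Mbps. The helper below (avg_and_peak, a hypothetical name) is a minimal sketch that prints both the average a five-minute poll would report and the peak it would never show.

```shell
# avg_and_peak: read one sample per line and print the average and the
# peak of the whole window. Illustrative helper; the figures are made up.
avg_and_peak() {
    awk '
        NR == 1 || ($1 + 0) > peak { peak = $1 + 0 }
        { sum += $1 }
        END { if (NR > 0) printf "avg=%.1f peak=%.1f\n", sum / NR, peak }
    '
}

# 270 seconds at 10 (Mbps), then a 30-second burst at 100 (Mbps).
awk 'BEGIN { for (i = 0; i < 270; i++) print 10; for (i = 0; i < 30; i++) print 100 }' |
    avg_and_peak
# prints: avg=19.0 peak=100.0
```

A graph built from the 19Mbps average gives no hint that the link briefly carried five times as much.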

What You Really Need Is …

Historical graphing is all fine and good, but you can’t tell your boss how many megabits your Internet connection is really using without using statistics gathered in close to real time. Step forward iftop (http://www.ex-parrot.com/pdw/iftop), by Paul Warren and Chris Lightfoot, which uses two-second, as opposed to five-minute, averages.

The name derives from the popular top package. Top measures a system’s CPU load and puts the busiest process at the top of the list, whereas iftop puts the busiest network connection on a network interface at the top of the list (if being short for interface in this case). The iftop site even opens with the statement that “iftop does for network usage what top does for CPU usage” and then goes on to say it’s handy for answering why an ADSL line is slow.

Don’t let the mention of an ADSL line put you off, because iftop has been used on high-capacity, gigabit, multihomed Linux routers. However, if you’re attempting to push a massive amount of traffic through libpcap (the library used for communicating with the network interface), then for goodness sake switch off DNS resolution first (you can use the -n switch at start-up or press the n key while it’s running to disable DNS lookups). On a relatively busy network link (e.g., more than a couple of hundred megabits), imagine the significant number of (mostly reverse) DNS lookups needed to populate a table of network connections. The result, even on a well-stocked machine, can be a noticeable slowdown and might push the router load to an unwelcome high. That said, iftop is incredibly stable when used on production routers under high network load.

Figure 4 illustrates the iftop default view, which omits source and destination port numbers. The upper line has a => right-pointing arrow, depicting outbound traffic (from the interface, that is), whereas the line underneath with a <= left-pointing arrow highlights inbound traffic.

Figure 4: Default iftop view.

The numbers on the right-hand side refresh frequently. Despite the amount of information shown, the output is pretty easy to fathom, and iftop’s a breeze to navigate – a testament to the quality of the product.

The manual explains that the traffic figures on both the right-hand side and the bottom of the display show average traffic over the last 2, 10, and 40 seconds. In the example in Figure 4, the uppermost horizontal line is the network connection currently pushing the most bandwidth out, closely followed by the second line.
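
One way to picture the 2-, 10-, and 40-second columns is as sliding windows over the same stream of per-second samples. The sketch below illustrates that idea only and is not how iftop is implemented; rolling_avg is a hypothetical helper, and the sample stream (36 quiet seconds followed by a four-second burst) is made up.

```shell
# rolling_avg N: read one sample per line (one per second is assumed) and
# print the average of the last N samples, or of all of them if fewer.
rolling_avg() {
    awk -v n="$1" '
        { buf[NR % n] = $1 }          # ring buffer holding the newest n samples
        END {
            count = (NR < n) ? NR : n
            for (i in buf) sum += buf[i]
            if (count > 0) printf "%.1f\n", sum / count
        }
    '
}

# 36 seconds at 5, then 4 seconds at 80 (arbitrary units).
samples=$(awk 'BEGIN { for (i = 1; i <= 40; i++) print (i > 36 ? 80 : 5) }')
for w in 2 10 40; do
    printf '%ss: ' "$w"
    printf '%s\n' "$samples" | rolling_avg "$w"
done
# prints:
# 2s: 80.0
# 10s: 35.0
# 40s: 12.5
```

The short window reacts to the burst immediately, whereas the long window smooths it out, which is why reading the three columns together tells you whether a connection is spiking or steadily busy.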

Tweaks and Tuning

In the same vein as the ubiquitous top, the mighty iftop also has a whole host of switches, parameters, and command-line options, so now that you’ve seen the default view, you can get your hands a bit dirtier. Figure 5 illustrates iftop getting down to a granular network level and displaying traffic types, as opposed to just individual network connections. If you think along the lines of output from packet sniffers, like tcpdump (http://www.tcpdump.org), this view is a useful addition that cuts out the need to spawn another program.

Figure 5: Iftop displaying no hostnames, but a source port and one line comprising both inbound and outbound traffic for each network connection.

Using iftop

Imagine that your network grinds to a halt, and you log in over SSH, fire up iftop, and see a massive amount of ICMP traffic (as in the ping of death attack, which used packet sizes of 65KB instead of 56 bytes, with the intention of crashing a remote machine or flooding a network). Even from the default view, with

iftop -i eth0

where -i declares which interface you want to monitor, you can simply press the f key to enter one of the many filters manually (e.g., icmp) to highlight the offending protocol or network connection.

A large number of handy filter parameters can be set at start-up or by pressing f while iftop is running, making it very dynamic and adaptable. You can do all sorts of things, like freeze the output order of the connections (e.g., so you have time to copy and paste an IP address to perform a WHOIS lookup in a different shell). You can also run it in promiscuous mode if your server or router has multiple network interfaces and pick up more of the server’s traffic on one display. Promiscuous mode will include traffic not specifically destined for your server, too, sometimes from other devices on the network with promiscuous mode enabled, which can be handy to check for malicious activities on a LAN, for example.

The highly configurable iftop apparently allows any filter from the start-up command to be added while the program is running. If you’ve used tcpdump or ngrep (http://ngrep.sourceforge.net), then the filter expressions you can use in iftop should be familiar, with words such as and and not splicing otherwise complex command sequences together nicely. Even for a beginner, they’re surprisingly easy to decipher. A few demonstrations are in the Example Filters section for reference.

It’s worth pointing out at this juncture that iftop is one of many Linux tools that use the libpcap library to collect traffic details from an interface and then output them via ncurses on a console, so it can offer pretty bar graphs that update periodically. The most commonly known utility is probably the feature-replete IPTraf (http://iptraf.seul.org). However, I’ve experimented with many of these alternatives, and I always end up back with my favorite, iftop, again.

Iftop is readily available as a package in most popular Linux distributions, installs in seconds, and starts up quickly. Some reasons iftop has become an invaluable addition to the sys admin’s tool kit are its rock-solid performance, simplicity, and efficacy. If you use iftop for a while, you’ll also agree that under scrutiny, it never fails to please according to any of these three criteria.

Example Filters

If you don’t want to remember a long filter expression for a start-up filter, then just create a Bash alias inside your .bashrc or .bash_profile file, along the lines of:

alias inbound19='iftop -i eth0 -p -f "dst net 19.19.19.0/24"'
alias inbound20='iftop -i eth0 -p -f "dst net 20.20.20.0/24"'
alias web='iftop -i eth0 -p -f "host www.mydomain.com"'

To apply filters and alter how your iftop instance runs, you have three options. First, you should be able to add any of the filter expressions to a running iftop instance by pressing the f key, typing in the filter, and pressing Enter. Second, you could add a filter with -f "filter instruction" to the start-up command, as I have in the Bash aliases above. Third, you could create a config file called ~/.iftoprc inside the root user’s home directory (libpcap needs fully elevated access to the interface, so root privileges are required to run iftop). The format of the config file is, as you’d expect, relatively easy to understand, and ~/.iftoprc might look something like Listing 1 (see the man pages for more options).

Listing 1: The ~/.iftoprc config file

dns-resolution: no
port-resolution: yes
show-bars: yes
promiscuous: no
port-display: on
hide-source: no
hide-destination: no
use-bytes: no
sort: 2s
line-display: one-line-both
show-totals: yes
log-scale: yes

For busy networks, the most notable line is the first, which disables DNS lookups. Pressing the n key while iftop is running switches DNS resolution on and off as you need it. On a busy network, you might enable DNS lookups only after you’ve applied a traffic-suppressing filter, so you’re not running quite so much traffic through it. Otherwise, be warned, it will probably perform more DNS lookups in 10 seconds than your average multinational corporation does in a week, which might not increase your popularity among suppliers and colleagues alike.

Simple yet powerful filter expressions let you specifically watch one host (or network) and silently ignore another without any great effort. Bear in mind, you can use the power of words, such as and or not, to join your filter sequences together.

You could check how often the machine with IP address 192.168.1.16 is querying a Google DNS resolver with:

src 192.168.1.16 and dst google-public-dns-a.google.com

Or rather, as the man pages suggest, you could ignore all broadcast traffic on the interface(s) with:

not ether host ff:ff:ff:ff:ff:ff

To count web traffic only, unless it is being directed through a local web cache, use:

port http and not host webcache.example.com

Or, you could experiment by simply using the single words icmp or udp to show only ICMP or UDP traffic.
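
Because the filters above are plain text joined with words like and and not, longer ones can be assembled from smaller pieces. The following is a minimal sketch, assuming a Bash shell; join_and is a hypothetical helper, not part of iftop, that wraps each term in parentheses and joins the terms with and.

```shell
# join_and: combine any number of filter terms into one expression,
# e.g. "(port http) and (not host webcache.example.com)".
# Hypothetical helper for illustration; not shipped with iftop.
join_and() {
    out=""
    for term in "$@"; do
        if [ -z "$out" ]; then
            out="($term)"
        else
            out="$out and ($term)"
        fi
    done
    printf '%s\n' "$out"
}

join_and 'port http' 'not host webcache.example.com' 'not ether host ff:ff:ff:ff:ff:ff'
# prints: (port http) and (not host webcache.example.com) and (not ether host ff:ff:ff:ff:ff:ff)

# Possible use at start-up (needs root and an installed iftop):
# iftop -i eth0 -f "$(join_and 'port http' 'not host webcache.example.com')"
```

Wrapping each term in parentheses keeps a not in one term from accidentally applying to its neighbors.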

Also, there’s a clever distinction between screen filters and net filters. The above examples were all net filters, but they apply equally well to screen filters. My understanding is that you run a net filter and an additional screen filter so you’re not directly affecting the captured information, meaning you can just flick back to it after you’ve checked something using the screen filter. That way, your statistics won’t be skewed by briefly looking at something else. To keep head scratching to a minimum, if your data isn’t quite what you expected it to be, it’s probably worth making sure you haven’t accidentally left a screen filter enabled. Screen filters don’t affect the running totals at the bottom of the screen, which is a nice touch, making switching back and forth between filters easy to follow.