
System Administrators (SAs) have a tough job: dealing with users and user accounts, security, patching, updates, upgrades, disk space, performance and other miscellaneous tasks often known as "other duties as assigned." For some SAs, the day never ends. Despite the challenges, pitfalls and the occasional irate user, system administration is a fulfilling job with intangible rewards like no other position in IT. To assist those weary SAs in their quest to conquer their Linux systems, I've devised this list of 12 native Linux system monitoring tools that are always at my fingertips.

Any user may issue these commands, provided they are installed and haven't been restricted by the SA. They are harmless, read-only commands. Their only drawback is that ordinary users may report a performance problem before the SA knows about it, and that can irritate an overworked SA's nervous system.

1. top - It's only fitting that 'top' comes first on this list. Top is both a diagnostic tool and a real-time monitoring tool. Execute this command to see a continuously updated list of the processes consuming the most system resources. Try it for yourself by typing top <ENTER> at the command prompt. To quit top, press the 'q' key.

top - 14:55:04 up 3 days, 20:49, 2 users, load average: 0.07, 0.05, 0.06
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.8%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2832420k total, 2578360k used, 254060k free, 277288k buffers
Swap: 1540088k total, 0k used, 1540088k free, 1914544k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17686 root 5 -10 690m 549m 535m S 2 19.9 16:31.18 vmware-vmx
21487 khess 15 0 12584 1060 788 R 0 0.0 0:00.07 top
1 root 18 0 10316 684 568 S 0 0.0 0:01.54 init
2 root RT 0 0 0 0 S 0 0.0 0:00.18 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root RT 0 0 0 0 S 0 0.0 0:00.18 migration/1
6 root 34 19 0 0 0 S 0 0.0 0:30.78 ksoftirqd/1
7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0
9 root 10 -5 0 0 0 S 0 0.0 0:00.07 events/1
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 khelper
33 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
38 root 10 -5 0 0 0 S 0 0.0 0:00.00 kblockd/0
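Because top is interactive, it is awkward to capture in a script or a log file. The following is a minimal sketch that assumes the procps version of top. Batch mode (-b) writes plain text to standard output, and -n 1 stops after a single iteration, so the snapshot can be trimmed with head or redirected to a file. The function name top_snapshot and the 15-line default are illustrative choices and are not part of top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a one-shot, non-interactive top snapshot.
# -b = batch mode (plain text, no screen control), -n 1 = one iteration only.
top_snapshot() {
  local lines=${1:-15}   # summary area plus the first few processes
  top -b -n 1 | head -n "$lines"
}

top_snapshot 15
```

Redirecting the output (for example, top_snapshot 40 >> /tmp/top.log from cron) gives a crude record over time. The log path here is only an example.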

2. uptime - The uptime command is simple. It shows the current time, how long the system has been running since the last reboot, the number of logged-in users and the system load averages for the past 1, 5 and 15 minutes. Type uptime <ENTER> at a prompt to see your uptime stats. An example of uptime is shown below:

14:57:56 up 3 days, 20:52, 2 users, load average: 0.04, 0.04, 0.05
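The numbers uptime reports come from two kernel files, /proc/uptime and /proc/loadavg. The sketch below rebuilds the core of that line from those files to show where the data lives. It is simplified: it always prints days and hours:minutes, and it omits the user count, which uptime takes from the login records. The function name show_uptime is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that rebuilds the core of uptime's output from /proc.
# /proc/uptime:  seconds since boot, then cumulative idle seconds.
# /proc/loadavg: 1-, 5- and 15-minute load averages, then runnable/total
#                scheduling entities and the most recently assigned PID.
show_uptime() {
  local up_secs load1 load5 load15 rest up days hours mins
  read -r up_secs rest < /proc/uptime
  read -r load1 load5 load15 rest < /proc/loadavg
  up=${up_secs%.*}                      # drop the fractional seconds
  days=$(( up / 86400 ))
  hours=$(( (up % 86400) / 3600 ))
  mins=$(( (up % 3600) / 60 ))
  printf 'up %d days, %d:%02d, load average: %s, %s, %s\n' \
    "$days" "$hours" "$mins" "$load1" "$load5" "$load15"
}

show_uptime
```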

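3. vmstat - Despite its name, the vmstat (virtual memory statistics) command has nothing to do with virtualization. It reports on virtual memory, including paging and swap activity, along with processes, block I/O and CPU usage, which makes it a quick health check for your system. Typically, a user runs vmstat with a delay and a count, as shown here (five reports, five seconds apart):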

$ vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 253564 277376 1914556 0 0 3 12 23 11 0 0 99 0 0
0 0 0 253564 277380 1914556 0 0 0 23 1064 832 0 0 100 0 0
0 0 0 253564 277380 1914556 0 0 0 205 1114 884 0 0 99 0 0
1 0 0 253440 277380 1914556 0 0 0 7 1060 811 0 0 100 0 0
0 0 0 253812 277380 1914560 0 0 0 16 1089 903 38 3 59 0 0

From the vmstat man page:

vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity.
The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case.
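Because the first line is a since-boot average, it can mislead when you are chasing a problem that is happening right now. The following is a minimal sketch, assuming the procps vmstat layout of two header lines followed by one line per report. It requests one extra report and drops the since-boot line, so only live samples remain. The helper name vmstat_samples is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the live vmstat samples.
# vmstat prints two header lines, then a since-boot average line, then one
# line per interval, so ask for count+1 reports and start at line 4.
vmstat_samples() {
  local delay=${1:-5} count=${2:-5}
  vmstat "$delay" "$(( count + 1 ))" | tail -n +4
}

vmstat_samples 1 3   # three live samples, one second apart
```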

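4. free - Free displays the amount of free and used physical memory (RAM), free and used swap space, and the buffers and cache used by the kernel. The -/+ buffers/cache line shows how much memory is used and free once the kernel's buffers and cache are discounted.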

$ free
total used free shared buffers cached
Mem: 2832420 2578732 253688 0 277416 1914556
-/+ buffers/cache: 386760 2445660
Swap: 1540088 0 1540088
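The -/+ buffers/cache line is plain arithmetic on the Mem line. Memory used by applications is used - buffers - cached (2578732 - 277416 - 1914556 = 386760). Memory effectively free is free + buffers + cached (253688 + 277416 + 1914556 = 2445660). The sketch below repeats that calculation from /proc/meminfo, where free gets its numbers. The helper name meminfo_kb is hypothetical. Newer versions of free drop this line in favour of an "available" column, so treat the sketch as an explanation of the older output rather than a replacement for free.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up one field (in kB) from /proc/meminfo.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

total=$(meminfo_kb MemTotal)
free_kb=$(meminfo_kb MemFree)
buffers=$(meminfo_kb Buffers)
cached=$(meminfo_kb Cached)

# Same arithmetic as free's "-/+ buffers/cache" line.
app_used=$(( total - free_kb - buffers - cached ))
app_free=$(( free_kb + buffers + cached ))
echo "-/+ buffers/cache: used ${app_used} kB, free ${app_free} kB"
```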

5. ps - The ps command shows a snapshot of currently running processes. It has many switches (options), but the most common invocation is ps -ef (every process, full-format listing). Any user may issue the ps command.

A partial ps listing is given below:

UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr24 ? 00:00:01 init [3]
root 2 1 0 Apr24 ? 00:00:00 [migration/0]
root 3 1 0 Apr24 ? 00:00:00 [ksoftirqd/0]
root 4 1 0 Apr24 ? 00:00:00 [watchdog/0]
root 5 1 0 Apr24 ? 00:00:00 [migration/1]
root 6 1 0 Apr24 ? 00:00:30 [ksoftirqd/1]
root 7 1 0 Apr24 ? 00:00:00 [watchdog/1]
root 8 1 0 Apr24 ? 00:00:00 [events/0]
root 9 1 0 Apr24 ? 00:00:00 [events/1]
root 10 1 0 Apr24 ? 00:00:00 [khelper]
root 33 1 0 Apr24 ? 00:00:00 [kthread]
root 38 33 0 Apr24 ? 00:00:00 [kblockd/0]
root 39 33 0 Apr24 ? 00:00:00 [kblockd/1]
root 40 33 0 Apr24 ? 00:00:00 [kacpid]
root 180 33 0 Apr24 ? 00:00:00 [cqueue/0]
root 181 33 0 Apr24 ? 00:00:00 [cqueue/1]
root 184 33 0 Apr24 ? 00:00:00 [khubd]
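Because ps is a snapshot, it pairs well with sorting to get a top-like listing that you can paste into a ticket or a log. The following is a sketch that assumes the procps (GNU-style) ps. The -o option selects output columns and --sort=-%cpu orders by CPU usage, descending. BusyBox ps does not accept these options. The function name top_cpu_procs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the heaviest CPU consumers with procps ps.
# -e = every process, -o = choose output columns, --sort=-%cpu = descending CPU.
top_cpu_procs() {
  local n=${1:-5}
  ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n "$(( n + 1 ))"  # +1 for the header
}

top_cpu_procs 5
```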

6. iostat - Iostat reports CPU statistics and I/O statistics for disks and partitions. It has several switches for more specific output, and it is part of the sysstat package.

An example of the CPU-only report (-c) is given below:

$ iostat -c

Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010

avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.00 0.43 0.11 0.00 99.28
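As with vmstat, iostat's first report covers the time since the system was booted. The underlying counters are on the aggregate "cpu" line of /proc/stat, in clock ticks. The sketch below recomputes since-boot percentages from that line to show roughly where avg-cpu comes from. It makes two assumptions: irq and softirq time is folded into %system, and the guest columns are ignored because guest time is already counted in user. Its numbers may therefore differ slightly from sysstat's. The function name cpu_since_boot is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: since-boot CPU percentages from the aggregate "cpu"
# line of /proc/stat (fields: user nice system idle iowait irq softirq steal ...).
# Assumption: irq and softirq are folded into %system; guest columns are ignored.
cpu_since_boot() {
  awk '$1 == "cpu" {
    user = $2; nice = $3; sys = $4 + $7 + $8
    idle = $5; iowait = $6; steal = $9
    total = user + nice + sys + idle + iowait + steal
    print "%user %nice %system %iowait %steal %idle"
    printf "%.2f %.2f %.2f %.2f %.2f %.2f\n", 100*user/total, 100*nice/total,
      100*sys/total, 100*iowait/total, 100*steal/total, 100*idle/total
  }' /proc/stat
}

cpu_since_boot
```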

7. w - The w (what) command is more informative than the who command: it shows who's logged on and what they're doing.

$ w
15:28:59 up 3 days, 21:23, 2 users, load average: 0.00, 0.03, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
khess pts/0 megamachine 12:26 1:38m 0.04s 0.04s -bash
khess pts/1 megamachine 12:30 0.00s 0.09s 0.01s w

8. sar - The sar (System Activity Reporter) command is part of the sysstat package. It should be installed by any SA who wants to keep up with extensive system performance measurements. The default setting is to take a system snapshot every ten minutes, providing the SA with a 24-hour historical view of performance. It's a valuable tool when trying to find bottlenecks and failures over a one-day period.

The sar command has more than three dozen switches associated with it. To see an extensive list of its capabilities, use man sar.

$ sar
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010

12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.49 0.00 0.52 0.05 0.00 98.94
12:20:01 AM all 0.13 0.00 0.51 0.08 0.00 99.28
12:30:01 AM all 0.12 0.00 0.53 0.05 0.00 99.29
12:40:01 AM all 0.12 0.00 0.52 0.05 0.00 99.31
12:50:01 AM all 0.13 0.00 0.55 0.07 0.00 99.25
01:00:01 AM all 0.13 0.00 0.65 0.06 0.00 99.16
01:10:01 AM all 0.54 0.00 0.50 0.08 0.00 98.88
01:20:01 AM all 0.13 0.00 0.51 0.08 0.00 99.28
01:30:01 AM all 0.12 0.00 0.52 0.08 0.00 99.28
01:40:01 AM all 0.13 0.00 0.50 0.07 0.00 99.30
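Run without an interval, sar reports the data collected so far today. Given an interval and a count, it samples live, like vmstat. The invocations below are a sketch of common uses. The location of the daily data files is an assumption you should check locally: Red Hat-style systems use /var/log/sa/saDD, while Debian-style systems use /var/log/sysstat/saDD. The file name sa28 is only an example.

```shell
#!/usr/bin/env bash
# Common sar invocations (sysstat).
sar -u 2 3        # live CPU utilization: 3 samples, 2 seconds apart
sar -q            # today's history: run-queue length and load averages
sar -r            # today's history: memory utilization
# Another day's data file, limited to a time window. The path assumes a
# Red Hat-style layout; Debian-style systems use /var/log/sysstat instead.
sar -u -f /var/log/sa/sa28 -s 01:00:00 -e 02:00:00
```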

9. mpstat - The mpstat command provides multiprocessor, CPU-related statistics. It is part of the sysstat package.

$ mpstat 5 5
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010

03:44:58 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:45:03 PM all 0.30 0.00 8.81 0.00 0.00 0.00 0.00 90.89 1072.80
03:45:08 PM all 0.10 0.00 0.40 1.10 0.00 0.10 0.00 98.30 1109.42
03:45:13 PM all 0.10 0.00 0.40 0.00 0.00 0.00 0.00 99.50 1063.15
03:45:18 PM all 0.20 0.00 3.70 0.00 0.00 0.00 0.00 96.10 1084.57
03:45:23 PM all 0.10 0.00 0.30 0.00 0.00 0.10 0.00 99.50 1067.07
Average: all 0.16 0.00 2.72 0.22 0.00 0.04 0.00 96.86 1079.37

or

$ mpstat -P ALL
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010

03:50:59 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:50:59 PM all 0.18 0.00 0.41 0.11 0.01 0.02 0.00 99.28 1071.77
03:50:59 PM 0 0.24 0.00 0.13 0.02 0.00 0.00 0.00 99.61 1000.70
03:50:59 PM 1 0.12 0.00 0.68 0.19 0.03 0.03 0.00 98.95 71.07
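Without an interval, mpstat reports averages since boot, which is why every row in the -P ALL example shares one timestamp. To watch individual CPUs live, for instance to spot a single core pinned by one process, add an interval and a count. The values below are only an example.

```shell
#!/usr/bin/env bash
# Live per-CPU statistics: -P ALL = every CPU plus the "all" summary row,
# then 2 reports 5 seconds apart, followed by an Average block.
mpstat -P ALL 5 2
```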

10. netstat - The netstat command, replete with options and switches, provides diagnostic information about your network, including interface statistics, routing tables, network connections and more. A wise SA uses netstat to diagnose network problems and attacks and to see which services are listening and which connections are open. The example below lists the listening sockets.

$ netstat -a | grep LISTEN
tcp 0 0 localhost.localdomain:2208 *:* LISTEN
tcp 0 0 *:vmware-authd *:* LISTEN
tcp 0 0 *:mysql *:* LISTEN
tcp 0 0 *:netbios-ssn *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:ndmp *:* LISTEN
tcp 0 0 localhost.localdo:findviatv *:* LISTEN
tcp 0 0 localhost.localdomain:ipp *:* LISTEN
tcp 0 0 *:con *:* LISTEN
tcp 0 0 localhost.localdomain:smtp *:* LISTEN
tcp 0 0 localhost.lo:x11-ssh-offset *:* LISTEN
tcp 0 0 localhost.localdomain:6011 *:* LISTEN
tcp 0 0 *:microsoft-ds *:* LISTEN
tcp 0 0 *:ms-wbt-server *:* LISTEN
tcp 0 0 localhost.localdomain:2207 *:* LISTEN
tcp 0 0 *:http *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost6.l:x11-ssh-offset *:* LISTEN
tcp 0 0 localhost6.localdomain:6011 *:* LISTEN
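A tighter way to get the same list is netstat -tln: -t selects TCP, -l selects listening sockets and -n shows numeric ports instead of service names. Both netstat and its iproute2 successor ss (ss -tln) read kernel socket tables exposed under /proc/net. The sketch below reads /proc/net/tcp and /proc/net/tcp6 directly, where state 0A means LISTEN and the port is the hex value after the colon in local_address. It covers TCP only and merges duplicate ports. The function name listening_tcp_ports is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: listening TCP ports read straight from the kernel tables.
# In /proc/net/tcp{,6}, field 2 is local_address as HEXADDR:HEXPORT and
# field 4 is the connection state; 0A means LISTEN.
listening_tcp_ports() {
  local f hex
  for f in /proc/net/tcp /proc/net/tcp6; do
    [ -r "$f" ] || continue
    awk 'FNR > 1 && $4 == "0A" { n = split($2, a, ":"); print a[n] }' "$f"
  done | while read -r hex; do
    printf '%d\n' "0x$hex"            # hex port -> decimal
  done | sort -nu
}

listening_tcp_ports
```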

11. du - The du command reports disk usage. You can point it at a single directory or at an entire filesystem. If you run du without options, prepare yourself for a long list of directories and their sizes. It's better to filter the information so that you see just a snapshot of how much space a particular directory or filesystem is using. For example, request a human-readable (megabytes, gigabytes) summary report of the /opt directory:

$ du -sh /opt

929M /opt
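Once du -s tells you a directory is large, the next question is usually which subdirectory is responsible. The following is a sketch assuming GNU coreutils. The --max-depth=1 option summarizes each immediate child, -x stays on one filesystem, and sort -h orders human-readable sizes such as 8.0K and 929M correctly. The last line is typically the directory's own total. The function name biggest_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the largest immediate subdirectories of a path.
# -x = stay on one filesystem, -h = human-readable sizes,
# --max-depth=1 = one summary line per child (plus the directory's own total).
biggest_dirs() {
  local dir=${1:-.} n=${2:-10}
  du -xh --max-depth=1 "$dir" 2>/dev/null | sort -h | tail -n "$n"
}

biggest_dirs /opt 10
```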

12. df - The df command reports the amount of used vs. free space on your mounted filesystems. Unlike du, which adds up the space used under a directory, df reports on each filesystem as a whole, as the example below shows. It uses the human-readable (-h) format that many SAs prefer.

$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
360G 274G 68G 81% /
/dev/sda1 99M 30M 65M 32% /boot
tmpfs 1.4G 0 1.4G 0% /dev/shm
/dev/hdb1 230G 164G 55G 75% /backups
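The wrapped /dev/mapper line above shows a wrinkle for scripting: df wraps long device names onto a second line, which breaks line-by-line parsing. The -P option (POSIX output format) keeps each filesystem on one line, so a simple "which filesystems are nearly full?" check becomes easy. The sketch below assumes mount points without spaces. The 80 percent threshold and the function name fs_over are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list mount points whose usage is at or above a threshold.
# -P = POSIX output format: one line per filesystem, no wrapping.
# Columns: Filesystem, blocks, Used, Available, Capacity (e.g. "81%"), Mounted on.
fs_over() {
  local threshold=${1:-80}
  df -P | awk -v t="$threshold" 'NR > 1 && $5 + 0 >= t { print $5, $6 }'
}

fs_over 80
```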

There you have the top 12 native Linux monitoring tools at your disposal. The real beauty of these commands is that they don't require any web services or third-party products to make them work. Their main shortcoming is that they are not predictive and, with the exception of sar, they keep no historical data. The rest are snapshot utilities that tell you what's going on with your system right now.
In a future post, I'll cover some predictive and historical monitoring tools.


Superb information, really helpful. Very good collection. Keep it up.

Something to add.

Pmap - Process Memory Usage

The pmap command reports the memory map of a process. Use it to find the causes of memory bottlenecks.

# pmap -d PID

Iptraf - Real-time Network Statistics

Features of Iptraf include:

Network traffic statistics by TCP connection
IP traffic statistics by network interface
Network traffic statistics by protocol
Network traffic statistics by TCP/UDP port and by packet size
Network traffic statistics by Layer2 address

Also remember iftop, which I prefer to iptraf as it isn't so detailed

Are you guys familiar with new usp for a faster internet connection? Can you show me the sites of this statistic?



The problem I have when an article starts out 'The top n tools...' is that these tools are typically all very different. They usually have different user interfaces, different output and there's almost no easy way to correlate the data between them.

That was one of the motivations for developing collectl a number of years ago: to reuse the best capabilities of all the different tools while providing a single, consistent interface for both input and output. There is a lot more that collectl can do, but there are no tools or commands to correlate those capabilities against, so they are not part of the table. Here's a table that correlates a bunch of different tools with collectl: http://collectl.sourceforge.net/Matrix.html

And if you use colplot, you can plot the data collectl produces directly.

In all fairness, collectl focuses on performance monitoring, so it doesn't do things like tell you the amount of available disk storage or how long your system has been up. But it does tell you just about anything you'd ever want to know about what your system is currently doing, including everything above and then some, such as NFS performance, slab utilization, InfiniBand stats, interrupt distribution by CPU, Lustre stats, buddyinfo, socket utilization and probably other things as well.

-mark


Check out SeaLion. It presents all of the above tools' output in a very convenient timeline, lets you add more commands and also alerts on system crashes, etc. Setup is extremely easy for SAs monitoring multiple servers.



I'd highly recommend Dead Man's Snitch for monitoring your periodic tasks (cron jobs). My recommendation is partly biased since my company works with DMS, but that's not stopping me from recommending a good product when I see one! :)
