Introduction
Inside every Unix shop with more than a handful of machines,
odds are there's a file system slowly growing toward
100% capacity, at which point all kinds of unpleasant
events begin to transpire. System administrators who value
the serenity and rejuvenation of a good night's sleep
over the flattering feeling of being needed which comes
from having your pager go off at four in the morning need
tools which anticipate little problems before they mature
into full-on screaming crises. This page presents a suite of such
tools, all Perl programs, which warn of impending file-system-full
disasters and aid in their remediation.
WatchFull consists of
three independent Perl scripts
which address different aspects of file system capacity
management. In order to use these tools you
must have
Perl
installed on your system. The programs are compatible
with both Perl 4.036 and 5.003 and later. Note that
while Perl has been ported to non-Unix platforms, these
utilities are Unix-specific as they require other
standard Unix programs not present on other
systems. If you install these programs as cron
jobs, be sure to verify whether Perl can be found from the
abbreviated search path used for such jobs and,
if not, either add it to the path or explicitly
specify the full pathname in the crontab entry.
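One way to act on the cron caveat above, before installing a job, is to look for perl using only the directories cron would search. What follows is a minimal sketch in Python and is not part of the WatchFull suite. The function name find_in_path and the default search path "/usr/bin:/bin" are assumptions made for illustration; substitute whatever path cron really uses on your system.

```python
import shutil


def find_in_path(program, search_path="/usr/bin:/bin"):
    """Return the full pathname of program if found in search_path, else None.

    search_path is a colon-separated list of directories. The default is
    only an assumed example of an abbreviated cron search path.
    """
    return shutil.which(program, path=search_path)


if __name__ == "__main__":
    location = find_in_path("perl")
    if location is None:
        print("perl not found: give its full pathname in the crontab entry")
    else:
        print("perl found at", location)
```

If the function returns None for the path your cron jobs run with, either extend that path or name the Perl interpreter by its full pathname, as described above.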
- WatchFull
- Examines mounted file systems and reports, usually
by E-mail to the administrator, any which exceed a
designated percentage of their capacity.
- LogJam
- One frequent culprit insidiously filling up critical
file systems is log files, such as console
message logs and FTP and HTTP transfer logs, which grow
without bound as entries are appended. LogJam
monitors a list of files and reports any which exceed
individually designated threshold sizes in an E-mail
to the administrator.
- Top40
- Unlike WatchFull and LogJam, which are
usually run automatically as cron jobs to alert
the administrator of incipient problems, Top40
helps resolve them by producing a list of the
40 (or any other number you like) largest files
in one or more directory trees. With appropriate
options you can scan everything from a single directory
to all file systems mounted on a machine.
WatchFull -- The Cop on the Beat
Documentation for
WatchFull follows, in Unix
manual page style.
NAME
WatchFull - monitor file systems approaching capacity
SYNOPSIS
perl WatchFull.pl [ -d ]
[ -g name ]
[ -m address ]
[ -s /path=thresh,... ]
[ -t thresh ]
[ -u ]
[ -x /path,... ]
DESCRIPTION
WatchFull examines mounted file systems on the machine
on which it is run and generates a report listing those which
exceed a given capacity threshold. If any file systems are
found to exceed their thresholds, a warning is mailed to
the designated system administrator.
Here is an example of a warning message mailed by
WatchFull to the administrator of a host
named "pallas".
Greetings, carbon-based lifeform. The following file systems on
pallas are approaching capacity.
/dev/dsk/dks0d2s7 xfs 1960472 1804112 156360 93 /files1
OPTIONS
- -d
- Debug mode: output (if any)
is written to standard output rather than being E-mailed. If
you specify this option when running WatchFull as a
cron job, the output will usually be mailed to the
owner of the job as a cron report.
- -g name
- Name by which to greet the system administrator.
By default this is "carbon-based lifeform".
- -m address
- Mail
warning messages to the designated address, which
can be any E-mail address accessible by the mailer
on the host system. This defaults to "root@localhost".
- -s /path=thresh,...
- The -s option allows you to specify warning
thresholds for individual file systems. The argument
is a comma-separated list of file system mount
points (as shown in a "df -k"
report) and warning thresholds as a percentage of
capacity.
- -t thresh
- The default warning threshold, as a percentage of file system
capacity, is set to thresh. This threshold may
be overridden for individual file systems by the
-s option. The default warning threshold
is 90% of the file system's capacity.
- -u
- Print how-to-call information.
- -x /path,...
- File systems mounted at the comma-separated list
of mount points are excluded from those checked.
You may want to exclude NFS-mounted file systems
on other machines, read-only media such as CD-ROMs,
and removable backup media which are routinely written
until full.
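To make the interplay of the -t, -s, and -x options concrete, here is a minimal sketch of the selection logic in Python. It is not the Perl source of WatchFull, and the function names are hypothetical. It assumes that each data line of the "df -k" report ends with the percentage used followed by the mount point, as in the sample warning shown earlier, and that a file system is reported only when its usage is strictly greater than its threshold.

```python
def parse_overrides(spec):
    """Parse a -s style argument such as '/var=80,/files1=95' into a dict."""
    overrides = {}
    for item in spec.split(","):
        if item:
            path, _, thresh = item.partition("=")
            overrides[path] = int(thresh)
    return overrides


def over_threshold(df_text, default=90, overrides=None, excluded=()):
    """Return the df report lines whose usage exceeds the applicable threshold.

    Assumption: the last field of a data line is the mount point and the
    field before it is the percentage used, with or without a '%' sign.
    """
    overrides = overrides or {}
    hits = []
    for line in df_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header, whose last field is not a path.
        if len(fields) < 2 or not fields[-1].startswith("/"):
            continue
        mount = fields[-1]
        percent = fields[-2].rstrip("%")
        if not percent.isdigit() or mount in excluded:
            continue
        # A per-mount threshold (-s) takes precedence over the default (-t).
        if int(percent) > overrides.get(mount, default):
            hits.append(line)
    return hits
```

With the default threshold of 90, the /files1 line from the sample message would be reported. Passing overrides of "/files1=95" would silence it, and listing "/files1" among the excluded mount points would skip it altogether.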
BUGS
WatchFull assumes the "df -k"
command produces output in the format it expects and that
the Mail command can be used to send mail
to the designated recipient. If this isn't the case, you'll
have to modify the Perl program accordingly. On some systems
you'll have to replace Mail with mailx.
SEE ALSO
df(1), Mail(1)
LogJam -- The Usual Suspects
One of the most common causes of file system exhaustion
is system and server log files which grow without bound
as entries are appended to them. If you don't keep an
eye on these files, they can eat your disc alive. For
example, let's take a peek at the HTTP log file
directory on the
www.fourmilab.ch
server right now:
/files/server/logs/http> ls -lt
total 5012848
-rw-r--r-- 1 root sys 477730174 Aug 22 15:21 agent_log
-rw-r--r-- 1 root sys 957131968 Aug 22 15:21 referer_log
-rw-r--r-- 1 root sys 1113544113 Aug 22 15:21 access_log
-rw-r--r-- 1 root sys 18168734 Aug 22 15:19 error_log
Yikes! That's two and a half G- G- Gigabytes of
log files--time to clean house! (Actually, the file system
on which these files are kept has a capacity of 17 GB and
is only about 25% full, so I can go a long time
before taking the garbage out....)
Anyway, LogJam will keep an eye on the log files on
your system and E-mail warnings when one or more exceed
size thresholds you define on a file-by-file basis.
Amid the daily chaos of system administration, it's easy
to overlook log files ratcheting up to absurd dimensions.
LogJam lets you know when they need attending to.
Documentation for LogJam follows, in Unix
manual page style.
NAME
LogJam - monitor size of continuously growing files
SYNOPSIS
perl LogJam.pl [ -d ]
[ -g name ]
[ -m address ]
[ -t ]
[ -u ]
filename threshold...
DESCRIPTION
A common cause of file system full crises is system and
server log files which grow without bound as
transactions are added. Most modern Unix systems
incorporate mechanisms to limit the space consumed by
system files such as console message transcripts and
login histories, but many server logs such as FTP and
HTTP access logs must be manually "cycled" when they
grow too large. This is a task easily overlooked amidst
the quotidian alarums and diversions of system
administration. LogJam keeps an eye on these log
files and sends a warning when one or more exceeds a
given size threshold.
On the command line, list one or more
"filename threshold" pairs which specify
a file to be checked and the size threshold which, when
exceeded, will generate a warning for that file. The
size may be specified in bytes, or with a suffix of "K"
for kilobytes, "M" for megabytes, "G" for gigabytes, or
"T" for terabytes. Suffixes denote powers of 1000 and
may be either upper or lower case. For example, to
generate a warning when an HTTP access log exceeds
500 megabytes, one would use:
perl LogJam.pl /files/server/logs/http/access_log 500M
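The threshold syntax described above can be sketched as a small parser. This is an illustration in Python, not the code LogJam uses, and the name parse_size is hypothetical. It follows the rules stated in the text: an optional K, M, G, or T suffix in either case, each denoting a power of 1000.

```python
import re

# Suffixes denote powers of 1000, not 1024, as the documentation states.
_MULTIPLIERS = {"": 1, "K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_size(text):
    """Convert a size such as '500M', '2g', or '1024' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]
```

Under these rules the "500M" in the example command stands for 500,000,000 bytes.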
OPTIONS
- -d
- Debug mode: output (if any)
is written to standard output rather than being E-mailed. If
you specify this option when running LogJam as a
cron job, the output will usually be mailed to the
owner of the job as a cron report.
- -g name
- Name by which to greet the system administrator.
By default this is "carbon-based lifeform".
- -m address
- Mail
warning messages to the designated address, which
can be any E-mail address accessible by the mailer
on the host system. This defaults to "root@localhost".
- -t
- Print size quantities with thousands separators.
The number 1269259614 will be displayed as "1,269,259,614"
with this option specified. If you prefer a different
character as the thousands separator, change the
assignment to the $Thousands variable
in the source code.
- -u
- Print how-to-call information.
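The effect of the -t option can be illustrated with a few lines of Python. This is only a sketch of the formatting, not the Perl code in LogJam. The separator parameter here is an assumed stand-in for the role the $Thousands variable plays in the program source.

```python
def with_thousands(number, separator=","):
    """Format a non-negative integer with a separator between groups of three digits."""
    # Python's ',' format option groups digits by thousands; swap in the
    # caller's preferred separator afterwards.
    return f"{number:,}".replace(",", separator)


print(with_thousands(1269259614))  # prints 1,269,259,614
```

Calling with_thousands(1269259614, " ") would instead group the digits with spaces.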
BUGS
The size of a file is deemed to be whatever the Perl -s
operator says it is. On systems which support and contain
"holey" (sparse) files--files in which not every logical address
corresponds to allocated storage--the size reported may not
correspond to the amount of storage actually occupied by
the file.
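If you suspect a file is holey, you can compare its logical size with the storage allocated to it. The sketch below is in Python and is not part of LogJam. It assumes a Unix platform where os.stat supplies st_blocks counted in 512-byte units, and the function name is hypothetical.

```python
import os


def logical_and_allocated(path):
    """Return (logical size in bytes, bytes of storage allocated) for path.

    Assumes st_blocks is available and counts 512-byte blocks, which holds
    on typical Unix systems.
    """
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512
```

An allocated figure well below the logical size suggests a holey file, for which the size seen by a size check overstates the space really consumed.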
Size of the contents of directories named on the command
line is determined with the "du -sk"
command. If this command does not produce the expected
format on your system, you'll have to modify the Perl
program to specify the appropriate command and/or parse
the results it returns.
LogJam assumes the Mail command
can be used to send mail to the designated recipient.
If this isn't the case, you'll have to modify the source
code accordingly. On some systems you'll have to
replace Mail with mailx.
SEE ALSO
du(1),
Mail(1)
Top40 -- Most Wanted List
Once
WatchFull has alerted you to a file system approaching
capacity and you've dealt with any oversized log files
fingered by
LogJam, it's time to unleash the witch hunt
for huge files lurking in less obvious locations. You know--those 275
megabyte core dumps from Netscrape in each of your users'
home directories, the fellow with half a gigabyte of, shall
we say, "non-work related" MPEG files, system crash core dumps
and kernel images dating back to 1994, patch back-out directories
from three operating system releases ago, etc.
This is where
Top40 comes in.
Top40 scans one or more directory trees and prepares
a list of the 40 largest files found in them. (You can
specify the number of files to be shown with a command
line option.) These files are prime candidates for clean-up
campaigns.
Documentation for Top40 follows, in Unix
manual page style.
NAME
Top40 - show largest files in directory trees
SYNOPSIS
Top40
[ -f ]
[ -h ]
[ -n count ]
[ -s size ]
[ -t ]
[ -u ]
rootdir...
DESCRIPTION
Top40 scans one or more directory trees, each starting
at a rootdir argument, and displays a list of the largest
files found, 40 by default, in descending order by size. The
rootdir arguments need not be file system mount
points--any directory may be scanned. If no
rootdir is specified, the current directory is
scanned.
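The scan just described can be sketched in Python as follows. This is not how Top40 itself works (the BUGS section notes that it drives the Unix find command); it is a rough equivalent using a directory walk. The function name and its parameters are hypothetical, with count, min_size, and follow_mounts standing in for the -n, -s, and -f options.

```python
import heapq
import os
import stat


def largest_files(roots, count=40, min_size=0, follow_mounts=False):
    """Return up to count (size, path) pairs, largest first, found under roots."""

    def candidates():
        for root in roots:
            for dirpath, dirnames, filenames in os.walk(root):
                if not follow_mounts:
                    # Prune mount points so the walk stays on one file system.
                    dirnames[:] = [
                        d for d in dirnames
                        if not os.path.ismount(os.path.join(dirpath, d))
                    ]
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        info = os.lstat(path)
                    except OSError:
                        continue  # file vanished or is unreadable
                    # Only regular files at least min_size bytes long qualify.
                    if stat.S_ISREG(info.st_mode) and info.st_size >= min_size:
                        yield info.st_size, path

    return heapq.nlargest(count, candidates())
```

For instance, largest_files(["."], count=10) would list the ten largest regular files beneath the current directory. Using heapq.nlargest avoids sorting every file found, since only the top few are kept.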
OPTIONS
- -f
- By default, Top40 does not follow mount
point directories onto other file systems. For example,
if you scan the root file system, "/", a directory
"/usr" which is the mount point of a different
physical file system will not be examined. The
-f option overrides this and causes mount
points to be followed just like regular directories.
Specifying the -f option and the root
file system will cause every file system on the
machine to be scanned. This can take a long time!
- -h
- File sizes are displayed in "human
readable" form: a number of three or
fewer digits followed by a suffix,
"b" for bytes, "K" for kilobytes,
"M" for megabytes, "G" for gigabytes,
and "T" for terabytes. Each of these
units denotes a power of 1000, not 1024.
Values less than 10 units are shown with
a single decimal place; if you wish to
change the decimal character, modify the
definition of $Decimal in the
program source.
- -n count
- The count largest files
are displayed. The default value is 40.
- -s size
- Files smaller than the specified
size are excluded from the scan. This
reduces the time required to scan large file systems
with many relatively small files. The size
may be given in bytes, or with a suffix of
"K" for kilobytes, "M" for megabytes, "G" for
gigabytes, and "T" for terabytes. Suffixes
denote powers of 1000 and may be either upper or
lower case. By default all files are scanned,
regardless of size.
- -t
- File sizes are displayed with thousands separators,
for example "16,297,944" instead of "16297944".
If you prefer a different thousands separator
character, change the definition of $Thousands
in the program source.
- -u
- Print how-to-call information.
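The "human readable" format selected by -h can be approximated as shown below. This is a sketch in Python, not the Perl code in Top40, and it rests on assumptions the documentation does not settle: values are truncated rather than rounded, and byte counts are always shown as whole numbers. The decimal parameter is an assumed stand-in for the role of $Decimal in the program source.

```python
_SUFFIXES = ["b", "K", "M", "G", "T"]


def human_size(size, decimal="."):
    """Format a byte count with a suffix, using powers of 1000, not 1024."""
    power = 0
    while power < len(_SUFFIXES) - 1 and size >= 1000 ** (power + 1):
        power += 1
    unit = 1000 ** power
    suffix = _SUFFIXES[power]
    if power > 0 and size < 10 * unit:
        # Fewer than 10 units: keep one decimal place (truncated, an assumption).
        tenths = size * 10 // unit
        return f"{tenths // 10}{decimal}{tenths % 10}{suffix}"
    return f"{size // unit}{suffix}"


print(human_size(16297944))  # prints 16M
```

Truncation keeps every result below 1000 terabytes to three or fewer digits; rounding would turn 999,999 bytes into a four-digit "1000K".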
BUGS
The Unix find command is used to traverse the
specified directory trees and pre-filter the files found
therein. If the find command on your system behaves
in a non-standard manner, you may have to modify the options
supplied to it in the program source.
Directories which contain a multitude of small files will
escape scrutiny by Top40. A separate scan which sums
the size of top-level directory contents would be required
to identify such perpetrators and Top40 does not
presently do this.
The size of a file is taken to be whatever the Perl -s
operator says it is. On systems which support and contain
"holey" (sparse) files--files in which not every logical address
corresponds to allocated storage--the size reported may not
correspond to the amount of storage actually occupied by
the file.
SEE ALSO
find(1)