"Yikes! Who ate my hard drive?", you exclaim, having installed a Linux distribution update and discovered your 4 Gb root partition, created a year ago that absurdly large (you thought) so you'd never have to worry about it filling up, is now 98% full of stuff you've never heard of and are unlikely to ever use: Pig Latin spelling checkers, six different ways to type Kanji on your ASCII keyboard, and a different desktop environment for every day of the week.
Unless you're ready to surrender, repartition your drive and reload everything in the forlorn hope that the next update won't fill up whatever ridiculous size you make the new root partition, there's nothing for it but to embark upon an old-fashioned system administrator's witch hunt--find out what's using all the space and get rid of the fat targets you don't need. If your system uses RPM software packages, the obvious way to proceed is to make a list of all packages sorted by the disk space they consume, then investigate whether the larger ones are really necessary. While several RPM administration tools such as rpm and GnoRPM show the size of individual packages, you're forced to query the packages one at a time to find the big ones. This is a lot of typing or pointing and clicking for grizzled sysadmins like me who would rather just look at a list.
This page presents two small, simple Perl programs: rpmsize.pl calculates the size in bytes of all the files in a given RPM package, and rpmhogs.pl uses rpmsize to prepare a list of all packages installed on a system and the size of each, sorted in descending order. Both programs use the command line rpm program to query the database of installed software. As these programs perform only query operations, they do not require super-user (root) privilege to run.
perl rpmsize.pl package_name
If the specified package is installed on the system, the sum of the sizes of all files within the package will be printed in bytes. For example:
$ perl rpmsize.pl emacs-21.2-2
35774059
shows that the Emacs text editor and its support files consume 35.8 megabytes on your system. (If you've installed everything in a single root partition, all the files will be there. If your system is configured with separate /, /usr, /var, etc. partitions, you'll have to investigate further to determine where the files are actually installed; these tools do not address that question, although most application packages install most of their files in the /usr filesystem.) If the package_name is not installed on the system, rpmsize will report its size as zero.
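The per-package calculation described above can be sketched in a few lines. This is a Python illustration, not the author's rpmsize.pl: it assumes the rpm query format `[%{FILESIZES}\n]`, which prints one file size per line, and the function names `sum_sizes` and `package_size` are invented for the example. The original program may well query rpm differently.

```python
import subprocess


def sum_sizes(listing):
    """Add up one integer per line, ignoring blank or non-numeric lines."""
    total = 0
    for line in listing.splitlines():
        line = line.strip()
        if line.isdigit():
            total += int(line)
    return total


def package_size(package):
    """Return the total bytes of all files in an installed package, or 0."""
    # Assumption: '[%{FILESIZES}\n]' makes rpm print one file size per line.
    # The doubled backslash hands rpm a literal \n, which rpm itself expands.
    result = subprocess.run(
        ["rpm", "--query", "--queryformat", "[%{FILESIZES}\\n]", package],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # rpm exits non-zero when the package is not installed, so the
        # size is reported as zero, as the text above describes.
        return 0
    return sum_sizes(result.stdout)
```

Calling `package_size("emacs-21.2-2")` on a system with that package installed would return its size in bytes. Like the original tools, the sketch only queries the RPM database, so it should not need root privilege.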
rpmhogs.pl writes its report to standard output, which may be redirected or piped to another program in the usual manner. Here are the first ten and last five lines from a run of this program on a system which had just been upgraded from Red Hat Linux 7.2 to 7.3:
  168,180,172  glibc-common-2.2.5-34
  115,245,076  kernel-source-2.4.18-3
  109,563,286  php-manual-4.1.2-7
   65,477,844  xemacs-21.4.6-7
   64,227,434  aspell-cs-0.2-3
   61,776,259  rpmdb-redhat-7.3-0.20020419
   59,441,866  jdk-1.3.1-fcs
   49,206,083  tetex-doc-1.0.7-47
   39,713,613  xemacs-el-21.4.6-7
   37,629,394  kdebase-3.0.0-12
        . . .
        1,966  rootfiles-7.2-1
          213  procps-X11-2.0.7-12
           48  docbook-utils-pdf-0.6.9-25
            0  basesystem-7.0-2
3,711,330,234  Total
At a glance, it's obvious there are some promising targets for clean-up here. To obtain details for a package, use the command:
rpm --query --info package_name
Running this on, say, php-manual-4.1.2-7, we discover that the upgrade has installed almost 110 megabytes of documentation for the PHP Web scripting language, and this on a machine which isn't even a Web server! Then there's the 64 megabytes of spelling checker dictionaries for the Czech language in aspell-cs-0.2-3, and the list goes on and on. A couple of hours of cleanup focused exclusively on large and obviously unnecessary packages freed up more than 600 megabytes on my system partition, reducing its occupancy from 98% to 83%, and that's with leaving both the Gnome and KDE desktop environments installed simultaneously (this is a development machine, and I occasionally wish to test programs for compatibility with both of these desktop systems).
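For readers curious how a report like the one shown earlier might be assembled, here is a minimal Python sketch. It is not rpmhogs.pl. It swaps in a different technique: rather than summing file sizes package by package, it assumes the listing comes from `rpm --query --all --queryformat '%{SIZE} %{NAME}-%{VERSION}-%{RELEASE}\n'`, where `%{SIZE}` is the size recorded in the package header. The function names `parse_sizes` and `format_report` are hypothetical.

```python
def parse_sizes(listing):
    """Turn 'SIZE NAME' lines into a {name: size} dictionary."""
    sizes = {}
    for line in listing.splitlines():
        size, _, name = line.strip().partition(" ")
        if size.isdigit() and name:
            # Accumulate in case the same package label appears twice.
            sizes[name] = sizes.get(name, 0) + int(size)
    return sizes


def format_report(sizes):
    """Return report lines: largest package first, then a grand total."""
    # Sort by size descending; break ties by name so the order is stable.
    ranked = sorted(sizes.items(), key=lambda item: (-item[1], item[0]))
    total = sum(sizes.values())
    # Sizes are non-negative, so the total is the widest figure.
    width = len(f"{total:,}")
    lines = [f"{size:>{width},}  {name}" for name, size in ranked]
    lines.append(f"{total:>{width},}  Total")
    return lines
```

Feeding the rpm output through `parse_sizes` and printing each line of `format_report` gives a right-aligned, comma-separated listing in descending order, followed by a Total line.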
Obviously, you shouldn't go around deleting packages from your system unless you know what you're doing and fully comprehend the consequences of your actions. RPM's dependency management will help you to avoid common pitfalls, but before you delete anything related to the kernel or system, be absolutely sure you're not inviting disaster. As long as a package is an application included on your distribution CD-ROMs, you can always re-install it if you later discover you need it.
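One cautious way to lean on RPM's dependency management is to ask rpm for a dry run before removing anything. The Python sketch below assumes the `--test` option of `rpm --erase`, which goes through the motions without uninstalling. The helper name `erase_would_succeed` and its `run` parameter are invented for illustration, and on some systems the dry run may itself need root privilege.

```python
import subprocess


def erase_would_succeed(package, run=subprocess.run):
    """Dry-run removal of a package; return (ok, messages)."""
    # 'rpm --erase --test' checks dependencies without removing anything
    # and exits non-zero if the removal would fail.
    result = run(
        ["rpm", "--erase", "--test", package],
        capture_output=True,
        text=True,
    )
    # Whatever rpm printed on standard error is passed back to the caller
    # so that any dependency complaints can be read before deciding.
    return result.returncode == 0, result.stderr.strip()
```

A successful dry run only means that no other installed package declares a dependency on the one being removed; it does not tell you whether you will miss the package yourself.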
The programs are supplied as a gzipped tar archive, rpmsize.tar.gz (1.6 Kb). Unpacking it simply extracts the two Perl programs to the current directory. You can modify the formatting of numbers, column sizes, and separators by changing declarations at the top of rpmhogs.pl; the comments explain the options available.
This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided "as is" without express or implied warranty.
- Base64 encoder/decoder
- BLITZ: Remove subscribers from Majordomo lists
- Flashback: Instant backups of directory trees
- Logtail: Watch multiple log files on multiple machines
- MD5 signature computation per RFC 1321
- One-time key or password list generator
- Quoted-Printable encoder/decoder
- WatchFull: Monitor file systems for exhaustion
by John Walker