Unix Utilities

Anagram Finder

Did you know that "aim of blur" is an anagram of "Fourmilab"? Well, now you can find all kinds of cool anagrams yourself, on your own computer, without even connecting to the Internet. Fourmilab's Anagram Finder is a command-line program which finds all the anagrams of a given phrase made up of words from a list of 117972 words legal in the popular crossword game. You can build your own dictionaries from custom word lists and search for anagrams using them. The program is written in C++ and may be built on any system with a modern, compatible compiler such as GCC. A ready-to-run Win32 executable and complete source code are available. Written in the literate programming style, the hyperlinked source code may be read online. New version 1.3 fixes compile problems with GCC versions 3.3 and 3.4, library incompatibilities on Solaris 5.9, and now includes a native 32-bit Windows executable built with Microsoft Visual Studio .NET.
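The idea behind an anagram search can be sketched in a few lines of Python. This is an illustration only, not the program's C++ algorithm: the function names and the tiny word list in the usage note are assumptions. Two phrases are anagrams when their letter counts match, and a multi-word search keeps subtracting candidate words from the letters that remain.

```python
from collections import Counter


def letters(text):
    """Count the alphabetic characters of text, ignoring case and spaces."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def find_anagrams(phrase, words):
    """Return every combination of words using exactly the letters of phrase."""
    candidates = sorted({w.lower() for w in words})
    counts = {w: letters(w) for w in candidates}
    results = []

    def search(remaining, start, chosen):
        if not remaining:                      # every letter has been used
            results.append(list(chosen))
            return
        for i in range(start, len(candidates)):
            need = counts[candidates[i]]
            # Skip words with no letters, and words that do not fit.
            if need and all(remaining[c] >= n for c, n in need.items()):
                chosen.append(candidates[i])
                search(remaining - need, i, chosen)   # i, not i + 1: a word may repeat
                chosen.pop()

    search(letters(phrase), 0, [])
    return results
```

Calling find_anagrams("Fourmilab", ["aim", "of", "blur", "zebra"]) returns the single combination aim, blur, of. A real dictionary of more than a hundred thousand words would want pruning far cleverer than this exhaustive recursion.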

Annoyance Filter

The junk E-mail plague just seems to get worse and worse. Annoyance Filter is a tool which permits individuals to immunise themselves against time-wasting and offensive unsolicited E-mail. It doesn't kill the rats or fleas, but at least it keeps the pathogen they propagate from ever affronting your eyes. Annoyance Filter is an adaptive, trainable E-mail filtering program. You train it to distinguish mail you're interested in reading from junk you don't wish to receive by supplying it with a collection of each kind of mail--the more the better. Then, when a new message arrives, its content is compared with statistical profiles of legitimate mail and junk computed from the training collections, and the probability of the message's being junk is estimated. If the junk probability exceeds a given threshold, you may wish to automatically discard the message or quarantine it in a containment facility for later review. Junk mail evolves "protective colouration" over time to evade conventional filters, but this is of no avail against Annoyance Filter, which can be re-trained at any time with contemporary mail and junk collections.
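To make the train-then-score cycle concrete, here is a minimal Python sketch. It uses naive Bayes-style word scoring with add-one smoothing, which is an assumption on my part: the description above does not say which statistical method Annoyance Filter uses. The class and function names are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split a message into lower-case word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class Profile:
    """Word statistics for two training collections: 'mail' and 'junk'."""

    def __init__(self):
        self.words = {"mail": Counter(), "junk": Counter()}
        self.messages = {"mail": 0, "junk": 0}

    def train(self, kind, text):
        """Add one message to the 'mail' or 'junk' collection."""
        self.words[kind].update(tokens(text))
        self.messages[kind] += 1

    def junk_probability(self, text):
        """Estimate the probability (0 to 1) that text is junk."""
        vocab = len(set(self.words["mail"]) | set(self.words["junk"]))
        total = {k: sum(c.values()) for k, c in self.words.items()}
        # Start from the prior odds, then add each token's log likelihood ratio.
        log_odds = math.log((self.messages["junk"] + 1) / (self.messages["mail"] + 1))
        for tok in tokens(text):
            p_junk = (self.words["junk"][tok] + 1) / (total["junk"] + vocab + 1)
            p_mail = (self.words["mail"][tok] + 1) / (total["mail"] + vocab + 1)
            log_odds += math.log(p_junk / p_mail)
        log_odds = max(-700.0, min(700.0, log_odds))   # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A caller would train the profile with a collection of each kind of mail, then discard or quarantine any new message whose junk_probability exceeds a chosen threshold such as 0.9. Re-training is simply a matter of building a fresh Profile from contemporary collections.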

Annoyance Filter is written in standard C++ and can be installed on any system with a suitable compiler. On Unix systems, Annoyance Filter can work in conjunction with Procmail. There is nothing Unix- or Procmail-specific about Annoyance Filter; it can be integrated into any mail processing system which permits an external program to filter incoming mail. Developers familiar with Windows, Macintosh, and other proprietary mail clients are encouraged to adapt the program for use with them and contribute their work to this effort.

New release 1.0d builds with GCC 3.4 as well as earlier versions, and adds the ability to parse deliberately mis-formatted MIME headers some junk mail uses to avoid filters.

BASE64

Portable C program which encodes and decodes files in MIME "Base64" encoding; this comes in handy when developing E-mail and Web servers which accept and deliver embedded binary files. New January 2001 update adds support for EBCDIC systems and 32-bit Microsoft Windows platforms and includes a ready-to-run Windows executable.
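For readers who want to see the encoding at work without building the C program, Python's standard base64 module does the same job. The wrapper names below are hypothetical; encodebytes and decodebytes are the library calls, and encodebytes breaks its output into lines as MIME expects.

```python
import base64


def encode_mime(data):
    """Encode bytes as Base64 text, broken into lines as MIME expects."""
    return base64.encodebytes(data)


def decode_mime(text):
    """Decode Base64 text (line breaks allowed) back to the original bytes."""
    return base64.decodebytes(text)
```

Every three input bytes become four printable characters, so the three bytes of "Man" encode as "TWFu", and an encoded file is roughly a third larger than the original.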

BLITZ

Perl utility used in conjunction with the Majordomo mailing list manager to remove bounced addresses from mailing lists.

Bulk Validator

The World Wide Web Consortium's Markup Validation Service is an essential resource for Web developers who wish to create standards-compliant documents. When you're generating a collection of documents automatically or have an existing Web tree to check, submitting each document individually for validation can become tedious. BulkValidator is a Perl program which automatically submits all the documents in a directory or directory tree for validation, reports a summary of the results, and collects error reports for any documents which failed validation for subsequent examination.

Codegroup

Utility which encodes and decodes binary files into five-letter code groups just like secret agents use. Handy for sending small binary messages by telephone, radio, or telegraph.
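The flavour of the thing can be sketched in Python. The scheme below is an assumption made for illustration and is almost certainly not Codegroup's own encoding: each byte becomes two letters from A to P (one per four bits), the letters are cut into groups of five, and the last group is padded with Z.

```python
FILLER = "Z"  # assumed padding letter, outside the A-P alphabet used for data


def encode_groups(data):
    """Encode bytes as space-separated five-letter groups."""
    text = "".join(chr(65 + (b >> 4)) + chr(65 + (b & 15)) for b in data)
    text += FILLER * (-len(text) % 5)          # pad the last group to five letters
    return " ".join(text[i:i + 5] for i in range(0, len(text), 5))


def decode_groups(groups):
    """Recover the original bytes from text produced by encode_groups."""
    # Keep only data letters; spaces, line breaks, and filler are dropped.
    text = "".join(ch for ch in groups.upper() if "A" <= ch <= "P")
    return bytes(((ord(text[i]) - 65) << 4) | (ord(text[i + 1]) - 65)
                 for i in range(0, len(text) - 1, 2))
```

With this scheme the two bytes of "Hi" become the single group EIGJZ, which is easy enough to read aloud over a telephone line.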

Demoroniser

Using Microsoft applications to create Web pages runs the risk of publishing pages which appear to be the product of a semi-literate moron when read on non-Microsoft platforms. Demoroniser corrects a number of errors and gratuitous incompatibilities in Microsoft-generated HTML, both generic and specific to PowerPoint.

Dictool

OpenWindows tool for accessing the "Languages of the World" CD-ROM of language dictionaries. Provides pop-up language dictionary tools with an inverted index that allows near-instantaneous access to dictionaries on the CD-ROM.

ePowerSwitch Control for Perl

The ePowerSwitch is an Internet-enabled outlet strip--no, really! Its 10Base-T network interface and built-in mini Web server allow turning power on and off independently to each of its four outlets and querying their status over the network. While control from a Web browser is fine for interactive use, in many applications you'd like to be able to switch devices on and off under program control. This directory provides an object-oriented Perl module which permits control of an ePowerSwitch from Perl programs, plus a command line utility suitable for interactive use or within shell scripts.

ETSET

C++ program which automatically translates electronic texts prepared in the format used at this site into LaTeX for typeset output, HTML for Web publication, Palm Markup Language for handhelds, or 7-bit ASCII for readers unable to display 8-bit ISO characters. Includes tools for editors to produce and validate electronic books. Source code for Unix systems and a ready-to-run 32-bit Windows executable are available.

Floating Point Benchmarks

There are many disadvantages to being a balding geezer. In compensation, if you've managed to survive the second half of the twentieth century and been involved in computing, there's bearing personal witness to what happens when a technological transition goes into full-tilt exponential blow-off mode. I'm talking about Moore's Law—computing power available at constant cost doubling every 18 months or so. When Moore's Law is directly wired to your career and bank account, it's nice to have a little thermometer you can use to see how it's going as the years roll by. This page links to two benchmarks I've used to evaluate computer performance ever since 1980. They focus on things which matter dearly to me—floating point computation speed, evaluation of trigonometric functions, and matrix algebra. If you're interested in text searching or database retrieval speed, you should run screaming from these benchmarks. Hey, they work for me.

New September 2012 update adds Haskell to the C, FORTRAN, QBasic, Ada, Common Lisp, Java, JavaScript, Pascal, Perl, Python, Ruby, Smalltalk, and Visual Basic (6 and .NET) implementations of the original floating point benchmark, and includes a comparison of the relative performance of these languages.

FeedbackForm

When you publish something on the Web, it's great to receive thoughtful feedback from readers; there's no faster or better way to find and fix everything from typos to yawning logical chasms in your argumentation than to submit your work to the global peer review which the Internet enables. What's not great are the consequences of publishing your E-mail address on a Web page to invite such feedback. Should you be so unwise, you'll quickly discover what it's like to leave your door unlocked in today's Internet slum. FeedbackForm is a CGI application, written in the Perl language, which allows visitors to your Web pages to send feedback identifying the page from which it was sent, without ever disclosing your E-mail address. To avoid ruminations of robots and memos from morons, submitters are asked to solve a simple linear equation in order to have their feedback transmitted. Trusted correspondents can be added to a white list, and may send feedback without solving a problem. You can configure the difficulty of the problem the user must solve (or disable it entirely, if you dare).
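The equation challenge might look something like the following Python sketch. FeedbackForm itself is written in Perl, and the function names, the form of the equation, and the meaning of the difficulty setting here are all assumptions made for illustration.

```python
import random


def make_challenge(difficulty=10, rng=random):
    """Return (question, answer) for an equation a*x + b = c with integer x.

    difficulty (assumed to be 2 or more) bounds the size of the numbers used.
    """
    a = rng.randint(2, difficulty)
    x = rng.randint(1, difficulty)
    b = rng.randint(1, difficulty)
    question = f"Solve for x: {a}x + {b} = {a * x + b}"
    return question, x


def check_answer(submitted, answer):
    """Accept the feedback only if the submitted text is the right integer."""
    try:
        return int(submitted.strip()) == answer
    except ValueError:
        return False
```

The server would keep the answer on its side (or sign it into the form) and call check_answer when the feedback is submitted; a white-listed correspondent simply bypasses the check.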

FIST

Nerds weren't held in the highest esteem in the tempestuous times of the late sixties, but if you had access to a mainframe computer with a fast line printer, a great way to make new friends and meet radical chicks was cranking out banners for the cause du jour on the graveyard shift when the Man wasn't looking. The FIST program traces its lineage directly back to a program I punched onto Hollerith cards for a UNIVAC 1108 in September 1969. It prints banners with a clenched fist and block-letter slogan of your choice. Various silly options let you choose a right- or left-handed fist according to your political persuasion and to adjust the size of the fist commensurate with the vehemence of your convictions and your printer's paper size. In the spirit of Donald E. Knuth's most excellent mise à jour of the Adventure game, this version is presented as a literate program in the CWEB language; C source code and a ready to run Win32 executable are included.

Flashback

Ever lost a day's worth of editing on Unix by fat-fingering something like "rm * .o"? Flashback makes a gzipped tar snapshot of the directory you're working in (and any subdirectories) to a common backup directory from which you can restore clobbered files when needed. Flashback reports the size of the backup directory after adding the new backup so you'll know when it's time to clean up old backups.
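The snapshot-and-report cycle is simple enough to sketch in Python with the standard tarfile module. This is an illustration rather than Flashback's own code; the function names and the archive naming scheme are assumptions, and the backup directory is assumed to lie outside the directory being saved.

```python
import os
import tarfile
import time


def snapshot(work_dir, backup_dir):
    """Write a gzipped tar of work_dir into backup_dir; return the archive path."""
    os.makedirs(backup_dir, exist_ok=True)
    base = os.path.basename(os.path.abspath(work_dir))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(backup_dir, f"{base}-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(work_dir, arcname=base)   # recurses into subdirectories
    return archive


def backup_size(backup_dir):
    """Total bytes used by the files directly inside backup_dir."""
    return sum(entry.stat().st_size
               for entry in os.scandir(backup_dir) if entry.is_file())
```

Calling snapshot before a risky clean-up, then printing backup_size, gives the same "save, then tell me how much space the backups take" behaviour described above.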

FTP Report

This program allows you to convert transfer log files from wu-ftpd into either HTTP daemon common log file format or a Comma Separated Value (CSV) database suitable for analysis with Microsoft Excel or other packages which accept this format. A manual page and Makefile are included.

Logtail

Perl utility which allows a system administrator to watch, in real time, items added to any number of log files on any number of hosts on a network.

MD5

Command-line utility which computes and checks message digests (digital signatures) generated by the MD5 algorithm as defined by RFC 1321. This program is handy for software installation, file verification, and other system administration shell scripts and Perl programs. Includes complete C source code for Unix and a ready to run Win32 executable. New version 2.0 adds multiple file signature generation (including wildcard expansion in the Win32 version), tagging signatures with file names, and optional lower case letters in hexadecimal output.
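The compute-and-check cycle can be tried from Python with the standard hashlib module. The helper names below are hypothetical; hashlib.md5 is the library call, and hexdigest returns lower-case hexadecimal.

```python
import hashlib


def md5_file(path, chunk_size=65536):
    """Return the MD5 digest of a file as lower-case hexadecimal."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, expected):
    """True if the file's digest matches expected (upper or lower case)."""
    return md5_file(path) == expected.strip().lower()
```

RFC 1321's own test suite gives 900150983cd24fb0d6963f7d28e17f72 as the digest of the three characters "abc", which makes a handy sanity check for any implementation.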

MIDICSV: Translate MIDI Music Files to and from CSV

MIDI music files are a simple and elegant representation of musical compositions, but are stored in a somewhat arcane binary format which is difficult to process without specialised libraries. MIDICSV includes two utilities, midicsv and csvmidi, which inter-convert MIDI files and Comma-Separated Value (CSV) files preserving all information. CSV representations of MIDI files may be loaded into spreadsheets and database programs, and can be easily processed with text processing languages such as Perl and Python. A variety of examples, written in Perl, illustrate generation and transformation of MIDI music files in CSV format. Complete source code in portable ANSI C and ready-to-run WIN32 executables are available.

Moontool

Displays an icon with the current phase of the Moon on an OpenLook or SunView screen. When opened, up-to-date information is displayed about the Sun and Moon.

NETPBM Utilities

Additions to the NETPBM package. Currently contains ppm.shar.gz, which (extracted into the ppm subdirectory) contains additions to the ppmdraw library to implement drawing ASCII text in a pixmap, using a stroke font which can be scaled and rotated. There's a test/demo program included, as well as two new PPM applications, cietoppm and ppmlabel. Cietoppm makes a portable pixmap containing a plot of the CIE "tongue" diagram, optionally showing the colour gamut of various display systems (NTSC, PAL/SECAM, SMPTE, HDTV, etc.). Ppmlabel draws ASCII text, specified either on the command line or from a file, into a portable pixmap. Both cietoppm and ppmlabel require the text drawing extensions to ppmdraw. The archive pnm.shar.gz contains a new PNM filter, pnmhisteq, which performs contrast enhancement through the technique of histogram equalisation. See the README file for details.
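Histogram equalisation itself fits in a dozen lines. The Python sketch below applies the usual textbook formula to a flat list of grey values; it is not pnmhisteq's code, the function name is an assumption, and the pixel list is assumed to be non-empty.

```python
def equalise(pixels, maxval=255):
    """Histogram-equalise a non-empty list of grey values in the range 0..maxval."""
    hist = [0] * (maxval + 1)
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: cdf[v] is the number of pixels with value <= v.
    cdf, running = [], 0
    for n in hist:
        running += n
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    span = len(pixels) - cdf_min
    if span == 0:                 # a single grey level: nothing to spread
        return list(pixels)
    return [round((cdf[p] - cdf_min) * maxval / span) for p in pixels]
```

Four pixels bunched at 50, 100, 150, and 200 are spread to 0, 85, 170, and 255, using the whole available range, which is where the contrast enhancement comes from.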

A Perl script which mobilises standard PBMplus/NetPBM utilities to add simulated shadows to bitmap images is available. Visit the pnmshadow page for details, sample images, and download instructions.

Windex is a Perl script which prepares an HTML document containing a graphical index for a collection of image files, with each small thumbnail image linked to the corresponding full-size image.

sbigtopgm is a filter which translates image files produced by the Santa Barbara Instrument Group's astronomical CCD cameras into PBMplus portable graymaps.

Passport Photo Maker creates a ready to print page of passport (or other) photos of a specified size, adjusted for the page size and resolution of a printer, with as many copies of the original photo as will fit on the page.

pnmctrfilt simulates the effect of an optical centre filter on a wide angle lens, allowing the elimination of vignetting in images taken without a centre filter.

Onetime Pad Generators

Utilities for generating one-time key or password pads in a variety of formats. Options allow you to select key length, whether digits or letters are used, and whether alphabetic keys are truly random or obey the digraph statistics of English text (less secure, but easier to remember and transcribe). Both a C language program and a Web page with an embedded generator program in JavaScript are available.
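The uniformly random case can be sketched in Python with the standard secrets module, which draws on the operating system's random source. The function name, its parameters, and the hyphenated grouping are assumptions for illustration; the digraph-statistics option described above is not attempted here.

```python
import secrets
import string


def make_pad(keys=10, length=8, digits=False, group=4):
    """Return a list of random keys drawn from digits or upper-case letters."""
    alphabet = string.digits if digits else string.ascii_uppercase
    pad = []
    for _ in range(keys):
        key = "".join(secrets.choice(alphabet) for _ in range(length))
        # Break each key into hyphen-separated groups for easier transcription.
        pad.append("-".join(key[i:i + group] for i in range(0, length, group)))
    return pad
```

Printing the result of make_pad(keys=20, length=12) one key per line gives a pad ready to be crossed off as each key is used.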

Google™ PageRank™ Query

With search engines directing a substantial fraction of the traffic to many Web sites, the ranking of one's pages and their standing compared to offerings of others is of acute interest to many site operators. This Perl utility allows you to query Google's ranking of any page on the Web, either from the command line or via a Web query form submitted to a CGI application on a Web server. In addition, the utility can display a graphical “page rank meter” which shows visitors the instantaneous ranking of pages on your site. Security features allow restricting access to the Web query facility to keep your site from being submerged by requests by third parties.

PageVisits

Want to add a cool "number of visits" counter to your Web page? Know how to install Common Gateway Interface (CGI) Perl programs and Netpbm image processing utilities? Well, here's your Web counter! A simple Perl program, in conjunction with the Netpbm toolkit, allows you to add counters to pages on your Web site, using a variety of pre-defined fonts, and even create your own fonts for innovative pages which demand them.

Passport Photo Maker

Perl script for Unix systems which uses Netpbm utilities to create a ready to print page of passport (or other) photos of a specified size, adjusted for the page size and resolution of a printer, with as many copies of the original photo as will fit on the page.

pnmctrfilt: Centre filter for the Netpbm Image Toolkit

Wide angle lenses for large- and medium-format cameras often vignette the image--the film is not uniformly exposed, but instead illumination falls off away from the optical axis of the lens, underexposing the edges and corners. Traditionally, optical "centre filters" have been used to compensate for vignetting, but they require additional exposure time which may be impractical. Pnmctrfilt is an addition to the Netpbm image processing toolkit which simulates the effect of a centre filter in an image exposed without one. Options allow emulating a wide variety of optical filters.

QPRINT

Portable C program which encodes and decodes files in RFC 1521 MIME "Quoted-Printable" encoding. You can use this when developing E-mail and Web servers which accept and deliver text files containing characters not present in the 7-bit ASCII printable set.
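Python's standard quopri module implements the same encoding and is a quick way to see what it does. The wrapper names are hypothetical; encodestring and decodestring are the library calls.

```python
import quopri


def encode_qp(data):
    """Encode bytes as Quoted-Printable: unsafe bytes become =XX escapes."""
    return quopri.encodestring(data)


def decode_qp(text):
    """Decode Quoted-Printable text back to the original bytes."""
    return quopri.decodestring(text)
```

Printable ASCII passes through untouched, so the ISO 8859-1 bytes for "café" come out as caf=E9, still readable by a human squinting at the raw message.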

Redirex

Isn't it irritating when you change the IP address of a Web server and discover that, months later, requests still rain in from search engines and other sites which have linked to the absolute IP address rather than your site name? Redirex is a small, lightweight Perl server which intercepts HTTP requests sent to the old address and redirects them to the site's new home. New version 2.0 (July 2004) is compatible with Perl 5.6 and above in "strict" mode, easier to configure, and permits control over caching of server redirections.

Random Sequence Tester

A program for the analysis (not generation) of random and pseudorandom sequences. A variety of tests, including many from Knuth, are applied to the contents of a file and the results reported on standard output. In portable C; public domain. October 1998 update adds frequency histogram display, optional analysis of input as a bitstream, CSV output for postprocessing by other programs, and improved HTML documentation.
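Two of the simpler measures one might apply to a file of supposedly random bytes are sketched below in Python. They are offered as examples of the kind of test involved, not as the program's own code; the function names are assumptions, and the input is assumed to be non-empty.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte distribution; 8.0 is the maximum."""
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def chi_square(data):
    """Chi-square statistic of byte frequencies against a uniform distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))
```

A file in which every byte value occurs equally often scores the full 8 bits per byte and a chi-square of zero, which is itself suspicious: genuinely random data wanders a little from perfect uniformity.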

RPMSIZE

"Yikes! Who ate my hard drive?", you exclaim, having installed a Linux distribution update and discovered your 4 Gb root partition is now 98% full. RPMSIZE provides two Perl programs to compute the size of files belonging to a given RPM software package, and prepare a list of all packages installed on a system and their sizes, sorted by size, to point out promising candidates for banishment.

Setbase

When making a few changes to a program's source code, for example, to get an Open Source program to build on your system, it's easy to forget what you've changed. You can use a source code change control system such as RCS or CVS, but that's a lot of baggage when you're just making a few changes here and there. Setbase keeps track of files you've changed using standard Unix utilities so when you go to install a newer version, you needn't start from scratch and fix the same things all over again.

Settime

A Unix utility which, run by root as a cron job, calls the National Institute of Standards and Technology in Boulder, Colorado in the U.S. and sets the system time from their bank of atomic clocks. This was written for a Sun running SunOS. Due to the variety of schemes for port locking, serial I/O ioctl's, etc. you may have some difficulty getting this to work on your system. Note: if you're connected to the Internet, you don't need this program; you can set your system's clock directly across the net using the daytime utilities available from the NIST.

Splits

Dumb little utility which splits a binary file into pieces which you can reassemble with cat or COPY /B. Yes, I know there are 100 other programs which do this, but the system I was using when I wrote it didn't seem to have any of them available.
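Make that 101 programs: here is the same dumb little idea as a Python sketch. The numbered-suffix naming of the pieces is an assumption, not necessarily what Splits produces.

```python
def split_file(path, piece_size):
    """Write path.001, path.002, ... each holding at most piece_size bytes."""
    pieces = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(piece_size)
            if not chunk:
                break
            name = f"{path}.{len(pieces) + 1:03d}"
            with open(name, "wb") as out:
                out.write(chunk)
            pieces.append(name)
    return pieces


def join_files(pieces, dest):
    """Concatenate pieces in order, as cat or COPY /B would."""
    with open(dest, "wb") as out:
        for name in pieces:
            with open(name, "rb") as src:
                out.write(src.read())
```

Because the pieces are plain slices of the original with nothing added, join_files is optional: cat on Unix or COPY /B on Windows reassembles them just as well.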

TeXtoGIF

Perl program which transforms equations written in LaTeX into PNG or GIF files suitable for inclusion as inline images in Web HTML documents. New version 1.1 (November 2003) works with Ghostscript 6.52, generates PNG files as well as GIF, and allows command line specification of background grey scale shade.

UNUM: Unicode/HTML/Numeric Character Code Converter

Web authors who use characters from other languages, mathematical symbols, fancy punctuation, and other typographic embellishment in their documents often find themselves juggling the Unicode book, an HTML entity reference, and a programmer's calculator to convert back and forth between the various representations. This stand-alone command line Perl program contains complete databases of Unicode characters and character blocks and HTML/XHTML character entities, and permits easy lookup and interconversion among all the formats, including octal, decimal, and hexadecimal numbers. The program works best on a recent version of Perl, such as v5.8.5 or later, but requires no Perl library modules.
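The flavour of the interconversion can be had in Python from two standard library tables: unicodedata for character names and html.entities for named HTML entities. The function name and the shape of its result are assumptions for illustration, and these tables are not the databases built into the Perl program.

```python
import unicodedata
from html.entities import codepoint2name


def describe(char):
    """Return the numeric, Unicode name, and HTML forms of one character."""
    code = ord(char)
    entity = codepoint2name.get(code)
    return {
        "decimal": code,
        "hex": f"0x{code:X}",
        "octal": f"0{code:o}",
        "name": unicodedata.name(char, "<unnamed>"),
        # Prefer a named entity; fall back to a numeric character reference.
        "html": f"&{entity};" if entity else f"&#{code};",
    }
```

For the e with acute accent this reports decimal 233, hexadecimal 0xE9, octal 0351, the name LATIN SMALL LETTER E WITH ACUTE, and the entity &eacute;, which is the juggling act described above reduced to one call.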

Valve

Valve is a Unix pipeline component and stand-alone copy utility which permits limiting the transfer rate (bytes per second) to a specified value. This allows bulk data transfers to be performed without monopolising disc and/or network bandwidth to the detriment of other users and applications, for example when backing up a large filesystem across the Internet to a server located at a remote site. You can think of valve as a nice command for I/O. Valve is written in portable ANSI C using the Literate Programming methodology.
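The rate-limiting idea can be sketched in Python: after each block is written, work out when that many bytes are allowed to have gone through, and sleep until then. This illustrates the principle only and is not Valve's C code; the function name and block size are assumptions.

```python
import time


def throttled_copy(src, dst, bytes_per_second, block_size=4096):
    """Copy src to dst, sleeping so the average rate stays at or below the limit."""
    start = time.monotonic()
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        copied += len(block)
        # Time at which this many bytes is allowed to have been transferred.
        due = start + copied / bytes_per_second
        delay = due - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return copied
```

Used as a pipeline component, it would be called as throttled_copy(sys.stdin.buffer, sys.stdout.buffer, 100000) to hold a backup stream to about 100 kilobytes per second. Pacing against the start time, rather than sleeping a fixed amount per block, keeps the long-run average on target even when individual reads stall.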

WatchFull

Don't you just hate it when your pager goes off in the middle of the night due to a file system full disaster on a mission-critical system you administer? WatchFull keeps an eye on file systems and notifies you by E-mail when one or more file systems are dangerously close to capacity. Included in the distribution are LogJam, which monitors system and application log files, warning when they need to be cycled to recover space, and Top40, which scans one or more file systems and fingers the largest files: candidates for clean-up. All of these utilities are Perl programs compatible with Perl 4.036 and 5.003 and usable with most versions of Unix.

Windex

Perl script for Unix systems which uses Netpbm and JPEG utilities to prepare an HTML document containing a graphical index for a collection of image files, with each small thumbnail image linked to the corresponding full-size image. Update: Version 1.1 (July 2004) fixes compatibility with current releases of image processing utilities, adds support for PNG image format, permits specification of the title of the HTML image document, and allows opening images in an auxiliary browser window.

XD

Extended file dump and load utility for Unix and 32-bit Windows. Lets you dump a file in hex, decimal, or octal (with optional side-by-side ASCII/ISO-8859), then use whatever text editor you like to edit the data, even inserting and/or deleting bytes, then reload the edited dump to create a modified binary file. No need to learn a different editor to edit binary files!
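The dump, edit, reload cycle is easy to illustrate in Python. The dump layout below (an eight-digit offset, a colon, then hexadecimal bytes) is an assumption made for the sketch and is not XD's own format; the loader ignores the offsets, which is what makes inserting and deleting bytes in a text editor harmless.

```python
def dump(data, width=16):
    """Return a hex dump: an offset, then up to `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        lines.append(f"{offset:08X}: " + " ".join(f"{b:02X}" for b in row))
    return "\n".join(lines)


def load(text):
    """Rebuild bytes from dump text, reading only what follows each colon."""
    out = bytearray()
    for line in text.splitlines():
        _, _, body = line.partition(":")   # lines without a colon are skipped
        out.extend(int(tok, 16) for tok in body.split())
    return bytes(out)
```

Dumping the three bytes of "Hi!" gives the line 00000000: 48 69 21; appending another 21 to that line in an editor and reloading yields "Hi!!".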

Xsunclock

OpenWindows utility which shows you, as an icon or resizeable window, the portions of the Earth currently in day and night. In conjunction with the "Two Line Elements" posted to the space newsgroups periodically or obtained by FTP, lets you track Earth satellites, with the current satellite position shown on the map.