Once upon a time, all the characters you needed when writing documents and programs were right there on the keyboard (at least if you're too young to have used a keypunch and think 11−6−8 every time you see a semicolon), but that was then and this is the globalised, technology-saturated twenty-first century, with flying cars, fusion power generators, holidays on the Moon, and instantaneous worldwide communication…oh, wait…well, the last one anyway. When writing for a wired world, one often needs characters from languages other than one's own, mathematical and scientific symbols, fancy punctuation, printer's ornaments, and other embellishments which give online documents that professional polish.
Usually, this has meant hauling out that monster (1500 page, 3.4 kg) Unicode book, searching for the character that's required, then converting its hexadecimal character code to a decimal or symbolic HTML “character entity”, which means firing up the programmer's calculator and/or another table lookup. Enter unum, a stand-alone utility program written in portable Perl which allows you to look up Unicode and HTML characters by name or number, and interconvert numbers in decimal, hexadecimal, and octal bases.
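The manual conversion described above (hexadecimal code from the book, then decimal value, then HTML character entity) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how unum itself is implemented; the function name html_entity is made up for the example, and the entity names come from Python's standard html.entities table rather than from unum's own tables.

```python
from html.entities import codepoint2name


def html_entity(code_point):
    """Return the named HTML entity if one exists, else the decimal numeric form.

    Hypothetical helper for illustration; not part of unum.
    """
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


if __name__ == "__main__":
    # Suppose the book lists the copyright sign as U+00A9.
    code_point = int("00A9", 16)      # hexadecimal text to an integer
    print(code_point)                 # the decimal value used in numeric entities
    print(html_entity(code_point))    # named entity when the table has one
```

Characters with no named entity fall back to the decimal numeric form, which is the case for most of Unicode.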
Here are some everyday questions you can easily answer with unum.
Browser and system font support for Unicode is spotty at the present time. Some of the characters in the following examples may display as square boxes, blanks, question marks, or whatever other symbol your system uses for characters it cannot properly display.
The unum utility may be downloaded from the following link:
unum.tar.gz: Gzipped TAR archive (112 Kb)
The archive contains a single file, unum.pl, which is the self-contained Perl source code for unum. You may wish to rename it unum and install it in a directory on your PATH, but it may be run from any location. The program uses no modules from the Perl library: there are no prerequisites other than Perl itself.
For full functionality, including the ability to display Unicode characters and accept them on the command line, unum requires a release of Perl contemporary with its early 2006 release date. The program has been tested on Perl 5.8.5; if you run it on an earlier version, you may get an error indicating that the "-CA" option in the first line of the program is not implemented. If this specification is removed, the program will still work, but you won't be able to specify Unicode characters on the command line. Unicode command line arguments and terminal output require an operating system and shell which support these features, and may not work on your system: for example, everything works fine with Perl 5.8.5 on Fedora Core 3 Linux, but neither Unicode arguments nor Unicode output from shell programs are implemented on Red Hat Enterprise Linux 3. You may have to change the character encoding in your terminal program to permit Unicode input and output; with Gnome Terminal, for example, select the Terminal / Character Encoding / Unicode menu item. The character name and number lookup facilities will work on any system with a vaguely recent version of Perl.
All prior releases remain available.
You can print the following documentation directly from the unum.pl program with the command "perldoc unum.pl". Calling the program with no arguments will print a short summary of argument formats.
unum — Interconvert numbers, Unicode, and HTML/XHTML characters
The unum program is a command line utility which allows you to convert decimal, octal, hexadecimal, and binary numbers; Unicode character and block names; and HTML/XHTML character entity names into one another. It can be used as an on-line special character reference for Web authors.
The command line may contain any number of the following forms of argument.
0x. Letters may be upper or lower case, but the x must be lower case.
b=greek lists all Greek character blocks in Unicode.
c= may be omitted.
Gothic block and the 32 characters it contains.
n=telephone finds the five Unicode characters for telephone symbols.
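To give a feel for what a name search such as n=telephone does, here is a rough Python sketch that scans every code point and keeps those whose Unicode name matches a regular expression. It is not unum's code: the function name find_by_name is invented for the example, case-insensitive matching is an assumption, and it reads the names from Python's unicodedata module rather than from the Unicode 4.1 tables built into unum. A newer Unicode database will therefore report more than five telephone characters.

```python
import re
import sys
import unicodedata


def find_by_name(pattern):
    """Return (code point, name) pairs whose Unicode name matches pattern.

    Hypothetical helper; matching is case-insensitive by assumption.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            hits.append((cp, name))
    return hits


if __name__ == "__main__":
    for cp, name in find_by_name("telephone"):
        print(f"U+{cp:04X}  {name}")
```

As with unum itself, a pattern containing characters the shell would interpret must be quoted when passed on a command line.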
For number or character arguments, the value(s) are listed in all of the input formats, save binary.
    Octal  Decimal      Hex        HTML    Character   Unicode
      046       38     0x26       &amp;    "&"         AMPERSAND
If the terminal font cannot display the character being listed, the "Character" field will contain whatever default is shown in such circumstances. Control characters are shown as a Perl hexadecimal escape.
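The listing format described above can be approximated with the following Python sketch, which builds one row for a given code point. It is an illustration under stated assumptions, not unum's implementation: the function name describe and the column widths are invented, the names and entities come from Python's standard library, and the \x{...} form used for control characters is only a guess at what "Perl hexadecimal escape" looks like in unum's output.

```python
import unicodedata
from html.entities import codepoint2name


def describe(code_point):
    """Format one listing row: octal, decimal, hex, HTML, character, name.

    Hypothetical helper; column widths are arbitrary.
    """
    ch = chr(code_point)
    entity = codepoint2name.get(code_point)
    html = f"&{entity};" if entity else f"&#{code_point};"
    # Non-printable characters are shown as an escape (assumed \x{..} form).
    shown = ch if ch.isprintable() else f"\\x{{{code_point:02X}}}"
    # Control characters have no name in unicodedata, so use a placeholder.
    name = unicodedata.name(ch, "<unnamed>")
    octal = f"0{code_point:o}"
    hexa = f"0x{code_point:X}"
    return f'{octal:>8} {code_point:>8} {hexa:>8} {html:>11}    "{shown}"    {name}'


if __name__ == "__main__":
    print(describe(38))      # the ampersand row
    print(describe(0x1B))    # a control character, shown as an escape
```

Unlike this sketch, unum annotates control characters with their Unicode abbreviations and names, as noted later in this document.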
Unicode blocks are listed as follows:
    Start        End        Unicode Block
    U+2460   -   U+24FF     Enclosed Alphanumerics
    U+1D400  -   U+1D7FF    Mathematical Alphanumeric Symbols
This is unum version 1.1, released on February 11th, 2006.
Specification of Unicode characters on the command line requires an operating system and shell which support that feature and a version of Perl with the -CA command line option (v5.8.5 has it, but v5.8.0 does not; I don't know in which intermediate release it was introduced). If your version of Perl does not implement this switch, you'll have to remove it from the #! statement at the top of the program, and Unicode characters on the command line will not be interpreted correctly.
If you specify a regular expression, be sure to quote the argument if it contains any characters the shell would otherwise interpret.
Please report any bugs to firstname.lastname@example.org.
This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The Unicode character tables are derived from the Unicode::CharName module:
Copyright © 1997, 2005 Gisle Aas.
The Unicode::CharName library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Name table extracted from the Unicode 4.1 Character Database. Copyright © 1991–2005 Unicode, Inc. All Rights reserved.
The original Unicode::CharName module may be found at:
The control characters in this unum version have been annotated with their Unicode abbreviations, names, and for U+0000 to U+001F, the Ctrl-letter code which generates them.
This document is in the public domain.