Monday, October 22, 2018

ISBNiser 1.4 Update Released

I have just posted version 1.4 of ISBNiser, a utility for validating ISBN publication numbers in the ISBN-13 and ISBN-10 formats, converting between the formats, and generating Amazon associate links to purchase items with credit to a specified account.

Version 1.3 added the ability to automatically parse ISBNs and insert delimiters among the elements (unique country code [ISBN-13 only], registration group, registrant, publication, and checksum). Given an ISBN with no delimiters, for example “9781481487658”, this lets you obtain the ISBN-13 and ISBN-10 with proper delimiters:

$ isbniser 9781481487658
ISBN-13: 978-1-4814-8765-8  9781481487658   ISBN-10: 1481487655  1-4814-8765-5

The rules for parsing ISBNs are beyond baroque. The international bureaucrats who created the scheme first defined a series of “registration groups”, which identify the ISBN by language (for example, 0 and 1 for English and 3 for German), country (7 for China, 987 for Argentina), region (982 for South Pacific, 976 for the Caribbean Community), former country (5 for the Soviet Union, 80 for Czechoslovakia), part of a country (962 for Hong Kong), non-country (9950 for Palestine), or un-country (92 for International NGO Publishers and EU Organizations). Within each registration group, it's up to those administering it to decide how the registrant (publisher) and publication fields are parsed from the balance of the ISBN. The checksum is always the final character, but is computed by entirely different algorithms for ISBN-10 and ISBN-13.
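
For the curious, the two checksum rules can be sketched in a few lines of Python. This illustrates the standard check digit definitions (modulus 11 for ISBN-10, modulus 10 for ISBN-13) and is not an excerpt from ISBNiser, which is written in Perl; the function names are made up for this example.

```python
def isbn10_check(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10."""
    # Weights run 10, 9, ..., 2 from left to right.
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first9))
    value = (11 - total % 11) % 11
    # A value of 10 is written as the letter X.
    return "X" if value == 10 else str(value)


def isbn13_check(first12: str) -> str:
    """Check digit for the first twelve digits of an ISBN-13."""
    # Weights alternate 1, 3, 1, 3, ... from left to right.
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


# The example ISBN from above, without its final character.
print(isbn10_check("148148765"))     # 5
print(isbn13_check("978148148765"))  # 8
```

Because the two schemes use different weights and moduli, the check character of an ISBN-10 cannot simply be carried over when the number is rewritten as an ISBN-13.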

There is no way to cleanly parse the contents of an ISBN with a simple algorithm. Had programmers designed ISBNs, there would be a simple, uniform, left-to-right way to determine field sizes, but what the bureaucrats have left us with is a mess which requires a table exhaustively enumerating each case for every separate registration group, which you have to search to parse the fields of the ISBN.

ISBNiser 1.3 used an algorithm based on JavaScript code employed by the U.S. Library of Congress ISBN Converter to parse and hyphenate ISBNs. Unfortunately, while perhaps “good enough for government work”, that code handles only a small fraction of the universe of ISBNs. For example, try feeding it ISBN 978-952-7303-00-9 for an English-language book published in Finland and watch what happens.

ISBNiser 1.4 replaces this algorithm with a comprehensive search of the official ISBN Range database, downloaded in XML format from the Web site of the International ISBN Agency. It should be able to parse any ISBN issued by an organisation assigned a registration group by that agency. A simple process using tools included in the distribution archive allows updating the program from new versions of the range database.
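
To give a feel for what a range-table search involves, here is a minimal sketch in Python. It is not ISBNiser's Perl code, and the tables are a tiny hand-made stand-in for the real range database: the entries were chosen only so that the two ISBNs mentioned in this post come out hyphenated as shown, and should not be taken as a copy of the agency's data. The sketch assumes each rule maps a range of the seven digits following a prefix to the length of the next field, with no match meaning the ISBN cannot be parsed.

```python
# Hypothetical excerpt: (low, high, field length) rules, keyed by prefix.
# Real rules must come from the official range database.
PREFIX_RULES = {
    "978": [(0, 5999999, 1), (7000000, 7999999, 1), (9500000, 9899999, 3)],
}
GROUP_RULES = {
    "978-1": [(4000000, 5499999, 4)],
    "978-952": [(7000000, 7999999, 4)],
}


def field_length(rules, digits):
    """Length of the next field, or 0 if no rule covers these digits."""
    # Rules are compared against the next seven digits, zero-padded.
    key = int(digits[:7].ljust(7, "0"))
    for low, high, length in rules:
        if low <= key <= high:
            return length
    return 0


def hyphenate(isbn13):
    """Return the hyphenated ISBN-13, or None if the tables do not cover it."""
    prefix, rest = isbn13[:3], isbn13[3:]
    group_len = field_length(PREFIX_RULES.get(prefix, []), rest)
    if group_len == 0:
        return None
    group, rest = rest[:group_len], rest[group_len:]
    registrant_len = field_length(GROUP_RULES.get(f"{prefix}-{group}", []), rest)
    if registrant_len == 0:
        return None
    registrant, rest = rest[:registrant_len], rest[registrant_len:]
    # Whatever remains is the publication field plus the check digit.
    return "-".join([prefix, group, registrant, rest[:-1], rest[-1]])


print(hyphenate("9781481487658"))  # 978-1-4814-8765-8
print(hyphenate("9789527303009"))  # 978-952-7303-00-9
```

The search itself is trivial; all of the complexity lives in the tables, which is why being able to regenerate them from a new release of the range database matters.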

Operation of ISBNiser is unchanged; the only difference is that it can now hyphenate many more ISBNs than before. If you specify the “-g” option, the name of the registration group will be displayed. The output of the “-u” option now includes the publication date of the ISBN range database included in the program.

Installation of ISBNiser is as before. On any system with a base installation of Perl (no optional modules are required), simply place the executable Perl program anywhere on your path. If you wish to rebuild the ISBN Range database from a new release of the XML file from the International ISBN Agency, you will need to have the Perl module XML::Parser installed to run the auxiliary program that creates the database for ISBNiser.

Unrelated to the ISBN parsing changes, an error in checksum computation which could cause incorrect ISBN-13s to be generated from a supplied ISBN-10 has been corrected.
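
As an illustration of where such a conversion can go wrong, here is a short Python sketch of ISBN-10 to ISBN-13 conversion. It is not the corrected ISBNiser code, and the function name is invented for this example. The conversion assumed here is the usual one: drop the old check character, prepend 978, and recompute the check digit with the ISBN-13 rule. One classic pitfall (not necessarily the error fixed in this release) is forgetting the final modulus, which yields “10” instead of “0” when the weighted sum is already a multiple of ten.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an undelimited ISBN-10 to an undelimited ISBN-13."""
    # Drop the ISBN-10 check character and prepend the 978 prefix.
    body = "978" + isbn10[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... from left to right.
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(body))
    # The outer "% 10" maps a result of 10 back to 0.
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("1481487655"))  # 9781481487658
```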

Posted at October 22, 2018 14:06