Tuesday, January 7, 2020

UNUM 3.1: Updated to Unicode 12.1.0, UTF-8 Support Added

Version 3.1 of UNUM is now available for downloading. Version 3.1 incorporates the Unicode 12.1.0 standard, released on May 7th, 2019. Since the Unicode 11.0.0 standard supported by UNUM 3.0, 555 new characters have been added, bringing the total to 137,929 characters. Unicode 12.0.0 added support for 4 new scripts (for a total of 150) and 61 new emoji characters. Unicode 12.1.0 added the single character U+32FF, the Japanese character for the Reiwa era. (In addition to the standard Unicode characters, UNUM also supports an additional 65 ASCII control characters, which are not assigned graphic code points in the Unicode database.)

This is an incremental update to Unicode. There are no structural changes in how characters are defined in the databases, and other than the presence of the new characters, the operation of UNUM is unchanged. There have been no changes to the HTML named character reference standard since the release of UNUM version 2.2 in September 2017, so UNUM 3.1 is identical in this regard.

UNUM 3.1 adds support for the UTF-8 encoding of Unicode, and allows specification of characters as UTF-8 encoded byte streams expressed as numbers, for example:

    $ unum utf8=0xE298A2
       Octal  Decimal      Hex        HTML    Character   Unicode
      023042     9762   0x2622     &#9762;    "☢"         RADIOACTIVE SIGN
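
To make the mapping in this example concrete, here is a minimal Python sketch of how a UTF-8 byte stream written as a single number can be turned back into a code point. It is only an illustration of the encoding, not UNUM's own code, and the function name `utf8_number_to_code_point` is invented for this example.

```python
def utf8_number_to_code_point(value: int) -> int:
    """Decode a UTF-8 byte sequence packed into an integer, e.g. 0xE298A2."""
    # Split the number into big-endian bytes; use at least one byte so 0 maps to NUL.
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    # bytes.decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected the bytes to encode exactly one character")
    return ord(text)


code_point = utf8_number_to_code_point(0xE298A2)
print(f"{code_point} 0x{code_point:04X}")  # 9762 0x2622
```

The three bytes E2 98 A2 carry the payload bits 0010 011000 100010, which is 0x2622, matching the Decimal and Hex columns above.
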

A new --utf8 option displays the UTF-8 encoding of characters as a hexadecimal byte stream:

    $ unum --utf8 h=sum
       Octal  Decimal      Hex        HTML       UTF-8      Character   Unicode
      021021     8721   0x2211 &sum;,&#8721;    0xE28891      "∑"         N-ARY SUMMATION

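
The reverse direction can be sketched in Python as well. The snippet below resolves the HTML entity name `sum` using the standard library's `html.entities` table and formats the character's UTF-8 bytes as a hexadecimal number. It is an illustration under my own naming (`code_point_to_utf8_hex` is a hypothetical helper), not a description of how UNUM does it internally.

```python
from html.entities import name2codepoint


def code_point_to_utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as a string like 0xE28891."""
    encoded = chr(code_point).encode("utf-8")
    return "0x" + encoded.hex().upper()


# Look up the HTML named character reference "sum" (written &sum; in HTML).
sum_code_point = name2codepoint["sum"]
print(sum_code_point, hex(sum_code_point))     # 8721 0x2211
print(code_point_to_utf8_hex(sum_code_point))  # 0xE28891
```

Code points from U+0800 through U+FFFF, such as U+2211, take three bytes in UTF-8, which is why the UTF-8 column shows six hexadecimal digits.
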
UNUM Documentation and Download Page

Posted at January 7, 2020 15:53