uni2ascii

Software Details:
Version: 4.18
Upload Date: 11 May 15
Developer: Bill Poser
Distribution Type: Freeware

uni2ascii and ascii2uni convert between UTF-8 Unicode and any of a variety of 7-bit ASCII equivalents including: hexadecimal and decimal HTML numeric character references, u-escapes, standard hexadecimal, and raw hexadecimal.

Such ASCII equivalents are useful when including Unicode text in program source, when entering text into Web programs that can handle the Unicode character set but are not 8-bit safe, and when debugging.

The Unicode escapes available are:

- HTML hexadecimal numeric character references (e.g. &#x00E9;)
- HTML decimal numeric character references (e.g. &#563;)
- \u-escapes, as used in Python (e.g. \u00E9)
- \u-escapes within the BMP and \U-escapes beyond the BMP (e.g. \u00E9 but \U00010024)
- U+-escapes (e.g. U+00E9)
- U-escapes (e.g. U00E9)
- u-escapes (e.g. u00E9)
- U-escapes within angle brackets (e.g. <U00E9>)
- \x-escapes (e.g. \x00E9)
- \x-escapes with braces (e.g. \x{00E9})
- Standard hexadecimal (e.g. 0x00E9)
- Raw hexadecimal (e.g. 00E9)
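
As an illustration of how these spellings relate to one another, here is a minimal Python sketch that builds several of them for a single character. It is not the uni2ascii implementation: the function name escape_forms, its upper parameter, and the dictionary keys are hypothetical names chosen for this example, and the decimal form is left unpadded as an assumption.

```python
def escape_forms(ch, upper=True):
    """Return several 7-bit ASCII spellings of a single character.

    The dictionary keys are illustrative names, not uni2ascii options.
    """
    cp = ord(ch)
    case = "X" if upper else "x"          # upper- or lower-case hex digits
    digits = format(cp, "04" + case)      # at least four hex digits

    # \u with four digits inside the BMP, \U with eight digits beyond it
    if cp <= 0xFFFF:
        backslash = "\\u" + digits
    else:
        backslash = "\\U" + format(cp, "08" + case)

    return {
        "html_hex": "&#x" + digits + ";",
        "html_dec": "&#" + str(cp) + ";",  # unpadded decimal (assumption)
        "backslash_u": backslash,
        "u_plus": "U+" + digits,
        "angle": "<U" + digits + ">",
        "x_braces": "\\x{" + digits + "}",
        "std_hex": "0x" + digits,
        "raw_hex": digits,
    }


forms = escape_forms("\u00e9")
for name, text in forms.items():
    print(name, text)
```

Passing upper=False yields lower-case digits such as 00e9, which mirrors the choice between A-F and a-f that some consuming programs require.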

uni2ascii accepts a command-line flag that determines whether hexadecimal digits are generated as upper-case A-F or lower-case a-f, since some programs accept only one or the other. ascii2uni accepts either.

By default, uni2ascii converts only characters outside the ASCII range. Even if ASCII characters are also converted, newlines and space characters are preserved unless their conversion is explicitly requested. If space characters are not converted, the three non-ASCII space characters (Ethiopic word space, Ogham space, and ideographic space) are replaced with ASCII space (0x20) so as to keep the output within the 7-bit ASCII range.
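
The default behaviour described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the function name to_ascii_default is hypothetical, the output uses backslash u-escapes purely for illustration, and the three space characters are taken to be U+1361, U+1680, and U+3000.

```python
# Ethiopic wordspace, Ogham space mark, ideographic space (assumed code points)
NON_ASCII_SPACES = {"\u1361", "\u1680", "\u3000"}


def to_ascii_default(text):
    """Escape non-ASCII characters; leave ASCII, spaces, and newlines alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in NON_ASCII_SPACES:
            out.append(" ")                  # fold into ASCII space (0x20)
        elif cp < 0x80:
            out.append(ch)                   # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)       # four digits inside the BMP
        else:
            out.append("\\U%08X" % cp)       # eight digits beyond the BMP
    return "".join(out)


sample = to_ascii_default("caf\u00e9 au\u3000lait\n")
print(sample, end="")
```

The result contains only 7-bit characters, and the ideographic space in the input comes out as an ordinary space rather than as an escape.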

This package contains four programs. The main program is uni2ascii. It is written in C and must be compiled. uni2html.py is the predecessor to uni2ascii. As it is written in Python, it does not need to be compiled and should run on just about any current computer. uni2ascii is otherwise superior in that:

- It generates a wider range of output formats.
- It is approximately 20 times faster.
- It handles input in the full 32-bit Unicode range. In contrast, uni2html handles only the Basic Multilingual Plane (Plane 0) because at present Python represents Unicode encoded text internally using 16-bit integers. If you've got text in, say, Linear B or Ugaritic, you need uni2ascii.
- It does a better job of reporting errors. If it encounters an error in its input, such as malformed UTF-8, it reports the location of the error both in terms of the character count from the beginning of the file (starting at 0) and in terms of the byte count from the beginning of the file (also starting at 0). (Character counts and byte counts are generally not the same since a UTF-8 encoded character occupies from one to four bytes.) The Python version reports only the character count. uni2ascii also provides information about the nature of the error.
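
To make the difference between the character count and the byte count concrete, here is a small Python sketch that locates the first malformed UTF-8 sequence in a byte string and reports both zero-based offsets. It relies on Python's own decoder rather than on uni2ascii, and the function name locate_utf8_error is hypothetical.

```python
def locate_utf8_error(data):
    """Return (char_offset, byte_offset, reason) for the first bad sequence.

    Both offsets are zero-based. Returns None when the input is valid UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        byte_offset = err.start
        # Everything before the bad byte decoded cleanly, so its length in
        # characters is the character offset of the error.
        char_offset = len(data[:byte_offset].decode("utf-8"))
        return char_offset, byte_offset, err.reason
    return None


# "caf" + two-byte e-acute + space, then a byte that can never start a sequence
report = locate_utf8_error(b"caf\xc3\xa9 \xff ok")
print(report)
```

In this input the bad byte sits at byte offset 6 but character offset 5, because the accented letter before it occupies two bytes.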

The third program, ascii2uni, is the inverse of uni2ascii. It accepts text containing a variety of ASCII representations of Unicode characters and generates UTF-8 Unicode.
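
For the HTML numeric character reference formats, the same direction of conversion can be tried with Python's standard library. This sketch uses html.unescape, which is not how ascii2uni works internally and which also expands named entities; it is shown only to illustrate the mapping from ASCII references back to UTF-8.

```python
import html

# One hexadecimal and one decimal reference to the same character, U+00E9
ascii_text = "caf&#x00E9; and caf&#233;"

decoded = html.unescape(ascii_text)       # str with the real character
utf8_bytes = decoded.encode("utf-8")      # the UTF-8 byte sequence
print(utf8_bytes)
```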

The fourth program, ascii2uni.py, reads 7-bit ASCII containing u-escaped Unicode, as used in Python and Tcl, and converts it to UTF-8 Unicode. It is the original program of which ascii2uni is a generalization.
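
The conversion that ascii2uni.py performs can be approximated as follows. This is a minimal sketch, not the program's actual source: the name unescape_u is hypothetical, only four-digit backslash u-escapes are recognised, and surrogate pairs are not handled.

```python
import re

# A backslash, the letter u, then exactly four hex digits in either case
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_u(ascii_text):
    """Replace each \\uXXXX escape with the character it names."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), ascii_text)


decoded = unescape_u("caf\\u00E9 or caf\\u00e9")
utf8_bytes = decoded.encode("utf-8")
print(utf8_bytes)
```

Both upper-case and lower-case hexadecimal digits are accepted, and text that contains no escapes passes through unchanged.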

What is new in this release:

  • Fixed a bug in uni2ascii in which the substitution count was too high in certain cases (Debian bug #626268).
  • Patched to build on NetBSD, which lacks getline.
  • Clarified the semantics of the pure option as converting characters in the ASCII range other than space and newline. Fixed a bug in which this was not implemented correctly for UTF-8 types.

What is new in version 4.17:

  • Added to uni2ascii the following conversions to nearest ascii equivalent: U+2022 bullet to 'o', U+00B7 middle dot to period, U+0085 next line to newline, U+2028 line separator to newline.

What is new in version 4.16:

  • The Q format works again in ascii2uni.
  • Added U+2033 DOUBLE PRIME to the characters converted to their closest ASCII equivalent when using the e format in uni2ascii.

What is new in version 4.15:

  • Renamed endian.h to u2a_endian.h to eliminate conflict with external endian.h.
  • Removed copy of GNU getline from ascii2uni.c as it is standard as of POSIX2008.

What is new in version 4.14:

  • Fixed a bug that interfered with the use of the Q format in uni2ascii.
  • Fixed a bug in which ASCII conversion of U+2502 and U+2503 added a double quote to the output.
  • Fixed a bug in which the -a S option generated a "Converted so many chars" line for each character because debugging code had been left in.

What is new in version 4.13:

  • Fixed a bug that caused the reported number of characters changed to ASCII to be too high.

What is new in version 4.12:

  • Both programs now allow the input file name to be specified on the command line without redirection.

What is new in version 4.11:

  • This release adds support for the <XX><XX> and %uXXXX formats.

What is new in version 4.10:

  • This release fixes a bug that made the Y argument to the -a flag of ascii2uni a no-op, and corrects the man pages and help for the Y and Q arguments to the -a flag for both programs.
  • The Y argument is now an error for uni2ascii.
  • The version information and action summaries are more informative.
