Hany Char Convertor 0.5.7

[Author] [Contributors] [Description] [Copying] [Requirements] [Download] [Installation] [Usage] [Contribute] [Changelog] [Todo]

Author

Peter Hanecak <hanecak (at) megaloman.sk>

Current maintainer is Peter Hanecak <hanecak (at) megaloman.sk>.

Contributors

Sano Kurthy <kurthy (at) sopsr.sk>

Thank you.

Description

HCC is a character filter that replaces certain characters with other characters, one for one. It is useful for converting plain text from one character set to another.
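
The idea of one-for-one replacement can be tried without hcc at all. The sketch below uses the standard `tr` utility as a stand-in (it is not how hcc is implemented) and maps two ISO-8859-2 byte values to win1250, using iso88592-win1250 entries taken from the ChangeLog. The function name `iso2win_demo` is made up for this illustration.

```shell
# Stand-in for a conversion table with two entries:
#   0xB1 -> 0xB9  and  0xB6 -> 0x9C   (octal 261 -> 271 and 266 -> 234)
# Every other byte passes through unchanged.
iso2win_demo() {
    LC_ALL=C tr '\261\266' '\271\234'
}

# Feed the bytes 0xB1, 'a', 0xB6 through the filter and dump the result in hex.
# The dump shows the bytes b9 61 9c.
printf '\261a\266' | iso2win_demo | od -An -tx1
```

Each input byte is looked up independently, which is why such a filter can only handle single-byte character sets.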

Currently available filters:

Copying

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

See the COPYING file or GNU General Public License page for license information.

Copyright 1998,.. by Peter Hanecak <hanecak (at) megaloman.sk>.
All rights reserved.

Requirements

hd2u

Hd2u is required to perform end-of-line conversion from UNIX to MS-DOS format and vice versa.

For more information about hd2u, see http://www.megaloman.com/~hany/software/hd2u/. Sources can be downloaded from http://www.megaloman.com/~hany/_data/hd2u/. An RPM package of hd2u can be found, for example, at http://www.megaloman.com/~hany/RPM/hd2u.html.

Download

You can find sources at:

  1. http://www.megaloman.com/~hany/_data/hcc/
  2. http://terminus.sk/~hany/_data/hcc/

RPM packages can also be downloaded from:

  1. http://www.megaloman.com/~hany/RPM/hcc.html

To verify files, use my public key.

Installation

First, you need a UNIX system (others may work too) with the basic development utilities already installed (make, sed, install, ...).

After you have successfully downloaded and unpacked the source tarball, run the following in the source directory:

$ make
$ make install

By default, the binary is installed in /usr/local/bin and the filters in /usr/local/lib/hcc. To use different directories, run the following (this example installs the binary into /usr/bin and the filters into /usr/lib/hcc):

$ make prefix=/usr/ install

Usage

hcc

For now, hcc is just a filter: it reads data from standard input and writes the result to standard output.

You can run it with the following parameters:

hcc <conversion table> [<conversion table file name>]
 
conversion table   which conversion table to use
conversion table file name   file in which the conversion table is stored (default is /usr/local/lib/hcc/hcc.ct)

Example:
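
A minimal sketch based on the synopsis above. It assumes that hcc is on the PATH and that the iso88592-win1250 table (the one u2w uses) is present in the default conversion table file. The helper name `iso2win` and any file names are hypothetical.

```shell
# iso2win SRC DST: convert SRC (ISO-8859-2) into a new file DST (win1250).
# hcc reads standard input and writes standard output, so plain
# shell redirection is all that is needed.
iso2win() {
    hcc iso88592-win1250 < "$1" > "$2"
}
```

It could be called as `iso2win letter.txt letter-win.txt`. To read the table from a different file, pass that file's name to hcc as the second parameter.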

u2w

This utility converts text in ISO-8859-2 encoding to text in win1250 (using hcc with the iso88592-win1250 conversion table). It also converts end-of-line characters using 'dos2unix' (from the hd2u package).

Usage of u2w:

u2w [-b] [-n] <text file> [<text file> [...]]
 
-b   create backup copies (.bak files)
-n   do not convert end-of-line characters

Example:
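
A minimal sketch, assuming u2w is on the PATH. It runs u2w with the -b option on every .txt file in a directory, so each original is kept as a .bak file. The helper name `convert_txt` and the directory layout are hypothetical.

```shell
# convert_txt DIR: convert every .txt file in DIR from ISO-8859-2 to win1250,
# keeping a .bak backup of each file.
convert_txt() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # nothing matched: DIR has no .txt files
        u2w -b "$f"
    done
}
```

For a single file the direct form is simply `u2w -b notes.txt`; add -n to leave the end-of-line characters untouched.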

Note: Running this utility on the same file more than once may corrupt its content, so that the file is in neither the original nor the expected character set.
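
Why a second run is harmful can be shown with two entries of the iso88592-win1250 map recorded in the ChangeLog (0xB7 to 0xA1 and 0xA1 to 0xA5). The sketch below uses `tr` as a stand-in for hcc and a made-up function name `pass`; it only illustrates the effect and is not the u2w script itself.

```shell
# Two iso88592-win1250 entries: 0xB7 -> 0xA1 and 0xA1 -> 0xA5
# (octal 267 -> 241 and 241 -> 245). tr stands in for hcc here.
pass() {
    LC_ALL=C tr '\267\241' '\241\245'
}

printf '\267' | pass | od -An -tx1          # one pass:   byte a1
printf '\267' | pass | pass | od -An -tx1   # two passes: byte a5
```

The first pass yields the correct win1250 byte, but that byte is also a valid ISO-8859-2 input, so the second pass translates it again into a different character.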

w2u

This utility converts text in win1250 encoding to text in ISO-8859-2 (using hcc with the win1250-iso88592 conversion table). It also converts end-of-line characters using 'dos2unix' (from the hd2u package).

Usage of w2u:

w2u [-b] [-n] <text file> [<text file> [...]]
 
-b   create backup copies (.bak files)
-n   do not convert end-of-line characters

Example:

Note: Running this utility on the same file more than once may corrupt its content, so that the file is in neither the original nor the expected character set.

w2a

This utility converts text in win1250 encoding to text in ISO-8859-1 (using hcc with the win1250-iso88591 conversion table). It also converts end-of-line characters using 'dos2unix' (from the hd2u package).

During conversion, national characters are replaced by their "nearest" ISO-8859-1 equivalents, so converting back to the identical source text is not possible.
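
The loss of information can be seen with a single win1250-iso88591 entry recorded in the ChangeLog (0x8C to S). As a sketch only, `tr` stands in for hcc and the function name `to_nearest_demo` is made up.

```shell
# One win1250-iso88591 entry: 0x8C -> S (octal 214).
to_nearest_demo() {
    LC_ALL=C tr '\214' 'S'
}

# A plain "S" and the byte 0x8C both come out as "S".
printf 'S\214' | to_nearest_demo
```

Because two different input bytes produce the same output byte, no reverse table can tell which one the original text contained.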

Usage of w2a:

w2a [-b] [-n] <text file> [<text file> [...]]
 
-b   create backup copies (.bak files)
-n   do not convert end-of-line characters

Example:

u2a

This utility converts text in ISO-8859-2 encoding to text in ISO-8859-1 (using hcc with the iso88592-iso88591 conversion table).

During conversion, national characters are replaced by their "nearest" ISO-8859-1 equivalents, so converting back to the identical source text is not possible.

Usage of u2a:

u2a [-b] <text file> [<text file> [...]]
 
-b   create backup copies (.bak files)

Example:

How to contribute

If you would like to submit a patch, send it to me <hanecak (at) megaloman.sk>. Please be sure to include a textual explanation of what your patch does.

The preferred format for changes is 'diff -u' output. You might generate it like this:

$ diff -urN hcc-orig hcc-work > mydiffs.patch

ChangeLog

0.5.7-1 2003/10/23 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.h, hcc.cc: read/write buffer implemented - increases performance about 3 times
0.5.6-1 2003/09/17 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct/hcc.ct: added <0x82>,' <0x86>:+, <0x8B>:<0x3C>, <0x91>:', <0x97>:-, <0x9B>:<0x3E> and <0xA6>:| translations to win1250-iso88592 and win1250-iso88591 maps
  2003/09/11 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct/hcc.ct:
    • transformed to use '\' as line continuation (to better handle changes using 'diff')
    • reordered translations
    • added <0x8C>:<0xA6>, <0x8F>:<0xAC>, <0x9C>:<0xB6>, <0x9F>:<0xBC>, <0xA1>:<0xB7>, <0xA5>:<0xA1> and <0xB9>:<0xB1> translations to win1250-iso88592 map
    • added <0x8C>:S, <0x8F>:Z, <0x9C>:s and <0x9F> translations to win1250-iso88591 map
    • added <0xA1>:<0xA5>, <0xA6>:<0x8C>, <0xAC>:<0x8F>, <0xB1>:<0xB9>, <0xB6>:<0x9C>, <0xB7>:<0xA1> and <0xBC>:<0x9F> translations to iso88592-win1250 map (now ISO-8859-2 -> win1250 conversion should be complete/lossless)
    • added <0xA6>:S, <0xAC>:Z, <0xB6>:s and <0xBC>:z translations to iso88592-cp852 map
  • src/ct.cc:
    • treat '\' in conversion table file as a line continuation (the '\' and everything after it are stripped from the line, and the next line is then appended at the end)
    • fixed compilation with GCC 3.x
    • fixed small typos
  2003/08/07 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct/hcc.ct: added <0x94>:" translation to win1250-iso88592 and win1250-iso88591 map
  • README: fixed small typo
0.5.5-1 2003/08/05 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct/hcc.ct:
    • added <0x92>:' translation to win1250-iso88592 and win1250-iso88591 map
    • synchronized translations in win1250-iso88591 map with win1250-iso88592 (<0x95>:-, <0x92>:' and <0x93>:")
0.5.4-1 2003/02/26 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct.h, ct.cc: moved some declarations from .h to .cc to avoid compiler warnings
  • ct.cc, hcc.cc: fixed compilation with GCC 3.x
  • README:
    • added note to u2w and w2u usage information about possibility of file content corruption if conversion is done more than once on same file
    • fixed typo
  • ?2? scripts: quoted file name arguments so conversion of files with spaces in file name does not fail
0.5.3-1 2002/01/26 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.ct: fixed typo in 'sv' translation in iso88592-win1250 map (thanks to Sano Kurthy <kurthy (at) sopsr.sk>)
  • hcc.cc: fixed missing '#include <string.h>'
  • u2w, w2a, w2u: handle command line parameters the same way as u2a (i.e. differently than previous versions! run with '--help' for help)
  • README: fixed and updated
0.5.2-1 2001/02/28 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.ct: fixed 'Lv' and 'e/' translations in iso88592-cp852 map
  • hcc.ct: added '<0x93>' -> '"' conversion into win1250-iso88592 map
  • Makefile: added '-Wall -pedantic' into 'CCOPT'
  • ct.cc: removed unused 'lChar' and 'rChar' variables
  • hcc.cc: removed unused 'n1' variable
0.5.1-1 2000/10/31 Peter Hanecak <hanecak (at) megaloman.sk>
  • u2a: process stdin, output is going to stdout, when no parameters are given (note: run 'u2a --help' for help)
0.5.0-1 2000/10/25 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.ct: added iso88592-iso88591 map
  • added u2a script (note: it handles parameters differently than other scripts! run 'u2a --help' for help)
0.4.0-1 2000/07/01 Peter Hanecak <hanecak (at) megaloman.sk>
  • Makefiles: some tweaks to make building somewhat more comfortable
  • some documentation added
0.4.0-0 1999/11/25 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.cc: more maps added to hcc.ct
  • hcc.cc: more than half a year of testing
  • hcc.cc: first release candidate (even without documentation)
0.3.0-2 1999/03/29 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.cc: ctSize handling fixed (till now, it has been ignored)
  • hcc.cc: char as index to c.t. handling fixed (at least for i386 linux platform)
0.3.0-1 1999/03/26 Peter Hanecak <hanecak (at) megaloman.sk>
  • ct.cc: comments ('#')
0.2.0-1 1999/03/25 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.cc: ct.cc
  • ct.cc: basic operations
0.1.0-1 1999/03/19 Peter Hanecak <hanecak (at) megaloman.sk>
  • hcc.cc: basic operations

TODO