#============================================================================ # Enca v1.9 (2005-12-18) guess and convert encoding of text files # Copyright (C) 2000-2003 David Necas (Yeti) #============================================================================ List of user-visible changes in Enca Legend: + new feature * change of behaviour (including disappearing of a feature) - bugfix enca-1.9 2005-12-18 + support for HZ encoding * Big5 and GBK detection improved - enca.spec no longer installs docs to world-unreadable directory enca-1.8 2005-11-24 + Chinese (Big5 and GBK) support (thanks to Zuxy) * deb/ subdirectory is gone as there is finally an Enca package in Debian (thanks to Michal Cihar) - manual page clean-up (thanks to Michal Cihar) enca-1.7 2005-02-27 + new name type: preferred MIME name (option -m) - broken iconv detection on some system was fixed enca-1.6 2004-09-01 * English language names (--list=languages, enca_language_english_name()) were changed to lowercase to match common locale aliases - Win32, i.e. MinGW and Cygwin, build problems were fixed enca-1.5 2004-05-30 - crash on impossible recovery after iconv failure in pipe was fixed - rpm building problems on Mandrake Linux were fixed enca-1.4 2004-05-12 - dependency of guessing API on locales (via ctype functions) was fixed - --help text generation failure on some systems was fixed enca-1.3 2003-12-24 + [libenca] it's possible to get analyser option values, not just set them * a good BOM (byte order mark) increases the chance of being recognized for UCS-4 and UTF-8 too * external convertor wrappers were moved from bin to libexec and the b- prefix was removed (though it still works) * external convertors are no longer searched in PATH, nonstandard ones has to be specified with full path enca-1.2 2003-11-26 - fixed segfault in language detection for some locale setups enca-1.1 2003-11-17 - fixed losing data at the end of file when using external convertors in a pipe (and maybe in other situations) - [libenca] enca_analyser_free() not freeing analyser completely was fixed enca-1.0 2003-11-06 * deprectated options -T, -R, -S, -u, -U, -m, and -M were finally removed * default HTML API docs installation path changed to the new gtk-doc style (DATADIR/gtk-doc/html/enca) * debian/ subdir moved to deb/ to allow official deb creation w/o too much hassle enca-0.99.4 2003-07-15 - several race conditions in librecode and iconv interfaces were fixed - temporary file names are much less predictable now enca-0.99.3 2003-06-30 * Debian package is back from death * failure to find external convertor is now fatal - fixed build problems on FreeBSD (and probably other Unices) - libiconv is not used for `conversion to ASCII' since never does the Right Thing, whatever it is - when conversion with libiconv fails, the file should now survive intact - fixed build problems on systems w/o libiconv (hopefully) - fixed distclean and uninstall targets to really clean and uninstall everything - fixed builds with separate source (read-only) and build directories - fixed builds with --without-libiconv and --without-librecode on GNU/Linux - external convertor is not checked when it's not going to be used enca-0.99.2 2003-06-25 + EOL type is used to decide ambiguous cases, e.g. CP1250 is reported instead of ISO-8859-2/CRLF * --list languages by default prints English names, instead of ISO-639a codes, use -e or -r to get the old listing * if LC_CTYPE is something like en_US, more locale categories are examined to detect the language * cork charset was modified to contain \n, \r and \t in the same places as ASCII * some heuristics tuning enca-0.99.1 2003-06-22 + libenca pkg-config support * all libenca tuning parameters (-T, -R, -S, -u, -U, -m, and -M) were marked deprecated and are noop, Enca should DWIM * ambiguity is now always OK when the sample has the same meaning in all the charsets * deprecated `built-in-encodings' and `encodings' lists were removed * PAGER feature was removed - exchanged `latvian' and `lithuanian' language names were fixed (`lv' and `lt' were always OK) - missing tests for the new languages was added to the test suit enca-0.99.0 2003-06-14 + added some support for: Bulgarian, Croatian, Estonian, Hungarian, Latvian, Lithuanian, Slovene + a new algorithm for 8bit-dense languages (cyrillics), the old one is used as a fallback * removed support for non-transitive iconv (such a thing should not exist) * auxiliary tools in data are not longer built in regular builds, use --enable-maintainer-mode to rebuild them, create dists, etc. - fixed iconv interface surface check pickier than iconv itself inhibiting some otherwise possible conversions - fixed u+x permissions on temporary files (from 0.10.7) - fixed not deleting temporary files in iconv interface - fixed broken iconv interface behaviour in pipes - fixed iconvcap misdetecting Latin5 as ISO-8859-5 - fixed casual `make distclean' failures enca-0.10.7 2003-01-28 - fixed interchanged iconv and cstocs encoding names - corrected(?) librecode surface interaction - fixed a temporary file creation race condition * added tex and utf8 to cstocs (names and b-cstocs) enca-0.10.6 2002-10-22 + enconv uses DEFAULT_CHARSET variable, exactly as recode - ENCAOPT works everywhere, albeit imperfectly - options -P and -p no longer imply -M too - ambiguous mode (-M) works again - pager is run so that help text doesn't disappear - standard input it printed as STDIN with -d, not as null - make check works again - it compiles wihtout recode again enca-0.10.5 2002-10-13 + UTF-8 recognition in binary and otherwise messy files + detection of double-encoding from some 8bit charset to UTF-8 + Cork encoding conversion * librecode interaction was (hopefully) improved - fixed some build-time problems enca-0.10.4 2002-10-10 + added Cork encoding support for Czech, Slovak and Polish - empty files are now considered convertible to any encoding - removed the so-called faster (in fact slower) I/O - fixed some more compile-time search path issues enca-0.10.3 2002-09-22 * added support for perl umap as external convertor - fixed external convertor wrappers to work with standard sh - fixed some compile-time library search path issues enca-0.10.2 2002-09-15 + target charset is automatically obtained from locales when called as enconv, new options --guess, --auto-convert + English language names can be used instead of ISO-639 codes everywhere - cs_SK and ru_UA locales are properly recognised as Slovak and Ukrainian enca-0.10.1 2002-08-29 + faster I/O * external convertors can be disabled at build time - `-' is accepted for standard input - fixed broken built-in convertor - fixed crasing on an unknown language - trivial (identity) conversions are not performed any more - help is now printed when input is a terminal and no argument specified - changed braindamaged , to STDIN, STDOUT in messages - various small fixes and build-time improvements enca-0.10.0 2002-08-26 + added support for Ukraininan (CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr CP1125), Belarussian (CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr, IBM855) and Polish (ISO-8859-2, ISO-8859-12, ISO-8859-16, Baltic, macce, IBM852, CP1250) + Enca library introduced * dropped native Debian package * --details no longer prints guessing details (now is mostly like --human) * --list=encodings, --list=built-in-encodings corrected to --list=charsets, --list-built-in charsets (old names supported with a warning) * improved Czech and Slovak charsets detection enca-0.9.4: 2002-03-03 - built-in convertor didn't convert more than first 64kB of a file enca-0.9.3: 2001-07-16 + a native Debian package - fixed random reporting of nonsense results - fixed self-contradictory --details output when file was quoted-printable encoded - fixed poor performance on non-GNU/Linux - made pager less intrusive (instead of intrusive `less' ;-) - --list=encodings prints only `known' encodings - fixed several compile-time/portability problems enca-0.9.2: 2001-07-13 * --help and --license are displayed through pager (when possible) - fixed broken language hooks--they were never activated (from 0.9.1) - fixed reporting ASCII when a 7bit encoding was detected - fixed boundary-case behaviour when recovering from librecode failures enca-0.9.1: 2001-06-25 + support for Macintosh Cyrillic, including conversion + support for unusual UCS-4 byte orders (3412 and 2143) + new option --license printing full enca license * exit codes now make sense (0, 1, 2; where 2 means serious troubles) - temporary files are no longer world-readable enca-0.9.0: 2001-03-26 Serious incompatibilities: * -E and -C option letters exchanged (much better mnemonics) * convertor wrappers renamed to b-cstocs and b-recode * finding only 7bit ASCII is no longer considered failure * need to use --language to set language (sometimes) * dull convertor behaviour no longer supported, -x syntax changed * option -g removed (try --name=aliases) * option -c changed to --list=convertors, listing format changed * option -l changed to --list=encodings, listing format changed * convertor names are no longer case insensitive * no longer uses cstocs names as canonical * external convertors are called with Enca's names, not cstocs's Other changes: + support for slovak and russian (and `none') language + support for CP1251, IBM866, ISO-8859-5 and KOI8-R, including conversion + UCS-2, UCS-4, UTF-8, UTF-7 and LaTeX encoding recognition + much more encoding aliases accepted + long `GNU style' command line options + new output types: --enca-name, --iconv-name + output type --name=WORD allowing to select output type by name + ENCAOPT environment variable + language detection from locales + support for surfaces (experimental) + new option --list printing various listings + new convertor wrapper b-map (for perl `map') + new option -m to reset -M back + new language filters + new options -u and -U to control multibyte encoding checks + included [generated] enca.spec into the tarball to allow `rpm -tb' * -d output improved * read limit changed to 16MB * librecode now run with flags diacritics_only and ascii_graphics - fixed broken -P options - fixed several build problems on non-GNU/Linux systems - fixed some missing and wrong characters in Unicode data - temporary copy of damaged original file is not deleted when rescue fails enca-0.8.x: Since features planned for 0.8 and 0.9 happened to be developed simultaneously, this version number has been skipped. enca-0.7.7: 2001-01-01 + ability to use UNIX98 iconv conversion functions + the word `none' can be used as -E parameter causing clearing of convertor list - fixed disarranged help text, misspelled word `European' in macce long name, obsolete statements in manual page and other stuff of this kind enca-0.7.6: 2000-11-20 + any convertor combination/order can be now specified with -E, old -E meaning is no longer valid + new option -c (list all valid convertor names) * cork encoding not supported anymore * better verbosity * `/' is added to recode recoding requests thus partially solving the surface problem---surface never changes * some errors like specifying invalid value of threshold are no longer fatal, the bad values are ignored instead * handling of some exotic characters in bulit-in convertor slightly changed - fixed several fatal bugs regarding stdin to stdout conversion - stdin is copied to stdout in case of failure whenever possible/applicable enca-0.7.5: 2000-10-25 * license changed to GNU GPL Version 2 (i.e. license version is explicitly specified) * prints error message when conversion is impossible * binary data filter improved/changed - fails back to external convertor when GNU recode library cannot convert due to errorneous request - '' no longer causes enca to read from stdin - tries to restore files damaged by GNU recode library enca-0.7.4: 2000-10-12 + box-drawing characters are (carefully) filtered out when guessing - fixed intermixed behaviour in SMS/nonSMS modes enca-0.7.3: 2000-10-09 + blocks of probably binary data are filtered out when guessing * standard input is copied to standard output when its encoding is unknown - fixed reading only 4096 bytes from pipe (from 0.7.1) enca-0.7.2: has been never released + GNU recode recoding chains made possible by starting -x (convert) parameter with `..' + second best guess is marked with `-' in -d (print details) output enca-0.7.1: 2000-10-02 * in case of nonfatal i/o failure enca continues processing remaining files enca-0.7.0: 2000-09-26 + standard input to standard output conversion + short message mode -M + ability to use GNU recode library + new output type -r (encoding name after RFC1345) + ability to convert cork internally + new external convertor brecode (recode wrapper) + new output type -g (list of aliases) + new option -V (verbose) * -x (convert) paramteres syntax changed to in_enc..out_enc (old syntax still supported, will be removed in 0.8.x) * option -e (disable external) no longer supported, empty string as -C (external convertor) parameter can be used instead * encoding names specified as -x (convert) parameters are case insensitive * ascii is not considered unknown encoding (i.e. failure) so enca returns 0 * -d (print details) output improved/changed/updated * -p (prefix result with file name) no longer prints conversion details * by default result is prefixed by file name when enca is run on more than one file enca-0.6.2: 2000-08-17 + help texts (-h and -v) made usable (thanx to Halef) enca-0.6.1: 2000-08-15 - tarball bugfix enca-0.6.0: 2000-07-20 + bulilt-in convertor + -x (convert) can now take form -x in_enc,out_enc causing enca to behave like a dull convertor + new options -e and -E (disable internal/external convertor) + new option -l (print internally-convertible encodings) enca-0.5.0: 2000-07-17 * -p (prefix result with file name) causes enca to print what is converted and how * iso8859-2/cp1250 recognition improved - doesn't spawn external convertors as fast as is possbile, but waits for them to return - fixed `Unrecognized encoding' when winner is 1250 (from 0.4.3) - corrected -d (print details) table alignment enca-0.4.3: 2000-07-14 * -d (print details) prints encodings alphabetically sorted - corrected short encoding name t1 -> cork - division-by-zero bugfixes enca-0.4.2: has been never released * options -m/-M ([don't] use iso8892-2/cp1250 hack) no longer supported - fixed showing standard input as empty string ( is printed now) enca-0.4.1: 2000-07-12 * default of 60 significant characters changed to 10 enca-0.4.0: 2000-07-10 + first public release