This guide explains how to use UnicodeConverter to convert Vietnamese text, RTF, HTML, and Word/Excel files to Unicode Precomposed (NFC). Supported source formats are the legacy encodings VNI, VPS, VISCII, TCVN3, and VIQR/Vietnet; NCR (numeric character references, in windows-1252 or iso-8859-1 files); and Unicode Composite (NFD). The program comes in three versions (Java, Windows, and .NET), which share a similar graphical user interface and can convert multiple files, a directory including its subdirectories, or an entire website.
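To make two of the simpler source formats concrete, here is a minimal Python sketch of what converting NCR and Unicode Composite (NFD) text to Unicode Precomposed (NFC) involves. It is not UnicodeConverter's own code; the function names are hypothetical, and the sketch relies only on the standard re and unicodedata modules. The legacy byte encodings (VNI, VPS, VISCII, TCVN3) need font-specific mapping tables and are not covered here.

```python
import re
import unicodedata

# Matches decimal (&#7879;) and hexadecimal (&#x1EC7;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def ncr_to_unicode(text: str) -> str:
    """Replace numeric character references with the characters they denote.

    Named entities such as &amp; are left alone so that HTML markup stays valid.
    """
    def _decode(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Out-of-range reference: keep the original text untouched.
            return match.group(0)

    return _NCR.sub(_decode, text)


def to_precomposed(text: str) -> str:
    """Compose base letters and combining marks into precomposed (NFC) characters."""
    return unicodedata.normalize("NFC", text)


# "e" + combining dot below + combining circumflex composes to a single character.
composite = "Vie\u0323\u0302t"
precomposed = to_precomposed(composite)
```

After normalization, the three code points e, U+0323, and U+0302 collapse into the single precomposed character U+1EC7, so the word is four characters long instead of six.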
The Java version requires Java Runtime Environment 6 or later. You can launch the program by double-clicking the Uni.jar file. If that does not work, associate the .jar file extension with the Java interpreter so that the file can be launched with a mouse click, or run java -jar Uni.jar from a command prompt.
The Windows (J++) version requires Microsoft Java Virtual Machine to run. Most problems encountered while launching the program can be solved by installing, or updating to, the latest Microsoft VM (build 3805 or later) through Microsoft Windows Update.
The .NET version requires Microsoft .NET Framework 2.0 Redistributable.
To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, some pre-conversion conditioning of the source files may be needed. It is recommended to remove obsolete dynamic font links (.pfr or .eot) and the associated ActiveX control scripts (e.g., tdserver.js), shown as yellow text in the illustration, because leaving them in needlessly slows down page download.
It may also be necessary to change the fonts used in the original document to the more common ones for its encoding, as listed in the following table.
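The cleanup described above can also be scripted. The following Python sketch assumes the dynamic font references take the typical forms of that era: a LINK tag pointing at a .pfr file, a SCRIPT tag loading tdserver.js, and an @font-face rule pointing at an .eot file. Those markup shapes and the function name strip_dynamic_fonts are assumptions for illustration; inspect your own pages before relying on the patterns.

```python
import re

# Assumed markup shapes; adjust the patterns to match the actual source files.
PFR_LINK = re.compile(r"<link\b[^>]*\.pfr\b[^>]*>\s*", re.IGNORECASE)
TDSERVER_SCRIPT = re.compile(
    r"<script\b[^>]*tdserver\.js[^>]*>.*?</script>\s*",
    re.IGNORECASE | re.DOTALL,
)
EOT_FONT_FACE = re.compile(r"@font-face\s*\{[^}]*\.eot\b[^}]*\}\s*", re.IGNORECASE)


def strip_dynamic_fonts(html_text: str) -> str:
    """Remove .pfr links, tdserver.js script tags, and .eot @font-face rules."""
    for pattern in (PFR_LINK, TDSERVER_SCRIPT, EOT_FONT_FACE):
        html_text = pattern.sub("", html_text)
    return html_text


sample = (
    "<HTML><HEAD>\n"
    '<LINK REL="fontdef" SRC="vni.pfr">\n'
    '<SCRIPT LANGUAGE="JavaScript" SRC="tdserver.js"></SCRIPT>\n'
    "<STYLE>@font-face { font-family: VNI-Times; src: url(vnitimes.eot); }</STYLE>\n"
    "<TITLE>Sample</TITLE></HEAD></HTML>"
)
cleaned = strip_dynamic_fonts(sample)
```

When applying this to legacy-encoded files, decoding the bytes as latin-1 and encoding them back the same way keeps every byte outside the removed spans unchanged.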
|Encoding||Fonts for original HTML document|
|VNI||VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve|
|VPS||VPS Times, VPS Helv|
|VISCII||VI Times, VI Arial, HoangYen, MinhQuân, PhuongThao, ThaHuong, UHoài|
|VIQR||No font formatting|
These basic editing tasks should be done before the actual conversion. They can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some of the text editors that offer these features. They can be found and downloaded from http://www.download.com.
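For readers who prefer a script to an MDI editor, the same global find/replace can be sketched in Python. The function name replace_in_files and the default file patterns are hypothetical choices, not part of UnicodeConverter. Files are decoded as latin-1, which maps every byte to exactly one character, so legacy-encoded Vietnamese bytes pass through unchanged. Because the files are edited in place, run it on a copy of the source directory.

```python
from pathlib import Path


def replace_in_files(root, old, new, patterns=("*.htm", "*.html")):
    """Replace every occurrence of `old` with `new` in matching files under `root`.

    Returns the sorted list of files that were modified. Both `old` and `new`
    must be representable in latin-1 (plain ASCII font names always are).
    """
    changed = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            # latin-1 round-trips all 256 byte values, so no data is lost.
            text = path.read_bytes().decode("latin-1")
            if old in text:
                path.write_bytes(text.replace(old, new).encode("latin-1"))
                changed.append(path)
    return sorted(changed)
```

For example, replace_in_files("site_copy", "VNI-Univer", "VNI-Times") would rename a font across a whole tree; both the directory name and the font being replaced are made up for the example.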
The resulting Unicode output files are placed in an x_Unicode directory located at the same tree level as the source directory that contains the original files, which remain unchanged. Verify the UTF-8 encoded HTML files using any Unicode-enabled web browser, such as Firefox, Netscape, Internet Explorer, Mozilla, Opera, or Safari.
The default fonts for the output files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible editors or word processors such as FrontPage or Word. Do not use Unicode-incompatible editors (such as Notepad in Windows 9x/Me) to edit UTF-8 files. Doing so would corrupt the UTF-8 byte sequences, rendering the characters or the entire file unreadable.
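Besides viewing the pages in a browser, the output can be given a quick automated check. The Python sketch below walks a directory and lists the HTML files whose bytes are not valid UTF-8. The function name find_invalid_utf8 and the file patterns are hypothetical; pass the actual path of your output directory. A file that passes this check is well-formed UTF-8, though that alone does not prove the Vietnamese text was mapped correctly.

```python
from pathlib import Path


def find_invalid_utf8(root, patterns=("*.htm", "*.html")):
    """Return the sorted list of matching files under `root` that are not valid UTF-8."""
    invalid = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                invalid.append(path)
    return sorted(invalid)
```

An empty result means every matching file decoded cleanly; any path in the list is worth opening in a browser or re-converting.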
Note: It is recommended that no files be open in Microsoft Word or Excel while you convert Word/Excel documents. Open files may cause errors or slow down the conversion process.
Tip: Keep the number of text boxes in Word documents to a minimum; too many will slow down conversion significantly.