This guide explains how to use UnicodeConverter to convert Vietnamese text, RTF, HTML, and Word/Excel/PowerPoint files in legacy encodings (VNI, VPS, VISCII, TCVN3, VIQR/Vietnet), NCR (windows-1252, iso-8859-1), and Unicode Composite (NFD) to Unicode Precomposed (NFC). The program comes in three versions: Java, Windows, and .NET. They share a similar graphical user interface and can convert multiple files, a directory including its subdirectories, or an entire website.
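To make the target format concrete, here is a minimal Python sketch of two of the conversions named above: NCR to Unicode and Unicode Composite (NFD) to Unicode Precomposed (NFC). It is not UnicodeConverter's own code; the function names `decode_ncr` and `to_precomposed` are made up for illustration, and only the standard `re` and `unicodedata` modules are used.

```python
import re
import unicodedata

# Matches decimal (&#7879;) and hexadecimal (&#x1EC7;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncr(text):
    """Replace numeric character references with the characters they name."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Not a valid code point: leave the reference untouched.
            return match.group(0)
    return _NCR.sub(_replace, text)


def to_precomposed(text):
    """Decode NCRs, then compose base letters and combining marks (NFC)."""
    return unicodedata.normalize("NFC", decode_ncr(text))


# "e" + combining dot below + combining circumflex becomes one code point.
composite = "Vie\u0323\u0302t Nam"
print(len(composite), len(to_precomposed(composite)))
```

Named entities such as `&amp;` are deliberately left alone so that HTML markup stays intact.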
The Java program requires Java Runtime Environment 6 or later. You can launch the program by double-clicking the UnicodeConverter.jar file. If that does not work, associate the .jar file extension with the Java interpreter so that the file can be launched with a mouse click.
The .NET version requires Microsoft .NET Framework 4.0 Redistributable.
To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, some pre-conversion conditioning of the source files may be needed. Removing obsolete dynamic font links (.pfr or .eot) and the associated ActiveX control scripts (e.g., tdserver.js) is recommended, because leaving them in needlessly slows page download. In the sample below, these are the FONTDEF link and the tdserver.js script lines.
<html>
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<title>HISTORY OF VIETNAM</title>
<link REL="FONTDEF" SRC="http://www.nonsong.org/ns.pfr">
<script LANGUAGE="JavaScript" SRC="http://www.nonsong.org/tdserver.js">
</script>
<link>
</head>
<body bgcolor="#FFFFFF" link="#FF0000" vlink="#FF0000">
<font FACE="VNI-Times">
<h1>HISTORY OF VIETNAM</h1>
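The cleanup described above can also be scripted. The following Python sketch removes `<link>` tags that point at .pfr or .eot files and `<script>` elements that load tdserver.js. The function name `strip_dynamic_fonts` and the regular expressions are illustrative assumptions, written against markup shaped like the sample; real pages may need adjusted patterns.

```python
import re

# <link ... SRC="...something.pfr"> or "...something.eot" (dynamic font links)
FONT_LINK = re.compile(
    r'<link\b[^>]*\bsrc\s*=\s*["\'][^"\']*\.(?:pfr|eot)["\'][^>]*>\s*',
    re.IGNORECASE,
)

# <script ... SRC="...tdserver.js"> ... </script> (font-loading control script)
FONT_SCRIPT = re.compile(
    r'<script\b[^>]*\bsrc\s*=\s*["\'][^"\']*tdserver\.js["\'][^>]*>.*?</script>\s*',
    re.IGNORECASE | re.DOTALL,
)


def strip_dynamic_fonts(html_text):
    """Return html_text without dynamic font links and their loader script."""
    html_text = FONT_LINK.sub("", html_text)
    return FONT_SCRIPT.sub("", html_text)


sample = (
    '<head> <title>HISTORY OF VIETNAM</title> '
    '<link REL="FONTDEF" SRC="http://www.nonsong.org/ns.pfr"> '
    '<script LANGUAGE="JavaScript" SRC="http://www.nonsong.org/tdserver.js"> </script> '
    '</head>'
)
print(strip_dynamic_fonts(sample))
```

When applying this to legacy files, reading and writing them as latin-1 keeps every byte unchanged, so the legacy-encoded Vietnamese text is not disturbed before conversion.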
It may also be necessary to change the original document's fonts to the more common ones for its encoding.
Encoding | Fonts for original HTML document |
VNI | VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve |
VPS | VPS Times, VPS Helv |
VISCII | VI Times, VI Arial, HoangYen, MinhQuân, PhuongThao, ThaHuong, UHoài |
TCVN3 | .VnTime, .VnArial |
VIQR | No font formatting |
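One way to find documents that need their fonts changed is to compare the font faces they use against the table above. The Python sketch below does that for `<font FACE="...">` attributes. The helper name `unlisted_fonts` is an assumption for illustration, and the font `VNI-Centur` in the example is a hypothetical stand-in for a less common font.

```python
import re

# Font names taken from the table above, keyed by legacy encoding.
COMMON_FONTS = {
    "VNI": ["VNI-Times", "VNI Times", "VNI-Aptima", "VNI Aptima",
            "VNI-Helve", "VNI Helve"],
    "VPS": ["VPS Times", "VPS Helv"],
    "VISCII": ["VI Times", "VI Arial", "HoangYen", "MinhQuân",
               "PhuongThao", "ThaHuong", "UHoài"],
    "TCVN3": [".VnTime", ".VnArial"],
}

FACE_ATTR = re.compile(
    r'<font\b[^>]*\bface\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE
)


def unlisted_fonts(html_text, encoding):
    """List font faces used in html_text that the table does not name."""
    known = {name.lower() for name in COMMON_FONTS[encoding]}
    found = []
    for match in FACE_ATTR.finditer(html_text):
        for face in match.group(1).split(","):
            face = face.strip()
            if face and face.lower() not in known and face not in found:
                found.append(face)
    return found


page = '<font FACE="VNI-Times"><font face="VNI-Centur, VNI-Helve">'
print(unlisted_fonts(page, "VNI"))
```

Any name this reports is a candidate for replacement with one of the listed fonts before conversion.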
These basic editing tasks should be done before the actual conversion. They can be performed quickly with MDI (multiple document interface) text editors, which can open multiple files and run a global find/replace on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some text editors with these features. They can be found and downloaded from http://www.download.com.
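If no such editor is at hand, the same global find/replace can be approximated with a short script. This Python sketch walks a directory tree and applies a set of literal replacements to every .htm and .html file. The function name `replace_in_tree`, the directory name, and the font name `VNI-Centur` in the usage comment are hypothetical.

```python
from pathlib import Path


def replace_in_tree(root, replacements, patterns=("*.htm", "*.html")):
    """Apply literal find/replace pairs to matching files under root.

    Files are decoded as latin-1, which maps every byte to one character,
    so legacy-encoded text passes through unchanged. Returns the list of
    files that were modified.
    """
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            original = path.read_bytes().decode("latin-1")
            updated = original
            for old, new in replacements.items():
                updated = updated.replace(old, new)
            if updated != original:
                path.write_bytes(updated.encode("latin-1"))
                changed.append(path)
    return changed


# Example call (hypothetical directory and font name):
# replace_in_tree("site", {"VNI-Centur": "VNI-Times"})
```

Work on a copy of the source directory, since the script overwrites files in place.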
Launch the Java program by double-clicking the UnicodeConverter.jar file or icon, or by executing one of the following commands at the command line:
java -jar UnicodeConverter.jar
javaw -jar UnicodeConverter.jar
This assumes the UnicodeConverter.jar file is in the current directory. For the Windows executable, launch Uni.exe from the Windows desktop or Explorer.
The resulting Unicode output files are placed in an x_Unicode directory located at the same tree level as the source directory that contains the original files, which remain unchanged. Verify the UTF-8 encoded HTML files using any Unicode-enabled web browser, such as Firefox, Netscape, Internet Explorer, Mozilla, Opera, or Safari.
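Besides viewing the files in a browser, a quick automated check is possible. The Python sketch below tests whether a file's bytes are valid UTF-8 and whether the text is already in precomposed (NFC) form. The function name `check_output` and its return strings are assumptions for illustration, not part of UnicodeConverter.

```python
import unicodedata


def check_output(data):
    """Classify raw file bytes as 'ok' or describe what is wrong."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 at byte {exc.start}"
    if unicodedata.normalize("NFC", text) != text:
        return "valid UTF-8 but not in NFC"
    return "ok"


# Typical use on one converted file (path is hypothetical):
# from pathlib import Path
# print(check_output(Path("x_Unicode/index.html").read_bytes()))
```

A file that fails the first check was probably not converted or was saved by a Unicode-incompatible editor; one that fails the second still contains combining marks.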
The default fonts for the output files are Times New Roman and Arial. Users can change to other Unicode-compliant fonts using Unicode-compatible editors or word processors such as FrontPage or Word. Do not use Unicode-incompatible editors (such as Notepad in Windows 9x/Me) to edit UTF-8 files. Doing so would corrupt the UTF-8 byte sequences, rendering the characters or the file unreadable.
Note: Microsoft Word, Excel, and PowerPoint should not have any files open while you convert Word/Excel/PowerPoint documents. Open files may cause errors or slow down the conversion process.
Tip: Keep the number of text boxes in Word documents to a minimum; having too many will slow down conversion significantly.