UnicodeConverter is a program that converts text and HTML files in VNI, VISCII, VPS, TCVN3 (ABC), VIQR/Vietnet, NCR, or Unicode Composite (NFD) format to Unicode Precomposed (NFC) UTF-8. The program is capable of converting multiple files in a directory, or an entire directory, including its subdirectories.
Support for conversion of Microsoft Word, Excel, and PowerPoint documents in legacy Vietnamese encodings to Unicode 16-bit native format is included. Support for Rich Text Format is also provided.
UnicodeConverter.NET runs on Windows XP/Vista/7/8 platforms with Microsoft .NET Framework 4.0 installed and can be launched from the Windows desktop or Explorer. If the .NET Framework cannot be installed on your system, use the Java version instead.
If you attempt to run a .NET application from a computer on which the .NET runtime is not installed, you will receive the error "A required .DLL file, MSCOREE.DLL, was not found" or "The dynamic link library mscoree.dll could not be found in the specified path..." To solve this problem, install the .NET Framework on the computer and try running the application again.
Note: It is recommended that Microsoft Word, Excel, and PowerPoint have no files open while you convert Word, Excel, or PowerPoint documents. Open files may cause errors or slow down the conversion process.
Tip: Keep the number of text boxes in Word documents to a minimum; having too many will slow down conversion significantly.
You can select a single file or multiple files for conversion. To convert a directory, select any file in that directory to tell the program which directory is to be converted. If the directory has no file that can be selected, create an empty file in it with the same file extension as the type of file you want to convert.
Unicode Composite (NFD) source text files should be saved in UTF-8 format for correct conversion to Unicode Precomposed (NFC).
The resulting Unicode output files will be placed in an x_Unicode directory
located at the same tree level as the source directory that contains the original
files, which remain unchanged. Even so, be sure to back up your files before
converting them.
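As a rough illustration of where the output lands, the sketch below computes a sibling directory path in Python. It assumes that the "x" in x_Unicode stands for the name of the source directory, which the text above does not state explicitly. The function name `output_dir_for` is hypothetical.

```python
from pathlib import Path

def output_dir_for(source_dir: str) -> Path:
    """Return a sibling '<name>_Unicode' directory for source_dir.

    Assumption: 'x' in 'x_Unicode' is the source directory's own name.
    """
    src = Path(source_dir)
    # Same parent as the source, so both sit at the same tree level
    return src.parent / (src.name + "_Unicode")

target = output_dir_for("/docs/vni_pages")
```

Because the result is a sibling of the source directory rather than a child of it, the original files are never overwritten.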
The default fonts for the output files are Times New Roman and Arial. Users can
change to other Unicode-compliant fonts using Unicode-compatible HTML editors and
word processors, such as FrontPage, Dreamweaver, or Microsoft Word. Do not use Unicode-incompatible
editors, such as Notepad in Windows 9x/Me, to edit UTF-8 files, for doing so would corrupt
the UTF-8 byte sequences, rendering the characters unreadable or incorrect.
To ensure successful conversion of HTML files in legacy formats and to minimize post-conversion editing, some pre-conversion conditioning of the source files may be needed. It may be necessary to change a document's fonts to the more common ones for its original encoding. Removing obsolete dynamic font links (.pfr or .eot) and associated ActiveX control scripts (e.g., tdserver.js) is also recommended, for leaving them in will needlessly slow down page download.
These basic editing tasks should be done before the actual conversion and can be performed quickly with MDI (multiple document interface) text editors, which allow opening multiple files and performing global find/replace actions on all open files at once. CuteHTML, TextPad, UltraEdit, EditPlus, and EditPad are some text editors that offer such features. They can be found and downloaded at http://www.download.com.
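For readers who would rather script the cleanup than use an editor's global find/replace, here is a minimal Python sketch of removing dynamic font links. The patterns assume the links appear as `<link>` tags pointing at .pfr or .eot files and as `<script>` tags loading tdserver.js. Real pages may use other markup, so check the patterns against your own files first. The function name `strip_font_links` is hypothetical.

```python
import re

# Assumed markup: <link ... src="something.pfr"> or <link ... href="something.eot">
FONT_LINK = re.compile(r'<link\b[^>]*\.(?:pfr|eot)\b[^>]*>', re.IGNORECASE)

# Assumed markup: <script ... src=".../tdserver.js"></script>
TD_SCRIPT = re.compile(
    r'<script\b[^>]*tdserver\.js[^>]*>.*?</script\s*>',
    re.IGNORECASE | re.DOTALL,
)

def strip_font_links(html: str) -> str:
    """Remove dynamic font <link> tags and the tdserver.js <script> tag."""
    html = FONT_LINK.sub("", html)
    return TD_SCRIPT.sub("", html)

page = ('<head><LINK REL="fontdef" SRC="fonts/vni.pfr">'
        '<SCRIPT SRC="tdserver.js"></SCRIPT><title>t</title></head>')
cleaned = strip_font_links(page)
```

Run it on copies of the files, since regular expressions are only an approximation of HTML parsing.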
Encoding | Fonts for original HTML documents
VNI | VNI-Times, VNI Times, VNI-Aptima, VNI Aptima, VNI-Helve, VNI Helve
VPS | VPS Times, VPS Helv
VISCII | VI Times, VI Arial, HoangYen, MinhQuân, PhuongThao, ThaHuong, UHoài
TCVN3 (ABC) | .VnTime, .VnArial
VIQR/Vietnet | No font formatting
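The table can also be expressed as a lookup, which may help when checking which encoding a page's font names point to. The Python sketch below copies the font names from the table. The helper `guess_encoding` is hypothetical and is not how UnicodeConverter itself identifies encodings.

```python
# Font names taken from the table above; VIQR/Vietnet has no font formatting.
FONTS_BY_ENCODING = {
    "VNI": ["VNI-Times", "VNI Times", "VNI-Aptima", "VNI Aptima",
            "VNI-Helve", "VNI Helve"],
    "VPS": ["VPS Times", "VPS Helv"],
    "VISCII": ["VI Times", "VI Arial", "HoangYen", "MinhQuân",
               "PhuongThao", "ThaHuong", "UHoài"],
    "TCVN3 (ABC)": [".VnTime", ".VnArial"],
}

# Reverse index: case-folded font name -> encoding
ENCODING_BY_FONT = {
    font.casefold(): encoding
    for encoding, fonts in FONTS_BY_ENCODING.items()
    for font in fonts
}

def guess_encoding(font_name: str):
    """Return the legacy encoding that a font name suggests, or None."""
    return ENCODING_BY_FONT.get(font_name.strip().casefold())
```

For example, `guess_encoding(".VnTime")` returns `"TCVN3 (ABC)"`, while a font that is not in the table, such as Arial, returns `None`.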
Note: Due to the nature of the TCVN3 encoding, some Vietnamese capital vowels will be converted incorrectly to lower case. Some post-conversion editing may be necessary.
Unicode has only limited support in Windows 95/98/Me, but these systems can still display all Vietnamese characters with appropriate Unicode fonts. Full Unicode support is built into Windows NT/2000/XP. Linux and Mac OS 8.5 or greater have begun to provide support for Unicode. Mac OS X and Palm OS provide full Unicode support.
The following TrueType fonts, which come supplied with Windows 98SE/Me/2000/XP, contain many Unicode characters, including Vietnamese:
Times New Roman, Courier New, Arial, Tahoma, Verdana, Palatino Linotype
This list of Unicode fonts is by no means comprehensive, as more and more fonts are being commercially developed or expanded to include Unicode characters.
Note: Users of Windows 95/98/NT should download the latest versions of these fonts, as the older versions, which are not fully Unicode-compliant, would display question marks (?) or squares (◻) for unsupported characters. They can be downloaded from http://sourceforge.net/projects/corefonts or http://sourceforge.net/projects/vietunicode.