Need Help, Removing Accents from Vietnamese Words

Java/Windows/.NET conversion utility

Moderator: quân

Postby bond777 » Sat Jul 16, 2005 4:28 pm

I wonder if anyone can help me write a JavaScript function to remove accents from Vietnamese text entered in a search box. I have a database that stores a lot of Vietnamese text without accents. However, some users type Vietnamese with accents when searching, which prevents the SQL search from matching.
The script would strip the accents and convert the input into unaccented words.
For example, it would convert these accented vowels and special characters as follows:
------
á à ã ạ â ấ ầ ẫ ậ ă ắ ằ ẵ ặ => a
Á À Ã Ạ Â Ấ Ầ Ẫ Ậ Ă Ắ Ằ Ẵ Ặ => A
é è ẽ ẹ ê ế ề ễ ệ => e
É È Ẽ Ẹ Ê Ế Ề Ễ Ệ => E
í ì ĩ ị => i
Í Ì Ĩ Ị => I
ó ò õ ọ ô ố ồ ỗ ộ ơ ớ ờ ỡ ợ => o
Ó Ò Õ Ọ Ô Ố Ồ Ỗ Ộ Ơ Ớ Ờ Ỡ Ợ => O
ú ù ũ ụ ư ứ ừ ữ ự => u
Ú Ù Ũ Ụ Ư Ứ Ừ Ữ Ự => U
ý ỳ ỹ ỵ => y
Ý Ỳ Ỹ Ỵ => Y
Đ => D
đ => d
-------------
Thanks,
bond777
 
Posts: 1
Joined: Sat Jul 16, 2005 4:16 pm

Postby quân » Wed Jul 20, 2005 6:17 am

This accent removal (diacritic stripping) function is implemented in VietPad. Whether that implementation can be ported to JavaScript is for you to find out. Basically, the string is decomposed into base letters and combining marks, and then the diacritical marks are deleted.
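The decompose-then-delete idea can be sketched in JavaScript as follows. This is not VietPad's code; it is a minimal sketch that assumes an engine with String.prototype.normalize (ES2015 or later), and the function name removeVietnameseAccents is made up for illustration. The letters đ and Đ have no Unicode decomposition, so they are handled separately.

```javascript
// Hypothetical helper: strip Vietnamese diacritics by decomposing the
// string and deleting the combining marks.
function removeVietnameseAccents(text) {
  return text
    // NFD splits each accented letter into a base letter followed by
    // combining marks, e.g. "ế" -> "e" + U+0302 + U+0301.
    .normalize("NFD")
    // Delete everything in the Combining Diacritical Marks block
    // (U+0300..U+036F), which covers the tone marks, circumflex,
    // breve, and horn used in Vietnamese.
    .replace(/[\u0300-\u036f]/g, "")
    // đ and Đ are standalone letters with no decomposition.
    .replace(/đ/g, "d")
    .replace(/Đ/g, "D");
}

console.log(removeVietnameseAccents("Tiếng Việt")); // Tieng Viet
```

On a search page, the function would be called on the text box value before the query is submitted.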

Alternatively, you can use a regular expression to replace the accented characters with plain ones. For example, for the letter a in JavaScript:

str = str.replace(/[áàãạâấầẫậăắằẵặ]/g, "a");

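Make sure the string has been normalized to a composed form (NFC) first, so that each accented letter is a single character that the character class can match.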
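Extended to all the vowels, the regular-expression approach might look like the sketch below. The names ACCENT_TABLE, ACCENT_RULES, and stripAccentsByRegex are hypothetical, and String.prototype.normalize is assumed to be available (ES2015 or later). The table also includes the hook-above letters (ả, ẻ, ỉ, ỏ, ủ, ỷ and their combinations), which the list in the question leaves out.

```javascript
// Hypothetical lookup table: plain letter -> its accented lowercase forms.
const ACCENT_TABLE = {
  a: "áàảãạâấầẩẫậăắằẳẵặ",
  e: "éèẻẽẹêếềểễệ",
  i: "íìỉĩị",
  o: "óòỏõọôốồổỗộơớờởỡợ",
  u: "úùủũụưứừửữự",
  y: "ýỳỷỹỵ",
  d: "đ",
};

// Build one character-class regex per plain letter, in both cases.
// The class contents are normalized to NFC so they match NFC input
// even if this source file was saved in a decomposed form.
const ACCENT_RULES = [];
for (const [plain, accented] of Object.entries(ACCENT_TABLE)) {
  const variants = [
    [plain, accented],
    [plain.toUpperCase(), accented.toUpperCase()],
  ];
  for (const [replacement, chars] of variants) {
    const pattern = new RegExp("[" + chars.normalize("NFC") + "]", "g");
    ACCENT_RULES.push([pattern, replacement]);
  }
}

function stripAccentsByRegex(text) {
  // Compose first so every accented letter is a single code point.
  let result = text.normalize("NFC");
  for (const [pattern, replacement] of ACCENT_RULES) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

console.log(stripAccentsByRegex("Tiếng Việt")); // Tieng Viet
```

Compared with deleting combining marks, this version only touches the letters listed in the table, so accented characters from other languages pass through unchanged.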

If JavaScript does not support these operations, consider having them performed by a program on the server.
quân
 
Posts: 236
Joined: Sat Nov 16, 2002 1:51 am
Location: Oxnard, CA - USA

Postby quân » Sat Dec 10, 2005 3:23 am

This capability has also been incorporated in VietIME 1.2.
quân
 
Posts: 236
Joined: Sat Nov 16, 2002 1:51 am
Location: Oxnard, CA - USA

