VIETNAMESE SORTING RULES
FOR DICTIONARY ENTRIES

Vũ Xuân Lương

Vietnam Lexicography Centre
http://www.vietlex.com

 

1. All main entries (word units) in dictionaries are listed in the following alphabetical order:

a ă â b c d đ e ê f g h i j k l m n o ô ơ p q r s t u ư v w x y z

and

2. in the following tone-mark order:

level tone (no mark), grave, hook above, tilde, acute, dot below

The two above rules combine as follows:

a à ả ã á ạ ă ằ ẳ ẵ ắ ặ â ầ ẩ ẫ ấ ậ b c d đ e è ẻ ẽ é ẹ ê ề ể ễ ế ệ
f g h i ì ỉ ĩ í ị j k l m n o ò ỏ õ ó ọ ô ồ ổ ỗ ố ộ ơ ờ ở ỡ ớ ợ
p q r s t u ù ủ ũ ú ụ ư ừ ử ữ ứ ự v w x y ỳ ỷ ỹ ý ỵ z
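
The combined sequence above can be derived mechanically from rules 1 and 2. The following Python sketch does so under stated assumptions: the names BASE_LETTERS, VOWELS, TONE_MARKS and combined_order are illustrative and not part of the rules, and tone marks are represented by Unicode combining characters that are composed with the base letter through NFC normalization.

```python
import unicodedata

# Rule 1: the alphabetical order of base letters.
BASE_LETTERS = "a ă â b c d đ e ê f g h i j k l m n o ô ơ p q r s t u ư v w x y z".split()
# Letters that can carry a tone mark.
VOWELS = set("a ă â e ê i o ô ơ u ư y".split())
# Rule 2: no mark, grave, hook above, tilde, acute, dot below (combining characters).
TONE_MARKS = ["", "\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]


def combined_order():
    """Return the full sequence: each vowel in all six tones, consonants once."""
    order = []
    for letter in BASE_LETTERS:
        marks = TONE_MARKS if letter in VOWELS else [""]
        for mark in marks:
            # NFC composes the letter and the mark into one precomposed character.
            order.append(unicodedata.normalize("NFC", letter + mark))
    return order


print(" ".join(combined_order()))
```

The printed sequence has 93 characters (21 letters without tone marks plus 12 vowels in 6 tones each) and can be compared with the three lines above.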

3. The base unit to be sorted is a block of consecutive characters (a character string), whether monosyllabic or polysyllabic, compared from left to right. A lowercase character has higher priority than (is listed before) the corresponding uppercase character. A shorter character block (syllable) is always sorted before a longer character block that begins with the shorter one. For example:

a (area measure unit) is listed before A (Ampere)
cha is listed before chan
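
The two examples above can be reproduced with a small Python sort key. This is only a sketch: the name prefix_case_key is illustrative, the key handles unaccented letters only (it relies on plain code point order), and it assumes that case is a tie-breaker applied after the letters have been compared, which is consistent with the note below that lists B1 before ba.

```python
def prefix_case_key(block):
    """Compare letters without regard to case, then lowercase before uppercase."""
    # A shorter string that is a prefix of a longer one compares as smaller,
    # so "cha" precedes "chan" without any extra logic.
    letters = block.lower()
    # False (lowercase) sorts before True (uppercase) when the letters are equal.
    case_flags = [ch.isupper() for ch in block]
    return letters, case_flags


print(sorted(["chan", "A", "cha", "a"], key=prefix_case_key))
# ['a', 'A', 'cha', 'chan']
```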

4. Priority is given first to alphabet letters and then to tone marks when sorting.

   4.1. For monosyllables, if the syllables[1] (delimited by spaces) differ in their letters, they are sorted by those letters, ignoring the tone marks.

ang is listed before anh in every case regardless of the tone marks they may have, for g in ang is listed before h in anh
ác is listed before ách because ac+zero[2] is listed before ac+h regardless of the tone marks they may have
apatit is listed before apxe, for apa- is listed before apx-

   4.2. For monosyllables, if the syllables consist of the same letters, they are sorted by their tone marks.

ba < bà < bả
hai < hài < hại
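
Rules 4.1 and 4.2 amount to a two-level sort key for a single syllable: the sequence of letters first, the tone marks second. The Python sketch below illustrates this under stated assumptions: the names LETTER_RANK, split_char and syllable_key are illustrative, each character is separated into letter and tone through Unicode NFD decomposition, and the input contains only letters of the alphabet in rule 1.

```python
import unicodedata

ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêfghijklmnoôơpqrstuưvwxyz")
LETTER_RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}
# Combining marks in the tone order of rule 2; rank 0 is reserved for "no mark".
TONE_MARKS = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]


def split_char(ch):
    """Split one precomposed character into (base letter, tone rank)."""
    tone_rank, rest = 0, []
    for c in unicodedata.normalize("NFD", ch):
        if c in TONE_MARKS:
            tone_rank = TONE_MARKS.index(c) + 1
        else:
            rest.append(c)  # base letter plus any breve, circumflex or horn
    return unicodedata.normalize("NFC", "".join(rest)), tone_rank


def syllable_key(syllable):
    """Sort key for one syllable: all letters first (4.1), tone marks second (4.2)."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", syllable.lower()):
        base, tone_rank = split_char(ch)
        letters.append(LETTER_RANK[base])  # KeyError for characters outside the alphabet
        tones.append(tone_rank)
    return tuple(letters), tuple(tones)


words = ["hại", "anh", "bả", "áng", "hai", "bà", "ách", "ba", "hài", "ác"]
print(sorted(words, key=syllable_key))
```

With this key, áng is listed before anh even though it carries a tone mark, because the letters are compared before any tone mark is considered.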

   4.3. For polysyllables, entries are compared syllable by syllable from left to right; each pair of syllables is compared first by letters and then by tone marks (a combination of 4.1 and 4.2).

ba bể is listed before ba gác, for bể is listed before gác
ba bể, ba gác, ba que are listed before bà cô, for ba in those entries is listed before bà in bà cô

Thus, any main entry (polysyllable) whose first syllable is ba is always listed before any entry whose first syllable is bà.
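
Rule 4.3 can be sketched in Python as a key made of one (letters, tones) pair per syllable. The sketch makes several assumptions: the names RANK, syllable_key and entry_key are illustrative, letters and tones are looked up in a precomputed table instead of being decomposed at comparison time, syllables are delimited by spaces only, and the entries contain only letters of the alphabet in rule 1.

```python
import unicodedata

ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêfghijklmnoôơpqrstuưvwxyz")
VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")
TONES = ["", "\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]

# Lookup table: precomposed character -> (letter rank, tone rank).
RANK = {}
for letter_rank, base in enumerate(ALPHABET):
    marks = TONES if base in VOWELS else [""]
    for tone_rank, mark in enumerate(marks):
        RANK[unicodedata.normalize("NFC", base + mark)] = (letter_rank, tone_rank)


def syllable_key(syllable):
    """Letters of the syllable first, its tone marks second."""
    ranks = [RANK[ch] for ch in unicodedata.normalize("NFC", syllable.lower())]
    return tuple(r[0] for r in ranks), tuple(r[1] for r in ranks)


def entry_key(entry):
    """One syllable key per space-delimited syllable, compared left to right."""
    return tuple(syllable_key(s) for s in entry.split())


print(sorted(["bà cô", "ba que", "ba gác", "ba bể"], key=entry_key))
```

Because the tone of the first syllable is settled before the second syllable is examined, bà cô is placed after every entry beginning with ba. A key that compared the letters of the whole entry before any tone mark would instead place bà cô between ba bể and ba gác.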

Note:

1. In the popular phonetic spelling of words of foreign origin, such as cu-lông and a-xpi-rin, the hyphen is treated as zero, and those words are sorted as polysyllables. For example:

a-xpi-rin is listed after a tòng, for a+zero+x is listed after a+zero+t
a-xpi-rin is listed before à and à ơi, for a is listed before à

2. Symbols and digits are listed before letters. For example:

!, #, $, %, &, @,..., 0, 1, 2,..., 9 are always listed before a, b, c
B1 is listed before B40 and ba
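
The two notes can be combined with rules 1 to 4 in a single sort key. The Python sketch below is one possible reading and not a definitive implementation. The names char_key and entry_key are illustrative. It assumes that hyphens and spaces both delimit syllables, that symbols are ordered among themselves by code point, that symbols precede digits as in the list above, and that case is used only as a final tie-breaker.

```python
import re
import unicodedata

ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêfghijklmnoôơpqrstuưvwxyz")
LETTER_RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"  # grave, hook above, tilde, acute, dot below


def char_key(ch):
    """Return ((class, rank), tone rank); class 0 = symbol, 1 = digit, 2 = letter."""
    tone_rank, rest = 0, ""
    for c in unicodedata.normalize("NFD", ch.lower()):
        if c in TONE_MARKS:
            tone_rank = TONE_MARKS.index(c) + 1
        else:
            rest += c
    base = unicodedata.normalize("NFC", rest)
    if base in LETTER_RANK:
        return (2, LETTER_RANK[base]), tone_rank
    if "0" <= ch <= "9":
        return (1, ord(ch)), 0
    return (0, ord(ch)), 0  # assumption: symbols ordered by code point


def entry_key(entry):
    """Per-syllable (letters, tones) pairs, then case as the last tie-breaker."""
    text = unicodedata.normalize("NFC", entry)
    # Note 1: a hyphen is treated like a space, so both delimit syllables.
    syllables = [s for s in re.split(r"[ -]+", text) if s]
    per_syllable = []
    for s in syllables:
        keys = [char_key(ch) for ch in s]
        per_syllable.append((tuple(k[0] for k in keys), tuple(k[1] for k in keys)))
    case = tuple(ch.isupper() for s in syllables for ch in s)
    return tuple(per_syllable), case


entries = ["ba", "à ơi", "B40", "a-xpi-rin", "à", "B1", "a tòng", "A", "a", "#"]
print(sorted(entries, key=entry_key))
# ['#', 'a', 'A', 'a tòng', 'a-xpi-rin', 'à', 'à ơi', 'B1', 'B40', 'ba']
```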


[1] The definition of "syllable" is not precise for words of foreign origin, which are mostly polysyllabic. For convenience, however, each space-delimited word is treated as a monosyllable, equivalent to a Vietnamese syllable.

[2] A space after a character block (syllable) is treated as zero, which is always sorted before any letter, including a and A.