Bug 1624244 - Exclude Japanese characters KATAKANA-HIRAGANA [SEMI-]VOICED SOUND MARK from the diacritics that can be ignored during search. r=m_kato

Differential Revision: https://phabricator.services.mozilla.com/D67834
Jonathan Kew
2020-03-30 13:53:20 +00:00
parent 4eddabee9c
commit 3c0c598dc5
2 changed files with 11 additions and 1 deletions

@@ -240,8 +240,15 @@ uint32_t CountGraphemeClusters(const char16_t* aText, uint32_t aLength);
 // European accents and Hebrew niqqud, but not Hangul components or Thaana
 // vowels, even though Thaana vowels are combining nonspacing marks that could
 // be considered diacritics.
+// As an exception to strictly following Unicode properties, we exclude the
+// Japanese kana voicing marks
+// 3099;COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK;Mn;8;NSM
+// 309A;COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK;Mn;8;NSM
+// which users report should not be ignored (bug 1624244).
 inline bool IsCombiningDiacritic(uint32_t aCh) {
-  return u_getCombiningClass(aCh) != 0;
+  uint8_t cc = u_getCombiningClass(aCh);
+  return cc != HB_UNICODE_COMBINING_CLASS_NOT_REORDERED &&
+         cc != HB_UNICODE_COMBINING_CLASS_KANA_VOICING;
 }
 // Remove diacritics from a character
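
The effect of the change can be illustrated with a minimal standalone sketch. It is not the patched code: `SketchCombiningClass` is a hypothetical stand-in for ICU's `u_getCombiningClass`, hard-coding the canonical combining class of only a few code points, and the two constants are assumed to mirror the HarfBuzz values used in the patch (0 for "not reordered", 8 for "kana voicing"). All names beginning with `Sketch` or `k` are illustrative only.

```cpp
#include <cstdint>

// Canonical combining class values as defined by Unicode. These are
// assumed to match HB_UNICODE_COMBINING_CLASS_NOT_REORDERED and
// HB_UNICODE_COMBINING_CLASS_KANA_VOICING from the patch.
constexpr uint8_t kNotReordered = 0;
constexpr uint8_t kKanaVoicing = 8;

// Hypothetical stand-in for u_getCombiningClass(): knows only a handful
// of code points and reports class 0 for everything else.
inline uint8_t SketchCombiningClass(uint32_t aCh) {
  switch (aCh) {
    case 0x0301:  // COMBINING ACUTE ACCENT
      return 230;
    case 0x05B0:  // HEBREW POINT SHEVA (niqqud)
      return 10;
    case 0x3099:  // COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
    case 0x309A:  // COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
      return 8;
    default:
      return 0;
  }
}

// Behaviour before the patch: any nonzero combining class is treated
// as an ignorable diacritic.
inline bool SketchIsCombiningDiacriticOld(uint32_t aCh) {
  return SketchCombiningClass(aCh) != kNotReordered;
}

// Behaviour after the patch: the kana voicing class is excluded too.
inline bool SketchIsCombiningDiacriticNew(uint32_t aCh) {
  uint8_t cc = SketchCombiningClass(aCh);
  return cc != kNotReordered && cc != kKanaVoicing;
}
```

Under these assumptions, U+0301 and U+05B0 are classified as diacritics by both versions, while U+3099 and U+309A are classified as diacritics only by the old version, so a diacritic-insensitive search no longer ignores the kana voicing marks.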