Bug 1945341 - Count characters more correctly for hyphenate-limit-chars thresholds. r=layout-reviewers,emilio

In particular, this avoids counting a combining mark at the end of the word
as if it were a separate character.

Differential Revision: https://phabricator.services.mozilla.com/D236467
Jonathan Kew
2025-02-01 22:55:39 +00:00
parent 26c2609023
commit 7a435ce9a6
4 changed files with 15 additions and 7 deletions

@@ -437,7 +437,11 @@ void nsLineBreaker::FindHyphenationPoints(nsHyphenator* aHyphenator,
 uint8_t mState;
 };
 AutoTArray<BreakInfo, 16> oldBreaks;
-for (uint32_t i = 0; i + 1 < string.Length(); ++i) {
+// Don't consider setting any breaks where i >= endLimit, as they will
+// definitely be too near the end of the word to be accepted.
+uint32_t endLimit =
+string.Length() - std::max<uint32_t>(1u, mHyphenateLimitEnd);
+for (uint32_t i = 0; i < string.Length(); ++i) {
 // Get current character, converting surrogate pairs to UCS4 for char
 // category lookup.
 uint32_t ch = string[i];
@@ -470,8 +474,13 @@ void nsLineBreaker::FindHyphenationPoints(nsHyphenator* aHyphenator,
 break;
 }
-// Don't accept any breaks until we're far enough into the word.
-if (length >= mHyphenateLimitStart && hyphens[i]) {
+// Don't accept any breaks until we're far enough into the word, or if
+// we're too near the end for it to possibly be accepted. (Note that the
+// check against endLimit is just an initial worst-case check that assumes
+// all the remaining characters are countable; if there are combining
+// marks, etc., in the trailing part of the word we may need to reset the
+// potential break later, after we've fully counted length.)
+if (hyphens[i] && length >= mHyphenateLimitStart && i < endLimit) {
 // Keep track of hyphen position and "countable" length of the word.
 oldBreaks.AppendElement(BreakInfo{i + 1, length, aBreakState[i + 1]});
 aBreakState[i + 1] = gfxTextRun::CompressedGlyph::FLAG_BREAK_TYPE_HYPHEN;
@@ -482,7 +491,6 @@ void nsLineBreaker::FindHyphenationPoints(nsHyphenator* aHyphenator,
 ++i;
 }
 }
-++length; // Account for the last character (not counted by the loop above).
 if (length < mHyphenateLimitWord) {
 // After discounting combining marks, punctuation, controls, etc., the word

@@ -1,4 +1,4 @@
 <!DOCTYPE html>
 <meta charset="utf-8">
 <div lang="hi" style="width:0; hyphens:manual;">
-सभी मनु&shy;ष्यों को गौरव और अधि&shy;का&shy;रों के मा&shy;&shy;ले में जन्म&shy;जात स्व&shy;&shy;न्त्र&shy;ता और समा&shy;&shy;ता प्रा&shy;प्त है ।
+सभी मनु&shy;ष्यों को गौरव और अधि&shy;का&shy;रों के माले में जन्म&shy;जात स्व&shy;&shy;न्त्र&shy;ता और समा&shy;&shy;ता प्रा&shy;प्त है ।

@@ -1,4 +1,4 @@
 <!DOCTYPE html>
 <meta charset="utf-8">
 <div lang="kn" style="width:0; hyphens:manual;">
-ಎಲ್ಲಾ ಮಾ&shy;&shy;&shy;ರೂ ಸ್ವ&shy;ತಂ&shy;ತ್ರ&shy;ರಾ&shy;ಗಿ&shy;ಯೇ ಜನಿ&shy;ಸಿ&shy;ದ್ದಾ&shy;ರೆ. ಹಾಗೂ ಘನತೆ ಮತ್ತು ಹಕ್ಕು&shy;&shy;&shy;ಲ್ಲಿ ಸಮಾ&shy;&shy;ರಾ&shy;ಗಿ&shy;ದ್ದಾ&shy;ರೆ.
+ಎಲ್ಲಾ ಮಾ&shy;&shy;&shy;ರೂ ಸ್ವ&shy;ತಂ&shy;ತ್ರ&shy;ರಾ&shy;ಗಿ&shy;ಯೇ ಜನಿ&shy;ಸಿ&shy;ದ್ದಾರೆ. ಹಾಗೂ ಘನತೆ ಮತ್ತು ಹಕ್ಕು&shy;&shy;&shy;ಲ್ಲಿ ಸಮಾ&shy;&shy;ರಾ&shy;ಗಿ&shy;ದ್ದಾರೆ.

@@ -1,4 +1,4 @@
 <!DOCTYPE html>
 <meta charset="utf-8">
 <div lang="ml" style="width:0; hyphens:manual;">
-മനു&shy;ഷ്യ&shy;രെ&shy;ല്ലാ&shy;&shy;രും തുല്യാ&shy;&shy;കാ&shy;&shy;ങ്ങ&shy;ളോ&shy;ടും അന്ത&shy;സ്സോ&shy;ടും സ്വാ&shy;&shy;ന്ത്ര്യ&shy;ത്തോ&shy;ടും&shy;കൂ&shy;ടി ജനി&shy;ച്ചി&shy;ട്ടു&shy;ള്ള&shy;&shy;രാ&shy;ണ്‌.
+മനു&shy;ഷ്യ&shy;രെ&shy;ല്ലാ&shy;&shy;രും തുല്യാ&shy;&shy;കാ&shy;&shy;ങ്ങ&shy;ളോ&shy;ടും അന്ത&shy;സ്സോ&shy;ടും സ്വാ&shy;&shy;ന്ത്ര്യ&shy;ത്തോ&shy;ടും&shy;കൂ&shy;ടി ജനി&shy;ച്ചി&shy;ട്ടു&shy;ള്ള&shy;&shy;രാണ്‌.