* Properly take into account non-ASCII bytes at word boundaries for windows-1252. (Especially relevant for Italian, Catalan, Icelandic, and Faroese.)
* Move Estonian from the Baltic model to the Western model. This improves overall Estonian detection but causes š and ž encoded as windows-1257, ISO-8859-13, or ISO-8859-4 to get misdecoded. (It would be possible to add a post-processing step to adjust for š and ž, but this would cause reloads given the way chardetng is integrated with Firefox.)
* Improve Thai accuracy a lot.
* Improve Vietnamese, Lithuanian, and Latvian accuracy a bit.
* Improve accuracy for most Central European languages a bit.
* Regress accuracy for some Central European languages a bit (as a side effect of fixing Italian and Catalan).
* Properly classify letters that ISO-8859-4 has but windows-1257 doesn't have in order to avoid misdetecting non-ISO-8859-4 input as ISO-8859-4.
* Improve character classification of windows-1254.
* Avoid classifying byte 0xA1 or above as space-like to avoid misdetection.
* Reduce binary size.

Differential Revision: https://phabricator.services.mozilla.com/D63197
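The Estonian trade-off described above can be illustrated with a small sketch. It is not chardetng code: it only uses the standard `TextDecoder` API to show what happens when bytes written as windows-1257 are decoded as windows-1252. The sample word `šokolaad` and the variable names are assumptions chosen for illustration.

```javascript
// Illustrative sketch only; this is not how chardetng itself works.
// "šokolaad" encoded as windows-1257: š is the single byte 0xF0,
// the remaining letters are plain ASCII.
const balticBytes = new Uint8Array([0xF0, 0x6F, 0x6B, 0x6F, 0x6C, 0x61, 0x61, 0x64]);

// Decode the same bytes under both labels.
const asBaltic = new TextDecoder("windows-1257").decode(balticBytes);
const asWestern = new TextDecoder("windows-1252").decode(balticBytes);

console.log(asBaltic);  // the intended text, starting with š (U+0161)
console.log(asWestern); // 0xF0 is ð (U+00F0) in windows-1252, so š is misdecoded
```

Only the š byte changes meaning between the two decodings; the ASCII letters are unaffected, which is why the misdecoding is limited to š and ž.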
15 lines
430 B
HTML
<!doctype html>
<title>it windows-1252</title>
<script src=/resources/testharness.js></script>
<script src=/resources/testharnessreport.js></script>
<p>Questo è un test di codifica dei caratteri.</p>
<script>
setup({explicit_done:true});
onload = function() {
    test(function() {
        assert_equals(document.characterSet, "windows-1252", 'Expected windows-1252');
    }, "Check detection result");
    done();
};
</script>
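The test page declares no encoding, so `document.characterSet` presumably reflects whatever the detector picked. The sketch below shows what the detector has to work with, under the assumption that the file is saved as windows-1252. The byte-building shortcut is an assumption that holds only for this sentence, and the variable names are hypothetical.

```javascript
// Illustrative sketch only: inspect the bytes of the test sentence.
const sentence = "Questo \u00E8 un test di codifica dei caratteri.";

// Assumption: every character is in U+0000..U+007F or U+00A0..U+00FF,
// where the windows-1252 byte value equals the code point.
const bytes = Uint8Array.from(sentence, (ch) => ch.charCodeAt(0));

// The only non-ASCII byte is 0xE8 (è), a one-letter word between two spaces,
// so the non-ASCII byte sits directly at word boundaries.
const nonAscii = [...bytes].filter((b) => b >= 0x80);

// Decoding as windows-1252 restores the sentence; decoding as UTF-8 does not,
// because a lone 0xE8 is an incomplete UTF-8 sequence and becomes U+FFFD.
const roundTrip = new TextDecoder("windows-1252").decode(bytes);
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

console.log(nonAscii.map((b) => b.toString(16)), roundTrip === sentence, asUtf8);
```

This ties the test to the first item in the commit message: the only non-ASCII evidence in the Italian sample is a single byte surrounded by spaces.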