Commit Graph

387 Commits

Author SHA1 Message Date
Kagami Sascha Rosylight
ca80f84bf8 Bug 1767996 - Apply readability-make-member-function-const on dom/html r=smaug
Differential Revision: https://phabricator.services.mozilla.com/D145627
2022-05-13 05:35:25 +00:00
Henri Sivonen
99fdcabfad Bug 1712928 - Gather telemetry about encoding-unlabeled pages and about Repair Text Encoding usage situations. r=emk
In particular, gather telemetry to evaluate the impact of unlabeled UTF-8
and how detector-triggered reloads would change if ASCII-only at initial
guess was treated as UTF-8.

Differential Revision: https://phabricator.services.mozilla.com/D140818
2022-03-29 08:04:25 +00:00
Peter Van der Beken
b98d7adef3 Bug 1749935 - Create nsParser directly instead of with a CID. r=hsivonen
Differential Revision: https://phabricator.services.mozilla.com/D135896
2022-02-14 13:03:51 +00:00
Peter Van der Beken
e171802e54 Bug 1749935 - Remove unused support for nested CParserContext. r=hsivonen
Differential Revision: https://phabricator.services.mozilla.com/D135879
2022-02-14 13:03:47 +00:00
Sylvestre Ledru
35a175aa33 Bug 1754767 - Remove duplicate includes r=media-playback-reviewers,padenot
Differential Revision: https://phabricator.services.mozilla.com/D138441
2022-02-11 10:01:15 +00:00
Henri Sivonen
ed5cd60f12 Bug 1748608 - Remove unused aSink argument from Document::StartDocumentLoad. r=edgar
Differential Revision: https://phabricator.services.mozilla.com/D135111
2022-01-05 13:38:56 +00:00
Henri Sivonen
a8ed2707bb Bug 1701828 - meta charset rewrite. r=smaug
Implements https://github.com/whatwg/html/issues/6962 . Improves performance
when <meta charset> occurs in head but after the first kilobyte and aligns
behavior better with WebKit and Blink.

The main change is to avoid reloads when meta appears within head but
after the first kilobyte. Prior to this change, Gecko reloaded in that
case (in compliance with the spec!) even though WebKit and Blink did not.

Differences from WebKit and Blink:

* WebKit and Blink honor <meta charset> in <noscript>. This implementation
  does not.
* WebKit and Blink look for meta as if the tree builder was unaware of
  foreign content. This implementation is foreign content-aware. This
  makes a difference for CDATA sections that contain a > before the meta
  as well as style and script elements within foreign content. This could
  happen if the CDATA section that has mysteriously been introduced around
  a what looks like a meta tag also contains another prior tag-looking
  run of text.
* This implementation processes rel=preload and speculative loads that are
  seen before <meta charset> has been seen. WebKit and Blink instead first
  look for the meta and rewind before starting speculative parsing.
* Unlike WebKit, if there is neither an honored meta nor syntax resembling
  an XML declaration, detection from content takes place (as in Blink).
* Unlike Blink, if there is neither an honored meta nor syntax resembling
  an XML declaration, the detection from content is not dependent of network
  buffer boundaries.
* Unlike Blink, detection from content can trigger a reload at the end of
  the stream if the guess made at that point differs from the first guess.
  (See below for the definition of the input to the first guess.)

Differences from the old spec and Gecko previously:

* Meta inside script and RCDATA elements is no longer honored.
* Late meta is now ignored and no longer triggers a reload.
* Later meta counts as early enough meta: In addition to the previous
  meta within the first 1024 bytes, now a meta that started within the first
  1024 bytes counts as early enough. Additionally, if by then there hasn't
  been a template start tag and head hasn't ended, meta occurring before the
  earlier of the end of the head or a template start tag counts as early
  enough.
* Meta now counts as not-late even if the encoding label has numeric
  character reference escapes.
* Syntax resembling an XML declaration longer than a kilobyte is honored if
  there is no honored meta.
* If there is neither an honored meta nor syntax resembling an XML declaration,
  the initial chardetng scan is potentially longer than before: the first 1024
  bytes, the token spanning the 1024-byte boundary if there is such a token,
  and, if by then head hasn't ended and there hasn't been a template start tag
  until the end of the template start tag or the end of the token that causes
  head to end, ever comes first. However, if the token implying the end of the
  head is a text token, bytes only to the end of the previous non-text token is
  considered. (This definition avoids depending on network buffer boundaries.)
* XML View Source now uses the code for syntax resembling an XML declaration
  instead of expat for extracting the internal encoding label.

Reftest are added as both WPT and Gecko reftests in order to test both http:
and file: URL scenarios. The Gecko tests retain the WPT <link> tags in order
to use the exact same bytes.

An encoding declaration has been added to a number of old tests that didn't
intend to test the new speculation behavior especially in the context of
https://bugzilla.mozilla.org/show_bug.cgi?id=1727750 .

Differential Revision: https://phabricator.services.mozilla.com/D125808
2021-12-08 11:34:20 +00:00
Norisz Fay
e844dde8a6 Backed out changeset 3dfd3c94a105 (bug 1701828) for causing mochitest failures on browser_hsts_host.js CLOSED TREE 2021-12-07 12:05:44 +02:00
Henri Sivonen
37fade70f0 Bug 1701828 - meta charset rewrite. r=smaug
Implements https://github.com/whatwg/html/issues/6962 . Improves performance
when <meta charset> occurs in head but after the first kilobyte and aligns
behavior better with WebKit and Blink.

The main change is to avoid reloads when meta appears within head but
after the first kilobyte. Prior to this change, Gecko reloaded in that
case (in compliance with the spec!) even though WebKit and Blink did not.

Differences from WebKit and Blink:

* WebKit and Blink honor <meta charset> in <noscript>. This implementation
  does not.
* WebKit and Blink look for meta as if the tree builder was unaware of
  foreign content. This implementation is foreign content-aware. This
  makes a difference for CDATA sections that contain a > before the meta
  as well as style and script elements within foreign content. This could
  happen if the CDATA section that has mysteriously been introduced around
  a what looks like a meta tag also contains another prior tag-looking
  run of text.
* This implementation processes rel=preload and speculative loads that are
  seen before <meta charset> has been seen. WebKit and Blink instead first
  look for the meta and rewind before starting speculative parsing.
* Unlike WebKit, if there is neither an honored meta nor syntax resembling
  an XML declaration, detection from content takes place (as in Blink).
* Unlike Blink, if there is neither an honored meta nor syntax resembling
  an XML declaration, the detection from content is not dependent of network
  buffer boundaries.
* Unlike Blink, detection from content can trigger a reload at the end of
  the stream if the guess made at that point differs from the first guess.
  (See below for the definition of the input to the first guess.)

Differences from the old spec and Gecko previously:

* Meta inside script and RCDATA elements is no longer honored.
* Late meta is now ignored and no longer triggers a reload.
* Later meta counts as early enough meta: In addition to the previous
  meta within the first 1024 bytes, now a meta that started within the first
  1024 bytes counts as early enough. Additionally, if by then there hasn't
  been a template start tag and head hasn't ended, meta occurring before the
  earlier of the end of the head or a template start tag counts as early
  enough.
* Meta now counts as not-late even if the encoding label has numeric
  character reference escapes.
* Syntax resembling an XML declaration longer than a kilobyte is honored if
  there is no honored meta.
* If there is neither an honored meta nor syntax resembling an XML declaration,
  the initial chardetng scan is potentially longer than before: the first 1024
  bytes, the token spanning the 1024-byte boundary if there is such a token,
  and, if by then head hasn't ended and there hasn't been a template start tag
  until the end of the template start tag or the end of the token that causes
  head to end, ever comes first. However, if the token implying the end of the
  head is a text token, bytes only to the end of the previous non-text token is
  considered. (This definition avoids depending on network buffer boundaries.)
* XML View Source now uses the code for syntax resembling an XML declaration
  instead of expat for extracting the internal encoding label.

Reftest are added as both WPT and Gecko reftests in order to test both http:
and file: URL scenarios. The Gecko tests retain the WPT <link> tags in order
to use the exact same bytes.

An encoding declaration has been added to a number of old tests that didn't
intend to test the new speculation behavior especially in the context of
https://bugzilla.mozilla.org/show_bug.cgi?id=1727750 .

Differential Revision: https://phabricator.services.mozilla.com/D125808
2021-12-07 07:35:32 +00:00
olaoluwa
613274e835 Bug 1575191- Make callers to Document::SetContentType with ASCII literal using SetContentTypeInternal.. r=hsivonen
Differential Revision: https://phabricator.services.mozilla.com/D129697
2021-11-04 06:47:05 +00:00
Henri Sivonen
7cc2f552d4 Bug 1716290 - Remove protections against the document changing as part of kCharsetFromFinalUserForcedAutoDetection reload. r=emk,emilio
NOTE! In cases where there is no HTTP-layer encoding declaration, and CSS
parsing inherits the encoding from the HTML document, for preloads, this
changes the inherited encoding from windows-1252 to UTF-8 in order to
make the speculative encoding correct in the common `<meta charset=utf-8>`
case.

Differential Revision: https://phabricator.services.mozilla.com/D123593
2021-08-26 18:02:15 +00:00
criss
3963232b21 Backed out changeset ab805f2926d5 (bug 1716290) for causing failures on link-header-preload.html. CLOSED TREE 2021-08-26 12:07:17 +03:00
Henri Sivonen
274b08a6fa Bug 1716290 - Remove protections against the document changing as part of kCharsetFromFinalUserForcedAutoDetection reload. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D123593
2021-08-26 06:25:31 +00:00
Peter Van der Beken
14b63eef21 Bug 1721153 - Remove unused aListener and aMode arguments of nsIParser::Parse. r=hsivonen
Differential Revision: https://phabricator.services.mozilla.com/D120217
2021-08-12 14:11:38 +00:00
Henri Sivonen
497eba31f2 Bug 1713627 - Remove code obsoleted by the replacing the Text Encoding menu with one item. r=jaws,emk
Differential Revision: https://phabricator.services.mozilla.com/D116391
2021-06-21 12:09:01 +00:00
Dorel Luca
48f1cd991c Backed out changeset 4891a17c55e2 (bug 1713627) for Browser-chrome failures in docshell/test/browser/browser_bug673087-1.js. CLOSED TREE 2021-06-21 12:10:54 +03:00
Henri Sivonen
f97066009d Bug 1713627 - Remove code obsoleted by the replacing the Text Encoding menu with one item. r=jaws,emk
Differential Revision: https://phabricator.services.mozilla.com/D116391
2021-06-21 08:09:43 +00:00
Henri Sivonen
05ce9e975a Bug 1687635 part 2 - Disable Repair Text Encoding when known not to have effect. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D112929
2021-06-01 05:15:49 +00:00
Sean Feng
43a422586d Bug 1690905 - Report the DOM portion of memory usage for data documents in about:memory r=smaug
Depends on D111317

Differential Revision: https://phabricator.services.mozilla.com/D111318
2021-05-27 17:55:45 +00:00
Dorel Luca
110c3fac1d Backed out 2 changesets (bug 1687635) for Browser-chrome failures in browser/components/customizableui/test/browser_967000_button_charEncoding.js. CLOSED TREE
Backed out changeset bd95df8be7ca (bug 1687635)
Backed out changeset 701981112733 (bug 1687635)
2021-05-27 09:48:00 +03:00
Henri Sivonen
dd66c6bb7d Bug 1687635 part 2 - Disable Repair Text Encoding when known not to have effect. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D112929
2021-05-27 05:49:15 +00:00
Iulian Moraru
bb9909b542 Backed out 2 changesets (bug 1687635) for causing bc failures on browser_967000_button_charEncoding.js. CLOSED TREE
Backed out changeset b0b7678a6781 (bug 1687635)
Backed out changeset 9f57ec83cdc6 (bug 1687635)
2021-05-26 20:41:20 +03:00
Henri Sivonen
f4ad469570 Bug 1687635 part 2 - Disable Repair Text Encoding when known not to have effect. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D112929
2021-05-26 16:59:05 +00:00
Henri Sivonen
1d5bf583ce Bug 829543 - Rename hintCharset to reloadEncoding and remove unnecessary code. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D113639
2021-04-28 12:15:47 +00:00
Henri Sivonen
1be77297f7 Bug 673087 - Honor encoding declared via XML declaration in text/html. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D107806
2021-03-23 09:52:04 +00:00
Simon Giesecke
86e29e162f Bug 1695162 - Use range-based for instead of custom hashtable iterators. r=xpcom-reviewers,kmag
Differential Revision: https://phabricator.services.mozilla.com/D108585
2021-03-17 15:49:46 +00:00
Henri Sivonen
1d440d9632 Bug 1686463 - Gather telemetry about automatic encoding detection outcomes. r=chutten,emk
Differential Revision: https://phabricator.services.mozilla.com/D102397
2021-01-24 00:11:07 +00:00
Henri Sivonen
dc81bb2634 Bug 1648464 - Add an Autodetect item to the Text Encoding menu. r=emk,chutten,Gijs
Take a step towards replacing the encoding menu with a single menu item that
triggers the autodetection manually. However, don't remove anything for now.

* Add an autodetect item.
* Add telemetry for autodetect used in session.
* Add telemetry for non-autodetect used in session.
* Restore and revise telemetry for how the encoding that is being overridden
  was discovered.

Differential Revision: https://phabricator.services.mozilla.com/D81132
2021-01-14 07:06:53 +00:00
Henri Sivonen
7ca6606f37 Bug 1647310 - Stop storing charset on cache entries. r=necko-reviewers,dragana
Storing the charset on cache entries makes the code path uselessly different
when loading from cache relative to uncached loads. Also, for future
telemetry purposes, caching the charset obscures its original source.

Differential Revision: https://phabricator.services.mozilla.com/D101570
2021-01-15 09:35:56 +00:00
Simon Giesecke
ab6f0a7137 Bug 1650145 - Replace all value uses of Empty[C]String by 0-length _ns literals. r=froydnj,geckoview-reviewers,agi
Differential Revision: https://phabricator.services.mozilla.com/D82325
2020-09-23 15:17:15 +00:00
Henri Sivonen
695d6eb24c Bug 1647301 - Remove forceCharset from nsIContentViewer. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D80470
2020-07-02 06:23:24 +00:00
Henri Sivonen
c2b1e3bb69 Bug 1647728 - Unify kCharsetFromUserForced and kCharsetFromParentForced. r=m_kato
For making further changes less messy.

Differential Revision: https://phabricator.services.mozilla.com/D80813
2020-06-25 03:25:03 +00:00
Henri Sivonen
202b60fc22 Bug 1646484 - Remove nsIDocShell::forcedCharset. r=emilio
Differential Revision: https://phabricator.services.mozilla.com/D80072
2020-06-18 01:18:11 +00:00
Henri Sivonen
0dc7ead757 Bug 1603712 - Remove intl.charset.detector.ng.enabled pref and resulting dead code. r=Gijs,fluent-reviewers,valentin,m_kato
Differential Revision: https://phabricator.services.mozilla.com/D79101
2020-06-15 15:32:21 +00:00
Andrea Marchesini
aaa2675e18 Bug 1639833 - IntrisincStoragePrincipal should always be partitioned - part 2 - Expose PartitionedPrincipal, r=dimi
Differential Revision: https://phabricator.services.mozilla.com/D76915
2020-06-03 06:09:52 +00:00
Csoregi Natalia
ed3350ab9b Backed out 5 changesets (bug 1639833) for failures on browser_blockingIndexedDbInWorkers.js. CLOSED TREE
Backed out changeset 6b4f76d65540 (bug 1639833)
Backed out changeset c77acba1aacb (bug 1639833)
Backed out changeset 30c97666919e (bug 1639833)
Backed out changeset d769b313441a (bug 1639833)
Backed out changeset ed41b41d1b03 (bug 1639833)
2020-06-02 15:02:31 +03:00
Andrea Marchesini
88b78d701a Bug 1639833 - IntrisincStoragePrincipal should always be partitioned - part 2 - Expose PartitionedPrincipal, r=dimi
Differential Revision: https://phabricator.services.mozilla.com/D76915
2020-06-02 08:28:05 +00:00
Noemi Erli
2b060384fc Backed out 5 changesets (bug 1639833) for causing sessionstorage related failures CLOSED TREE
Backed out changeset b36af8d9db34 (bug 1639833)
Backed out changeset 712c11904dbe (bug 1639833)
Backed out changeset 14f1e4783582 (bug 1639833)
Backed out changeset b7f14c4cfe5d (bug 1639833)
Backed out changeset b4b25034dd83 (bug 1639833)
2020-06-01 19:31:50 +03:00
Andrea Marchesini
5ffef561ac Bug 1639833 - IntrisincStoragePrincipal should always be partitioned - part 2 - Expose PartitionedPrincipal, r=dimi
Differential Revision: https://phabricator.services.mozilla.com/D76915
2020-06-01 11:57:46 +00:00
Jonathan Watt
3b22e68804 Bug 1634474. Make dom/html/ buildable outside of unified-build environment. r=farre
Differential Revision: https://phabricator.services.mozilla.com/D73309
2020-05-04 14:29:02 +00:00
Simon Giesecke
968040c445 Bug 1613985 - Use default for equivalent-to-default constructors/destructors in dom/html. r=smaug
Differential Revision: https://phabricator.services.mozilla.com/D63171
2020-02-20 16:19:15 +00:00
Simon Giesecke
9bcfd47601 Bug 1611415 - Prefer using std::move over forget. r=froydnj
Differential Revision: https://phabricator.services.mozilla.com/D60980
2020-02-13 14:38:48 +00:00
shindli
6bb3487209 Backed out changeset 0c982bc69cb3 (bug 1611415) for causing build bustages in /builds/worker/workspace/build/src/obj-firefox/dist/include/nsCOMPtr CLOSED TREE 2020-02-12 20:13:29 +02:00
Simon Giesecke
d45525793f Bug 1611415 - Applied FixItHints from mozilla-non-std-move. r=froydnj
Differential Revision: https://phabricator.services.mozilla.com/D60980
2020-02-12 17:24:41 +00:00
Johann Hofmann
3ca41b7244 Bug 1597029 - Set securityInfo for all document types. r=baku
In StartDocumentLoad we copy the securityInfo reference from the load channel into a document
member variable. This used to happen only for HTML documents, but other documents (e.g. media)
can be loaded via secure channels, too and thus should have securityInfos. We're using the
securityInfo object to display information in the UI, which would previously fail for images and videos.

Differential Revision: https://phabricator.services.mozilla.com/D59327
2020-01-09 14:55:02 +00:00
Emma Malysz
82d2389e71 Bug 1601110, remove handling and references to vnd.mozilla.xul+xml r=bzbarsky
Differential Revision: https://phabricator.services.mozilla.com/D57567
2019-12-23 23:02:05 +00:00
Emma Malysz
ba7491424c Bug 1605737, remove XUL cached type r=bzbarsky
Differential Revision: https://phabricator.services.mozilla.com/D58140
2019-12-23 17:10:15 +00:00
Henri Sivonen
b33e8e6eaf Bug 1551276 - Autodetect legacy encodings on unlabeled pages. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D56362
2019-12-12 17:50:19 +00:00
Oana Pop Rus
53a8a406db Backed out changeset 0810ad586986 (bug 1551276) for wpt failures in ar-ISO-8859-6-late.tentative.html on a CLOSED TREE 2019-12-12 16:38:54 +02:00
Henri Sivonen
5404be1b75 Bug 1551276 - Autodetect legacy encodings on unlabeled pages. r=emk
Differential Revision: https://phabricator.services.mozilla.com/D56362
2019-12-12 12:59:47 +00:00