Most of this fixes functions that return a value in some cases but can also
run to completion without returning anything. ESLint 2 catches this where
previous versions didn't. Unless there was an obvious alternative, I just made
these functions return undefined at the end, which is effectively what already
happens.
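As an illustration of the pattern being fixed (the functions below are hypothetical and not taken from the patch), the first version returns a value on one path and falls off the end on the other, which is presumably what ESLint's consistent-return rule reports. The second version makes the implicit result explicit without changing behaviour:

```javascript
// Hypothetical example; these functions and their arguments are made up.

// Before: returns a value on one path, falls off the end on the other.
function findPrefBefore(prefs, name) {
  if (name in prefs) {
    return prefs[name];
  }
  // Implicitly returns undefined here.
}

// After: same behaviour, but every path ends in an explicit return.
function findPrefAfter(prefs, name) {
  if (name in prefs) {
    return prefs[name];
  }
  return undefined;
}
```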
MozReview-Commit-ID: KHYdAkRvhVr
We have some oddities in our jemalloc stats reporting.
- "heap-overhead-ratio" is a strange measurement: overhead / non-overhead,
expressed as a percentage. And it omits "bin_unused", which appears to be an
oversight.
- "heap-committed" also omits "bin_unused".
- There are some minor errors in memory report descriptions.
This patch fixes these and improves the heap reporting. It makes the following
reporting changes:
- "heap-allocated": Duplicated as "heap-committed/allocated". (We keep
"heap-allocated" because that's a special value used in the computation of
"heap-unclassified".)
- "heap-committed/overhead": Added; it's the same as the sum of the
"explicit/heap-overhead/*" values. Together with "heap-committed/allocated"
it shows clearly what fraction of the heap is overhead and what fraction is
useful.
- "heap-committed": Removed; now implicit as the "heap-committed/" node.
- "heap-overhead-ratio":
  - Removed from memory reports; now shown as the percentage of the new
    "heap-committed/overhead" node.
  - Still available as a distinguished amount (because it's useful in
    isolation) but renamed to heapOverheadFraction, and the telemetry ID is
    renamed as MEMORY_HEAP_OVERHEAD_FRACTION.
- "heap-chunks": Removed; it's not that interesting, and can be manually
computed as "heap-mapped" / "heap-chunksize" if necessary.
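To make the difference between the old ratio and a fraction concrete, here is a small sketch with made-up byte counts. It assumes that "heap-committed" is the sum of the allocated and overhead amounts, and that heapOverheadFraction is overhead divided by committed. The description above implies this but does not spell out the formula, and the function names are hypothetical.

```javascript
// Old-style measurement: overhead relative to non-overhead (allocated) bytes.
function overheadRatio(allocated, overhead) {
  return overhead / allocated;
}

// Assumed new-style measurement: overhead as a share of all committed bytes.
function overheadFraction(allocated, overhead) {
  const committed = allocated + overhead;
  return overhead / committed;
}

// Made-up numbers: 80 MiB allocated, 20 MiB overhead (including bin_unused).
const MiB = 1024 * 1024;
const ratio = overheadRatio(80 * MiB, 20 * MiB); // overhead vs. allocated
const fraction = overheadFraction(80 * MiB, 20 * MiB); // overhead vs. committed
```

With these numbers the ratio is a quarter while the fraction is a fifth; the fraction can never exceed 1, which makes it easier to read as a percentage of the "heap-committed/" node.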
What it does:
Adds a new function, TelemetrySession.getChildThreadHangs(), which returns a promise resolving to an array of threadHangStats [1], one per process.
Note that processes that spawn or die after the function's promise is created but before it is resolved may be excluded from the final result.
How we do this:
1. Parent sends a MESSAGE_TELEMETRY_GET_CHILD_PAYLOAD message to each child, promising the results of these messages.
2. Child processes respond to parent with a MESSAGE_TELEMETRY_THREAD_HANGS, which contains BHR stats in the payload.
3. Parent combines all the child responses together and resolves the promise.
This also needs some synchronization and edge-case handling, since the number of child processes can change at any time.
There is also a 200ms timeout, since we can't handle all of these cases; specifically, the case where a child dies without responding after all other child processes have responded.
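The steps above can be sketched roughly as follows. This is a minimal illustration and not the TelemetrySession implementation: createCollector, getChildThreadHangsSketch, and the sendRequest callback are all hypothetical names, and real message passing is replaced by plain function calls.

```javascript
// Hypothetical sketch; none of these names come from TelemetrySession.
// Collects one response per child and reports once, either when every
// expected child has answered or when finish() is called (e.g. by a timeout).
function createCollector(expectedChildIds, onDone) {
  const pending = new Set(expectedChildIds);
  const results = [];
  let done = false;

  function finish() {
    if (done) {
      return;
    }
    done = true;
    onDone(results);
  }

  function receive(childId, threadHangStats) {
    // Ignore late responses and children we never asked (e.g. newly spawned).
    if (done || !pending.has(childId)) {
      return;
    }
    pending.delete(childId);
    results.push(threadHangStats);
    if (pending.size === 0) {
      finish();
    }
  }

  return { receive, finish };
}

// Wiring the collector to a promise with a timeout (200ms, as described).
// sendRequest(childId, reply) is an assumed callback that asks one child
// and calls reply(stats) if and when that child answers.
function getChildThreadHangsSketch(childIds, sendRequest, timeoutMs = 200) {
  return new Promise(resolve => {
    let timer = null;
    const collector = createCollector(childIds, results => {
      clearTimeout(timer);
      resolve(results);
    });
    // If a child dies without responding, the timeout resolves with
    // whatever has been gathered so far.
    timer = setTimeout(collector.finish, timeoutMs);
    if (childIds.length === 0) {
      collector.finish();
    }
    for (const id of childIds) {
      sendRequest(id, stats => collector.receive(id, stats));
    }
  });
}
```

In this sketch a child that never replies simply leaves its entry out of the resolved array, which matches the note that some processes may be excluded from the final result.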
Why we do this:
* We can technically get thread hang stats by retrieving Telemetry pings (see requestChildPayloads() in TelemetrySession for details), but this is very slow and can only be done for one process at a time.
* TelemetrySession is responsible for various Telemetry IPC-related tasks, and so is a natural place to expose this function (i.e., the function blends in well with the rest of the API).
* Statuser [2] uses this for quickly obtaining child process BHR stats. This allows us to get realtime hang monitoring for child processes.
[1]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/nsITelemetry.idl#146
[2]: https://github.com/chutten/statuser
We now have all the necessary measurement APIs to get a full memory picture for
a running multi-process instance. However, there's no way to correlate one
particular RSS measurement from the chrome process with the matching USS
measurements from the content processes.
So do that correlation in TelemetrySession and report it.
Unique Set Size (USS) could be described as "the amount of memory you could
expect to reclaim if you killed this process".
Resident Set Size (RSS) is USS plus the resident memory that the process
shares with other processes.
Getting a full picture of the memory use of a multi-process application is
impossible. A better guess than most is:
  Parent Process' RSS + sum(Child Processes' USS)
Or, from Telemetry:
  Parent MEMORY_RESIDENT + sum(Children's MEMORY_UNIQUE)
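The estimate above is a single sum. A minimal sketch, using a hypothetical helper name and made-up byte counts rather than real Telemetry values:

```javascript
// Hypothetical helper: estimates total memory from per-process measurements.
// parentRss: resident set size of the parent process, in bytes.
// childUss: array of unique set sizes, one per child process, in bytes.
function estimateTotalMemory(parentRss, childUss) {
  return childUss.reduce((total, uss) => total + uss, parentRss);
}

// Made-up example: a 500 MB parent with two content processes.
const MB = 1000 * 1000;
const estimate = estimateTotalMemory(500 * MB, [100 * MB, 200 * MB]);
```

Shared memory is counted once, via the parent's RSS, which is why the children contribute only their USS.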
The GMP manager uses a copy of the update service's URL formatting code, and
that copy has since fallen out of sync. We'll also want to use the same
formatting code for the system add-on update checks, so this just exposes it
in a shared API.
I've moved the contents of UpdateChannel.jsm to UpdateUtils.jsm and exposed
formatUpdateURL there as well as a few properties that the update service still
needs access to.
UpdateUtils.UpdateChannel is intended to be a lazy getter, but isn't for now,
since tests expect to be able to change the update channel at runtime.
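To show why a lazy getter would get in the way of those tests, here is a small self-contained sketch. It does not use the real UpdateUtils code: defineLazyGetter, readChannel, and currentChannel are hypothetical stand-ins for a cached property and the underlying channel setting.

```javascript
// Hypothetical stand-in for reading the channel; not the real UpdateUtils code.
let currentChannel = "nightly";
function readChannel() {
  return currentChannel;
}

// Lazy getter: compute on first access, then replace itself with the value.
function defineLazyGetter(obj, name, compute) {
  Object.defineProperty(obj, name, {
    configurable: true,
    enumerable: true,
    get() {
      const value = compute();
      Object.defineProperty(obj, name, {
        value,
        writable: false,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
  });
}

const lazy = {};
defineLazyGetter(lazy, "UpdateChannel", readChannel);

// Plain getter: re-reads the channel on every access.
const live = {
  get UpdateChannel() {
    return readChannel();
  },
};

const lazyFirst = lazy.UpdateChannel; // computed and cached here
const liveFirst = live.UpdateChannel;

// What a test might do at runtime.
currentChannel = "beta";

const lazySecond = lazy.UpdateChannel; // still the cached value
const liveSecond = live.UpdateChannel; // reflects the change
```

The lazy version saves repeated reads but freezes the first value it sees, so a test that switches channels after the first access would keep observing the old one.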