If we enter a hang through the normal timed loop of RunMonitorThread, and then
call NotifyWait, it will result in a hang with an indefinite duration being
reported once NotifyActivity is called again.
MozReview-Commit-ID: 4vUip65L0qo
When the BHR wants to report a tardy thread, it hands off the work of
getting a stacktrace to a helper thread. The helper threads used are in the
Stream Transport Service threadpool, which has a default limit of 25
threads.
This has a bad effect when we are in a severely compute-resource constrained
situation. Then, many threads will be late, and up to 25 STS worker threads
will be employed to do unwinding, a potentially expensive operation. This
further restricts the compute resources available to progress the rest of
the system.
Another effect is that the unwinder work will compete against the "real" STS
work for the 25 workers, potentially further slowing forward progress.
This patch replaces the use of the STS thread pool with a single nsThread
dedicated to unwinding/reporting hangs.
This patch was autogenerated by my decomponents.py
It covers almost every file with the extension js, jsm, html, py,
xhtml, or xul.
It removes blank lines after removed lines, when the removed lines are
preceded by either blank lines or the start of a new block. The "start
of a new block" is defined fairly hackily: either the line starts with
//, ends with */, ends with {, <![CDATA[, """ or '''. The first two
cover comments, the third one covers JS, the fourth covers JS embedded
in XUL, and the final two cover JS embedded in Python. This also
applies if the removed line was the first line of the file.
It covers the pattern matching cases like "var {classes: Cc,
interfaces: Ci, utils: Cu, results: Cr} = Components;". It'll remove
the entire thing if they are all either Ci, Cr, Cc or Cu, or it will
remove the appropriate ones and leave the residue behind. If there's
only one behind, then it will turn it into a normal, non-pattern
matching variable definition. (For instance, "const { classes: Cc,
Constructor: CC, interfaces: Ci, utils: Cu } = Components" becomes
"const CC = Components.Constructor".)
MozReview-Commit-ID: DeSHcClQ7cG
Indicates whether Telemetry pre-release data recording is turned on. Tends to be true on pre-release channels. Also, this attribute is read-only. @see nsITelemetry.canRecordPrereleaseData.
Currently the Gecko Profiler defines a moderate amount of stuff when
MOZ_GECKO_PROFILER is undefined. It also #includes various headers, including
JS ones. This is making it difficult to separate Gecko's media stack for
inclusion in Servo.
This patch greatly simplifies how things are exposed. The starting point is:
- GeckoProfiler.h can be #included unconditionally;
- everything else from the profiler must be guarded by MOZ_GECKO_PROFILER.
In practice this introduces way too many #ifdefs, so the patch loosens it by
adding no-op macros for a number of the most common operations.
The net result is that #ifdefs and macros are used a bit more, but almost
nothing is exposed in non-MOZ_GECKO_PROFILER builds (including
ProfilerMarkerPayload.h and GeckoProfiler.h), and understanding what is exposed
is much simpler than before.
Note also that in BHR, ThreadStackHelper is now entirely absent in
non-MOZ_GECKO_PROFILER builds.
XPCOM's string API doesn't have the notion of a "null string". But it does have
the notion of a "void string" (or "voided string"), and that's what these
functions are returning. So the names should reflect that.
We would like to be able to see if a given hang in BHR occurred
under high CPU load, as this is an indication that the hang is
of less use to us, since it's likely that the external CPU use
is more responsible for it.
The way this works is fairly simple. We get the system CPU usage
on a scale from 0 to 1, and we get the current process's CPU
usage, also on a scale from 0 to 1, and we subtract the latter
from the former. We then compare this value to a threshold, which
is 1 - (1 / p), where p is the number of (virtual) cores on the
machine. This threshold might need to be tuned, so that we
require an entire physical core in order to not annotate the hang,
but for now it seemed the most reasonable line in the sand.
I should note that this considers CPU usage in child or parent
processes as external. While we are responsible for that CPU usage,
it still indicates that the stack we receive from BHR is of little
value to us, since the source of the actual hang is external to
that stack.
MozReview-Commit-ID: JkG53zq1MdY
- This patch was written in 3 interdependent parts, which are described below -
Part 1A: Allow HangStack to contain raw PCs and Module offsets, r=froydnj
The HangStack previously consisted of an array of const char* pointers into its
backing string buffer, which represented pseudostack entries. With interleaved
stacks, it is now possible for the stack to contain raw unresolved program
counters (Kind::PC), and module/offset pairs (Kind::MODOFFSET). To do this, we
use a discriminated union, and make the backing array use the discriminated
union instead of const char*s.
The code cannot use mozilla::Variant<const char*, uintptr_t, Module>
unfortuantely, as we cannot use the implementation of ParamTraits for Variant in
HangStack's ParamTraits implementation.
When deserializing a HangStack over IPC, we need to read the string frame
entries into the backing string buffer, and generate const char* entries for
each of the strings which we read in over IPC. The default implementation of
ParamTraits wouldn't give us access to the enclusing HangStack object while
deserializing each individual entry, so we couldn't use it. In fact, Entries
don't have ParamTraits implemented for them at all, and can only be sent over
IPC as part of a HangStack due to this dependency.
Part 1B: Remove nsIHangDetails.pseudoStack, replace ProcessedStack w/ new HangStack type, r=froydnj
Previously there were two stack objects on each HangDetails object: mStack and
mPseudoStack. mStack was a Telemetry::ProcessedStack, while mPseudoStack was a
HangStack. After the changes in part 1A, HangStack can now contain all of the
information of both the old HangStack and ProcessedStack, so the mPseudoStack
field is renamed to mStack, and the old mStack field is removed.
This patch also implements the new GetStack getter, which generates the JS data
format for the new HangStack type.
Part 1C: Collect interleaved stacks w/ ProfilerStackCollector API in ThreadStackHelper, r=froydnj
This new API was added by njn in bug 1380286, and provides both pseudostack and
native stack entries to the consumer of the API.
This patch changes ThreadStackHelper to use this new API instead of the previous
one, and use it to collect the frames directly into HangStack objects.
HangAnnotations was very complex, required a separate allocation, and used this
unfortunate virtual interface implementation which made it harder to do
interesting things with it (such as serialize it over IPC).
This new implementation is much simpler and more concrete, making
HangAnnotations simply be a nsTArray<Annotation>. This also simplifies some of
the IPC code which was added in part 7.
MozReview-Commit-ID: EzaaxdHpW1t
These changes are going to increase the amount of data which we collect from BHR
a lot. It would be dangerous to run it on beta, especially considering how soon
the next merge is.
This should turn it off for 100% of beta users if I understand the logic
correctly.
MozReview-Commit-ID: 3HyEKWdXaqU
BHRTelemetryService only runs in the parent process (and we can only submit
pings from there), so we need to send the data which we collect in the GPU and
Content processes over IPC to the parent process.
MozReview-Commit-ID: 8B5uZKbjNbU
This patch adds the BHRTelemetryService which is a JS implemented XPCOM service
that simply listens to the bhr-thread-hang observer notification, and uses the
data it collects from it to submit telemetry pings.
MozReview-Commit-ID: 2hPXAFmHrm5
We're going to use HangDetails as the type containing hang information. We'll
have a JS component which reads the data out of nsIHangDetails, builds the
payload, and submits it to telemetry for us.
We'll do it in JS because telemetry has to be submitted from JS.
This patch also adds IPC serization for the relevant types so that we can send
HangDetails objects over IPDL.
MozReview-Commit-ID: CeikKabY9Vs