Historically, PSM has handled tracking NSS resources, releasing them, and
shutting down NSS in a coordinated manner (i.e. preventing races,
use-after-frees, etc.). This approach has proved intractable. This patch
introduces a new approach: have XPCOM shut down NSS after all threads have been
joined and the component manager has been shut down (and so there shouldn't be
any XPCOM objects holding NSS resources).
Note that this patch only attempts to determine if this approach will work. If
it does, we will have to go through later and remove the remnants of the old
approach (i.e. nsNSSShutDownPreventionLock and related machinery). This will be
done in bug 1421084.
MozReview-Commit-ID: LjgEl1UZqkC
OS.File already only supports UTF-8 paths on non-Windows systems, so this
change makes our different ways of accessing file paths consistent with each
other.
MozReview-Commit-ID: 8HiC5xC8tJN
This was created for B2G and isn't really useful otherwise. It only works on
Linux, and it's behind the memory.system_memory_reporter pref, which is false
by default.
The patch also removes LinuxUtils.{h,cpp}, which is no longer used.
And remove unreachable code after MOZ_CRASH_UNSAFE_OOL().
MOZ_CRASH_UNSAFE_OOL causes data collection because crash strings are annotated to crash-stats and are publicly visible. Firefox data stewards must do data review on usages of this macro. However, all the crash strings this patch collects with MOZ_CRASH_UNSAFE_OOL are already collected with NS_RUNTIMEABORT.
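For illustration, a hedged sketch of the kind of conversion involved; the
function and error string below are made up, not actual call sites:

    #include "mozilla/Assertions.h"
    #include "nsPrintfCString.h"

    void AbortOnBadState(int aState) {
      // Previously something like:
      //   NS_RUNTIMEABORT(nsPrintfCString("Bad state: %d", aState).get());
      // The dynamic string still reaches crash-stats, and since the macro
      // does not return, any code after it is unreachable and can go.
      MOZ_CRASH_UNSAFE_OOL(nsPrintfCString("Bad state: %d", aState).get());
    }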
MozReview-Commit-ID: IHmJfuxXSqw
Currently the Gecko Profiler defines a moderate amount of stuff when
MOZ_GECKO_PROFILER is undefined. It also #includes various headers, including
JS ones. This is making it difficult to separate Gecko's media stack for
inclusion in Servo.
This patch greatly simplifies how things are exposed. The starting point is:
- GeckoProfiler.h can be #included unconditionally;
- everything else from the profiler must be guarded by MOZ_GECKO_PROFILER.
In practice this introduces way too many #ifdefs, so the patch loosens it by
adding no-op macros for a number of the most common operations.
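As a rough illustration of the pattern (the macro and function names below
are hypothetical, not the ones the patch adds):

    #ifdef MOZ_GECKO_PROFILER
    // Hypothetical profiler entry point standing in for the real ones.
    void ExampleProfilerAddMarker(const char* aName);
    #define PROFILER_EXAMPLE_MARKER(name) ::ExampleProfilerAddMarker(name)
    #else
    // Expands to nothing, so call sites need no #ifdef of their own.
    #define PROFILER_EXAMPLE_MARKER(name) \
      do {                                \
      } while (0)
    #endif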
The net result is that #ifdefs and macros are used a bit more, but almost
nothing is exposed in non-MOZ_GECKO_PROFILER builds (including
ProfilerMarkerPayload.h and GeckoProfiler.h), and understanding what is exposed
is much simpler than before.
Note also that in BHR, ThreadStackHelper is now entirely absent in
non-MOZ_GECKO_PROFILER builds.
We would like to be able to see if a given hang in BHR occurred
under high CPU load, as this is an indication that the hang is
of less use to us, since it's likely that the external CPU use
is more responsible for it.
The way this works is fairly simple. We get the system CPU usage
on a scale from 0 to 1, and we get the current process's CPU
usage, also on a scale from 0 to 1, and we subtract the latter
from the former. We then compare this value to a threshold of 1 - (1 / p),
where p is the number of (virtual) cores on the machine. This threshold
might need to be tuned, for example so that we require an entire free
physical core in order to not annotate the hang, but for now it seems like
the most reasonable line in the sand.
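A minimal sketch of that comparison; the caller is assumed to have sampled
both usages already, so only the arithmetic is shown:

    #include <thread>

    // Both usages are on a 0..1 scale, as described above.
    bool IsMostlyExternalCpuLoad(double aSystemUsage, double aProcessUsage) {
      double external = aSystemUsage - aProcessUsage;

      // p = number of virtual cores. Exceeding 1 - (1 / p) means less than
      // one virtual core's worth of CPU is left for this process.
      unsigned p = std::thread::hardware_concurrency();
      if (p == 0) {
        p = 1;  // hardware_concurrency() may return 0 if unknown
      }
      double threshold = 1.0 - (1.0 / static_cast<double>(p));
      return external > threshold;
    }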
I should note that this considers CPU usage in child or parent
processes as external. While we are responsible for that CPU usage,
it still indicates that the stack we receive from BHR is of little
value to us, since the source of the actual hang is external to
that stack.
MozReview-Commit-ID: JkG53zq1MdY
This patch reduces the differences between builds where the profiler is enabled
and those where the profiler is disabled. It does this by removing numerous
MOZ_GECKO_PROFILER checks.
These changes have the following consequences.
- Various functions and classes are now defined in all builds, and so can be
used unconditionally: profiler_add_marker(), profiler_set_js_context(),
profiler_clear_js_context(), profiler_get_pseudo_stack(), AutoProfilerLabel.
(They are effectively no-ops in non-profiler builds, of course.)
- The no-op versions of PROFILER_* are now gone. The remaining versions are
almost no-ops when the profiler isn't built.
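For example, a call site can now look like this in every build (a sketch
only; the wrapper function is hypothetical, but profiler_clear_js_context()
is one of the functions listed above):

    #include "GeckoProfiler.h"

    void OnJSContextDestroyedExample() {
      // No MOZ_GECKO_PROFILER guard needed at the call site; in
      // non-profiler builds this is (effectively) a no-op.
      profiler_clear_js_context();
    }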
One of the things that I've noticed in profiling startup overhead is that,
even with the startup cache, we spend about 130ms just loading and decoding
scripts from the startup cache on my machine.
I think we should be able to do better than that by doing some of that work in
the background for scripts that we know we'll need during startup. With this
change, we seem to consistently save about 3-5% on non-e10s startup overhead
on talos. But there's a lot of room for tuning, and I think we can get some
considerable improvement with a few ongoing tweaks.
Some notes about the approach:
- Setting up the off-thread compile is fairly expensive, since we need to
create a global object, and a lot of its built-in prototype objects for each
compile. So in order for there to be a performance improvement for OMT
compiles, the script has to be pretty large. Right now, the tipping point
seems to be about 20K (see the size-gate sketch after these notes).
There's currently no easy way to improve the per-compile setup overhead, but
we should be able to combine the off-thread compiles for multiple smaller
scripts into a single operation without any additional per-script overhead.
- The time we spend setting up scripts for OMT compile is almost entirely
CPU-bound. That means that we have a chunk of about 20-50ms where we can
safely schedule thread-safe IO work during early startup, so if we schedule
some of our current synchronous IO operations on background threads during the
script cache setup, we basically get them for free, and can probably increase
the number of scripts we compile in the background.
- I went with an uncompressed mmap of the raw XDR data for a storage format.
That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so
compressing it might save some startup disk IO. Keeping it uncompressed,
though, simplifies a lot of the OMT and even main-thread decoding process
and, more importantly:
- We currently don't use the startup cache in content processes, for a variety
of reasons. However, with this approach, I think we can safely store the
cached script data from a content process before we load any untrusted code
into it, and then share mmapped startup cache data between all content
processes. That should speed up content process startup *a lot*, and very
likely save memory, too. And:
- If we're especially concerned about saving per-process memory, and we keep
the cache data mapped for the lifetime of the JS runtime, I think that with
some effort we can probably share the static string data from scripts between
content processes, without any copying. Right now, it looks like for the main
process, there's about 1.5MB of string-ish data in the XDR dumps. It's
probably less for content processes, but if we could save .5MB per process
this way, it might make it easier to increase the number of content processes
we allow.
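As referenced in the notes above, a minimal sketch of the size gate; the
constant and function names are hypothetical:

    #include <cstddef>

    // Hypothetical threshold matching the ~20K tipping point noted above.
    static const size_t kOffThreadDecodeMinimumLength = 20 * 1024;

    // Only fairly large scripts win from off-main-thread decoding, because
    // each OMT compile must set up its own global and prototype objects.
    bool ShouldDecodeOffThread(size_t aScriptLength) {
      return aScriptLength >= kOffThreadDecodeMinimumLength;
    }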
MozReview-Commit-ID: CVJahyNktKB
Separate AbstractThread::InitTLS and AbstractThread::InitMainThread.
Initialize the AbstractThread main thread when the nsThreadManager is
initialized. Initialize AbstractThread TLS for all child process types,
because plugin and GMP processes do IPC without ever initializing XPCOM,
and for content processes initializing XPCOM itself requires IPC.
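Roughly, the call sites end up looking like this; a sketch that assumes both
init functions are argument-less statics, with hypothetical wrapper names
(the real call sites are nsThreadManager init and child-process startup):

    #include "mozilla/AbstractThread.h"

    void InitMainProcessThreadsExample() {
      // The main-thread AbstractThread is now set up as part of
      // nsThreadManager initialization.
      mozilla::AbstractThread::InitMainThread();
    }

    void InitChildProcessThreadsExample() {
      // Needed for every child process type: plugin and GMP processes do
      // IPC without initializing XPCOM at all.
      mozilla::AbstractThread::InitTLS();
    }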
MozReview-Commit-ID: DhLub23oZz8
PseudoStack::sampleContext() is always called immediately after
profiler_get_pseudo_stack(). This patch introduces profiler_set_js_context()
and profiler_clear_js_context(), which replace the profiler_get_pseudo_stack()
+ sampleContext() pairs. This takes us a step closer to not having to export
PseudoStack outside the profiler.
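An illustrative call site for the new functions; the wrappers are
hypothetical, and the JSContext-taking signature is assumed from the
sampleContext() call it replaces:

    #include "GeckoProfiler.h"

    struct JSContext;  // provided by SpiderMonkey headers in real code

    void RegisterJSContextExample(JSContext* aCx) {
      // Previously: profiler_get_pseudo_stack() followed by sampleContext().
      profiler_set_js_context(aCx);
    }

    void UnregisterJSContextExample() {
      profiler_clear_js_context();
    }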
The profiler can currently handle nested calls to profiler_{init,shutdown}() --
only the first call to profiler_init() and the last call to profiler_shutdown()
do anything. And sure enough, we have the following.
- Outer init/shutdown pairs in XRE_main()/XRE_InitChildProcess() (via
GeckoProfilerInitRAII).
- Inner init/shutdown pairs in NS_InitXPCOM2()/NS_InitMinimalXPCOM() (both shut
down in ShutdownXPCOM()).
This is a bit silly, so the patch removes the inner pairs, and adds a
now-needed pair in XRE_XPCShellMain. This will allow gInitCount -- which tracks
the nesting depth -- to be removed in a future patch.
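A minimal sketch of the nesting behaviour described above; gInitCount is the
counter named here, but the function bodies are illustrative:

    static int gInitCount = 0;

    void profiler_init_sketch() {
      if (gInitCount++ > 0) {
        return;  // nested init: only the outermost call does real work
      }
      // ... real initialization ...
    }

    void profiler_shutdown_sketch() {
      if (--gInitCount > 0) {
        return;  // nested shutdown: only the last call does real work
      }
      // ... real teardown ...
    }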
SHGetKnownFolderPath() is available on Windows Vista and later, so we no
longer need to look it up with GetProcAddress("SHGetKnownFolderPath"). We can
define _WIN32_WINNT as 0x0600 to expose the SHGetKnownFolderPath() declaration
in shlobj.h and call it directly.
Also remove a redundant #include <shlobj.h>.
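A minimal sketch of the direct call; FOLDERID_RoamingAppData is just an
example folder id, not necessarily one the patch uses:

    #define _WIN32_WINNT 0x0600
    #include <objbase.h>
    #include <shlobj.h>
    #include <string>

    bool GetRoamingAppDataPath(std::wstring& aResult) {
      PWSTR rawPath = nullptr;
      HRESULT hr = SHGetKnownFolderPath(FOLDERID_RoamingAppData, 0, nullptr,
                                        &rawPath);
      if (FAILED(hr)) {
        return false;
      }
      aResult.assign(rawPath);
      CoTaskMemFree(rawPath);  // the returned buffer must be freed by the caller
      return true;
    }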
MozReview-Commit-ID: AoAlrfvQ5AB
This marks the IDL classes as deprecated, removes an unnecessary include that
was triggering deprecation warnings, and wraps a necessary include in
XPCOMInit.cpp (used for registering the component) in deprecation-disabling
pragmas.
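A hedged sketch of the wrapping pattern; the header name is a placeholder
and the exact pragmas depend on the compiler (clang/GCC syntax shown):

    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
    #include "SomeDeprecatedComponent.h"  // placeholder for the real include
    #pragma GCC diagnostic pop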
MozReview-Commit-ID: BbNU5q8O4Q4