Also adds a mozilla/ResultExtensions.h header to define the appropriate
conversion functions for nsresult and PRResult. This is in a separate header
since those types are not available in Spidermonkey, and this is the pattern
other *Extensions.h headers follow.
Also removes equivalent NS_TRY macros and WrapNSResult inlines that served the
same purpose in existing code, and are no longer necessary.
MozReview-Commit-ID: A85PCAeyWhx
This allows MOZ_TRY and MOZ_TRY_VAR to be transparently used in XPCOM methods
when compatible Result types are used.
Also removes a compatibility macro in SimpleChannel.cpp, and an identical
specialization in AddonManagerStartup, which are no longer necessary after
this change.
MozReview-Commit-ID: 94iNrPDJEnN
Off-thread parse tasks depend on the memory allocated by the main-thread
script preloader, so that memory needs to be kept alive until they finish. In
normal use cases, this is guaranteed, but when the browser shuts down very
quickly (as sometimes happens in tests), or we invalidate the startup caches
in the middle of the startup process, they can sometimes be freed too early.
MozReview-Commit-ID: GQmkVbWgTH9
In general, we always want to compile with lazy source whenever we intend to
store code in the startup bytecode cache. Currently, we only do so when the
main StartupCache is available (which is only in the parent process), even if
we'll be storing code in the ScriptPreloader cache.
The main side-effect of this is that scripts which are used in a content
process do not use lazy source, but *do* use lazy parsing. And since we need
to clone pre-loaded scripts into their target compartment before executing, we
end up eagerly compiling those lazy functions on the main thread as soon as we
execute the script, in order to clone them. And this causes two side-effects
of its own:
1) It's terrible for startup performance.
2) It creates new scope chains which didn't exist when the clone began, and
which are likely responsible for bug 1367896.
MozReview-Commit-ID: 6lZLb68zitp
The current logic is fairly hard to follow, and fails to account for changes
in the set of processes a script needs to be executed in when deciding whether
to write a new cache file. These changes simplify that logic considerably,
increase the chances that we'll correctly pre-decode a script that's needed in
a given process during startup.
MozReview-Commit-ID: BvqjKU8FDHW
Flushing the cache at startup is already handled automatically by the
AppStartup code, which removes the entire startupCache directory when
necessary. The add-on manager requires being able to flush the cache at
runtime, though, for the sake of updating bootstrapped add-ons.
MozReview-Commit-ID: LIdiNHrXYXu
One of the things that I've noticed in profiling startup overhead is that,
even with the startup cache, we spend about 130ms just loading and decoding
scripts from the startup cache on my machine.
I think we should be able to do better than that by doing some of that work in
the background for scripts that we know we'll need during startup. With this
change, we seem to consistently save about 3-5% on non-e10s startup overhead
on talos. But there's a lot of room for tuning, and I think we get some
considerable improvement with a few ongoing tweeks.
Some notes about the approach:
- Setting up the off-thread compile is fairly expensive, since we need to
create a global object, and a lot of its built-in prototype objects for each
compile. So in order for there to be a performance improvement for OMT
compiles, the script has to be pretty large. Right now, the tipping point
seems to be about 20K.
There's currently no easy way to improve the per-compile setup overhead, but
we should be able to combine the off-thread compiles for multiple smaller
scripts into a single operation without any additional per-script overhead.
- The time we spend setting up scripts for OMT compile is almost entirely
CPU-bound. That means that we have a chunk of about 20-50ms where we can
safely schedule thread-safe IO work during early startup, so if we schedule
some of our current synchronous IO operations on background threads during the
script cache setup, we basically get them for free, and can probably increase
the number of scripts we compile in the background.
- I went with an uncompressed mmap of the raw XDR data for a storage format.
That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so
compressing it might save some startup disk IO, but keeping it uncompressed
simplifies a lot of the OMT and even main thread decoding process, but, more
importantly:
- We currently don't use the startup cache in content processes, for a variety
of reasons. However, with this approach, I think we can safely store the
cached script data from a content process before we load any untrusted code
into it, and then share mmapped startup cache data between all content
processes. That should speed up content process startup *a lot*, and very
likely save memory, too. And:
- If we're especially concerned about saving per-process memory, and we keep
the cache data mapped for the lifetime of the JS runtime, I think that with
some effort we can probably share the static string data from scripts between
content processes, without any copying. Right now, it looks like for the main
process, there's about 1.5MB of string-ish data in the XDR dumps. It's
probably less for content processes, but if we could save .5MB per process
this way, it might make it easier to increase the number of content processes
we allow.
MozReview-Commit-ID: CVJahyNktKB
Flushing the cache at startup is already handled automatically by the
AppStartup code, which removes the entire startupCache directory when
necessary. The add-on manager requires being able to flush the cache at
runtime, though, for the sake of updating bootstrapped add-ons.
MozReview-Commit-ID: LIdiNHrXYXu
One of the things that I've noticed in profiling startup overhead is that,
even with the startup cache, we spend about 130ms just loading and decoding
scripts from the startup cache on my machine.
I think we should be able to do better than that by doing some of that work in
the background for scripts that we know we'll need during startup. With this
change, we seem to consistently save about 3-5% on non-e10s startup overhead
on talos. But there's a lot of room for tuning, and I think we get some
considerable improvement with a few ongoing tweeks.
Some notes about the approach:
- Setting up the off-thread compile is fairly expensive, since we need to
create a global object, and a lot of its built-in prototype objects for each
compile. So in order for there to be a performance improvement for OMT
compiles, the script has to be pretty large. Right now, the tipping point
seems to be about 20K.
There's currently no easy way to improve the per-compile setup overhead, but
we should be able to combine the off-thread compiles for multiple smaller
scripts into a single operation without any additional per-script overhead.
- The time we spend setting up scripts for OMT compile is almost entirely
CPU-bound. That means that we have a chunk of about 20-50ms where we can
safely schedule thread-safe IO work during early startup, so if we schedule
some of our current synchronous IO operations on background threads during the
script cache setup, we basically get them for free, and can probably increase
the number of scripts we compile in the background.
- I went with an uncompressed mmap of the raw XDR data for a storage format.
That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so
compressing it might save some startup disk IO, but keeping it uncompressed
simplifies a lot of the OMT and even main thread decoding process, but, more
importantly:
- We currently don't use the startup cache in content processes, for a variety
of reasons. However, with this approach, I think we can safely store the
cached script data from a content process before we load any untrusted code
into it, and then share mmapped startup cache data between all content
processes. That should speed up content process startup *a lot*, and very
likely save memory, too. And:
- If we're especially concerned about saving per-process memory, and we keep
the cache data mapped for the lifetime of the JS runtime, I think that with
some effort we can probably share the static string data from scripts between
content processes, without any copying. Right now, it looks like for the main
process, there's about 1.5MB of string-ish data in the XDR dumps. It's
probably less for content processes, but if we could save .5MB per process
this way, it might make it easier to increase the number of content processes
we allow.
MozReview-Commit-ID: CVJahyNktKB