Bug 1712168 - migrate profiler mdn docs to in tree performance docs r=julienw
Differential Revision: https://phabricator.services.mozilla.com/D116154
@@ -1,185 +0,0 @@
# Call Tree

The Call Tree tells you which JavaScript functions the browser spent the
most time in. By analyzing its results, you can find bottlenecks in your
code - places where the browser is spending a disproportionately large
amount of time.

These bottlenecks are the places where any optimizations you can make
will have the biggest impact.

The Call Tree is a sampling profiler. It periodically samples the state
of the JavaScript engine and records the stack for the code executing at
the time. Statistically, the number of samples taken while executing a
particular function corresponds to the amount of time the browser spent
executing it.

In this article, we'll use the output of a simple program as an
example. If you want the program itself, to experiment with and profile
yourself, you can find it
[here](https://github.com/mdn/performance-scenarios/blob/gh-pages/js-call-tree-1/).
You can find the specific profile we discuss
[here](https://github.com/mdn/performance-scenarios/blob/gh-pages/js-call-tree-1/profile/call-tree.json)
- just import it into the performance tool to follow along.

There's a short page describing the structure of this program
[here](sorting_algorithms_comparison.md).

Note that we use the same program - the same profile, in fact - in the
documentation page for the [Flame
Chart](https://developer.mozilla.org/en-US/docs/Tools/Performance/Flame_Chart).

The screenshot below shows the output of a program that compares three
sorting algorithms - bubble sort, selection sort, and quicksort. To do
this, it generates some arrays filled with random integers and sorts
them using each algorithm in turn.

We've [zoomed](https://developer.mozilla.org/en-US/docs/Tools/Performance/UI_Tour#zooming_in) into
the part of the recording that shows a long JavaScript marker:

The Call Tree presents the results in a table. Each row represents a
function in which at least one sample was taken, and the rows are
ordered by the number of samples taken while in that function, highest
to lowest.

*Samples* is the number of samples that were taken while executing this
particular function, including its children (the other functions called
by this particular function).

*Total Time* is that number translated into milliseconds, based on the
total amount of time covered by the selected portion of the recording.
Because samples are taken at roughly one-millisecond intervals, these
numbers should be roughly the same as the number of samples.

*Total Cost* is that number as a percentage of the total number of
samples in the selected portion of the recording.

*Self Time* is the time spent in that particular function, excluding
its children. This comes from the captured stacks where this function
is the leafmost function.

*Self Cost* is calculated from *Self Time* as a percentage of the total
number of samples in the selected portion of the recording.

In the current version of the Call Tree, these are the most important
columns. Functions with a relatively high *Self Cost* are good
candidates for optimization, either because they take a long time to
run, or because they are called very often.

[The inverted call tree](#using_an_inverted_aka_bottom-up_call_tree) is
a good way to focus on these *Self Cost* values.

This screenshot tells us something we probably already knew: bubble
sort is a very inefficient algorithm. We have about six times as many
samples in bubble sort as in selection sort, and 13 times as many as in
quicksort.
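These columns are simple arithmetic over sample counts. A minimal
sketch of the relationships (the ~1 ms sampling interval is the
profiler's default; the sample counts here are invented for
illustration, not taken from the example profile):

```js
// Derive the Call Tree's columns from raw sample counts.
// intervalMs is the sampling interval; ~1 ms is the default, which is
// why Total Time in milliseconds roughly equals the number of samples.
function callTreeColumns(totalSamples, selfSamples, samplesInSelection) {
  const intervalMs = 1;
  return {
    totalTimeMs: totalSamples * intervalMs,
    totalCostPct: (100 * totalSamples) / samplesInSelection,
    selfTimeMs: selfSamples * intervalMs,
    selfCostPct: (100 * selfSamples) / samplesInSelection,
  };
}

// A hypothetical function: 400 samples in total, 150 of them with the
// function leafmost, out of 2000 samples in the selected range.
const cols = callTreeColumns(400, 150, 2000);
console.log(cols.totalTimeMs);  // 400
console.log(cols.totalCostPct); // 20
console.log(cols.selfCostPct);  // 7.5
```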
## Walking up the call tree

Next to each function name is a disclosure arrow: click that, and you
can see the path back up the call tree, from the function in which the
sample was taken, to the root. For example, we can expand the entry for
`bubbleSort()`:

So we can see that the call graph is like this:

    sortAll()
      -> sort()
        -> bubbleSort()

Note also that *Self Cost* for `sort()` here is 1.45%, and that this is
the same as for the separate entry for `sort()` later in the list. This
is telling us that some samples were taken in `sort()` itself, rather
than in the functions it calls.

Sometimes there's more than one path back from an entry to the top
level. Let's expand the entry for `swap()`:

There were 253 samples taken inside `swap()`. But `swap()` was reached
by two different paths: both `bubbleSort()` and `selectionSort()` use
it. We can also see that 252 of the 253 samples in `swap()` were taken
in the `bubbleSort()` branch, and only one in the `selectionSort()`
branch.

This result means that bubble sort is even less efficient than we had
thought! It can shoulder the blame for another 252 samples, or almost
another 10% of the total cost.

With this kind of digging, we can figure out the whole call graph, with
associated sample counts:

    sortAll()              // 8
      -> sort()            // 37
        -> bubbleSort()    // 1345
          -> swap()        // 252
        -> selectionSort() // 190
          -> swap()        // 1
        -> quickSort()     // 103
          -> partition()   // 12
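In this graph, each count is the number of samples in which that
function itself was executing; a function's *Total* figures fold in its
descendants as well. That bookkeeping can be sketched as a small tree
walk (the node format here is invented for illustration, not the
profiler's internal representation):

```js
// Each node carries the samples recorded in that function itself (its
// "self" samples, as in the graph above); a node's total is its self
// samples plus the totals of all of its children.
function totalSamples(node) {
  return node.self + node.children.reduce((sum, c) => sum + totalSamples(c), 0);
}

const tree = {
  name: "sortAll", self: 8, children: [
    { name: "sort", self: 37, children: [
      { name: "bubbleSort", self: 1345, children: [
        { name: "swap", self: 252, children: [] }] },
      { name: "selectionSort", self: 190, children: [
        { name: "swap", self: 1, children: [] }] },
      { name: "quickSort", self: 103, children: [
        { name: "partition", self: 12, children: [] }] },
    ] },
  ],
};

const bubble = tree.children[0].children[0];
console.log(totalSamples(bubble)); // 1597: bubbleSort() plus its swap() calls
console.log(totalSamples(tree));   // 1948: every sample under sortAll()
```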
## Platform data

You'll also see some rows labeled *Gecko*, *Input & Events*, and so on.
These represent internal browser calls.

This can be useful information too. If your site is making the browser
work hard, this might not show up as samples recorded in your code, but
it is still your problem.

In our example, there are 679 samples assigned to *Gecko* - the
second-largest group after `bubbleSort()`. Let's expand that:

This result is telling us that 614 of those samples, or about 20% of
the total cost, come from our `sort()` call. If we look at the code for
`sort()`, it's fairly obvious that the high platform data cost comes
from repeated calls to `console.log()`:

```js
function sort(unsorted) {
  console.log(bubbleSort(unsorted));
  console.log(selectionSort(unsorted));
  console.log(quickSort(unsorted));
}
```
It would certainly be worthwhile considering more efficient ways of
implementing this.
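For instance, the results could be collected first and logged with a
single call, so that repeated `console.log()` work doesn't accumulate
while the sorts are running. A sketch of one plausible restructuring
(not the example program's actual code; the `sorters` parameter is
introduced here just to keep the sketch self-contained):

```js
// Run the sorters first, then log once. One console.log() call
// replaces one call per sorter; the logging still costs something,
// but far fewer platform calls are made.
// `sorters` is any array of sorting functions, e.g. the example
// program's bubbleSort, selectionSort, and quickSort.
function sort(unsorted, sorters) {
  const results = sorters.map((fn) => fn(unsorted.slice()));
  console.log(results.join("\n"));
  return results;
}
```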
One thing to be aware of here is that idle time is classified as
*Gecko*, so parts of your profile where your JavaScript isn't running
will contribute *Gecko* samples. These aren't relevant to the
performance of your site.

By default, the Call Tree doesn't split platform data out into separate
functions, because doing so adds a great deal of noise, and the details
are not likely to be useful to people not working on Firefox. If you
want to see the details, check "Show Gecko Platform Data" in the
[Settings](https://developer.mozilla.org/en-US/docs/Tools/Performance/UI_Tour#toolbar).

## Using an inverted, aka Bottom-Up, Call Tree

An inverted call tree reverses the order of all stacks, putting the
leafmost function calls at the top. The direct consequence is that this
view focuses on the functions' *Self Time* information, which makes it
very useful for finding hot spots in your code.

To display this view, click the gear icon on the right-hand end of the
performance tab and select **Invert Call Tree**.

docs/performance/dtrace.md (new file, 50 lines)
@@ -0,0 +1,50 @@
# dtrace

`dtrace` is a powerful Mac OS X kernel instrumentation system that can
be used to profile wakeups. This article provides a light introduction
to it.

:::
**Note**: The [power profiling
overview](/en-US/docs/Mozilla/Performance/Power_profiling_overview) is
worth reading at this point if you haven't already. It may make parts
of this document easier to understand.
:::

## Invocation

`dtrace` must be invoked as the super-user. A good starting command for
profiling wakeups is the following.

```
sudo dtrace -n 'mach_kernel::wakeup { @[ustack()] = count(); }' -p $FIREFOX_PID > $OUTPUT_FILE
```

Let's break that down further.

- The `-n` option combined with `mach_kernel::wakeup` selects a
  *probe point*. `mach_kernel` is the *module name* and `wakeup` is
  the *probe name*. You can see a complete list of probes by running
  `sudo dtrace -l`.
- The code between the braces is run when the probe point is hit. The
  above code counts unique stack traces when wakeups occur; `ustack`
  is short for "user stack", i.e. the stack of the userspace program
  executing.

Run that command for a few seconds and then hit Ctrl + C to interrupt
it. `dtrace` will then print to the output file a number of stack
traces, along with a wakeup count for each one. The ordering of the
stack traces can be non-obvious, so look at them carefully.
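Because the aggregation output ends each stack trace with a line
containing only its count, simple text tools can summarize it. A sketch
(this assumes the default aggregation output format, where count lines
hold a single number; the exact layout can vary between dtrace
versions, so check your output first):

```shell
# Sum the per-stack counts in a dtrace wakeup log to get the total
# number of wakeups observed during the recording. Lines with exactly
# one field, and that field numeric, are treated as count lines.
awk 'NF == 1 && $1 ~ /^[0-9]+$/ { total += $1 } END { print total }' "$OUTPUT_FILE"
```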
Sometimes the stack trace has less information than one would like.
It's unclear how to improve upon this.

## See also

dtrace is *very* powerful, and you can learn more about it by
consulting the following resources:

- [The DTrace one-liner
  tutorial](https://wiki.freebsd.org/DTrace/Tutorial) from FreeBSD.
- [DTrace tools](http://www.brendangregg.com/dtrace.html), by Brendan
  Gregg.
@@ -27,6 +27,13 @@ explains how to use the Gecko profiler.

* [LogAlloc](https://searchfox.org/mozilla-central/source/memory/replace/logalloc/README) is a tool that dumps a log of memory allocations in Gecko. That log can then be replayed against Firefox's default memory allocator independently or through another replace-malloc library, allowing the testing of other allocators under the exact same workload.
* [See also the documentation on Leak-hunting strategies and tips.](leak_hunting_strategies_and_tips.md)

## Profiling and performance tools

* [Profiling with Instruments](profiling_with_instruments.md) How to use Apple's Instruments tool to profile Mozilla code.
* [Profiling with xperf](profiling_with_xperf.md) How to use Microsoft's Xperf tool to profile Mozilla code.
* [Profiling with Concurrency Visualizer](profiling_with_concurrency_visualizer.md) How to use Visual Studio's Concurrency Visualizer tool to profile Mozilla code.
* [Profiling with Zoom](profiling_with_zoom.md) Zoom is a profiler for Linux made by the people who made Shark.
* [Adding a new telemetry probe](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/start/adding-a-new-probe.html) Information on how to add a new measurement to the Telemetry performance-reporting system.

## Power Profiling
@@ -103,10 +103,7 @@ ordered by the size of the allocations they made:

The structure of this view is very much like the structure of the
[Call Tree](call_tree.md), only it shows allocations rather than
processor samples. The first entry says that:

- 4,832,592 bytes, comprising 93% of the total heap usage, were
  allocated in a function at line 35 of "alloc.js", **or in
@@ -252,7 +252,7 @@ the code as being responsible.

  high-context measurements. This is useful because high CPU usage
  typically causes high power consumption.
- Some tools can provide high-context wakeup measurements:
  [dtrace](dtrace.md) (on Mac) and
  [perf](perf.md) (on Linux).
- Source-level instrumentation, such as [TimerFirings
  logging](timerfirings_logging.md), can

@@ -295,7 +295,7 @@ power consumption.

  tools profiler, the Gecko Profiler, or generic performance
  profilers.
- For high wakeup counts, use
  [dtrace](dtrace.md) or
  [perf](perf.md) or [TimerFirings logging](timerfirings_logging.md).
- On Mac workloads that use graphics, Activity Monitor's "Energy"
  tab can tell you if the high-performance GPU is being used, which
@@ -0,0 +1,5 @@

# Profiling with Concurrency Visualizer

Concurrency Visualizer is an excellent alternative to xperf. In newer versions of Visual Studio, it is an add-on that needs to be downloaded.

Here are some scripts that you can use for manipulating the profiles that have been exported to CSV: [https://github.com/jrmuizel/concurrency-visualizer-scripts](https://github.com/jrmuizel/concurrency-visualizer-scripts)
docs/performance/profiling_with_instruments.md (new file, 54 lines)
@@ -0,0 +1,54 @@
# Profiling with Instruments

Instruments can be used for memory profiling and for statistical
profiling.

## Official Apple documentation

- [Instruments User
  Guide](https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/)
- [Instruments User
  Reference](https://developer.apple.com/library/mac/documentation/AnalysisTools/Reference/Instruments_User_Reference/)
- [Instruments Help
  Articles](https://developer.apple.com/library/mac/recipes/Instruments_help_articles/)
- [Instruments
  Help](https://developer.apple.com/library/mac/recipes/instruments_help-collection/)
- [Performance
  Overview](https://developer.apple.com/library/mac/documentation/Performance/Conceptual/PerformanceOverview/)

### Basic Usage

- Select "Time Profiler" from the "Choose a profiling template for:"
  dialog.
- In the top left, next to the record and pause buttons, there will be
  a "\[machine name\] > All Processes" control. Click "All Processes"
  and select "firefox" from the "Running Applications" section.
- Click the record button (the red circle in the top left).
- Wait for the amount of time that you want to profile.
- Click the stop button.

## Command line tools

There is
[instruments](https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man1/instruments.1.html)
and
[iprofiler](https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man1/iprofiler.1.html).

To monitor performance counters (cache misses, etc.), Instruments has
a "Counters" instrument that can do this.

## Memory profiling

Instruments will record a call stack at each allocation point. The call
tree view can be quite helpful here; switch to it from the
"Statistics" view. This `malloc` profiling is done using the
`malloc_logger` infrastructure (similar to `MallocStackLogging`).
Currently this means you need to build with jemalloc disabled
(`ac_add_options --disable-jemalloc`). You also need the fix from [Bug
719427](https://bugzilla.mozilla.org/show_bug.cgi?id=719427).

The `DTPerformanceSession` API can be used to control profiling from
applications, like the old CHUD API we used in Shark builds. See [Bug
667036](https://bugzilla.mozilla.org/show_bug.cgi?id=667036).

System Trace might be useful.
docs/performance/profiling_with_xperf.md (new file, 180 lines)
@@ -0,0 +1,180 @@
# Profiling with xperf

Xperf is part of the Microsoft Windows Performance Toolkit, and has
functionality similar to that of Shark, oprofile, and (for some things)
dtrace/Instruments. For stack walking, Windows Vista or higher is
required; I haven't tested it at all on XP.

This page applies to xperf version **4.8.7701 or newer**. To see your
xperf version, either run '`xperf`' on a command line with no
arguments, or start '`xperfview`' and look at Help -> About
Performance Analyzer. (Note that it's not the first version number in
the About window; that's the Windows version.)

If you have an older version, you will experience bugs, especially
around symbol loading for local builds.
### Installation

For all versions, the tools are part of the latest [Windows 7 SDK (SDK
Version
7.1)](http://www.microsoft.com/downloads/details.aspx?FamilyID=6b6c21d2-2006-4afa-9702-529fa782d63b&displaylang=en).
Use the web installer to install at least the "Win32 Development
Tools". Once the SDK installs, execute either `wpt_x86.msi` or
`wpt_x64.msi` in the `Redist/Windows Performance Toolkit` folder of the
SDK's install location (typically Program Files/Microsoft
SDKs/Windows/v7.1/Redist/Windows Performance Toolkit) to actually
install the Windows Performance Toolkit tools.

It might already be installed by the Windows SDK. Check whether
C:\Program Files\Microsoft Windows Performance Toolkit already exists.

For 64-bit Windows 7 or Vista, you'll need to make a registry tweak
and then restart to enable stack walking:

`REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f`
### Symbol Server Setup

With the latest versions of the Windows Performance Toolkit, you can
modify the symbol path directly from within the program via the Trace
menu. Just make sure you set the symbol paths before enabling "Load
Symbols" and before opening a summary view. You can also modify the
`_NT_SYMBOL_PATH` and `_NT_SYMCACHE_PATH` environment variables to make
these changes permanent.

The standard symbol path that includes both Mozilla's and Microsoft's
symbol server configuration is as follows:

    _NT_SYMCACHE_PATH: C:\symbols
    _NT_SYMBOL_PATH: srv*c:\symbols*http://msdl.microsoft.com/download/symbols;srv*c:\symbols*http://symbols.mozilla.org/firefox/

To add symbols **from your own builds**, add
`C:\path\to\objdir\dist\bin` to `_NT_SYMBOL_PATH`. As with all Windows
paths, the symbol path uses semicolons (`;`) as separators.

Make sure you select the Trace -> Load Symbols menu option in the
Windows Performance Analyzer (xperfview).

There seems to be a bug in xperf's symbol handling; it is very
sensitive to when the symbol path is edited. If you change it within
the program, you'll have to close all summary tables and reopen them
for it to pick up the new symbol path data.

You'll have to agree to a EULA for the Microsoft symbols -- if you're
not prompted for this, then something isn't configured right in your
symbol path. (Again, make sure that the directories exist; if they
don't, it's a silent error.)
### Quick Start

All these tools live, by default, in C:\Program Files\Microsoft
Windows Performance Toolkit. Either run these commands from there, or
add the directory to your path. You will need to use an elevated
command prompt to start or stop profiling.

Start recording data:

`xperf -on latency -stackwalk profile`

"Latency" is a special provider name that turns on a few predefined
kernel providers; run "xperf -providers k" to view a full list of
providers and groups. You can combine providers, e.g., "xperf -on
DiagEasy+FILE_IO". "-stackwalk profile" tells xperf to capture a
stack for each PROFILE event; you could also do "-stackwalk
profile+file_io" to capture a stack on each cpu profile tick and each
file io completion event.

Stop:

`xperf -d out.etl`

View:

`xperfview out.etl`

The MSDN
"[Quickstart](http://msdn.microsoft.com/en-us/library/ff190971%28v=VS.85%29.aspx)"
page goes over this in more detail, and also has good explanations of
how to use xperfview. I'm not going to repeat it here, because I'd be
using essentially the same screenshots, so go look there.

The 'stack' view will give results similar to Shark.
### Heap Profiling

xperf has good tools for heap allocation profiling, but they have one
major limitation: you can't build with jemalloc and get heap events
generated. The stock Windows CRT allocator is terrible about
fragmentation, and causes memory usage to rise drastically even if only
a small fraction of that memory is in use. Despite this, it's still a
useful way to track allocations and deallocations.

#### Capturing Heap Data

The "-heap" option is used to set up heap tracing. Firefox generates
lots of events, so you may want to play with the
BufferSize/MinBuffers/MaxBuffers options as well to ensure that you
don't get dropped events. Also, when recording the stack, I've found
that a heap trace is often missing module information (I believe this
is a bug in xperf). It's possible to get around that by doing a
simultaneous capture of non-heap data.

To start a trace session, launching a new Firefox instance:

    xperf -on base
    xperf -start heapsession -heap -PidNewProcess "./firefox.exe -P test -no-remote" -stackwalk HeapAlloc+HeapRealloc -BufferSize 512 -MinBuffers 128 -MaxBuffers 512

To stop a session and merge the resulting files:

    xperf -stop heapsession -d heap.etl
    xperf -d main.etl
    xperf -merge main.etl heap.etl result.etl

"result.etl" will contain your merged data; you can delete main.etl
and heap.etl. Note that it's possible to capture even more data for the
non-heap profile; for example, you might want to be able to correlate
heap events with performance data, so you can do
"`xperf -on base -stackwalk profile`".

In the viewer, when summary data is viewed for heap events (Heap
Allocations Outstanding, etc. all lead to the same summary graphs),
three types of allocations are listed -- AIFI, AIFO, and AOFI. This is
shorthand for "Allocated Inside, Freed Inside", "Allocated Inside,
Freed Outside", and "Allocated Outside, Freed Inside". These refer to
the time range that was selected for the summary graph; for example,
something in the AOFI category was allocated before the start of the
selected time range, but the free event happened inside it.

### Tips

- In the summary views, the yellow bar can be dragged left and right
  to change the grouping -- for example, drag it to the left of the
  Module column to have grouping happen only by process (stuff that's
  to the left), so that you get symbols in order of weight, regardless
  of what module they're in.
- Dragging the columns around will change grouping in various ways;
  experiment to get the data that you're looking for. Also experiment
  with turning columns on and off; removing a column will allow data
  to be aggregated without considering that column's contributions.
- Disabling all but one core will make the numbers add up to 100%.
  This can be done by running 'msconfig' and going to Advanced
  Options from the "Boot" tab.
### Building Firefox

To get good data from a Firefox build, it is important to build with
the following options in your mozconfig:

    export CFLAGS="-Oy-"
    export CXXFLAGS="-Oy-"

This disables frame-pointer omission, which lets xperf do a much better
job unwinding the stack. Traces can be captured fine without this
option (for example, from nightlies), but the stack information will
not be useful.

    ac_add_options --enable-debug-symbols

This gives us symbols.

### For More Information

Microsoft's [documentation for xperf](http://msdn.microsoft.com/en-us/library/ff191077.aspx)
is pretty good; there is a lot of depth to this tool, and you should
look there for more details.
docs/performance/profiling_with_zoom.md (new file, 5 lines)
@@ -0,0 +1,5 @@
# Profiling with Zoom

Zoom is a profiler for Linux, very similar to Shark.

You can get the profiler here: <http://www.rotateright.com/>
@@ -1,10 +1,5 @@

# Sorting algorithms comparison

This article describes a simple example program that we use in two of
the Performance guides: the guide to the [Call
Tree](call_tree.md) and the guide to the
[Flame Chart](https://developer.mozilla.org/en-US/docs/Tools/Performance/Flame_Chart).

This program compares the performance of three different sorting
algorithms:

- bubble sort
- selection sort
- quicksort
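The program's full source lives in the repository linked from the Call
Tree guide. As a rough sketch of its shape (illustrative only, not the
actual code), the bubble sort path calls a `swap()` helper from inside
its inner loop, which is why `swap()` appears under `bubbleSort()` in
the Call Tree:

```js
// Illustrative sketch of the example program's structure. bubbleSort()
// repeatedly calls swap(), so samples taken inside swap() show up as a
// child of bubbleSort() in the profile.
function swap(array, i, j) {
  const temp = array[i];
  array[i] = array[j];
  array[j] = temp;
}

function bubbleSort(array) {
  for (let i = 0; i < array.length; i++) {
    for (let j = 0; j < array.length - i - 1; j++) {
      if (array[j] > array[j + 1]) {
        swap(array, j, j + 1);
      }
    }
  }
  return array;
}

console.log(bubbleSort([5, 1, 4, 2, 8])); // [1, 2, 4, 5, 8]
```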