This fixes an incidental bug with the pref to turn off GPU stroking. The pref is not supposed to
disable caching strokes as textures. This allows that caching to work again even when GPU stroking is prefed off.
Differential Revision: https://phabricator.services.mozilla.com/D163307
It seems like this is slow for now until we implement a better way than wpf-gpu-raster
for stroking paths. Just hide this behind a pref so we can at least test it but not
impact performance as badly.
Differential Revision: https://phabricator.services.mozilla.com/D163248
For use-cases that repeatedly pop and re-push the same clips over and over, we can end up regenerating
the same mask that is still stored, because we only detect that the clip state changed, rather than
that it changed back to exactly the same state it was in previously.
This just remembers the previous state of the clip stack at the time the clip mask was generated
so that we can compare the previous and current state. If they're the same, we can assume there
is no need to regenerate the clip mask again and simply reuse it.
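
As a rough illustration of the comparison described above, here is a minimal sketch. The names ClipEntry and ClipMaskCache are hypothetical and do not come from the patch; the real clip stack entries carry more state (transforms, paths) than this stand-in.

```cpp
#include <vector>

// Hypothetical stand-in for one clip stack entry.
struct ClipEntry {
  int x, y, width, height;
  bool operator==(const ClipEntry& aOther) const {
    return x == aOther.x && y == aOther.y && width == aOther.width &&
           height == aOther.height;
  }
};

// Remembers the clip stack that the stored mask was generated from.
class ClipMaskCache {
 public:
  // Returns true if the mask had to be regenerated for aCurrent.
  bool Update(const std::vector<ClipEntry>& aCurrent) {
    if (mValid && aCurrent == mCached) {
      return false;  // Same state as when the mask was generated: reuse it.
    }
    mCached = aCurrent;  // Remember the state the new mask corresponds to.
    mValid = true;
    ++mRegenerations;
    return true;
  }

  int Regenerations() const { return mRegenerations; }

 private:
  std::vector<ClipEntry> mCached;
  bool mValid = false;
  int mRegenerations = 0;
};
```

Popping and re-pushing an identical stack then calls Update with an equal vector, which skips regeneration.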
Differential Revision: https://phabricator.services.mozilla.com/D162699
WebGL doesn't reliably implement line smoothing, so we can't rely on it, making it
useless for canvas lines. Instead, just fall back to emulating it manually with paths.
Differential Revision: https://phabricator.services.mozilla.com/D162540
Some paths may contain so many types that their vertex representation far exceeds their
software rasterized representation in memory size. As a sanity-check, we should just set
a hard limit on the maximum allowed complexity of a path that we attempt to supply to
wpf-gpu-raster. Beyond that, we will instead just rasterize in software and upload
to a texture which can be more performant.
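
The decision above might be sketched as follows. The limit value and the names kMaxPathComplexity, RasterMethod and ChooseRasterMethod are assumptions for illustration; the commit message does not state the actual limit or how complexity is measured.

```cpp
#include <cstddef>

enum class RasterMethod { GpuTriangles, SoftwareUpload };

// Hypothetical limit; the real value is not given in the commit message.
constexpr std::size_t kMaxPathComplexity = 4096;

// Picks software rasterization plus texture upload once a path is too
// complex to be worth handing to the GPU rasterizer as triangles.
RasterMethod ChooseRasterMethod(std::size_t aPathComplexity) {
  return aPathComplexity > kMaxPathComplexity ? RasterMethod::SoftwareUpload
                                              : RasterMethod::GpuTriangles;
}
```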
Differential Revision: https://phabricator.services.mozilla.com/D162481
By default, BorrowSnapshot is pessimistic and forces DrawTargetWebgl to return a data snapshot on
the assumption that the snapshot might be used off thread. However, if we actually know the DrawTarget
we're going to be drawing the snapshot to, then we can check if they're both DrawTargetWebgls with
the same internal SharedContext. In that case, we can use a SourceSurfaceWebgl snapshot which can
pass through a GPU texture to the target. This requires us to plumb the DrawTarget down through
SurfaceFromElement all the way to DrawTargetWebgl to make this decision.
Differential Revision: https://phabricator.services.mozilla.com/D162176
This adds a path vertex buffer where triangle list output from WGR is stored.
Each PathCacheEntry can potentially reference a range of vertexes in this buffer
corresponding to triangles for that entry. When this buffer is full, it gets
orphaned and clears corresponding cache entries, so that it can start anew.
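
A minimal sketch of the range allocation and orphaning behavior described above, under the assumption that stale cache entries can be detected with a generation counter. The names VertexRange and PathVertexBuffer are hypothetical, and the actual patch may clear entries differently.

```cpp
#include <cstddef>

// Range of vertexes inside the shared buffer that one cache entry owns.
struct VertexRange {
  std::size_t offset;
  std::size_t length;
};

class PathVertexBuffer {
 public:
  explicit PathVertexBuffer(std::size_t aCapacity) : mCapacity(aCapacity) {}

  // Reserves aCount vertexes. If they do not fit in the remaining space,
  // the buffer is orphaned: allocation restarts at offset 0 and the
  // generation is bumped so earlier ranges can be treated as stale.
  VertexRange Allocate(std::size_t aCount) {
    if (aCount > mCapacity) {
      return VertexRange{0, 0};  // Can never fit; caller must fall back.
    }
    if (mUsed + aCount > mCapacity) {
      mUsed = 0;
      ++mGeneration;
    }
    VertexRange range{mUsed, aCount};
    mUsed += aCount;
    return range;
  }

  std::size_t Generation() const { return mGeneration; }

 private:
  std::size_t mCapacity;
  std::size_t mUsed = 0;
  std::size_t mGeneration = 0;
};
```

A cache entry would store its range together with the generation it was allocated in, and be discarded when the generations no longer match.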
Differential Revision: https://phabricator.services.mozilla.com/D161479
[Int]CoordTyped no longer inherits Units because otherwise
instances of [Int]PointTyped may get one Base subobject because
they inherit Units, and others because of BasePoint's Coord members,
which ends up increasing the size of these objects (since
according to the ISO C++ standard, different Base subobjects of the
same type are required to have different addresses).
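
The size effect can be reproduced with simplified stand-in types. The names below are hypothetical and are not the real Gecko classes. The exact sizes depend on the ABI, so the sketch only exposes them for inspection. With GCC or Clang on common platforms, the variant whose coordinates also inherit the empty tag type is expected to be larger.

```cpp
#include <cstddef>

struct Units {};  // Empty tag type, standing in for a units class.

// Coordinate that also inherits the empty tag type.
struct CoordInheriting : Units {
  float value;
};

// Coordinate that does not inherit it.
struct CoordPlain {
  float value;
};

// The Units base of the point and the Units base of member x cannot share
// an address, so the empty base optimization cannot fully apply here.
struct PointOverlapping : Units {
  CoordInheriting x;
  CoordInheriting y;
};

// Only one Units subobject exists, so it can overlap the first member.
struct PointCompact : Units {
  CoordPlain x;
  CoordPlain y;
};

constexpr std::size_t kOverlappingSize = sizeof(PointOverlapping);
constexpr std::size_t kCompactSize = sizeof(PointCompact);
```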
Differential Revision: https://phabricator.services.mozilla.com/D160713
If we have stroked paths whose bounds cover a lot of screen area, that can lead
to a lot of empty area in the interior that bloats the path cache textures up
with unused pixels that still need to be uploaded. Try to avoid this by not
trying to accelerate paths with the path cache that take up a large amount
of screen area.
Differential Revision: https://phabricator.services.mozilla.com/D160023
For canvas users that rapidly create and destroy canvases, we may end up creating
a new SharedContext (and hence ClientWebGLContext) if there are no more canvases
left between destruction and creation. To work around this, just keep alive the
SharedContext for the main thread (other threads are unfortunately a bit tricky
to support) so that canvas creation remains fast in this instance.
Differential Revision: https://phabricator.services.mozilla.com/D158904
If we fail to compile DrawTargetWebgl's shaders, we bail out to a normal software canvas.
However, it will still try to create a DrawTargetWebgl every time we need to create a canvas.
To avoid this, remember if shader compilation failed in the process, and don't try to create
an accelerated canvas again in that case.
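
A minimal sketch of remembering the failure, assuming a per-process flag. AcceleratedCanvasFactory and its callback parameter are hypothetical names used only to show the control flow.

```cpp
class AcceleratedCanvasFactory {
 public:
  // aCompileShaders is a hypothetical callable that returns true when the
  // shaders compiled. Once it fails, later calls skip compilation entirely
  // so the caller goes straight to the software canvas.
  template <typename CompileFn>
  bool TryCreate(CompileFn aCompileShaders) {
    if (mShaderCompileFailed) {
      return false;
    }
    ++mAttempts;
    if (!aCompileShaders()) {
      mShaderCompileFailed = true;
      return false;
    }
    return true;
  }

  int Attempts() const { return mAttempts; }

 private:
  bool mShaderCompileFailed = false;
  int mAttempts = 0;
};
```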
Differential Revision: https://phabricator.services.mozilla.com/D158903
Previously we were reusing the framebuffer's Skia DT to render the clip mask.
This was the path of least resistance since SkCanvas does not allow exporting
clip information, and there is no way to reset the bitmap storage inside an
SkCanvas temporarily.
However, this can cause a feedback cycle of unnecessary WaitForShmem operations,
since we need to wait before we can generate the clip mask into the Skia target,
and then anything else after it needs to wait for the clip mask to finish uploading
before the Skia DT can be used again.
To alleviate this, we just allocate a new DrawTargetSkia to render the clip mask
into. We carefully clip the size of the DT so that in the common case we avoid
having to upload a surface the size of the entire framebuffer. Further, since
this is a completely different DT, we can now use an A8 format (1/4 the memory
overhead) instead of a BGRA8 format for the clip mask, which gives a further
memory usage gain.
A further complication is that we need to log the current clip stack state so
that we can replay it onto the new DrawTargetSkia. This avoids having to add
a mechanism to SkCanvas to export clip information.
Differential Revision: https://phabricator.services.mozilla.com/D157050
Certain events like waiting on a round-trip to verify that the HostWebGLContext is
done using a shmem, or pushing a Skia layer which will need to be flattened later, can
be expensive, especially if they are used many times throughout a frame. However,
we weren't incrementing the profile counters for these situations, which can
lead to accelerated rendering persisting even when it would be more judicious to
fall back to software rendering.
Differential Revision: https://phabricator.services.mozilla.com/D157049
Sometimes the clip state is thrashed when we need to temporarily override
clipping to disable it. However, in this case, the clip mask itself remains
unchanged. The current invalidation scheme doesn't discern between generation
of the clip mask itself and setting the clip state for the shader, leading to
unnecessary regeneration of the clip mask.
This code just tries to discern when this is happening so we can refresh the
clip state without having to regenerate the clip mask unless truly necessary.
Differential Revision: https://phabricator.services.mozilla.com/D157048
Sometimes we hit requests to stroke a path with a rounded line in it that can't
be accelerated inside StrokeLine. This causes it to push a layer which can be
expensive. Go through DrawPath instead in this case which will still try to
accelerate the drawing with a cached texture that does not use a layer.
Differential Revision: https://phabricator.services.mozilla.com/D156791
The clip mask might not get deleted in a timely fashion and can be quite large.
Ensure it gets deleted promptly when DrawTargetWebgl goes away.
Differential Revision: https://phabricator.services.mozilla.com/D156644
DrawTargetWebgl currently only supports aligned rectangular clips that can be approximated
with a scissor. However, many use-cases require complex clips like rounded rectangles or
non-aligned regions. We can support these cases more generally by using a mask texture that
modulates the shader color. The mask texture is generated by doing a solid fill in the Skia
target over a clear background, which is safe because the Skia target is not in use while
the WebGL target is being rendered to. This adds one unconditional texture lookup to the
shaders which shouldn't have a big performance impact. When no clip mask is needed, we just
default to using a 1x1 solid texture.
Depends on D156224
Differential Revision: https://phabricator.services.mozilla.com/D156225
Currently we only support filled glyphs in DrawTargetWebgl. PDF.js can often render PDFs
that have stroked glyphs, so support for stroked glyphs is useful to prevent fallbacks.
This just adds support for plumbing StrokeOptions through to GlyphCache.
Differential Revision: https://phabricator.services.mozilla.com/D156224
We currently don't support repeat modes in the DrawTargetWebgl's image shader.
This change makes it only explicitly accelerate clamped modes. Other extend
modes will just go to the path rasterization option which will pre-rasterize
the image as a filled path and then upload to the texture cache. This will
let us keep the clamp path simple and fast without worrying about uncommon
repeat usage for now. If it ever turns out to be the case that repeat modes
are highly necessary for performance, we can revisit this.
Differential Revision: https://phabricator.services.mozilla.com/D155860
When we are rendering dark-on-light text, we invert the bitmap after
rendering to produce a standard white-on-black mask, since we must actually
render that as black-on-white to get CoreText to produce the correct dilation.
However, when we know we're rendering bitmap fonts for emoji, we don't actually
want this inversion to happen at all. So we need to ensure bitmaps go through
the normal light-on-dark path that doesn't do this.
Differential Revision: https://phabricator.services.mozilla.com/D154777
If viewport clipping is applied to the bounds of glyphs, we can end up finding the
wrong entry if the clipping would differ, since we no longer compare the bounds to
the entry bounds when searching for a matching entry in an effort to reduce the
amount of bounds computations in the common case.
To work around this, each entry now remembers its full, unclipped bounds. Upon
finding a candidate entry, we offset the full bounds to the new offset and reapply
clipping. If the bounds then match with the old clipped bounds with the new offset
applied, then we know the amount of applied clipping is the same, and it is safe
to reuse the entry.
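
The reuse check described above can be sketched like this. IntRect, GlyphEntry and CanReuseEntry are simplified, hypothetical stand-ins, and the offset is expressed as the delta between the new offset and the one the entry was inserted with.

```cpp
#include <algorithm>

struct IntRect {
  int x, y, width, height;
  bool operator==(const IntRect& aOther) const {
    return x == aOther.x && y == aOther.y && width == aOther.width &&
           height == aOther.height;
  }
};

IntRect Offset(const IntRect& aRect, int aDx, int aDy) {
  return IntRect{aRect.x + aDx, aRect.y + aDy, aRect.width, aRect.height};
}

// Returns the overlap of two rects, or an all-zero rect if they are disjoint.
IntRect Intersect(const IntRect& aA, const IntRect& aB) {
  int x0 = std::max(aA.x, aB.x);
  int y0 = std::max(aA.y, aB.y);
  int x1 = std::min(aA.x + aA.width, aB.x + aB.width);
  int y1 = std::min(aA.y + aA.height, aB.y + aB.height);
  if (x1 <= x0 || y1 <= y0) {
    return IntRect{0, 0, 0, 0};
  }
  return IntRect{x0, y0, x1 - x0, y1 - y0};
}

struct GlyphEntry {
  IntRect fullBounds;     // Unclipped bounds at insertion time.
  IntRect clippedBounds;  // Bounds after viewport clipping at insertion time.
};

// The entry is reusable only if clipping the moved full bounds gives the
// same result as simply moving the old clipped bounds.
bool CanReuseEntry(const GlyphEntry& aEntry, int aDx, int aDy,
                   const IntRect& aViewport) {
  IntRect reclipped = Intersect(Offset(aEntry.fullBounds, aDx, aDy), aViewport);
  return reclipped == Offset(aEntry.clippedBounds, aDx, aDy);
}
```

An entry that was fully inside the viewport stays reusable while it remains fully inside, and is rejected once the new offset pushes part of it past the viewport edge.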
Differential Revision: https://phabricator.services.mozilla.com/D154490
Rather than allocating TexturePacker children individually, allocate an array
of all the children at once to cut the number of memory allocations in half.
Depends on D154118
Differential Revision: https://phabricator.services.mozilla.com/D154182
Now that we're actually hitting cache entries in the glyph cache, it turns out
always computing the local glyph bounds can be expensive, especially since they
should never really change. Instead, rely on the bounds that were initially
computed when an entry is inserted into the cache, and just change their relative
offset based on the current transform. This way, in the common case of a cache
hit, we never need to compute the bounds.
Differential Revision: https://phabricator.services.mozilla.com/D154118
Offsets were being specified in quantized space (i.e. scaled 1-4x) whereas bounds were
still in the original coordinate space. The bounds need to be transformed by the same
quantization scale so that glyph cache entries compare correctly. This was preventing
cache entries from being properly reused.
Differential Revision: https://phabricator.services.mozilla.com/D154102
This implements a rudimentary form of hash chaining for cache entries that reduces
the number of entry comparisons by about an order of magnitude. It's a bit of a bother
to use an actual hash table here since right now the code expects to be able to have
random access to cache entries in the list, and the MFBT hash table is not quite set
up for this. Instead, this just adds a reasonable fixed number of buckets to bring
the list size better under control.
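
A rough sketch of the fixed-bucket idea, not the actual patch: entries are spread over a fixed number of lists by hash, so a lookup only walks one bucket. The bucket count and the names CacheEntry and BucketedCache are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct CacheEntry {
  std::size_t hash;
  int value;
};

// Hypothetical bucket count; the commit only says "a reasonable fixed number".
constexpr std::size_t kNumBuckets = 64;

class BucketedCache {
 public:
  void Insert(const CacheEntry& aEntry) {
    mBuckets[aEntry.hash % kNumBuckets].push_back(aEntry);
  }

  // Returns the first entry with a matching hash, or nullptr. If
  // aComparisons is non-null, it is incremented once per entry examined.
  const CacheEntry* Find(std::size_t aHash,
                         std::size_t* aComparisons = nullptr) const {
    for (const CacheEntry& entry : mBuckets[aHash % kNumBuckets]) {
      if (aComparisons) {
        ++*aComparisons;
      }
      if (entry.hash == aHash) {
        return &entry;
      }
    }
    return nullptr;
  }

 private:
  std::array<std::vector<CacheEntry>, kNumBuckets> mBuckets;
};
```

With evenly distributed hashes, each lookup examines roughly 1/kNumBuckets of the entries that a single flat list would require.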
Differential Revision: https://phabricator.services.mozilla.com/D154101
BorrowSnapshot can be called by OffscreenCanvas in various places that may send
a SourceSurfaceWebgl to the main thread. If it did not originate from the main
thread, then this can cause multiple threads to use it. In general we want to
avoid this. For now, override BorrowSnapshot and make it always force a Skia
snapshot that can be safely shared between threads instead of SourceSurfaceWebgl.
Differential Revision: https://phabricator.services.mozilla.com/D152417
When rendering large and/or fullscreen Canvas2Ds, excessive time can be spent
in calls to TexImage/ReadPixels copying into and out of Shmems to the separate
buffer for DrawTargetSkia. To alleviate this, we can make the DrawTargetSkia
directly wrap the Shmem, so that calls to TexImage/ReadPixels then directly
read or write to this without any separate copy. We modify RawTexImage to use
the IPDL SendTexImage path so that Shmems can be sent via SurfaceDescriptor.
Since SendTexImage is nominally async (which is beneficial), we rely on a
call to GetError later to verify that the Shmem processing is complete before
we further modify the DrawTargetSkia. We further add a ReadPixelsIntoShmem IPDL
call to allow sending the Shmem in the other direction directly.
Differential Revision: https://phabricator.services.mozilla.com/D151286