54 Commits

Author SHA1 Message Date
Sylvestre Ledru
3bf4f867df Bug 1519636 - Reformat recent changes to the Google coding style r=Ehsan
# ignore-this-changeset

Differential Revision: https://phabricator.services.mozilla.com/D38057
2019-07-16 07:33:44 +00:00
dlee
6ee4703f17 Bug 1531354 - P6. Remove unused testing files and load old version of prefixes data. r=gcp
This patch does the following:
1. Remove testing files from disk because they are no longer required.
2. Load completions from the previous version of HashStore until an update
   is applied.
3. Remove older versions of HashStore (.sbstore) and PrefixSet (.vlpset)
   during an update.

Differential Revision: https://phabricator.services.mozilla.com/D36002
2019-06-29 19:24:14 +00:00
dlee
f59bb40e5d Bug 1531354 - P5. Safe Browsing test entries are directly stored in LookupCache. r=gcp
Creating test entries via an update introduces performance overhead.
We can store them directly in LookupCache and avoid saving test entries
to disk.

Differential Revision: https://phabricator.services.mozilla.com/D34576
2019-06-29 19:05:41 +00:00
dlee
bbf7fa8756 Bug 1531354 - P2. Use variable-length prefix set in LookupCacheV2. r=gcp
1. VariableLengthPrefixSet supports getting/setting prefixes with
AddPrefixArray and AddCompletesArray

2. VariableLengthPrefixSet supports passing a prefix as an integer in
the Match API, because V2 and V4 interpret a prefix as an integer
differently.
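
As a rough illustration of what a variable-length prefix set does, the sketch below groups prefixes by length and checks a full hash against each group. It is a toy model in Python, not the Gecko implementation: the class shape and the method names `add_prefixes` and `match` are assumptions made for this example, and it works on byte strings rather than integers.

```python
from bisect import bisect_left


class VariableLengthPrefixSet:
    """Toy prefix set keyed by prefix length (4 to 32 bytes)."""

    def __init__(self):
        # Maps prefix length -> sorted list of prefixes of that length.
        self._by_length = {}

    def add_prefixes(self, prefixes):
        """Add byte-string prefixes, keeping each length bucket sorted."""
        for prefix in prefixes:
            if not 4 <= len(prefix) <= 32:
                raise ValueError("prefix must be 4 to 32 bytes long")
            self._by_length.setdefault(len(prefix), []).append(prefix)
        for bucket in self._by_length.values():
            bucket.sort()

    def match(self, full_hash):
        """Return the length of the shortest matching prefix, or 0 if none."""
        for length in sorted(self._by_length):
            bucket = self._by_length[length]
            candidate = full_hash[:length]
            index = bisect_left(bucket, candidate)  # binary search
            if index < len(bucket) and bucket[index] == candidate:
                return length
        return 0
```

A 4-byte prefix and a full 32-byte completion can then live in the same structure, which is the property this patch series relies on.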

Differential Revision: https://phabricator.services.mozilla.com/D34547
2019-06-26 19:40:45 +00:00
dlee
9aaaafdee3 Bug 1531354 - P1. Remove mPrefixSet and mUpdateCompletions from LookupCacheV2 and use mVLPrefixSet. r=gcp
The goal of this series of patches is to improve Safe Browsing performance by
skipping unnecessary file IO.

The first two patches remove the dependency between LookupCache and HashStore, so HashStore is only
responsible for updates.

Before this patch, LookupCacheV2 treats prefixes and completions
differently. It uses two data structures to maintain
prefixes:
1. mPrefixSet to store prefixes from .pset
2. mUpdateCompletions to store completions from .sbstore

After this patch
1. LookupCacheV2 & LookupCacheV4 both use variable-length
prefix set. mUpdateCompletions and mPrefixSet are removed and
mVLPrefixSet is used to store all prefixes data.
2. Move common function to base class.

Note that this patch does not yet include conversion between 4/32-byte prefixes and
mVLPrefixSet; that will be handled in the next patch.
This patch avoids logic changes and only refines the
LookupCacheV2 & LookupCacheV4 class structure so that both classes use a
variable-length prefix set.

Differential Revision: https://phabricator.services.mozilla.com/D34546
2019-06-21 23:07:52 +00:00
dlee
158ec83968 Bug 1543341 - Refine Safe Browsing log output. r=baku
After calling the Lookup API per table, Safe Browsing outputs too many debug
messages for a single URL lookup. Refine the current output.

Differential Revision: https://phabricator.services.mozilla.com/D27066
2019-04-11 18:57:56 +00:00
dlee
f066428f6b Bug 1543319 - P1. Free intermediate memory as early as possible during Safe Browsing update. r=gcp
Here is how prefixes are handled during a V4 update:
1. Prefixes are received from a Safe Browsing update and stored in ProtocolBuffer
2. The prefixes are copied from ProtocolBuffer to the TableUpdate structure
3. Prefixes in TableUpdate are merged with local prefixes (stored in LookupCacheV4)
4. Merged prefixes are processed by PrefixSet to generate the in-memory prefix
   set data structure (MakePrefixSet).

In this patch, we free the prefixes stored in TableUpdate right after step 3.
This reduces the peak memory used during an update (the peak happens in step 4).

Differential Revision: https://phabricator.services.mozilla.com/D26860
2019-04-10 14:32:54 +00:00
dlee
ddf22fa781 Bug 1353956 - P6. Load the old prefixset(.pset) when there is no .vlpset. r=gcp
To avoid forcing a redownload of SafeBrowsing v4 list.
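
The fallback described here can be sketched as a simple file-selection rule: prefer the new `.vlpset` file and fall back to the old `.pset` file only when the new one is missing. The Python function below is an illustration only; `pick_prefix_file` and its parameters are hypothetical names, not part of the Gecko code.

```python
from pathlib import Path


def pick_prefix_file(store_dir, table_name):
    """Return the prefix file to load for a table, or None if neither exists."""
    store = Path(store_dir)
    # Try the new format first, then the old one.
    for suffix in (".vlpset", ".pset"):
        candidate = store / (table_name + suffix)
        if candidate.is_file():
            return candidate
    return None
```

Loading the old file when it is the only one present keeps existing data usable until the next update writes a `.vlpset` file.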

Differential Revision: https://phabricator.services.mozilla.com/D21876
2019-03-07 14:42:31 +00:00
dlee
8514f5041e Bug 1353956 - P5. Remove old v4 prefix files after new files are stored. r=gcp
This patch cleans up old SafeBrowsing v4 prefix files.

Differential Revision: https://phabricator.services.mozilla.com/D21464
2019-03-07 14:41:52 +00:00
dlee
c975e636dd Bug 1353956 - P4. Add header and CRC32 checksum to SafeBrowsing V4 prefix files. r=gcp
After this patch, we may have the following files in the SafeBrowsing
directory:
- (v2) .sbstore  : Stores V2 chunk data, used for updates, MD5 integrity
                   check on load
- (v2) .pset     : Stores the V2 prefix set, used for lookups, loaded on
                   startup, no integrity check
- (v4) .metadata : Stores V4 state, used for updates, no integrity check
- (v4) .vlpset   : Stores the V4 prefix set, used for lookups, loaded on
                   startup, CRC32 integrity check
- (v4) .pset     : V4 prefix set from before this patch, should be removed

A magic string is also added to the ".vlpset" header so that we can add
telemetry to see whether a sanity check is good enough as the prefix set
integrity check (the telemetry is not yet added). If it is, we can remove
the CRC32 in the future for even better performance.
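
To make the idea of a header plus CRC32 concrete, here is a minimal Python sketch of writing and validating such a file. The layout is an assumption made for illustration: the magic value `VLPS`, the version field, the little-endian encoding, and the placement of the CRC32 at the end of the file are all hypothetical and may differ from the real ".vlpset" format.

```python
import struct
import zlib

MAGIC = b"VLPS"                  # hypothetical magic string
VERSION = 1                      # hypothetical format version
HEADER = struct.Struct("<4sI")   # magic, version
TRAILER = struct.Struct("<I")    # CRC32 of header + payload


def pack_prefix_file(payload):
    """Return header + payload + CRC32 as one byte string."""
    body = HEADER.pack(MAGIC, VERSION) + payload
    return body + TRAILER.pack(zlib.crc32(body))


def unpack_prefix_file(blob):
    """Validate the header and CRC32, then return the payload."""
    if len(blob) < HEADER.size + TRAILER.size:
        raise ValueError("file too short")
    body, trailer = blob[:-TRAILER.size], blob[-TRAILER.size:]
    magic, version = HEADER.unpack_from(body)
    if magic != MAGIC or version != VERSION:
        # Cheap sanity check: catches wrong or badly damaged files.
        raise ValueError("bad header")
    (stored_crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != stored_crc:
        raise ValueError("CRC32 mismatch")
    return body[HEADER.size:]
```

The header check is nearly free, while the CRC32 has to read the whole file, which is why the commit considers dropping the CRC32 if the sanity check proves sufficient.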

Differential Revision: https://phabricator.services.mozilla.com/D21463
2019-03-07 14:41:25 +00:00
Dimi Lee
e1ed95ebdf Bug 1353956 - P3. Separate file processing and prefix data processing for SafeBrowsing prefix set. r=gcp
SafeBrowsing prefix file LOAD/SAVE operations are handled in xxxPrefixSet.cpp.
It would be clearer if xxxPrefixSet.cpp only processed prefix data,
while LookupCacheV2/LookupCacheV4, which use the prefix set, handled the files.

This patch doesn't change any behavior, but test cases need updating because
the LookupCache & xxxPrefixSet APIs have changed.

Differential Revision: https://phabricator.services.mozilla.com/D21462
2019-03-07 14:40:56 +00:00
dlee
205c59539f Bug 1353956 - P2. Do not use SHA-256 while loading the V4 prefix files. r=gcp
SHA256 is an expensive operation, so we should avoid using it where
possible. SafeBrowsing prefix files are loaded during startup and
their integrity is verified with SHA256, which may affect performance,
especially on low-end devices.

This patch simply removes the SHA256 integrity check. A CRC32-based
integrity check will be introduced in another patch.

This patch also slightly changes how
"Telemetry::URLCLASSIFIER_VLPS_LOAD_CORRUPT" is recorded.
It used to be recorded only once per session (during startup, the first
time we load the prefix set); now it is recorded per update.

Differential Revision: https://phabricator.services.mozilla.com/D21461
2019-03-07 14:40:28 +00:00
Dimi Lee
0b4d6efd3f Bug 1353956 - P1. Rename checksum used in SafeBrowsing V4 to SHA256. r=gcp
The SafeBrowsing V4 protocol uses SHA-256 as the checksum to check the integrity
of update data and also the integrity of prefix files.

The SafeBrowsing V2 HashStore uses MD5 as the checksum to check the integrity of
.sbstore.

Since we are going to use CRC32 as the integrity check for V4 prefix files,
renaming the V4 "checksum" to SHA256 should improve readability.

Differential Revision: https://phabricator.services.mozilla.com/D21460
2019-03-07 14:40:14 +00:00
Dorel Luca
688429d9d6 Backed out 6 changesets (bug 1353956) for Linux Build bustage
Backed out changeset 71dafccc22ae (bug 1353956)
Backed out changeset f1f29fe519cf (bug 1353956)
Backed out changeset 4978556a66f6 (bug 1353956)
Backed out changeset bc0b91abce9b (bug 1353956)
Backed out changeset 6b8412db5a05 (bug 1353956)
Backed out changeset 3d326cfcd002 (bug 1353956)
2019-03-07 01:49:03 +02:00
dlee
57ada30b9f Bug 1353956 - P6. Load the old prefixset(.pset) when there is no .vlpset. r=gcp
To avoid forcing a redownload of SafeBrowsing v4 list.

Differential Revision: https://phabricator.services.mozilla.com/D21876
2019-03-06 09:41:34 +00:00
dlee
8846f51858 Bug 1353956 - P5. Remove old v4 prefix files after new files are stored. r=gcp
This patch cleans up old SafeBrowsing v4 prefix files.

Differential Revision: https://phabricator.services.mozilla.com/D21464
2019-03-05 18:32:23 +00:00
dlee
cd1fa6d7c2 Bug 1353956 - P4. Add header and CRC32 checksum to SafeBrowsing V4 prefix files. r=gcp
After this patch, we may have the following files in the SafeBrowsing
directory:
- (v2) .sbstore  : Stores V2 chunk data, used for updates, MD5 integrity
                   check on load
- (v2) .pset     : Stores the V2 prefix set, used for lookups, loaded on
                   startup, no integrity check
- (v4) .metadata : Stores V4 state, used for updates, no integrity check
- (v4) .vlpset   : Stores the V4 prefix set, used for lookups, loaded on
                   startup, CRC32 integrity check
- (v4) .pset     : V4 prefix set from before this patch, should be removed

A magic string is also added to the ".vlpset" header so that we can add
telemetry to see whether a sanity check is good enough as the prefix set
integrity check (the telemetry is not yet added). If it is, we can remove
the CRC32 in the future for even better performance.

Differential Revision: https://phabricator.services.mozilla.com/D21463
2019-03-06 22:57:12 +00:00
Dimi Lee
82b58b495d Bug 1353956 - P3. Separate file processing and prefix data processing for SafeBrowsing prefix set. r=gcp
SafeBrowsing prefix file LOAD/SAVE operations are handled in xxxPrefixSet.cpp.
It would be clearer if xxxPrefixSet.cpp only processed prefix data,
while LookupCacheV2/LookupCacheV4, which use the prefix set, handled the files.

This patch doesn't change any behavior, but test cases need updating because
the LookupCache & xxxPrefixSet APIs have changed.

Differential Revision: https://phabricator.services.mozilla.com/D21462
2019-03-04 21:22:46 +00:00
dlee
d8899e5669 Bug 1353956 - P2. Do not use SHA-256 while loading the V4 prefix files. r=gcp
SHA256 is an expensive operation, so we should avoid using it where
possible. SafeBrowsing prefix files are loaded during startup and
their integrity is verified with SHA256, which may affect performance,
especially on low-end devices.

This patch simply removes the SHA256 integrity check. A CRC32-based
integrity check will be introduced in another patch.

This patch also slightly changes how
"Telemetry::URLCLASSIFIER_VLPS_LOAD_CORRUPT" is recorded.
It used to be recorded only once per session (during startup, the first
time we load the prefix set); now it is recorded per update.

Differential Revision: https://phabricator.services.mozilla.com/D21461
2019-02-28 08:18:46 +00:00
Dimi Lee
15191963de Bug 1353956 - P1. Rename checksum used in SafeBrowsing V4 to SHA256. r=gcp
The SafeBrowsing V4 protocol uses SHA-256 as the checksum to check the integrity
of update data and also the integrity of prefix files.

The SafeBrowsing V2 HashStore uses MD5 as the checksum to check the integrity of
.sbstore.

Since we are going to use CRC32 as the integrity check for V4 prefix files,
renaming the V4 "checksum" to SHA256 should improve readability.

Differential Revision: https://phabricator.services.mozilla.com/D21460
2019-02-28 08:12:36 +00:00
Sylvestre Ledru
e5a134f73a Bug 1511181 - Reformat everything to the Google coding style r=ehsan a=clang-format
# ignore-this-changeset
2018-11-30 11:46:48 +01:00
Francois Marier
8e84aa521c Bug 1362761 - Add more specific warnings in case of file corruption. r=dimi
MozReview-Commit-ID: KsgcQWLGulH

Differential Revision: https://phabricator.services.mozilla.com/D2061
2018-07-11 08:58:15 +00:00
Francois Marier
70a9e6972b Bug 1362761 - Force file and streams to use smart pointers. r=dimi
MozReview-Commit-ID: GscB9PaaN02

Differential Revision: https://phabricator.services.mozilla.com/D2060
2018-07-12 22:19:40 +00:00
Francois Marier
b910cbbb34 Bug 1434206 - Make TableUpdate objects const as much as possible. r=gcp
I tried to make TableUpdateArray point to const TableUpdate objects
everywhere but there were two problems:

- HashStore::ApplyUpdate() triggers a few Merge() calls which include
  sorting the underlying TableUpdate object first.

- LookupCacheV4::ApplyUpdate() calls TableUpdateV4::NewChecksum() when the
  checksum is missing and that sets mChecksum.

MozReview-Commit-ID: LIhJcoxo7e7
2018-05-11 16:02:37 -07:00
Francois Marier
ef8ea55aca Bug 1434206 - Keep TableUpdate objects in smart pointers. r=gcp
Manually keeping tabs on the lifetime of these objects is a pain
and is the likely source of some of our crashes. I suspect we might
also be leaking memory.

This change creates an explicit copy of the main array into the
update thread to avoid using a non-thread-safe shared data
structure. This is a shallow copy. Only the pointers to the
TableUpdates are copied, which means one pointer per list (e.g. 5
in total for google4 in a new profile).

MozReview-Commit-ID: 221d6GkKt0M
2018-06-01 15:48:48 -07:00
Francois Marier
ffcbff3049 Bug 1434206 - Add const to members and functions that can take it. r=gcp
MozReview-Commit-ID: B2aaQTttPAV
2018-05-16 15:26:14 -07:00
Francois Marier
2678718985 Bug 1452445 - Promote MOZ_LOG calls to NS_WARNING in LookupCacheV4. r=gcp
This should help narrow down which of the code paths is responsible
for the intermittent failures we are seeing.

MozReview-Commit-ID: JHVZzixpOg6
2018-04-30 16:44:35 -07:00
Francois Marier
170af9408f Bug 1438671 - Remove the std::string wrapper in TableUpdateV4. r=gcp
Given we're no longer using dependent strings in
LookupCacheV4::PrefixString(), we will end up making a copy of the
prefixes at some point. Let's do it early and remove a bunch of
complicated code.

Make the string copies fallible so that we return an error and
fail the update instead of crashing.

MozReview-Commit-ID: 5cZHSDIJSlD
2018-04-03 17:11:30 -07:00
Francois Marier
79f0140281 Bug 1438671 - Add assertions to enforce the size of prefix strings. r=gcp
Also document the meaning of mPrimed in LookupCache.h.

MozReview-Commit-ID: 63GAHwU3Rx3
2018-03-29 15:40:13 -07:00
Francois Marier
098fce2112 Bug 1438671 - Remove some inappropriate uses of dependent strings. r=gcp
Dependent strings are recommended only when dealing with a character
buffer (i.e. char*). Using it here makes it more likely that we'll
hang on to a string buffer that will be deallocated.

nsCString will by default share the underlying string buffers when
it can (i.e. when copying entire strings on the heap) so it should
be able to avoid unnecessary copies.

MozReview-Commit-ID: 3rTUYmouzcT
2018-03-29 16:31:39 -07:00
Francois Marier
612e949b97 Bug 1442486 - Mark LookupCacheV4 as primed after creating it. r=gcp
RegenActiveTables() relies on mPrimed being set correctly and so
the V4 lookup cache should behave the same way as the V2 one.

The V2 lookup cache on the other hand was unnecessarily setting
mPrimed to true twice.

MozReview-Commit-ID: LwNdI9DTqZ7
2018-03-01 18:09:58 -08:00
Francois Marier
2fcfa5bb32 Bug 1433636 - Put a limit on the length of Safe Browsing metadata values. r=gcp
Disk corruption can cause the stored length of a value to be
unreasonably large and trigger an OOM.

Since values are all currently <= 32 bytes, we can safely enforce
a 256-byte upper bound.
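
The bound can be pictured with a small length-prefixed reader that refuses to allocate for an implausible length. This Python sketch assumes a 4-byte little-endian length field followed by the value; that encoding and the function name `read_value` are assumptions for illustration, and only the 256-byte limit comes from the commit message.

```python
import io
import struct

MAX_VALUE_LENGTH = 256  # upper bound stated in the commit message


def read_value(stream):
    """Read one length-prefixed value, rejecting implausible lengths."""
    raw_length = stream.read(4)
    if len(raw_length) != 4:
        raise ValueError("truncated length field")
    (length,) = struct.unpack("<I", raw_length)
    if length > MAX_VALUE_LENGTH:
        # A corrupted length must not drive a huge allocation.
        raise ValueError(f"value length {length} exceeds {MAX_VALUE_LENGTH}")
    value = stream.read(length)
    if len(value) != length:
        raise ValueError("truncated value")
    return value
```

Checking the length before reading means a corrupted file produces an error that the caller can handle instead of an out-of-memory crash.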

MozReview-Commit-ID: XygReOpEK3
2018-01-30 14:21:33 -08:00
Thomas Nguyen
a63544b5cb Bug 1376410 - Handle OOM when appending prefix to map r=francois,hchang
MozReview-Commit-ID: 7MOHHAgEI1I
2017-08-11 17:28:40 +08:00
DimiL
02e313d384 Bug 1359299 - V4 caches in LookupCache need to be copied around in copy constructor. r=hchang
MozReview-Commit-ID: AjzUUmQKiPW
2017-06-06 14:16:57 +08:00
Ryan VanderMeulen
c121499332 Backed out changeset c0b940487708 (bug 1359299) for causing intermittent Windows safebrowsing crashes.
2017-05-24 09:11:04 -04:00
DimiL
ecf67ffc51 Bug 1359299 - Copy fullhash cache when update. r=hchang
After adopting the new thread model for safebrowsing, we create a new
lookup cache for updates so that we can still check the lookup cache at the same time.

The prefix set and completions are generated when we open the new lookup cache,
but the cache is not included, so we lose the cache after that.

This patch copies cache data from the old lookup cache to the new lookup
cache during an update.

MozReview-Commit-ID: L0WpiHOGIGm
2017-05-23 09:19:06 +08:00
DimiL
a0b8501692 Bug 1333328 - Refactor cache miss handling mechanism for V2. r=francois
In this patch, we make Safebrowsing V2 caching use the same algorithm as V4.
So we remove "mMissCache" for negative caching and the TableFreshness check for
positive caching.

But Safebrowsing V2 doesn't include negative/positive cache duration information in
the gethash response, so we hard-code a fixed value, 15 minutes, as the cache duration.
In this way, caching is handled the same way for V2 and V4.

An extra step for V2 is that we need to manually record prefix misses,
because we won't get any response for those prefixes (implemented in
nsUrlClassifierLookupCallback::CacheMisses).
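
A minimal sketch of this idea in Python, assuming a cache keyed by 4-byte prefix with a negative expiry time and a map of full hashes to positive expiry times. The dictionary layout and the function name `cache_v2_response` are hypothetical; only the fixed 15-minute duration and the need to record misses explicitly come from the commit message.

```python
V2_CACHE_DURATION_SEC = 15 * 60  # fixed value from the commit message


def cache_v2_response(cache, requested_prefixes, returned_fullhashes, now):
    """Store V2 gethash results using V4-style expiry times.

    cache maps prefix -> {"negative_expiry": float,
                          "fullhashes": {fullhash: expiry}}.
    """
    expiry = now + V2_CACHE_DURATION_SEC
    for prefix in requested_prefixes:
        # Every requested prefix gets an entry, so a prefix with no
        # returned full hashes is still recorded as a miss.
        entry = cache.setdefault(
            prefix, {"negative_expiry": 0.0, "fullhashes": {}})
        entry["negative_expiry"] = expiry
    for fullhash in returned_fullhashes:
        entry = cache.setdefault(
            fullhash[:4], {"negative_expiry": expiry, "fullhashes": {}})
        entry["fullhashes"][fullhash] = expiry
    return cache
```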
2017-05-04 09:38:14 +08:00
dimi
b19db734bc Bug 1311933 - P1. Use integer as the key of safebrowsing cache. r=francois
In Bug 1323953, we always send a 4-byte prefix for completion, and the prefix is also
used as the key to store the cached result of a gethash request.
Since it is always 4 bytes, we can convert it to an integer for simplicity.
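
The conversion itself is small. The Python sketch below shows one way to turn a 4-byte prefix into an integer cache key; the function name `prefix_to_key` is hypothetical and the big-endian byte order is an assumption, since the commit message does not state which order is used.

```python
def prefix_to_key(prefix):
    """Convert a 4-byte prefix into an unsigned 32-bit integer key."""
    if len(prefix) != 4:
        raise ValueError("expected a 4-byte prefix")
    # Byte order is an assumption made for this example.
    return int.from_bytes(prefix, "big")
```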

MozReview-Commit-ID: Lkvrg0wvX5Z
2017-04-11 16:07:26 +08:00
dimi
bb15dc150d Bug 1311935 - P3. Implement safebrowsing v4 caching logic. r=francois
LookupCacheV4::Has implements safebrowsing v4 caching logic.
1. Check if the fullhash matches any prefix in the local database:
  - If not, the URL is safe.
2. Check if the prefix is in the cache (the prefix is always the first 4 bytes of
   the fullhash, Bug 1323953):
  - If not, send a fullhash request.
3. Check if the fullhash is in the positive cache:
  - If the fullhash is found and it is not expired, the URL is not safe.
  - If the fullhash is found and it is expired, send a fullhash request.
4. If the fullhash is not found, check the negative cache expiry time:
  - If the negative cache time is not expired, the URL is safe.
  - If the negative cache time is expired, send a fullhash request.
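
The four steps above can be sketched as a single decision function. This is an illustrative Python model, not LookupCacheV4::Has itself: the `Verdict` enum, the `check_fullhash` name, and the cache layout are assumptions, and the local database is simplified to a set of 4-byte prefixes.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    NEEDS_FULLHASH_REQUEST = "needs fullhash request"


def check_fullhash(fullhash, local_prefixes, cache, now):
    """Apply the four caching steps to one full hash.

    local_prefixes: set of 4-byte prefixes in the local database.
    cache: maps 4-byte prefix -> {"negative_expiry": float,
                                  "fullhashes": {fullhash: expiry}}.
    """
    prefix = fullhash[:4]
    # Step 1: no local prefix match means the URL is safe.
    if prefix not in local_prefixes:
        return Verdict.SAFE
    # Step 2: a prefix that was never cached needs a request.
    entry = cache.get(prefix)
    if entry is None:
        return Verdict.NEEDS_FULLHASH_REQUEST
    # Step 3: positive cache lookup.
    positive_expiry = entry["fullhashes"].get(fullhash)
    if positive_expiry is not None:
        if now < positive_expiry:
            return Verdict.UNSAFE
        return Verdict.NEEDS_FULLHASH_REQUEST
    # Step 4: fall back to the negative cache expiry time.
    if now < entry["negative_expiry"]:
        return Verdict.SAFE
    return Verdict.NEEDS_FULLHASH_REQUEST
```

Only an unexpired positive entry marks a URL as unsafe; every expired or missing entry leads to a new fullhash request rather than a guess.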

MozReview-Commit-ID: GRX7CP8ig49
2017-04-10 14:21:08 +08:00
Iris Hsiao
cd018fd494 Backed out 4 changesets (bug 1311935) for causing assertion crash by developer's request
Backed out changeset 27e624cd9479 (bug 1311935)
Backed out changeset 4c0381ab0990 (bug 1311935)
Backed out changeset 73587838ef16 (bug 1311935)
Backed out changeset a5a6c0f79733 (bug 1311935)
2017-04-11 11:04:54 +08:00
dimi
3a7526678a Bug 1311935 - P3. Implement safebrowsing v4 caching logic. r=francois
LookupCacheV4::Has implements safebrowsing v4 caching logic.
1. Check if the fullhash matches any prefix in the local database:
  - If not, the URL is safe.
2. Check if the prefix is in the cache (the prefix is always the first 4 bytes of
   the fullhash, Bug 1323953):
  - If not, send a fullhash request.
3. Check if the fullhash is in the positive cache:
  - If the fullhash is found and it is not expired, the URL is not safe.
  - If the fullhash is found and it is expired, send a fullhash request.
4. If the fullhash is not found, check the negative cache expiry time:
  - If the negative cache time is not expired, the URL is safe.
  - If the negative cache time is expired, send a fullhash request.

MozReview-Commit-ID: GRX7CP8ig49
2017-04-10 14:21:08 +08:00
Thomas Nguyen
2b6e2edd11 Bug 1297962 - Add noise data when sending v4 gethash request r=francois
MozReview-Commit-ID: GbyvX7wcg8c
* * *
[mq]: 1297962_review

MozReview-Commit-ID: 1U2T0wq778R
2017-02-24 10:22:12 +08:00
dimi
5750b4baba Bug 1336909 - Restrict URLCLASSIFIER_PREFIX_MATCH to profiles that have working V4. r=francois
MozReview-Commit-ID: L3lKgiohalH
2017-02-08 15:18:35 +08:00
dimi
591977dd60 Bug 1336865 - Add telemetry to measure time spent on constructing variable-length prefix set. r=francois
MozReview-Commit-ID: CNhfYdH1ryA
2017-02-07 16:14:58 +08:00
dimi
9cac186808 Bug 1328821 - hash completion request for v4 should not depend on table freshness. r=francois,henry
MozReview-Commit-ID: EIjDrnj1I4S
2017-01-17 08:33:08 +08:00
DimiL
3cb66c6aee Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. r=francois,henry
MozReview-Commit-ID: JL4aZrUOGH7
2016-12-19 09:43:02 +08:00
Henry Chang
b9d5f5080f Bug 1312339 - LookupResult to support variable length partial hash. r=francois
MozReview-Commit-ID: DKwNCNKJAW
2016-12-16 14:34:32 +08:00
Thomas Nguyen
7064eed520 Bug 1315386 - Make Safe Browsing code more shutdown-aware. r=francois,gcp.
MozReview-Commit-ID: ATCVfh5YLZl
2016-11-25 16:02:37 +08:00
Thomas Nguyen
9a9556049a Bug 1298257 - Implement url matching for variable-length prefix set. r=dimi,gcp
MozReview-Commit-ID: 8Goh7yyAotN
2016-11-04 12:00:33 +08:00
dimi
6d436a40ee Bug 1305581 - Verify that V4 updates were applied correctly by computing a checksum on the final result. r=francois
MozReview-Commit-ID: LNtFOVMVw2U
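
A rough Python sketch of verifying an applied update by checksum. It assumes the expected value is a SHA-256 digest computed over all prefixes in lexicographic order, concatenated; the function names are hypothetical and the exact input to the digest in the real code may differ.

```python
import hashlib


def prefixes_checksum(prefixes):
    """SHA-256 over the prefixes, concatenated in lexicographic order."""
    digest = hashlib.sha256()
    for prefix in sorted(prefixes):
        digest.update(prefix)
    return digest.digest()


def update_applied_correctly(merged_prefixes, expected_checksum):
    """Compare the checksum of the merged result with the expected one."""
    return prefixes_checksum(merged_prefixes) == expected_checksum
```

Sorting first makes the checksum independent of the order in which prefixes were merged, so only the final content matters.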
2016-10-27 08:36:26 +08:00