This is 20% faster on my setup. According to llvm-mca, the IPC of the false branch (the hottest one) goes from 3 to 5.7, thanks to unrolling and conditional moves. In practice, most of the bits in `bits` are set, so it's OK to execute a few extra operations, since they now run faster.

Differential Revision: https://phabricator.services.mozilla.com/D193366
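The changed code isn't quoted here. What follows is only a hypothetical C++ sketch of the general trade-off described above: replace a per-bit branch with unconditional work plus data-dependent arithmetic, then unroll. The task (collecting the indices of set bits) and the names `collectSetBitsBranchy`, `collectSetBitsBranchless` and `checkAgree` are invented for illustration. Whether a compiler lowers the branchless form to `cmov`, `setcc`/`add`, or something else depends on the target and flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example, not the patched code: write the index of every set
// bit of `bits` into `out` and return how many indices were written.

// Branchy version: one data-dependent branch per bit.
std::size_t collectSetBitsBranchy(uint64_t bits, std::array<uint8_t, 64>& out) {
  std::size_t n = 0;
  for (unsigned i = 0; i < 64; ++i) {
    if ((bits >> i) & 1) {
      out[n++] = static_cast<uint8_t>(i);
    }
  }
  return n;
}

// Branch-free version, manually unrolled by 4: always store the candidate
// index, then advance the write position only if the bit is set. A store is
// wasted for every clear bit, which costs little when most bits are set.
// n never exceeds i before a store, so out[n] stays within 64 entries.
std::size_t collectSetBitsBranchless(uint64_t bits, std::array<uint8_t, 64>& out) {
  std::size_t n = 0;
  for (unsigned i = 0; i < 64; i += 4) {
    out[n] = static_cast<uint8_t>(i);
    n += (bits >> i) & 1;
    out[n] = static_cast<uint8_t>(i + 1);
    n += (bits >> (i + 1)) & 1;
    out[n] = static_cast<uint8_t>(i + 2);
    n += (bits >> (i + 2)) & 1;
    out[n] = static_cast<uint8_t>(i + 3);
    n += (bits >> (i + 3)) & 1;
  }
  return n;
}

// Sanity check: both versions must produce identical results.
void checkAgree(uint64_t bits) {
  std::array<uint8_t, 64> a{}, b{};
  std::size_t na = collectSetBitsBranchy(bits, a);
  std::size_t nb = collectSetBitsBranchless(bits, b);
  assert(na == nb);
  for (std::size_t k = 0; k < na; ++k) {
    assert(a[k] == b[k]);
  }
}
```

The branch-free loop always does 64 stores, while the branchy one does one per set bit. That only pays off when the input is dense, which mirrors the reasoning above. For sparse masks, a loop over count-trailing-zeros would likely be the better choice.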