A few months ago I reported on an experiment I had run in Firefox Nightly 53 where I compared stability (i.e. crashes) between users with and without GPU Process. Since then we’ve fixed some bugs and are now days away from releasing GPU Process in Firefox 53. In anticipation of this moment I wanted to make sure we were ready, so I organized a repeat of the previous experiment, this time on our Beta users.
As anticipated, the results were different from what we saw on Nightly but still favourable. I’d like to highlight some of those results in this post.
As it stands, the population eligible for GPU Process represents approximately 25% of users. Since GPU Process is enabled by default for these users, the experiment randomly selects half of them and turns off GPU Process by flipping a pref. There is a small amount of noise in the data (+/- ~0.5%) since flipping the pref doesn’t actually turn off GPU Process until the first restart following the pref flip.
17% fewer driver related crashes
In the grand scheme of things, graphics driver crashes represented about 3.17% of all crashes reported. For those with GPU Process enabled the percentage was lower (2.81%) than for those with GPU Process disabled (3.54%). While these numbers are low, it’s worth noting that not all users are created equal. Some users may experience more driver-related crashes than others based on a multitude of factors (how up to date the OS and drivers are, hardware, the mix of third-party software, etc.). It is conceivable, although not provable by this experiment, that the impact of this change would be more noticeable to users more prone to driver-related issues. It’s also worth noting that we managed to make this change without introducing any new driver-related crashes in the UI process, which means Firefox should be much less prone to crashing entirely because of an interaction with the driver, although content may still be affected.
22% fewer Direct3D related crashes
Another category of crashes that sees improvements from GPU Process is D3D-related crashes. This category typically involves hardware-accelerated content on the web. In the past we’d see these occurring in the UI process, which resulted in Firefox crashing completely. Now, with GPU Process, we see about a one-fifth reduction in these crashes, and those that remain tend not to happen in the UI process anymore. The end-user impact is that you might have to reload a page, but Firefox dies less often.
11% fewer Direct3D accelerated video crashes
More stable hardware-accelerated video is another interesting benefit of GPU Process. We see about 11% fewer DXVA (DirectX Video Acceleration) related crashes in the test group with GPU Process enabled than in the test group with it disabled. The end result should be slightly fewer crashes that take down the browser when viewing hardware-accelerated video on sites like YouTube.
Looking at the topcrash charts from Socorro shows expected movement in overall crash volumes. Browser topcrashes are down approximately 10% overall when GPU Process is enabled. Meanwhile, Content topcrashes are up approximately 8% and Plugin topcrashes are down 18%. Topcrashes in the GPU process only account for 0.13% of the overall topcrash volume. These numbers are in line with what we expected based on prior testing.
All of the previous metrics are based on anonymous data we receive from Socorro, i.e. crash reports submitted by users. This data is extremely useful for digging into very specific details about crashes, but it is biased towards users who submit crash reports. Telemetry gives us less detailed information but is better at determining the broader impact of features since it represents a broader user population.
For the purposes of the experiment I compared the overall crash rate for each test group, where crash rate is defined as the number of crashes per 1,000 hours of aggregate browser usage across the entire test group population. The findings from the experiment are interesting in that we saw a slight increase in the Browser process crash rate (up 5.9% in the Enabled cohort), a smaller increase in the Content process crash rate (up 2.5% in the Enabled cohort), and a large decrease in the Plugin crash rate (down 20.6% in the Enabled cohort).
Comparing the Enabled cohort to Firefox 52.0.2, however, shows results closer to what we expected: a 0.51% lower browser crash rate, an 18.1% higher content crash rate, and an 18% lower plugin crash rate. It’s also worth noting that the crash rate for GPU Process crashes is relatively very low, at just 0.07 crashes per 1,000 usage hours. Put another way, that’s one GPU Process crash roughly every 14,285 hours, compared to one Browser process crash every 353 hours.
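To make the arithmetic behind these figures concrete, here is a minimal Python sketch of the crash-rate definition (crashes per 1,000 usage hours), the conversion to "hours between crashes", and the relative change between two cohorts. The function names are hypothetical and chosen for illustration only; this is not the actual Telemetry analysis code, just the same calculations written out.

```python
def crash_rate_per_1000_hours(crashes: int, usage_hours: float) -> float:
    """Crashes per 1,000 hours of aggregate usage, as defined in the post."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crashes / usage_hours * 1000.0


def hours_between_crashes(rate_per_1000_hours: float) -> float:
    """Invert a crash rate into the average number of usage hours per crash."""
    if rate_per_1000_hours <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_per_1000_hours


def relative_change_percent(enabled_rate: float, baseline_rate: float) -> float:
    """Percentage change of the Enabled cohort's rate relative to a baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (enabled_rate - baseline_rate) / baseline_rate * 100.0


if __name__ == "__main__":
    # 0.07 GPU Process crashes per 1,000 hours is one crash every ~14,285 hours.
    print(int(hours_between_crashes(0.07)))  # 14285

    # One Browser crash every 353 hours corresponds to ~2.83 per 1,000 hours.
    print(round(1000.0 / 353, 2))  # 2.83
```

Expressing a rate as hours between crashes loses no information; it just makes the gap between the GPU Process and Browser process figures easier to see.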
In the end I think we have accomplished our goal: introducing a GPU process (a foundational piece of the Quantum project) without regressing overall product stability. We’ve reduced the overall volume of some categories of graphics-related crashes while making others less prone to taking down the entire browser. One of the primary fears was that in doing so we’d introduce new ways to take down the browser but I’ve not yet found evidence that this has happened.
Of course, the numbers themselves don’t tell the whole story. Now begins a deeper investigation of which crashes (i.e. signatures) have changed significantly so that we can improve GPU Process further, but I think we have an excellent foundation to build from.
And of course, we’ll be watching what happens as we roll out to users in Release with Firefox 53 in the coming days.