
Telemetry Results for Add-on Compatibility Check

Earlier this year (in Firefox 32), we landed a fix for bug 760356, to reduce how often we delay starting up the browser in order to check whether all your add-ons are compatible. We landed the related bug 1010449 in Firefox 31 to gather telemetry about the compatibility check, so that we could do before/after analysis.


When you upgrade to a new version of Firefox, changes to the core browser can break add-ons. For this reason, every add-on comes with metadata that says which versions of Firefox it works with. There are a couple of straightforward cases, and quite a few tricky corners…

  • The add-on is compatible with the new Firefox, and everything works just fine.
  • The add-on is incompatible and must be disabled.
    • But maybe there’s an updated version of the add-on available, so we should upgrade it.
    • Or maybe the add-on was installed in a system directory by a third party (e.g. an antivirus toolbar) and Firefox can’t upgrade it.
  • The add-on says it’s compatible, but it’s not – this could break your browser!
    • The add-on author could discover this in advance and publish updated metadata to mark the add-on incompatible.
    • Mozilla could discover the incompatibility and publish a metadata override to protect our users.
  • The add-on says it’s not compatible, but it actually is.
    • Again, either the add-on author or Mozilla can publish a compatibility override.

We want to keep as many add-ons as possible enabled, because our users love (most of) their add-ons, while protecting users from incompatible add-ons that break Firefox. To do this, we implemented a very conservative check every time you update to a new version. On the first run with a new Firefox version, before we load any add-ons we ask both Mozilla’s add-ons site (AMO) *and* each add-on’s update server whether there is a metadata update available, and whether there is a newer version of the add-on compatible with the new Firefox version. We then enable/disable based on that updated metadata, and offer the user the chance to upgrade those add-ons that have new versions available. Once this is done, we can load up the add-ons and finish starting up the browser.

This check involves multiple network requests, so it can be rather slow. Not surprisingly, our users would rather not have to wait for these checks, so in bug 760356 we implemented a less conservative approach:

  • Keep track of when we last did a background add-on update check, so we know how out of date our metadata is.
  • On the first run of a new Firefox version, only interrupt startup if the metadata is too out of date (two days, in the current implementation) *or* if some add-ons were disabled by this Firefox upgrade but are allowed to be upgraded by the user.
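As a rough sketch of that rule (this is not the actual Add-on Manager code; the function name, its inputs and the exact comparison against the two-day threshold are assumptions made for illustration), the decision boils down to a two-way "or":

```python
# Assumed threshold: the post says metadata older than two days is "too out of date".
MAX_METADATA_AGE_DAYS = 2.0


def should_interrupt_startup(metadata_age_days, newly_disabled_upgradeable):
    """Decide whether the first run of a new version blocks on the compatibility check.

    metadata_age_days: days since the last background add-on update check.
    newly_disabled_upgradeable: number of add-ons disabled by this Firefox upgrade
        that the user is allowed to upgrade (a hypothetical input for this sketch).
    """
    metadata_too_old = metadata_age_days > MAX_METADATA_AGE_DAYS
    return metadata_too_old or newly_disabled_upgradeable > 0
```

A profile with fresh metadata and only third-party, non-upgradeable add-ons disabled would start without interruption under this rule; either stale metadata or a single upgradeable casualty brings the check back.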

Did it work?

Yes! On the Aurora channel, we went from interrupting 92.7% of the time on the 30 -> 31 upgrade (378091 out of 407710 first runs reported to telemetry) to 74.8% of the time (84930 out of 113488) on the 31 -> 32 upgrade, to only interrupting 16.4% (10158 out of 61946) so far on the 32 -> 33 upgrade.
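For anyone who wants to check the arithmetic, here is a small sketch that recomputes those percentages from the raw counts quoted above (the variable and function names are just for illustration):

```python
# (interrupted first runs, total first runs) per Aurora upgrade, as quoted in the post.
upgrades = {
    "30 -> 31": (378091, 407710),
    "31 -> 32": (84930, 113488),
    "32 -> 33": (10158, 61946),
}


def interrupt_rate(interrupted, total):
    """Percentage of first runs that were interrupted by the compatibility check."""
    return 100.0 * interrupted / total


rates = {name: interrupt_rate(*counts) for name, counts in upgrades.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of first runs interrupted")
```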

The change took effect over two release cycles; the new implementation was in 32, so the change from “interrupt if there are *any* add-ons the user could possibly update” to “interrupt if there is a *newly disabled* add-on the user could update” is in effect for the 31 -> 32 upgrade. However, since we didn’t start tracking the metadata update time until 32, the “don’t interrupt if the metadata is fresh” change wasn’t effective until the 32 -> 33 upgrade. I wish I had thought of that at the time; I would have added the code to remember the update time into the telemetry patch that landed in 31.

Cool, what else did we learn?

On Aurora 33, the distribution of metadata age was:

Age (days) Sessions
< 1 37207
1 9656
2 2538
3 997
4 535
5 319
6 – 10 565
11 – 15 163
16 – 20 94
21 – 25 69
26 – 30 82
31 – 35 50
36 – 40 48
41 – 45 53
46 – 50 6

so about 88% of profiles had fresh metadata when they upgraded. The tail is longer than I expected, though it’s not too thick. We could improve this by forcing a metadata ping (or a full add-on background update) when we download a new Firefox version, but we may need to be careful to do it in a way that doesn’t affect usage statistics on the AMO side.

What about add-on upgrades?

We also started gathering detailed information about how many add-ons are enabled or disabled during various parts of the upgrade process. The measures are all shown as histograms in the telemetry dashboard:

  • The number of add-ons (both user-upgradeable and non-upgradeable) disabled during the upgrade because they are not compatible with the new version.
  • The number of user-upgradeable add-ons disabled during the upgrade.
  • The number of add-ons that changed from disabled to enabled because of metadata updates during the compatibility check.
  • The number of add-ons that changed from enabled to disabled because of metadata updates during the compatibility check.
  • The number of add-ons upgraded to a new compatible version during the add-on compatibility check.
  • The number of add-ons that had upgrades available during the compatibility check, but the user chose not to upgrade.
  • The number of add-ons that appeared to have upgrades available, but the attempt to install the upgrade failed.

For these values, we got good telemetry data from the Beta 32 upgrade. The counts represent the number of Firefox sessions that reported that number of affected add-ons (e.g. 3170 Telemetry session reports said that 2 add-ons were XPIDB_DISABLED by the upgrade):

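Add-ons  XPIDB disabled  APPUPDATE disabled  Metadata enabled  Metadata disabled  Upgraded  Declined  Failed
0        2.6M            2.6M                2.6M              2.6M               2.6M      2.6M      2.6M
1        36230           7360                59240             14780              824       121       98
2        3170            1570                2                 703                5         1         0
3        648             35                  0                 43                 1         0         0
4        1070            14                  1                 6                  0         0         0
5        53              20                  0                 0                  0         0         0
6        157             194                 0                 0                  0         0         0
7+       55              9                   0                 1                  0         0         0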

The things I find interesting here are:

  • The difference between XPIDB disabled and APPUPDATE disabled is (roughly) the add-ons installed in system directories by third party installers. This implies that 80%-ish of add-ons made incompatible by the upgrade are 3rd party installs.
  • upgraded + declined + failed is (roughly) the add-ons a user *could* update during the browser upgrade, which works out to fewer than one in 2000 browser upgrades having a useful add-on update available. I suspect this is because most add-on updates have already been performed by our regular background update. In any case, to me this implies that further work on updating add-ons during browser upgrade won’t improve our user experience much.
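Both estimates can be reproduced from the Beta 32 counts. The sketch below is only a back-of-the-envelope check: it treats the "7+" bucket as exactly 7 add-ons, takes "2.6M" as exactly 2,600,000 sessions, and simply adds the upgraded, declined and failed sessions together even though one session could appear in more than one of those histograms. All names are made up for the example.

```python
# Sessions reporting N affected add-ons, copied from the Beta 32 table.
# The "7+" bucket is counted as exactly 7 add-ons (an assumption that undercounts).
xpidb_disabled = {1: 36230, 2: 3170, 3: 648, 4: 1070, 5: 53, 6: 157, 7: 55}
appupdate_disabled = {1: 7360, 2: 1570, 3: 35, 4: 14, 5: 20, 6: 194, 7: 9}
upgraded = {1: 824, 2: 5, 3: 1}
declined = {1: 121, 2: 1}
failed = {1: 98}

# The table only says "2.6M", so this total is approximate.
TOTAL_SESSIONS = 2_600_000


def total_addons(histogram):
    """Total add-ons affected: add-on count times the sessions reporting it."""
    return sum(count * sessions for count, sessions in histogram.items())


def sessions_affected(histogram):
    """Number of sessions that reported at least one affected add-on."""
    return sum(histogram.values())


# Add-ons disabled by the upgrade that were NOT user-upgradeable (third-party installs).
third_party_share = 1 - total_addons(appupdate_disabled) / total_addons(xpidb_disabled)

# Sessions where an add-on update was on offer during the browser upgrade.
sessions_with_update = sum(sessions_affected(h) for h in (upgraded, declined, failed))
update_rate = sessions_with_update / TOTAL_SESSIONS

print(f"third-party share of disabled add-ons: {third_party_share:.0%}")
print(f"roughly one in {round(1 / update_rate)} upgrades had an add-on update available")
```

Weighting by add-on count rather than by session lands in the same ballpark as the 80%-ish figure, and the update rate stays below one in 2000 under these assumptions.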

How many Firefox extensions do people use?

I found myself wondering what a “normal” number of XPI extensions is, and since I was working on some other analysis runs on the Mozilla Telemetry database, I took the time to run a count across one day’s data (May 29 2014). Without much further ado, here’s the median and higher percentiles of the number of active add-ons for Firefox and Fennec (Firefox for Android):

Installed add-ons, all versions/channels
Application Platform Sessions Median 75 % 90 % 95 % 99 % max
Firefox WINNT 10472433 3 5 8 11 19 273
Firefox Darwin 364473 3 6 9 12 21 117
Firefox Linux 22701 4 8 16 23 35 112
Firefox ALL 10859607 3 5 8 11 19 273
Fennec Android 735437 0 0 2 3 6 36

Of course there are lots of caveats… These are only enabled add-ons; there was a bug in my code to collect total installed add-ons, and I didn’t feel like waiting for another multi-hour run to get the total. These are only the add-ons managed by the XPI Provider, which means Extensions, Dictionaries and Themes (but not Lightweight Themes or Plugins). Desktop versions of Firefox all come with a pre-configured Default theme, but Fennec doesn’t. Telemetry counts sessions, not users, so the results are somewhat biased toward users that restart their browser more often, and for many users Telemetry is opt-in, so the results are also biased toward the sort of users who would enable it.

That said, we have a useful estimate of how large our internal data structures will get, which can help us make implementation decisions.

The Telemetry map-reduce I used for this had a couple of bugs that I worked around to get my output; because it was a one-off I didn’t bother fixing them and re-running the analysis. That said, the code, map-reduce output and further summarized data are all available to look at.

A Mixed Bag of Telemetry

There were a few open questions, and follow ups on previous work, that I wanted to look at in Mozilla’s Telemetry database (though the page describing it is out of date).

Add-on Compatibility Check

For Bug 772484, we wanted to know how often browser start up is blocked by the add-on compatibility check. Fortunately there is a clear telemetry probe for that; we set SIMPLE_MEASURES_STARTUPINTERRUPTED to 1 if we display the dialog. Looking at the current (as of last week) Release and Beta versions in the dashboard:

Startup Interrupted (millions of events)
Version Sessions No Yes %
Release 27 318.96 313.07 5.89 1.85
Beta 28 11.66 11.53 0.128 1.10

Now, we only show this UI if (a) the version string of the installed browser changed from the last run in any way (major or minor version change) and (b) the user has add-ons installed somewhere other than the browser’s installation directory; that is, add-ons that weren’t included as part of the software distribution (bundled add-ons are outside the user’s control). So that’s the obvious next question: how often does a browser version change *not* trigger the compatibility dialog? This needed a custom Telemetry analysis. I sampled a few days around when Fx 27 was released, when most users would be upgrading.

Version Changes
Dates Sessions Version changed Interrupted %
Feb 4-8 6653299 959662 662146 69.0
Feb 10-11 8333667 291936 215213 73.7

We’ll want to run this (or a comparable analysis) after Bug 760356 to see if we’ve improved.

SQLITE to JSON conversions

Did Bug 853388 and Bug 853389 make a noticeable difference in browser start up time? For this, I looked at the processed data in the new Telemetry dashboard for both Firefox (the desktop browser) and Fennec (the Android browser). Bug 853389 shipped in Firefox 25, and 853388 shipped in Fx 26. For the desktop browser we normally measure start up time using ‘FIRSTPAINT’, which is the number of milliseconds from when firefox.exe started running to when we start displaying the first web page. The following tables are summarized from the dashboard data for each release:

Firefox Release, Time to First Paint (seconds), by version
Percentile  24     25     26     27
5%          0.692  0.695  0.694  0.693
25%         1.52   1.53   1.52   1.52
median      3.12   3.37   2.98   3.22
75%         7.13   7.46   6.51   7.41
95%         NaN    NaN    NaN    30.01


Firefox Beta, Time to First Paint (seconds), by version
Percentile  24     25     26     27
5%          0.714  0.714  0.711  0.710
25%         1.81   1.79   1.59   1.59
median      3.58   3.47   3.39   3.40
75%         8.00   7.63   7.49   7.49
95%         NaN    30.03  30.01  30.01

For the Android browser (Fennec), we don’t collect FIRSTPAINT but we have an event FENNEC_STARTUP_TIME_GECKOREADY that records when the HTML & JavaScript engine is done initializing.

Fennec Release, Time to Gecko Ready (seconds), by version
Percentile  24     25     26     27
5%          1.72   1.72   2.10   2.10
25%         2.41   2.58   2.75   2.68
median      3.18   3.19   3.90   3.90
75%         4.83   4.85   6.02   6.09
95%         13.32  13.34  16.39  16.41

Unfortunately there isn’t a really clear signal in this data. The higher percentiles on Desktop do improve a little, which makes sense: browser profiles with extensions installed are likely to have a longer start up time to begin with, and to be more affected by the change in data storage. The Fennec start up times get significantly worse in version 26; we’re not sure why yet, but it could be the overhead of OS.File starting up a separate JavaScript worker thread.

Start-up Exceptions

Bug 952543 added telemetry reporting of exceptions in Addon Manager and XPI Provider start up, and Bug 972852 fixed several of the bugs revealed. I re-ran the analysis; the fixes in 972853 worked, but there are still a few issues. I filed and patched Bug 986080 and Bug 986000, filed Bug 985998 and started a discussion about the Preferences API, and filed Bug 986104.

Add-on compatibility checks and new Firefox versions

I’ve been working on Bug 772484 – “extension check dialog is annoying and can effectively hang the Firefox process”, and Vladan suggested I post an outline of the issues and proposed changes.

The underlying problem is that when you start up an updated version of Firefox for the first time, we want to make sure that as many as possible of your add-ons keep working, but that we disable any add-ons that are known to cause problems with the new Firefox. This process could include updating add-ons when the user has an old version of the add-on installed and a newer version would be compatible with the new Firefox.

If we don’t do this check, there are two possible bad outcomes: We start up with an add-on disabled when we could have enabled it, which leaves the user without their desired Firefox configuration, or we start up with an add-on enabled that is not compatible, breaking some or all of the browser. When we switched to ‘compatible by default’ for add-ons, to make the rapid release process easier for add-on authors and users, we were worried that we might need to quickly publish compatibility overrides to protect users from cases where we allowed an add-on to load even though the new browser was not compatible with it. To give us the best chance of not breaking the user’s browser, the current implementation completely blocks the browser start up until after we get updated compatibility information from Mozilla’s add-ons site (AMO).

The compatibility check is triggered if the software version of the browser is different from a saved preference containing the version ID of the last run. If the versions are different, the add-on manager startup goes like:

  1. Check all installed add-ons to see if they are compatible with the new browser version, keeping track of any that are enabled or disabled.
  2. Put up a modal dialog box that blocks Firefox start up, and prevent the user from cancelling out of the dialog.
  3. Send a search request (the “metadata ping”) to AMO with a list of all installed add-on IDs. AMO will return up-to-date add-on metadata for any add-ons it knows, including compatibility overrides for add-ons not hosted on AMO but where Mozilla is aware of compatibility issues.
  4. Update stored add-on compatibility for all add-ons, enabling or disabling any for which the AMO information is different from what we knew before.
  5. At this point we allow the user to exit the dialog box.
  6. For all installed add-ons, request add-on update information directly from the add-on’s update URL, which could be AMO again or could be an external add-on hosting location.
  7. Update the stored compatibility info again based on any new info from step 6.
  8. Display a list of all the add-ons that used to be compatible, but aren’t any more, and ask the user whether we should check whether new versions of these add-ons are available.
  9. If the user chooses to proceed, request the add-on’s update URL again (Bug 960597) to find out whether there is a different version of the add-on that is compatible with the new browser version.
  10. Present a list of add-ons for which a compatible version is available, and allow the user to select which ones to download and update.
  11. Download and apply the selected add-ons.
  12. Continue starting up the browser, including loading enabled add-ons.

Everything up to step 11 happens early enough in the start up process that no add-ons have been loaded yet; this allows us to apply any necessary add-on updates without restarting the browser.
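Two pieces of the sequence above are easy to show in miniature: the version comparison that triggers the check, and the bookkeeping in steps 4 and 7 that folds fresh metadata into the stored compatibility state. This is a simplified sketch, not the real Add-on Manager logic; the function names and the plain dictionaries standing in for the add-on database are assumptions.

```python
def needs_compat_check(last_run_version, current_version):
    """The check runs whenever the saved version differs from the running one."""
    return last_run_version != current_version


def apply_metadata(compat_state, fresh_metadata):
    """Fold fresh compatibility metadata into the stored state (steps 4 and 7).

    compat_state maps add-on ID -> True if currently treated as compatible.
    fresh_metadata maps add-on ID -> compatibility reported by the server.
    Returns (newly_enabled, newly_disabled) lists of add-on IDs.
    """
    newly_enabled, newly_disabled = [], []
    for addon_id, compatible in fresh_metadata.items():
        if addon_id not in compat_state:
            continue  # metadata for an add-on that is not installed
        if compatible and not compat_state[addon_id]:
            newly_enabled.append(addon_id)
        elif not compatible and compat_state[addon_id]:
            newly_disabled.append(addon_id)
        compat_state[addon_id] = compatible
    return newly_enabled, newly_disabled
```

The add-ons that end up in the second list are the ones the user would be asked about in step 8.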

Bug 772484 is mostly concerned with steps 2 through 4, where the modal dialog does not allow the user to cancel. If the browser is on a disconnected network where packets sent to Internet hosts are silently discarded, we may wait for either a DNS or TCP connection timeout (up to 70-75 seconds) before we allow the user to interact with the browser.

The metadata ping to AMO in step 3 has a cancel() API, but attempting to use it uncovered Bug 966374. We could make this search interruptible, in which case the effect would be the same as if the search failed – we would start the browser with out-of-date compatibility information, which could lead to running with an incompatible add-on enabled. If we leave the search running in the background, it is almost certain that the browser will proceed to initialize add-ons based on out of date information before the search completes; in this case the browser would enable/disable restartless add-ons on the fly, and mark those that require restart so that they would be in the right state on next startup. In the mean time it is still possible to be running with incompatible add-ons. If we want to ask the user to restart, we would need new UI.

It was possible to close the dialog during the update requests in step 6, but some of the background work was not being cleaned up. Support was added in Bug 925389 so that we can now cancel those requests if we want. As with step 3 above, if we allow the user to cancel during this step we may start with incompatible add-ons, and if we allow the requests to complete in the background we may need a restart to get into a consistent state.

I have work in progress that allows the user to close the add-on check dialog during the initial AMO search (steps 3 and 4) but lets the search complete and updates the add-on database with new compatibility information when it’s done. If the user cancels during the first group of update URL requests (step 6), any requests that have not completed are cancelled. This would in rare cases lead to starting up with incompatible add-ons (or without compatible ones), but I think it’s a reasonable trade off – particularly since a likely common reason the user wants to cancel is that the requests are taking too long, and cancelling them leads to the same result as letting them time out.

My WIP patch is also adding tests for closing the add-on update dialog at various other steps where it is possible, which has uncovered a couple of other places where we weren’t cleaning up.

In the mean time, Blair has been working on Bug 760356 which reworks the add-on check to avoid blocking startup whenever possible. See the WIP patches on that bug to understand the fine details of the change, but the rough idea is:

  • Keep track of when we last updated add-on compatibility information (typically once per day, when we do the automatic blocklist and add-on background update check)
  • At start up, skip the compatibility dialog if any of the following are true:
    • No add-ons can be updated by the user (e.g. all were pre-installed with the application binaries rather than in the user’s profile)
    • No add-ons were made incompatible and disabled by the browser upgrade
    • The last update of compatibility information was recent enough *and* none of the newly-incompatible add-ons can be updated by the user
  • If we do show the compatibility dialog, it doesn’t try to update all add-ons, only the ones disabled by the browser upgrade.
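Written out as code, the skip rules in that outline might look something like the sketch below. It follows the bullets literally and is not taken from the WIP patches; the `Addon` record, its field names and the `metadata_is_recent` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Addon:
    addon_id: str
    user_can_update: bool      # False for add-ons shipped with the application binaries
    disabled_by_upgrade: bool  # made incompatible and disabled by this browser upgrade


def skip_compat_dialog(addons, metadata_is_recent):
    """Return True if any of the three skip conditions from the outline holds."""
    newly_disabled = [a for a in addons if a.disabled_by_upgrade]
    if not any(a.user_can_update for a in addons):
        return True  # nothing the user could update
    if not newly_disabled:
        return True  # the upgrade did not break anything
    if metadata_is_recent and not any(a.user_can_update for a in newly_disabled):
        return True  # fresh metadata, and the casualties are not user-updatable
    return False


def addons_to_check(addons):
    """If the dialog is shown, only the add-ons disabled by the upgrade are checked."""
    return [a for a in addons if a.disabled_by_upgrade]
```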

I’m open to suggestions about both of these approaches, either here or on the relevant bugs.

Add-on Manager JSON databases landed!

Felipe landed the Addon Repository (Bug 853389) changes on Aug 1, so they rode the uplift train and are currently in Firefox Aurora (25). My changes for the XPI Provider (Bug 853388), aside from the Telemetry measurements, landed on Sunday (Aug 11) and are in the latest nightly (Firefox 26). I still need to make some changes to the Telemetry patch and get that through reviews, but the core of the work is finally done.

There are a few follow-up bugs that need to be prioritized and sorted out.

Many thanks to Felipe for his work on the Addon Repository, and to Blair McBride for reviewing everything.

Add-on Manager progress: Almost done!

Felipe has a full suite of r+ for his work on AddonRepository.jsm in Bug 853389, and I’m in the middle of handling review comments for the XPI Database changes in Bug 853388. I need to update based on the review comments, implement asynchronous loading of the JSON database, and add some telemetry so we can track performance and correctness of the new version. Once that is done, we’ll be ready to land on Nightly.

I’ve implemented a DeferredSave module based on the discussion in my previous blog post; I suspect it’s something that will be useful to others trying to convert to asynchronous saving of data blobs.

Saving browser state asynchronously

As part of switching the Firefox extensions databases from sqlite to a flat file containing JSON, we want to build a module that flushes the in-memory state of the data out to disk after it changes, in a way that doesn’t hang the main thread waiting for the I/O to complete. Felipe did an initial implementation for bug 853389 based on the DeferredTask module, and I’ve been working on sorting out all the edge cases that the XPIDatabase tests trigger. Unfortunately, there are plenty of edge cases. As a result the attempted implementation is getting rather twisted, so I wanted to put up the current state of my design both to clarify it in my own mind, and to solicit input on ways we could simplify. In particular, I’d like to hear from people who have a good feel for Promise based asynchronous code, to see if there’s an idiomatic Promise based implementation that is more straightforward than the current approach with explicit callbacks.

The basic assumptions of the design are:

  • The working copy of the data is a JavaScript tree that is mapped to and from the on-disk structure.
  • All modification of the data is done through the JS structure, so the data only needs to be read in once at start up time.
  • When we decide to write the data, we take a snapshot (via JSON.stringify) and then start the async I/O.
  • There is no locking; the in-memory data can be modified while we’re saving the snapshot.
  • We wait for a short time after the in-memory data is modified before we start writing, so that we can coalesce sets of changes into a single write.
  • Clients indicate that the data needs to be saved by calling saveChanges(), and can optionally be notified (by callback or Promise resolution) when all in-memory changes up to the time saveChanges() was called have been written to disk.
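The snapshot assumption is what makes the lack of locking safe. Here is a tiny illustration in Python, with `json.dumps` standing in for JSON.stringify; the data layout and function name are invented for the example, and the asynchronous write itself is left out.

```python
import json

# Hypothetical in-memory working copy of the add-on data.
data = {"addons": {"ext@example": {"version": "1.0", "enabled": True}}}


def take_snapshot(tree):
    """Serialize the current state; later edits to `tree` cannot change the result."""
    return json.dumps(tree, sort_keys=True)


# The snapshot is taken when the write begins...
snapshot = take_snapshot(data)

# ...and the working copy can be modified while the write is still in flight.
data["addons"]["ext@example"]["enabled"] = False

print(snapshot)          # still describes the add-on as enabled
print(json.dumps(data))  # the working copy has moved on and needs another save
```

Because the serialized string is an independent copy, whatever gets written to disk is internally consistent even if it is already slightly stale by the time the write completes.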

The core states for data saving are:

  1. In memory data is clean and synced to disk
  2. In memory data is dirty, waiting before syncing snapshot to disk
  3. Memory clean, writing of snapshot in progress
  4. Memory dirty, writing of snapshot in progress

With the API that Felipe built for AddonRepository, we support clients providing a callback when they mark the data dirty, and that callback will be invoked after the change in question is written to disk.

In normal operation, the state transitions are:

Clean/Synced –> (modify data) –> Dirty/Waiting
Dirty/Waiting –> (write starts) –> Clean/InProgress
Clean/InProgress –> (write ends) –> Clean/Synced

When operations overlap, we see:

Dirty/Waiting –> (modify data) –> Dirty/Waiting
Clean/InProgress –> (modify data) –> Dirty/InProgress
Dirty/InProgress –> (modify data) –> Dirty/InProgress

Dirty/InProgress –> (write ends) –> Dirty/Waiting

The constraints this meets are:

  • We only have one write going at a time
  • Data could be modified at any time
  • Multiple modifications can be batched into a single write (though the delay could be as short as the next event loop cycle)
  • Beginning to write the data takes a snapshot (JSON.stringify) so it’s OK for the in-memory structure to change while the write is in progress
  • If the data is modified while we’re writing, we need to write again after the in progress write completes
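The transitions listed above fit in a small lookup table. This sketch only encodes the states and events named in this post (the Python names are mine), and rejects anything the lists don’t mention:

```python
from enum import Enum


class State(Enum):
    CLEAN_SYNCED = "Clean/Synced"
    DIRTY_WAITING = "Dirty/Waiting"
    CLEAN_IN_PROGRESS = "Clean/InProgress"
    DIRTY_IN_PROGRESS = "Dirty/InProgress"


MODIFY, WRITE_STARTS, WRITE_ENDS = "modify data", "write starts", "write ends"

# (current state, event) -> next state, as listed in the post.
TRANSITIONS = {
    (State.CLEAN_SYNCED, MODIFY): State.DIRTY_WAITING,
    (State.DIRTY_WAITING, MODIFY): State.DIRTY_WAITING,
    (State.DIRTY_WAITING, WRITE_STARTS): State.CLEAN_IN_PROGRESS,
    (State.CLEAN_IN_PROGRESS, MODIFY): State.DIRTY_IN_PROGRESS,
    (State.CLEAN_IN_PROGRESS, WRITE_ENDS): State.CLEAN_SYNCED,
    (State.DIRTY_IN_PROGRESS, MODIFY): State.DIRTY_IN_PROGRESS,
    (State.DIRTY_IN_PROGRESS, WRITE_ENDS): State.DIRTY_WAITING,
}


def step(state, event):
    """Apply one event, refusing transitions the design does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}") from None


def run(events, state=State.CLEAN_SYNCED):
    """Apply a sequence of events starting from Clean/Synced."""
    for event in events:
        state = step(state, event)
    return state
```

Note that no entry starts a write from an InProgress state, which is how the table enforces the "one write at a time" constraint.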

The presence of the callback makes the control flow a little more complicated; for example, we support the following sequence of events:

saveChanges(callback_1)  (now Dirty/Waiting)
saveChanges(callback_1b) (still Dirty/Waiting)
(possible delay before batched write)
begin writing            (now Clean/InProgress)
saveChanges(callback_2)  (now Dirty/InProgress)
saveChanges(callback_2b) (still Dirty/InProgress)
end writing
callback_1b(result)      (now Dirty/Waiting)
(possible delay before batched write)
begin writing            (now Clean/InProgress)
end writing
callback_2b(result)      (now Clean/Synced)
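The bookkeeping behind that sequence can be modelled without any real I/O. In this toy version the caller drives `begin_write()` and `end_write()` by hand; the class and method names are hypothetical, and it assumes that every callback registered before a write begins is invoked when that write ends (the listing above only shows the last callback of each batch).

```python
class DeferredSaver:
    """Toy model of which saveChanges callbacks belong to which write."""

    def __init__(self):
        self.dirty = False     # in-memory data has changes not yet snapshotted
        self.writing = False   # a write is in progress
        self.waiting = []      # callbacks for changes not yet snapshotted
        self.in_flight = []    # callbacks covered by the write in progress

    @property
    def state(self):
        if self.writing:
            return "Dirty/InProgress" if self.dirty else "Clean/InProgress"
        return "Dirty/Waiting" if self.dirty else "Clean/Synced"

    def save_changes(self, callback=None):
        self.dirty = True
        if callback is not None:
            self.waiting.append(callback)

    def begin_write(self):
        if self.writing:
            raise RuntimeError("only one write may be in progress at a time")
        # The snapshot covers every change so far, so their callbacks move with it.
        self.in_flight, self.waiting = self.waiting, []
        self.dirty = False
        self.writing = True

    def end_write(self, result=True):
        if not self.writing:
            raise RuntimeError("no write in progress")
        self.writing = False
        finished, self.in_flight = self.in_flight, []
        for callback in finished:
            callback(result)
        return self.dirty  # True means another write is needed
```

Callbacks registered while a write is in progress stay in the waiting list, so they only fire after the second write, which matches the ordering in the sequence above.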

Turns out we may be able to drop the callback from saveChanges, because neither AddonRepository nor XPIDatabase need it. It saves a bit of bookkeeping, but I’m not sure it makes a huge difference to the overall complexity.

The big headache comes in flushing the data at shutdown time. We need to skip all the delays and get the data flushed, and because things are happening asynchronously, we need to be careful that objects we need aren’t destroyed before we use them. We also need to support a final callback to signal the completion of the flush. We assume that it is unsupported to modify the in-memory data once a flush starts; whether we throw an error if someone tries is TBD.

The flows we need here are:


Clean/Synced:
immediately call flush_callback(success)

Dirty/Waiting:
cancel batched write delay
begin writing
(async wait)
end writing
call saveChanges callbacks
call flush_callback

Clean/InProgress:
(async wait)
end writing
call saveChanges callbacks
call flush_callback

Dirty/InProgress: It’s easiest to start with the scenario described above for normal operation…

saveChanges(callback_1)  (now Dirty/Waiting)
saveChanges(callback_1b) (still Dirty/Waiting)
(possible delay before batched write)
begin writing            (now Clean/InProgress)
saveChanges(callback_2)  (now Dirty/InProgress)
saveChanges(callback_2b) (still Dirty/InProgress)
*** flush(flush_callback)
capture snapshot of in-memory data
(async wait)
end writing
*** no delay
begin writing saved snapshot
(async wait)
end writing

The important differences in the shutdown time flow are:

  • Capture the in-memory state immediately, rather than waiting for the “begin writing” event – this protects against the state being cleaned up by other shutdown events while we’re waiting for the first async write to complete.
  • No delay between the end of the in-progress write and the start of the second write.
  • We could do the final write synchronously, since it has to be complete before Firefox exits, but the flush API would still need to be async to allow for an async write that was already in progress.

At some point we’d like to wrap all this in a module that other parts of the browser could use. First we want to get the behaviour right…

Add-On Manager Progress

It’s been a month since Felipe and I started coding on the add-on manager conversion, and we’ve made quite a bit of progress. The core code for both modules has been converted to load and save in JSON format; we’re now cleaning up corner cases like upgrading from previous database formats.

By far the biggest headache for me has been debugging test failures in the xpcshell test suite. The test suite for Addon Manager is extremely thorough, which is great because it gives me quite a bit of confidence that the new version will work correctly if we get all the tests to pass. The down side is that almost all of the tests are written as end-to-end scenarios where a number of add-ons are installed, updated, uninstalled, etc. and the (simulated) browser environment is started, stopped, upgraded, etc. When a test fails, there is usually an extended debugging session necessary to find the cause.

Unfortunately, all our spiffy new JavaScript debugging tools don’t work in the xpcshell environment; we really need to make progress on Bug 809561 so that we aren’t stuck with putting dump() statements all over the code to find problems. I did put together one patch for the asynchronous test harness to print function names as asynchronous test cases start and end, to at least make it easier to diagnose hanging async tests; that’s in bug 863311.


See Bug 853388 and Bug 853389 for work-in-progress patches and discussion of some of the other issues that have come up.

Speeding up the Add-On Manager

One of the focuses of the performance team this year is to improve start-up time for the browser. We’ve identified that the Add-on Manager has several issues that can delay start up, and can also cause brief user interface hangs (“jank”) during normal operation. This Bugzilla search shows what we’re tracking right now.

After investigating bug 729330, we decided that the best approach to removing the database I/O bottlenecks in the code was to remove the database. The amount of information we store about addons is relatively small (typically tens of kilobytes, a few hundred KB in the worst case). It is not modified very often, and doesn’t really require the sort of random search and update provided by the SQLITE database we’re currently using. We hope that switching the data storage to JSON, with asynchronous I/O to update the persistent copy in the user’s profile, will improve both overall performance and responsiveness.

We hashed out the approach on an Etherpad document, and now Felipe and I are digging in and starting to change code. The work is being done under Bug 853388 and Bug 853389.

I’ll post progress reports here every week or two, or when anything particularly interesting comes up.

Waking Up

It’s been a long time since I’ve posted anything here. In the mean time I’ve moved jobs to Mozilla, originally working on the Thunderbird email client and more recently on the Firefox performance team, known as “Snappy”. I’ll be trying out using this space to kick around ideas related to the specific portions of Firefox I’m trying to improve.