Let’s review our performance measures for 2013.
– significant regressions in “time to throbber start/stop” and Eideticker startup tests
– most Talos measurements stable, or regressions addressed
– slight, gradual improvements to frame rate and responsiveness
This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Native Fennec (Android 2.2 opt). The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.
This test runs the third-party CanvasMark benchmark suite, which measures the browser’s ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex. Results are a score “based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type”. Higher values are better.
7800 (start of period) – 7700 (end of period).
This test was introduced in September and has been fairly stable ever since. There does, however, seem to be a slight, gradual regression over this period.
Measures “checkerboarding” during a simulation of real user interaction with a page. Lower values are better.
4.4 (start of period) – 2.8 (end of period)
This test saw lots of regressions and improvements over 2013, ending on a stable high note.
Panning performance test. The value is the square of the frame delays (the milliseconds beyond 25 ms) encountered while panning. Lower values are better.
14000 (start of period) – 110000 (end of period)
The nature of this test measurement makes it one of the most variable Talos tests. We overcame the worst of the Sept/Oct regression, but still ended the year worse off than we started.
Performance of the history and bookmarks provider. Reports the time (ms) to perform a group of database operations. Lower values are better.
375 (start of period) – 375 (end of period).
This test has hardly ever reported significant change — is it a useful test?
An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time or, for animations and iterations, the overall duration the sequence or animation took to complete. Lower values are better.
1200 (start of period) – 7200 (end of period).
Introduced in September; the regression of Nov 27 is tracked in bug 944429.
Generic page load test. Lower values are better.
700 (start of period) – 700 (end of period).
This version of tp4m was introduced in September; no significant changes here.
Startup performance test. Lower values are better.
4300 (start of period) – 4300 (end of period)
Introduced in September; there are no significant regressions here, but there is a lot of variability, possibly related to the frequent test failures — see bug 781107.
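As a quick way to compare the Talos figures quoted above, the following sketch computes the start-to-end change for each measurement and labels it according to whether higher or lower values are better. The start and end values are copied from this section; the dictionary keys are informal labels rather than official test names, and the helper functions are hypothetical.

```python
# (start of period, end of period, higher_is_better), as quoted in this section.
MEASUREMENTS = {
    "CanvasMark": (7800, 7700, True),
    "checkerboarding": (4.4, 2.8, False),
    "panning": (14000, 110000, False),
    "provider": (375, 375, False),
    "tsvgx": (1200, 7200, False),
    "tp4m": (700, 700, False),
    "startup": (4300, 4300, False),
}


def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


def verdict(start, end, higher_is_better):
    """Classify the change given which direction counts as better."""
    if end == start:
        return "unchanged"
    improved = (end > start) == higher_is_better
    return "improved" if improved else "regressed"


for name, (start, end, higher_is_better) in MEASUREMENTS.items():
    change = percent_change(start, end)
    print(f"{name}: {change:+.1f}% ({verdict(start, end, higher_is_better)})")
```

This compares only the two endpoints, so it says nothing about the variability or the mid-year regressions and recoveries described for the individual tests.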
Throbber Start / Throbber Stop
These graphs are taken from http://phonedash.mozilla.org. Browser startup performance is measured on real phones (a variety of popular devices).
“Time to throbber start” measures the time from process launch to the start of the throbber animation. Smaller values are better.
There is so much data here that it is hard to see what is happening, but a troubling upward trend over the year is evident.
“Time to throbber stop” measures the time from process launch to the end of the throbber animation. Smaller values are better.
Again, there is a lot of data here. Here’s another graph that hides the data for all but a few devices:
Evidently we have lost a lot of ground over the year, with an increase in “time to throbber stop” of nearly 80% for some devices.
These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures the user-perceived performance of web browsers by capturing video of them in action and then running image analysis on the raw result.
More info at: https://wiki.mozilla.org/Project_Eideticker
Eideticker confirms that startup time has regressed over the year:
Most checkerboarding and frame rate measurements have been steady, or show slight improvement:
Responsiveness — a new measurement added this year — is similarly steady with slight improvement:
See https://www.areweslimyet.com/mobile/ for content and background information.
There is an upward trend here for many of the measurements, but what I find striking is how often we have managed to keep these numbers stable while adding new features.