Firefox for Android Performance Measures – September/October check-up

I skipped my monthly review for September, so here is a review of Firefox for Android performance measurements for September and October. Highlights:

- significant regression in tcheck2

- minor regressions in tp4m, tspaint

- improvements in trobopan, tsvgx

- checkerboard regressions in eideticker

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

12 (start of period) – 17.8 (end of period)

Significant regression Oct 7 – bug 1086642.
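
To put start/end pairs like the one above on a common scale, here is a small sketch that converts them to a relative change. The function name is my own choice, not something taken from Talos or graphs.mozilla.org; for "lower is better" tests a positive result is a regression.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent.

    For "lower is better" tests, a positive value means a regression.
    """
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100.0


# tcheck2 moved from 12 to 17.8 over the period.
tcheck2_change = percent_change(12, 17.8)
print(f"tcheck2: {tcheck2_change:+.1f}%")
```

For the tcheck2 values above this works out to a regression of roughly 48%.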

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 30000 (end of period)

Distinct improvement around Oct 8.
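
The trobopan description is terse, so here is one possible reading of it as code: for every frame that takes longer than 25 ms, square the excess over 25 ms and add it up. This is my assumption about what the description means, not the actual Talos implementation, and the function and parameter names are hypothetical.

```python
def pan_score(frame_intervals_ms, threshold_ms=25.0):
    """Sum of squared excess delays for frames slower than the threshold.

    Assumption: "ms greater than 25 ms" means the part of each frame
    interval that exceeds 25 ms. Frames at or under the threshold add 0.
    """
    return sum(
        (interval - threshold_ms) ** 2
        for interval in frame_intervals_ms
        if interval > threshold_ms
    )


# Three frames: one on time, two late by 5 ms and 20 ms.
print(pan_score([16, 30, 45]))
```

Squaring penalizes a few long stalls much more heavily than many slightly late frames, which would explain why the reported values are so large.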

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6100 (end of period).

Distinct improvement October 4.

tp4m

Generic page load test. Lower values are better.

850 (start of period) – 880 (end of period).

Regression on October 22 – bug 1088669.

ts_paint

Startup performance test. Lower values are better.

3850 (start of period) – 3850 (end of period).

Regression on Oct 28 – bug 1091664.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: time to throbber start]

Note the changes in throbber start (and corresponding improvements in throbber stop, below) around Oct 1 — bug 888482. Changes around Oct 28 are being investigated in bug 1091664.

[Graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Graphs: cnn, cnn2, cnn3, dirty, taskjs]

Running my own AutoPhone

AutoPhone is a brilliant platform for running automated tests on physical mobile devices.

:bc maintains an AutoPhone instance running startup performance tests (aka “S1/S2 tests” or “throbber start/stop tests”) on a small farm of test phones; those tests run against all of our Firefox for Android builds and results are reported to PhoneDash, available for viewing at http://phonedash.mozilla.org/.

I have used phonedash.mozilla.org for a long time now, and reported regressions in bugs and in my monthly “Performance Check-up” posts, but I have never looked under the covers or tried to use AutoPhone myself — until this week.

All things considered, it is surprisingly easy to set up your own AutoPhone instance and run your own tests. You might want to do this to reproduce phonedash.mozilla.org results on your own computer, or to check for regressions on a feature before check-in.

Here’s what I did to run my own AutoPhone instance running S1/S2 tests against mozilla-inbound builds:

Install AutoPhone:

git clone https://github.com/mozilla/autophone

cd autophone

pip install -r requirements.txt

Install PhoneDash, to store and view results:

git clone https://github.com/markrcote/phonedash

Create a phonedash settings file, phonedash/server/settings.cfg with content:

[database]
SQL_TYPE=sqlite
SQL_DB=yourdb
SQL_SERVER=localhost
SQL_USER=
SQL_PASSWD=

Start phonedash:

python server.py <ip address of your computer>

It will log status messages to the console. Watch that for any errors, and to get a better understanding of what’s happening.

Prepare your device:

Connect your Android phone or tablet to your computer by USB. Multiple devices may be connected. Each device must be rooted. Check that you can see your devices with adb devices — and note the serial number(s) (see devices.ini below).
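
If you have several devices attached, a few lines of Python can pull the serial numbers out of the adb devices listing. This is only a sketch and not part of AutoPhone: it assumes the usual output format (a "List of devices attached" header followed by one "serial, whitespace, state" line per device), and the function names are mine.

```python
import subprocess


def parse_adb_devices(output):
    """Return serial numbers of devices reported in the 'device' state."""
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        # Entries marked 'offline' or 'unauthorized' are ignored.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_serials():
    """Run 'adb devices' (adb must be on PATH) and parse its output."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


sample = "List of devices attached\n01498B300600B008\tdevice\n\n"
print(parse_adb_devices(sample))
```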

Configure your device:

cp devices.ini.example devices.ini

Edit devices.ini, changing the serial numbers to your device serial numbers and the device names to something meaningful to you. Here’s my simple devices.ini for one device I called “gbrown”:

[gbrown]
serialno=01498B300600B008
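
With more than a couple of devices, you could generate devices.ini instead of editing it by hand. The sketch below uses Python's standard configparser module; the helper name and the name-to-serial mapping are hypothetical, and only the section/serialno layout comes from the example above.

```python
import configparser
import io


def render_devices_ini(devices):
    """Build devices.ini text from a {device name: serial number} mapping."""
    config = configparser.ConfigParser()
    for name, serial in devices.items():
        config[name] = {"serialno": serial}
    buffer = io.StringIO()
    # Write "key=value" without spaces, as in the hand-written example.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


text = render_devices_ini({"gbrown": "01498B300600B008"})
print(text)
# To save it: open("devices.ini", "w").write(text)
```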

Configure autophone:

cp autophone.ini.example autophone.ini

Edit autophone.ini to make it your own. Most of the defaults are fine; here is mine:

[settings]
#clear_cache = False
#ipaddr = …
#port = 28001
#cachefile = autophone_cache.json
#logfile = autophone.log
loglevel = DEBUG
test_path = tests/manifest.ini
#emailcfg = email.ini
enable_pulse = True
enable_unittests = False
#cache_dir = builds
#override_build_dir = None
repos = mozilla-inbound
#buildtypes = opt
#build_cache_port = 28008
verbose = True

#build_cache_size = 20
#build_cache_expires = 7
#device_ready_retry_wait = 20
#device_ready_retry_attempts = 3
#device_battery_min = 90
#device_battery_max = 95
#phone_retry_limit = 2
#phone_retry_wait = 15
#phone_max_reboots = 3
#phone_ping_interval = 15
#phone_command_queue_timeout = 10
#phone_command_queue_timeout = 1
#phone_crash_window = 30
#phone_crash_limit = 5

python autophone.py -h provides help on options, which are analogues of the autophone.ini settings.

Configure your tests:

Notice that autophone.ini has a test path of tests/manifest.ini. By default, tests/manifest.ini is configured for S1/S2 tests — it points to configs/s1s2_settings.ini. We need to set up that file:

cd configs

cp s1s2_settings.ini.example s1s2_settings.ini

Edit s1s2_settings.ini to make it your own. Here’s mine:

[paths]
#source = files/
#dest = /mnt/sdcard/tests/autophone/s1test/
#profile = /data/local/tmp/profile

[locations]
# test locations can be empty to specify a local
# path on the device or can be a url to specify
# a web server.
local =
remote = http://192.168.0.82:8080/

[tests]
blank = blank.html
twitter = twitter.html

[settings]
iterations = 2
resulturl = http://192.168.0.82:8080/api/s1s2/

[signature]
id =
key =

Be sure to set the resulturl to match your PhoneDash instance.
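
A quick sanity check of s1s2_settings.ini can catch a missing resulturl before a test run. The following is a sketch using configparser, not something AutoPhone provides; the function name and the specific checks are my own. The host comparison only makes sense for a setup like mine, where PhoneDash both serves the test pages and receives the results.

```python
import configparser
from urllib.parse import urlparse

SAMPLE = """\
[locations]
local =
remote = http://192.168.0.82:8080/

[tests]
blank = blank.html
twitter = twitter.html

[settings]
iterations = 2
resulturl = http://192.168.0.82:8080/api/s1s2/
"""


def check_s1s2_settings(text):
    """Return a list of human-readable problems found in the settings text."""
    config = configparser.ConfigParser()
    config.read_string(text)
    problems = []

    resulturl = config.get("settings", "resulturl", fallback="").strip()
    if not resulturl:
        problems.append("resulturl is not set")

    remote = config.get("locations", "remote", fallback="").strip()
    if resulturl and remote:
        # Assumes one PhoneDash host serves pages and collects results.
        if urlparse(resulturl).netloc != urlparse(remote).netloc:
            problems.append("remote and resulturl point at different hosts")

    if not config.has_section("tests") or not config.options("tests"):
        problems.append("no tests listed")

    return problems


print(check_s1s2_settings(SAMPLE))
```

To check a real file, pass in the contents of configs/s1s2_settings.ini instead of SAMPLE.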

If running local tests, copy your test files (like blank.html above) to the files directory. If running remote tests, be sure that your test files are served from the resulturl (if using PhoneDash, copy to the html directory).

Start autophone:

python autophone.py --config autophone.ini

With these settings, autophone will listen for new builds on mozilla-inbound, and start tests on your device(s) for each one. You should start to see your device reboot, then Firefox will be installed and startup tests will run. As more builds complete on mozilla-inbound, more tests will run.

autophone.py will print some diagnostics to the console, but much more detail is available in autophone.log — watch that to see what’s happening.

Check your phonedash instance for results — visit http://<ip address of your computer>:8080. At first this won’t have any data, but as autophone runs tests, you’ll start to see results. Here’s my instance after a few hours:

[Screenshot: my PhoneDash instance]

Firefox for Android Performance Measures – August check-up

My monthly review of Firefox for Android performance measurements. This month’s highlights:

- Eideticker for Android is back!

- small regression in ts_paint

- small improvement in tp4m

- small regression in time to throbber start / stop

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test is not currently run on Android 4.0.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

12 (start of period) – 12 (end of period)

There was a temporary regression in this test for much of the month, but it seems to be resolved now.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 50000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6300 (end of period).

tp4m

Generic page load test. Lower values are better.

940 (start of period) – 850 (end of period).

Improvement noted around August 21.

ts_paint

Startup performance test. Lower values are better.

3650 (start of period) – 3850 (end of period).

Note the slight regression around August 12, and perhaps another around August 27 – bug 1061878.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Note the regression in time to throbber start around August 14 — bug 1056176.

The same regression, less pronounced, is seen in time to throbber stop.

Eideticker

Eideticker for Android is back after a long rest – yahoo!!

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Firefox for Android Performance Measures – July check-up

My monthly review of Firefox for Android performance measurements. This month’s highlights:

- No significant regressions or improvements found!

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test is not currently run on Android 4.0.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

12 (start of period) – 12 (end of period)

The temporary regression of July 24 was caused by bug 1031107; resolved by bug 1044702.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 50000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6300 (end of period).

tp4m

Generic page load test. Lower values are better.

940 (start of period) – 940 (end of period).

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3650 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

The Eideticker dashboard is slowly coming back to life, but there are still not enough results to show graphs here. We’ll check back at the end of August.

New try aliases “xpcshell” and “robocop”

We now have two new try aliases which will be of interest to some mobile developers.

Android 2.3 tests run xpcshell tests in 3 chunks, which can be specified in a try push:

try: … -u xpcshell-1,xpcshell-2,xpcshell-3

but since all other test platforms run xpcshell as a single chunk, it’s easy to forget about Android 2.3’s chunks and push something like:

try: -b o -p all -u xpcshell -t none

…and then wonder why xpcshell tests didn’t run for Android 2.3!

As of today, a new try alias recognizes “xpcshell” to mean “run all the xpcshell test chunks”.

Similarly, a new try alias recognizes “robocop” to mean “run all the robocop test chunks”.

An example: https://tbpl.mozilla.org/?tree=Try&rev=e52bcf945dcd

How convenient!

(Of course, “-u xpcshell-1”, “-u robocop-2, robocop-3”, etc. still work and you should use them if you only need to run specific chunks.)
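
Conceptually, the alias is a simple expansion from a suite name to its chunk names. The sketch below illustrates the idea only; it is not the actual try parser code, the function name is hypothetical, and the robocop chunk count is a placeholder I made up for the example (only the three xpcshell chunks are stated above).

```python
def expand_try_aliases(suites, chunk_counts):
    """Expand bare suite names into their numbered chunks.

    Names without an entry in chunk_counts (including explicit chunks
    such as "xpcshell-1") pass through unchanged.
    """
    expanded = []
    for suite in suites:
        count = chunk_counts.get(suite)
        if count:
            expanded.extend(f"{suite}-{i}" for i in range(1, count + 1))
        else:
            expanded.append(suite)
    return expanded


# xpcshell runs in 3 chunks on Android 2.3; the robocop count is assumed.
chunks = {"xpcshell": 3, "robocop": 4}
print(expand_try_aliases(["xpcshell"], chunks))
print(expand_try_aliases(["robocop-2", "robocop-3"], chunks))
```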

Thanks to :Callek and :RyanVM for making this happen.

Firefox for Android Performance Measures – June check-up

My monthly review of Firefox for Android performance measurements. June highlights:

- Talos values tracked here switch to Android 4.0, rather than Android 2.2

- Talos regressions in tcheck2 and tsvgx

- small regression in time to throbber stop

- Eideticker still not reporting results.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Native Fennec. In all of my previous posts, this section has tracked Talos for Android 2.2 Opt. This month, and going forward, I switch to Android 4.0 Opt, since the Android 2.2 Opt tests are being phased out. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test is not currently run on Android 4.0.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

6 (start of period) – 12 (end of period)

Regression of June 17 – bug 1026742.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 50000 (end of period)

There was a large temporary regression between June 12 and June 14 – bug 1026798.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6100 (start of period) – 6300 (end of period).

Regression of June 16 – bug 1026551.

tp4m

Generic page load test. Lower values are better.

940 (start of period) – 940 (end of period).

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3600 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

“Time to throbber start” looks very flat for all devices, but “Time to throbber stop” has a slight upward trend, especially for nexus-s-2 — bug 1032249.

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Eideticker results are still not available. We’ll check back at the end of July.

Firefox for Android Performance Measures – May check-up

My monthly review of Firefox for Android performance measurements. May highlights:

- slight regressions in tcanvasmark and trobopan

- small regression in time to throbber stop

- Eideticker still not reporting results.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Native Fennec (Android 2.2 opt). The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test runs the third-party CanvasMark benchmark suite, which measures the browser’s ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex. Results are a score “based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type”. Higher values are better.

6300 (start of period) – 5700 (end of period).

Regression of May 12 – bug 1009646.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

9 (start of period) – 9 (end of period)

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

110000 (start of period) – 130000 (end of period)

This regression just happened today and has not triggered a Talos alert yet — I don’t have a bug number yet.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

425 (start of period) – 425 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

7300 (start of period) – 7300 (end of period).

tp4m

Generic page load test. Lower values are better.

750 (start of period) – 750 (end of period).

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3600 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

The improvement on May 2 was due to a change in the test setup (sut vs adb).

The small regression of May 11 is tracked in bug 1018463.

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Eideticker results are still not available. We’ll check back at the end of June.

Firefox for Android Performance Measures – April check-up

My monthly review of Firefox for Android performance measurements. April highlights:

- No Talos regressions, no Throbber Start/Stop regressions.

- tcheck2 improvement.

- Eideticker still not reporting results.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Native Fennec (Android 2.2 opt). The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test runs the third-party CanvasMark benchmark suite, which measures the browser’s ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex. Results are a score “based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type”. Higher values are better.

6300 (start of period) – 6300 (end of period).

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

24 (start of period) – 9 (end of period)

Note significant improvement and noise reduction.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

110000 (start of period) – 110000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

425 (start of period) – 425 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

7300 (start of period) – 7300 (end of period).

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3600 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

No regressions noted this month.

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Eideticker results are still not available. We’ll check back at the end of May.

Android 2.3 Opt tests on tbpl

Today, we started running some Android 2.3 Opt tests on tbpl.

“Android 2.3 Opt” tests run on emulators running Android 2.3. The emulator is simply the Android ARM emulator, taken from the Android SDK (version 18). The emulator runs a special build of Gingerbread (2.3.7), patched and built specifically to support our Android tests. The emulator runs on an AWS EC2 host. Android 2.3 Opt runs one emulator at a time on a host (unlike the Android x86 emulator tests, which run up to 4 emulators concurrently on one ix host).

Android 2.3 Opt tests generally run slower than tests run on devices. We have found that tests run faster on faster hosts; for instance, if we run the emulator on an AWS m3.large instance (more memory, more CPU), mochitests run in about 1/3 of the time that they currently take on m1.medium instances.

Reftests (plain reftests, js reftests, and crashtests) run particularly slowly. In fact, they take so long that we cannot run them to completion with a reasonable number of test chunks. We are investigating further and are also considering the simple solution: running on different hosts.

We have no plans to run Talos tests on Android 2.3 Opt; we think there is limited value in running performance tests on emulators.

Android 2.3 Opt tests are supported on try — “try: -b o -p android …” You can also request that a slave be loaned to you for debugging more intense problems: https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave. In my experience, these methods – try and slave loans – are more effective at reproducing test results than running an emulator locally: The host seems to affect the emulator’s behavior in significant and unpredictable ways.

Once the Android 2.3 Opt tests are running reliably, we hope to stop the corresponding tests on Android 2.2 Opt, reducing the burden on our old and limited population of Tegra boards.

As with any new test platform, we had to disable some tests to get a clean run suitable for tbpl. These are tracked in bug 979921.

There are also a few unresolved issues causing infrequent problems in active tests. These are tracked in bug 967704.

Firefox for Android Performance Measures – March check-up

My monthly review of Firefox for Android performance measurements. March highlights:

- 3 throbber start/stop regressions

- Eideticker not reporting results for the last couple of weeks.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Native Fennec (Android 2.2 opt). The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test runs the third-party CanvasMark benchmark suite, which measures the browser’s ability to render a variety of canvas animations at a smooth framerate as the scenes grow more complex. Results are a score “based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type”. Higher values are better.

7200 (start of period) – 6300 (end of period).

Regression of March 5 – bug 980423 (disable skia-gl).

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

24 (start of period) – 24 (end of period)

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

110000 (start of period) – 110000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

375 (start of period) – 425 (end of period).

Regression of March 29 – bug 990101. (test modified)

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

7600 (start of period) – 7300 (end of period).

tp4m

Generic page load test. Lower values are better.

710 (start of period) – 750 (end of period).

No specific regression identified.

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3600 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

3 regressions were reported this month: bug 980757, bug 982864, bug 986416.

:bc continued his work on noise reduction in March. Changes in the test setup have likely affected the phonedash graphs this month. We’ll check back at the end of April.

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Eideticker results for the last couple of weeks are not available. We’ll check back at the end of April.
