test_awsy_lite

Bug 1233220 added a new Android-only mochitest-chrome test called test_awsy_lite.html. Inspired by https://www.areweslimyet.com/mobile/, test_awsy_lite runs similar code and takes similar measurements to areweslimyet.com, but runs as a simple mochitest and reports results to Perfherder.

There are some interesting trade-offs to this approach to performance testing, compared to running a custom harness like areweslimyet.com or Talos.

+ Writing and adding a mochitest is very simple.

+ It is easy to report to Perfherder (see http://wrla.ch/blog/2015/11/perfherder-onward/).

+ Tests can be run locally to reproduce and debug test failures or irregularities.

+ There’s no special hardware to maintain. This is a big win compared to ad-hoc systems that might fail because someone kicks the phone hanging off the laptop that’s been tucked under their desk, or because of network changes, or failing hardware. areweslimyet.com/mobile was plagued by problems like this and hasn’t produced results in over a year.

? Your new mochitest is automatically run on every push…unless the test job is coalesced or optimized away by SETA.

? Results are tracked in Perfherder. I am a big fan of Perfherder and think it has a solid UI that works for a variety of data (APK sizes, build times, Talos results). I expect Perfherder will accommodate test_awsy_lite data too, but some comparisons may be less convenient to view in Perfherder compared to a custom UI, like areweslimyet.com.

– For Android, mochitests are run only on Android emulators, running on AWS. That may not be representative of performance on real phones — but I’m hoping memory use is similar on emulators.

– Tests cannot run for too long. Some Talos and other performance tests run many iterations or pause for long periods of time, resulting in run-times of 20 minutes or more. Generally, a mochitest should not run for that long and will probably cause some sort of timeout if it does.

For test_awsy_lite.html, I took a few short-cuts, worth noting:

  •  test_awsy_lite only reports “Resident memory” (RSS); other measurements like “Explicit memory” should be easy to add;
  •  test_awsy_lite loads fewer pages than areweslimyet.com/mobile, to keep run-time manageable; it runs in about 10 minutes, using about 6.5 minutes for page loads.

Results are in Perfherder. Add data for “android-2-3-armv7-api9” or “android-4-3-armv7-api15” and you will see various tests named “Resident Memory …”, each corresponding to a traditional areweslimyet.com measurement.
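
Reporting to Perfherder from a mochitest boils down to writing a specially formatted line to the test log, which Perfherder then parses. A minimal sketch of such a line; the framework name, suite name, and value here are invented for illustration:

  PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"name": "Resident Memory Tabs open", "value": 193986560}]}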

[Perfherder screenshot: Resident Memory results]

Firefox for Android Performance Measures – Q4 Check-up

Highlights:

  •  now measuring APK size
  •  tcheck2 (temporarily) retired
  •  tsvgx and tp4m improved – thanks :jchen!

 

APK Size

This quarter we began tracking the size of the Firefox for Android APK, and some of its components. You can see the size of every build on treeherder using Perfherder.

Here’s how the APK size changed over the last 2 months, for mozilla-central Android 4.0 opt builds:

[Graph: APK size, mozilla-central Android 4.0 opt builds]

There are lots of increases and a few decreases here. The most significant decrease (almost half a megabyte) is on Nov 23, from mfinkle’s change for Bug 1223526. The most significant increase (~200K) is on Dec 20, from a Skia update, Bug 1082598.
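
If you want to see where the bytes go in your own build, listing the APK contents by size is a quick start; the path here is illustrative:

  unzip -lv objdir/dist/fennec-45.0a1.en-US.android-arm.apk | sort -n -r | head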

It is worth noting that the sizes of libxul.so over the same period were almost always increasing:

[Graph: libxul.so size over the same period]

Talos

This section tracks Perfherder graphs for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

We intend to retire the remaining Android Talos tests, migrating these tests to autophone in the very near future.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

This test is no longer running. It was noisy and needed to be rewritten for APZ. See discussion in bug 1213032 and bug 1230572.

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

[Graph: tsvgx]

730 (start of period) – 110 (end of period)

A small regression at the end of November corresponded with the introduction of APZ; it was investigated in bug 1229118. An extraordinary improvement on Dec 25 was the result of jchen’s refactoring.

tp4m

Generic page load test. Lower values are better.

[Graph: tp4m]

730 (start of period) – 680 (end of period)

Note the same regression and improvement as seen in tsvgx.

Autophone

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: Throbber Start]

[Graph: Throbber Stop]

Eideticker

Android tests are no longer run on Eideticker.

mozbench

These graphs are taken from the mozbench dashboard at http://ouija.allizom.org/grafana/index.html#/dashboard/file/mozbench.json which includes some comparisons involving Firefox for Android. More info at https://wiki.mozilla.org/Auto-tools/Projects/Mozbench.

[Graph: mozbench comparison]

Sadly, the other mobile benchmarks have no data for most of November and December…I’m not sure why.

Comparing Linux mochitest results across environments

A few weeks ago, I was trying to run Linux Debug mochitests in an unfamiliar environment and that got me to thinking about how well tests run on different computers. How much does the run-time environment – the hardware, the OS, system applications, UI, etc. – affect the reliability of tests?

At that time, Linux 64 Debug plain, non-e10s mochitests on treeherder – M(1) .. M(5) – were running well: Nearly all jobs were green. The most frequent intermittent failure was dom/html/test/test_fullscreen-api-race.html, but even that test failed only about 1 time in 10. I wondered, are those tests as reliable in other environments? Do intermittent failures reproduce with the same frequency on other computers?

Experiment: Borrow a test slave, run tests over VNC

I borrowed an AWS test slave – see https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave – and used VNC to access the slave and run tests. I downloaded builds and test packages from mozilla-central and invoked run_tests.py with the same arguments used for the automated tests shown on treeherder. To save time, I restricted my tests to mochitest-1, but I repeated mochitest-1 10 times. All tests passed all 10 times. Additional runs produced intermittent failures, like test_fullscreen-api-race, with approximately the same frequency reported by Orange Factor for recent builds.

tl;dr Treeherder results, including intermittent failures, for mochitests can be reliably reproduced on borrowed slaves accessed with VNC.
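
For the record, the setup on the slave looked roughly like this; URLs, file names, and arguments are from memory and illustrative rather than exact:

  # fetch a build and its test package from a mozilla-central push
  wget .../firefox-45.0a1.en-US.linux-x86_64.tar.bz2
  wget .../firefox-45.0a1.en-US.linux-x86_64.tests.zip
  tar xjf firefox-45.0a1.en-US.linux-x86_64.tar.bz2 && unzip -q firefox-45.0a1.en-US.linux-x86_64.tests.zip
  # run one chunk of plain mochitests with treeherder-like arguments
  python mochitest/runtests.py --appname=firefox/firefox --utility-path=bin \
    --certificate-path=certs --total-chunks=5 --this-chunk=1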

Experiment: Run tests on my laptop

Next I tried running tests on my laptop, a ThinkPad W540 running Ubuntu 14. I downloaded the same builds and test packages from mozilla-central and invoked run_tests.py with the same arguments used for the automated tests shown on treeherder. This time I noticed different results immediately: several tests in mochitest-1 failed consistently. I investigated and tracked down some failures to environmental causes: essential components like pulseaudio or gstreamer not installed or not configured correctly. Once I corrected those issues, I still had a few permanent test failures (like dom/base/test/test_applet_alternate_content.html, which has no bugs on file) and very frequent intermittents (like dom/base/test/test_bug704320_policyset.html, which is decidedly low-frequency in Orange Factor). I also could not reproduce the most frequent mochitest-1 intermittents I found on Orange Factor and reproduced earlier on the borrowed slave. An intermittent failure like test_fullscreen-api-race, which I could generally reproduce at least once in 10 to 20 runs on a borrowed slave, I could not reproduce at all in over 100 runs on my laptop. (That’s 100 runs of the entire mochitest-1 job. I also tried running specific tests or specific directories of tests up to 1000 times, but I still could not reproduce the most common intermittent failures seen on treeherder.)

tl;dr Intermittent failures seen on treeherder are frequently impossible to reproduce on my laptop; some failures seen on my laptop have never been reported before.
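
On Ubuntu, the environmental fixes amounted to installing and configuring a few packages; something like the following, though exact package names vary by release:

  sudo apt-get install pulseaudio gstreamer0.10-plugins-base gstreamer0.10-plugins-good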

Experiment: Run tests on a Digital Ocean instance

Digital Ocean offers virtual servers in the cloud, similar to AWS EC2. Digital Ocean is of interest because rr can be used on Digital Ocean but not on AWS. I repeated my test runs, again with the same methodology, on a Digital Ocean instance set up earlier this year for Orange Hunter.

My experience on Digital Ocean was very similar to that on my own laptop. Most tests pass, but there are some failures seen on Digital Ocean that are not seen on treeherder and not seen on my laptop, and intermittent failures which occur with some frequency on treeherder could not be reproduced on Digital Ocean.

tl;dr Intermittent failures seen on treeherder are frequently impossible to reproduce on Digital Ocean; some failures seen on Digital Ocean have never been reported before; failures on Digital Ocean are also different (or of different frequency) from those seen on my laptop.

 

I found it relatively easy to run Linux Debug mochitests in various environments in a manner similar to the test jobs we see on treeherder. Test results were similar to treeherder, in that most tests passed. That’s all good, and expected.

However, test results often differed in small but significant ways across environments, and I could not reproduce the most frequent intermittent failures seen on treeherder and tracked in Orange Factor. This is rather discouraging and the cause of the concern mentioned in my last post: While rr appears to be an excellent tool for recording and replaying intermittent test failures and seems to have minimal impact on the chances of reproducing an intermittent failure, rr cannot be run on the AWS instances used to run Firefox tests in continuous integration, and it seems difficult to reproduce many intermittent test failures in different environments. (I don’t have a good sense of why this is: timing differences, hardware, OS, system configuration?)

If rr could be run on AWS, all would be grand: We could record test runs on AWS with excellent chances of reproducing and recording intermittent test failures and could make those recordings available to developers interested in debugging the failures. But I don’t think that’s possible.

We had hoped that we could run tests in another environment (Digital Ocean) and observe the same failures seen on AWS and reported in treeherder, but that doesn’t seem to be the case.

Another possibility is bug 1226676: We hope to start running Linux tests in a docker container soon. Once that’s working, if rr can be run in the container, perhaps intermittent failures will behave the same way and can be reproduced and recorded.

Recording and replaying mochitests with rr and mach

rr is a lightweight debugging tool that allows program execution to be recorded and subsequently replayed and debugged. gdb-based debugging of recordings is enhanced by reverse execution.

rr can be used to record and replay Firefox and Firefox tests on Linux. See https://github.com/mozilla/rr/wiki/Recording-Firefox. If you have rr installed and have a Linux Debug build of Firefox handy, recording a mochitest is as simple as:

  ./mach mochitest --debugger=rr ...

For example, to record a single mochitest:

  ./mach mochitest testing/mochitest/tests/Harness_sanity/test_sanitySimpletest.html \
    --keep-open=false --debugger=rr

Even better, use --run-until-failure to repeat the mochitest until an intermittent failure occurs:

  ./mach mochitest testing/mochitest/tests/Harness_sanity/test_sanitySimpletest.html \
    --keep-open=false --run-until-failure --debugger=rr

To replay and debug the most recent recording:

  rr replay
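
Replay runs under gdb, so the usual commands work, along with rr’s reverse-execution commands; a minimal sketch:

  (rr) continue            # run forward to the failure (or a breakpoint)
  (rr) reverse-continue    # then execute backwards from that point
  (rr) reverse-step        # or step backwards one source line at a time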

Similar techniques can be applied to reftests, xpcshell tests, etc.

For a fun and simple experiment, you can update a test to fail randomly, maybe based on Math.random(). Run the test in a loop or with --run-until-failure to reproduce your failure, then replay: Your “random” failure should occur at exactly the same point in execution on replay.
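
End to end, that experiment might look like this; the test path is hypothetical, and the injected failure is a one-off local edit:

  # hypothetical edit inside the test: ok(Math.random() > 0.1, "injected random failure");
  ./mach mochitest path/to/test_foo.html --keep-open=false --run-until-failure --debugger=rr
  rr replay    # the "random" failure replays at exactly the same point every time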

In recent weeks, I have run many mochitests on my laptop in rr, hoping to improve my understanding of how well rr can record and replay intermittent test failures.

rr has some, but only a little, effect on test run-time. I can normally run mochitest-1 via mach on my laptop in about 17 minutes; with rr, that increases to about 22 minutes (130% of normal). That’s consistent with :roc’s observations at http://robert.ocallahan.org/2015/11/even-more-rr-replay-performance.html.

I observed no difference in test results, when running on my laptop: the same tests passed and failed with or without rr, and intermittent failures occurred with approximately the same frequency with or without rr. (This may not be universal; others have noted differences: https://mail.mozilla.org/pipermail/rr-dev/2015-December/000310.html.)

So my experience with rr has been very encouraging: If I can reproduce an intermittent test failure on my laptop, I can record it with rr, then debug it at my leisure and benefit from rr “extras” like reverse execution. This seems great!

I still have a concern about the practical application of rr to recording intermittent failures reported on treeherder…I’ll try to write a follow-up post on that soon.

Running and debugging Firefox for Android with mach

Recent updates to mach provide support for running and debugging Firefox for Android.

When run from a Firefox for Android context, ‘mach run’ starts Firefox on a connected Android device. As with other Android mach commands, if no device is found, mach offers to start an emulator, and if Firefox is not installed, mach offers to install it.

gbrown@mozpad:~/src$ ./mach run
No Android devices connected. Start an emulator? (Y/n) y 
Starting emulator running Android 4.3...
It looks like Firefox is not installed on this device.
Install Firefox? (Y/n) y
Installing Firefox. This may take a while...
 1:22.97 /usr/bin/make -C . -j8 -s -w install
 1:32.04 make: Entering directory `/home/gbrown/objdirs/droid'
 1:47.48 2729 KB/s (42924584 bytes in 15.358s)
 1:48.22     pkg: /data/local/tmp/fennec-45.0a1.en-US.android-arm.apk
 2:05.97 Success
 2:06.34 make: Leaving directory `/home/gbrown/objdirs/droid'
Starting: Intent { act=android.activity.MAIN cmp=org.mozilla.fennec_gbrown/.App }

Parameters can be passed to Firefox on the command line. For example, ‘mach run --guest’ starts Firefox in guest mode.

mach also supports gdb-based debugging with JimDB, :jchen’s celebrated fork of gdb for Firefox for Android. ‘mach run --debug’ starts JimDB. If necessary, mach will even fetch, install, and configure JimDB for you.

  $ ./mach run --debug
  JimDB (arm) not found: /home/gbrown/.mozbuild/android-device/jimdb-arm does not exist
  Download and setup JimDB (arm)? (Y/n) y
  Installing JimDB (linux64/arm). This may take a while...
  From https://github.com/darchons/android-gdbutils
   * [new branch]      master     -> origin/master
   * [new tag]         gdbutils-2 -> gdbutils-2
   * [new tag]         initial-release -> initial-release
   1:45.57 /home/gbrown/.mozbuild/android-device/jimdb-arm/bin/gdb -q --args 

  Fennec GDB utilities
    (see utils/gdbinit and utils/gdbinit.local on how to configure settings)
  1. Debug Fennec (default)
  2. Debug Fennec with env vars and args
  3. Debug using jdb
  4. Debug content Mochitest
  5. Debug compiled-code unit test
  6. Debug Fennec with pid
  Enter option from above: 1

  New ADB device is "emulator-5554"
  Using device emulator-5554
  Using object directory: /home/gbrown/objdirs/droid
  Set sysroot to "/home/gbrown/.mozbuild/android-device/jimdb-arm/lib/emulator-5554".
  Updated solib-search-path.
  Ignoring BHM signal.
  Using package org.mozilla.fennec_gbrown.
  Launching org.mozilla.fennec_gbrown... Done
  Attaching to pid 674... Done
  Setting up remote debugging... Done

  Ready. Use "continue" to resume execution.
  : No such file or directory.
  (gdb)

See https://wiki.mozilla.org/Mobile/Fennec/Android/GDB for more info on JimDB.

More enhancements to mach test commands for Android

As I wrote in my last post, using mach to test Firefox for Android in an emulator simplifies the testing process and removes the need to connect a physical phone or tablet. Similarly, mach now looks out for and offers to “fix” some other common Android-specific complications.

The first complication is Firefox itself. “Browser tests” like mochitests and reftests run inside Firefox. On Android, that means that Firefox for Android must be installed on your device. When using a phone or tablet, you can connect it by USB, and use “mach install” to install Firefox. But you might forget — I know I forget all the time and then wonder, why didn’t my tests run?! Also, if you are running an emulator automatically from a mach test command, you may not have a chance to install Firefox. So now mach test commands that require Firefox for Android check to see if it is installed; if it isn’t, mach prompts you to install Firefox from your local build.
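
If you would rather check and fix the installed state by hand than wait for the prompt, a quick sketch (the package name varies with your build):

  adb shell pm list packages | grep mozilla    # is Firefox installed?
  ./mach install                               # if not, install your local build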

Another complication is the “host utilities” required for most test types on Android. Many tests make requests from Firefox (running on the Android device) back to a web server running on the local computer – the test “host”. The test harnesses automatically start that web server for you, but they need to run executables like xpcshell and ssltunnel to do so. These host utilities must run on your computer (the host driving the tests via mach and the test harnesses) rather than on Android. Your Android build probably has xpcshell and ssltunnel, but they are Android executables and will not run on the Linux or OS X that’s probably running on your host. You can set the MOZ_HOST_BIN environment variable to point to utilities suitable for your host (a desktop Firefox build will do), but if you neglect to set MOZ_HOST_BIN, mach will notice and prompt you to set up ready-made utilities that can be downloaded (for Linux or OS X only).
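
Setting MOZ_HOST_BIN by hand is straightforward; a sketch, with an illustrative desktop objdir path:

  export MOZ_HOST_BIN=~/objdirs/desktop/dist/bin
  ls $MOZ_HOST_BIN/xpcshell $MOZ_HOST_BIN/ssltunnel    # both should exist, built for the host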

Putting it all together, if nothing is set up and all these components are needed, you might see something like this:

gbrown@mozpad:~/src$ ./mach robocop testLoad
No Android devices connected. Start an emulator? (Y/n) y
Fetching AVD. This may take a while...
Starting emulator running Android 4.3...
It looks like Firefox is not installed on this device.
Install Firefox? (Y/n) y
Installing Firefox. This may take a while...
Host utilities not found: environment variable MOZ_HOST_BIN is not set to a directory containing host xpcshell
Download and setup your host utilities? (Y/n) y
Installing host utilities. This may take a while...

…and then your tests will run!

Some people are concerned about all this prompting; they suggest just going ahead and doing the necessary steps rather than waiting for these Y/n questions to be answered. I see the appeal, but there are consequences. For example, you may have simply forgotten to connect your physical device and have no desire to download an AVD and run an emulator. Overall, I think it is best to prompt and it is easy to avoid most prompts if you wish:

mach android-emulator && mach install && mach <your test>

Happy testing!

Running Firefox for Android tests in the emulator

Recent enhancements to mach test commands make it easier than ever to run tests of all sorts – mochitests, reftests, robocop, etc – against Firefox for Android.

One of the barriers to running tests on an Android device is the device itself. You can run tests on an Android phone or tablet as long as it meets certain requirements:

  • Firefox is supported on your Android version
  • the device is connected via USB
  • the device is visible to “adb devices”
  • the adb shell has sufficient privileges for the test being run: some test types require root permissions to copy files to certain locations or for other privileged operations
  • for some test types (like mochitests), the device must be able to connect back to a web server running on your computer, so your phone typically needs a wifi connection and must be on the same network as your computer.

In my experience, most Android devices work just fine: Connect a phone or tablet by USB, check that adb can see it, check that the phone is connected to wifi, and all is well. But there are several places where things can go wrong. And of course, not everyone has an Android phone available for testing. Running tests in an Android emulator resolves several of these concerns.
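
The shell equivalent of that quick check, before kicking off a test run:

  adb devices     # the device should be listed, and not as "unauthorized"
  adb shell id    # shows the shell's uid; some test types need root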

Now when you run a mach test command from a Firefox for Android build environment and there is no adb-visible device connected, mach will offer to start an emulator for you:

gbrown@mozpad:~/src$ ./mach robocop testLoad
No Android devices connected. Start an emulator? (Y/n) y
Fetching AVD. This may take a while...
Starting emulator running Android 4.3...

mach will search for the emulator first in the build config location, then in the Android SDK (as found by the ANDROID_SDK_ROOT environment variable), and finally in the “mach bootstrap” cache (under .mozbuild).

Once the emulator is found, mach will then download an Android image (an “AVD”, an Android Virtual Device, including kernel and system images) from Mozilla tooltool servers. The image is selected from those we use for tests on treeherder:

  • mach downloads the Android 4.2 x86 image if your build configuration is for x86 (based on TARGET_CPU); otherwise, it selects one of the arm images:
    • mach downloads the Android 2.3 arm image if your configuration supports Android SDK 9 (MOZ_ANDROID_MIN_SDK_VERSION);
    • mach downloads the Android 4.3 arm image for all other cases.

The AVD download can take a few minutes — those are big files. Don’t worry, they are cached (in your .mozbuild directory) so that subsequent runs are much faster. Once the AVD is downloaded, the emulator is launched and as soon as Android has booted, mach will start the requested tests.

You can also start the emulator directly with “mach android-emulator” and even select an image version; for example:

mach android-emulator --version 2.3

See “mach help android-emulator” for more details.

Once an emulator is started, all mach test commands will recognize that device and use it automatically.

Happy testing!

More Android mach test commands

Many of your favourite desktop mach test commands now work with Firefox for Android.

These commands now explicitly support Firefox for Android, running tests against a connected Android device or emulator when run from a Firefox for Android build environment:

 mach robocop
 mach mochitest
 mach reftest
 mach crashtest
 mach jstestbrowser
 mach xpcshell-test
 mach cppunittest

Note that most commands allow running the entire suite (rarely a good idea!), a directory or manifest, or a single test.
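
For example, with mochitests (the paths are illustrative):

  ./mach mochitest                                    # the entire suite (rarely a good idea!)
  ./mach mochitest dom/base/test                      # a directory
  ./mach mochitest dom/base/test/test_example.html    # a single test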

As usual, detailed information on testing Firefox for Android can be found at https://wiki.mozilla.org/Mobile/Fennec/Android/Testing.

Firefox for Android Performance Measures – Q3 Check-up

I am a little late with the Q3 check-up – sorry!

This review of Android performance measurements covers the period July 1 – September 30: the third quarter of 2015.

Highlights:

  •  tcheck2 noisy – bug 1213032
  •  most test results are fairly steady over the quarter

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

We intend to retire the remaining Android Talos tests, migrating these tests to autophone in the near future.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[Graph: tcheck2]

20 (start of period) – 4 (end of period)

There is significant noise in recent results – bug 1213032 – but overall there is much less checkerboarding.

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

[Graph: tsvgx]

680 (start of period) – 730 (end of period)

Minor regression but fairly consistent results over the period.

tp4m

Generic page load test. Lower values are better.

[Graph: tp4m]

670 (start of period) – 670 (end of period)

Very steady results over the period.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: Throbber Start]

Slight improvement in time to throbber start.

[Graph: Throbber Stop]

Slight regression in time to throbber stop, especially for nexus-s-4.

Eideticker

Android tests are no longer run on Eideticker.

mozbench

These graphs are taken from the mozbench dashboard at http://ouija.allizom.org/grafana/index.html#/dashboard/file/mozbench.json which includes some comparisons involving Firefox for Android. More info at https://wiki.mozilla.org/Auto-tools/Projects/Mozbench.

[Graphs: mozbench comparisons]

mochitest-chrome tests for Android

Support for mochitest-chrome tests on Android was recently improved and mochitest-chrome tests can now be seen on treeherder (the M(c) job) for all Firefox for Android builds. Most mochitest-chrome tests are desktop-specific or cross-platform; most Firefox for Android tests have been written for Robocop, our Robotium-based Android UI test framework.

I noticed that some Robocop tests were implemented almost entirely in JavaScript and could easily be converted to mochitest-chrome, where tests would run much more efficiently. Bug 1184186 converted about 20 such Robocop tests to mochitest-chrome, reducing our Robocop load by about 30 minutes while increasing mochitest-chrome load by only about 3 minutes. (We save time by not starting and stopping the browser between mochitests, and not waiting around for state changes, as frequently required in UI tests.)

The “new” mochitest-chrome tests are all located in mobile/android/tests/browser/chrome and can also be run locally from mach. Just make sure you have:

  • a Firefox for Android build
  • an adb-connected device or emulator
  • MOZ_HOST_BIN set to a location for desktop xpcshell
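
With those pieces in place, a local run looks something like this; the MOZ_HOST_BIN path is illustrative, and the exact flavor argument may vary with your tree:

  export MOZ_HOST_BIN=~/objdirs/desktop/dist/bin
  ./mach mochitest -f chrome mobile/android/tests/browser/chrome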

Here is a screen recording showing the new tests in action: http://people.mozilla.org/~gbrown/mochichrome-screencast.mp4

Want to write your own mochitest-chrome test for Firefox for Android? Be sure to see https://developer.mozilla.org/en/docs/Mochitest#Writing_tests for general advice. If you are looking for an example, http://hg.mozilla.org/mozilla-central/file/3e51753a099f/mobile/android/tests/browser/chrome/test_app_constants.html is one of the simplest tests — a great place to start.

As always, be sure to see https://wiki.mozilla.org/Mobile/Fennec/Android/Testing for detailed information about running tests for Firefox for Android, and ping me on IRC if you run into any trouble.
