mochitest-chrome tests for Android

Support for mochitest-chrome tests on Android was recently improved, and mochitest-chrome tests can now be seen on treeherder (the M(c) job) for all Firefox for Android builds. Most mochitest-chrome tests are desktop-specific or cross-platform; most Firefox for Android tests have been written for Robocop, our Robotium-based Android UI test framework.

I noticed that some Robocop tests were implemented almost entirely in JavaScript and could easily be converted to mochitest-chrome, where they run much more efficiently. Bug 1184186 converted about 20 such Robocop tests to mochitest-chrome, reducing our Robocop load by about 30 minutes while increasing mochitest-chrome load by only about 3 minutes. (We save time by not starting and stopping the browser between mochitests, and by not waiting around for state changes, as is frequently required in UI tests.)

The “new” mochitest-chrome tests are all located in mobile/android/tests/browser/chrome and can also be run locally from mach (see the example after this list). Just make sure you have:

  • a Firefox for Android build
  • an adb-connected device or emulator
  • MOZ_HOST_BIN set to a directory containing the desktop xpcshell binary
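
For example, a local run might look like this (a sketch; the MOZ_HOST_BIN path is illustrative):

export MOZ_HOST_BIN=/path/to/desktop-objdir/dist/bin   # desktop xpcshell lives here
adb devices                                            # confirm a device or emulator is connected
./mach mochitest-chrome mobile/android/tests/browser/chrome/test_app_constants.html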

Here is a screen recording showing the new tests in action: http://people.mozilla.org/~gbrown/mochichrome-screencast.mp4

Want to write your own mochitest-chrome test for Firefox for Android? Be sure to see https://developer.mozilla.org/en/docs/Mochitest#Writing_tests for general advice. If you are looking for an example, http://hg.mozilla.org/mozilla-central/file/3e51753a099f/mobile/android/tests/browser/chrome/test_app_constants.html is one of the simplest tests — a great place to start.

As always, be sure to see https://wiki.mozilla.org/Mobile/Fennec/Android/Testing for detailed information about running tests for Firefox for Android, and ping me on IRC if you run into any trouble.

Firefox for Android Performance Measures – Q2 Check-up

This review of Android performance measurements covers the period April 1 – June 30: the second quarter of 2015. I will write these summary posts on a quarterly basis from now on.

Highlights:

– Most tests were fairly steady over the quarter.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[graph: tcheck2]

19 (start of period) – 20 (end of period)

Small regression on June 10 (no bug?). This test is exhibiting some inconsistent behavior, as noted in bug 1149567.

tsvgx

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations, the overall duration the sequence/animation took to complete. Lower values are better.

[graph: tsvgx]

720 (start of period) – 680 (end of period).

Small improvement on May 27.

tp4m

Generic page load test. Lower values are better.

[graph: tp4m]

680 (start of period) – 670 (end of period).

Small improvement on May 18.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Unfortunately, I could not find any devices for mozilla-central which reported consistently throughout this period.

[graphs: throbber start, throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Eideticker graphs]

Most other tests have incomplete data for this time period.

mozbench

These graphs are taken from the mozbench dashboard at http://ouija.allizom.org/grafana/index.html#/dashboard/file/mozbench.json which includes some comparisons involving Firefox for Android. More info at https://wiki.mozilla.org/Auto-tools/Projects/Mozbench.

[mozbench graphs: massive, webaudio, kraken, chalkboard]

Handling intermittent test timeouts in long-running tests

Tests running on our new-ish Android 4.3 Opt emulator platform have recently been plagued by intermittent timeouts, and I have been taking a closer look at some of them (like bug 919246 and bug 1154505).

A few of these tests normally run “quickly”. Think of a test that runs to completion in under 10 seconds most of the time but times out after 300+ seconds intermittently. In a case like this, it seems likely that there is an intermittent hang and the test needs debugging to determine the underlying cause.

But most of the recent Android 4.3 Opt test timeouts seem to affect what I classify as “long-running” tests. Think of a test that normally runs to completion in 250 to 299 seconds, but intermittently times out after 300 seconds. It seems likely that normal variations in test duration are intermittently pushing past the timeout threshold; if we can tolerate a longer timeout, or make the test run faster in general, we can probably eliminate the intermittent test failure.

We have a lot of options for dealing with long-running tests that sometimes time out.

Option: Simplify or optimize the test

Long-running tests are usually doing a lot of work. A lot of assertions can be run in 300 seconds, even on a slow platform! Do we need to test all of those cases, or could some be eliminated? Is there some setup or tear down code being run repeatedly that could be run just once, or even just less often?

We usually don’t worry about optimizing tests, but sometimes a little effort can help a test run a lot more efficiently, saving test time, money (think AWS costs), and aggravation like intermittent timeouts.

Option: Split the test into 2 or more smaller tests

Some tests can be split into 2 or more smaller tests with minimal effort. Instead of testing 100 different cases in one test, we may be able to test 50 in each. There may be some loss of efficiency: maybe some setup code will need to be run twice, copied and pasted into the second test. But now each half runs faster, reducing the chance of a timeout. And when one test fails, the cause is – at least slightly – more isolated.

Option: Request a longer timeout for the test

Mochitests can call SimpleTest.requestLongerTimeout(2) to double the length of the timeout applied to the test. We currently have about 100 mochitests that use this feature.

For xpcshell tests, the same thing can be accomplished with a manifest annotation:

[your-test]
requesttimeoutfactor = 2

That’s a really simple “fix” and an effective way of declaring that a test is known to be long-running.

On the other hand, it avoids the problem and potentially covers up an issue that could be solved more effectively by splitting, optimizing, or simplifying. Also, long-running tests make our test job “chunking” less effective: it’s harder to split load evenly amongst jobs when some tests run 100 times longer than others.

Option: Skip the test on slow platforms

Sometimes it’s not worth the effort. Do we really need to run this test on Android as well as on all the desktop platforms? Do we get value from running this test on both Android 2.3 and Android 4.3? We may “disable our way to victory” too often, but this is a simple strategy, doesn’t affect other platforms and sometimes it feels like the right thing to do.
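
For manifest-based suites, skipping is usually a one-line annotation. A sketch (the test name is illustrative):

[test_example.html]
skip-if = toolkit == 'android' # too slow on the Android emulator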

Option: Run on faster hardware

This usually isn’t practical, but in special circumstances it seems like the best way forward.

If you have a lot of timeouts from long-running tests on one platform and those tests don’t timeout on other platforms, it may be time to take a closer look at the platform.

Our Android arm emulator test platforms are infamous for slowness. In fairness, the emulator has a lot of work to do, Firefox is complex, our tests are often relentless (compared to human-driven browsing), and we normally run the emulator on the remarkably slow (and cheap!) m1.medium AWS instances.

If we are willing to pay for better cpu, memory, and I/O capabilities, we can easily speed up the emulator by running on a faster AWS instance type — but the cost must be justified.

I recently tried running Android 4.3 Debug mochitests on m1.medium and found that many tests timed out. Also, since all tests were taking longer, each test job (each “chunk”) needed 2 to 3 hours to complete — much longer than we can wait. Increasing the number of chunks seemed impractical (we would need 50 or so), and we would still have all those individual timeouts to deal with. In this case, running the emulator on c3.xlarge instances for Debug mochitests made a big difference, allowing them to run in the same number of chunks as Opt on m1.medium and eliminating nearly all timeouts.

I’ve enjoyed investigating mochitest timeouts and found most of them to be easy to resolve. I’ll try to investigate more timeouts as I see them. Won’t you join me?

Android 4.3 Opt tests running on trunk trees

Beginning today, “Android 4.3 API11+ opt” unit tests are running on treeherder on all trunk trees. These tests run against our standard Firefox for Android API11+ ARM builds, executed in an Android ARM emulator running Android 4.3. Just like the existing Android 2.3 tests, the 4.3 emulator runs on an AWS instance.

The emulator environment has these characteristics:

  • Android 4.3 (AOSP 4.3.1_r1, JLS36I); standard 2.6.29 kernel
  • 1 GB of memory (not quite fully utilized because the kernel does not support CONFIG_HIGHMEM)
  • 720×1280, 320 dpi screen
  • 128 MB VM heap
  • 600 MB /data and 600 MB /sdcard partitions
  • front and back emulated cameras; all emulated sensors
  • standard crashreporter, logcat, ANR, and tombstone support

This Android 4.3 emulator environment is very much like the existing “Android 2.3 API9 opt” environment. Broadly, tests seem to run in about the same amount of time on 4.3 as on 2.3, and we see some of the same failures on 4.3 as on 2.3. One significant difference between the 4.3 and 2.3 environments is the “device manager” used to communicate between the test harnesses and the device. On Android 2.3, sutagent is installed on the device and a custom TCP protocol is used to push/pull files, start processes, etc.; on Android 4.3, sutagent is not used at all (it doesn’t play well with SELinux security) and adb is used instead.
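
As an illustration, the operations the harnesses need map directly onto plain adb commands (the paths and package name here are just examples):

adb push test_profile /mnt/sdcard/tests/profile   # push files to the device
adb shell am start -n org.mozilla.fennec/.App     # start the browser
adb pull /mnt/sdcard/tests/mochitest.log .        # pull results back
adb shell ps                                      # inspect running processes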

Android 4.3 API11+ opt tests are available on try and run as a consequence of:

try: -b o -p android-api-11 -u …

As Android 4.3 API11+ opt tests have been introduced, the corresponding “Android 4.0 API11+ opt” test jobs have been disabled. Android 4.0 tests were running on our aging Pandaboards; running in the emulator on AWS is more scalable, cost-effective, and future-proof.

Android 4.0 API11+ debug tests continue to run, but we plan to migrate those to the 4.3 emulator soon.

A few Android 4.0 API11+ opt Talos tests continue to run. We are evaluating whether those can be replaced by similar Autophone tests.

As with the introduction of any new test platform, some tests failed on Android 4.3 and had to be disabled or marked as failing on 4.3. Corresponding bugs have been opened for these tests and you can find them by looking for bugs with “[test disabled on android 4.3]” on the whiteboard:

https://bugzilla.mozilla.org/buglist.cgi?list_id=12205056&resolution=---&resolution=FIXED&resolution=INVALID&resolution=WONTFIX&resolution=DUPLICATE&resolution=WORKSFORME&resolution=INCOMPLETE&resolution=SUPPORT&resolution=EXPIRED&resolution=MOVED&status_whiteboard_type=allwordssubstr&query_format=advanced&status_whiteboard=disabled%20on%204.3&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED

Great thanks to everyone who has contributed to the 4.3 effort, but especially :kmoir for all the Release Engineering work to get everything running smoothly in continuous integration.

Looking for more details about the new environment? Have a look at:

Bug 1062365 Investigate Android 4.3/4.4 emulator test setup

Bug 1133833 Android 4.3 emulator tests

…or ask me!

mach support for mochitest-plain and mochitest-chrome on Android

We recently added mach commands for running mochitest-plain and mochitest-chrome tests on Android. For now, these commands only support the adb device manager (there is no way to run with sutagent), and there is only minimal support for the mochitest options available on desktop; see/comment on bug 1152944.

The old make targets continue to work, but should be considered deprecated.

See https://wiki.mozilla.org/Mobile/Fennec/Android#mochitest-plain / https://wiki.mozilla.org/Mobile/Fennec/Android#mochitest-chrome for detailed instructions.
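
For instance, where you previously used the make targets, the mach equivalents look roughly like this (test paths are illustrative):

./mach mochitest-plain dom/base/test/test_example.html
./mach mochitest-chrome mobile/android/tests/browser/chrome/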

Firefox for Android Performance Measures – February/March Check-up

I skipped my regular February post and I am thinking of writing up these performance summaries less frequently — maybe every 2 or 3 months. Any objections?

Highlights:

– Talos trobopan, tprovider, and ts tests have been retired.

– Big improvement in tsvgx.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

18.7 (start of period) – 19.0 (end of period)

trobopan

This test is no longer run.

tprovider

This test is no longer run.

tsvgx

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations, the overall duration the sequence/animation took to complete. Lower values are better.

[graph: tsvgx]

3500 (start of period) – 720 (end of period).

Big improvements on Feb 26 and March 6. The March 6 improvement seems to have been caused by a Talos update.

tp4m

Generic page load test. Lower values are better.

710 (start of period) – 680 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start seems steady over this period.

[graph: time to throbber start]

Time to throbber stop has some small improvements and regressions.

[graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Eideticker graphs]

mozbench

I recently discovered the mozbench dashboard at http://ouija.allizom.org/grafana/index.html#/dashboard/file/mozbench.json which includes some comparisons involving Firefox for Android. More info at https://wiki.mozilla.org/Auto-tools/Projects/Mozbench.

[mozbench graphs: massive, webaudio, kraken, chalkboard]

Complete logcats for Android tests (updated for 2015)

I described Android test “complete logcats” last year in https://gbrownmozilla.wordpress.com/2014/02/12/complete-logcats-for-android-tests/, but a few details have changed since then, so here’s a re-write!

“Logcats” – those Android logs you see when you execute “adb logcat” – are an essential part of debugging Firefox for Android. We include logcats in our Android test logs on treeherder: After a test run, we run logcat on the device, collect the output and dump it to the test log. Sometimes those logcats are very useful; other times, they are too little, too late. A typical problem is that a failure occurs early in a test run, but does not cause the test to fail immediately; by the time the test ends, the fixed-size logcat buffer has filled up and overwritten the earlier, important messages. How frustrating!

All Android test jobs also offer “complete logcats”: logcat is run for the duration of the test job, and the output is collected continuously and dumped to a file. At the end of the test job, the file is uploaded to an AWS server, and a link is displayed in treeherder. Here’s a sample of a treeherder summary, from the bottom right (you may need to resize or scroll to see the whole thing):

[screenshot: treeherder job detail panel showing the logcat.log artifact]

Notice the “artifact uploaded logcat.log” line? Open that link and you have a complete logcat showing what was happening on the device for the duration of the test job.
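
You can get a similar continuous capture locally by starting logcat in the background before a test run (a sketch):

adb logcat -v threadtime > logcat.log &                         # capture continuously to a file
./mach mochitest-chrome mobile/android/tests/browser/chrome/    # run some tests
kill %1                                                         # stop the capture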

We have not changed the “old” logcat features in test logs: We still run logcat at the end of most jobs and dump the output to the test log. That might be more convenient in some cases.

Happy test debugging!

More memory for the Android emulator

In bug 1062365, I have been trying to get our Firefox for Android tests running on a new platform, Android 4.4. Like the Android 2.3 tests, these will run on the Android ARM emulator, hosted on AWS; they will just run on the much newer Android 4.4.

I recently noticed that some 4.4 test failures were related to Firefox for Android’s low memory handling: To allow for operation on low memory devices, some Firefox features behave differently when Android reports that the total memory on the device is below a certain threshold.

But my test failures were quite unexpected: I was configuring the emulator AVD with more than 1 GB of memory — much higher than the “low memory” threshold. What was going on?

Firefox for Android’s notion of a “low memory” vs a “high memory” device is based on the MemTotal reported in /proc/meminfo. If you configure an Android emulator to run Android 4.4, request a RAM size of 256 MB and then cat /proc/meminfo, you will see a MemTotal of approximately 256 MB:

MemTotal:         253904 kB

If you modify your emulator AVD configuration to request a RAM size of 512 MB and cat /proc/meminfo, you will see a MemTotal of approximately 512 MB. That seems reasonable and expected.

But if you request 1024 MB, you will see:

MemTotal:         765372 kB

What!?

If you request 2048 MB, you will still see:

MemTotal:         765372 kB

You might be tempted to try setting the “-memory” command line option when starting the emulator, or passing “-qemu -m …” to request more memory in qemu, but none of that will help: Android will stubbornly refuse to acknowledge anything more than 765372 kB of memory.
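
To reproduce these numbers, set the RAM size in the AVD configuration and check meminfo yourself (a sketch; the AVD name is illustrative):

# in ~/.android/avd/test-4.4.avd/config.ini:
hw.ramSize=1024

emulator -avd test-4.4 &
adb wait-for-device shell cat /proc/meminfo | grep MemTotal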

Thankfully, dmesg provided a clue about the missing memory:

Truncating RAM at 00000000-7fffffff to -2f7fffff (vmalloc region overlap).
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
      .text : 0xc0008000 - 0xc044a190   (4361 kB)
      .init : 0xc044b000 - 0xc0470000   ( 148 kB)
      .data : 0xc0470000 - 0xc04a8fc0   ( 228 kB)
       .bss : 0xc04a9000 - 0xc05f33c8   (1321 kB)

In the words of http://www.embedded-bits.co.uk/2011/vmalloc-region-overlap/, this is the kernel’s way of saying “I understand there may be some RAM here – but I’m not going to use it all”. That article gives a great description of what is happening and possible ways of accessing the extra RAM — I won’t regurgitate all the details here. It suggests that one way of using additional memory is to use a kernel configured with CONFIG_HIGHMEM.

I checked some Android kernels and found that several are normally built with CONFIG_HIGHMEM — but not the “goldfish” kernel, used in the Android emulator. I rebuilt the goldfish kernel with CONFIG_HIGHMEM defined and installed the new kernel in my AVD. Now, requesting 1536 MB in the AVD configuration, I see:

MemTotal:         1537336 kB

Perfect!
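
For the curious, the rebuild is roughly this shape (a sketch; the branch, defconfig, toolchain, and AVD name are assumptions):

git clone https://android.googlesource.com/kernel/goldfish
cd goldfish
git checkout android-goldfish-2.6.29
make ARCH=arm goldfish_armv7_defconfig
echo "CONFIG_HIGHMEM=y" >> .config
make ARCH=arm oldconfig                                    # accept defaults for any new prompts
make ARCH=arm CROSS_COMPILE=arm-linux-androideabi- zImage  # cross-compiler from the Android NDK/prebuilts
emulator -avd test-4.4 -kernel arch/arm/boot/zImage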

How much memory should we configure the emulator to use when running Firefox tests? That’s still an open question. 765 MB seemed restrictive and out of step with the RAM found on many modern Android devices, so I’m glad to have a choice now, and I am interested in determining whether the additional memory will help the performance of our tests, or resolve any known failures.

Great thanks to :snorp for helping me track down my “missing memory” and understand all of this!

The curious will find more context in bug 1128548.

Firefox for Android Performance Measures – January Check-up

My monthly review of Firefox for Android performance measurements for January 2015.

Highlights:

– No significant Talos regressions or improvements.

– Improvements in autophone’s “Time to throbber stop”.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[graph: tcheck2]

18.2 (start of period) – 18.7 (end of period)

Minor regression January 12 – bug 1122012.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

62000 (start of period) – 70000 (end of period)

Significant noise noted; no specific regression.

tprovider

Performance of the history and bookmarks provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations, the overall duration the sequence/animation took to complete. Lower values are better.

5900 (start of period) – 5900 (end of period).

tp4m

Generic page load test. Lower values are better.

[graph: tp4m]

855 (start of period) – 870 (end of period).

The regression of January 15 may have been reversed on January 28 – no bug?

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start seems steady this month.

[graph: time to throbber start]

Time to throbber stop has multiple small improvements.

[graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Eideticker graphs]

Those all look pretty steady to me, but there’s lots more to explore — be sure to check out eideticker for yourself!

Firefox for Android Performance Measures – 2014 in Review

Let’s review our performance measures for 2014.

Highlights:

– Regressions in 2014 for tcheck2, trobopan, and ts_paint.

– Improvements in tprovider, tsvgx, and tp4m.

– Overall regressions in time to throbber start and stop.

– Recent checkerboard regressions apparent on Eideticker.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[graph: tcheck2]

Jan 2014: 4.7

Dec 2014: 18.2

This test seems to be one of our most frequently regressing tests. We had some good improvements this year, but overall we end the year significantly regressed from where we started. Silver lining: Test results are much less noisy now than they have been all year!

(For details on the December regression, see bug 1111565 / bug 1097318).

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

[graph: trobopan]

Jan 2014: 28000

Dec 2014: 62000

Again, there are some wins and losses over the year but we end the year significantly regressed. There is a lot of noise in the results.

tprovider

Performance of the history and bookmarks provider. Reports time (ms) to perform a group of database operations. Lower values are better.

[graph: tprovider]

Jan 2014: 560

Dec 2014: 520

Very steady performance here with a slight improvement in April carrying through to the end of the year.

tsvgx

An SVG-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode, thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations, the overall duration the sequence/animation took to complete. Lower values are better.

[graph: tsvgx]

Jan 2014: 6150

Dec 2014: 5900

This is great — we’re seeing the best performance of the year.

tp4m

Generic page load test. Lower values are better.

[graph: tp4m]

Jan 2014: 970

Dec 2014: 855

Wow, even better. This tells me someone out there really cares about our page load performance.

ts_paint

Startup performance test. Lower values are better.

[graph: ts_paint]

Jan 2014: 3700

Dec 2014: 4100

You can’t win them all? It feels like we’re slowly losing ground here.

Note that this test is currently hidden on treeherder; it fails very often – bug 1112697.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[graph: time to throbber start]

I could not get cohesive phonedash graphs for the entire year, since we made so many changes to autophone, but here are views for the last 6 months. It looks like we have some work to do on time to throbber start. Time to throbber stop is better, but we have lost ground there too.

[graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Again, I couldn’t generate good graphs for the whole year, but here are some for the last 3 months.

[Eideticker startup graphs]

Eideticker startup tests seem to be performing well.

[Eideticker checkerboarding graphs]

But we’ve had some recent regressions in checkerboarding.

Happy New Year!
