Android 4.3 Opt tests running on trunk trees

Beginning today, “Android 4.3 API11+ opt” unit tests are running on treeherder on all trunk trees. These tests run against our standard Firefox for Android API11+ arm builds, inside an Android arm emulator running Android 4.3. Just like the existing Android 2.3 tests, the emulator for 4.3 runs on an aws instance.

The emulator environment has these characteristics:

  • Android 4.3 (AOSP 4.3.1_r1, JLS36I); standard 2.6.29 kernel
  • 1 GB of memory (not quite fully utilized because the kernel does not support CONFIG_HIGHMEM)
  • 720×1280, 320 dpi screen
  • 128 MB VM heap
  • 600 MB /data and 600 MB /sdcard partitions
  • front and back emulated cameras; all emulated sensors
  • standard crashreporter, logcat, anr, and tombstone support

This Android 4.3 emulator environment is very much like the existing “Android 2.3 API9 opt” environment. Broadly, tests seem to run in about the same amount of time on 4.3 as on 2.3, and we see some of the same failures on both. One significant difference between the 4.3 and 2.3 environments is the “device manager” used to communicate between the test harnesses and the device. On Android 2.3, sutagent is installed on the device and a custom TCP protocol is used to push/pull files, start processes, etc.; on Android 4.3, sutagent is not used at all (it doesn’t play well with SELinux security) and adb is used instead.
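To give a feel for what “using adb as the device manager” means, here is a minimal sketch that builds the adb command lines for the operations mentioned above (push, pull, run a command). It is not the code the test harnesses use; the function names, the file paths and the emulator serial are made up for illustration. Only the adb subcommands themselves (`push`, `pull`, `shell`, and the `-s` serial option) are real.

```python
import shlex
from typing import List, Optional


def adb_command(action: str, *args: str, serial: Optional[str] = None) -> List[str]:
    """Build an adb argv list; -s selects a specific device or emulator."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    cmd.append(action)
    cmd.extend(args)
    return cmd


def push(local: str, remote: str, serial: Optional[str] = None) -> List[str]:
    """Copy a file from the host to the device."""
    return adb_command("push", local, remote, serial=serial)


def pull(remote: str, local: str, serial: Optional[str] = None) -> List[str]:
    """Copy a file from the device back to the host."""
    return adb_command("pull", remote, local, serial=serial)


def shell(command: str, serial: Optional[str] = None) -> List[str]:
    """Run a command on the device."""
    return adb_command("shell", command, serial=serial)


if __name__ == "__main__":
    # Hypothetical paths and serial, for illustration only.
    for cmd in (
        push("test.html", "/data/local/tmp/test.html", serial="emulator-5554"),
        shell("ls /data/local/tmp", serial="emulator-5554"),
        pull("/data/local/tmp/result.log", "result.log", serial="emulator-5554"),
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run one of these, pass the list to `subprocess.run(cmd, check=True)` on a machine with adb installed and a device or emulator attached.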

Android 4.3 API11+ opt tests are available on try and run as a consequence of:

try: -b o -p android-api-11 -u …

As Android 4.3 API11+ opt tests have been introduced, the corresponding “Android 4.0 API11+ opt” test jobs have been disabled. Android 4.0 tests were running on our aging Pandaboards; running in the emulator on aws is more scalable, cost-effective, and future-safe.

Android 4.0 API11+ debug tests continue to run, but we plan to migrate those to the 4.3 emulator soon.

A few Android 4.0 API11+ opt Talos tests continue to run. We are evaluating whether those can be replaced by similar Autophone tests.

As with the introduction of any new test platform, some tests failed on Android 4.3 and had to be disabled or marked as failing on 4.3. Corresponding bugs have been opened for these tests and you can find them by looking for bugs with “[test disabled on android 4.3]” on the whiteboard:

https://bugzilla.mozilla.org/buglist.cgi?list_id=12205056&resolution=---&resolution=FIXED&resolution=INVALID&resolution=WONTFIX&resolution=DUPLICATE&resolution=WORKSFORME&resolution=INCOMPLETE&resolution=SUPPORT&resolution=EXPIRED&resolution=MOVED&status_whiteboard_type=allwordssubstr&query_format=advanced&status_whiteboard=disabled%20on%204.3&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED

Great thanks to everyone who has contributed to the 4.3 effort, but especially :kmoir for all the Release Engineering work to get everything running smoothly in continuous integration.

Looking for more details about the new environment? Have a look at:

Bug 1062365 Investigate Android 4.3/4.4 emulator test setup

Bug 1133833 Android 4.3 emulator tests

…or ask me!

mach support for mochitest-plain and mochitest-chrome on Android

We recently added mach commands for running mochitest-plain and mochitest-chrome tests on Android. For now, these commands only support the adb device manager (no way to run with sutagent) and there is minimal support for mochitest options available on desktop; see/comment on bug 1152944.

The old make targets continue to work, but should be considered deprecated.

See https://wiki.mozilla.org/Mobile/Fennec/Android#mochitest-plain / https://wiki.mozilla.org/Mobile/Fennec/Android#mochitest-chrome for detailed instructions.

Firefox for Android Performance Measures – February/March Check-up

I skipped my regular February post and I am thinking of writing up these performance summaries less frequently — maybe every 2 or 3 months. Any objections?

Highlights:

– Talos trobopan, tprovider, and ts tests have been retired.

– Big improvement in tsvgx.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

18.7 (start of period) – 19.0 (end of period)

trobopan

This test is no longer run.

tprovider

This test is no longer run.

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

svg

3500 (start of period) – 720 (end of period).

Big improvements Feb 26 and March 6. The March 6 improvement seems to have been caused by a Talos update.

tp4m

Generic page load test. Lower values are better.

710 (start of period) – 680 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start seems steady this month.

throbber-start

Time to throbber stop has some small improvements and regressions.

throbber-stop

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

eide1

eide2

eide3

eide4

mozbench

I recently discovered the mozbench dashboard at http://ouija.allizom.org/grafana/index.html#/dashboard/file/mozbench.json which includes some comparisons involving Firefox for Android. More info at https://wiki.mozilla.org/Auto-tools/Projects/Mozbench.

massive

webaudio

kraken

chalkboard

Complete logcats for Android tests (updated for 2015)

I described Android test “complete logcats” last year in https://gbrownmozilla.wordpress.com/2014/02/12/complete-logcats-for-android-tests/, but a few details have changed since then, so here’s a re-write!

“Logcats” – those Android logs you see when you execute “adb logcat” – are an essential part of debugging Firefox for Android. We include logcats in our Android test logs on treeherder: After a test run, we run logcat on the device, collect the output and dump it to the test log. Sometimes those logcats are very useful; other times, they are too little, too late. A typical problem is that a failure occurs early in a test run, but does not cause the test to fail immediately; by the time the test ends, the fixed-size logcat buffer has filled up and overwritten the earlier, important messages. How frustrating!

All Android test jobs also offer “complete logcats”: logcat is run for the duration of the test job, the output is collected continuously, and dumped to a file. At the end of the test job, the file is uploaded to an aws server, and a link is displayed in treeherder. Here’s a sample of a treeherder summary, from the bottom right (you may need to resize or scroll to see the whole thing):

[Screenshot: treeherder job summary with the logcat artifact link]

Notice the “artifact uploaded logcat.log” line? Open that link and you have a complete logcat showing what was happening on the device for the duration of the test job.
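The difference between the end-of-run logcat and the complete logcat can be illustrated with a small simulation. This is only a sketch: the real logcat buffer is sized in bytes rather than lines, and the message strings and buffer size here are invented. A fixed-size ring buffer read once at the end loses the early failure, while continuous collection keeps it.

```python
from collections import deque
from typing import Iterable, List


def end_of_run_dump(messages: Iterable[str], buffer_lines: int) -> List[str]:
    """Simulate a fixed-size log buffer that is only read when the test ends."""
    ring = deque(maxlen=buffer_lines)  # oldest entries are dropped when full
    for message in messages:
        ring.append(message)
    return list(ring)


def complete_capture(messages: Iterable[str]) -> List[str]:
    """Simulate collecting every message as it is produced."""
    captured = []
    for message in messages:
        captured.append(message)
    return captured


if __name__ == "__main__":
    # One important early message followed by later noise (made-up data).
    log = ["early failure"] + [f"noise {i}" for i in range(10)]
    print("end-of-run dump: ", end_of_run_dump(log, buffer_lines=4))
    print("complete capture:", complete_capture(log))
```

In the simulated end-of-run dump only the last four lines survive, which is the “too little, too late” situation described above.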

We have not changed the “old” logcat features in test logs: We still run logcat at the end of most jobs and dump the output to the test log. That might be more convenient in some cases.

Happy test debugging!

More memory for the Android emulator

In bug 1062365, I have been trying to get our Firefox for Android tests running for a new platform, Android 4.4. Like the Android 2.3 tests, these will run on the Android arm emulator, hosted on aws; they will just be run on the much newer Android 4.4.

I recently noticed that some 4.4 test failures were related to Firefox for Android’s low memory handling: To allow for operation on low memory devices, some Firefox features behave differently when Android reports that the total memory on the device is below a certain threshold.

But my test failures were quite unexpected: I was configuring the emulator AVD with more than 1 GB of memory — much higher than the “low memory” threshold. What was going on?

Firefox for Android’s notion of a “low memory” vs a “high memory” device is based on the MemTotal reported in /proc/meminfo. If you configure an Android emulator to run Android 4.4, request a RAM size of 256 MB and then cat /proc/meminfo, you will see a MemTotal of approximately 256 MB:

MemTotal:         253904 kB

If you modify your emulator avd configuration to request a RAM size of 512 MB and cat /proc/meminfo, you will see a MemTotal of approximately 512 MB. That seems reasonable and expected.

But if you request 1024 MB, you will see:

MemTotal:         765372 kB

What!?

If you request 2048 MB, you will still see:

MemTotal:         765372 kB

You might be tempted to try setting the “-memory” command line option when starting the emulator, or passing “-qemu -m …” to request more memory in qemu, but none of that will help: Android will stubbornly refuse to acknowledge anything more than 765372 kB of memory.
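The MemTotal checks above are easy to script. Here is a minimal sketch that pulls MemTotal out of /proc/meminfo text and compares it against a threshold. The function names are mine, and the threshold is purely a placeholder parameter: this is not Firefox's actual low-memory check or its actual threshold value.

```python
import re
from typing import Optional


def parse_memtotal_kb(meminfo_text: str) -> Optional[int]:
    """Return the MemTotal value in kB from /proc/meminfo text, or None."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_low_memory(memtotal_kb: int, threshold_kb: int) -> bool:
    """True if the reported total memory is below the given threshold."""
    return memtotal_kb < threshold_kb


if __name__ == "__main__":
    # MemTotal lines reported by the emulator, as quoted in this post.
    for line in ("MemTotal:         253904 kB", "MemTotal:         765372 kB"):
        total = parse_memtotal_kb(line)
        # 1 GB is a hypothetical threshold chosen only for this example.
        print(total, "kB, below 1 GB:", is_low_memory(total, 1024 * 1024))
```

On a device, the input text could come from `adb shell cat /proc/meminfo`.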

Thankfully dmesg provided a clue about the missing memory:

Truncating RAM at 00000000-7fffffff to -2f7fffff (vmalloc region overlap).
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
      .text : 0xc0008000 - 0xc044a190   (4361 kB)
      .init : 0xc044b000 - 0xc0470000   ( 148 kB)
      .data : 0xc0470000 - 0xc04a8fc0   ( 228 kB)
       .bss : 0xc04a9000 - 0xc05f33c8   (1321 kB)

In the words of http://www.embedded-bits.co.uk/2011/vmalloc-region-overlap/, this is the kernel’s way of saying “I understand there may be some RAM here – but I’m not going to use it all”. That article gives a great description of what is happening and possible ways of accessing the extra RAM — I won’t regurgitate all the details here. It suggests that one way of using additional memory is to use a kernel configured with CONFIG_HIGHMEM.
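The numbers in that dmesg output can be checked with a little arithmetic. The sketch below uses only the addresses and the MemTotal value quoted above; the interpretation of the remaining gap (memory the kernel keeps for itself) is my assumption.

```python
MIB = 1024 * 1024


def region_size_mib(start: int, end: int) -> float:
    """Size in MiB of a half-open address range [start, end)."""
    return (end - start) / MIB


# Regions from the "Virtual kernel memory layout" output.
lowmem_mib = region_size_mib(0xC0000000, 0xEF800000)
vmalloc_mib = region_size_mib(0xF0000000, 0xFF000000)

# "Truncating RAM at 00000000-7fffffff to -2f7fffff": usable RAM now ends
# at 0x2f7fffff, so its size is that address plus one.
truncated_ram_mib = (0x2F7FFFFF + 1) / MIB

# MemTotal reported by /proc/meminfo, in kB.
memtotal_kb = 765372
gap_kb = int(truncated_ram_mib) * 1024 - memtotal_kb

if __name__ == "__main__":
    print("lowmem:       ", lowmem_mib, "MiB")
    print("vmalloc:      ", vmalloc_mib, "MiB")
    print("truncated RAM:", truncated_ram_mib, "MiB")
    # Assumption: the gap is memory reserved by the kernel itself.
    print("gap between truncated RAM and MemTotal:", gap_kb, "kB")
```

So the 760 MB lowmem region matches the truncated RAM size exactly, and MemTotal sits a little under it.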

I checked some Android kernels and found that several are normally built with CONFIG_HIGHMEM, but not the “goldfish” kernel used in the Android emulator. I rebuilt the goldfish kernel with CONFIG_HIGHMEM defined, and installed the new kernel in my avd. Now, requesting 1536 MB in the avd configuration, I see:

MemTotal:         1537336 kB

Perfect!

How much memory should we configure the emulator to use when running Firefox tests? That’s an open question still. 765 MB seemed restrictive and out of step with the RAM found on many modern Android devices, so I’m glad to have a choice now, and interested in determining if the additional memory will help the performance of our tests, or resolve any known failures.

Great thanks to :snorp for helping me track down my “missing memory” and understand all of this!

The curious will find more context in bug 1128548.

Firefox for Android Performance Measures – January Check-up

My monthly review of Firefox for Android performance measurements for January 2015.

Highlights:

– No significant Talos regressions or improvements.

– Improvements in autophone’s “Time to throbber stop”.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

tcheck2

18.2 (start of period) – 18.7 (end of period)

Minor regression January 12 – bug 1122012.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

62000 (start of period) – 70000 (end of period)

Significant noise noted; no specific regression.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

5900 (start of period) – 5900 (end of period).

tp4m

Generic page load test. Lower values are better.

tp4

855 (start of period) – 870 (end of period).

Regression of January 15 may be reversed January 28 – no bug?

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start seems steady this month.

throbberstart

Time to throbber stop has multiple small improvements.

throbberstop

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

eide1

eide2

eide3

eide4

Those all look pretty steady to me, but there’s lots more to explore — be sure to check out eideticker for yourself!

Firefox for Android Performance Measures – 2014 in Review

Let’s review our performance measures for 2014.

Highlights:

– regressions in 2014 for tcheck2, trobopan, and tspaint.

– improvements in tprovider, tsvgx, and tp4m.

– overall regressions in time to throbber start and stop.

– recent checkerboard regressions apparent on Eideticker.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

tcheck

Jan 2014: 4.7

Dec 2014: 18.2

This test seems to be one of our most frequently regressing tests. We had some good improvements this year, but overall we end the year significantly regressed from where we started. Silver lining: Test results are much less noisy now than they have been all year!

(For details on the December regression, see bug 1111565 / bug 1097318).

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

tpan

Jan 2014: 28000

Dec 2014: 62000

Again, there are some wins and losses over the year but we end the year significantly regressed. There is a lot of noise in the results.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

tprovider

Jan 2014: 560

Dec 2014: 520

Very steady performance here with a slight improvement in April carrying through to the end of the year.

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

tsvg

Jan 2014: 6150

Dec 2014: 5900

This is great — we’re seeing the best performance of the year.

tp4m

Generic page load test. Lower values are better.

tp4

Jan 2014: 970

Dec 2014: 855

Wow, even better. This tells me someone out there really cares about our page load performance.

ts_paint

Startup performance test. Lower values are better.

tspaint

Jan 2014: 3700

Dec 2014: 4100

You can’t win them all? It feels like we’re slowly losing ground here.

Note that this test is currently hidden on treeherder; it fails very often – bug 1112697.
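To put the Talos numbers above side by side, here is a small sketch that computes the relative change from January to December for each test. The values are the ones quoted in this post; the helper names are just for illustration. Since lower values are better for all of these tests, a positive change is a regression.

```python
from typing import Dict, Tuple

# (Jan 2014, Dec 2014) values as quoted in this post.
TALOS_2014: Dict[str, Tuple[float, float]] = {
    "tcheck2": (4.7, 18.2),
    "trobopan": (28000, 62000),
    "tprovider": (560, 520),
    "tsvgx": (6150, 5900),
    "tp4m": (970, 855),
    "ts_paint": (3700, 4100),
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


def summarize(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Percent change per test, rounded to one decimal place."""
    return {
        name: round(percent_change(start, end), 1)
        for name, (start, end) in results.items()
    }


if __name__ == "__main__":
    for name, change in summarize(TALOS_2014).items():
        verdict = "regression" if change > 0 else "improvement"
        print(f"{name:10s} {change:+7.1f}%  ({verdict})")
```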

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

throbstart

I could not get cohesive phonedash graphs for the entire year, since we made so many changes to autophone over the year, but here are views for the last 6 months. It looks like we have some work to do on time to throbber start. Time to throbber stop is better, but we have lost ground there too.

throbstop

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Again, I couldn’t generate good graphs for the whole year, but here are some for the last 3 months.

eide1

eide2

eide3

Eideticker startup tests seem to be performing well.

eide4

eide5

But we’ve had some recent regressions in checkerboarding.

Happy New Year!

New Android job names on treeherder: Update your try pushes!

Treeherder builds are now producing two separate APKs for Android arm, each targeting a different Android API range — see bug 1073772 for discussion and details.

Instead of seeing:

[Screenshot: old Android job names on treeherder]

You should now see:

[Screenshot: new Android job names on treeherder]

Notice that instead of a single Android Opt arm build (used for both Android 2.3 opt and Android 4.0 opt tests), there are now two separate builds: Android 2.3 API9 opt and Android 4.0 API10+ opt. There is a very similar change for Android Debug arm builds. Android x86 builds are unchanged.

This change affects try pushes for Android arm, because the builder names have changed. Instead of:

try: -b o -p android

you should use:

try: -b o -p android-api-9,android-api-11 …

Unfortunately, “-p android” no longer matches any builder name, so if you forget to update your try syntax, your push will not build on Android and you won’t run any tests — oops!

[Screenshot: try push with no Android builds or tests]
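If you want to catch this before pushing, a quick check of the `-p` argument is enough. The sketch below is not the real try syntax parser; it assumes platforms are matched by exact name, and it only knows the two new arm builder names mentioned in this post.

```python
from typing import Iterable, List

# The two new Android arm builder names from this post.
ANDROID_ARM_BUILDERS = {"android-api-9", "android-api-11"}


def parse_platforms(try_syntax: str) -> List[str]:
    """Return the comma-separated platforms given after -p, if any."""
    tokens = try_syntax.split()
    if "-p" not in tokens:
        return []
    index = tokens.index("-p")
    if index + 1 >= len(tokens):
        return []
    return tokens[index + 1].split(",")


def unmatched_platforms(
    try_syntax: str, builders: Iterable[str] = ANDROID_ARM_BUILDERS
) -> List[str]:
    """Platforms in the try syntax that match none of the known builders."""
    known = set(builders)
    return [p for p in parse_platforms(try_syntax) if p not in known]


if __name__ == "__main__":
    for syntax in (
        "try: -b o -p android",
        "try: -b o -p android-api-9,android-api-11",
    ):
        print(syntax, "-> unmatched:", unmatched_platforms(syntax))
```

An empty list means every requested platform matched one of the known builder names.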

Firefox for Android Performance Measures – November check-up

My monthly review of Firefox for Android performance measurements for November.

Highlights:

– Significant improvements in tcheck2, tsvgx, and tp4m.

– Phonedash and Eideticker measurements holding steady.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

17.8 (start of period) – 7.6 (end of period)

Significant improvement Nov 5.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

30000 (start of period) – 30000 (end of period)

Significant noise noted.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6100 (start of period) – 4500 (end of period).

Significant improvement Nov 24.

tp4m

Generic page load test. Lower values are better.

880 (start of period) – 800 (end of period).

Improvement Nov 24. This is the best score we have seen all year:

[Graph: tp4m results for the year]

ts_paint

Startup performance test. Lower values are better.

3850 (start of period) – 3950 (end of period).

No specific regression identified.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start and throbber stop seem steady this month.

throbstart

throbstop

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

eide1

eide2

eide3

Firefox for Android Performance Measures – September/October check-up

I skipped my monthly review for September, so here is a review of Firefox for Android performance measurements for September and October.

Highlights:

– significant regression in tcheck2

– minor regressions in tp4m, tspaint

– improvements in trobopan, tsvgx

– checkerboard regressions in eideticker

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

check2

12 (start of period) – 17.8 (end of period)

Significant regression Oct 7 – bug 1086642.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 30000 (end of period)

Distinct improvement around Oct 8.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6100 (end of period).

Distinct improvement October 4.

tp4m

Generic page load test. Lower values are better.

850 (start of period) – 880 (end of period).

Regression on October 22 – bug 1088669.

ts_paint

Startup performance test. Lower values are better.

tspaint

3850 (start of period) – 3850 (end of period).

Regression on Oct 28 – bug 1091664.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

throbberstart

Note the changes in throbber start (and corresponding improvements in throbber stop, below) around Oct 1 — bug 888482. Changes around Oct 28 are being investigated in bug 1091664.

throbberstop

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

cnn

cnn2

cnn3

dirty

taskjs
