More memory for the Android emulator

In bug 1062365, I have been trying to get our Firefox for Android tests running for a new platform, Android 4.4. Like the Android 2.3 tests, these will run on the Android arm emulator, hosted on AWS; the difference is that they run the much newer Android 4.4.

I recently noticed that some 4.4 test failures were related to Firefox for Android’s low-memory handling: to allow operation on low-memory devices, some Firefox features behave differently when Android reports that the total memory on the device is below a certain threshold.

But my test failures were quite unexpected: I was configuring the emulator AVD with more than 1 GB of memory — much higher than the “low memory” threshold. What was going on?

Firefox for Android’s notion of a “low memory” vs a “high memory” device is based on the MemTotal reported in /proc/meminfo. If you configure an Android emulator to run Android 4.4, request a RAM size of 256 MB and then cat /proc/meminfo, you will see a MemTotal of approximately 256 MB:

MemTotal:         253904 kB

If you modify your emulator AVD configuration to request a RAM size of 512 MB and cat /proc/meminfo, you will see a MemTotal of approximately 512 MB. That seems reasonable and expected.

But if you request 1024 MB, you will see:

MemTotal:         765372 kB

What!?

If you request 2048 MB, you will still see:

MemTotal:         765372 kB

You might be tempted to try setting the “-memory” command-line option when starting the emulator, or passing “-qemu -m …” to request more memory from QEMU, but none of that will help: Android will stubbornly refuse to acknowledge anything more than 765372 kB of memory.
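Firefox makes this check inside the browser. Purely as an illustration of the idea, here is a minimal Python sketch that reads MemTotal from /proc/meminfo text and compares it with a threshold. LOW_MEMORY_THRESHOLD_KB is a hypothetical value chosen for the example; it is not the threshold Firefox for Android actually uses.

```python
# Sketch: classify a device as "low memory" from /proc/meminfo contents.
# LOW_MEMORY_THRESHOLD_KB is a HYPOTHETICAL value for illustration only;
# it is not the threshold Firefox for Android actually uses.
LOW_MEMORY_THRESHOLD_KB = 384 * 1024


def mem_total_kb(meminfo_text: str) -> int:
    """Return MemTotal (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # e.g. "MemTotal:         253904 kB" -> ["MemTotal:", "253904", "kB"]
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def is_low_memory(meminfo_text: str, threshold_kb: int = LOW_MEMORY_THRESHOLD_KB) -> bool:
    """True if the reported total memory is below the threshold."""
    return mem_total_kb(meminfo_text) < threshold_kb


if __name__ == "__main__":
    # MemTotal values quoted above for the 256 MB and 1024 MB AVD configurations.
    for sample in ("MemTotal:         253904 kB", "MemTotal:         765372 kB"):
        print(mem_total_kb(sample), is_low_memory(sample))
```

On a real device or emulator, you would feed it the output of adb shell cat /proc/meminfo.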

Thankfully dmesg provided a clue about the missing memory:

Truncating RAM at 00000000-7fffffff to -2f7fffff (vmalloc region overlap).
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
      .text : 0xc0008000 - 0xc044a190   (4361 kB)
      .init : 0xc044b000 - 0xc0470000   ( 148 kB)
      .data : 0xc0470000 - 0xc04a8fc0   ( 228 kB)
       .bss : 0xc04a9000 - 0xc05f33c8   (1321 kB)

In the words of http://www.embedded-bits.co.uk/2011/vmalloc-region-overlap/, this is the kernel’s way of saying “I understand there may be some RAM here – but I’m not going to use it all”. That article gives a great description of what is happening and possible ways of accessing the extra RAM — I won’t regurgitate all the details here. It suggests that one way of using additional memory is to use a kernel configured with CONFIG_HIGHMEM.
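The numbers in that dmesg output are easy to check. This Python sketch, with region boundaries copied from the log above, computes the size of each region and of the RAM the kernel kept. The truncation point 0x2f7fffff leaves exactly the 760 MB lowmem window (778240 kB) out of the 2048 MB range 00000000-7fffffff; the reported MemTotal of 765372 kB is a little lower still, presumably because the kernel image and other reserved memory are not counted in MemTotal.

```python
# Sketch: check the arithmetic behind the dmesg lines quoted above.
# Region boundaries are copied from the log; 1 MB here means 1024 * 1024 bytes.

def region_mb(start: int, end: int) -> float:
    """Size in MB of the address range [start, end)."""
    return (end - start) / (1024 * 1024)


LAYOUT = {
    "vmalloc": (0xF0000000, 0xFF000000),
    "lowmem": (0xC0000000, 0xEF800000),
}

# "Truncating RAM at 00000000-7fffffff to -2f7fffff": physical RAM from 0 up to
# and including 0x2f7fffff is kept, so the kept size is 0x2f7fffff + 1 bytes.
REQUESTED_BYTES = 0x7FFFFFFF + 1  # 2048 MB, the full range in the message
KEPT_BYTES = 0x2F7FFFFF + 1

if __name__ == "__main__":
    for name, (start, end) in LAYOUT.items():
        print(f"{name}: {region_mb(start, end):.0f} MB")
    print(f"kept {KEPT_BYTES // 1024} kB of {REQUESTED_BYTES // 1024} kB")
    # Without CONFIG_HIGHMEM, the kept RAM is exactly the directly mapped lowmem window.
    print(KEPT_BYTES == LAYOUT["lowmem"][1] - LAYOUT["lowmem"][0])
```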

I checked some Android kernels and found that several are normally built with CONFIG_HIGHMEM, but not the “goldfish” kernel used in the Android emulator. I rebuilt the goldfish kernel with CONFIG_HIGHMEM defined and installed the new kernel in my AVD. Now, requesting 1536 MB in the AVD configuration, I see:

MemTotal:         1537336 kB

Perfect!
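If you want to confirm whether a particular kernel was built with CONFIG_HIGHMEM, a small helper like this Python sketch can read either a plain .config from a kernel build tree or a compressed /proc/config.gz. Note that /proc/config.gz only exists on kernels built with CONFIG_IKCONFIG_PROC, so it may well be absent on an emulator kernel; the script name and default path are illustrative assumptions.

```python
# Sketch: check whether a kernel config enables CONFIG_HIGHMEM.
# Accepts a plain ".config" or a gzip-compressed config such as /proc/config.gz
# (which exists only on kernels built with CONFIG_IKCONFIG_PROC).
import gzip
import sys
from pathlib import Path
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of a config option (e.g. "y"), or None if it is unset."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Require the "=" so CONFIG_HIGHMEM does not match CONFIG_HIGHMEM4G etc.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    # Unset options appear as "# CONFIG_FOO is not set" or not at all.
    return None


def read_config(path: str) -> str:
    """Read a kernel config, transparently decompressing .gz files."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as f:
            return f.read()
    return p.read_text()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else ".config"
    print("CONFIG_HIGHMEM =", config_value(read_config(path), "CONFIG_HIGHMEM"))
```

On an emulator that does expose it, adb pull /proc/config.gz copies the file to your computer first.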

How much memory should we configure the emulator to use when running Firefox tests? That’s still an open question. 765 MB seemed restrictive and out of step with the RAM found on many modern Android devices, so I’m glad to have a choice now, and I’m interested in determining whether the additional memory will help the performance of our tests or resolve any known failures.

Great thanks to :snorp for helping me track down my “missing memory” and understand all of this!

The curious will find more context in bug 1128548.

Firefox for Android Performance Measures – January Check-up

My monthly review of Firefox for Android performance measurements for January 2015.

Highlights:

– No significant Talos regressions or improvements.

– Improvements in autophone’s “Time to throbber stop”.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on treeherder. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[Graph: tcheck2]

18.2 (start of period) – 18.7 (end of period)

Minor regression January 12 – bug 1122012.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

62000 (start of period) – 70000 (end of period)

Significant noise noted; no specific regression.
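The one-line description of trobopan leaves some room for interpretation. As a sketch of one plausible reading, and an assumption rather than the actual Talos/robocop source, here is the score computed as the sum of the squares of inter-frame delays that exceed 25 ms. The frame timestamps in the example are hypothetical.

```python
# Sketch of ONE reading of the trobopan metric (an assumption, not the
# Talos/robocop source): sum the squares of the inter-frame delays that
# exceed 25 ms while panning. Lower is better.
from typing import Sequence

DELAY_THRESHOLD_MS = 25


def panning_score(frame_times_ms: Sequence[float]) -> float:
    """Sum of squared frame delays above the threshold, from frame timestamps in ms."""
    delays = [later - earlier for earlier, later in zip(frame_times_ms, frame_times_ms[1:])]
    return sum(d * d for d in delays if d > DELAY_THRESHOLD_MS)


if __name__ == "__main__":
    # Hypothetical timestamps: smooth 16 ms frames with one 50 ms hitch.
    print(panning_score([0, 16, 32, 82, 98]))
```

Squaring means one long stall costs far more than several short ones: a single 50 ms delay contributes 2500, while two 26 ms delays contribute 1352 together.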

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

5900 (start of period) – 5900 (end of period).

tp4m

Generic page load test. Lower values are better.

[Graph: tp4m]

855 (start of period) – 870 (end of period).

The regression of January 15 may have been reversed on January 28; no bug filed?

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start seems steady this month.

[Graph: time to throbber start]

Time to throbber stop shows several small improvements.

[Graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Graphs: Eideticker results (four charts)]

Those all look pretty steady to me, but there’s lots more to explore — be sure to check out eideticker for yourself!

Firefox for Android Performance Measures – 2014 in Review

Let’s review our performance measures for 2014.

Highlights:

– regressions in 2014 for tcheck2, trobopan, and tspaint.

– improvements in tprovider, tsvgx, and tp4m.

– overall regressions in time to throbber start and stop.

– recent checkerboard regressions apparent on Eideticker.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[Graph: tcheck2, 2014]

Jan 2014: 4.7

Dec 2014: 18.2

This test seems to be one of our most frequently regressing tests. We had some good improvements this year, but overall we end the year significantly regressed from where we started. Silver lining: Test results are much less noisy now than they have been all year!

(For details on the December regression, see bug 1111565 / bug 1097318).

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

[Graph: trobopan, 2014]

Jan 2014: 28000

Dec 2014: 62000

Again, there are some wins and losses over the year but we end the year significantly regressed. There is a lot of noise in the results.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

[Graph: tprovider, 2014]

Jan 2014: 560

Dec 2014: 520

Very steady performance here with a slight improvement in April carrying through to the end of the year.

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

[Graph: tsvgx, 2014]

Jan 2014: 6150

Dec 2014: 5900

This is great — we’re seeing the best performance of the year.

tp4m

Generic page load test. Lower values are better.

[Graph: tp4m, 2014]

Jan 2014: 970

Dec 2014: 855

Wow, even better. This tells me someone out there really cares about our page load performance.

ts_paint

Startup performance test. Lower values are better.

[Graph: ts_paint, 2014]

Jan 2014: 3700

Dec 2014: 4100

You can’t win them all? It feels like we’re slowly losing ground here.

Note that this test is currently hidden on treeherder; it fails very often – bug 1112697.
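To put the January and December figures above on a common scale, this short Python sketch computes the percentage change for each Talos test, using only the values quoted in this review. Since lower is better for every test, a positive change is a regression.

```python
# Year-over-year change for the Talos numbers quoted above (Jan 2014 -> Dec 2014).
# Lower is better for every test, so a positive change is a regression.
from typing import Dict, Tuple

RESULTS_2014: Dict[str, Tuple[float, float]] = {
    "tcheck2": (4.7, 18.2),
    "trobopan": (28000, 62000),
    "tprovider": (560, 520),
    "tsvgx": (6150, 5900),
    "tp4m": (970, 855),
    "ts_paint": (3700, 4100),
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100


def classify(results: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Label each test as a regression, improvement, or unchanged."""
    verdicts = {}
    for test, (jan, dec) in results.items():
        change = percent_change(jan, dec)
        verdicts[test] = "regression" if change > 0 else "improvement" if change < 0 else "unchanged"
    return verdicts


if __name__ == "__main__":
    verdicts = classify(RESULTS_2014)
    for test, (jan, dec) in RESULTS_2014.items():
        print(f"{test:10} {percent_change(jan, dec):+7.1f}%  {verdicts[test]}")
```

The classification agrees with the highlights at the top of this review: regressions for tcheck2, trobopan and ts_paint; improvements for tprovider, tsvgx and tp4m.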

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: time to throbber start, last 6 months]

I could not get consistent phonedash graphs for the entire year, since we made so many changes to autophone over the year, but here are views for the last 6 months. It looks like we have some work to do on time to throbber start. Time to throbber stop is better, but we have lost ground there too.

[Graph: time to throbber stop, last 6 months]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

Again, I couldn’t generate good graphs for the whole year, but here are some for the last 3 months.

[Graphs: Eideticker startup tests, last 3 months (three charts)]

Eideticker startup tests seem to be performing well.

[Graphs: Eideticker checkerboarding tests, last 3 months (two charts)]

But we’ve had some recent regressions in checkerboarding.

Happy New Year!

New Android job names on treeherder: Update your try pushes!

Treeherder builds are now producing two separate APKs for Android arm, each targeting a different Android API range — see bug 1073772 for discussion and details.

Instead of seeing:

[Screenshot: Android job names on treeherder before the change]

You should now see:

[Screenshot: Android job names on treeherder after the change]

Notice that instead of a single Android Opt arm build (used for both Android 2.3 opt and Android 4.0 opt tests), there are now two separate builds: Android 2.3 API9 opt and Android 4.0 API10+ opt. There is a very similar change for Android Debug arm builds. Android x86 builds are unchanged.

This change affects try pushes for Android arm, because the builder names have changed. Instead of:

try: -b o -p android

you should use:

try: -b o -p android-api-9,android-api-11 …

Unfortunately, “-p android” no longer matches any builder name, so if you forget to update your try syntax, your push will not build on Android and you won’t run any tests — oops!
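If you push to try from scripts, a check like this Python sketch could catch the stale platform name before you push. It is a hypothetical helper, not part of any Mozilla tool; it only looks for the renamed "-p android" case described above.

```python
# Hypothetical pre-push check (not part of any Mozilla tool): warn when a try
# line still uses a platform name that no longer matches any builder.
import shlex
from typing import Dict, List

# From the text above: "-p android" was replaced by these two platform names.
RENAMED_PLATFORMS: Dict[str, List[str]] = {
    "android": ["android-api-9", "android-api-11"],
}


def requested_platforms(try_line: str) -> List[str]:
    """Return the comma-separated values following "-p" in a try line."""
    tokens = shlex.split(try_line)
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-p":
            return value.split(",")
    return []


def stale_platforms(try_line: str) -> List[str]:
    """Platforms in the try line that were renamed and will now build nothing."""
    return [p for p in requested_platforms(try_line) if p in RENAMED_PLATFORMS]


if __name__ == "__main__":
    line = "try: -b o -p android -u all -t none"
    for platform in stale_platforms(line):
        print(f"'-p {platform}' matches no builder; use {','.join(RENAMED_PLATFORMS[platform])}")
```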

[Screenshot: a try push with no Android builds]

Firefox for Android Performance Measures – November check-up

My monthly review of Firefox for Android performance measurements for November.

Highlights:

– Significant improvements in tcheck2, tsvgx, and tp4m.

– Phonedash and Eideticker measurements holding steady.

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

17.8 (start of period) – 7.6 (end of period)

Significant improvement Nov 5.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

30000 (start of period) – 30000 (end of period)

Significant noise noted.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6100 (start of period) – 4500 (end of period).

Significant improvement Nov 24.

tp4m

Generic page load test. Lower values are better.

880 (start of period) – 800 (end of period).

Improvement Nov 24. This is the best score we have seen all year:

[Graph: tp4m over the past year]

ts_paint

Startup performance test. Lower values are better.

3850 (start of period) – 3950 (end of period).

No specific regression identified.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

Time to throbber start and throbber stop seem steady this month.

[Graphs: time to throbber start and time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Graphs: Eideticker results (three charts)]

Firefox for Android Performance Measures – September/October check-up

I skipped my monthly review for September, so here is a review of Firefox for Android performance measurements for September and October. Highlights:

– significant regression in tcheck2

– minor regressions in tp4m, tspaint

– improvements in trobopan, tsvgx

– checkerboard regressions in eideticker

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[Graph: tcheck2]

12 (start of period) – 17.8 (end of period)

Significant regression Oct 7 – bug 1086642.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 30000 (end of period)

Distinct improvement around Oct 8.

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6100 (end of period).

Distinct improvement October 4.

tp4m

Generic page load test. Lower values are better.

850 (start of period) – 880 (end of period).

Regression on October 22 – bug 1088669.

ts_paint

Startup performance test. Lower values are better.

[Graph: ts_paint]

3850 (start of period) – 3850 (end of period).

Regression on Oct 28 – bug 1091664.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: time to throbber start]

Note the changes in throbber start (and corresponding improvements in throbber stop, below) around Oct 1 — bug 888482. Changes around Oct 28 are being investigated in bug 1091664.

[Graph: time to throbber stop]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Graphs: Eideticker results (cnn, cnn2, cnn3, dirty, taskjs)]

Running my own AutoPhone

AutoPhone is a brilliant platform for running automated tests on physical mobile devices.

:bc maintains an AutoPhone instance running startup performance tests (aka “S1/S2 tests” or “throbber start/stop tests”) on a small farm of test phones; those tests run against all of our Firefox for Android builds and results are reported to PhoneDash, available for viewing at http://phonedash.mozilla.org/.

I have used phonedash.mozilla.org for a long time now, and reported regressions in bugs and in my monthly “Performance Check-up” posts, but I have never looked under the covers or tried to use AutoPhone myself — until this week.

All things considered, it is surprisingly easy to set up your own AutoPhone instance and run your own tests. You might want to do this to reproduce phonedash.mozilla.org results on your own computer, or to check for regressions on a feature before check-in.

Here’s what I did to run my own AutoPhone instance running S1/S2 tests against mozilla-inbound builds:

Install AutoPhone:

git clone https://github.com/mozilla/autophone

cd autophone

pip install -r requirements.txt

Install PhoneDash, to store and view results:

git clone https://github.com/markrcote/phonedash

Create a phonedash settings file, phonedash/server/settings.cfg with content:

[database]
SQL_TYPE=sqlite
SQL_DB=yourdb
SQL_SERVER=localhost
SQL_USER=
SQL_PASSWD=
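If you script your setup, Python's configparser can generate the same file. This is a sketch with the same keys and values as above ("yourdb" is still a placeholder); the one subtlety is that configparser lower-cases option names by default, so optionxform is overridden to keep SQL_TYPE and friends as written.

```python
# Sketch: generate phonedash/server/settings.cfg with Python's configparser.
# Same keys and values as the file above; "yourdb" is still a placeholder.
import configparser
import io


def render_phonedash_settings(db_name: str = "yourdb") -> str:
    config = configparser.ConfigParser()
    # configparser lower-cases option names by default; keep SQL_TYPE etc. as written.
    config.optionxform = str
    config["database"] = {
        "SQL_TYPE": "sqlite",
        "SQL_DB": db_name,
        "SQL_SERVER": "localhost",
        "SQL_USER": "",
        "SQL_PASSWD": "",
    }
    out = io.StringIO()
    # Write "KEY=value" without spaces, matching the hand-written file.
    config.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Run from the directory containing your phonedash checkout.
    with open("phonedash/server/settings.cfg", "w") as f:
        f.write(render_phonedash_settings())
```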

Start phonedash:

python server.py <ip address of your computer>

It will log status messages to the console. Watch that for any errors, and to get a better understanding of what’s happening.

Prepare your device:

Connect your Android phone or tablet to your computer by USB. Multiple devices may be connected. Each device must be rooted. Check that you can see your devices with adb devices — and note the serial number(s) (see devices.ini below).

Configure your device:

cp devices.ini.example devices.ini

Edit devices.ini, changing the serial numbers to your device serial numbers and the device names to something meaningful to you. Here’s my simple devices.ini for one device I called “gbrown”:

[gbrown]
serialno=01498B300600B008
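A typo in a serial number is easy to miss. This Python sketch, a hypothetical helper that is not part of AutoPhone, compares the serialno entries in devices.ini against the devices that adb devices reports as ready.

```python
# Hypothetical helper (not part of AutoPhone): report devices.ini entries whose
# serial number is not currently attached and ready according to "adb devices".
import configparser
import subprocess
from typing import List, Set


def attached_serials(adb_output: str) -> Set[str]:
    """Serials listed in the "device" (ready) state in `adb devices` output."""
    serials = set()
    for line in adb_output.splitlines():
        parts = line.split()
        # Device lines look like "01498B300600B008\tdevice"; the header and
        # daemon status lines never have "device" as their second word.
        if len(parts) >= 2 and parts[1] == "device":
            serials.add(parts[0])
    return serials


def missing_devices(devices_ini: str, adb_output: str) -> List[str]:
    """Names of devices.ini sections whose serialno is not attached and ready."""
    parser = configparser.ConfigParser()
    parser.read_string(devices_ini)
    attached = attached_serials(adb_output)
    return [name for name in parser.sections()
            if parser[name].get("serialno") not in attached]


if __name__ == "__main__":
    with open("devices.ini") as f:
        ini_text = f.read()
    adb = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    for name in missing_devices(ini_text, adb.stdout):
        print(f"{name}: serial number not attached (or unauthorized/offline)")
```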

Configure autophone:

cp autophone.ini.example autophone.ini

Edit autophone.ini to make it your own. Most of the defaults are fine; here is mine:

[settings]
#clear_cache = False
#ipaddr = …
#port = 28001
#cachefile = autophone_cache.json
#logfile = autophone.log
loglevel = DEBUG
test_path = tests/manifest.ini
#emailcfg = email.ini
enable_pulse = True
enable_unittests = False
#cache_dir = builds
#override_build_dir = None
repos = mozilla-inbound
#buildtypes = opt
#build_cache_port = 28008
verbose = True

#build_cache_size = 20
#build_cache_expires = 7
#device_ready_retry_wait = 20
#device_ready_retry_attempts = 3
#device_battery_min = 90
#device_battery_max = 95
#phone_retry_limit = 2
#phone_retry_wait = 15
#phone_max_reboots = 3
#phone_ping_interval = 15
#phone_command_queue_timeout = 10
#phone_command_queue_timeout = 1
#phone_crash_window = 30
#phone_crash_limit = 5

python autophone.py -h provides help on options, which are analogues of the autophone.ini settings.
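Under standard INI rules, lines starting with "#" are comments, so the commented-out entries above are simply not set and presumably fall back to AutoPhone's defaults. Assuming AutoPhone reads the file with ordinary INI semantics, this Python sketch uses configparser to list just the settings your autophone.ini actually overrides.

```python
# Sketch: list which autophone.ini settings are actually set. Under standard
# INI rules (as implemented by Python's configparser), "#" lines are comments,
# so commented-out options are simply absent from the parsed result.
import configparser
from typing import Dict


def active_settings(ini_text: str) -> Dict[str, str]:
    """Uncommented options in the [settings] section, as strings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["settings"])


if __name__ == "__main__":
    with open("autophone.ini") as f:
        text = f.read()
    for key, value in active_settings(text).items():
        print(f"{key} = {value}")
```

Values come back as strings; configparser's getboolean would turn a value like enable_pulse = True into a Python bool.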

Configure your tests:

Notice that autophone.ini has a test path of tests/manifest.ini. By default, tests/manifest.ini is configured for S1/S2 tests — it points to configs/s1s2_settings.ini. We need to set up that file:

cd configs

cp s1s2_settings.ini.example s1s2_settings.ini

Edit s1s2_settings.ini to make it your own. Here’s mine:

[paths]
#source = files/
#dest = /mnt/sdcard/tests/autophone/s1test/
#profile = /data/local/tmp/profile

[locations]
# test locations can be empty to specify a local
# path on the device or can be a url to specify
# a web server.
local =
remote = http://192.168.0.82:8080/

[tests]
blank = blank.html
twitter = twitter.html

[settings]
iterations = 2
resulturl = http://192.168.0.82:8080/api/s1s2/

[signature]
id =
key =

Be sure to set the resulturl to match your PhoneDash instance.

If running local tests, copy your test files (like blank.html above) to the files directory. If running remote tests, be sure that your test files are served from the remote location; in this example that is the same server as the resulturl (if using PhoneDash, copy them to the html directory).
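A mismatched resulturl means results silently go nowhere useful. This Python sketch is a hypothetical sanity check, not part of AutoPhone: it reads s1s2_settings.ini and confirms that resulturl points at the host and port of your PhoneDash instance (port 8080 is assumed, as in the example above).

```python
# Hypothetical sanity check (not part of AutoPhone): confirm that
# s1s2_settings.ini reports results to your PhoneDash instance.
import configparser
from typing import List
from urllib.parse import urlsplit


def check_s1s2_settings(ini_text: str, phonedash_host: str, phonedash_port: int = 8080) -> List[str]:
    """Return a list of problems; an empty list means resulturl looks right."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    problems = []
    result = urlsplit(parser.get("settings", "resulturl", fallback=""))
    if (result.hostname, result.port) != (phonedash_host, phonedash_port):
        problems.append(f"resulturl points at {result.hostname}:{result.port}, "
                        f"not {phonedash_host}:{phonedash_port}")
    return problems


if __name__ == "__main__":
    with open("configs/s1s2_settings.ini") as f:
        # Replace with the IP address you passed to PhoneDash's server.py.
        for problem in check_s1s2_settings(f.read(), "192.168.0.82"):
            print(problem)
```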

Start autophone:

python autophone.py --config autophone.ini

With these settings, autophone will listen for new builds on mozilla-inbound, and start tests on your device(s) for each one. You should start to see your device reboot, then Firefox will be installed and startup tests will run. As more builds complete on mozilla-inbound, more tests will run.

autophone.py will print some diagnostics to the console, but much more detail is available in autophone.log — watch that to see what’s happening.

Check your phonedash instance for results — visit http://<ip address of your computer>:8080. At first this won’t have any data, but as autophone runs tests, you’ll start to see results. Here’s my instance after a few hours:

[Screenshot: my PhoneDash instance after a few hours]

Firefox for Android Performance Measures – August check-up

My monthly review of Firefox for Android performance measurements. This month’s highlights:

– Eideticker for Android is back!

– small regression in ts_paint

– small improvement in tp4m

– small regression in time to throbber start / stop

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test is not currently run on Android 4.0.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

12 (start of period) – 12 (end of period)

There was a temporary regression in this test for much of the month, but it seems to be resolved now.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 50000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6300 (end of period).

tp4m

Generic page load test. Lower values are better.

940 (start of period) – 850 (end of period).

[Graph: tp4m]

Improvement noted around August 21.

ts_paint

Startup performance test. Lower values are better.

3650 (start of period) – 3850 (end of period).

[Graph: ts_paint]

Note the slight regression around August 12, and perhaps another around August 27 – bug 1061878.

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graph: time to throbber start]

Note the regression in time to throbber start around August 14 — bug 1056176.

[Graph: time to throbber stop]

The same regression, less pronounced, is seen in time to throbber stop.

Eideticker

Eideticker for Android is back after a long rest – yahoo!!

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

[Graphs: Eideticker results (five charts)]

Firefox for Android Performance Measures – July check-up

My monthly review of Firefox for Android performance measurements. This month’s highlights:

– No significant regressions or improvements found!

Talos

This section tracks Perfomatic graphs from graphs.mozilla.org for mozilla-central builds of Firefox for Android, for Talos tests run on Android 4.0 Opt. The test names shown are those used on tbpl. See https://wiki.mozilla.org/Buildbot/Talos for background on Talos.

tcanvasmark

This test is not currently run on Android 4.0.

tcheck2

Measure of “checkerboarding” during simulation of real user interaction with page. Lower values are better.

[Graph: tcheck2]

12 (start of period) – 12 (end of period)

The temporary regression of July 24 was caused by bug 1031107; resolved by bug 1044702.

trobopan

Panning performance test. Value is square of frame delays (ms greater than 25 ms) encountered while panning. Lower values are better.

50000 (start of period) – 50000 (end of period)

tprovider

Performance of history and bookmarks’ provider. Reports time (ms) to perform a group of database operations. Lower values are better.

520 (start of period) – 520 (end of period).

tsvgx

An svg-only number that measures SVG rendering performance. About half of the tests are animations or iterations of rendering. This ASAP test (tsvgx) iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the page load time, or, for animations/iterations – overall duration the sequence/animation took to complete. Lower values are better.

6300 (start of period) – 6300 (end of period).

tp4m

Generic page load test. Lower values are better.

940 (start of period) – 940 (end of period).

ts_paint

Startup performance test. Lower values are better.

3600 (start of period) – 3650 (end of period).

Throbber Start / Throbber Stop

These graphs are taken from http://phonedash.mozilla.org.  Browser startup performance is measured on real phones (a variety of popular devices).

[Graphs: phonedash startup results (three screenshots)]

Eideticker

These graphs are taken from http://eideticker.mozilla.org. Eideticker is a performance harness that measures user perceived performance of web browsers by video capturing them in action and subsequently running image analysis on the raw result.

More info at: https://wiki.mozilla.org/Project_Eideticker

The Eideticker dashboard is slowly coming back to life, but there are still not enough results to show graphs here. We’ll check back at the end of August.

New try aliases “xpcshell” and “robocop”

We now have two new try aliases which will be of interest to some mobile developers.

Android 2.3 tests run xpcshell tests in 3 chunks, which can be specified in a try push:

try: … -u xpcshell-1,xpcshell-2,xpcshell-3

but since all other test platforms run xpcshell as a single chunk, it’s easy to forget about Android 2.3’s chunks and push something like:

try: -b o -p all -u xpcshell -t none

…and then wonder why xpcshell tests didn’t run for Android 2.3!

As of today, a new try alias recognizes “xpcshell” to mean “run all the xpcshell test chunks”.

Similarly, a new try alias recognizes “robocop” to mean “run all the robocop test chunks”.

An example: https://tbpl.mozilla.org/?tree=Try&rev=e52bcf945dcd

[Screenshot: try push using the new xpcshell and robocop aliases]

How convenient!

(Of course, “-u xpcshell-1”, “-u robocop-2,robocop-3”, etc. still work, and you should use them if you only need to run specific chunks.)
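To make the alias behaviour concrete, here is a hypothetical model in Python; it is not the real try-syntax parser. A bare suite name expands to all of its chunks, while an explicit chunk name is passed through unchanged. Only the xpcshell chunk count (3 on Android 2.3) comes from the text above; robocop would need its own chunk count, which is not given here, so the example leaves it out.

```python
# Hypothetical model of the alias behaviour described above; this is NOT the
# real try-syntax parser. A bare suite name expands to all of its chunks;
# an explicit chunk name such as "xpcshell-1" is left alone.
from typing import Dict, List


def expand_suites(u_value: str, chunk_counts: Dict[str, int]) -> List[str]:
    """Expand a comma-separated "-u" value using per-suite chunk counts."""
    expanded: List[str] = []
    for name in (part.strip() for part in u_value.split(",")):
        if name in chunk_counts:
            expanded.extend(f"{name}-{i}" for i in range(1, chunk_counts[name] + 1))
        else:
            expanded.append(name)
    return expanded


if __name__ == "__main__":
    # Android 2.3 runs xpcshell in 3 chunks, per the text above.
    android_23 = {"xpcshell": 3}
    print(expand_suites("xpcshell", android_23))
    print(expand_suites("xpcshell-1", android_23))
```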

Thanks to :Callek and :RyanVM for making this happen.
