Hunting a race condition in the Android 10 Emulator

Keith Johnson, Friday April 16, 2021

How we used the Android emulator to track down a race condition in AOSP that affected the amount of heap space available to apps.

Rainforest supports testing native mobile applications on Android using the official Android emulator from Google. Using emulators instead of real physical devices provides a number of benefits. We can reproduce issues locally (hard to do if you don’t have the same hardware device, but trivial if you can run the same emulator). We get better isolation (no need to wipe a real device or worry about data leaking; we can just throw the entire emulator away and make a new one). We get faster turnaround (we are not limited by the number of physical devices). And we can use some nice debugging features the emulator supports (location spoofing, virtual camera support, etc.).

One of our customers was experiencing periodic crashes of their application when testing in our Android 10 emulator. After some initial investigation, it appeared that their application was exhausting the available heap space, which led to a crash. We found that whenever their application crashed, the log messages showed a heap of only 16 MiB. This was quite surprising to us, since our emulators have ~4 GiB of memory. It was also very odd because further investigation (running adb shell getprop dalvik.vm.heapsize) showed that the heap size should have been 576m. So why was the application crashing after using only 16 MiB?

How Android’s JVM is configured

To get to the bottom of this, we needed to understand how Android’s Java virtual machine (on Android 10, the Android Runtime, or ART) is configured. The short version is that on boot a process called zygote is started. This process launches a JVM and preloads some common Android classes into it. When any application is started, the zygote process is forked and the application starts running on the pre-initialized JVM instance. This saves the cost of initializing a new JVM instance every time an app starts. There is a great post about how all of this works if you’re looking for more detail.
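The preload-then-fork pattern can be sketched in a few lines of C++ with POSIX fork(). This is a toy model, not zygote’s implementation: preloadClasses() and launchApp() are hypothetical names, and a vector of class names stands in for the expensive runtime state. The forked child sees the parent’s already-initialized data without redoing the work, which is the saving zygote is after.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the classes zygote preloads once at boot.
std::vector<std::string> preloadClasses() {
    return {"android.app.Activity", "android.view.View", "android.widget.TextView"};
}

// Fork a child that "runs an app" on top of the already-initialized state.
// Returns the child's exit status, which here is the number of preloaded
// entries it inherited from the parent without loading anything itself,
// or -1 on failure.
int launchApp(const std::vector<std::string>& preloaded) {
    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed
    if (pid == 0) {
        // Child: the preloaded data is already in this process's memory
        // (a copy-on-write copy of the parent's address space).
        _exit(static_cast<int>(preloaded.size()));
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

In zygote, the expensive state is the runtime and its preloaded classes, and the forked child goes on to run the app; in this sketch the child only reports how much preloaded state it inherited.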

The important thing here is that zygote controls the JVM runtime configuration, including the heap size. Since Android is open source, we can look at the source to see how the JVM is configured. In frameworks/base/core/jni/AndroidRuntime.cpp:770 we find:

  /*
   * The default starting and maximum size of the heap.  Larger
   * values should be specified in a product property override.
   */
  parseRuntimeOption("dalvik.vm.heapsize", heapsizeOptsBuf, "-Xmx", "16m");

This argument should look familiar if you’ve worked with Java in the past: -Xmx is the standard way to set the maximum heap size of a Java application. This line of code sets the maximum heap size to the value of the dalvik.vm.heapsize property, with a default of 16m if that property doesn’t exist.
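To make the fallback behavior concrete, here is a simplified C++ model of that call. It assumes a hypothetical PropertyStore (a std::map standing in for Android’s property service) and a hypothetical runtimeOption() helper; the real parseRuntimeOption() works with a caller-supplied buffer (heapsizeOptsBuf above), which this sketch skips.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for Android's system property store.
using PropertyStore = std::map<std::string, std::string>;

// Simplified model of the parseRuntimeOption() call above: read a system
// property and turn it into a runtime flag, falling back to a default when
// the property is unset or empty. Not the AOSP implementation.
std::string runtimeOption(const PropertyStore& props, const std::string& property,
                          const std::string& flag, const std::string& defaultValue) {
    auto it = props.find(property);
    const std::string& value =
        (it != props.end() && !it->second.empty()) ? it->second : defaultValue;
    return flag + value;
}
```

With an empty store, runtimeOption(props, "dalvik.vm.heapsize", "-Xmx", "16m") yields -Xmx16m, the same flag zygote logged; with dalvik.vm.heapsize set to 576m it yields -Xmx576m.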

It also turns out that zygote logs all of its JVM arguments on startup, so we can check which value is actually being set at boot. In our case we were seeing -Xmx16m as one of the arguments:

zygote  : option[0]=-Xzygote
zygote  : option[1]=-Xcheck:jni
zygote  : option[2]=exit
zygote  : option[3]=vfprintf
zygote  : option[4]=sensitiveThread
zygote  : option[5]=-verbose:gc
zygote  : option[6]=-Xms4m
zygote  : option[7]=-Xmx16m
zygote  : option[8]=-Xusejit:true
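When checking many emulator runs, pulling the -Xmx value out of these log lines is easy to automate. A minimal sketch, assuming the option[N]=&lt;flag&gt; format shown above; findMaxHeap() is a hypothetical helper:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Scan zygote startup log lines such as "zygote  : option[7]=-Xmx16m" and
// return the value passed to -Xmx, or std::nullopt if no such option appears.
std::optional<std::string> findMaxHeap(const std::vector<std::string>& logLines) {
    const std::string marker = "=-Xmx";
    for (const auto& line : logLines) {
        auto pos = line.find(marker);
        if (pos != std::string::npos) {
            return line.substr(pos + marker.size());
        }
    }
    return std::nullopt;
}
```

For the log above it returns 16m.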

Finding the race

After the Android emulator finishes booting, we know dalvik.vm.heapsize has the correct value. But we know from the logs that when zygote initializes, the heap size is being set to 16m. This means the property is either 16m at that time, or it’s unset and zygote falls back to the default.

System properties

The init binary loads system properties from a few different files before it starts executing the init scripts (see system/core/init/property_service.cpp:876 for the specific details), but in the case of the Android 10 emulator the only relevant one is /system/build.prop. Checking this file reveals that it does not specify a value for dalvik.vm.heapsize, and searching the filesystem for other property files that set this property comes up empty as well.
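A quick way to repeat that check is to parse the property file and look for the key. This sketch assumes the simple key=value format with # comments used by build.prop; parseBuildProp() is a hypothetical helper that ignores any edge cases the real loader handles.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse build.prop-style "key=value" text into a map, skipping blank lines,
// '#' comments, and lines without an '='. A simplified sketch, not init's loader.
std::map<std::string, std::string> parseBuildProp(const std::string& text) {
    std::map<std::string, std::string> props;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        auto start = line.find_first_not_of(" \t");  // skip leading whitespace
        if (start == std::string::npos || line[start] == '#') continue;
        auto eq = line.find('=', start);
        if (eq == std::string::npos) continue;  // not a key=value line
        props[line.substr(start, eq - start)] = line.substr(eq + 1);
    }
    return props;
}
```

Running it over the contents of /system/build.prop and checking props.count("dalvik.vm.heapsize") reproduces the empty result described above.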

Android Init

In system/core/init/init.cpp:648, init calls a function named process_kernel_cmdline(), which parses the kernel command line and creates Android properties from what it finds there. These properties are created as ro.kernel.&lt;kernel argument&gt;. This way of setting properties is really only useful to the Android emulator, which uses it to give users some settings knobs that affect values inside the emulator. The emulator passes the heap size as the kernel argument qemu.dalvik.vm.heapsize=576m, so this step only sets ro.kernel.qemu.dalvik.vm.heapsize, and we’re still left wondering how dalvik.vm.heapsize eventually gets set to the proper value.
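The export step can be modeled as follows. This is a simplified sketch, not init’s code: importKernelCmdline() is a hypothetical name, tokens are split on whitespace, tokens without an = are skipped, and the command line in the usage example is made up apart from the qemu.dalvik.vm.heapsize argument.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified model of exporting kernel command line arguments on the
// emulator: each "key=value" token becomes the property "ro.kernel.<key>".
// In this sketch, tokens without a key=value shape are ignored.
std::map<std::string, std::string> importKernelCmdline(const std::string& cmdline) {
    std::map<std::string, std::string> props;
    std::istringstream in(cmdline);
    std::string token;
    while (in >> token) {
        auto eq = token.find('=');
        if (eq == std::string::npos || eq == 0) continue;
        props["ro.kernel." + token.substr(0, eq)] = token.substr(eq + 1);
    }
    return props;
}
```

For qemu=1 qemu.dalvik.vm.heapsize=576m this produces ro.kernel.qemu and ro.kernel.qemu.dalvik.vm.heapsize, but no dalvik.vm.heapsize, which is exactly the gap described above.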

Digging into the init scripts, we discover a setprop call in vendor/etc/init/hw/init.ranchu.rc:36-37 that copies the value from the ro.kernel property to the real one. In the main /init.rc we find that zygote is asked to start before this happens. Simplified, the relevant parts of the init scripts look something like this:

# in /init.rc
on late-init
    trigger zygote-start
    trigger boot

on zygote-start
    start zygote

# in vendor/etc/init/hw/init.ranchu.rc
on boot
    setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}

It’s important to understand how these init scripts are executed; thankfully this is described in detail in the AOSP source code. To save you some reading, the important part is that start zygote is not synchronous: init launches the zygote service and moves on without waiting for zygote to finish initializing (and even if it did wait, zygote does not read the system property immediately). The race should now be clear.

The race in action:

  1. Android boots, and eventually the main init binary starts
  2. init parses the kernel cmdline, which sets ro.kernel.qemu.dalvik.vm.heapsize to the correct value the emulator provides (576m)
  3. init starts running the rc init scripts
  4. on late-init runs, which triggers the zygote-start event
  5. on zygote-start runs start zygote, which launches the zygote process asynchronously
  6. on late-init continues running, and eventually triggers boot
  7. If zygote reads the dalvik.vm.heapsize property at this point, the race is lost: the property is unset, and zygote defaults to 16m
  8. on boot from init.ranchu.rc runs, and calls setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}
  9. If zygote reads the dalvik.vm.heapsize property only at this point, the race is won: the property contains the correct value (576m)
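The two possible outcomes can be replayed deterministically in a small C++ model. This is a hypothetical sketch, not AOSP code: PropertyStore, Order, and zygoteHeapFlag() are invented names, and instead of running init and zygote concurrently it simply executes zygote’s read and the setprop in each of the two orders that steps 7 and 9 describe.

```cpp
#include <cassert>
#include <map>
#include <string>

using PropertyStore = std::map<std::string, std::string>;

// The two interleavings from the list above.
enum class Order { ZygoteReadsFirst, SetpropRunsFirst };

// Returns the -Xmx flag zygote would end up with for a given ordering.
std::string zygoteHeapFlag(Order order) {
    // Step 2: init has already exported the kernel command line value.
    PropertyStore props{{"ro.kernel.qemu.dalvik.vm.heapsize", "576m"}};

    // zygote: read dalvik.vm.heapsize, falling back to the 16m default.
    auto readHeap = [&props]() {
        auto it = props.find("dalvik.vm.heapsize");
        return "-Xmx" + (it != props.end() ? it->second : std::string("16m"));
    };
    // init.ranchu.rc: setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}
    auto runSetprop = [&props]() {
        props["dalvik.vm.heapsize"] = props["ro.kernel.qemu.dalvik.vm.heapsize"];
    };

    if (order == Order::ZygoteReadsFirst) {
        std::string flag = readHeap();  // race lost
        runSetprop();
        return flag;
    }
    runSetprop();
    return readHeap();  // race won
}
```

The first ordering yields -Xmx16m and the second -Xmx576m, matching the two outcomes above.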

Making sure we always win the race

Now that we understand what is going wrong, we need to figure out a fix. Fortunately it ends up being very simple in our case, since we have root inside the emulator and can change any files we like: just statically configure dalvik.vm.heapsize in /system/build.prop instead of relying on it to come through as a kernel command line argument. build.prop is loaded early in the init process, so the value is present before zygote launches and needs it. It’s important to make sure that this value matches the value the emulator sets on the kernel command line. Otherwise the setprop in init.ranchu.rc can still overwrite it with a different value, and the heap size zygote sees will again depend on which side wins the race.
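Why the two values must match can be checked with the same kind of model. Again a hypothetical sketch with invented names (heapBeforeAndAfterSetprop(), isRaceFree()): it seeds dalvik.vm.heapsize as if it came from /system/build.prop, then compares what zygote would read before and after the init.ranchu.rc setprop runs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using PropertyStore = std::map<std::string, std::string>;

// With the fix, dalvik.vm.heapsize already exists when init scripts start.
// Returns the value zygote would see if it read the property before the
// init.ranchu.rc setprop, and the value it would see afterwards.
std::pair<std::string, std::string> heapBeforeAndAfterSetprop(
        const std::string& buildPropValue, const std::string& kernelValue) {
    PropertyStore props{
        {"dalvik.vm.heapsize", buildPropValue},              // from /system/build.prop
        {"ro.kernel.qemu.dalvik.vm.heapsize", kernelValue},  // from the kernel cmdline
    };
    std::string before = props.at("dalvik.vm.heapsize");
    // setprop dalvik.vm.heapsize ${ro.kernel.qemu.dalvik.vm.heapsize}
    props["dalvik.vm.heapsize"] = props.at("ro.kernel.qemu.dalvik.vm.heapsize");
    std::string after = props.at("dalvik.vm.heapsize");
    return {before, after};
}

// The outcome no longer depends on the race only if both orderings agree.
bool isRaceFree(const std::string& buildPropValue, const std::string& kernelValue) {
    auto [before, after] = heapBeforeAndAfterSetprop(buildPropValue, kernelValue);
    return before == after;
}
```

isRaceFree("576m", "576m") is true, while a build.prop value of 512m against a kernel value of 576m is not: zygote would then get one heap size or the other depending on the ordering.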
