
U-Boot Concept now has built-in memory-leak detection that can be enabled with a single flag. It snapshots the heap before each test and reports any new allocations left behind afterwards, with full caller backtraces showing exactly where the leaked memory was allocated.

The problem

U-Boot’s driver model creates and destroys hundreds of devices during each test. Every device_bind(), uclass_get() and device_probe() call allocates memory, and every device_unbind() and uclass_destroy() must free all of it again. A single missing free() is easy to introduce and hard to spot in review.

Before this work, the only way to find leaks was to manually add malloc_dump_to_file() calls, run the test, and diff the output. This was tedious enough that it rarely happened.

The solution (for sandbox)

Three new entry points make leak detection automatic:

ut -L (U-Boot command line):

    => ut -L dm dm_test_acpi_bgrt
    Test: acpi_bgrt: acpi.c
    Leak: 2 allocs
      14a5c5c0 110 stdio_clone:230 <- stdio_register_dev:244 <-vidconsole_post_probe:961
      14a5c6d0  b0 map_to_sysmem:210 <-video_post_probe:823 <-device_probe:589

um t --leak-check (uman tool):

    $ um t --leak-check dm
      550/563: 550 passed, 0 failed, 0 skipped, 2 leaked (448)
    Top leaks:
        448  acpi_bgrt
               272  stdio_clone:230 <-stdio_register_dev:244 <-vidconsole_post_probe:961
               176  map_to_sysmem:210 <-video_post_probe:823 <-device_probe:589

test.py --leak-check:

    $ test/py/test.py -B sandbox --leak-check -k dm
    1234 passed, 159 skipped
    Results: 391 leaked (10.3M)
    Top leaks:
       2.0M  bootflow_cmdline
               736  abuf_realloc:81 <-fs_read_alloc:1277 <-bootmeth_alloc_file:371
            24x 2.0M  sandbox_mmc_probe:194 <-device_probe:584 ...

Each leaked allocation is printed with its heap address, chunk size, and the caller backtrace captured at malloc() time (when CONFIG_MCHECK_HEAP_PROTECTION is enabled, which is the default for sandbox).

malloc leak (interactive command):

    => malloc leak start
    Heap snapshot: 974 allocs
    => setenv foo bar
    => malloc leak end
      14a2a9a0 90 sandbox_strdup:353 <-hsearch_r:403 <-env_do_env_set:130
      14a2aa30 90 sandbox_strdup:353 <-hsearch_r:403 <-env_do_env_set:130
    2 leaked allocs

The malloc leak command is useful for investigating leaks interactively. Use malloc leak (without arguments) to check the count without releasing the snapshot, so you can keep narrowing down.

How it works

The implementation snapshots every in-use heap chunk address at the start of each test and compares after cleanup:

  1. Snapshot: malloc_leak_check_start() walks the dlmalloc heap and records every in-use chunk address in an os_malloc()-allocated array. Using host memory on sandbox avoids perturbing the heap being measured. On other boards, the snapshot could potentially be kept in a separate memory region instead.
  2. Check: malloc_leak_check_end() walks the heap again and reports any chunk whose address was not in the snapshot. The addresses are sorted, so lookup uses binary search (O(log n) per chunk).
  3. Caller info: For each leaked chunk, the mcheck header at the start of the chunk is read directly (bypassing the fixed-size mcheck registry, which can overflow in long sessions). The canary is validated before trusting the caller string to avoid printing garbage from chunks allocated while mcheck was disabled.

Leaks found and fixed

The initial scan of the driver model test suite found 90 leaking test runs across 7 distinct bugs:

| Subsystem | Bug | Bytes |
|---|---|---|
| SCMI | Output buffer not freed on success path in scmi_base_discover_list_protocols_int() | 160 |
| PMIC | Sandbox I2C PMIC emulator has no remove handler to free register buffer | 160 |
| SPI | strdup() name in sandbox_sf_bind_emul() not marked with DM_FLAG_NAME_ALLOCED | 160 |
| PCI | hose->regions array allocated in decode_regions() but never freed on device removal | 256 |
| ACPI DP | free_context() missing free(ctx->base); two tests missing free_context() calls | 640+ |
| ACPI items | acpi_reset_items() not freeing item->buf allocations; three tests missing free(buf) | 4K |
| Video | show_splash() leaking map_to_sysmem() mapping | 176 |

After fixing these, the DM test suite went from 90 leaking test runs down to 2 (the remaining acpi_bgrt leak requires changes to the stdio/console multiplexer and needs another look).

Getting started

To scan for leaks in your subsystem:

# Quick scan with uman
$ um t --leak-check -V <suite>

# Full pytest run
$ test/py/test.py -B sandbox --leak-check -k <test_pattern>

# Interactive investigation
=> malloc leak start
=> <your commands here>
=> malloc leak end

The caller backtraces point directly to the allocation site, so in most cases you can identify and fix the leak without any additional debugging.

See the documentation for the full API reference and workflow guide.

Author

  • Simon Glass is a primary author of U-Boot, with around 10K commits. He is maintainer of driver model and various other subsystems in U-Boot.
