The problem

U-Boot’s pytest suite runs hundreds of tests in a single sandbox session. Each test sends commands to a long-running U-Boot process, so state left behind by one test can break those that follow. CI runs all tests in collection order and that is the only configuration known to work. When tests are run in subsets or different orders, mysterious failures can appear: wrong addresses, corrupted EFI logs, exhausted bootstage tables and out-of-memory errors from bzip2.

These failures are hard to reproduce because they depend on test ordering and on which earlier test polluted the state. A new series in Concept aims to help with all of this.

Hunting the bugs

The first step was to add a --malloc-dump option to sandbox. This writes the dlmalloc heap to a file on exit, showing every allocation with its caller backtrace. Comparing dumps from different points in a session quickly reveals leaked allocations:

$ diff /tmp/before.txt /tmp/after.txt

If the filename contains %d, it is replaced with a sequence number that increments on each U-Boot restart. Each restart then produces its own dump, making it easy to see how the heap evolves across a session:

test/py/test.py -B sandbox --malloc-dump /tmp/heap%d.txt

To make this work reliably, sandbox_exit() was updated to accept an exit code, and the -c command path was changed to call it instead of os_exit() directly. This ensures state_uninit() always runs and the dump is always written.
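When there are many numbered dumps, the comparison can also be scripted. The following is only a minimal sketch: it assumes each allocation appears as one line of text in the dump (the actual dump format is not shown in this post), and the names new_lines() and compare_dumps() are hypothetical helpers, not part of U-Boot.

```python
import difflib


def new_lines(before, after):
    """Return the lines that appear in `after` but not in `before`.

    Both arguments are lists of strings, one per dump line. This is the
    scripted equivalent of looking at the '+' side of a diff.
    """
    diff = difflib.ndiff(before, after)
    # ndiff prefixes added lines with '+ '; strip that two-character prefix
    return [line[2:] for line in diff if line.startswith('+ ')]


def compare_dumps(path_before, path_after):
    """Read two dump files and return lines only present in the later one."""
    with open(path_before) as fh:
        before = fh.read().splitlines()
    with open(path_after) as fh:
        after = fh.read().splitlines()
    return new_lines(before, after)
```

Lines returned by compare_dumps() for two consecutive dumps are candidates for leaked allocations. If the dump includes values that vary between runs, such as raw addresses, they would likely need to be normalised first.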

What we found

Memory leaks – bootflow_scan() leaked ~1.2 MB per scan because failed bootflows were not freed before retrying. The malloc_fill_pool test leaked the entire 120 MB pool when an assertion failed before freeing.

Stale environment variables – PXE tests changed kernel_addr_r, ramdisk_addr_r and fdt_addr_r without restoring them. The source test changed image_load_addr via loadaddr. The log format test left a non-default format active.

EFI log pollution – The host load command calls efi_set_bootdev(), which logs EFI pool operations. When bootflow_efi() ran later, it found stale free_pool(NULL) entries that failed validation.

Bootstage exhaustion – Each bootm call adds ~19 unique bootstage records. With only 50 slots and no cleanup between tests, the table filled after a few dozen FIT tests. Later tests then saw the message "Bootstage space exhausted" in the console output, which broke console assertions.

The fixes

Each fix follows the same principle: the test that creates the mess should clean it up, or the framework should prevent accumulation.

bootflow_efi resets the EFI log at the start so it only checks its own entries.
extlinux tests restore default address env vars before scanning.
test_source sets loadaddr to match CONFIG_SYS_LOAD_ADDR and restores the FDT pointer at the end.
bootstage save and bootstage restore commands let the test framework snapshot and restore the record count. A preserve_bootstage() context manager wraps FIT tests that trigger bootm.
restart marker – tests that restart U-Boot are marked with @pytest.mark.restart so they can be skipped with -k 'not restart' when debugging under GDB.
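
The save/restore pattern behind preserve_bootstage() can be sketched as below. This is not the actual implementation from the series: the signature and the console argument (any object with a run_command() method that sends a command to U-Boot) are assumptions made for illustration. Only the two command strings come from the description above.

```python
from contextlib import contextmanager


@contextmanager
def preserve_bootstage(console):
    """Snapshot the bootstage record count, then restore it on exit.

    `console` is assumed to be an object with a run_command(cmd) method;
    the real helper may take different arguments.
    """
    console.run_command('bootstage save')
    try:
        yield
    finally:
        # Runs even if the test body raises, so a failing test does not
        # leave extra bootstage records behind for later tests
        console.run_command('bootstage restore')
```

A test would wrap its bootm call in "with preserve_bootstage(console):". The try/finally is the important design choice: the restore happens even when an assertion fails inside the block, which is exactly the case where cleanup is most often skipped.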

Tooling added

A new --malloc-dump option for both sandbox and pytest writes the dlmalloc heap to a file on exit. The filename supports %d which is replaced with a sequence number that increments on each U-Boot restart, producing a separate dump for each session.

To make this work reliably, sandbox_exit() now accepts an exit code and the -c command path uses it instead of calling os_exit() directly. This ensures state_uninit() always runs and the dump is always written.

New bootstage save and bootstage restore subcommands (behind CONFIG_BOOTSTAGE_SAVE) let the test framework snapshot and restore the bootstage record count. A preserve_bootstage() context manager in test/py/utils.py wraps this for Python tests that trigger bootm and other bootstage-heavy commands.

Lessons

State leaks are inevitable in a long-running process. The test framework needs save/restore hooks for global state, not just DM reinit.
Dumps beat debuggers for heap issues. Comparing two heap dumps instantly shows what leaked, while stepping through thousands of alloc/free calls is impractical.
Test ordering matters. CI runs tests in collection order, so that is the only ordering known to work. A test that passes alone but fails in a full session is almost always a side-effect bug. The um py --pollute <test> option automates a binary search to find which earlier test causes the failure.
Clean up after yourself – or better, make the framework do it. The bootstage save/restore is a good example: individual tests don’t need to know about it.
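
The binary search for a polluting test can be sketched as follows. This is an illustration of the general idea, not how um py --pollute is implemented. The function name find_polluter() and the fails_after callback are hypothetical. The sketch assumes the target test passes when run alone, that a single earlier test causes the failure, and that once that test is included in the prefix the target keeps failing.

```python
def find_polluter(earlier_tests, fails_after):
    """Find the earlier test whose side effects break the target test.

    earlier_tests: tests that run before the target, in collection order.
    fails_after(prefix): hypothetical callback that runs `prefix` followed
        by the target in a fresh session and returns True if the target
        fails.

    Returns the culprit, or None if the target does not fail even after
    all earlier tests have run.
    """
    if not fails_after(earlier_tests):
        return None

    # Invariant: the target passes after earlier_tests[:lo] and
    # fails after earlier_tests[:hi]
    lo, hi = 0, len(earlier_tests)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_after(earlier_tests[:mid]):
            hi = mid
        else:
            lo = mid

    # The last test in the shortest failing prefix is the culprit
    return earlier_tests[hi - 1]
```

With N earlier tests this needs roughly log2(N) sessions instead of N, which matters when each session means restarting U-Boot and replaying part of the suite.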

Author

Simon Glass

Simon Glass is a primary author of U-Boot, with around 10K commits. He is maintainer of driver model and various other subsystems in U-Boot.

Fixing Pytest Inter-Test Side Effects