Introducing Pickman: AI-Powered Cherry-Pick Management for U-Boot

Managing cherry-picks across multiple branches is one of the more tedious aspects of maintaining a large project like U-Boot. When you need to backport dozens of commits from an upstream branch while handling merge commits, resolving conflicts, and creating merge requests, the process can consume hours of developer time.

Today we’re introducing pickman, a new tool in the U-Boot Concept tree that automates cherry-pick workflows using AI assistance. Pickman combines database tracking, GitLab integration, and the Claude Agent SDK to transform what was once a manual, error-prone process into a streamlined, largely automated workflow.

The Problem

U-Boot maintainers regularly need to cherry-pick commits from upstream branches (like us/next) to integration branches. This involves:

  • Identifying which commits haven’t been cherry-picked yet
  • Handling merge commits that group related changes
  • Resolving conflicts when commits don’t apply cleanly
  • Creating merge requests for review
  • Tracking which commits have been processed
  • Responding to review comments on merge requests

With hundreds of commits to process, this becomes a significant time investment. Pickman aims to reduce this burden dramatically.

How Pickman Works

Database Tracking

Pickman maintains a local SQLite database (.pickman.db) that tracks:

  • Source branches being monitored
  • The last cherry-picked commit for each source
  • Individual commit status (pending, applied, conflict, skipped)
  • Merge requests created and their status
  • Processed review comments

This persistent state allows pickman to resume work across sessions and avoid re-processing commits that have already been handled.
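
For illustration, here is a rough sketch of the kind of schema such a database might use (the actual table layout in pickman may differ):

import sqlite3

# Hypothetical schema, mirroring the items tracked above
SCHEMA = """
CREATE TABLE IF NOT EXISTS source (
    name        TEXT PRIMARY KEY,   -- e.g. 'us/next'
    last_commit TEXT                -- last cherry-picked commit
);
CREATE TABLE IF NOT EXISTS commits (
    hash   TEXT PRIMARY KEY,
    source TEXT REFERENCES source(name),
    status TEXT                     -- pending / applied / conflict / skipped
);
CREATE TABLE IF NOT EXISTS merge_request (
    iid    INTEGER PRIMARY KEY,     -- GitLab MR number
    branch TEXT,
    state  TEXT                     -- open / merged / closed
);
"""

with sqlite3.connect('.pickman.db') as db:
    db.executescript(SCHEMA)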

AI-Powered Cherry-Picking

The core innovation in pickman is its use of the Claude Agent SDK to handle the actual cherry-pick operations. When you run pickman apply, the tool:

  1. Identifies the next set of commits to cherry-pick (typically grouped by merge commit)
  2. Creates a new branch for the work
  3. Invokes Claude to perform the cherry-picks
  4. Claude handles any conflicts that arise, using its understanding of the codebase
  5. Records the results in the database
  6. Optionally pushes and creates a GitLab merge request

The AI agent can resolve many common conflicts automatically, understanding context like renamed files, moved code, and API changes. When it encounters something it can’t resolve, it reports the issue clearly.
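
To make step 3 more concrete, here is a very rough sketch of what invoking the agent could look like, assuming the Python SDK's query() entry point and ClaudeAgentOptions; pickman's actual prompt, options and result handling will differ:

import asyncio
from claude_agent_sdk import ClaudeAgentOptions, query

async def cherry_pick(commits, workdir):
    # Hypothetical prompt; the real one would include conflict guidance
    prompt = ('Cherry-pick these commits in order, resolving any '
              'conflicts: ' + ' '.join(commits))
    options = ClaudeAgentOptions(cwd=workdir,
                                 permission_mode='acceptEdits',
                                 allowed_tools=['Bash', 'Read', 'Edit'])
    async for message in query(prompt=prompt, options=options):
        print(message)  # this stream becomes the conversation log

asyncio.run(cherry_pick(['a1b2c3d', 'd4e5f6a'], '/path/to/u-boot'))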

GitLab Integration

Pickman integrates directly with GitLab to:

  • Push branches and create merge requests automatically
  • Monitor MR status (open, merged, closed)
  • Fetch and process review comments
  • Update the database when MRs are merged

This creates a closed loop where pickman can operate continuously, creating an MR, waiting for it to be reviewed and merged, then moving on to the next batch of commits.
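
As a sketch of the GitLab side, the python-gitlab calls involved look roughly like this (the project path, branch name and title below are illustrative, not pickman's actual values):

import os
import gitlab

gl = gitlab.Gitlab('https://gitlab.com',
                   private_token=os.environ['GITLAB_TOKEN'])
project = gl.projects.get('example/u-boot')

# Create an MR for a pushed cherry-pick branch
mr = project.mergerequests.create({
    'source_branch': 'cherry-a1b2c3d',
    'target_branch': 'master',
    'title': '[pickman] arm: Fix cache alignment issue',
})
print(mr.web_url)

# Later, poll its state and fetch review comments
mr = project.mergerequests.get(mr.iid)
print(mr.state)                          # e.g. 'opened' or 'merged'
for note in mr.notes.list():
    print(note.author['username'], note.body)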

Getting Started

Prerequisites

# Install the Claude Agent SDK
pip install claude-agent-sdk

# Install the GitLab API library
pip install python-gitlab

# Set up your GitLab token (or use ~/.config/pickman.conf)
export GITLAB_TOKEN="your-token-here"

Basic Usage

# Add a source branch to track
pickman add-source us/next abc123def

# See what commits are pending
pickman next-set us/next

# Apply the next batch of commits
pickman apply us/next --push

# Check comments on open merge requests, taking action as needed
pickman review --remote ci

# Run continuously until stopped
pickman poll us/next --interval 300

Key Commands

Command        Description
add-source     Register a new source branch to track
compare        Show differences between branches
next-set       Preview the next commits to be cherry-picked
apply          Cherry-pick commits using Claude
step           Process merged MRs and create new ones
poll           Run step continuously at an interval
review         Check MRs and handle review comments
count-merges   Show how many merges remain to process

The Workflow in Practice

A typical workflow using pickman looks like this:

# Initial setup - track the upstream branch starting from a known commit
$ pickman add-source us/next e7f94bcbcb0

# See how much work there is
$ pickman count-merges us/next
Found 47 merge commits to process

# Preview what's coming next
$ pickman next-set us/next
Next 3 commits to cherry-pick from us/next:
  - a1b2c3d: arm: Fix cache alignment issue
  - d4e5f6a: drivers: gpio: Add new driver
  - b7c8d9e: Merge "ARM improvements"

# Let pickman handle it
$ pickman apply us/next --push --remote ci --target master
Creating branch cherry-a1b2c3d...
Cherry-picking 3 commits...
Pushing to ci...
Creating merge request...
MR created: https://gitlab.com/project/-/merge_requests/123

# Later, process any review comments and continue
$ pickman poll us/next --remote ci --interval 300

The Human Review Loop

While pickman automates much of the cherry-pick process, human oversight remains central to the workflow. The tool is designed to work alongside maintainers, not replace them. Here’s how the review cycle works:

1. MR Creation

When pickman creates a merge request, it includes a detailed description with the source branch, list of commits, and a conversation log showing how Claude handled the cherry-picks. This gives reviewers full visibility into what happened.

2. Human Review

Maintainers review the MR just like any other contribution. They can:

  • Examine the diff to verify the cherry-pick is correct
  • Check that conflicts were resolved appropriately
  • Request changes by leaving comments on specific lines
  • Ask questions about why certain decisions were made

3. Automated Comment Handling

When reviewers leave comments, pickman’s review command detects them and invokes Claude to address the feedback:

$ pickman review --remote ci
Found 1 open pickman MR(s):
  !123: [pickman] arm: Fix cache alignment issue
Processing comments for MR !123...
  Comment from maintainer: "This variable name should match the upstream style"
  Addressing comment...
  Pushing updated branch...

Claude reads the comment, understands the requested change, modifies the code accordingly, and pushes an updated branch. The merge-request notes are updated with the new transcript. The database tracks which comments have been processed to avoid duplicate work.

4. Iteration

The review cycle can repeat multiple times. Each time a reviewer adds new comments, running pickman review or pickman poll will detect and address them. This continues until the reviewer is satisfied with the changes.

5. Approval and Merge

Once the maintainer is happy with the MR, he or she approves and merges it through GitLab’s normal interface. Pickman detects the merged status on its next step or poll cycle:

$ pickman step us/next --remote ci --target master
Checking for merged MRs...
  MR !123 has been merged - updating database
  Source us/next: abc123 -> def456
Creating next MR...
  Cherry-picking commits from def456...
  MR created: https://gitlab.com/project/-/merge_requests/124

The database is updated to record the new “last processed” commit, and pickman automatically moves on to the next batch of commits.

The Continuous Loop

With pickman poll, this entire cycle runs continuously:

$ pickman poll us/next --remote ci --target master --interval 300
Polling every 300 seconds (Ctrl+C to stop)...

[09:00] Checking for merged MRs... none found
[09:00] Checking for review comments... none found
[09:00] Open MR !123 pending review
[09:00] Sleeping 300 seconds...

[09:05] Checking for merged MRs... none found
[09:05] Checking for review comments... 2 new comments on !123
[09:05] Addressing comments...
[09:05] Pushed updates to MR !123
[09:05] Sleeping 300 seconds...

[09:10] Checking for merged MRs... !123 merged!
[09:10] Updated database: us/next now at def456
[09:10] Creating new MR for next commits...
[09:10] MR !124 created
[09:10] Sleeping 300 seconds...

The maintainer simply reviews each MR as it appears, adds comments when needed, and merges when satisfied. Pickman handles everything else automatically, creating a smooth continuous integration pipeline for cherry-picks. If manual intervention is needed, you can typically just make some edits and push an update to the branch.

Handling Merge Commits

One of pickman’s key features is intelligent handling of merge commits. Rather than cherry-picking merge commits directly (which often fails), pickman identifies the individual commits within a merge and processes them as a group. This ensures that related changes stay together in a single merge request.

The tool follows the first-parent chain to identify merge boundaries, which matches the typical workflow of merging topic branches into the main development branch.
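
A simplified sketch of that grouping logic, using plain git commands (pickman's real implementation may differ in detail):

import subprocess

def next_merge_group(upstream, last_done):
    """Return the commits brought in by the next merge on the
    first-parent chain after last_done, followed by the merge itself."""
    def rev_list(*args):
        out = subprocess.run(['git', 'rev-list', *args],
                             capture_output=True, text=True, check=True)
        return out.stdout.split()

    # Merge commits on the first-parent chain, oldest first
    merges = rev_list('--first-parent', '--merges', '--reverse',
                      f'{last_done}..{upstream}')
    if not merges:
        return []
    merge = merges[0]
    # Commits reachable from the merge's second parent but not its first:
    # the topic branch that was merged
    return rev_list('--reverse', f'{merge}^1..{merge}^2') + [merge]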

History Tracking

Pickman maintains a .pickman-history file in the repo that records each cherry-pick operation, including:

  • Date and source branch
  • Branch name created
  • List of commits processed
  • The full conversation log with Claude

This provides an audit trail and helps with debugging when things don’t go as expected.

Future Directions

Pickman is an experimental tool. We will use the next few months to refine it and learn how best to evolve it.

Try It Out

Pickman is available in the U-Boot Concept tree under tools/pickman/. To run the tests:

$ ./tools/pickman/pickman test

We welcome feedback and contributions. If you maintain a branch that requires regular cherry-picks from upstream, give pickman a try and let us know how it works for your workflow.

Pickman was developed with significant assistance from Claude (Anthropic’s AI assistant) for both the implementation and the cherry-pick automation itself.




    The Best of Both Worlds: Hybrid Python/C Testing in U-Boot

    U-Boot has two testing worlds that rarely meet. Python tests are flexible and can set up complex scenarios – disk images, network configurations, boot environments. C tests are fast, debuggable, and run directly on hardware. What if we could combine them?

    The Problem

    Consider filesystem testing. You need to:

    1. Create a disk image with specific files
    2. Calculate MD5 checksums for verification
    3. Mount it in U-Boot
    4. Run read/write operations
    5. Verify the results

    The Python test framework handles steps 1-2 beautifully. But the actual test logic in Python looks like this:

    output = ubman.run_command(f'{cmd}load host 0:0 {addr} /{filename}')
    assert 'complete' in output
    output = ubman.run_command(f'md5sum {addr} {hex(size)}')
    assert expected_md5 in output

    String parsing. Hoping the output format doesn’t change. No stepping through with a debugger when it fails (well actually it is possible, but it requires gdbserver). And try running this on real hardware without a console connection.

    The Solution: Pass Arguments to C

    What if Python could call a C test with parameters?

    cmd = f'ut -f fs fs_test_load_norun fs_type={fs_type} fs_image={path} md5={expected}'
    ubman.run_command(cmd)

    And in C:

    static int fs_test_load_norun(struct unit_test_state *uts)
    {
        const char *fs_type = ut_str(0);
        const char *fs_image = ut_str(1);
        const char *expected_md5 = ut_str(2);
    
        ut_assertok(fs_set_blk_dev("host", "0", fs_type));
        ut_assertok(fs_read("/testfile", addr, 0, 0, &actread));
        ut_assertok(verify_md5(uts, expected_md5));
    
        return 0;
    }

    Real assertions. Real debugging. Real portability.

    How It Works

    1. Declare Arguments with Types

    UNIT_TEST_ARGS(fs_test_load_norun, UTF_CONSOLE | UTF_MANUAL, fs,
                   { "fs_type", UT_ARG_STR },
                   { "fs_image", UT_ARG_STR },
                   { "md5", UT_ARG_STR });

    The UNIT_TEST_ARGS macro creates the test with argument definitions. Each argument has a name and type (UT_ARG_STR or UT_ARG_INT).

    2. Parse on the Command Line

    => ut -f fs fs_test_load_norun fs_type=ext4 fs_image=/tmp/test.img md5=abc123

    The ut command parses name=value pairs and populates uts->args[].

    3. Access in C

    const char *fs_type = uts->args[0].vstr;    // String access
    int count = uts->args[1].vint;              // Integer access

    Arguments are accessed by index in declaration order.

    A Real Example: Filesystem Tests

    Here’s the before and after for a filesystem size test.

    Before (Pure Python):

    def test_fs3(self, ubman, fs_obj_basic):
        fs_type, fs_img, _ = fs_obj_basic
        ubman.run_command(f'host bind 0 {fs_img}')
        output = ubman.run_command(f'{fs_type}size host 0:0 /{BIG_FILE}')
        ubman.run_command('printenv filesize')
        # Parse output, check values, hope nothing changed...

    After (Hybrid):

    def test_fs3(self, ubman, fs_obj_basic):
        fs_type, fs_img, _ = fs_obj_basic
        assert run_c_test(ubman, fs_type, fs_img, 'fs_test_size_big',
                          big=BIG_FILE)

    static int fs_test_size_big_norun(struct unit_test_state *uts)
    {
        const char *big = ut_str(2);
        loff_t size;
    
        ut_assertok(fs_size(big, &size));
        ut_asserteq_64((loff_t)SZ_1M * 2500, size);
    
        return 0;
    }

    The Python test is now 4 lines. The C test has real assertions and can easily be debugged.
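
    The run_c_test() helper used above isn't shown in this post; here is a minimal sketch of what it might look like, reusing the host bind / ut command sequence from earlier (the real helper may differ):

    def run_c_test(ubman, fs_type, fs_img, test_name, **kwargs):
        """Bind the image, run a C unit test, and report success."""
        ubman.run_command(f'host bind 0 {fs_img}')
        args = ' '.join(f'{key}={val}' for key, val in kwargs.items())
        ubman.run_command(f'ut -f fs {test_name} '
                          f'fs_type={fs_type} fs_image={fs_img} {args}')
        # Use the command's return code, as in 'Check the result' below
        return ubman.run_command('echo $?').strip() == '0'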

    The Private Buffer

    Tests often need temporary storage – paths, formatted strings, intermediate results. Rather than allocating memory or using globals, each test gets a 256-byte private buffer:

    static int my_test(struct unit_test_state *uts)
    {
        // Build a path using the private buffer
        snprintf(uts->priv, sizeof(uts->priv), "/%s/%s", dir, filename);
    
        ut_assertok(fs_read(uts->priv, addr, 0, 0, &size));
    
        return 0;
    }

    No cleanup needed. The buffer is part of unit_test_state and exists for the life of each test.

    Why Not Just Write Everything in C?

    You could. But consider:

    • Creating a 2.5GB sparse file with specific content: Python’s os and subprocess modules make this trivial
    • Calculating MD5 checksums: One line in Python
    • Setting up complex boot environments: Python’s pytest fixtures handle dependencies elegantly
    • Parameterized tests: pytest’s @pytest.mark.parametrize runs the same test across ext4, FAT, exFAT automatically

    The hybrid approach uses each language for what it does best.

    Why Not Just Write Everything in Python?

    • Debugging: GDB beats print statements
    • Hardware testing: C tests run on real boards (and sandbox) without console parsing
    • Speed: No string-parsing overhead; less back-and-forth across the Python->U-Boot console
    • Assertions: ut_asserteq() gives precise failure locations
    • Code coverage: C tests contribute to coverage metrics (once we get them!)

    Getting Started

    1. Declare your test with arguments:

    static int my_test_norun(struct unit_test_state *uts)
    {
        const char *input = ut_str(0);
        int expected = ut_int(1);
    
        // Your test logic here
        ut_asserteq(expected, some_function(input));
    
        return 0;
    }
    UNIT_TEST_ARGS(my_test_norun, UTF_CONSOLE | UTF_MANUAL, my_suite,
                   { "input", UT_ARG_STR },
                   { "expected", UT_ARG_INT });

    2. Call from Python:

    def test_something(self, ubman):
        ubman.run_command(f'ut -f my_suite my_test_norun input={value} expected={result}')

    3. Check the result:

        output = ubman.run_command('echo $?')
        assert output.strip() == '0'

    The Documentation

    Full details are in the documentation. The filesystem tests in test/fs/fs_basic.c and test/py/tests/test_fs/test_basic.py serve as a complete working example.

    This infrastructure was developed to convert U-Boot’s filesystem tests from pure Python to a hybrid model. The Python setup remains, but the test logic now lives in debuggable, portable C code.




    When -858993444 Tests Run: A Tale of Linker Lists and Magic Numbers

    Have you ever seen output like this from your test suite?

    Running -858993444 bloblist tests

    That’s not a buffer overflow or memory corruption. It’s a weird interaction between linker alignment, compiler optimisations, and pointer arithmetic. Let me tell you how we tracked it down.

    The Mystery

    U-Boot uses ‘linker lists’ extensively – a pattern where the linker collects scattered data structures into contiguous arrays. Drivers, commands, and unit tests all use this mechanism. Each list has start (_1) and end (_3) markers, with entries (_2_*) in between.

    To count entries, we use pointer subtraction:

    #define ll_entry_count(_type, _list) \
        (ll_entry_end(_type, _list) - ll_entry_start(_type, _list))

    Simple, right? The compiler divides the byte span by sizeof(struct) to get the count. Except sometimes it returns garbage.

    The Clue: 0xCCCCCCDC

    That -858993444 value caught my eye. In hex, it’s 0xCCCCCCDC – a suspiciously regular pattern. This isn’t random memory; it’s the result of a calculation.

    GCC optimizes division by constants using multiplicative inverses. Instead of expensive division, it multiplies by a “magic number” and shifts. For dividing by 40 (a typical struct size), GCC generates something like:

    movl    %eax, %edx
    imulq   $-858993459, %rdx    ; magic number for /40
    shrq    $34, %rdx

    This optimization is mathematically correct – but only when the dividend is an exact multiple of the divisor. When it’s not, you get rubbish.

    The Root Cause

    U-Boot’s CONFIG_LINKER_LIST_ALIGN (32 bytes on sandbox) aligns each list’s start. But here’s the subtle bug: the end marker was also being aligned:

    #define ll_entry_end(_type, _list)                    \
    ({                                                    \
        static char end[0] __aligned(32)      /* <-- Problem! */ \
            __attribute__((used))                         \
            __section(".u_boot_list_"#_list"_3");         \
        (_type *)&end;                                    \
    })

    When the next list in memory has a higher alignment requirement, the linker inserts padding before our end marker. Our list might have 15 entries of 40 bytes each (600 bytes), but the span from start to end becomes 608 bytes due to padding.

    608 / 40 = 15.2

    Feed 608 into GCC’s magic-number division for 40, and out comes 0xCCCCCCDC – which, printed as a signed integer, is the -858993444 from our test output.
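
    Here is the arithmetic worked through in a few lines of Python. This assumes the compiler used the exact-division sequence for /40 (shift right by 3, then multiply by the modular inverse of 5, 0xCCCCCCCD, modulo 2^32) – a sequence that is only valid for exact multiples of 40:

    INV5 = 0xCCCCCCCD                  # 5 * 0xCCCCCCCD == 1 (mod 2**32)

    def div40(span):
        """Exact division by 40: shift by 3, multiply by inverse of 5."""
        return ((span >> 3) * INV5) & 0xFFFFFFFF

    print(div40(600))                  # 15 entries of 40 bytes: correct
    bad = div40(608)                   # padded span: not a multiple of 40
    print(bad - (1 << 32))             # -858993444: the garbage count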

    The Fix

    Change the end marker alignment to 1:

    static char end[0] __aligned(1)  // No padding before end marker

    Now the end marker sits immediately after the last entry, the span is an exact multiple, and pointer arithmetic works correctly.

    Detection

    We enhanced our check_linker_lists.py script to catch this:

    1. Gap analysis: Compare gaps between consecutive symbols. Inconsistent gaps mean padding was inserted within the list.
    2. Size comparison: Using nm -S to get symbol sizes, we check if gap > size. If so, padding exists.
    3. Span verification: Check if (end - start) is a multiple of struct size. If not, pointer subtraction will fail.

    ./scripts/check_linker_lists.py u-boot -v
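
    For example, the span check boils down to something like this (a simplified sketch; the real script also does the gap analysis and uses nm -S for symbol sizes):

    import subprocess

    def span_is_exact(elf, start_sym, end_sym, entry_size):
        """Check that (end - start) is an exact multiple of entry_size."""
        out = subprocess.run(['nm', elf], capture_output=True, text=True,
                             check=True).stdout
        addrs = {}
        for line in out.splitlines():
            parts = line.split()
            if len(parts) == 3 and parts[2] in (start_sym, end_sym):
                addrs[parts[2]] = int(parts[0], 16)
        span = addrs[end_sym] - addrs[start_sym]
        return span % entry_size == 0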

    Lessons Learned

    1. Magic number division is fragile: It only works for exact multiples. Any padding breaks it silently.
    2. Zero-size markers inherit alignment from neighbours: The linker places them at aligned boundaries based on what follows.
    3. Pointer arithmetic assumes contiguous arrays: This assumption is violated when padding sneaks in.
    4. Garbage values often have patterns: 0xCCCCCCDC isn’t random – it’s a clue pointing to failed arithmetic.

    The fix was one line. Finding it took considerably longer, but that’s firmware development for you.

    This bug was found while working on U-Boot’s unit test infrastructure. The fix is in the patch “linker_lists: Fix end-marker alignment to prevent padding”.




    The Silent Saboteurs: Detecting and Resolving malloc() Failures in U-Boot

    The robust operation of any complex software system, especially one as foundational as U-Boot, hinges on the reliability of its core services. Among these, dynamic memory allocation via malloc() is paramount. While often taken for granted, failures in malloc() can be silent saboteurs, leading to unpredictable behaviour, security vulnerabilities, or outright system crashes. Here, we delve into the mechanisms for detecting and resolving these subtle yet critical issues, ensuring the integrity of U-Boot’s memory management.

    The background for this topic is a recent series in Concept, which aims to improve tools and reliability in this area. It builds on the recent update to dlmalloc.

    The Challenge of Dynamic Memory

    In the constrained environment of an embedded bootloader, where resources are often tight and determinism is key, malloc() failures present unique challenges:

    1. Subtlety: A failed malloc() call doesn’t always immediately manifest as a crash. It might return NULL, and if not meticulously checked, subsequent operations on this NULL pointer can lead to memory corruption, double-frees, or use-after-free vulnerabilities much later in execution.
    2. Asynchronicity: With the use of the ‘cyclic’ feature, memory allocations can race with one another, making such issues harder to reproduce and debug.
    3. Heap Fragmentation: Long-running systems or complex sequences of allocations and deallocations can lead to heap fragmentation, where sufficient total memory exists, but no contiguous block is large enough for a requested allocation. This is particularly insidious as it’s not a memory exhaustion issue per se, but an allocation-strategy issue.
    4. Debugging Overhead: Traditional heap debugging tools can themselves consume significant memory and execution time, making them impractical for a bootloader.

    Proactive Detection: The New Toolkit 🛠️

    A new series in Concept introduces powerful new instrumentation and commands, moving U-Boot toward best-in-class memory debugging capabilities.

    1. Real-Time Heap Statistics (malloc info)

    The new malloc info command provides a clear, instantaneous snapshot of the heap’s health and usage patterns:

    Statistic       Description
    total bytes     Total size of the malloc() heap (set by CONFIG_SYS_MALLOC_LEN).
    in use bytes    Current memory allocated and held by the application.
    malloc count    Total number of calls to malloc().
    free count      Total number of calls to free().
    realloc count   Total number of calls to realloc().

    This information is helpful for quickly identifying memory leaks (high malloc count with low/stagnant free count) or excessive memory churn (high total counts).

    2. Caller Tracking and Heap Walk (malloc dump)

    When enabled via CONFIG_MCHECK_HEAP_PROTECTION, the malloc dump command becomes the most potent debugging tool:

    • Heap Walk: It systematically walks the entire heap, printing the address, size, and status (used or free) of every memory chunk.
    • Allocation Traceability: For every allocated chunk, the header now stores a condensed backtrace string, showing the function and line number of the code that requested the memory:

      • 19a0e010   a0       log_init:453 <-board_init_r:774 <-sandbox_flow:

    • Post-free() Analysis: This caller information is also preserved in the metadata of freed chunks. This is invaluable for detecting memory leaks, as you can see precisely which function allocated a chunk that is now free, or identifying potential double-free sources. Of course, free blocks can be reused, so this isn’t a panacea.

    3. Heap Protection (mcheck)

    The integration of the mcheck heap-protection feature embeds ‘canary’ data before and after each allocated chunk.

    • Boundary Checking: These canaries are checked on free() and during heap walks. If a canary is corrupted, it instantly signals a buffer overflow or buffer underflow—a classic symptom of heap corruption.
    • Detection: This shifts the memory integrity issue from a mysterious crash hours later to an immediate, localized fault, dramatically speeding up remediation.

    How This Series Helps U-Boot Development

    Note: Some of these features are currently largely available only on sandbox, U-Boot’s development and testing environment. In particular, there is currently no way to obtain line-number information at runtime on other architectures.

    Overall, this series represents a qualitative shift in U-Boot’s memory diagnostics, providing a mechanism for detecting, and finding the root cause of, subtle memory bugs that were previously nearly impossible to track down.

    1. Pinpointing Leaks (Performance & Stability): Before this series, finding a memory leak was a slow process of elimination. Now, a simple malloc dump reveals which functions are responsible for the largest or most persistent allocated chunks, directly mapping resource usage to the source code (log_init:453 or membuf_new:420).
    2. Tracking Heap Corruption (Reliability): Heap corruption is often caused by writing beyond the boundaries of an allocated buffer. With mcheck, this corruption is immediately detected. Furthermore, the malloc dump allows developers to see the call site of the corrupted chunk, leading you straight to the faulty allocation rather than searching half the code base.
    3. Enabling Backtrace for Debugging: The series includes a refactor of the backtrace feature, ensuring that it no longer relies on malloc(). This guarantees that backtraces can be collected safely even when the allocator itself is in a compromised state (e.g., during an mcheck failure or stack smash), providing reliable context for crash reports.

    Early results

    This work has already yielded results. A huge memory leak involving SCMI was discovered simply by looking at the malloc dump output, and a watchdog crash in ‘ut dm’ was pinpointed. It also revealed the very large number of allocations performed by the TrueType font engine, leading to a simple optimisation that reduces strain on the heap.

    With this series, U-Boot establishes a strong foundation for memory diagnostics, transforming the challenge of memory debugging in a constrained environment into a manageable, data-driven process.




    An update on mouse support

    Over the last few months (and since the last post) the mouse support in U-Boot Concept has matured quite a bit. The various performance improvements have had a big impact and the UI is now smooth and usable. Here’s a video:

    So what’s next? Let’s look at a few topics.

    Touchpads

    So far touchpads are only supported in the EFI app, assuming that the underlying firmware enables this. On real hardware, such as the Qualcomm x1e laptops, this seems to work OK.

    Support on other platforms requires a driver. The USB mouse driver (CONFIG_USB_MOUSE) might work, or it might not. If the touchpad is attached via I2C or SPI, then a different driver would be needed, perhaps with a special protocol for the device.

    EFI app

    So how do you get the mouse or touchpad running in the EFI app? Just make sure that CONFIG_EFI_MOUSE is enabled when you build the U-Boot app, and all should be well.

    Unfortunately the widely used ‘ovmf’ Debian package does not enable the mouse, or at least not properly. This is used for running EFI apps under QEMU. U-Boot’s build-efi script passes the required arguments, but this is not enough.

    If you would like to build a version of EDK2 which supports the mouse / touchpad, it is quite simple. Just change these two files.

    First, in OvmfPkg/Include/Dsc/UsbComponents.dsc.inc add this line, e.g. between UsbKbDxe and UsbMassStorageDxe so that it builds the mouse driver:

      MdeModulePkg/Bus/Usb/UsbMouseDxe/UsbMouseDxe.inf

    Second, in OvmfPkg/OvmfPkgX64.fdf add this line, e.g. between UsbKbDxe and UsbMassStorageDxe so that the driver ends up in the firmware volume:

    INF  MdeModulePkg/Bus/Usb/UsbMouseDxe/UsbMouseDxe.inf

    Clicking in more places

    So far, clicking in a lineedit object just places the cursor at the end of that object. Really it should set the cursor to the clicked position. This will become more important when the text editor is finished, since there may be a lot of text on the screen.

    This is fairly simple to implement and should appear early in the new year.

    Selecting text

    A more ambitious feature would be text selection, where the mouse can be used to select part of a lineedit. This would need to be done as part of a copy/paste feature. It would make the text editor a little more functional, but it is not really a core feature. If you are interested in implementing that, send an email to the U-Boot Concept mailing list!




    The RISC OS mouse pointer

    My first ARM machine was an Archimedes way back in about 1987. I received the first unit sold in New Zealand. At some point my machine started running Acorn’s RISC OS. For me, some of the amazing things about RISC OS were anti-aliased outline fonts, a string-sound generator which worked without a data table and a really nice mouse pointer. There is a fairly recent article on The Register which talks about how it came to be.

    Anyway, for U-Boot I’ve chosen to use this same pointer for the mouse. It has good contrast and a cheerful colour scheme. Of course you can change it if you like, but see what you think!




    Some thoughts on Claude

    I’ve been experimenting with AI coding tools, mostly Claude, for the last few months. Here are some thoughts on my experience so far.

    Things I have tried

    So far I have tried using Claude for many different tasks:

    • Solving a coding problem, such as creating a Python script to scan header files
    • Writing tests for existing code
    • Checking patches before I send them
    • Writing drafts for blog posts
    • Figuring out bugs
    • Adjusting code to make tests pass

    First, a caveat. I am a beginner at using this technology and do not spend much time following leaders in this space. I started using Claude based on a family recommendation and am far from being any sort of expert.

    The good

    When I first tried out creating a new command in Claude, it produced code which compiled and looked like U-Boot code. This seemed quite magical to me and I left the computer with a feeling of awe and bewilderment. I have a basic understanding of how AI operates but it is amazing to see it producing code like this. Also the inference is very fast, sometimes producing hundreds of lines of code in a minute or less.

    Claude works within your project directory. It can read and write commits, run tests, etc. For a code base like U-Boot with lots of tests, it is relatively easy to get it to make changes while ensuring that the tests still pass. It can execute most commands that you can. It asks first but you can add rules to a JSON file to give it permission to run things itself.

    In most cases, Claude is able to solve the problem. For example, I wanted to build U-Boot as a library. I actually had some old patches for this, but I decided to give the task to Claude. I provided a lot of guidance on exactly how it should work, but Claude figured out the Makefile rules, iterating until things worked and fixing bugs that I found. I was able to make changes myself, tell Claude and it would build on them.

    The easiest tasks for Claude are to extend an existing feature and the associated tests. The hardest are to refactor code in non-trivial ways.

    One interesting case where Claude does well is dealing with conflicts when rebasing a series or applying patches. I have only tried this a few times but it has not made any mistakes yet. It is fairly slow, though.

    The bad

    Claude is a long way from perfect. It feels to me like a very junior but knowledgeable software engineer. It happily writes 600 lines of code when 300 would do. For larger tasks, the code feels extremely ‘flabby’ and a bit painful to read. Variable names are too long. Useless comments are added to obvious code. It happily creates highly redundant code instead of setting up a helper function first.

    But with all of these problems, you can guide it to perform better. You can review the code and ask it to make changes and it happily does so. It might not be able to conceive of a refactoring that would improve code, but it can execute it.

    For one task, Claude was able to create a 1200-line Python script in about an hour (with lots of prompting, etc.) that would likely have taken me 1-2 days (it is hard to be sure, since I didn’t do it). I then spent about 6 hours refactoring and rewriting the script, after which I felt that the code wasn’t too bad. The end result was likely not as good as if I had written it myself from scratch, but it was fine.

    Sometimes Claude gets stuck. It tends to happen when I am trying to get it to do something I’m not too sure about. It sometimes just gets the wrong end of the stick, or solves a slightly different problem. This is most common when first firing it up. After a bit of back and forth it settles in and makes progress.

    The ugly

    What is Claude bad at? When running programs it can see the console output, but it cannot interact with the program, or at least I’m not sure how to make it do that. So for example it can run U-Boot sandbox with a timeout and it can even pass it commands to run. But it doesn’t know how to ‘type’ commands interactively.

    Claude can sometimes change the test to fit the code, when the task was to fix the code. It tends to write non-deterministic tests, such as asserting that an integer is greater than zero, rather than checking for an exact value.

    Claude can produce commit messages. It tends to be verbose by default but you can ask it to be brief. But I have not had much success in getting Claude to create its own commits after making a whole lot of changes.

    Getting things done can sometimes just be very slow. I have not really compared the time to create high-quality, finished patches myself versus with Claude. That is something I must try, now that I am learning more about it.

    My workflow

    How do I actually use Claude? Well, first I figure out a design for what I want to do, typically a diagram or some notes. Then I ask Claude to perform an initial change to make a start. Then I build on that piece by piece until I have something working. I have not tried giving it a large challenge.

    In some cases I create commits as I go, or even ask Claude to do this, but mostly I just create the whole thing and then manually build the commits afterwards. For code that Claude created in whole or part I add a Co-developed-by tag.

    Breaking things into chunks saves time, I think. Cleaning up hundreds of lines of AI-generated code is very time-consuming and tedious. It is easier to tweak it a bit as I go.

    A note on Gemini

    I have tried Gemini CLI and sadly so far it has not been useful except in a few small areas. It is quite slow and seems to go into the weeds with any sort of moderate challenge. I expect this to change rapidly, so I keep trying it every now and then. I also use Gemini to create most of the featured images.

    I have not tried OpenAI so far.

    What do you think?

    Leave a message in the comments if you have any thoughts or suggestions!




    The pytest / board Integration

    The integration of pytest with real boards (test.py) was written by Stephen Warren of Nvidia, some 9 years ago. It has certainly stood the test of time. The original code has been tweaked for various purposes over the years, but considering the number of tests added in that time, the changes are very small. Here is a diffstat for the changes up until a recent rename:

     test/py/multiplexed_log.css           |  11 +-
     test/py/multiplexed_log.py            | 133 ++++++++++---
     test/py/test.py                       |  31 ++--
     test/py/u_boot_console_base.py        | 341 ++++++++++++++++++++++++++++------
     test/py/u_boot_console_exec_attach.py |  40 ++--
     test/py/u_boot_console_sandbox.py     |  54 ++++--
     test/py/u_boot_spawn.py               | 212 ++++++++++++++++++---
     test/py/u_boot_utils.py               | 197 ++++++++++++++++++--
     8 files changed, 848 insertions(+), 171 deletions(-)
    

    When Stephen wrote the code, there was no GitLab system in U-Boot (it used Travis). Tom Rini added GitLab in 2019: test.py mostly just worked in that environment. One of the reasons the code has proven so stable is that it deals with boards at the console level, simply relying on shell-script hooks to start up and communicate with boards. These scripts can be made to do a lot of different things, such as powering boards on and off, sending U-Boot over USB, etc.

    But perhaps it might be time to make a few changes. Let me give a bit of background first.

    In 2020 I decided to try to get my collection of boards into some sort of lab. Picking out a board and manually testing with it was quite annoying. I wrote Labman, a Python program which created various files based on a yaml description of the lab. Labman generates udev rules and an /etc/fstab file. It also creates small Python programs which know how to build U-Boot and write it to a board, including dealing with the reset/recovery sequences, SD-wire, etc. With all that in place, Tbot provides a way to get an interactive session on a board. It also provides a way to run U-Boot tests.

    Early last year I decided to take another look at this. The best things about Labman were its unified lab description (including understanding how many ports each USB hub has and the address of each) and a ‘labman check’ option which quickly pointed to connection problems. The bad thing about Labman was…well, everything else. It was annoying to re-run the scripts and restart udev after each lab change. The Python code-generation was a strange way of dealing with the board-specific logic.

    Tom Rini suggested looking at Labgrid. After a bit of investigation, it looked good to me. The specification of hubs is somewhat primitive and the split between the exporter and the environment is confusing. But the structure of it (coordinator, exporters and clients) is much better than Labman. The approach to connecting to boards (ssh) is better as well, since it starts ser2net automatically. Labgrid is a thin layer of code over some existing services, and it is much better designed.

    So overall I was pretty enthusiastic and set to work on creating an integration for U-Boot. Now I can again build U-Boot, write it to a board and start it up with a simple command:

    ellesmere:~/u$ ub-int rock5b
    Building U-Boot in sourcedir for rock5b-rk3588
    Bootstrapping U-Boot from dir /tmp/b/rock5b-rk3588
    Writing U-Boot using method rockchip
    DDR 9fa84341ce typ 24/09/06-09:51:11,fwver: v1.18
    
    <...much unfortunate spam from secret binaries here...>
    
    U-Boot Concept 2025.01-rc3-01976-g290829cc0d20 (Jul 20 2025 - 20:10:36 -0600)
    
    Model: Radxa ROCK 5B
    SoC:   RK3588
    DRAM:  4 GiB
    Core:  362 devices, 34 uclasses, devicetree: separate
    MMC:   mmc@fe2c0000: 1, mmc@fe2d0000: 2, mmc@fe2e0000: 0
    Loading Environment from nowhere... OK
    In:    serial@feb50000
    Out:   serial@feb50000
    Err:   serial@feb50000
    Model: Radxa ROCK 5B
    SoC:   RK3588
    Net:   No ethernet found.
    Hit any key to stop autoboot:  0 
    => 
    

    I’ve also used this integration to make my lab accessible to gitlab, so that any branch or pull-request can be tested on the lab, to make sure it has not broken U-Boot.

    So, back to the topic. The Labgrid integration supports test.py and it works fine. A minor improvement is ‘lab mode’, where Labgrid handles getting U-Boot to a prompt, making it work with boards like the Beagleplay, which has a special autoboot message.

    But the test.py interface is (at last) showing its age. Its only real interface to Labgrid is via the u-boot-test-console script, which just runs the Labgrid client. Some tests restart the board, perhaps because they boot an OS or do something destructive to the running U-Boot. This results in U-Boot being built again, flashed to the board again and started again. When something breaks, it could be a lab failure or a test failure, but all we can do is show the output and let the user figure it out. The current lab works remarkably well given its fairly basic setup, but it is certainly not reliable. Sometimes a board will fail a test, but trying it again will pass, for example.

    So I am thinking that it might make sense to integrate test.py and Labgrid a little more closely. Both are written in Python, so test.py could import some Labgrid modules, get the required target, start up the console and then let the tests run. If a test wants to restart, a function can do this in the most efficient and reliable way possible.

    This might be more efficient and it might also provide better error messages. We would then not need the hook functions for the Labgrid case.
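
    As a very rough sketch of the idea (assuming Labgrid’s Environment / get_target API; the config file name and target name here are purely illustrative):

    from labgrid import Environment

    # Load the lab description and pick the board for this test run
    env = Environment('lab-env.yaml')
    target = env.get_target('main')
    console = target.get_driver('ConsoleProtocol')

    # test.py could hand 'console' to its existing spawn/console layer
    # instead of shelling out to the u-boot-test-console hook script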




    New U-Boot CI Lab Page

    U-Boot has a new continuous integration (CI) lab page that provides a real-time look at the status of various development boards. The page, located at https://lab.u-boot.org/, offers a simple and clean interface that allows developers and curious people to quickly check on the health and activity of each board in the lab.

    When you first visit the page, you’ll see a grid of all the available boards. Each board’s card displays its name and current status, making it easy to see which boards are online and which are not. A single click on any board will show a console view, taken from the last health check. This allows you to see why a board is failing, for example.

    This new lab page is a nice resource for the U-Boot community. It provides a transparent and accessible way to monitor this part of the CI system.

    Check it out and get in touch if you have any suggestions or feedback! 🧪




    Streamlining Emulation in U-Boot: A Kconfig Cleanup 🧹

    In the world of software development, consistency is key. A recent update to U-Boot Concept takes a solid step in that direction by restructuring how it handles emulation targets. This change makes life easier for developers working across different processor architectures.

    Previously there were inconsistencies in the configuration system (Kconfig). For example, enabling QEMU emulation for ARM systems used the ARCH_QEMU symbol, while x86 systems used VENDOR_EMULATION for a similar purpose. This could create confusion and added complexity when managing board configurations.

    To resolve this, a new, architecture-neutral symbol, MACH_QEMU, has been introduced. This single, unified option replaces the separate symbols for both ARM and x86 emulation targets.

    This small but important merge tidies up the codebase, creating a more consistent and intuitive developer experience. It also sets the stage for future work, with the potential to extend this unified approach to other architectures. It’s a great example of the continuous effort to keep U-Boot clean, efficient, and easy to maintain for everyone involved.