Closing the Loop: Pickman Now Fixes Its Own CI Failures

Pickman automates cherry-picking commits from upstream U-Boot into a downstream branch, packaging each batch as a GitLab merge request. Until now, when a CI pipeline failed on one of those MRs (for example, a build error caused by a missing context change or a renamed symbol), a human had to read the logs, find the problem, amend the commit and push a fix. That manual step added considerable overhead to the process.

The latest round of improvements teaches pickman to diagnose and fix pipeline failures automatically.

How it works

During each step or poll cycle, after processing review comments, pickman inspects every open MR for a failed head pipeline. For each failure it has not already seen, it:

  1. Checks whether a rebase is needed first (this check has been in place for a while). A stale base branch is the most common cause of spurious failures. If GitLab reports that the MR is behind the target, pickman rebases the branch and pushes, which triggers a fresh pipeline. No agent is needed for this case.
  2. Fetches the failed-job logs from GitLab, keeping the last 200 lines of each job trace.
  3. Sends the logs to a Claude agent with a prompt that includes the MR description (for context from prior work) and instructions to identify the responsible commit, amend it via interactive rebase, and verify the fix with a sandbox build and buildman.
  4. Pushes the result, posts an MR comment summarising what was fixed, appends the full agent conversation to the MR description, and records the fix in .pickman-history.
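
To make step 2 concrete, here is a minimal sketch of trimming a job trace to its last 200 lines. It also strips null bytes, which the new tests mention for job logs. The function name tail_job_log() and its signature are assumptions made for illustration and are not taken from the pickman source.

```python
def tail_job_log(trace: str, max_lines: int = 200) -> str:
    """Return the last max_lines lines of a job trace (hypothetical helper).

    Null bytes are removed first, since raw CI traces can contain them.
    """
    if max_lines <= 0:
        return ''
    cleaned = trace.replace('\x00', '')
    lines = cleaned.splitlines()
    # A negative slice keeps only the tail; shorter traces are kept whole
    return '\n'.join(lines[-max_lines:])
```

Keeping only the tail is a reasonable trade-off because build errors normally appear near the end of a trace, and a shorter log keeps the agent prompt small.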

Each attempt is tracked per pipeline ID in a database table, so the same failure is never reprocessed. A new pipeline triggered by a rebase or comment fix is treated independently.
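
The per-pipeline tracking might look roughly like the sketch below, which uses Python's sqlite3 module. The table name pipeline_fix, its columns and the helper names are assumptions for illustration; pickman's actual schema may differ.

```python
import sqlite3


def open_db(path=':memory:'):
    """Open the database and create the (hypothetical) tracking table."""
    con = sqlite3.connect(path)
    con.execute(
        'CREATE TABLE IF NOT EXISTS pipeline_fix ('
        ' mr_iid INTEGER NOT NULL,'
        ' pipeline_id INTEGER NOT NULL,'
        ' status TEXT NOT NULL,'
        ' UNIQUE (mr_iid, pipeline_id))')
    return con


def already_processed(con, mr_iid, pipeline_id):
    """Return True if this pipeline has already had a fix attempt."""
    row = con.execute(
        'SELECT 1 FROM pipeline_fix WHERE mr_iid = ? AND pipeline_id = ?',
        (mr_iid, pipeline_id)).fetchone()
    return row is not None


def record_attempt(con, mr_iid, pipeline_id, status):
    """Record an attempt; a repeat for the same pipeline is ignored."""
    con.execute(
        'INSERT OR IGNORE INTO pipeline_fix (mr_iid, pipeline_id, status)'
        ' VALUES (?, ?, ?)', (mr_iid, pipeline_id, status))
    con.commit()


def count_attempts(con, mr_iid):
    """Count the fix attempts recorded for one MR."""
    return con.execute(
        'SELECT COUNT(*) FROM pipeline_fix WHERE mr_iid = ?',
        (mr_iid,)).fetchone()[0]
```

Because rows are keyed on the pipeline ID, a new pipeline on the same MR has no matching row and is therefore handled as a fresh failure.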

Retry limits

The --fix-retries / -F flag (default 3) caps how many times pickman will attempt to fix a given MR. When the limit is reached, it posts a comment requesting manual intervention and stops trying. Setting --fix-retries 0 disables the feature entirely.
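
The retry rule can be summarised as a small decision function. This is only a sketch of the behaviour described above: the function name and the returned strings are hypothetical, and pickman's real control flow is likely more involved.

```python
def next_action(attempts_so_far: int, fix_retries: int = 3) -> str:
    """Decide what to do about a failed pipeline on an MR (hypothetical).

    Returns 'skip' when the feature is disabled, 'ask-human' once the
    retry limit is reached, and 'attempt' otherwise.
    """
    if fix_retries <= 0:
        # --fix-retries 0 turns pipeline fixing off entirely
        return 'skip'
    if attempts_so_far >= fix_retries:
        # Limit reached: request manual intervention and stop trying
        return 'ask-human'
    return 'attempt'
```

With the default limit of 3, an MR with two recorded attempts gets one more try, while an MR with three gets the comment asking for manual intervention.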

Example session

$ pickman poll us/next -r ci -m 10 -F 3
...
MR !42: pipeline 12345 failed, attempting fix (attempt 1/3)...
Starting Claude agent to fix 1 failed pipeline job(s) (attempt 1)...
  ...agent diagnoses missing #include, amends commit, builds sandbox...
MR !42: pipeline fix pushed (attempt 1)
...

Behind the scenes, the agent checks out the branch, correlates the failing file with the commit that touched it, uses uman's rf N to start an interactive rebase, amends the responsible commit, verifies with um build sandbox and buildman, then leaves the result on a local branch for pickman to push.

Supporting changes

Several other improvements landed alongside the pipeline-fix feature:

  • Subtree-merge handling: pickman now recognises dts/upstream subtree merges and applies them directly to the target branch instead of creating an MR. Subtree updates are processed even when the maximum number of MRs is reached, since they do not create new MRs.
  • Codebase refactoring: large functions like do_rewind(), do_next_merges(), decompose_mega_merge() and handle_already_applied() have been split into smaller helpers. The agent-message-streaming loop was extracted into run_agent_collect() to remove duplication across the three agent entry points. File I/O now uses the u_boot_pylib tools helpers consistently.
  • Test coverage: new tests cover prepare_apply(), run_agent_collect(), the pipeline-fix database table, the pipeline-fixing control flow, and the null-byte stripping in job logs.

Try it

The pipeline-fix feature is available via pickman step and pickman poll. Pass -F 0 to keep the old behaviour, or let the default of 3 retries take care of things automatically.