
Introduction

U-Boot on x86_64 has traditionally relied on a Secondary Program Loader (SPL) to bootstrap into 64-bit mode. SPL starts in 16-bit real mode (as required by the x86 reset vector), transitions through 32-bit protected mode, sets up page tables, and finally jumps into the 64-bit U-Boot proper.

A recent series adds support for running U-Boot directly from ROM on x86_64 without SPL, using QEMU as the development platform. While this is a simpler configuration, getting it to work required solving several interesting architectural challenges at the intersection of x86 hardware, compiler conventions, and virtual-machine emulation.

Why Skip SPL?

The SPL adds complexity and boot time. On platforms like QEMU where the firmware image runs directly from a flat ROM, the extra SPL stage is unnecessary. A single-binary U-Boot that transitions from 16-bit to 64-bit mode internally is simpler to build, debug, and deploy.

This also serves as a foundation for future work on other x86_64 platforms that may not need SPL.

The Challenges

Read-only .data in ROM

When running from ROM (SPI flash), the .data section is in read-only flash memory. The existing x86_64 code stored the global data (gd) pointer in a global variable, which lives in .data. Writing to it before relocation to RAM would fault.

The solution was to use MSR_FS_BASE to hold the gd pointer address, mirroring how 32-bit x86 uses the FS segment descriptor base. This is a CPU register, so it works regardless of whether .data is writable:

    static inline void set_gd(volatile gd_t *gd_ptr)
    {
        gd_t *p = (gd_t *)gd_ptr;

        p->arch.gd_addr = p;
        asm volatile("wrmsr" : :
            "c" (MSR_FS_BASE),
            "a" ((unsigned int)(unsigned long)&p->arch.gd_addr),
            "d" ((unsigned int)((unsigned long)&p->arch.gd_addr >> 32))
            : "memory");
    }

This seemingly simple change had a knock-on effect: EFI runtime services that access gd->relocaddr crashed once the Linux kernel had called SetVirtualAddressMap(), because the kernel takes over the FS base for its own purposes, so it no longer points at U-Boot's global data. The fix was to cache gd->relocaddr in an __efi_runtime_data variable during U-Boot initialisation, before the kernel takes over.
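The caching pattern can be sketched as a small host-side model. This is not the code from the series: the struct, the variable name efi_relocaddr_cache and both functions are hypothetical, and __efi_runtime_data is stubbed out as an empty macro so the sketch builds outside U-Boot.

```c
#include <stdint.h>

/*
 * Stub: in U-Boot this attribute places the variable in the EFI runtime
 * data section. Here it expands to nothing so the sketch builds anywhere.
 */
#ifndef __efi_runtime_data
#define __efi_runtime_data
#endif

/* Hypothetical, cut-down stand-in for U-Boot's global data */
struct gd_model {
    uintptr_t relocaddr;
};

/* Hypothetical cache, filled once while gd is still reachable via FS */
static __efi_runtime_data uintptr_t efi_relocaddr_cache;

/* Called during initialisation, before the OS takes over the FS base */
static void efi_cache_relocaddr(const struct gd_model *gd)
{
    efi_relocaddr_cache = gd->relocaddr;
}

/* Runtime-service code reads the cache and never dereferences gd */
static uintptr_t efi_runtime_relocaddr(void)
{
    return efi_relocaddr_cache;
}
```

The point of the design is that the runtime path has no dependency on any CPU register that the operating system is free to change.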

Mixing 32-bit and 64-bit Code

The x86 reset vector runs in 16-bit real mode. U-Boot’s existing 16-bit startup code (start16.S, resetvec.S) is designed for 32-bit builds. On x86_64, the main binary is compiled as position-independent 64-bit code (PIE), which is fundamentally incompatible with 16-bit/32-bit startup code.

The solution was to compile the 16-bit startup objects as 32-bit code and link them into a separate 32-bit ELF, then extract just the raw binary from it. The build system includes this binary in the final ROM image at the correct reset vector address. This required several build-system changes:

  • Moving 16-bit binary rules from the top-level Makefile to arch/x86/Makefile
  • Compiling startup objects with explicit -m32 flags on x86_64
  • Linking them into a separate 32-bit ELF (u-boot-x86-start16.elf) distinct from the 64-bit main binary

32-bit to 64-bit Transition

A new assembly file (start_from_32.S) handles the transition from the 32-bit startup environment to 64-bit long mode:

  1. Build identity-mapping page tables in RAM (1 GiB pages for simplicity)
  2. Enable PAE (CR4.PAE) and load the page table base (CR3)
  3. Set the long-mode-enable bit in MSR_EFER
  4. Enable paging (CR0.PG), which activates long mode
  5. Load a 64-bit GDT and perform a far jump to the 64-bit entry point
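
The ordering of steps 2 to 4 can be modelled in C on a host machine. This is only a sketch under stated assumptions: struct cpu_model and both functions are hypothetical, the real code is assembly that writes the actual control registers, and the far jump of step 5 is not modelled. The bit positions and the EFER MSR number are the architectural ones.

```c
#include <stdint.h>

#define CR0_PE    (1u << 0)     /* protected mode enable */
#define CR0_PG    (1u << 31)    /* paging enable */
#define CR4_PAE   (1u << 5)     /* physical address extension */
#define MSR_EFER  0xc0000080u   /* MSR number of EFER */
#define EFER_LME  (1u << 8)     /* long mode enable */

/* Hypothetical model of the registers involved in the transition */
struct cpu_model {
    uint32_t cr0, cr3, cr4, efer;
};

/* Long mode is active only when PAE, LME and PG are all set */
static int long_mode_active(const struct cpu_model *cpu)
{
    return (cpu->cr4 & CR4_PAE) && (cpu->efer & EFER_LME) &&
           (cpu->cr0 & CR0_PG);
}

/* Apply steps 2 to 4 in order; pml4_phys is the page-table base */
static void enter_long_mode(struct cpu_model *cpu, uint32_t pml4_phys)
{
    cpu->cr4 |= CR4_PAE;     /* step 2: enable PAE ...            */
    cpu->cr3 = pml4_phys;    /* ... and load the page-table base  */
    cpu->efer |= EFER_LME;   /* step 3: wrmsr to MSR_EFER         */
    cpu->cr0 |= CR0_PG;      /* step 4: paging on, long mode live */
}
```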

Why not write this in C? Well, the page tables must be created before enabling 64-bit long mode, but the C code is compiled as 64-bit and cannot execute until long mode is active. Since the setup is just a few store-loops filling PML4 and PDPT entries, assembly is simpler than compiling and linking a separate 32-bit C function just for the page tables.
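
For readers who find C easier to follow than assembly, here is a rough host-side equivalent of those store-loops. It is a sketch, not the code in start_from_32.S: the function name, the pdpt_phys parameter and the flag choices (present, writable, 1 GiB page size) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_HUGE     (1ULL << 7)   /* PS bit: in a PDPT entry, maps 1 GiB */
#define GIB          (1ULL << 30)
#define ENTRIES      512

/*
 * Fill a PML4 and a PDPT so that the first num_gib GiB are identity-mapped.
 * pdpt_phys is the physical address at which the PDPT will live
 * (hypothetical parameter; real startup code knows it from its layout).
 */
static void build_identity_tables(uint64_t pml4[ENTRIES],
                                  uint64_t pdpt[ENTRIES],
                                  uint64_t pdpt_phys, unsigned int num_gib)
{
    unsigned int i;

    assert(num_gib <= ENTRIES);
    assert((pdpt_phys & 0xfff) == 0);   /* tables are 4 KiB aligned */

    for (i = 0; i < ENTRIES; i++) {
        pml4[i] = 0;
        pdpt[i] = 0;
    }

    /* One PML4 entry covers 512 GiB and points at the PDPT */
    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;

    /* Each PDPT entry maps one 1 GiB page onto itself */
    for (i = 0; i < num_gib; i++)
        pdpt[i] = (i * GIB) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

With 1 GiB pages only two tables are needed, which is what keeps the assembly version down to a pair of short loops.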

One subtle requirement emerged during testing with KVM: the GDT used for the mode transition must be in RAM, not ROM. The CPU performs an implicit data read from the GDT during the far jump to load the 64-bit code-segment descriptor. While normal instruction fetches from ROM work fine, KVM cannot service this implicit GDT read from the ROM region (an EPT mapping limitation?). The symptom is a silent hang at the far-jump instruction with no exception or output. The fix is to copy the GDT from ROM to RAM with rep movsl before loading it with lgdt.
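
The copy itself is simple, as this host-side C sketch shows. The GDT contents here are a typical flat layout chosen for illustration, not the table from the series; the function name and the ram_addr parameter are likewise hypothetical, and memcpy stands in for rep movsl.

```c
#include <stdint.h>
#include <string.h>

/* Pseudo-descriptor read by lgdt in 32-bit mode: 16-bit limit, 32-bit base */
struct __attribute__((packed)) gdt_ptr32 {
    uint16_t limit;
    uint32_t base;
};

/* Illustrative flat GDT as it might sit in ROM */
static const uint64_t rom_gdt[] = {
    0x0000000000000000ULL,   /* null descriptor */
    0x00af9a000000ffffULL,   /* 64-bit code: present, executable, L=1 */
    0x00cf92000000ffffULL,   /* data: present, writable */
};

/*
 * Copy the GDT into RAM and build the pseudo-descriptor for lgdt.
 * ram points at the destination buffer; ram_addr is the address the
 * target CPU would use for that buffer.
 */
static struct gdt_ptr32 copy_gdt_to_ram(uint64_t *ram, uint32_t ram_addr)
{
    struct gdt_ptr32 ptr;

    memcpy(ram, rom_gdt, sizeof(rom_gdt));
    ptr.limit = sizeof(rom_gdt) - 1;   /* limit is size minus one */
    ptr.base = ram_addr;
    return ptr;
}
```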

SSE: A Hidden Requirement

x86_64 GCC assumes SSE2 is always available (it is part of the x86_64 baseline) and freely generates SSE instructions such as movq %xmm0. If the SSE control bits are not set in the CPU control registers, these instructions cause an invalid-opcode exception (#UD), manifesting as a triple fault and boot loop after relocation.

The startup code must set CR4.OSFXSR and clear CR0.EM before any compiler-generated code runs.
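
As a sketch of the bit manipulation involved, the helpers below compute the new register values from the old ones. They are hypothetical and operate on plain integers, since the real startup code does this in assembly with mov to and from CR0 and CR4. Setting CR0.MP and CR4.OSXMMEXCPT goes beyond what the text above requires; both are included as an assumption because they are conventionally set alongside the other two bits.

```c
#define CR0_MP          (1UL << 1)   /* monitor coprocessor */
#define CR0_EM          (1UL << 2)   /* x87 emulation: must be 0 for SSE */
#define CR4_OSFXSR      (1UL << 9)   /* OS supports FXSAVE/FXRSTOR and SSE */
#define CR4_OSXMMEXCPT  (1UL << 10)  /* OS handles SIMD FP exceptions */

/* Clear EM so SSE instructions do not raise #UD; set MP (assumption) */
static inline unsigned long cr0_for_sse(unsigned long cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* Set OSFXSR to enable SSE; set OSXMMEXCPT as well (assumption) */
static inline unsigned long cr4_for_sse(unsigned long cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```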

Regparm and Calling Conventions

The x86 32-bit builds use -mregparm=3 to pass function arguments in registers rather than on the stack, improving performance and code size. However, this is a 32-bit-only GCC option and is incompatible with x86_64 (which already uses registers by default per the System V AMD64 ABI). A new Kconfig option (X86_NO_REGPARM) allows disabling this for 32-bit builds. This provides better interoperability with Rust, for example.

The Result

The new qemu-x86_64_nospl board is a single U-Boot binary that boots directly from the QEMU ROM, transitions to 64-bit mode, and can launch an operating system directly or via EFI. It is tested in CI alongside the existing SPL-based configurations, and those tests confirm that it boots Linux correctly.

To try it with the build-qemu script:

    ./scripts/build-qemu -a x86 -rsX       # TCG (software emulation)
    ./scripts/build-qemu -a x86 -rsXk      # KVM (hardware virtualisation)

Series Overview

The series consists of 12 patches:

  1. Allow disabling regparm: adds X86_NO_REGPARM Kconfig option for x86_64 compatibility
  2. MSR_FS_BASE for gd pointer: eliminates the writable .data dependency for the global-data pointer on x86_64
  3. Cache gd->relocaddr for EFI: fixes the EFI runtime crash caused by the MSR_FS_BASE change
  4-7. Build-system changes (four patches): restructure 16-bit startup code compilation to support mixed 32/64-bit linking
  8. 32-to-64-bit startup code: the assembly that transitions from 32-bit protected mode to 64-bit long mode
  9. New defconfig: qemu-x86_64_nospl board configuration
  10. MTRR setup: enable memory-type range register configuration for the no-SPL path
  11. build-qemu integration: add --no-spl (-X) option to the QEMU helper script
  12. CI coverage: add the new board to the continuous-integration test matrix

Debugging Tips

Debugging early x86 boot code in a virtual machine has its own set of tricks:

  • QEMU exception logging: qemu-system-x86_64 -d int -D /tmp/log.log logs all CPU exceptions. Search for v=06 (#UD), v=0e (#PF), v=0d (#GP).
  • Instruction tracing: add -d in_asm to trace executed instructions. Useful for finding where the CPU diverges from the expected path.
  • KVM limitation: -d int does not work with KVM. Use the QEMU monitor (-monitor telnet:localhost:4444,server,nowait) or serial-port output (outb to 0x3f8) instead.
  • Identifying SSE faults: look for v=06 (invalid opcode) in the exception log, then decode the instruction bytes (e.g. f3 0f 7e is the SSE movq instruction).
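
When scanning many faulting addresses, the byte check from the last tip can be automated. The helper below is a hypothetical illustration, not part of U-Boot or QEMU: it only recognises the three-byte F3 0F 7E prefix mentioned above and makes no attempt to decode the operands that follow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the bytes at the faulting address start with F3 0F 7E,
 * the encoding of the movq load into an XMM register, else 0.
 */
static int is_sse_movq_load(const uint8_t *code, size_t len)
{
    return len >= 3 && code[0] == 0xf3 && code[1] == 0x0f &&
           code[2] == 0x7e;
}
```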

Author

  • Simon Glass is a primary author of U-Boot, with around 10K commits. He is maintainer of driver model and various other subsystems in U-Boot.
