
If you carefully read the Linux kernel docs, you will find an interesting statement: Linux has also been ported to itself. You can now run the kernel as a userspace application - this is called User-Mode Linux (UML). Today, we'll explore how you can start an unconventional VM by running a Linux kernel as an ordinary process on top of another Linux kernel. This approach doesn't require installing virtualization software like QEMU, nor does it need root privileges, which opens up some intriguing possibilities.

Kernel's Hardware Abstraction

A fundamental responsibility of the kernel is to abstract hardware and offer a consistent interface to userspace. This includes managing shared resources like the CPU and memory for multiple tasks. The kernel determines the underlying hardware (e.g., through a device tree on some platforms, which lists system components) and connects the appropriate drivers.

This hardware can also be entirely virtual. In a QEMU virtual machine, for instance, resources like memory and attached disks are virtualized by the QEMU userspace application, incurring a certain performance overhead. The CPU presents an interesting case, as it too can be virtualized in userspace, particularly when emulating a different architecture.

A fascinating aspect of drivers for virtualized hardware is that they can be "enlightened", or, more formally, paravirtualized. This means the drivers are aware they're running on virtualized hardware and can leverage this by communicating with the hardware in specialized ways. While the specifics are complex, one can imagine drivers interacting with virtual hardware in ways not feasible with physical counterparts. Online sources suggest that paravirtualization can achieve performance levels close to those of physical devices using traditional drivers.

UML - Kernel in a Userspace Process

Personally, I view UML as a paravirtualized kernel configuration. Instead of running directly on bare metal, the UML kernel operates atop an existing kernel instance, leveraging some of its userspace functionalities. For instance, rather than linking the console driver to a physical UART, it can utilize standard userspace input/output. Similarly, a block device driver can target a file on the host's filesystem instead of a physical disk.

In this setup, UML is essentially a userspace process that cleverly employs concepts like files and sockets to launch a new Linux kernel instance capable of running its own processes. The exact mapping of these processes to the host (specifically, how the CPU is virtualized) is something I'm not entirely clear on, and I'd welcome insights in the comments. One could envision an implementation where guest threads and processes map to host counterparts but with restricted system visibility, akin to containers, yet still operating within a nested Linux kernel.

This page from the kernel's documentation has a pretty good illustration of what this looks like. I highly recommend checking out that page for more detailed documentation, particularly for the compelling reasons listed for its usefulness. The final point is especially appealing: It's extremely fun. And that's precisely why we're diving into it today!

Building a UML Kernel

First things first: it's crucial to understand that a UML kernel can run only on x86 platforms. You can layer an x86 UML kernel on top of an existing x86 kernel; as far as I know, no other configurations are supported.

Next, we'll build the UML binary. The configuration process starts with the usual kernel configuration flow, just pointed at the um architecture, and you can configure the kernel much like you normally would. You'll immediately notice several UML-specific options on the initial configuration page. I tend to think of these as "enlightened" drivers, designed to use the host's userspace facilities as virtual hardware. For this demonstration, I specifically enabled the BLK_DEV_UBD option. The documentation explains: "The User-Mode Linux port includes a driver called UBD which will let you access arbitrary files on the host computer as block devices. Unless you know that you do not need such virtual block devices, say Y here." This option wasn't enabled by default (which surprised me a bit), so I recommend setting it to Y.

Once you've finalized your configuration, building is straightforward, and it produces a linux binary right there at the top of the tree! Interestingly, that binary is dynamically linked to the host's C standard library.
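Here is a minimal sketch of the whole configure-and-build flow. The defconfig baseline and the -j parallelism level are my choices; any of the usual configuration targets work, as long as ARCH=um is passed:

```
# From the top of a kernel source tree; ARCH=um selects the UML port.
make defconfig ARCH=um

# Tune options interactively; this is where BLK_DEV_UBD gets set to Y.
make menuconfig ARCH=um

# Build. The output is an executable named 'linux' at the top of the tree.
make ARCH=um -j"$(nproc)"

# Inspect it: an ordinary ELF executable, dynamically linked against libc.
file linux
```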
Building Userspace

To do anything meaningful within our nested kernel, we need a userspace. For simplicity, I chose to download the latest Buildroot and build it for x86/64. If you're feeling adventurous and want to try building a minimal userspace from scratch but aren't sure where to begin, pairing this with the micro Linux distro exercise could be a lot of fun.

Running the Nested Kernel

To make things interesting, I decided to provide a block device to the nested kernel, write some data to it, and then verify that data from the host system. The plan: create a disk image, format it with ext4, and then fire up the kernel in userspace, using the Buildroot image (an ext2 file provided by Buildroot) as the root filesystem and our fresh image as a second disk. And just like that, we're greeted by a very familiar kernel boot sequence, with the Buildroot login waiting at the end; the boot process was surprisingly quick. Inside the UML instance, we create a mountpoint for our disk, mount the second UBD device (ubdb) on it, and write a test file. With that done, we can shut down the UML VM and inspect the image from the host system. Step by step, the session looks roughly like this.
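First, the host-side preparation and boot. This is a sketch: the file names (uml-disk.img) and the disk size are arbitrary choices of mine, and rootfs.ext2 is the root filesystem image Buildroot leaves under output/images:

```
# Create a backing file for the virtual disk (128 MB here).
dd if=/dev/zero of=uml-disk.img bs=1M count=128

# Format it with ext4; -F suppresses the "not a block device" prompt.
mkfs.ext4 -F uml-disk.img

# Boot UML: ubd0 backs /dev/ubda (the Buildroot root filesystem),
# ubd1 backs /dev/ubdb (our scratch disk).
./linux ubd0=rootfs.ext2 ubd1=uml-disk.img root=/dev/ubda
```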
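Then, at the Buildroot shell inside the guest (the mountpoint and the file contents are, again, just my picks for this sketch):

```
# Inside UML: create a mountpoint and mount the second UBD device.
mkdir -p /mnt/disk
mount /dev/ubdb /mnt/disk

# Write a test file, then unmount so the data is flushed to the image.
echo "hello from UML" > /mnt/disk/hello.txt
umount /mnt/disk

# Shut the guest down; the UML process exits back to the host shell.
halt
```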
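Finally, back on the host, we can loop-mount the image and read the file back. Unlike running UML itself, this verification step does need root (or some equivalent loop-mounting facility):

```
# On the host: loop-mount the image and confirm the guest's write persisted.
mkdir -p /tmp/uml-check
sudo mount -o loop uml-disk.img /tmp/uml-check
cat /tmp/uml-check/hello.txt   # should print: hello from UML
sudo umount /tmp/uml-check
```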
This little experiment confirms that we successfully ran a VM using UML, wrote data to a block device within it, and those changes persisted, accessible from the host system.

Throughout this article, I've referred to UML as a VM, and you'd be right to raise an eyebrow. On one hand, it embodies the idea of hardware virtualization via host userspace facilities, and the environment gets its own distinct kernel. On the other hand, this guest kernel is intrinsically linked to the host's kernel. While it aims for isolation, it doesn't achieve the same level you'd expect from a QEMU VM powered by KVM.

What's the real-world utility here? Is UML suitable for running isolated workloads? My educated guess is: probably not for most production scenarios. I believe UML's primary strength lies in kernel debugging, rather than serving as a full-fledged, production-ready virtualization stack. For robust VM needs, KVM virtualization (operating at a different architectural layer) is far more battle-tested. Of course, containers offer another alternative if sharing the host kernel is acceptable for your workloads. UML carves out an interesting niche between these two: offering a separate kernel instance while still maintaining a unique connection to the host kernel. It's a fascinating concept. Perhaps in the future, this intriguing technology will garner more attention and see wider adoption.

For now, though, it's a fantastic tool for experimentation and, at the very least, a lot of fun to play with! Happy hacking!

For updates, please consider following me on Twitter/X and LinkedIn.