Summary
Evaluate and implement vfkit as the macOS hypervisor backend, in place of Cloud Hypervisor (CH), which is Linux-only. vfkit wraps Apple's Virtualization.framework and is used in production by Podman 5.0+, minikube 1.35+, and CRC.
Feasibility Analysis
Feature Mapping: CH → vfkit
Boot Methods — Fully Mappable
- --kernel + --initramfs + --cmdline → --bootloader linux,kernel=...,initrd=...,cmdline=... (direct mapping)
- --firmware CLOUDHV.fd → --bootloader efi,variable-store=...,create (uses Apple built-in EFI)
- macOS guest: --bootloader macos,machineIdentifierPath=...,hardwareModelPath=...,auxImagePath=... (Apple Silicon only, requires IPSW restore image)
Note: On Apple Silicon, vfkit linux boot requires an uncompressed kernel (gzip/lz4 vmlinuz won't work). EFI boot has no such restriction.
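A minimal sketch of how the backend could reject a gzip-compressed kernel before handing it to vfkit. It only checks the two-byte gzip signature; lz4 and other formats are not covered, and the function name `looksGzipCompressed` is hypothetical rather than part of the existing code.

```go
package main

import (
	"bytes"
	"fmt"
)

// gzipMagic is the two-byte signature at the start of every gzip stream (RFC 1952).
var gzipMagic = []byte{0x1f, 0x8b}

// looksGzipCompressed reports whether the leading bytes of a kernel image
// carry the gzip signature. It does not detect lz4 or any other format.
func looksGzipCompressed(header []byte) bool {
	return bytes.HasPrefix(header, gzipMagic)
}

func main() {
	// In real use the header would be the first bytes read from the kernel file.
	fmt.Println(looksGzipCompressed([]byte{0x1f, 0x8b, 0x08, 0x00}))
	fmt.Println(looksGzipCompressed([]byte{0x4d, 0x5a, 0x00, 0x00}))
}
```

A check like this would let vm create fail early with a clear message, or trigger a decompression step, instead of letting the linux bootloader fail at boot.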
REST API — Sufficient Coverage
- PUT /api/v1/vm.shutdown → POST /vm/state {"state":"Stop"}
- PUT /api/v1/vm.power-button → POST /vm/state {"state":"Stop"} (equivalent ACPI)
- GET /api/v1/vm.info (query PTY path) → GET /vm/inspect
- --api-socket /path → --restful-uri unix:///path
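The stop mapping above could be sketched as a small client that speaks HTTP over the Unix socket given to --restful-uri. The helper names (`newUnixClient`, `newStateRequest`) and the placeholder host `vfkit` are assumptions for illustration; only the POST /vm/state endpoint and its {"state":"Stop"} body come from the mapping.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// newUnixClient returns an HTTP client whose connections all go to the
// given Unix socket, regardless of the host named in the request URL.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// newStateRequest builds POST /vm/state with a {"state": ...} JSON body.
// The host "vfkit" is a placeholder; the socket dialer ignores it.
func newStateRequest(state string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"state": state})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "http://vfkit/vm/state", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newStateRequest("Stop")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// To send it against a running VM (socket path is hypothetical):
	//   resp, err := newUnixClient("/path/to/vfkit.sock").Do(req)
}
```

Keeping request construction separate from the transport makes the mapping testable without a running vfkit process.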
CPU / Memory — Mostly Mappable
- --cpus boot=N,max=M → --cpus N (no max_vcpus, no CPU hotplug)
- --memory size=BYTES,hugepages=on → --memory N in MiB (no hugepages on macOS, unit conversion needed)
- --balloon size=...,deflate_on_oom=... → --device virtio-balloon (exists but no fine-grained params)
- --watchdog → N/A (can skip)
- --rng src=/dev/urandom → --device virtio-rng (direct mapping)
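The byte-to-MiB conversion is the one place in this mapping where a silent mistake is easy to make. A sketch of the conversion, assuming sizes are rounded up so the guest never gets less memory than requested (the rounding policy and the function names are assumptions, not existing code):

```go
package main

import (
	"fmt"
	"strconv"
)

const mib = 1 << 20

// bytesToMiB converts a byte count to whole MiB, rounding up so the guest
// never receives less memory than was requested.
func bytesToMiB(b uint64) uint64 {
	return (b + mib - 1) / mib
}

// cpuMemArgs renders the vfkit --cpus and --memory flags. CH's max= value
// has no counterpart here because vfkit offers no CPU hotplug.
func cpuMemArgs(bootCPUs int, memBytes uint64) []string {
	return []string{
		"--cpus", strconv.Itoa(bootCPUs),
		"--memory", strconv.FormatUint(bytesToMiB(memBytes), 10),
	}
}

func main() {
	// 2 vCPUs and 2 GiB expressed in bytes, as CH would receive it.
	fmt.Println(cpuMemArgs(2, 2<<30))
}
```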
Console — Needs Adjustment
- --console pty (OCI boot) → --device virtio-serial,pty (direct mapping, PTY path via REST API)
- --serial socket=console.sock (UEFI boot) → No socket mode (incompatible, must use PTY instead)
Blocking Issues
1. Storage — No qcow2 Support (Severe)
vfkit virtio-blk only supports raw format. No qcow2.
cocoonv2 cloudimg path currently depends on:
qemu-img create -f qcow2 -b base.qcow2 overlay.qcow2
Recommended: APFS clonefile
macOS APFS provides native block-level COW clones: creation is instant and consumes no extra space until blocks are modified. This is what cp -c base.raw overlay.raw does. A clone behaves similarly to a qcow2 overlay but is handled at the filesystem layer, and it requires base images in raw format.
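As a sketch, create.go could shell out to cp -c rather than binding clonefile(2) directly. The function name `cloneCommand` is hypothetical, and the example only constructs the command so it can be inspected on any platform; running it is meaningful only on macOS with both paths on the same APFS volume.

```go
package main

import (
	"fmt"
	"os/exec"
)

// cloneCommand returns the `cp -c` invocation that asks macOS to create
// overlay as a clone of base. The command is built but not executed here.
func cloneCommand(base, overlay string) *exec.Cmd {
	return exec.Command("cp", "-c", base, overlay)
}

func main() {
	cmd := cloneCommand("base.raw", "overlay.raw")
	fmt.Println(cmd.Args)
	// On macOS: if err := cmd.Run(); err != nil { /* handle clone failure */ }
}
```

Shelling out keeps the backend free of platform-specific syscall bindings; a direct clonefile call would avoid the subprocess at the cost of a darwin-only code path.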
OCI boot path (raw COW + EROFS layers) is unaffected. CH disk serial maps to vfkit deviceId.
2. Networking — Completely Different Architecture (Severe)
macOS has no network namespaces, tap devices, or CNI. The entire network stack is inapplicable.
- netns + CNI + bridge + IPAM → vmnet-helper (shared/bridged/host modes)
- tap + TC redirect → vfkit --device virtio-net,fd=N via socketpair from vmnet-helper
- Fixed IP (CNI host-local) → vmnet DHCP or --start-address/--end-address range control
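The fd=N handoff in the list above could look roughly like this: create a connected socketpair, give one end to vmnet-helper and the other to vfkit through inherited file descriptors. This is a sketch under assumptions: the datagram socket type, the helper names, and fd number 3 are illustrative choices, and vmnet-helper's own command line is deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// newDatagramPair returns both ends of a connected AF_UNIX datagram
// socketpair, wrapped as *os.File so they can be inherited by children.
func newDatagramPair() (*os.File, *os.File, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return nil, nil, err
	}
	return os.NewFile(uintptr(fds[0]), "helper-end"), os.NewFile(uintptr(fds[1]), "vm-end"), nil
}

// vfkitNetCommand prepares (but does not start) vfkit with the VM end of
// the pair. The first ExtraFiles entry always becomes fd 3 in the child,
// which is why the device string says fd=3.
func vfkitNetCommand(vmEnd *os.File) *exec.Cmd {
	cmd := exec.Command("vfkit", "--device", "virtio-net,fd=3")
	cmd.ExtraFiles = []*os.File{vmEnd}
	return cmd
}

func main() {
	helperEnd, vmEnd, err := newDatagramPair()
	if err != nil {
		panic(err)
	}
	defer helperEnd.Close()
	defer vmEnd.Close()

	cmd := vfkitNetCommand(vmEnd)
	fmt.Println(cmd.Args)
	// helperEnd would be handed to vmnet-helper the same way, via ExtraFiles.
}
```

After both children are started, the parent should close its copies of the two ends so the sockets are torn down when either process exits.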
Recommended: vmnet-helper
Supports shared (NAT) / bridged / host modes. Requires root on macOS 15 and earlier; no root needed on macOS 26+. Reported to be about 10x faster than socket_vmnet.
Simplest path (Phase 1): --device virtio-net,nat — zero config but no fixed IP.
3. Disk I/O Tuning Unavailable (Low Impact)
vfkit does not expose num_queues, queue_size, direct, sparse, or network offload params. All controlled internally by Virtualization.framework. Acceptable for macOS dev/test scenarios.
macOS Guest VM Support
vfkit supports macOS guest VMs via --bootloader macos (Apple Silicon only, macOS 12+).
Setup Flow
- Download IPSW restore image (via VZMacOSRestoreImage.latestSupported API or manually)
- Create blank raw disk image
- Install macOS from IPSW into disk (generates MachineIdentifier, HardwareModel, AuxiliaryStorage)
- Boot with --bootloader macos,machineIdentifierPath=...,hardwareModelPath=...,auxImagePath=...
Headless Operation
Virtualization.framework requires a VZMacGraphicsDeviceConfiguration for macOS guests — this is a hard framework-level constraint. However, headless operation is achievable by creating the graphics device but not rendering any host window (Tart does this via NSApplication.setActivationPolicy(.prohibited)). The VM runs normally; access via SSH or VNC.
Concurrency Limits
Hard limit: 2 macOS VMs simultaneously, enforced at XNU kernel level (hv_apple_isa_vm_quota counter). This is NOT a Virtualization.framework limitation — it's in the kernel's hypervisor trap handler.
- Linux VMs are not subject to this limit (unlimited)
- The macOS EULA also permits only 2 additional VM instances per physical Mac
- Workaround: Apple KDK development kernel + hv_apple_isa_vm_quota=0xFF (up to 255 VMs, but breaks system updates and requires SIP disabled; not practical for production)
- Scaling beyond 2: requires multiple physical Macs with orchestration (e.g., Cirrus Labs' Orchard)
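Because the kernel refuses a third macOS guest, the backend could fail fast with its own error instead of surfacing a framework failure. A minimal in-process sketch using a buffered channel as a non-blocking counting semaphore; the type and function names are hypothetical, and a real implementation would need cross-process coordination (for example a lock file) since each VM runs in its own vfkit process.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMacOSGuests mirrors the two-VM limit described above.
const maxMacOSGuests = 2

var errQuotaExceeded = errors.New("macOS guest limit reached")

// macOSQuota is a non-blocking counting semaphore built on a buffered channel.
type macOSQuota struct{ slots chan struct{} }

func newMacOSQuota() *macOSQuota {
	return &macOSQuota{slots: make(chan struct{}, maxMacOSGuests)}
}

// acquire reserves a slot, or fails immediately if none are free.
func (q *macOSQuota) acquire() error {
	select {
	case q.slots <- struct{}{}:
		return nil
	default:
		return errQuotaExceeded
	}
}

// release frees one slot; it is a no-op if no slot is held.
func (q *macOSQuota) release() {
	select {
	case <-q.slots:
	default:
	}
}

func main() {
	q := newMacOSQuota()
	// Two reservations fit; the third is rejected until a slot is released.
	fmt.Println(q.acquire(), q.acquire(), q.acquire())
	q.release()
	fmt.Println(q.acquire())
}
```

Linux guests would bypass this guard entirely, since the limit applies only to macOS guests.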
macOS Guest Capabilities
Works: Metal GPU (paravirtualized, compute perf = native), general macOS apps, networking, storage, iCloud (macOS 15+)
Does NOT work: App Store apps, FairPlay DRM, nested virtualization, Touch ID
GPU Access in VMs
Three approaches with very different capabilities:
macOS Guest — Metal Paravirtualized GPU
VZMacGraphicsDeviceConfiguration exposes a paravirtualized Metal GPU to the guest.
- GPU compute performance: identical to native (100% active residency, same frequency/power)
- Graphics rendering: works well (80-84% GPU utilization)
- CoreML / MLX: theoretically functional (both use Metal compute)
- Limitation: virtual GPU presents as unrecognized device — some apps doing hardware checks may refuse
Linux Guest (AVF) — No GPU Acceleration
VZVirtioGraphicsDeviceConfiguration is 2D framebuffer only. CPU renders, host displays. No 3D, no compute.
Linux Guest (libkrun/krunkit) — Vulkan via Venus + MoltenVK
Completely separate stack (Red Hat, uses Hypervisor.framework not Virtualization.framework):
Guest: App → Vulkan → Mesa Venus driver → virtio-gpu shared memory
Host: virglrenderer → MoltenVK → Metal → Apple GPU
- Requires macOS 14+, Apple Silicon
- llama.cpp ggml-vulkan: 77% of native Metal (20.84 vs 27 tokens/sec)
- Newer ggml-remoting (Sep 2025): bypasses Vulkan, forwards tensor ops directly to host ggml-metal — 95-100% native speed
- No GPU passthrough exists on Apple Silicon (unified memory SoC)
Code Architecture
hypervisor/
├── hypervisor.go       # Interface (unchanged)
├── cloudhypervisor/    # Linux backend (unchanged)
└── vfkit/              # macOS backend (new)
    ├── vfkit.go
    ├── conf.go         # CLI arg builder
    ├── start.go
    ├── stop.go
    ├── create.go       # APFS clone instead of qcow2
    └── helper.go       # REST API client
network/
├── network.go          # Interface (unchanged)
├── cni/                # Linux (unchanged)
└── vmnet/              # macOS (new, vmnet-helper based)
Bonus: vfkit is written in Go and provides github.com/crc-org/vfkit/pkg/config — can be used as a Go library instead of exec.
Implementation Plan
| Phase | Scope | Estimate |
|---|---|---|
| Phase 1 | EFI boot + NAT networking + APFS clone storage | 1–2 weeks |
| Phase 2 | vmnet-helper networking (fixed IP, external reachability) | 1 week |
| Phase 3 | OCI direct boot (uncompressed kernel) | 3–5 days |
| Phase 4 | macOS guest support (IPSW install, headless, Metal GPU) | 1 week |
| Phase 5 | Edge cases + testing | 1 week |
Phase 1 delivers a working vm create + vm start + vm stop + vm console on macOS with cloud images.