Add juliac binary server for AOT-compiled LU factorization #107
ChrisRackauckas-Claude wants to merge 2 commits into JuliaLinearAlgebra:master
Adds support for using a juliac-compiled (Julia 1.12+) standalone executable to perform LU factorization via a subprocess communicating over pipes. This enables using RecursiveFactorization without loading the full Julia package and its dependency tree at runtime.

Key components:

- `juliac/shim_exe.jl`: Executable server using raw fd I/O (libc `read`/`write`) with a binary protocol for Float64/Float32 LU factorization
- `juliac/shim.jl`: `@ccallable` shared library entry points (for C consumers)
- `src/juliac_server.jl`: Server process management, pipe communication, `build_binary()` function, and `juliac_lu!`/`juliac_lu` API
- Auto-starts server in `__init__` if binary exists, graceful shutdown on exit

The juliac binary trims to ~5MB with only 2 verifier warnings (HostCPUFeatures). A key finding is that juliac shared libraries cannot be called from within a Julia process (`jl_adopt_thread` conflicts with existing TLS), so the subprocess approach via executable + pipes is used instead.

Usage:

    RecursiveFactorization.build_binary()     # one-time, requires Julia 1.12+
    F = RecursiveFactorization.juliac_lu(A)   # uses server if available

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detailed investigation notes: failed approaches and findings

This PR is the result of extensive experimentation with juliac on Julia 1.12. Here's a complete record of what was tried, what failed, and why.

Approach 1: Shared library with

| Approach | Build | From C | From Julia | Reason for failure |
|---|---|---|---|---|
| Shared lib + direct ccall | ✅ 5MB | ✅ | ❌ crash | `jl_adopt_thread` TLS conflict |
| Shared lib + pthread bridge | ✅ | ✅ | ❌ segfault | Symbol conflict (`jl_get_abi_converter`) |
| Shared lib + `dlmopen` | ✅ | ✅ | ❌ segfault | Can't init Julia runtime in isolated namespace |
| Shared lib no-trim (307MB) | ✅ | ✅ | ❌ crash | Same TLS conflict (not a trimming issue) |
| Executable + `Core.stdin` | ❌ | n/a | n/a | `Core.stdin` stripped in trimmed binary |
| Executable + raw fd I/O | ✅ 5MB | n/a | ✅ | Works after `set_blocking!` fix |
Implications for juliac ecosystem

- juliac shared libraries are currently unusable from Julia processes. The `jl_adopt_thread` mechanism makes it impossible to call `@ccallable` functions from any thread that already has Julia TLS. This is a fundamental limitation, not a bug in user code.
- juliac executables work well as subprocess servers, but require raw libc I/O (not `Core.stdin`) and explicit `fcntl` to reset non-blocking fds.
- The shared library works perfectly from C/C++ programs that don't have their own Julia runtime; this is a valid use case for non-Julia consumers.
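
The `fcntl` step mentioned in the second point could look roughly like the sketch below. It is an illustration rather than the code in this PR: the name `set_blocking!` is borrowed from the table above, but its body, the return convention, and the hard-coded Linux constants are assumptions.

```julia
# Sketch only: clear O_NONBLOCK on a raw file descriptor via libc fcntl.
# The constants are the Linux values; other platforms differ
# (for example, O_NONBLOCK is 0x0004 on macOS).
const F_GETFL    = Cint(3)
const F_SETFL    = Cint(4)
const O_NONBLOCK = Cint(0o4000)

# Assumed helper shape: returns true on success, false if fcntl failed.
function set_blocking!(fd::Integer)
    # Read the current file status flags.
    flags = @ccall fcntl(fd::Cint, F_GETFL::Cint)::Cint
    flags == -1 && return false
    # Write them back with the non-blocking bit cleared.
    # fcntl is variadic, so the third argument goes after the `;`.
    newflags = flags & ~O_NONBLOCK
    ret = @ccall fcntl(fd::Cint, F_SETFL::Cint; newflags::Cint)::Cint
    return ret != -1
end
```

A server would presumably call this once for fd 0 and fd 1 at startup, before the first raw `read()` or `write()` on the pipes.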
- Replace raw juliac.jl shell-out with JuliaC.jl ImageRecipe/LinkRecipe API
- Add optional `bundle=true` kwarg for relocatable binaries
- Fix compiled binary crash: `__init__` was trying to start a subprocess server inside the compiled binary itself (which IS the server), causing a `setup_stdio` MethodError. Fixed by setting the `RECFACT_SERVER` env var in `shim_exe.jl` and checking it in `_init_juliac_server`.
- Wrap `_start_server` in try-catch for robustness in edge cases

All 3165 tests pass (3120 LU + 21 butterfly + 24 juliac server).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
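
The init-crash fix in this commit might be sketched as follows. Only the names `RECFACT_SERVER` and `_init_juliac_server` come from the commit message; the value `"1"`, the helper functions, the `start_server` callback, and the returned symbols are hypothetical and exist only to make the guard testable in isolation. How the variable is set before `__init__` runs is not shown in the source.

```julia
# Sketch only: keep the compiled server from trying to spawn itself.

# Assumed helper for the server entry point: mark this process as the server.
mark_as_server!(env=ENV) = (env["RECFACT_SERVER"] = "1"; nothing)

# True when the current process is the compiled server itself.
running_as_server(env=ENV) = get(env, "RECFACT_SERVER", "") == "1"

# Hypothetical shape of the init hook: skip auto-start inside the server,
# and never let a failed start break package loading.
function _init_juliac_server(start_server::Function; env=ENV)
    running_as_server(env) && return :skipped
    try
        start_server()
        return :started
    catch
        return :failed  # caller falls back to the pure-Julia code path
    end
end
```

Passing `env` explicitly is a design choice for the sketch: it lets the guard be exercised with a plain `Dict` instead of mutating the real process environment.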
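Update: JuliaC.jl build pipeline + init crash fix

Changes in 0086b531.

1. Build pipeline now uses JuliaC.jl instead of shelling out to raw `juliac.jl`:

```julia
img = JuliaC.ImageRecipe(; output_type="--output-exe", trim_mode="unsafe-warn", file=shim_src, project=build_dir)
JuliaC.compile_products(img)
link = JuliaC.LinkRecipe(; image_recipe=img, outname=binary_path)
JuliaC.link_products(link)
```

Also added an optional `bundle=true` kwarg for relocatable binaries.

2. Fixed a crash in the compiled binary. The previous binary would crash on startup with a `setup_stdio` MethodError. Root cause: `__init__` was trying to start a subprocess server inside the compiled binary itself (which is the server). The fix sets the `RECFACT_SERVER` env var in `shim_exe.jl` and checks it in `_init_juliac_server`.

Verified locally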
MWE: juliac
Summary
- `__init__` `dlopen`, harmless)
- `juliac_lu!`/`juliac_lu` API that auto-starts the server on package init if a prebuilt binary is found, with graceful fallback to pure-Julia code

Architecture
The juliac shared library approach (`@ccallable` + `dlopen`) was investigated first but cannot work from within a Julia process: the `jl_adopt_thread` mechanism in juliac `@ccallable` prologues fatally conflicts with existing Julia TLS (`jl_init_threadtls` abort). Multiple workarounds were attempted (raw pthread bridge, `dlmopen` namespace isolation); all crash at the Julia runtime level.

The working solution uses a subprocess executable communicating via stdin/stdout pipes with a binary protocol:

- Request: `cmd(UInt8) + m(Int64) + n(Int64) + A(T[m*n])`
- Response: `info(Int64) + A(T[m*n]) + ipiv(Int64[min(m,n)])`
- Commands: `0x00`=f64, `0x01`=f32, `0x02`=f64_threaded, `0x03`=f32_threaded, `0x04`=f64_nopiv, `0x05`=f32_nopiv, `0xff`=exit

A key discovery: juliac's libuv runtime sets stdin/stdout to non-blocking mode, which breaks raw `read()` syscalls. Fixed by calling `fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)` at startup.

New files
- `juliac/shim_exe.jl`
- `juliac/shim.jl`: `@ccallable` shared library entry points (for C consumers)
- `juliac/Project.toml`
- `src/juliac_server.jl`: `build_binary()`, `juliac_lu!`/`juliac_lu`

Usage
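
As a rough illustration of what travels over the pipes when the server is used, the sketch below frames one request and parses one response following the field layout given under Architecture. The function names and the `CMD_LU_F64` constant are hypothetical, and native byte order and column-major storage of `A` are assumptions; the PR text states neither.

```julia
# Sketch only: client-side framing for the binary protocol.

const CMD_LU_F64 = 0x00  # command byte listed as f64 in the protocol description

# Request: cmd(UInt8) + m(Int64) + n(Int64) + A(T[m*n])
function write_request(io::IO, cmd::UInt8, A::Matrix{T}) where {T<:Union{Float32,Float64}}
    m, n = size(A)
    write(io, cmd)
    write(io, Int64(m))
    write(io, Int64(n))
    write(io, A)          # raw element bytes in memory (column-major) order
    return nothing
end

# Response: info(Int64) + A(T[m*n]) + ipiv(Int64[min(m,n)])
function read_response(io::IO, ::Type{T}, m::Integer, n::Integer) where {T}
    info = read(io, Int64)
    A = Matrix{T}(undef, m, n)
    read!(io, A)          # factored matrix, same shape as the request
    ipiv = Vector{Int64}(undef, min(m, n))
    read!(io, ipiv)       # pivot indices
    return info, A, ipiv
end
```

Both functions take a generic `IO`, so the same code can be pointed at the server's pipes or at an `IOBuffer` when checking the byte layout.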
Test plan
🤖 Generated with Claude Code