Add juliac binary server for AOT-compiled LU factorization #107

Open
ChrisRackauckas-Claude wants to merge 2 commits into JuliaLinearAlgebra:master from ChrisRackauckas-Claude:juliac-binary-server

@ChrisRackauckas-Claude

Summary

  • Adds a juliac-compiled standalone executable server that performs LU factorization via subprocess pipes, enabling use of RecursiveFactorization without loading the full Julia package dependency tree at runtime
  • The juliac binary trims to ~5MB with only 2 verifier warnings (HostCPUFeatures __init__ dlopen, harmless)
  • Provides juliac_lu!/juliac_lu API that auto-starts the server on package init if a prebuilt binary is found, with graceful fallback to pure-Julia code

Architecture

The juliac shared library approach (@ccallable + dlopen) was investigated first but cannot work from within a Julia process — the jl_adopt_thread mechanism in juliac @ccallable prologues fatally conflicts with existing Julia TLS (jl_init_threadtls abort). Multiple workarounds were attempted (raw pthread bridge, dlmopen namespace isolation) — all crash at the Julia runtime level.

The working solution uses a subprocess executable communicating via stdin/stdout pipes with a binary protocol:

  • Request: cmd(UInt8) + m(Int64) + n(Int64) + A(T[m*n])
  • Response: info(Int64) + A(T[m*n]) + ipiv(Int64[min(m,n)])
  • Commands: 0x00=f64, 0x01=f32, 0x02=f64_threaded, 0x03=f32_threaded, 0x04=f64_nopiv, 0x05=f32_nopiv, 0xff=exit
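
The request and response layouts above can be exercised without a compiled binary. The following is a minimal sketch, not the PR's implementation: `encode_request` and `decode_response` are hypothetical helper names, and it assumes native byte order and column-major storage, neither of which the description states explicitly.

```julia
# Sketch of the wire format, written against generic IO so it can be
# tried with an IOBuffer. Helper names are hypothetical.

const CMD_F64 = 0x00  # Float64 command byte, taken from the command list

# Request: cmd(UInt8) + m(Int64) + n(Int64) + A(T[m*n])
function encode_request(io::IO, cmd::UInt8, A::Matrix{T}) where {T<:Union{Float32,Float64}}
    m, n = size(A)
    write(io, cmd)
    write(io, Int64(m))
    write(io, Int64(n))
    write(io, A)  # raw element bytes, column-major (assumed layout)
    return nothing
end

# Response: info(Int64) + A(T[m*n]) + ipiv(Int64[min(m,n)])
function decode_response(io::IO, ::Type{T}, m::Integer, n::Integer) where {T<:Union{Float32,Float64}}
    info = read(io, Int64)
    A = Matrix{T}(undef, m, n)
    read!(io, A)
    ipiv = Vector{Int64}(undef, min(m, n))
    read!(io, ipiv)
    return info, A, ipiv
end
```

Under these assumptions a 3×2 Float64 request occupies 1 + 8 + 8 + 48 = 65 bytes, which is a quick sanity check when debugging the pipe.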

A key discovery: juliac's libuv runtime sets stdin/stdout to non-blocking mode, which breaks raw read() syscalls. Fixed by calling fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) at startup.

New files

| File | Purpose |
| --- | --- |
| juliac/shim_exe.jl | Executable server source (raw libc fd I/O, binary protocol) |
| juliac/shim.jl | @ccallable shared library entry points (for C consumers) |
| juliac/Project.toml | Build-time dependencies |
| src/juliac_server.jl | Server management, pipe communication, build_binary(), juliac_lu!/juliac_lu |

Usage

using RecursiveFactorization

# Build the binary (one-time, requires Julia 1.12+)
RecursiveFactorization.build_binary()

# Use the juliac-accelerated path (falls back to pure Julia if no binary)
F = RecursiveFactorization.juliac_lu(rand(100, 100))
F = RecursiveFactorization.juliac_lu!(A)  # mutating version

Test plan

  • All 3120 existing LU factorization tests pass
  • All 21 butterfly tests pass
  • 24 new juliac server tests pass (Float64/Float32, square/rectangular, mutating, singular, consistency with pure-Julia path)
  • Tests skip the binary path gracefully when the binary is not present (they report "skipping binary path tests")
  • CI will not have the juliac binary, so the juliac server tests will be skipped there

🤖 Generated with Claude Code

Adds support for using a juliac-compiled (Julia 1.12+) standalone executable
to perform LU factorization via a subprocess communicating over pipes. This
enables using RecursiveFactorization without loading the full Julia package
and its dependency tree at runtime.

Key components:
- juliac/shim_exe.jl: Executable server using raw fd I/O (libc read/write)
  with a binary protocol for Float64/Float32 LU factorization
- juliac/shim.jl: @ccallable shared library entry points (for C consumers)
- src/juliac_server.jl: Server process management, pipe communication,
  build_binary() function, and juliac_lu!/juliac_lu API
- Auto-starts server in __init__ if binary exists, graceful shutdown on exit

The juliac binary trims to ~5MB with only 2 verifier warnings (HostCPUFeatures).
A key finding is that juliac shared libraries cannot be called from within a
Julia process (jl_adopt_thread conflicts with existing TLS), so the subprocess
approach via executable + pipes is used instead.

Usage:
  RecursiveFactorization.build_binary()  # one-time, requires Julia 1.12+
  F = RecursiveFactorization.juliac_lu(A)  # uses server if available

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisRackauckas-Claude

Detailed investigation notes: failed approaches and findings

This PR is the result of extensive experimentation with juliac on Julia 1.12. Here's a complete record of what was tried, what failed, and why.


Approach 1: Shared library with @ccallable + dlopen from Julia

Idea: Build a shared library (librecursivefactorization.so) with @ccallable entry points and load it directly from Julia via ccall.

Result: ✅ Builds and works from C, ❌ Crashes when called from Julia.

The shared library built successfully at 5MB with --trim=unsafe-warn, exporting 6 symbols:

  • recursive_lu_f64!, recursive_lu_f32! (pivoted, single-threaded)
  • recursive_lu_f64_threaded!, recursive_lu_f32_threaded! (pivoted, multi-threaded)
  • recursive_lu_f64_nopiv!, recursive_lu_f32_nopiv! (no pivoting)

C test (test_noinit.c) passed perfectly — loaded the .so with dlopen, called recursive_lu_f64! on a 100×100 matrix, got residual 2.41e-15 (well within tolerance 4.40e-13).
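
For comparison, a residual of this kind can also be computed on the Julia side. The sketch below assumes the residual is the relative Frobenius norm of `L*U - A[p, :]`, which the PR text does not spell out, and it uses the stdlib `lu` as a stand-in for the function under test. `lu_residual` is a hypothetical name.

```julia
using LinearAlgebra

# Relative residual of a pivoted LU factorization: ‖L*U − A[p, :]‖ / ‖A‖.
# `norm` on a matrix is the Frobenius norm.
function lu_residual(A::AbstractMatrix)
    F = lu(A)  # stdlib LU; swap in the factorization being checked
    return norm(F.L * F.U - A[F.p, :]) / norm(A)
end
```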

From Julia, instant crash:

signal 6: Aborted
in expression starting at REPL[3]:1
jl_init_threadtls at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-12/src/threading.c:445
jl_adopt_thread at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-12/src/threading.c:477

Root cause: Every @ccallable function's prologue calls jl_adopt_thread(), which calls jl_init_threadtls(). This conflicts with the existing Julia TLS on the calling thread. The juliac library embeds its own Julia runtime, and the two runtimes' TLS mechanisms are fundamentally incompatible on the same thread.


Approach 2: Raw pthread bridge (bridge.c)

Idea: Create a C bridge library that forwards calls to the juliac .so from a freshly-created pthread (which has no Julia TLS), avoiding the TLS conflict.

Implementation: bridge.c — loads the juliac .so via dlopen, spawns a new pthread for each LU call, the pthread calls the @ccallable function, signals completion via condvar.

Result: ❌ Segfault

Thread 3 "julia" received signal SIGSEGV
0x00007fffd6cab010 in jl_get_abi_converter ()

Even though the pthread has no Julia TLS, the juliac library's symbols (like jl_get_abi_converter) conflict with the host Julia's symbols loaded in the process. The dynamic linker resolves the juliac library's internal Julia symbols to the host Julia's versions, causing corruption.


Approach 3: dlmopen with separate linker namespace (bridge_dlmopen.c)

Idea: Use Linux's dlmopen(LM_ID_NEWLM, ...) to load the juliac .so in a completely separate linker namespace, preventing symbol conflicts between the embedded Julia runtime and the host Julia.

Result: ❌ Segfault during init

The juliac .so has a complex dependency chain (libjulia-internal, libjulia, libopenblas, etc.) that fails to properly initialize in an isolated linker namespace. The separate namespace approach can't handle the full Julia runtime dependency graph.


Approach 4: Building without trim (307MB library)

Idea: Maybe trimming is removing something essential. Try --trim=no to see if the full library works.

Result: ❌ Same crash. The 307MB untrimmed library has the exact same jl_adopt_thread/jl_init_threadtls conflict. This confirmed the issue is fundamental to juliac's architecture, not a trimming artifact.


Approach 5: Executable with Core.stdin/Core.stdout

Idea: Since shared libraries can't be called from Julia, build a standalone executable and communicate via pipes.

Result: ❌ Core.stdin is not available in trimmed binaries.

ERROR: UndefVarError: `stdin` not defined in `Core`

Core.stdin is an untyped global that gets stripped during trimming. The Julia I/O subsystem (Base.stdin, Base.stdout) relies on libuv initialization that isn't available in trimmed executables.


Approach 6 (final, working): Executable with raw libc fd I/O

Idea: Bypass Julia's I/O entirely. Use raw ccall(:read, ...) and ccall(:write, ...) on file descriptors 0 (stdin) and 1 (stdout).

Result: ✅ Works from bash pipes, but ❌ initially failed from Julia-spawned processes.

Discovery: Non-blocking fd issue. When Julia spawns a child process, the juliac runtime's libuv event loop sets stdin/stdout to non-blocking mode. Raw read() then returns -1 with EAGAIN instead of blocking.

Diagnosed by building a debug binary that reports:

# From bash pipe:
fstat(0) returned: 0
read(0,1) returned: 1, byte=255   # ✅ works

# From Julia-spawned process:
fstat(0) returned: 0
read(0,1) returned: -1             # ❌ EAGAIN

Fix: Call fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) at the start of @main to reset fds to blocking mode:

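# Clear O_NONBLOCK on a file descriptor.
# The numeric constants are the Linux values; other platforms differ.
function set_blocking!(fd::Cint)::Nothing
    flags = ccall(:fcntl, Cint, (Cint, Cint), fd, 3)  # F_GETFL = 3
    new_flags = flags & ~Cint(0x800)  # clear O_NONBLOCK (0x800)
    ccall(:fcntl, Cint, (Cint, Cint, Cint), fd, 4, new_flags)  # F_SETFL = 4
    nothing
end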

After this fix, the executable works perfectly from Julia via pipes.
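
The same `fcntl` call can be used to confirm the diagnosis before applying the fix. This is a small sketch under the assumption of Linux constant values (they differ on macOS and the BSDs). `is_nonblocking` is a hypothetical helper, not part of the PR.

```julia
# Linux values, assumed here; check <fcntl.h> on other platforms.
const F_GETFL = Cint(3)
const O_NONBLOCK = Cint(0x800)

# Report whether a file descriptor currently has O_NONBLOCK set.
function is_nonblocking(fd::Cint)::Bool
    flags = ccall(:fcntl, Cint, (Cint, Cint), fd, F_GETFL)
    flags == -1 && error("fcntl(F_GETFL) failed")
    return (flags & O_NONBLOCK) != 0
end
```

Calling `is_nonblocking(Cint(0))` at the top of the server's entry point would make it possible to tell a bash-pipe launch apart from a Julia-spawned one, without relying on a failing `read`.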


Summary table

| Approach | Build | From C | From Julia | Reason for failure |
| --- | --- | --- | --- | --- |
| Shared lib + direct ccall | ✅ 5MB | ✅ | ❌ crash | jl_adopt_thread TLS conflict |
| Shared lib + pthread bridge | | | ❌ segfault | Symbol conflict (jl_get_abi_converter) |
| Shared lib + dlmopen | | | ❌ segfault | Can't init Julia runtime in isolated namespace |
| Shared lib no-trim (307MB) | | | ❌ crash | Same TLS conflict (not a trimming issue) |
| Executable + Core.stdin | | | | Core.stdin stripped in trimmed binary |
| Executable + raw fd I/O | ✅ 5MB | | | Works after set_blocking! fix |

Implications for juliac ecosystem

  1. juliac shared libraries are currently unusable from Julia processes. The jl_adopt_thread mechanism makes it impossible to call @ccallable functions from any thread that already has Julia TLS. This is a fundamental limitation, not a bug in user code.

  2. juliac executables work well as subprocess servers, but require raw libc I/O (not Core.stdin) and explicit fcntl to reset non-blocking fds.

  3. The shared library works perfectly from C/C++ programs that don't have their own Julia runtime — this is a valid use case for non-Julia consumers.

- Replace raw juliac.jl shell-out with JuliaC.jl ImageRecipe/LinkRecipe API
- Add optional bundle=true kwarg for relocatable binaries
- Fix compiled binary crash: __init__ was trying to start a subprocess
  server inside the compiled binary itself (which IS the server), causing
  setup_stdio MethodError. Fixed by setting RECFACT_SERVER env var in
  shim_exe.jl and checking it in _init_juliac_server.
- Wrap _start_server in try-catch for robustness in edge cases

All 3165 tests pass (3120 LU + 21 butterfly + 24 juliac server).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ChrisRackauckas-Claude

Update: JuliaC.jl build pipeline + init crash fix

Changes in 0086b53

1. Build pipeline now uses JuliaC.jl instead of shelling out to raw juliac.jl:

img = JuliaC.ImageRecipe(; output_type="--output-exe", trim_mode="unsafe-warn", file=shim_src, project=build_dir)
JuliaC.compile_products(img)
link = JuliaC.LinkRecipe(; image_recipe=img, outname=binary_path)
JuliaC.link_products(link)

Also added optional bundle=true kwarg for creating relocatable binaries via JuliaC.BundleRecipe.

2. Fixed a crash in the compiled binary. The previous binary would crash on startup with:

Core.InitError(mod=:RecursiveFactorization, error=Core.MethodError(f=Base.var"#setup_stdio"(), ...))

Root cause: RecursiveFactorization.__init__ calls _init_juliac_server() which tries to open() a subprocess server — but the compiled binary IS the server, and setup_stdio gets trimmed. Fix: shim_exe.jl now sets ENV["RECFACT_SERVER"] = "1" before importing, and _init_juliac_server checks for this. Also added try-catch for robustness.
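
The guard described here can be reduced to a few lines. This is an illustrative sketch only: `init_server_sketch` is a hypothetical stand-in for `_init_juliac_server`, the comparison against `"1"` is an assumption about how the variable is checked, and the `start_server` callback stands in for whatever actually launches the subprocess.

```julia
# Returns true if the server was started, false if startup was skipped
# or failed (in which case the caller stays on the pure-Julia path).
function init_server_sketch(start_server)
    # Inside the compiled server binary the variable is set, so skip startup.
    get(ENV, "RECFACT_SERVER", "") == "1" && return false
    try
        start_server()
        return true
    catch
        return false  # any startup failure is treated as "no server"
    end
end
```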

Verified locally

  • Binary builds successfully: 5.5MB ELF executable
  • Direct pipe test: LU factors match stdlib exactly
  • Full test suite: 3120 LU + 21 butterfly + 24 juliac server = 3165 tests pass

@ChrisRackauckas-Claude

MWE: juliac @ccallable shared library crashes when called from Julia

The subprocess server architecture is required because @ccallable shared libraries cannot be called from a Julia process. Here are minimal reproducers.

Files

All in juliac_mwe/ — run with julia build_and_test.jl (requires Julia 1.12+, JuliaC.jl, gcc).

lib_entry.jl — minimal shared library (8 lines)

module LibEntry

import Base.@ccallable

@ccallable function double_it(x::Float64)::Float64
    return 2.0 * x
end

end

exe_entry.jl — same function as subprocess server

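# Clear O_NONBLOCK on a file descriptor.
# The numeric constants are the Linux values; other platforms differ.
function set_blocking!(fd::Cint)::Nothing
    flags = ccall(:fcntl, Cint, (Cint, Cint), fd, 3)  # F_GETFL = 3
    new_flags = flags & ~Cint(0x800)  # clear O_NONBLOCK (0x800)
    ccall(:fcntl, Cint, (Cint, Cint, Cint), fd, 4, new_flags)  # F_SETFL = 4
    nothing
end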

function fd_read!(fd::Cint, buf::Ptr{UInt8}, nbytes::Int)::Nothing
    remaining = nbytes
    offset = 0
    while remaining > 0
        n = ccall(:read, Cssize_t, (Cint, Ptr{UInt8}, Csize_t), fd, buf + offset, remaining)
        n <= 0 && error("read failed")
        remaining -= n
        offset += n
    end
    nothing
end

function fd_write(fd::Cint, buf::Ptr{UInt8}, nbytes::Int)::Nothing
    remaining = nbytes
    offset = 0
    while remaining > 0
        n = ccall(:write, Cssize_t, (Cint, Ptr{UInt8}, Csize_t), fd, buf + offset, remaining)
        n <= 0 && error("write failed")
        remaining -= n
        offset += n
    end
    nothing
end

function (@main)(args::Vector{String})
    fdin = Cint(0)
    fdout = Cint(1)
    set_blocking!(fdin)
    set_blocking!(fdout)

    while true
        cmd_ref = Ref{UInt8}()
        GC.@preserve cmd_ref fd_read!(fdin, Ptr{UInt8}(pointer_from_objref(cmd_ref)), 1)
        cmd_ref[] == 0xff && break

        x_ref = Ref{Float64}()
        GC.@preserve x_ref fd_read!(fdin, Ptr{UInt8}(pointer_from_objref(x_ref)), 8)

        result_ref = Ref{Float64}(2.0 * x_ref[])
        GC.@preserve result_ref fd_write(fdout, Ptr{UInt8}(pointer_from_objref(result_ref)), 8)
    end
    return 0
end

test_from_c.c — C caller (works)

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *handle = dlopen("./libdouble.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
    printf("dlopen succeeded\n");
    double (*double_it)(double) = dlsym(handle, "double_it");
    if (!double_it) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }
    printf("dlsym succeeded\n");
    printf("double_it(21.0) = %f\n", double_it(21.0));
    dlclose(handle);
    return 0;
}

Build (using JuliaC.jl)

using JuliaC

# Shared library
img = JuliaC.ImageRecipe(; output_type="--output-lib", trim_mode="unsafe-warn",
    file="lib_entry.jl", add_ccallables=true)
JuliaC.compile_products(img)
JuliaC.link_products(JuliaC.LinkRecipe(; image_recipe=img, outname="libdouble"))

# Executable
img2 = JuliaC.ImageRecipe(; output_type="--output-exe", trim_mode="unsafe-warn",
    file="exe_entry.jl")
JuliaC.compile_products(img2)
JuliaC.link_products(JuliaC.LinkRecipe(; image_recipe=img2, outname="double_server"))

Test results on Julia 1.12.4

Test 1 — from C: WORKS

$ gcc -o test_from_c test_from_c.c -ldl && LD_LIBRARY_PATH=. ./test_from_c
dlopen succeeded
dlsym succeeded
double_it(21.0) = 42.000000

Test 2 — from Julia via ccall: CRASHES

using Libdl
handle = Libdl.dlopen("libdouble.so")  # OK
ptr = Libdl.dlsym(handle, "double_it")  # OK
ccall(ptr, Float64, (Float64,), 21.0)   # ABORT

signal 6 (-6): Aborted
jl_init_threadtls at threading.c:324
ijl_adopt_thread at threading.c:443
double_it at libdouble.so

Test 3 — from Julia via subprocess: WORKS

proc = open(`./double_server`, write=true, read=true)
write(proc, UInt8(0x00)); write(proc, Float64(21.0)); flush(proc)
buf = Vector{UInt8}(undef, 8); readbytes!(proc, buf, 8)
reinterpret(Float64, buf)[1]  # 42.0
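
The framing used by this MWE (one command byte, then an 8-byte Float64, answered by an 8-byte Float64) can be checked in-process before involving a compiled binary. The sketch below is an assumption-laden stand-in rather than the PR's code: `serve_double` mirrors the loop in exe_entry.jl but uses Julia's ordinary IO instead of raw file descriptors, and `request_double` is a hypothetical client helper.

```julia
# In-process stand-in for the double_server loop, using generic IO.
function serve_double(input::IO, output::IO)
    while !eof(input)
        cmd = read(input, UInt8)
        cmd == 0xff && break       # exit command
        x = read(input, Float64)
        write(output, 2.0 * x)     # reply with the doubled value
    end
    return nothing
end

# Client side: frame one request (command byte 0x00, then the value).
function request_double(to_server::IO, x::Float64)
    write(to_server, 0x00)
    write(to_server, x)
    flush(to_server)
    return nothing
end
```

Running both halves over `IOBuffer`s gives a quick regression check for the byte layout that does not depend on juliac being installed.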

Root cause

Every @ccallable function's prologue calls jl_adopt_thread() (threading.c:443) → jl_init_threadtls() (threading.c:324). When the calling thread already has Julia TLS (because it's a Julia thread), jl_init_threadtls calls abort(). The juliac shared library embeds its own Julia runtime, and the two runtimes' TLS are fundamentally incompatible on the same thread.

Workarounds attempted and failed:

  • pthread bridge (new thread, no Julia TLS) → segfault from symbol conflicts (jl_get_abi_converter)
  • dlmopen(LM_ID_NEWLM) (separate linker namespace) → segfault during Julia runtime init in isolated namespace
  • --trim=no (307MB untrimmed library) → same TLS abort, confirming it's not a trimming artifact

The subprocess executable is the only working path until the Julia runtime supports loading juliac shared libraries into an existing Julia process.
