We present a grand equivalence between three non-iterative training paradigms that make model fitting instantaneous on a broad class of architectures: (1) Data–Model Duality encodes the dataset directly as parameters and decodes for inference; (2) Closed-Form Optima solve for the optimal weights in one shot; (3) Holistic Collapse jumps to the same fixed point via a non-local, dataset-wide update. We prove and implement numerical identities showing that, on the training support, these three views produce identical predictions, and that the collapse fixed point equals the closed-form solution. Our system validates these equivalences with executable assertions at near machine precision across linear/ridge, kernel ridge (RBF), deep linear factorization, and an ELM (random hidden layer) architecture. Practical engineering completes the bridge from theory to runtime: SPD systems are solved via Cholesky, inputs are whitened to reduce condition numbers, kernels are fully vectorized, and Chebyshev–Lobatto nodes shrink 1D interpolation error to numerical noise. A one-liner API exposes drop-in training, and a timing harness shows the instantaneous methods winning on wall-clock time against iterative gradient descent even at small step counts, while matching or exceeding its accuracy. These results support a unifying view: training can be treated as an explicit, reversible encoding or a one-shot fixed-point computation rather than a trajectory of gradient updates.
For all training inputs $x$:

- Data–Model Duality: $D(E(D), x)$
- Closed-Form Optimum: $G(F(D, A), x)$
- Holistic Collapse: $G(\mathrm{Fix}_\theta[\theta + f(D, A, \theta)], x)$

We verify numerically that:

- $D(E(D), x_i) = y_i$ for all training pairs $(x_i, y_i)$
- $G(F(D, A), x) = G(\mathrm{Fix}_\theta[\theta + f(D, A, \theta)], x)$
- Idempotence: reapplying any branch is a no-op, and $\theta^\star + f(D, A, \theta^\star) = \theta^\star$
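For the linear/ridge branch, these identities can be checked directly in NumPy. The snippet below is a standalone sketch, not the repository's code: it takes the collapse update f to be a preconditioned full-batch residual step (an assumption made here for illustration), whose unique fixed point is the closed-form ridge solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = X @ rng.standard_normal((5, 2))
lam = 1e-2

# F(D, A): closed-form ridge optimum, theta* = (X^T X + lam I)^{-1} X^T Y
A = X.T @ X + lam * np.eye(X.shape[1])
theta_star = np.linalg.solve(A, X.T @ Y)

# f(D, A, theta): a dataset-wide update whose unique fixed point is theta*
P = np.linalg.inv(A)                 # preconditioner (illustrative choice)
def f(theta):
    return P @ (X.T @ Y - A @ theta) # vanishes exactly at theta*

theta = np.zeros_like(theta_star)
theta = theta + f(theta)             # Holistic Collapse: one jump to the fixed point

np.testing.assert_allclose(theta, theta_star, atol=1e-10)                       # collapse == closed form
np.testing.assert_allclose(theta_star + f(theta_star), theta_star, atol=1e-10)  # idempotence
```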
- Data–Model Duality
  - Exact dictionary encoding with nearest-neighbor fallback off-support
  - 1D barycentric interpolation; Chebyshev–Lobatto nodes for near–machine-precision equality
- Closed-Form Maps (F)
  - Linear and Ridge regression
  - Kernel Ridge Regression (RBF) with vectorized Gram/cross-kernel
  - Extreme Learning Machine (ELM) with random hidden layer and closed-form output
  - Deep ELM (stacked random features + closed-form head)
- Holistic Collapse (f) and Fixed-Point
  - Linear/ridge collapse equals closed-form optimum
  - Deep linear networks via balanced SVD factorization across L layers
  - Linear autoencoders via PCA; collapse equals SVD baseline reconstruction
- Numerics & Stability
  - SPD solves via Cholesky (SciPy cho_factor/cho_solve or NumPy cholesky with triangular solves); see the sketch after this list
  - Input whitening with configurable epsilon (whiten_eps) to reduce condition numbers
  - Float64 everywhere; executable assertions with tight tolerances
- Performance
  - Timing harness shows instantaneous methods vs. 10/50-step GD baselines; prints relative MSE gap vs. closed-form
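As a concrete illustration of the numerics bullets above, here is a minimal, self-contained sketch (a hypothetical helper, not the repository's utils): per-feature whitening with an epsilon floor, followed by a Cholesky solve of the SPD ridge normal equations via SciPy's cho_factor/cho_solve.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def whiten(X, whiten_eps=1e-12):
    """Zero-mean, unit-variance columns; whiten_eps floors tiny standard deviations."""
    mu = X.mean(axis=0)
    sd = np.maximum(X.std(axis=0), whiten_eps)
    return (X - mu) / sd

rng = np.random.default_rng(1)
scales = np.array([1.0, 50.0, 1e-3, 1.0, 1.0, 1.0, 1.0, 1.0])
X = rng.standard_normal((300, 8)) * scales          # badly scaled features
y = X @ rng.standard_normal(8)

Xw = whiten(X)                                      # condition number of Xw^T Xw drops sharply
A = Xw.T @ Xw + 1e-6 * np.eye(Xw.shape[1])          # SPD by construction (ridge term)
c, low = cho_factor(A)                              # Cholesky factorization
w = cho_solve((c, low), Xw.T @ y)                   # two triangular solves, no explicit inverse

print(np.linalg.cond(X.T @ X), np.linalg.cond(A))   # raw vs. whitened condition numbers
```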
```bash
# From repo root
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip numpy scipy

# Run the demo suite (saves logs to results.txt)
python instant_train.py
cat results.txt
```

You should see “[✔] All checks passed” and an absolute path to the generated results.txt.
Run the dataset demo for the Transformer scaffold (local import via PYTHONPATH):
```bash
PYTHONPATH=$(pwd) python examples/train_on_datasets.py \
  --copy_num 64 --copy_len 128 --copy_vocab 64 \
  --br_num 64 --br_len 128 --br_depth 32 --seed 0
```

This will print copy-task accuracy (~1.0 with ridge head) and bracket-depth MSE.
```bash
pip install -e .
# Version
exactstep --version
# Copy dataset demo
ti copy --n 64 --L 128 --V 64
# Bracket-depth demo
ti bracket --n 64 --L 128 --depth 32
# Deep ELM sweep (plots saved under transformer_instant/examples/figures)
exactstep deep-elm-bench \
  --depths 2,3,4 \
  --hidden 128,256,512 \
  --lambdas 1e-4,1e-3,1e-2 \
  --n-train 400 --n-test 300 --d 8 --seed 7
```

```bash
docker build -t exactstep:cpu .
docker run --rm exactstep:cpu
# Save generated figures to host
CID=$(docker create exactstep:cpu); docker start -a "$CID"; \
mkdir -p docker_figures; \
docker cp "$CID":/app/transformer_instant/examples/figures ./docker_figures; \
docker rm "$CID"
# Or bind-mount a host directory for figures
mkdir -p docker_figures
docker run --rm -v "$(pwd)"/docker_figures:/app/transformer_instant/examples/figures exactstep:cpu
```

```bash
# Docs (Sphinx)
pip install -e .[docs]
sphinx-build -b html docs docs/build/html
# Tests & coverage
pytest -q
coverage run -m pytest && coverage report -m
```

```python
from instant_train import instant, ipredict
import numpy as np
X = np.random.randn(200, 5).astype(np.float64)
y = (np.sin(X[:,0]) + X[:,1]**2).astype(np.float64)
Xtest = np.random.randn(20, 5).astype(np.float64)
# Closed-form linear
Y_lin = instant(X, X @ np.random.randn(5,2), Xtest, arch="linear", fit_intercept=False)
# Ridge
Y_ridge = instant(X, X @ np.random.randn(5,2), Xtest, arch="ridge", lambda_reg=1e-2)
# Kernel ridge (RBF)
Y_krr = instant(X, y, Xtest, arch="kernel_ridge", length_scale=0.8, variance=1.0, lambda_reg=1e-3, whiten_eps=1e-12)
# ELM (closed-form) — set random_seed for parity with collapse
Y_elm = instant(X, y, Xtest, arch="elm", hidden_units=256, activation="tanh", lambda_reg=1e-3, random_seed=7)
# Duality dictionary (exact memory + NN fallback off-support)
Y_dual = instant(X, X @ np.random.randn(5,2), Xtest, arch="dict")
# Collapse variants
Y_coll_lin = ipredict(X, X @ np.random.randn(5,2), Xtest, arch="collapse:linear")
Y_coll_ridge = ipredict(X, X @ np.random.randn(5,2), Xtest, arch="collapse:ridge", lambda_reg=1e-2)
Y_coll_krr = ipredict(X, y, Xtest, arch="collapse:kernel_ridge", length_scale=0.8, variance=1.0, lambda_reg=1e-3, whiten_eps=1e-12)
Y_coll_elm = ipredict(X, y, Xtest, arch="collapse:elm", hidden_units=256, activation="tanh", lambda_reg=1e-3, random_seed=7)
```

instant_train.py executes a battery of checks with numpy.testing.assert_allclose:
- Duality: D(E(D), x_i) = y_i on train support
- Linear/ridge: closed-form = collapse on train and random test matrices
- Kernel Ridge (RBF): closed-form = collapse (same dual α)
- ELM: closed-form = collapse given identical random_seed/activation/scale
- Deep Linear: SVD-based L-layer factorization equals optimal end-to-end map
- Autoencoder: collapse reconstruction equals PCA (SVD) baseline
- Noisy data sanity: closed-form ≡ collapse; duality returns NN fallback off-support
- Timing harness: prints runtime and relative MSE gap of GD-10/GD-50 vs. closed-form
At completion, the script prints “[✔] All checks passed” and the absolute path to results.txt.
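Of these checks, the deep linear one is the least standard. A minimal standalone sketch of the idea (not the repository's holistic_update code): factor the optimal end-to-end map W* with an SVD, give each of the L layers an equal s**(1/L) share of the singular values, and confirm that the layer product reproduces W*.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, L = 6, 4, 3

W_star = rng.standard_normal((d_out, d_in))   # stand-in for the optimal end-to-end map

# Balanced factorization: each layer carries an equal s**(1/L) share of the singular values
U, s, Vt = np.linalg.svd(W_star, full_matrices=False)
S_root = np.diag(s ** (1.0 / L))

layers = [S_root @ Vt]                        # W_1: (r, d_in)
layers += [S_root] * (L - 2)                  # W_2 .. W_{L-1}: (r, r)
layers += [U @ S_root]                        # W_L: (d_out, r)

W_prod = layers[-1]
for W in reversed(layers[:-1]):               # end-to-end map W_L @ ... @ W_1
    W_prod = W_prod @ W

np.testing.assert_allclose(W_prod, W_star, atol=1e-10)
```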
Supported arch values:

- Duality: dict, barycentric_1d
- Closed-form: linear, ridge, kernel_ridge, elm, deep_elm
- Collapse: collapse:linear, collapse:ridge, collapse:kernel_ridge, collapse:elm, collapse:deep_elm, collapse:autoencoder, autoencoder
Key kwargs:
- lambda_reg: small L2 stabilizer (e.g., 1e-6 to 1e-3)
- length_scale, variance: RBF kernel params
- hidden_units, activation (tanh|relu), weight_scale, random_seed: ELM params
- whiten_eps: epsilon floor for per-feature std in whitening
- SPD Cholesky solves for normal equations (linear/ridge), KRR dual, and ELM output
- Vectorized RBF Gram and cross-kernel; no Python loops
- Whitening reduces condition numbers; all computations in float64
- Barycentric 1D uses Chebyshev–Lobatto nodes in demos; uniform-node tests use a looser tolerance
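For illustration, here is a vectorized RBF Gram/cross-kernel computation in the spirit of the notes above (a standalone sketch, not the repository's implementation; the parameter names length_scale, variance, and the lambda_reg term mirror the kwargs listed earlier): all pairwise squared distances come from one broadcasted expression, and the dual coefficients come from an SPD Cholesky solve.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf_kernel(A, B, length_scale=0.8, variance=1.0):
    """Vectorized RBF kernel: K[i, j] = variance * exp(-||a_i - b_j||^2 / (2 * length_scale^2))."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

rng = np.random.default_rng(3)
X, y = rng.standard_normal((150, 4)), rng.standard_normal(150)
Xtest = rng.standard_normal((10, 4))

K = rbf_kernel(X, X)                               # Gram matrix, no Python loops
c, low = cho_factor(K + 1e-3 * np.eye(len(X)))     # SPD after adding the ridge term
alpha = cho_solve((c, low), y)                     # dual coefficients
y_pred = rbf_kernel(Xtest, X) @ alpha              # cross-kernel prediction
```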
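Likewise, a standalone sketch of 1D barycentric interpolation on Chebyshev–Lobatto nodes (not the repository's data_model_duality code): second-kind Chebyshev points admit the closed-form weights (-1)^j, halved at the endpoints, and interpolate smooth functions to near machine precision.

```python
import numpy as np

def cheb_lobatto_nodes(n):
    """Chebyshev–Lobatto (second-kind) nodes x_j = cos(pi * j / n) on [-1, 1]."""
    return np.cos(np.pi * np.arange(n + 1) / n)

def barycentric_eval(x_nodes, f_nodes, x_query):
    """Barycentric interpolation with the closed-form Chebyshev–Lobatto weights."""
    n = len(x_nodes) - 1
    w = (-1.0) ** np.arange(n + 1)
    w[0] *= 0.5
    w[-1] *= 0.5
    out = np.empty_like(x_query, dtype=np.float64)
    for k, x in enumerate(x_query):                # loop over query points only
        diff = x - x_nodes
        exact = diff == 0.0
        if exact.any():
            out[k] = f_nodes[exact][0]             # query hits a node exactly
        else:
            t = w / diff
            out[k] = (t @ f_nodes) / t.sum()
    return out

nodes = cheb_lobatto_nodes(32)
xq = np.linspace(-1.0, 1.0, 201)
err = np.abs(barycentric_eval(nodes, np.sin(3.0 * nodes), xq) - np.sin(3.0 * xq)).max()
print(f"max interpolation error: {err:.1e}")       # near machine precision for this smooth target
```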
- instant_train.py: Facade, one-liner API, demo assertions, timing harness
- closed_form_training.py: F(D, A) closed-form solvers (linear/ridge, KRR, ELM)
- holistic_update.py: non-local collapse (linear/ridge; deep linear SVD; PCA autoencoder)
- data_model_duality.py: exact dictionary model; 1D barycentric with robust errors
- exactstep/: thin wrapper package re-exporting the public API and CLI entrypoint
- transformer_instant/: hook-based, model-agnostic Transformer scaffold (CPU)
- utils.py: whitening, SPD solves, ridge/KRR/ELM, SVD helpers
- closed_form_head.py: frozen features → closed-form head (ridge/KRR/ELM)
- attention_solver.py: explicit Q/K/V, A = softmax(QK^T/√d), Z = AV, solve W_O
- lora_svd.py: one-shot LoRA via truncated SVD (rank-r update)
- pipeline.py: trainers for head, block collapse, and LoRA; attention wrapper
- hooks.py: optional PyTorch forward-hook utilities
- datasets/: long-range synthetic datasets (copy, bracket depth, reverse, running sum)
- examples/train_on_datasets.py: CPU demo training on the datasets
- tests/test_transformer_instant.py: unit tests for the Transformer scaffold
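As background for the lora_svd.py idea, here is a minimal one-shot LoRA sketch (illustrative shapes and names, not the package's interface): the truncated SVD of a desired weight change ΔW gives its best rank-r factorization in the Frobenius norm, which supplies the low-rank factors in a single step.

```python
import numpy as np

def one_shot_lora(delta_W, rank):
    """Best rank-r factorization of a desired weight update via truncated SVD."""
    U, s, Vt = np.linalg.svd(delta_W, full_matrices=False)
    B = U[:, :rank] * np.sqrt(s[:rank])            # (d_out, r)
    A = np.sqrt(s[:rank])[:, None] * Vt[:rank]     # (r, d_in)
    return B, A

rng = np.random.default_rng(4)
d_out, d_in, r = 64, 32, 4
# Target update chosen to be exactly rank r, so the truncated SVD recovers it exactly
delta_W = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))

B, A = one_shot_lora(delta_W, rank=r)
np.testing.assert_allclose(B @ A, delta_W, atol=1e-10)   # W + B @ A applies the update in one shot
```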
- Exploring broader kernels and structured features without losing vectorization
- Integrating cross-validation for regularization and kernel parameters in one shot
This project is released under the PolyForm Noncommercial 1.0.0 license. You may use, modify, and distribute the code for noncommercial purposes only. Commercial use requires a separate license from the authors.
See LICENSE for details and definitions: https://polyformproject.org/licenses/noncommercial/1.0.0/
For common questions, see LICENSE-FAQ.md. For commercial licensing, contact itsparedezadrian@outlook.com.