Out-of-core (OOC) Pardiso — limits and tuning#
DirectMklPardisoSolver(A, ooc=True) switches MKL’s sparse
Cholesky / LU factorisation onto its disk-backed path
(iparm(60) = 2): the factor and its scratch are spilled to
files on disk and streamed back in chunks at solve time. This
keeps problems solvable on hosts where the in-core factor would
exhaust RAM, at the cost of roughly 3.5× the wall time and the
requirement that the MKL OOC knobs be configured correctly.
This page documents what we’ve tested, where MKL stops cooperating, and the error-code → fix table.
Required environment#
MKL reads two environment variables at Pardiso construction time.
Neither has a runtime-override equivalent inside the library, so
they must be set before femorph_solver imports anything
that touches MKL — in practice that means before any
import femorph_solver line that resolves to the C extension.
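In practice that means exporting both variables in the shell, or assigning them at the very top of the entry script. A minimal sketch; the scratch directory and budget below are placeholders, not recommended values:

```python
import os

# These must be set before the first import that loads the MKL-backed
# C extension; MKL will not see changes made after that point.
os.environ["MKL_PARDISO_OOC_PATH"] = "/scratch/pardiso_ooc"  # placeholder path
os.environ["MKL_PARDISO_OOC_MAX_CORE_SIZE"] = "8192"         # in-memory budget, MB

# Only now is it safe to import anything that touches MKL:
# import femorph_solver
```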
- `MKL_PARDISO_OOC_PATH`: Directory (or path prefix) where MKL writes the factor spill files. Must be writable. Each factor creates files named `pardiso_ooc_*` in that directory. Default: `.` (the current working directory), which is rarely what you want.
- `MKL_PARDISO_OOC_MAX_CORE_SIZE`: In-memory budget (MB) for the current frontal block. Below MKL’s floor (~500 MB in 2024+ builds) any non-trivial 3D front returns error -9. A safe heuristic is half the estimated factor size; `perf/bench_ooc_vs_incore.py` auto-picks this from the in-core pass’s reported `factor_nnz`.
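The “half the estimated factor size” heuristic is easy to reproduce. A minimal sketch, assuming the same `factor_nnz × 12`-byte factor estimate the pre-flight uses and MKL’s ~500 MB floor; the function name is ours, not the bench’s:

```python
def auto_max_core_mb(factor_nnz: int, floor_mb: int = 512) -> int:
    """Half the estimated in-core factor size, clamped above MKL's floor.

    Assumes the factor is estimated at factor_nnz * 12 bytes, as the
    pre-flight check on this page does.
    """
    est_factor_mb = factor_nnz * 12 / 2**20  # bytes -> MiB
    return max(floor_mb, round(est_factor_mb / 2))
```

Feeding it the in-core pass’s reported `factor_nnz` lands in the same few-GB range as the MAX_CORE auto column in the table below.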
A third variable, MKL_PARDISO_OOC_KEEP_FILES, is honoured
by MKL but not used by DirectMklPardisoSolver — the solver
always unlinks its spill files on teardown (phase -1).
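Because teardown is supposed to leave the OOC directory clean, a cheap post-run sanity check is to look for surviving spill files (the helper name is ours):

```python
import glob
import os

def leftover_spill_files(ooc_path: str) -> list:
    """Return any pardiso_ooc_* files still present in the OOC directory.

    After DirectMklPardisoSolver teardown (phase -1) this should be empty;
    anything left behind usually points at a factorisation that crashed
    before cleanup ran.
    """
    return sorted(glob.glob(os.path.join(ooc_path, "pardiso_ooc_*")))
```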
Error-code table#
DirectMklPardisoSolver maps every documented negative MKL
Pardiso return code to a human-readable hint before raising
RuntimeError. The same hint text appears in
_MKL_ERROR_MESSAGES
for callers that want to pattern-match programmatically.
| Code | Condition | Usual fix |
|---|---|---|
| -1 | Input inconsistent. | Check the CSR structure and the solver arguments. |
| -2 | Not enough memory for the in-core factor. | Re-run with `ooc=True`. |
| -3 | Reordering problem. | Check METIS availability; fall back to the minimum-degree ordering (`iparm[1] = 0`). |
| -4 | Zero pivot during numerical factorisation. | The matrix has a zero (or machine-zero) pivot at the equation MKL reports in `iparm[29]`; check constraints and boundary conditions. |
| -8 | 32-bit integer overflow. | The matrix CSR exceeds int32 indexing. Not resolvable with Pardiso’s current binding; switch to a 64-bit-integer backend (MUMPS int64 build) or reduce the problem. |
| -9 | Not enough memory for OOC. | Raise `MKL_PARDISO_OOC_MAX_CORE_SIZE`. |
| -10 | Can’t open OOC files. | Check that `MKL_PARDISO_OOC_PATH` exists and is writable. |
| -11 | Read/write error on OOC files. | Disk out of space, filesystem errored, or permissions changed mid-factor. The pre-flight check verifies ~20 × `A.nnz` bytes of free space before factorising. |
Validated scales (2026-04-24)#
The perf/bench_ooc_vs_incore.py bench has verified the
OOC path at the following HEX20 plate sizes (4-thread MKL,
P-core pinned, fresh subprocess per size so ru_maxrss is the
true watermark):
| Mesh | n_dof | in-core wall | in-core peak | OOC wall | OOC peak | MAX_CORE auto |
|---|---|---|---|---|---|---|
| 192×192×2 | 1 221 120 | 19.4 s | 15.25 GB | 67.9 s | 11.96 GB | 5 312 MB |
| 224×224×2 | 1 660 716 | 27.8 s | 21.41 GB | 97.4 s | 16.72 GB | 7 781 MB |
| 256×256×2 | 2 166 528 | 36.6 s | 28.18 GB | 128.9 s | 21.99 GB | 10 397 MB |
Every size in that table worked first-try with the auto-sized
MAX_CORE_SIZE; the retry-on-`-9` doubling never fired. The
-22 % peak / +3.5× wall trade-off is consistent across the
band; there is no “sweet spot” where OOC gets cheaper.
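The consistency claim can be checked directly from the table; recomputing the ratios (values copied from the table above):

```python
# (mesh, in-core wall s, in-core peak GB, OOC wall s, OOC peak GB)
rows = [
    ("192x192x2", 19.4, 15.25, 67.9, 11.96),
    ("224x224x2", 27.8, 21.41, 97.4, 16.72),
    ("256x256x2", 36.6, 28.18, 128.9, 21.99),
]
for mesh, wall_in, peak_in, wall_ooc, peak_ooc in rows:
    # Wall-time multiplier and peak-RSS delta of OOC vs in-core.
    print(f"{mesh}: wall x{wall_ooc / wall_in:.2f}, "
          f"peak {100 * (peak_ooc / peak_in - 1):+.0f}%")
```

All three rows come out within a couple of percent of ×3.5 wall and -22 % peak.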
Known limitations#
No bit-exact repeatability across MKL versions. The OOC path’s tile-size heuristics have shifted between oneMKL 2024.1, 2024.2 and 2025.3. Wall-time deltas of ±10 % across a minor MKL bump are normal; frequency results stay identical to the in-core factor at every version we’ve tested.
Can’t combine with mixed precision. iparm[27] = 1
(single-precision factor) and iparm[59] = 2 (OOC) both
claim the in-core working buffer; MKL returns -9
consistently when both are set. ooc=True takes precedence
and leaves mixed precision off — mixed precision doesn’t
actually halve our factor on this MKL build anyway (the flag is
silently no-op’d; iparm[27] reads back 0 after factorize).
Can’t combine with the “improved two-level” parallel factor /
solve. iparm[23] = 1 + iparm[24] = 2 allocate
per-thread scratch the OOC path refuses to spill. MKL returns
-2 (“not enough memory”) under that combination on any
front larger than ~1 k DOFs. When ooc=True the solver
drops to the classical parallel path (iparm[23] = 0,
iparm[24] = 0); losing the two-level combination alone accounts
for roughly 2× of the consistent +3.5× wall-time slowdown.
Factor spill size is not `factor_nnz × 12`. The spill
file contains the factor plus analysis scratch plus recycled
frontal tiles. On our 192×192 to 256×256 measurements the spill
was ~3× the `factor_nnz × 12` estimate the pre-flight uses. The
pre-flight’s safety factor (`A.nnz × 20`) is tuned to
accommodate that ratio for 3D SPD meshes; for 2D meshes or
highly loaded unsymmetric systems it may over- or under-shoot.
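The pre-flight’s free-space side can be reproduced in a few lines. A sketch, assuming the × 20 safety factor is bytes per structural nonzero; the function name is ours:

```python
import shutil

def check_ooc_disk(nnz: int, ooc_path: str, safety: int = 20) -> None:
    """Fail early if the OOC directory lacks ~safety * nnz bytes free.

    Mirrors the pre-flight described on this page: the real spill can run
    ~3x the factor_nnz * 12 estimate, so the check is deliberately padded.
    """
    need = safety * nnz
    free = shutil.disk_usage(ooc_path).free
    if free < need:
        raise RuntimeError(
            f"OOC spill needs ~{need / 2**30:.1f} GiB free under "
            f"{ooc_path!r}; only {free / 2**30:.1f} GiB available"
        )
```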
Further work#
Every item below is stress-test work that the landed
DirectMklPardisoSolver(ooc=True) path still needs; they are
called out here so this page stays a useful TODO anchor rather
than a static description:
- Push `perf/bench_ooc_vs_incore.py` past 3 M DOFs. Our 128 GB box can still fit 256×256×2 HEX20 in-core, so we haven’t yet hit a size where OOC is the only option. Once we cross that boundary, the wall-time cost of OOC versus the alternative (“crash with -2”) stops being a trade-off and becomes the whole answer.
- Validate against a non-MKL-default locale (Turkish, German). MKL’s env-var parser has historically had issues with non-ASCII filesystem paths.
- Measure spill-file growth vs `factor_nnz` empirically at multiple sizes and refit the pre-flight’s `× 20` safety factor if needed.