Out-of-core (OOC) Pardiso: limits and tuning

DirectMklPardisoSolver(A, ooc=True) switches MKL’s sparse Cholesky / LU factor onto its disk-backed path (iparm[59] = 2, which is iparm(60) in MKL’s 1-based documentation). The factor and its scratch are written to spill files on disk and streamed back in chunks at solve time. This keeps problems solvable on hosts where the in-core factor would exhaust RAM, at the cost of roughly 3.5× the wall time and the requirement that the MKL OOC knobs be configured correctly.

This page documents what we’ve tested, where MKL stops cooperating, and the error-code → fix table.

Required environment

MKL reads two environment variables at Pardiso construction time. Neither can be overridden at runtime from inside the library, so both must be set before femorph_solver imports anything that touches MKL. In practice that means before any import femorph_solver line that resolves to the C extension.

MKL_PARDISO_OOC_PATH

Directory (or path prefix) where MKL writes the factor spill files. Must be writable. Each factor creates files named pardiso_ooc_* in that directory. The default is the current working directory (.), which is rarely what you want.

MKL_PARDISO_OOC_MAX_CORE_SIZE

In-memory budget (MB) for the current frontal block. Below MKL’s floor (~500 MB in 2024+ builds) any non-trivial 3D front returns error -9. A safe heuristic is half the estimated factor size; perf/bench_ooc_vs_incore.py auto-picks this from the in-core pass’s reported factor_nnz.

A third variable, MKL_PARDISO_OOC_KEEP_FILE, is honoured by MKL but not used by DirectMklPardisoSolver: the solver always unlinks its spill files on teardown (phase -1).
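A minimal sketch of the ordering constraint described above. The helpers suggest_max_core_mb and configure_ooc_env are hypothetical names, not part of femorph_solver; the 500 MB floor and the half-the-factor heuristic come from this page, and 12 bytes per factor nonzero is an assumption borrowed from the pre-flight estimate.

```python
import math
import os
import tempfile

MKL_OOC_FLOOR_MB = 500  # approximate floor quoted on this page for 2024+ builds


def suggest_max_core_mb(factor_nnz, bytes_per_nnz=12):
    """Half the estimated factor size in MB, clamped to the MKL floor.

    bytes_per_nnz=12 is an assumption (8-byte value + 4-byte index).
    """
    factor_mb = factor_nnz * bytes_per_nnz / 2**20
    return max(MKL_OOC_FLOOR_MB, math.ceil(factor_mb / 2))


def configure_ooc_env(spill_dir, max_core_mb):
    """Hypothetical helper: create the spill directory and export both variables."""
    os.makedirs(spill_dir, exist_ok=True)
    if not os.access(spill_dir, os.W_OK):
        raise RuntimeError(f"OOC spill directory is not writable: {spill_dir}")
    os.environ["MKL_PARDISO_OOC_PATH"] = spill_dir
    os.environ["MKL_PARDISO_OOC_MAX_CORE_SIZE"] = str(int(max_core_mb))


spill_dir = os.path.join(tempfile.gettempdir(), "pardiso_ooc_demo")
configure_ooc_env(spill_dir, suggest_max_core_mb(factor_nnz=900_000_000))

# Only now import the solver, so MKL sees the variables when it initialises:
# import femorph_solver
```

Keeping the configuration in one function that runs at the very top of the entry-point script makes it harder for a stray early import to load MKL before the variables exist.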

Error-code table

DirectMklPardisoSolver maps every documented MKL Pardiso negative return to a human hint before raising RuntimeError. The same hint text appears in _MKL_ERROR_MESSAGES for callers that want to pattern-match programmatically.

| Code | Condition | Usual fix |
|------|-----------|-----------|
| -1 | Input inconsistent. | Check indptr / indices dtype (must be int32), sorted indices, and the CSR shape. |
| -2 | Not enough memory for the in-core factor. | Re-run with ooc=True or on a host with more RAM. |
| -3 | Reordering problem. | Check METIS availability; iparm[1] default is METIS but a corrupt install can surface here. |
| -4 | Zero pivot during numerical factorisation. | The matrix has a zero (or machine-zero) pivot at row iparm[29] - 1. Most commonly a BC misspec that leaves a row with no stiffness; inspect the row index MKL reports. |
| -8 | 32-bit integer overflow. | The matrix CSR exceeds int32 indexing. Not resolvable with Pardiso’s current binding; switch to a 64-bit-integer backend (MUMPS int64 build) or reduce the problem. |
| -9 | Not enough memory for OOC. | Raise MKL_PARDISO_OOC_MAX_CORE_SIZE. bench_ooc_vs_incore doubles the value on this error up to a 32 GB ceiling. |
| -10 | Can’t open OOC files. | MKL_PARDISO_OOC_PATH doesn’t exist or isn’t writable. The pre-flight check in DirectMklPardisoSolver catches this before MKL does and raises a descriptive RuntimeError; if you see -10 directly, the pre-flight was bypassed. |
| -11 | Read/write error on OOC files. | Disk out of space, filesystem errored, or permissions changed mid-factor. The pre-flight check verifies ~20 × A.nnz × 12 bytes free at construct time; deltas between estimate and actual can still trip this on very fragmented filesystems. |
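The retry-on-(-9) doubling mentioned in the table can be sketched as follows. This is an illustration, not the code in bench_ooc_vs_incore: attempt is a hypothetical callable standing in for "run one factorisation in a fresh process with this MAX_CORE_SIZE and return the MKL error code".

```python
CEILING_MB = 32 * 1024  # 32 GB ceiling quoted in the table
ERR_OOC_MEMORY = -9


def factor_with_doubling(attempt, start_mb, ceiling_mb=CEILING_MB):
    """Call attempt(max_core_mb) until it stops returning -9.

    `attempt` is a hypothetical callable that runs one factorisation with
    MKL_PARDISO_OOC_MAX_CORE_SIZE set to the given value and returns the
    MKL error code (0 on success).
    Returns (error_code, max_core_mb_used).
    """
    budget = start_mb
    while True:
        code = attempt(budget)
        # Stop on success, on any other error, or once the ceiling was tried.
        if code != ERR_OOC_MEMORY or budget >= ceiling_mb:
            return code, budget
        budget = min(budget * 2, ceiling_mb)


# Stand-in for a real factorisation: succeeds once the budget reaches 4000 MB.
def fake_attempt(max_core_mb):
    return 0 if max_core_mb >= 4000 else ERR_OOC_MEMORY


result = factor_with_doubling(fake_attempt, start_mb=600)
print(result)
```

Because MKL reads the variable at construction time, each retry in a real harness has to rebuild the solver (or start a new process) rather than mutate a live instance.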

Validated scales (2026-04-24)

The perf/bench_ooc_vs_incore.py bench has verified the OOC path at the following SOLID186 plate sizes (4-thread MKL, P-core pinned, fresh subprocess per size so ru_maxrss is the true watermark):

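| Mesh | n_dof | In-core wall | In-core peak | OOC wall | OOC peak | MAX_CORE auto |
|------|-------|--------------|--------------|----------|----------|---------------|
| 192×192×2 | 1 221 120 | 19.4 s | 15.25 GB | 67.9 s | 11.96 GB | 5 312 MB |
| 224×224×2 | 1 660 716 | 27.8 s | 21.41 GB | 97.4 s | 16.72 GB | 7 781 MB |
| 256×256×2 | 2 166 528 | 36.6 s | 28.18 GB | 128.9 s | 21.99 GB | 10 397 MB |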

Every size in that table worked on the first try with the auto-sized MAX_CORE_SIZE; the retry-on-(-9) doubling never fired. The trade-off of about 22 % lower peak memory for about 3.5× the wall time is consistent across the band; there is no “sweet spot” where OOC gets cheaper.
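The ratios quoted above can be recomputed directly from the table. The short script below does only that arithmetic; the numbers are copied from the table and nothing else is assumed.

```python
# (mesh, in-core wall s, in-core peak GB, OOC wall s, OOC peak GB) from the table
rows = [
    ("192x192x2", 19.4, 15.25, 67.9, 11.96),
    ("224x224x2", 27.8, 21.41, 97.4, 16.72),
    ("256x256x2", 36.6, 28.18, 128.9, 21.99),
]

summary = []
for mesh, ic_wall, ic_peak, ooc_wall, ooc_peak in rows:
    wall_ratio = ooc_wall / ic_wall                           # OOC slowdown factor
    peak_change_pct = 100.0 * (ooc_peak - ic_peak) / ic_peak  # negative = less RAM
    summary.append((mesh, wall_ratio, peak_change_pct))
    print(f"{mesh}: wall x{wall_ratio:.2f}, peak {peak_change_pct:+.1f} %")
```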

Known limitations

No bit-exact repeatability across MKL versions. The OOC path’s tile-size heuristics have shifted between oneMKL 2024.1, 2024.2 and 2025.3. Wall-time deltas of ±10 % across a minor MKL bump are normal; frequency results stay identical to the in-core factor at every version we’ve tested.

Can’t combine with mixed precision. iparm[27] = 1 (single-precision factor) and iparm[59] = 2 (OOC) both claim the in-core working buffer; MKL returns -9 consistently when both are set. ooc=True takes precedence and leaves mixed precision off — mixed precision doesn’t actually halve our factor on this MKL build anyway (the flag is silently no-op’d; iparm[27] reads back 0 after factorize).

Can’t combine with the “improved two-level” parallel factor / solve. Setting iparm[23] = 1 together with iparm[24] = 2 allocates per-thread scratch that the OOC path refuses to spill, and MKL returns -2 (“not enough memory”) under that combination on any front larger than ~1 k DOFs. When ooc=True the solver drops to the classical parallel path (iparm[23] = 0, iparm[24] = 0). This fallback is a large part of the consistent ~3.5× wall-time cost: losing that combination alone costs ~2× per solve.
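The two incompatibilities above amount to a small set of iparm overrides. The sketch below shows them on a plain 64-entry list with the 0-based indices used on this page; apply_ooc_constraints is a hypothetical name, and this is an illustration rather than the code inside DirectMklPardisoSolver.

```python
def apply_ooc_constraints(iparm, ooc):
    """Return a copy of iparm with the OOC-incompatible options switched off.

    Indices are 0-based, as elsewhere on this page.
    """
    out = list(iparm)
    if ooc:
        out[59] = 2  # out-of-core mode
        out[27] = 0  # mixed precision off (double-precision factor)
        out[23] = 0  # classical parallel factorisation
        out[24] = 0  # classical parallel solve
    return out


requested = [0] * 64
requested[23] = 1  # "improved two-level" factorisation
requested[24] = 2  # matching parallel solve setting
requested[27] = 1  # single-precision factor

effective = apply_ooc_constraints(requested, ooc=True)
print(effective[23], effective[24], effective[27], effective[59])
```

Returning a copy keeps the caller's requested settings intact, which makes it easy to log what was asked for next to what was actually passed to MKL.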

Factor spill size is not factor_nnz × 12. The spill file contains the factor + analysis scratch + recycled frontal tiles. On our 192² to 256² measurements the spill was ~3× the factor_nnz × 12 estimate the pre-flight uses. The pre-flight’s safety factor (A.nnz × 20) is tuned to accommodate that ratio for 3D SPD meshes; for 2D meshes or highly-loaded unsymmetric systems it may over- or under-shoot.
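A free-space pre-flight along the lines described on this page could look like the sketch below. The function names are hypothetical and the real check inside DirectMklPardisoSolver may differ; only the 20 × A.nnz × 12 bytes formula is taken from the text.

```python
import shutil
import tempfile

BYTES_PER_NNZ = 12  # per-nonzero size used in the estimates on this page
SAFETY_FACTOR = 20  # pre-flight multiplier applied to A.nnz


def required_spill_bytes(a_nnz, safety=SAFETY_FACTOR, bytes_per_nnz=BYTES_PER_NNZ):
    """Estimated spill size in bytes: safety x A.nnz x bytes per nonzero."""
    return safety * a_nnz * bytes_per_nnz


def preflight_free_space(spill_dir, a_nnz):
    """Raise RuntimeError if spill_dir has less free space than the estimate."""
    needed = required_spill_bytes(a_nnz)
    free = shutil.disk_usage(spill_dir).free
    if free < needed:
        raise RuntimeError(
            f"OOC spill needs ~{needed / 2**30:.1f} GiB in {spill_dir}, "
            f"only {free / 2**30:.1f} GiB free"
        )
    return needed, free


needed, free = preflight_free_space(tempfile.gettempdir(), a_nnz=1_000)
print(needed)
```

Since the estimate is only a heuristic, a check like this reduces the chance of a mid-factor -11 but cannot rule it out.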

Further work

The items below are the stress-test work that the landed DirectMklPardisoSolver(ooc=True) path needs next. They are called out here so the documentation stays a useful TODO anchor rather than a static description:

  1. Push perf/bench_ooc_vs_incore.py past 3 M DOFs — our 128 GB box can still fit 256×256×2 SOLID186 in-core, so we haven’t observed any sizes where OOC is the only option. Once we cross that boundary, the wall-time cost of OOC versus the alternative (“crash with -2”) stops being a tradeoff and starts being the whole answer.

  2. Validate against a non-MKL-default locale (Turkish, German) — MKL’s env-var parser has historically had issues with non-ASCII filesystem paths.

  3. Measure spill-file growth vs factor_nnz empirically at multiple sizes and refit the pre-flight’s × 20 safety factor if needed.