Conditioning and performance#

Two related questions every user asks at some point: “why is this matrix so ill-conditioned?” and “why is the solve so slow?” The answers usually overlap.

Matrix conditioning#

The condition number \(\kappa(\mathbf{K}) = \lambda_\max / \lambda_\min\) of the symmetric-positive-definite stiffness is the fundamental measure of how sensitive the solve is: roughly \(\log_{10}\kappa\) significant digits are lost to rounding. Above ~1e10 most direct backends produce noticeable rounding error; above ~1e14 so little double-precision accuracy remains that the result is usually unusable.
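If you want a number rather than a symptom, the extreme eigenvalues of an assembled \(\mathbf{K}\) can be estimated directly with SciPy. The sketch below sits outside femorph-solver's API and assumes you already have the sparse matrix in hand; for large models the backend's own estimate (see Reading the diagnostic below) is far cheaper.

```python
# Minimal SciPy sketch (not part of femorph-solver's API): estimate
# kappa(K) = lambda_max / lambda_min for a sparse SPD stiffness matrix.
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def estimate_condition_number(K: sp.spmatrix) -> float:
    # Largest algebraic eigenvalue via Lanczos iteration.
    lam_max = spla.eigsh(K, k=1, which="LA", return_eigenvectors=False)[0]
    # Smallest eigenvalue via shift-invert about zero; assumes K is truly SPD
    # (a singular K, e.g. from missing BCs, makes the factorisation fail).
    lam_min = spla.eigsh(K, k=1, sigma=0, which="LM",
                         return_eigenvectors=False)[0]
    return float(lam_max / lam_min)
```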

Sources of poor conditioning, in roughly descending order of frequency:

  • Geometry / material disparity. A model that mixes a steel beam with a soft-rubber pad inside the same solve routinely hits \(\kappa \sim 10^9\) — the stiffness ratio goes straight into the eigenvalue spread (see the numeric sketch after this list). Solve the two parts as separate problems where possible.

  • Near-zero-Jacobian elements. The local stiffness blows up as \(\det(\mathbf{J}) \to 0\); the global condition number follows. See Mesh quality.

  • Insufficient BCs. An under-constrained model has a near-zero eigenvalue (the would-be rigid-body mode); the condition number is effectively infinite. The Boundary-condition pitfalls rigid-body-mode check catches this before the solve.

  • Sliver elements (very thin cells next to thick neighbours) — local-to-global stiffness mismatch. Refine or split.
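A two-spring toy problem shows how directly the stiffness ratio from the first bullet lands in the eigenvalue spread. The sketch below uses NumPy only and is independent of femorph-solver; the stiffness values are illustrative, not material data.

```python
# Two springs in series, fixed at one end: a "steel" spring k1 and a
# "soft rubber" spring k2. DOFs are the shared node and the free end.
import numpy as np

k1, k2 = 2.0e11, 2.0e2            # illustrative stiffnesses, ratio 1e9
K = np.array([[k1 + k2, -k2],
              [-k2,      k2]])
eigvals = np.linalg.eigvalsh(K)
print(eigvals.max() / eigvals.min())   # ~1e9: kappa tracks the stiffness ratio
```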

Reading the diagnostic#

When a direct solve fails or returns garbage, the linear backend reports a condition-number estimate (often as rcond — the reciprocal). Pardiso, CHOLMOD, and MUMPS all expose this through the Result.diagnostics dict. Reading rcond < 1e-10 is the same signal as \(\kappa > 10^{10}\).
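A short sketch of acting on that diagnostic follows. The "rcond" key name and the no-argument solve_static() call are assumptions here, so check which keys your backend actually populates in Result.diagnostics.

```python
# Sketch: inspect the backend's reciprocal-condition estimate after a solve.
# The "rcond" key name is an assumption; use whatever key your backend reports.
result = model.solve_static()                 # model: an assembled Model
rcond = result.diagnostics.get("rcond")
if rcond is not None and rcond < 1e-10:       # same signal as kappa > 1e10
    print(f"Ill-conditioned system: rcond = {rcond:.1e}")
```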

Performance tuning#

For the linear-elastic structural slice femorph-solver ships today, the wall-time bottleneck is almost always the linear factorisation. The factor cost depends on three things:

  1. Backend choice (Pardiso / CHOLMOD / MUMPS / SuperLU).

  2. Thread count (OMP_NUM_THREADS and friends).

  3. Matrix structure (DOF count and connectivity / fill-in).

When to switch backends#

  • Default — femorph-solver’s auto-chain picks the fastest installed backend (Pardiso > CHOLMOD > MUMPS > SuperLU). Most workflows don’t need to override.

  • Pardiso — fastest on Intel-class CPUs (the BLAS-pinned configuration that ARC pods and most laptops use). Its two-pass factorisation outperforms CHOLMOD on dense connectivity (3-D solid models).

  • CHOLMOD — competitive with Pardiso on sparse-graph models (beam-shell meshes), but pays a constant fill-in overhead that Pardiso doesn't. Pick CHOLMOD when AMD CPUs are in play and Pardiso's MKL dependency is an issue.

  • MUMPS — preferred for out-of-core solves (large models whose factors don't fit in RAM); it supports disk spillover. Slower than Pardiso / CHOLMOD on in-RAM problems.

  • SuperLU — fallback only. Pure SciPy, no external dependency, ~2-3× slower than the optimised backends.

Switch via Model.solve_static(linear_solver="cholmod") etc. See Linear-solver backends for the full backend-selection API.
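A short sketch of overriding the auto-chain is below. Only "cholmod" and "cg+pyamg" appear verbatim in this guide; the other lowercase backend names are assumptions, so confirm them against the backend-selection API.

```python
# Sketch: force a specific backend instead of the auto-chain.
result = model.solve_static(linear_solver="cholmod")        # e.g. on an AMD box

# Hypothetical: a model too large for RAM routed to the out-of-core backend
# (backend name assumed; see Linear-solver backends for the accepted values).
big_result = big_model.solve_static(linear_solver="mumps")
```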

Thread pinning#

The library reads OMP_NUM_THREADS, MKL_NUM_THREADS, and OPENBLAS_NUM_THREADS for BLAS-pool sizing. CI pods pin all three to 4 because the cgroup quota gives ~4 vCPUs — larger pools just thrash from scheduler contention.

Local guidance:

  • Laptop / 8-core workstation — set OMP_NUM_THREADS=4 for most workloads. Leave 4 cores for the OS / your IDE.

  • Beefy workstation / cluster node (32+ cores) — set OMP_NUM_THREADS=8 or 16; diminishing returns above 16 for typical structural-mechanics K matrices.
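One way to apply this guidance is to pin the variables before anything initialises a BLAS/OpenMP pool. The sketch below sets them from Python; exporting them in the shell before launching the process works just as well, and the assumption here is that no pool has been sized before these lines run.

```python
# Sketch: pin the thread pools before importing the solver stack.
import os

for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ.setdefault(var, "4")   # 4 threads: the laptop/CI guidance above
```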

Memory#

Direct factor memory scales as \(O(N^{1.5})\) for 3-D solid meshes (Saad 2003 §3.6). For 100 k-cell HEX8 meshes that’s typically 8-15 GB — fits in workstation RAM. Beyond 500 k cells the factor outgrows 32 GB and you’ll need to either move to MUMPS out-of-core or drop to an iterative backend (CG with AMG preconditioner — currently shipped via linear_solver="cg+pyamg").
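For models past that threshold, a hypothetical size-based fallback might look like the sketch below; the n_cells variable and the hard 500 k cut-off are illustrative, and only the linear_solver value "cg+pyamg" comes from this guide.

```python
# Sketch: fall back to the iterative CG + AMG backend when the direct factor
# would outgrow RAM. The 500_000-cell threshold mirrors the guidance above;
# n_cells is a placeholder for however you track model size.
if n_cells > 500_000:
    result = model.solve_static(linear_solver="cg+pyamg")
else:
    result = model.solve_static()     # default auto-chain, direct factorisation
```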