Conditioning and performance#
Two related questions every user asks at some point: “why is this matrix so ill-conditioned?” and “why is the solve so slow?” The answers usually overlap.
Matrix conditioning#
The condition number \(\kappa(\mathbf{K}) = \lambda_\max / \lambda_\min\) of the symmetric-positive-definite stiffness matrix measures how strongly the solve amplifies rounding error. Above ~1e10 most direct backends produce noticeable rounding error; above ~1e14 they fail outright.
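As a quick sanity check (plain NumPy, not a femorph-solver API), \(\kappa\) of an SPD matrix can be read straight off its eigenvalues, and for the 2-norm it agrees with np.linalg.cond:

```python
import numpy as np

# Toy diagonal "stiffness" with a 1e6 stiffness spread.
K = np.diag([2.0e11, 2.0e11, 2.0e5])
evals = np.linalg.eigvalsh(K)       # eigenvalues in ascending order
kappa = evals[-1] / evals[0]        # lambda_max / lambda_min
# For an SPD matrix this equals np.linalg.cond(K) (2-norm).
```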
Sources of poor conditioning, in descending frequency:
Geometry / material disparity. A model that mixes a steel beam with a soft-rubber pad inside the same solve routinely hits \(\kappa \sim 10^9\) — the stiffness ratio goes straight into the eigenvalue spread. Solve the two parts as separate problems where possible.
Near-zero-Jacobian elements. The local stiffness blows up as \(\det(\mathbf{J}) \to 0\); the global condition number follows. See Mesh quality.
Insufficient BCs. An under-constrained model has a near-zero eigenvalue (the would-be rigid-body mode); the condition number is effectively infinite. The Boundary-condition pitfalls rigid-body-mode check catches this before the solve.
Sliver elements (very thin cells next to thick neighbours) — local-to-global stiffness mismatch. Refine or split.
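The stiffness-disparity effect is easy to reproduce with a two-spring chain (illustrative stiffness values, not library output): the condition number lands at roughly the steel/rubber stiffness ratio.

```python
import numpy as np

k_steel, k_rubber = 2.0e11, 2.0e2   # illustrative spring stiffnesses, ratio ~1e9
# Fixed-free chain: steel spring to ground, rubber spring between the two DOFs.
K = np.array([[k_steel + k_rubber, -k_rubber],
              [-k_rubber,           k_rubber]])
kappa = np.linalg.cond(K)           # ~1e9: tracks the stiffness ratio
```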
Reading the diagnostic#
When a direct solve fails or returns garbage, the linear
backend reports a condition-number estimate (often as rcond
— the reciprocal). Pardiso, CHOLMOD, and MUMPS all expose
this through the Result.diagnostics dict. An
rcond below 1e-10 is the same signal as \(\kappa > 10^{10}\).
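The reciprocal relationship is easy to verify with plain NumPy (the Result.diagnostics field itself is backend-specific):

```python
import numpy as np

K = np.diag([1.0e10, 1.0, 2.0])     # kappa = 1e10 by construction
rcond = 1.0 / np.linalg.cond(K)     # reciprocal condition number
# rcond < 1e-10 and kappa > 1e10 are the same threshold in reciprocal form.
```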
Performance tuning#
For the linear-elastic structural slice femorph-solver ships today, the wall-time bottleneck is almost always the linear factorisation. The factor cost depends on three things:
Backend choice (Pardiso / CHOLMOD / MUMPS / SuperLU).
Thread count (OMP_NUM_THREADS and friends).
Matrix structure (DOF count and connectivity / fill-in).
When to switch backends#
Default — femorph-solver’s auto-chain picks the fastest installed backend (Pardiso > CHOLMOD > MUMPS > SuperLU). Most workflows don’t need to override.
Pardiso — fastest on Intel-class CPUs (the BLAS-pinned configuration ARC pods + most laptops use). Two-pass factorisation outperforms CHOLMOD on dense connectivity (3-D solid models).
CHOLMOD — competitive with Pardiso on sparse-graph models (beam-shell meshes), but pays a constant fill-in overhead Pardiso doesn’t. Pick CHOLMOD when AMD CPUs are in play and Pardiso’s MKL is an issue.
MUMPS — preferred when out-of-core (large models that don’t fit in RAM); supports disk spillover. Slower than Pardiso / CHOLMOD on in-RAM problems.
SuperLU — fallback only. Pure-SciPy, no external dep, ~2-3× slower than the optimised backends.
Switch via Model.solve_static(linear_solver="cholmod") etc. See
Linear-solver backends for the full
backend-selection API.
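A first-installed-wins fallback chain is one way such an auto-selection can be sketched. The module names probed below (pypardiso, sksparse, mumps) are assumptions for illustration, not femorph-solver internals:

```python
from importlib.util import find_spec

# Preference order mirrors the documented auto-chain:
# Pardiso > CHOLMOD > MUMPS > SuperLU. Module names are hypothetical.
_PREFERENCE = [
    ("pardiso", "pypardiso"),
    ("cholmod", "sksparse"),
    ("mumps",   "mumps"),
    ("superlu", "scipy"),   # always available: scipy.sparse.linalg.splu
]

def pick_backend():
    """Return the name of the fastest installed backend."""
    for name, module in _PREFERENCE:
        if find_spec(module) is not None:
            return name
    raise RuntimeError("no linear-solver backend available")
```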
Thread pinning#
The library reads OMP_NUM_THREADS, MKL_NUM_THREADS,
and OPENBLAS_NUM_THREADS for BLAS-pool sizing. CI pods
pin all three to 4 because the cgroup quota gives ~4 vCPUs
— larger pools just thrash from scheduler contention.
Local guidance:
Laptop / 8-core workstation — set OMP_NUM_THREADS=4 for most workloads. Leave 4 cores for the OS / your IDE.
Beefy workstation / cluster node (32+ cores) — set OMP_NUM_THREADS=8 or 16; diminishing returns above 16 for typical structural-mechanics K matrices.
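One portable way to pin from inside a script (plain environment-variable handling, not a femorph-solver API) is to set the variables before the first numpy/scipy import, since the BLAS pools read them once at library load:

```python
import os

# Pin the BLAS/OpenMP thread pools *before* numpy/scipy are first imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "4"

import numpy as np  # noqa: E402  (import deliberately after pinning)
```

Setting the variables in the shell before launching Python works the same way and avoids the import-order caveat.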
Modal solves#
The eigensolver dominates wall-time on modal analyses.
In shift-invert mode, scipy.sparse.linalg.eigsh factorises
once with the linear backend and then performs one backend
solve per Lanczos iteration — so eigensolver speed is
dominated by the linear-backend choice. All the
linear-backend guidance above applies to modal solves the
same way.
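The shift-invert pattern looks like this on a toy SPD matrix (a 1-D Laplacian standing in for a stiffness matrix; not a femorph-solver call):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 500
# 1-D Laplacian as a stand-in SPD stiffness matrix K.
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
# sigma=0 shift-invert: (K - sigma*I) is factorised once, then each
# Lanczos iteration does one triangular solve with that factor.
vals, vecs = eigsh(K, k=5, sigma=0.0, which="LM")
```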
For very large models (> 1 M DOF) consider
Model.solve_modal(eigen_solver="lobpcg") which avoids the
explicit factorisation but has slower convergence on
clustered low frequencies.
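A minimal lobpcg call is sketched below on a toy matrix; in practice you would pass a preconditioner (e.g. AMG) via lobpcg's M argument to get competitive convergence. The unpreconditioned call here is for illustration only:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 200
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 4))   # random initial block of 4 vectors
# largest=False requests the lowest eigenpairs; no factorisation is formed.
vals, vecs = lobpcg(K, X, largest=False, tol=1e-8, maxiter=500)
```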
Memory#
Direct factor memory scales as \(O(N^{4/3})\) for 3-D
solid meshes under nested-dissection ordering (Saad 2003 §3.6). For 100 k-cell HEX8 meshes
that’s typically 8-15 GB — fits in workstation RAM. Beyond
500 k cells the factor outgrows 32 GB and you’ll need to
either move to MUMPS out-of-core or drop to an iterative
backend (CG with AMG preconditioner — currently shipped via
linear_solver="cg+pyamg").