Conditioning and performance
============================

Two related questions every user asks at some point: "why is this matrix so
ill-conditioned?" and "why is the solve so slow?" The answers usually overlap.

Matrix conditioning
-------------------

The condition number :math:`\kappa(\mathbf{K}) = \lambda_{\max} / \lambda_{\min}`
of the symmetric-positive-definite stiffness is the fundamental measure of the
solve's numerical sensitivity. Above roughly :math:`10^{10}` most direct
backends produce noticeable rounding error; above :math:`10^{14}` they fail
outright. Sources of poor conditioning, in descending order of frequency:

* **Geometry / material disparity.** A model that mixes a steel beam with a
  soft-rubber pad inside the same solve routinely hits :math:`\kappa \sim 10^9`:
  the stiffness ratio goes straight into the eigenvalue spread. Solve the two
  parts as separate problems where possible.
* **Near-zero-Jacobian elements.** The local stiffness blows up as
  :math:`\det(\mathbf{J}) \to 0`; the global condition number follows.
  See :doc:`mesh-quality`.
* **Insufficient BCs.** An under-constrained model has a near-zero eigenvalue
  (the would-be rigid-body mode), so the condition number is effectively
  infinite. The :doc:`bc-pitfalls` rigid-body-mode check catches this before
  the solve.
* **Sliver elements** (very thin cells next to thick neighbours) cause a
  local-to-global stiffness mismatch. Refine or split them.

Reading the diagnostic
~~~~~~~~~~~~~~~~~~~~~~

When a direct solve fails or returns garbage, the linear backend reports a
condition-number estimate (often as ``rcond``, the reciprocal). Pardiso,
CHOLMOD, and MUMPS all expose this through the ``Result.diagnostics`` dict.
An ``rcond`` below ``1e-10`` is the same signal as :math:`\kappa > 10^{10}`.
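A minimal sketch of checking that estimate after a solve. It assumes
``Model.solve_static`` returns a ``Result`` and that the estimate sits under an
``"rcond"`` key in ``Result.diagnostics``; the exact key name may differ per
backend, and ``model`` stands in for an already-assembled model.

.. code-block:: python

   import warnings

   # ``model`` is an already-assembled femorph-solver Model (setup elided).
   result = model.solve_static(linear_solver="pardiso")

   # The "rcond" key is an assumption; check your backend's diagnostics keys.
   rcond = result.diagnostics.get("rcond")
   if rcond is not None and rcond < 1e-10:
       # Same signal as kappa > 1e10: expect noticeable rounding error.
       warnings.warn(f"Ill-conditioned stiffness: rcond = {rcond:.2e}")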
Performance tuning
------------------

For the linear-elastic structural slice femorph-solver ships today, the
wall-time bottleneck is almost always the linear factorisation. The factor
cost depends on three things:

1. **Backend choice** (Pardiso / CHOLMOD / MUMPS / SuperLU).
2. **Thread count** (``OMP_NUM_THREADS`` and friends).
3. **Matrix structure** (DOF count and connectivity / fill-in).

When to switch backends
~~~~~~~~~~~~~~~~~~~~~~~

* **Default** — femorph-solver's auto-chain picks the fastest installed
  backend (Pardiso > CHOLMOD > MUMPS > SuperLU). Most workflows don't need to
  override it.
* **Pardiso** — fastest on Intel-class CPUs (the BLAS-pinned configuration
  that ARC pods and most laptops use). Its two-pass factorisation outperforms
  CHOLMOD on densely connected meshes (3-D solid models).
* **CHOLMOD** — competitive with Pardiso on sparse-graph models (beam-shell
  meshes), but pays a constant fill-in overhead that Pardiso doesn't. Pick
  CHOLMOD when AMD CPUs are in play and Pardiso's MKL dependency is an issue.
* **MUMPS** — preferred for out-of-core solves (models whose factor doesn't
  fit in RAM); supports disk spillover. Slower than Pardiso / CHOLMOD on
  in-RAM problems.
* **SuperLU** — fallback only. Pure SciPy, no external dependency, roughly
  2-3× slower than the optimised backends.

Switch via ``Model.solve_static(linear_solver="cholmod")`` etc. See
:doc:`/reference/solvers/linear_backends` for the full backend-selection API.

Thread pinning
~~~~~~~~~~~~~~

The library reads ``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, and
``OPENBLAS_NUM_THREADS`` for BLAS-pool sizing. CI pods pin all three to ``4``
because the cgroup quota gives ~4 vCPUs; larger pools just thrash from
scheduler contention. Local guidance:

* **Laptop / 8-core workstation** — set ``OMP_NUM_THREADS=4`` for most
  workloads. Leave 4 cores for the OS / your IDE.
* **Beefy workstation / cluster node (32+ cores)** — set ``OMP_NUM_THREADS=8``
  or ``16``; diminishing returns above 16 for typical structural-mechanics
  K matrices.

Modal solves
~~~~~~~~~~~~

The eigensolver dominates wall-time on modal analyses.
``scipy.sparse.linalg.eigsh`` calls the linear backend once per Lanczos
iteration, so eigensolver speed is largely set by the linear-backend choice,
and all of the guidance above applies to modal solves in the same way. For
very large models (> 1 M DOF) consider
``Model.solve_modal(eigen_solver="lobpcg")``, which avoids the explicit
factorisation but converges more slowly on clustered low frequencies.

Memory
------

Direct factor memory scales as :math:`O(N^{1.5})` for 3-D solid meshes
(Saad 2003, §3.6). For 100 k-cell HEX8 meshes that is typically 8-15 GB,
which fits in workstation RAM. Beyond 500 k cells the :math:`N^{1.5}` scaling
puts the factor at roughly ten times that, well past 32 GB, and you'll need
to either move to MUMPS out-of-core or drop to an iterative backend (CG with
an AMG preconditioner, currently shipped via ``linear_solver="cg+pyamg"``).
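A minimal sketch of that fallback, assuming ``model`` is an already-assembled
``Model``. The ``linear_solver`` values are the ones named above, but whether
MUMPS spills its factor to disk automatically or needs extra configuration is
backend-specific; treat this as illustrative rather than a recipe.

.. code-block:: python

   # ``model`` is an already-assembled femorph-solver Model (setup elided).

   # Option 1: stay with a direct solver and let MUMPS spill the factor to disk.
   result = model.solve_static(linear_solver="mumps")

   # Option 2: avoid the large factor entirely with CG + an AMG preconditioner.
   result = model.solve_static(linear_solver="cg+pyamg")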