Solvers
=======

femorph-solver separates the *assembly* step from the *linear* and *eigen*
solves, and every solve runs against a registry of pluggable backends. This
page documents what's registered, how to select a backend, and the measured
trade-offs between them.

.. contents::
   :local:
   :depth: 2

Pipeline
--------

A full static or modal call goes through three stages:

1. **Assembly** — :class:`femorph_solver.Model` builds the global ``K`` (and
   ``M`` for modal / transient) directly in CSR format via
   ``femorph_solver._core.build_csr_pattern`` +
   ``femorph_solver._core.scatter_into_csr``. Both are nanobind entry points
   that run on batched per-element matrices returned by the element kernels
   (``solid18X_ke_batch``, ``solid18X_me_*_batch``, …). The symbolic pattern
   is cached on the model, so assembling ``M`` after ``K`` skips the
   symbolic pass entirely.
2. **Linear solve** — :func:`femorph_solver.solvers.static.solve_static`
   partitions ``K u = F`` into free/fixed DOF blocks and dispatches ``u_f``
   to a registered linear backend.
3. **Generalized eigensolve** — :func:`femorph_solver.solvers.modal.solve_modal`
   dispatches to a registered eigen backend (default: ARPACK in shift-invert
   mode with σ = 0), which itself calls a registered *linear* backend for
   its inner ``(K - σM)`` factorisation.

The registries are exposed as two module-level lookup helpers plus a
default-resolver:

- :func:`femorph_solver.solvers.linear.list_linear_solvers` — rows of
  ``{name, available, kind, spd_only, install_hint}``
- :func:`femorph_solver.solvers.eigen.list_eigen_solvers` — same shape
- :func:`femorph_solver.solvers.linear.select_default_linear_solver` —
  resolves ``"auto"`` to the fastest available SPD backend.

Linear backends
---------------

``auto`` is the default in every place that takes a ``linear_solver``
keyword (``Model.solve``, ``Model.modal_solve``,
:func:`~femorph_solver.solvers.static.solve_static`,
:func:`~femorph_solver.solvers.modal.solve_modal`, and every eigen backend's
shift-invert).
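The registry rows and the size-aware ``"auto"`` resolution can be sketched
in a few lines of plain Python. The sketch below is illustrative only, not
femorph-solver's implementation; the helper names ``list_rows`` /
``resolve_auto`` and the exact below-threshold ordering after CHOLMOD are
assumptions:

```python
import importlib.util

# Illustrative registry rows in the shape list_linear_solvers() returns.
# The 50 000-DOF threshold and the large-problem order follow this page;
# the small-problem order past CHOLMOD is an assumption.
_REGISTRY = [
    {"name": "pardiso", "kind": "direct", "spd_only": False,
     "module": "pypardiso", "install_hint": "pip install femorph_solver[pardiso]"},
    {"name": "cholmod", "kind": "direct", "spd_only": True,
     "module": "sksparse", "install_hint": "pip install femorph_solver[cholmod]"},
    {"name": "umfpack", "kind": "direct", "spd_only": False,
     "module": "scikits", "install_hint": "pip install femorph_solver[umfpack]"},
    {"name": "superlu", "kind": "direct", "spd_only": False,
     "module": None, "install_hint": "none"},  # ships with SciPy
]


def list_rows():
    """Rows like list_linear_solvers(): availability probed by import lookup."""
    return [
        {"name": r["name"], "kind": r["kind"], "spd_only": r["spd_only"],
         "install_hint": r["install_hint"],
         "available": r["module"] is None
                      or importlib.util.find_spec(r["module"]) is not None}
        for r in _REGISTRY
    ]


def resolve_auto(n_free_dofs):
    """First *available* backend in the size-dependent preference order."""
    if n_free_dofs > 50_000:
        order = ["pardiso", "cholmod", "umfpack", "superlu"]
    else:  # Pardiso demoted behind CHOLMOD: its setup cost dominates here
        order = ["cholmod", "pardiso", "umfpack", "superlu"]
    available = {r["name"] for r in list_rows() if r["available"]}
    return next(name for name in order if name in available)
```

Because availability is probed at lookup time, installing an optional
backend later changes what ``resolve_auto`` returns with no code change,
which is the behaviour described below.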
Resolution is size-aware: above 50 000 free DOFs the preference order is
**pardiso → cholmod → umfpack → superlu** (Pardiso's threaded factorisation
wins by 3–5×); below that threshold Pardiso is demoted behind CHOLMOD,
because its ~200 ms setup cost erases the factor-time lead on small systems.
Users who install an optional backend later pick it up with no code change.

If a solve crosses 50 000 free DOFs on a machine where Pardiso *would* have
been chosen but ``pypardiso`` is not installed, femorph-solver emits a
one-shot :class:`UserWarning` pointing at
``pip install femorph_solver[pardiso]``. The warning fires at most once per
process, so a modal sweep or sensitivity study with many solves won't flood
the logs.

.. list-table::
   :widths: 15 15 10 40 20
   :header-rows: 1

   * - Name
     - Kind
     - SPD
     - Notes
     - Install
   * - ``superlu``
     - direct
     -
     - SciPy's built-in SuperLU. Always available. Baseline accuracy and
       memory behaviour; a good fallback.
     - none
   * - ``umfpack``
     - direct
     -
     - SuiteSparse UMFPACK. General (non-SPD-restricted) LU with AMD
       ordering. Slightly faster than SuperLU on unsymmetric systems.
     - ``pip install femorph_solver[umfpack]``
   * - ``cholmod``
     - direct
     - ✓
     - SuiteSparse supernodal Cholesky. Dramatically lower fill-in than LU
       on SPD stiffness matrices — typically ~2× faster and ~2× less peak
       RSS than SuperLU on femorph-solver's modal pipeline.
     - ``pip install femorph_solver[cholmod]``
   * - ``pardiso``
     - direct
     -
     - Intel MKL PARDISO via pypardiso. Fastest when installed on Intel
       hardware; multithreaded out of the box. First in the ``auto``
       preference order.
     - ``pip install femorph_solver[pardiso]``
   * - ``cg``
     - iterative
     - ✓
     - SciPy conjugate gradient with a Jacobi preconditioner. Not
       competitive on stiffness matrices without a stronger preconditioner —
       avoid for ARPACK shift-invert on real models.
     - none
   * - ``gmres``
     - iterative
     -
     - SciPy GMRES with a Jacobi preconditioner. Same caveat as CG.
     - none
   * - ``pyamg``
     - iterative
     - ✓
     - Smoothed-aggregation algebraic multigrid. A useful research target;
       needs per-problem tolerance tuning to match direct-solver accuracy.
     - ``pip install femorph_solver[pyamg]``

Select a backend explicitly by name:

.. code-block:: python

   import femorph_solver

   m = femorph_solver.Model()
   # ... build mesh + BCs ...
   res = m.solve(linear_solver="cholmod")                   # explicit
   modal = m.modal_solve(n_modes=10, linear_solver="auto")  # preference chain

Eigen backends
--------------

Modal solves live in :mod:`femorph_solver.solvers.modal`. Below
``DENSE_CUTOFF`` (500 free DOFs) femorph-solver always uses dense LAPACK
(:func:`scipy.linalg.eigh`); above it, the ``eigen_solver`` kwarg selects
one of the registered backends.

.. list-table::
   :widths: 15 25 60
   :header-rows: 1

   * - Name
     - Kind
     - Notes
   * - ``arpack``
     - sparse Lanczos
     - Default. Uses :func:`scipy.sparse.linalg.eigsh` in shift-invert mode
       with σ = 0. The ``linear_solver`` kwarg controls the inner
       factorisation.
   * - ``lobpcg``
     - sparse iterative
     - :func:`scipy.sparse.linalg.lobpcg` with a pluggable preconditioner
       (``preconditioner="factor" | "jacobi" | "none"`` or a user-supplied
       LinearOperator). Ships with SciPy. The ``"factor"`` default matches
       the historical behaviour (full direct factor of ``K`` — same peak
       RSS as ARPACK shift-invert). Switch to ``"jacobi"`` for a
       **factor-free** path when ``K`` is too big to factor in RAM —
       storage drops to ``O(n)`` floats, at the cost of 2–5× more
       iterations on structural SPD problems.
   * - ``primme``
     - Davidson
     - Optional — requires :mod:`primme`. A good alternative when ARPACK's
       memory footprint is a constraint.
   * - ``dense``
     - LAPACK dense
     - Small problems only. Materialises ``K`` and ``M`` as dense arrays
       and calls :func:`scipy.linalg.eigh` on the pair.

Measured trade-offs
-------------------

Numbers below are from the solver-framework bench
(``perf/bench_solvers.py``) on flat HEX8 plates at two sizes, with
``OMP_NUM_THREADS=4`` (a typical 4-core machine). See :doc:`performance`
for the full progression and the larger pipeline sweep.
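A best-of-n wall-clock harness of roughly this shape is enough to reproduce
timing tables like the ones below. This is a hypothetical sketch in the
spirit of ``perf/bench_solvers.py``, not the script itself, and the
``bench_backends`` name is illustrative (the Δf₁ accuracy columns come from
a separate check not shown here):

```python
import time


def bench_backends(solvers, n_repeat=3):
    """Wall-time each backend callable; best-of-n filters scheduler noise.

    ``solvers`` maps backend name -> zero-argument callable that runs one
    full solve. Returns {name: seconds}, fastest first.
    """
    results = {}
    for name, solve in solvers.items():
        best = float("inf")
        for _ in range(n_repeat):
            t0 = time.perf_counter()
            solve()
            best = min(best, time.perf_counter() - t0)
        results[name] = best
    # Sort ascending so the winner is first in the returned mapping.
    return dict(sorted(results.items(), key=lambda kv: kv[1]))
```

Wiring in real backends is a matter of wrapping each solve, e.g.
``lambda: m.modal_solve(n_modes=10, linear_solver="superlu")``.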
The relative ordering is size-dependent: SuiteSparse CHOLMOD wins on small
problems (its setup is close to free), and Intel MKL PARDISO wins on
medium/large problems (its parallel factorisation recoups its heavier
setup). SuperLU is always a reasonable zero-install baseline.

**Linear backends inside ARPACK shift-invert** (σ = 0, 10 lowest modes,
tol = 1e-10):

.. list-table::
   :widths: 14 10 6 12 12 12 34
   :header-rows: 2

   * - backend
     - kind
     - SPD
     - 60×60×2 plate
     - 100×100×4 plate
     - Δf₁ (Hz)
     - notes
   * -
     -
     -
     - 32 940 DOFs
     - 151 500 DOFs
     -
     -
   * - superlu
     - direct
     -
     - 1.10 s
     - 24.7 s
     - 1.4 × 10⁻⁷
     - SciPy default — always available
   * - cholmod
     - direct
     - ✓
     - **1.03 s**
     - 23.1 s
     - 1.2 × 10⁻¹⁴
     - fastest at ~30k DOFs; falls behind PARDISO past ~100k
   * - **pardiso**
     - direct
     -
     - 1.63 s
     - **7.3 s**
     - 3.5 × 10⁻⁷
     - **3.4× faster than SuperLU/CHOLMOD** at 150k DOFs
   * - cg
     - iterative
     - ✓
     - >120 s
     - >120 s
     - —
     - Jacobi insufficient for stiffness matrices
   * - gmres
     - iterative
     -
     - >120 s
     - >120 s
     - —
     - same caveat as CG
   * - pyamg
     - iterative
     - ✓
     - 65.9 s
     - —
     - 8.9 × 10¹
     - wrong eigenvalues at default tol; tuning required

**Standalone eigen backends** (without wrapping ARPACK):

.. list-table::
   :widths: 16 20 12 12 12 28
   :header-rows: 1

   * - backend
     - kind
     - 60×60×2 wall
     - 100×100×4 wall
     - Δf₁ (Hz)
     - notes
   * - arpack
     - sparse Lanczos
     - 1.14 s
     - 23.9 s
     - 7 × 10⁻¹⁴
     - default; SuperLU shift-invert inside
   * - lobpcg
     - sparse iterative
     - 35.5 s
     - 17.1 s
     - 3.4 × 10⁻⁷
     - ILU preconditioner; erratic at small N
   * - primme
     - Davidson
     - —
     - —
     - —
     - not installed in the bench
   * - dense
     - dense
     - —
     - —
     - —
     - disabled for N > 5 000

The PARDISO crossover above ~100k DOFs is the single biggest reason to
install ``femorph_solver[pardiso]`` on MKL hosts — the per-call setup cost
amortises quickly once the direct factorisation has real parallel work to
do.
Below ~50k DOFs PARDISO's overhead is net-negative, which is why the
default ``auto`` chain prefers it only when it is installed *and* the
system is large, with CHOLMOD / SuperLU close behind as fallbacks.

Recommendations
---------------

- **Default (structural modal / static)** — leave ``linear_solver`` at
  ``"auto"`` and install ``femorph_solver[cholmod]``. At <50k DOFs CHOLMOD
  matches or beats the alternatives; at >100k DOFs it still runs roughly
  even with SuperLU in wall time while using less memory. It falls back to
  SuperLU automatically if CHOLMOD is removed.
- **Intel MKL hosts, ≥100k DOFs** — add ``femorph_solver[pardiso]``.
  PARDISO rises to the top of the ``auto`` chain and is typically 3×–5×
  faster than SuperLU / CHOLMOD on the shift-invert inner solve.
- **Unsymmetric / non-SPD systems** — pass ``linear_solver="auto"`` with
  ``spd=False`` (or pick ``umfpack`` / ``superlu`` by name).
- **Iterative solvers** — reserve these for research. The registered
  Krylov + AMG backends exist for benchmarking, not as drop-in replacements
  for CHOLMOD / PARDISO on stiffness matrices.

Install extras
--------------

.. list-table::
   :widths: 25 30 45
   :header-rows: 1

   * - extra
     - pulls in
     - enables
   * - *(none)*
     - SciPy
     - ``superlu``, ``cg``, ``gmres``, ``arpack``, ``lobpcg``, ``dense``
   * - ``femorph_solver[cholmod]``
     - ``scikit-sparse``
     - ``cholmod``
   * - ``femorph_solver[umfpack]``
     - ``scikit-umfpack``
     - ``umfpack``
   * - ``femorph_solver[pardiso]``
     - ``pypardiso`` (+ MKL)
     - ``pardiso``
   * - ``femorph_solver[pyamg]``
     - ``pyamg``
     - ``pyamg``
   * - ``femorph_solver[primme]``
     - ``primme``
     - ``primme``
   * - ``femorph_solver[solvers]``
     - all of the above
     - every registered backend

Missing backends raise
:class:`femorph_solver.solvers.linear.SolverUnavailableError` with the
matching install hint, so users see *how* to enable them.
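The lookup-with-hint pattern is easy to picture with a short sketch. The
real exception lives at
:class:`femorph_solver.solvers.linear.SolverUnavailableError`; the
``get_backend`` helper and ``_HINTS`` map below are hypothetical stand-ins
for the registry internals:

```python
class SolverUnavailableError(RuntimeError):
    """A named backend is known but its optional dependency is missing."""

    def __init__(self, name, install_hint):
        self.install_hint = install_hint
        super().__init__(
            f"linear solver {name!r} is not available; try: {install_hint}"
        )


# Hypothetical hint map; in femorph-solver the hints live on registry rows.
_HINTS = {
    "cholmod": "pip install femorph_solver[cholmod]",
    "umfpack": "pip install femorph_solver[umfpack]",
    "pardiso": "pip install femorph_solver[pardiso]",
}


def get_backend(name, registry):
    """Look up a backend; on a miss, fail with the matching install hint."""
    try:
        return registry[name]
    except KeyError:
        hint = _HINTS.get(name, "pip install femorph_solver[solvers]")
        raise SolverUnavailableError(name, hint) from None
```

Carrying the hint on the exception (rather than only in the message) lets
callers such as CLI wrappers surface it however they like.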