Solvers
=======

femorph-solver separates the *assembly* step from the *linear* and *eigen*
solves, and every solve runs against a registry of pluggable backends. This
page documents what's registered, how to select a backend, and the measured
trade-offs between them.

.. contents::
   :local:
   :depth: 2

Pipeline
--------

A full static or modal call goes through three stages:

1. **Assembly** — :class:`femorph_solver.Model` builds the global ``K`` (and
   ``M`` for modal / transient) directly in CSR format via
   ``femorph_solver._core.build_csr_pattern`` +
   ``femorph_solver._core.scatter_into_csr``. Both are nanobind entry points
   that run on batched per-element matrices returned by the element kernels
   (``solid18X_ke_batch``, ``solid18X_me_*_batch``, …). The symbolic pattern
   is cached on the model, so assembling ``M`` after ``K`` skips the
   symbolic pass entirely.
2. **Linear solve** — :func:`femorph_solver.solvers.static.solve_static`
   partitions ``K u = F`` into free/fixed DOF blocks and dispatches ``u_f``
   to a registered linear backend.
3. **Generalized eigensolve** — :func:`femorph_solver.solvers.modal.solve_modal`
   dispatches to a registered eigen backend (default: ARPACK in shift-invert
   mode with σ = 0), which itself calls a registered *linear* backend for
   its inner ``(K - σM)`` factorisation.

The registries are exposed as two module-level lookup helpers plus a
default-resolver:

- :func:`femorph_solver.solvers.linear.list_linear_solvers` — rows of
  ``{name, available, kind, spd_only, install_hint}``
- :func:`femorph_solver.solvers.eigen.list_eigen_solvers` — same shape
- :func:`femorph_solver.solvers.linear.select_default_linear_solver` —
  resolves ``"auto"`` to the fastest available SPD backend.

Linear backends
---------------

``auto`` is the default in every place that takes a ``linear_solver``
keyword (``Model.solve``, ``Model.modal_solve``,
:func:`~femorph_solver.solvers.static.solve_static`,
:func:`~femorph_solver.solvers.modal.solve_modal`, and every eigen backend's
shift-invert).
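The registry rows and the size-aware ``"auto"`` resolution can be sketched
in a few lines of plain Python. The sketch below is illustrative only, not
femorph-solver's implementation; the helper names ``list_rows`` /
``resolve_auto`` and the exact below-threshold ordering after CHOLMOD are
assumptions:

```python
import importlib.util

# Illustrative registry rows in the shape list_linear_solvers() returns.
# The 50 000-DOF threshold and the large-problem order follow this page;
# the small-problem order past CHOLMOD is an assumption.
_REGISTRY = [
    {"name": "pardiso", "kind": "direct", "spd_only": False,
     "module": "pypardiso", "install_hint": "pip install femorph_solver[pardiso]"},
    {"name": "cholmod", "kind": "direct", "spd_only": True,
     "module": "sksparse", "install_hint": "pip install femorph_solver[cholmod]"},
    {"name": "umfpack", "kind": "direct", "spd_only": False,
     "module": "scikits", "install_hint": "pip install femorph_solver[umfpack]"},
    {"name": "superlu", "kind": "direct", "spd_only": False,
     "module": None, "install_hint": "none"},  # ships with SciPy
]


def list_rows():
    """Rows like list_linear_solvers(): availability probed by import lookup."""
    return [
        {"name": r["name"], "kind": r["kind"], "spd_only": r["spd_only"],
         "install_hint": r["install_hint"],
         "available": r["module"] is None
                      or importlib.util.find_spec(r["module"]) is not None}
        for r in _REGISTRY
    ]


def resolve_auto(n_free_dofs):
    """First *available* backend in the size-dependent preference order."""
    if n_free_dofs > 50_000:
        order = ["pardiso", "cholmod", "umfpack", "superlu"]
    else:  # Pardiso demoted behind CHOLMOD: its setup cost dominates here
        order = ["cholmod", "pardiso", "umfpack", "superlu"]
    available = {r["name"] for r in list_rows() if r["available"]}
    return next(name for name in order if name in available)
```

Because availability is probed at lookup time, installing an optional
backend later changes what ``resolve_auto`` returns with no code change,
which is the behaviour described below.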
Resolution is size-aware: above 50 000 free DOFs the preference order is
**pardiso → cholmod → umfpack → superlu** (Pardiso's threaded factorisation
wins by 3–5×); below that threshold Pardiso is demoted behind CHOLMOD,
because its ~200 ms setup cost erases the factor-time lead on small systems.
Users who install an optional backend later pick it up with no code change.

If a solve crosses 50 000 free DOFs on a machine where Pardiso *would* have
been chosen but ``pypardiso`` is not installed, femorph-solver emits a
one-shot :class:`UserWarning` pointing at
``pip install femorph_solver[pardiso]``. The warning fires at most once per
process, so a modal sweep or sensitivity study with many solves won't flood
the logs.

.. list-table::
   :widths: 15 15 10 40 20
   :header-rows: 1

   * - Name
     - Kind
     - SPD
     - Notes
     - Install
   * - ``superlu``
     - direct
     -
     - SciPy's built-in SuperLU. Always available. Baseline accuracy and
       memory behaviour; a good fallback.
     - none
   * - ``umfpack``
     - direct
     -
     - SuiteSparse UMFPACK. General (non-SPD-restricted) LU with AMD
       ordering. Slightly faster than SuperLU on unsymmetric systems.
     - ``pip install femorph_solver[umfpack]``
   * - ``cholmod``
     - direct
     - ✓
     - SuiteSparse supernodal Cholesky. Dramatically lower fill-in than LU
       on SPD stiffness matrices — typically ~2× faster and ~2× less peak
       RSS than SuperLU on femorph-solver's modal pipeline.
     - ``pip install femorph_solver[cholmod]``
   * - ``pardiso``
     - direct
     -
     - Intel MKL PARDISO via pypardiso. Fastest when installed on Intel
       hardware; multithreaded out of the box. First in the ``auto``
       preference order.
     - ``pip install femorph_solver[pardiso]``
   * - ``cg``
     - iterative
     - ✓
     - SciPy conjugate gradient with a Jacobi preconditioner. Not
       competitive on stiffness matrices without a stronger preconditioner —
       avoid for ARPACK shift-invert on real models.
     - none
   * - ``gmres``
     - iterative
     -
     - SciPy GMRES with a Jacobi preconditioner. Same caveat as CG.
     - none
   * - ``pyamg``
     - iterative
     - ✓
     - Smoothed-aggregation algebraic multigrid. A useful research target;
       needs per-problem tolerance tuning to match direct-solver accuracy.
     - ``pip install femorph_solver[pyamg]``

Select a backend explicitly by name:

.. code-block:: python

   import femorph_solver

   m = femorph_solver.Model()
   # ... build mesh + BCs ...
   res = m.solve(linear_solver="cholmod")                   # explicit
   modal = m.modal_solve(n_modes=10, linear_solver="auto")  # preference chain

Eigen backends
--------------

Modal solves live in :mod:`femorph_solver.solvers.modal`. Below
``DENSE_CUTOFF`` (500 free DOFs) femorph-solver always uses dense LAPACK
(:func:`scipy.linalg.eigh`); above it, the ``eigen_solver`` kwarg selects
one of the registered backends.

.. list-table::
   :widths: 15 25 60
   :header-rows: 1

   * - Name
     - Kind
     - Notes
   * - ``arpack``
     - sparse Lanczos
     - Default. Uses :func:`scipy.sparse.linalg.eigsh` in shift-invert mode
       with σ = 0. The ``linear_solver`` kwarg controls the inner
       factorisation.
   * - ``lobpcg``
     - sparse iterative
     - :func:`scipy.sparse.linalg.lobpcg` with a pluggable preconditioner
       (``preconditioner="factor" | "jacobi" | "none"`` or a user-supplied
       LinearOperator). Ships with SciPy. The ``"factor"`` default matches
       the historical behaviour (full direct factor of ``K`` — same peak
       RSS as ARPACK shift-invert). Switch to ``"jacobi"`` for a
       **factor-free** path when ``K`` is too big to factor in RAM —
       storage drops to ``O(n)`` floats, at the cost of 2–5× more
       iterations on structural SPD problems.
   * - ``primme``
     - Davidson
     - Optional — requires :mod:`primme`. A good alternative when ARPACK's
       memory footprint is a constraint.
   * - ``dense``
     - LAPACK dense
     - Small problems only. Materialises ``K`` and ``M`` as dense arrays
       and calls :func:`scipy.linalg.eigh` on the pair.

Measured trade-offs
-------------------

Numbers below are from the solver-framework bench
(``perf/bench_solvers.py``) on flat HEX8 plates at two sizes, with
``OMP_NUM_THREADS=4`` (a typical 4-core machine). See :doc:`performance`
for the full progression and the larger pipeline sweep.
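A best-of-n wall-clock harness of roughly this shape is enough to reproduce
timing tables like the ones below. This is a hypothetical sketch in the
spirit of ``perf/bench_solvers.py``, not the script itself, and the
``bench_backends`` name is illustrative (the Δf₁ accuracy columns come from
a separate check not shown here):

```python
import time


def bench_backends(solvers, n_repeat=3):
    """Wall-time each backend callable; best-of-n filters scheduler noise.

    ``solvers`` maps backend name -> zero-argument callable that runs one
    full solve. Returns {name: seconds}, fastest first.
    """
    results = {}
    for name, solve in solvers.items():
        best = float("inf")
        for _ in range(n_repeat):
            t0 = time.perf_counter()
            solve()
            best = min(best, time.perf_counter() - t0)
        results[name] = best
    # Sort ascending so the winner is first in the returned mapping.
    return dict(sorted(results.items(), key=lambda kv: kv[1]))
```

Wiring in real backends is a matter of wrapping each solve, e.g.
``lambda: m.modal_solve(n_modes=10, linear_solver="superlu")``.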
The relative ordering is size-dependent: SuiteSparse CHOLMOD wins on small
problems (its setup is close to free), and Intel MKL PARDISO wins on
medium/large problems (its parallel factorisation recoups its heavier
setup). SuperLU is always a reasonable zero-install baseline.

**Linear backends inside ARPACK shift-invert** (σ = 0, 10 lowest modes,
tol = 1e-10):

.. list-table::
   :widths: 14 10 6 12 12 12 34
   :header-rows: 2

   * - backend
     - kind
     - SPD
     - 60×60×2 plate
     - 100×100×4 plate
     - Δf₁ (Hz)
     - notes
   * -
     -
     -
     - 32 940 DOFs
     - 151 500 DOFs
     -
     -
   * - superlu
     - direct
     -
     - 1.10 s
     - 24.7 s
     - 1.4 × 10⁻⁷
     - SciPy default — always available
   * - cholmod
     - direct
     - ✓
     - **1.03 s**
     - 23.1 s
     - 1.2 × 10⁻¹⁴
     - fastest at ~30k DOFs; falls behind PARDISO past ~100k
   * - **pardiso**
     - direct
     -
     - 1.63 s
     - **7.3 s**
     - 3.5 × 10⁻⁷
     - **3.4× faster than SuperLU/CHOLMOD** at 150k DOFs
   * - cg
     - iterative
     - ✓
     - >120 s
     - >120 s
     - —
     - Jacobi insufficient for stiffness matrices
   * - gmres
     - iterative
     -
     - >120 s
     - >120 s
     - —
     - same caveat as CG
   * - pyamg
     - iterative
     - ✓
     - 65.9 s
     - —
     - 8.9 × 10¹
     - wrong eigenvalues at default tol; tuning required

**Standalone eigen backends** (without wrapping ARPACK):

.. list-table::
   :widths: 16 20 12 12 12 28
   :header-rows: 1

   * - backend
     - kind
     - 60×60×2 wall
     - 100×100×4 wall
     - Δf₁ (Hz)
     - notes
   * - arpack
     - sparse Lanczos
     - 1.14 s
     - 23.9 s
     - 7 × 10⁻¹⁴
     - default; SuperLU shift-invert inside
   * - lobpcg
     - sparse iterative
     - 35.5 s
     - 17.1 s
     - 3.4 × 10⁻⁷
     - ILU preconditioner; erratic at small N
   * - primme
     - Davidson
     - —
     - —
     - —
     - not installed in the bench
   * - dense
     - dense
     - —
     - —
     - —
     - disabled for N > 5 000

The PARDISO crossover above ~100k DOFs is the single biggest reason to
install ``femorph_solver[pardiso]`` on MKL hosts — the per-call setup cost
amortises quickly once the direct factorisation has real parallel work to
do.
Below ~50k DOFs PARDISO's overhead is net-negative, which is why the
default ``auto`` chain prefers it only when it is installed *and* the
system is large, with CHOLMOD / SuperLU close behind as fallbacks.

Recommendations
---------------

- **Default (structural modal / static)** — leave ``linear_solver`` at
  ``"auto"`` and install ``femorph_solver[cholmod]``. At <50k DOFs CHOLMOD
  matches or beats the alternatives; at >100k DOFs it still runs roughly
  even with SuperLU in wall time while using less memory. It falls back to
  SuperLU automatically if CHOLMOD is removed.
- **Intel MKL hosts, ≥100k DOFs** — add ``femorph_solver[pardiso]``.
  PARDISO rises to the top of the ``auto`` chain and is typically 3×–5×
  faster than SuperLU / CHOLMOD on the shift-invert inner solve.
- **Unsymmetric / non-SPD systems** — pass ``linear_solver="auto"`` with
  ``spd=False`` (or pick ``umfpack`` / ``superlu`` by name).
- **Iterative solvers** — reserve these for research. The registered
  Krylov + AMG backends exist for benchmarking, not as drop-in replacements
  for CHOLMOD / PARDISO on stiffness matrices.

Install extras
--------------

.. list-table::
   :widths: 25 30 45
   :header-rows: 1

   * - extra
     - pulls in
     - enables
   * - *(none)*
     - SciPy
     - ``superlu``, ``cg``, ``gmres``, ``arpack``, ``lobpcg``, ``dense``
   * - ``femorph_solver[cholmod]``
     - ``scikit-sparse``
     - ``cholmod``
   * - ``femorph_solver[umfpack]``
     - ``scikit-umfpack``
     - ``umfpack``
   * - ``femorph_solver[pardiso]``
     - ``pypardiso`` (+ MKL)
     - ``pardiso``
   * - ``femorph_solver[pyamg]``
     - ``pyamg``
     - ``pyamg``
   * - ``femorph_solver[primme]``
     - ``primme``
     - ``primme``
   * - ``femorph_solver[solvers]``
     - all of the above
     - every registered backend

Missing backends raise
:class:`femorph_solver.solvers.linear.SolverUnavailableError` with the
matching install hint, so users see *how* to enable them.
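The lookup-with-hint pattern is easy to picture with a short sketch. The
real exception lives at
:class:`femorph_solver.solvers.linear.SolverUnavailableError`; the
``get_backend`` helper and ``_HINTS`` map below are hypothetical stand-ins
for the registry internals:

```python
class SolverUnavailableError(RuntimeError):
    """A named backend is known but its optional dependency is missing."""

    def __init__(self, name, install_hint):
        self.install_hint = install_hint
        super().__init__(
            f"linear solver {name!r} is not available; try: {install_hint}"
        )


# Hypothetical hint map; in femorph-solver the hints live on registry rows.
_HINTS = {
    "cholmod": "pip install femorph_solver[cholmod]",
    "umfpack": "pip install femorph_solver[umfpack]",
    "pardiso": "pip install femorph_solver[pardiso]",
}


def get_backend(name, registry):
    """Look up a backend; on a miss, fail with the matching install hint."""
    try:
        return registry[name]
    except KeyError:
        hint = _HINTS.get(name, "pip install femorph_solver[solvers]")
        raise SolverUnavailableError(name, hint) from None
```

Carrying the hint on the exception (rather than only in the message) lets
callers such as CLI wrappers surface it however they like.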