# Source code for femorph_solver.solvers.linear._mumps

"""MUMPS multifrontal sparse direct solver via ``python-mumps``.

MUMPS is the go-to direct solver when Pardiso is unavailable, or when
redistribution licensing or out-of-core factorisation matters:

* **Licence** — CeCILL-C (LGPL-like).  Unlike MKL Pardiso, MUMPS can be
  redistributed inside a commercial product without the MKL EULA.
* **Out-of-core** — mature OOC mode via ``factor(ooc=True)`` / MUMPS's
  ``ICNTL(22)``; spills the factor to disk so RAM stops being the
  ceiling on single-node solves.  Pardiso has an OOC mode too but it is
  less battle-tested.
* **Distributed memory** — MUMPS is the on-ramp to MPI-distributed
  solves across nodes, which neither Pardiso nor SciPy's SuperLU
  offers.

On a single shared-memory node at ptr.cdb scale (≤1 M DOFs, SPD) MUMPS
is usually 1.5–3× slower than Pardiso and uses ~10–30 % more factor
memory.  Keep it available, but ``auto`` still prefers Pardiso > CHOLMOD
> MUMPS on SPD workloads.
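When MUMPS *is* the right tool (for example, for its out-of-core mode),
the class below is used like any other solver in this package.  A hedged
sketch: the diagonal stand-in matrix and its size are illustrative only,
and the import is guarded so the snippet is inert when the package or
libmumps is missing.

```python
import numpy as np
import scipy.sparse as sp

try:
    from femorph_solver.solvers.linear._mumps import MUMPSSolver
    usable = MUMPSSolver.available()
except ImportError:  # package or libmumps not installed
    usable = False

if usable:
    # Stand-in SPD system; real matrices come from the assembler.
    n = 1_000
    A = sp.diags(np.full(n, 2.0), format="csr")
    b = np.ones(n)
    solver = MUMPSSolver(A, assume_spd=True, ooc=True)  # spill factor to disk
    x = solver.solve(b)
    assert np.allclose(A @ x, b)
```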

Implementation notes for the python-mumps binding (``pip install
python-mumps``, import ``mumps``):

* The ``Context`` constructor takes no ``sym=`` argument; symmetry is
  declared on :meth:`Context.set_matrix` as ``symmetric=True``, which
  internally maps to MUMPS's ``sym=2`` (general-symmetric LDLᵀ).
  python-mumps 0.0.6 does not expose MUMPS's ``sym=1`` strictly-SPD
  mode, so neither does this wrapper; ``sym=2`` still factors SPD
  input correctly, just without the small memory win of skipping
  pivoting.
* The Context docstring warns "only complex numbers supported", but the
  underlying code dispatches on the input dtype (``dmumps`` for
  float64, ``zmumps`` for complex128).  Float64 works fine.
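A minimal sketch of that calling convention, assuming a tiny SPD system
and ``factor()``'s default arguments; the availability probe mirrors
``_HAVE`` below so the snippet degrades to a no-op when the binding is
absent:

```python
import importlib.util

import numpy as np
import scipy.sparse as sp

have_mumps = importlib.util.find_spec("mumps") is not None

if have_mumps:
    import mumps

    # Tiny SPD system.  symmetric=True is declared on set_matrix (not on
    # the Context constructor) and maps to MUMPS's sym=2 LDL^T path.
    A = sp.coo_matrix(np.array([[4.0, 1.0], [1.0, 3.0]]))
    b = np.array([1.0, 2.0])

    ctx = mumps.Context()
    ctx.set_matrix(A, symmetric=True)
    ctx.factor()  # analysis + numerical factorisation; float64 -> dmumps
    x = ctx.solve(b)
    assert np.allclose(A @ x, b)
```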

References
----------
- Amestoy, P. R., Duff, I. S., L'Excellent, J.-Y., & Koster, J.
  "A Fully Asynchronous Multifrontal Solver Using Distributed
  Dynamic Scheduling."  SIAM J. Matrix Anal. Appl. 23 (1), 15-41
  (2001).  [the MUMPS multifrontal + distributed-scheduling
  algorithm]
- Amestoy, P. R., Guermouche, A., L'Excellent, J.-Y., & Pralet, S.
  "Hybrid scheduling for the parallel solution of linear systems."
  Parallel Computing 32 (2), 136-156 (2006).  [hybrid
  static/dynamic scheduler shipped in MUMPS 4.x+]
- Agullo, E., Guermouche, A., & L'Excellent, J.-Y.  "Reducing the
  I/O volume in sparse out-of-core multifrontal methods."  SIAM J.
  Scientific Computing 31 (6), 4774-4794 (2010).  [the OOC path
  exposed here via ``ICNTL(22)``]
"""

from __future__ import annotations

import importlib.util
from typing import ClassVar

import numpy as np
import scipy.sparse as sp

from ._base import LinearSolverBase

#: Cheap availability probe — does not load the native MUMPS library.
_HAVE: bool = importlib.util.find_spec("mumps") is not None


class MUMPSSolver(LinearSolverBase):
    """MUMPS sparse direct solver via python-mumps (multifrontal)."""

    name: ClassVar[str] = "mumps"
    kind: ClassVar[str] = "direct"
    spd_only: ClassVar[bool] = False
    install_hint: ClassVar[str] = (
        "`pip install python-mumps` (requires libmumps — e.g. apt install "
        "libmumps-scotch-dev, brew install mumps, or "
        "conda install -c conda-forge mumps)"
    )

    def __init__(
        self,
        A: sp.spmatrix | sp.sparray,
        *,
        assume_spd: bool = False,
        mixed_precision: bool | None = False,
        ooc: bool = False,
        ordering: str = "auto",
    ) -> None:
        """``ooc=True`` enables MUMPS out-of-core factorisation — the factor
        spills to ``$MUMPS_TMPDIR`` (or ``/tmp``) so peak RAM no longer
        includes the full L factor.  Trades wallclock for memory; useful
        when the factor would not otherwise fit.

        ``ordering`` selects MUMPS's fill-reducing reordering: ``"auto"``,
        ``"amd"``, ``"amf"``, ``"scotch"``, ``"pord"``, ``"metis"``, or
        ``"qamd"`` (availability depends on how libmumps was built).
        """
        self._ooc = ooc
        self._ordering = ordering
        super().__init__(A, assume_spd=assume_spd, mixed_precision=mixed_precision)

    @staticmethod
    def available() -> bool:
        return _HAVE

    def _factor(self, A):
        import mumps

        # python-mumps wants a COO sparse array; set_matrix(symmetric=True)
        # drops the strictly-lower triangle internally, so we don't need
        # to pre-triu the matrix ourselves.  When the assembler already
        # handed us upper-triangular storage (``femorph_triu``) we pass it
        # straight through with ``symmetric=False`` — otherwise MUMPS
        # would truncate it twice.
        A_coo = sp.coo_matrix(A).astype(np.float64, copy=False)
        symmetric_flag = self._assume_spd and not getattr(A, "femorph_triu", False)

        self._A = A_coo
        self._ctx = mumps.Context()
        self._ctx.set_matrix(self._A, symmetric=symmetric_flag)
        self._ctx.factor(ordering=self._ordering, ooc=self._ooc)

    def solve(self, b: np.ndarray) -> np.ndarray:
        return self._ctx.solve(np.ascontiguousarray(b, dtype=np.float64))

    def __del__(self) -> None:
        # Context holds libmumps state; release it deterministically so
        # tests that churn through many factors don't accumulate native
        # allocations.  ``Context.__exit__`` runs the ``job=-2`` cleanup
        # but isn't invoked on plain GC.
        ctx = getattr(self, "_ctx", None)
        if ctx is not None and getattr(ctx, "mumps_instance", None) is not None:
            try:
                ctx.mumps_instance.job = -2
                ctx.mumps_instance.call()
            except Exception:
                pass
            ctx.mumps_instance = None
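A sketch of the churn pattern ``__del__`` defends against: many
short-lived factorisations in one process, each released
deterministically.  The sizes are illustrative, and the guarded import
keeps the snippet inert when the package or libmumps is unavailable.

```python
import gc

import numpy as np
import scipy.sparse as sp

try:
    from femorph_solver.solvers.linear._mumps import MUMPSSolver
    usable = MUMPSSolver.available()
except ImportError:  # package or libmumps not installed
    usable = False

if usable:
    n = 200
    b = np.ones(n)
    for k in range(50):
        # Fresh factor each iteration; dropping `solver` runs __del__,
        # which issues the MUMPS job=-2 cleanup instead of waiting on GC.
        A = sp.diags(np.full(n, 2.0 + k), format="csr")
        solver = MUMPSSolver(A, assume_spd=True)
        x = solver.solve(b)
        assert np.allclose(A @ x, b)
        del solver
    gc.collect()
```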