.. _estimating:

Estimating wall time + memory before you solve
==============================================

``Model.estimate_solve()`` returns predicted ``(wall_s, peak_rss_mb)`` for a
modal or static solve **without** running it. Useful when you want to:

- Check a job fits in your SLURM time / memory budget before submitting.
- Pick between in-core and OOC up front.
- Compare how a problem scales with ``n_dof`` without the measurement overhead.

.. contents::
   :local:
   :depth: 2

How it works
------------

The estimator fits a simple log-log power law per ``(host_signature,
linear_solver)`` bucket: on 3D SPD meshes under sparse-Cholesky ordering,
factor fill and solve cost both scale as :math:`n^{4/3}` (George 1973), so
``log(wall_s)`` is nearly linear in ``log(n_dof)`` for any fixed host +
backend.

Training data comes from the TA-6 benchmark module
(:mod:`femorph_solver.benchmark`): every ``femorph-benchmark-*.json`` file in
the current working directory is loaded, broken into training rows, and
grouped by ``(host_signature, linear_solver)``. Buckets with ≥ 2 non-OOC rows
get their own coefficient fit; everything else falls back to a shared
cross-host fit. If no training data is available at all, the estimator uses a
shape-of-universe prior tuned from the repo's own Intel Core i9-14900KF
benchmark sweeps, with a ``p95 = 2 × p50`` confidence band to signal that you
are looking at an uncalibrated prediction.

Usage
-----

.. code-block:: python

   import numpy as np, pyvista as pv

   import femorph_solver

   # Build your model as usual...
   m = femorph_solver.Model.from_grid(grid)
   m.et(1, "SOLID185")
   m.mp("EX", 1, 2.0e11)
   m.mp("PRXY", 1, 0.3)
   m.mp("DENS", 1, 7850)
   # ...
   # apply BCs

   est = m.estimate_solve(n_modes=10)
   print(est)
   # Estimate(wall_s=p50 3.47 / p95 4.16,
   #          peak_rss_mb=p50 892 / p95 1070,
   #          bucket='Intel(R) Core(TM) i9-14900KF|94|Linux|x86_64|mkl_direct',
   #          n_training=8)

   if est.wall_s_p95 > 60:
       print("This is a long-running solve — consider OOC or a bigger host.")

Training on your own host
-------------------------

Run the TA-6 benchmark once to seed training data:

.. code-block:: bash

   python -m femorph_solver.benchmark --level basic --out ./

Subsequent ``estimate_solve`` calls then read the
``femorph-benchmark-basic-*.json`` file and fit per-host coefficients for the
backends the ``basic`` level exercised (``auto`` + ``arpack``). Run the
``full`` level to cover more backends:

.. code-block:: bash

   python -m femorph_solver.benchmark --level full --out ./

Extending the feature vector
----------------------------

:class:`~femorph_solver.estimators.HostSpec` carries an ``extras`` dict of
arbitrary key/value strings from the TA-6 benchmark's host report. Future
retraining passes can add features (memory bandwidth benchmark, BLAS thread
count, disk type for OOC) by extending that dict and bumping the estimator's
feature-extraction step; historical JSON files stay loadable because the
loader reads every field via ``.get()`` with defaults.

Limitations
-----------

- **Per-host fits need data.** On a first run with no training JSON, the
  estimator falls back to the repo's prior, which was tuned on a 14900KF +
  Intel MKL. Expect wide ``p95`` bands (2× ``p50``) until you've run the
  benchmark at least once on your hardware.
- **Extrapolation beyond training range is unreliable.** If your training
  rows top out at 200 k DOFs and you ask for 1 M, the power-law prediction
  might still be in the ballpark, but the confidence band doesn't widen
  enough to reflect the added risk. Run the benchmark at a size close to
  your target problem for a tight estimate.
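  For intuition: the estimate is a straight-line fit in log-log space, so
  extrapolation error grows with distance from the training sizes. A
  standalone sketch with made-up timings (not the library's actual fitting
  code):

  .. code-block:: python

     import numpy as np

     # Hypothetical training rows: (n_dof, wall_s), topping out at 200 k DOFs.
     n_dof = np.array([25_000, 50_000, 100_000, 200_000])
     wall_s = np.array([0.50, 1.26, 3.17, 8.00])

     # Fit log(wall_s) = a + b*log(n_dof); b sits near 4/3 for 3D
     # sparse-Cholesky solves (George 1973).
     b, a = np.polyfit(np.log(n_dof), np.log(wall_s), 1)

     # Predicting 1 M DOFs is a 5x extrapolation past the largest training
     # row; any bias in the fitted slope compounds out here.
     predicted_wall_s = np.exp(a + b * np.log(1_000_000))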
- **Only ``n_dof`` is currently used as a feature.** Future revisions will
  add ``nnz``, mesh aspect ratio, and mode count as features; the schema is
  already append-only, so expanded training data flows through without a
  loader change.

References
----------

- George, A. "Nested dissection of a regular finite element mesh." SIAM J.
  Numer. Anal. 10 (1973), pp. 345-363. Origin of the :math:`n^{4/3}`
  factor-fill bound for structured 3D grids, and the power-law basis for the
  log-log fit.
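The append-only loading mentioned above (every field read via ``.get()`` with
a default) can be sketched as follows; the field names here are illustrative,
not the actual benchmark schema:

.. code-block:: python

   import json

   # One hypothetical row, as an older benchmark JSON might store it
   # (field names illustrative, not the real femorph-benchmark schema).
   old_row = json.loads(
       '{"host_signature": "hostA|8|Linux|x86_64", '
       '"linear_solver": "mkl_direct", "n_dof": 120000, "wall_s": 3.4}'
   )

   def to_training_row(raw):
       # Read every field with .get() and a default, so rows written
       # before a feature existed still load after the schema grows.
       return {
           "host_signature": raw.get("host_signature", "unknown"),
           "linear_solver": raw.get("linear_solver", "auto"),
           "n_dof": raw.get("n_dof", 0),
           "wall_s": raw.get("wall_s"),
           "nnz": raw.get("nnz"),            # newer feature: None in old rows
           "extras": raw.get("extras", {}),  # host-report key/value strings
       }

   row = to_training_row(old_row)

Because missing keys degrade to defaults instead of raising ``KeyError``, old
and new JSON files can be mixed in one training pass.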