.. _benchmark:

Benchmarking femorph-solver on your machine
===========================================

``femorph_solver.benchmark.run_benchmark()`` drives a standard sweep of modal
solves and writes both machine-readable (JSON) and human-readable (HTML)
outputs. The intended uses:

* **Report back to maintainers.** When you file an issue about wall-time or
  memory, attach the JSON from a ``basic``-level run — it captures the full
  host signature + per-row measurements.
* **Training data for the estimator.** The JSON schema is stable
  (append-only); the TA-2 time/memory estimator loads every historical file
  it finds to refit its model across hardware.
* **Quick "is my install set up right?" check.** The ``fast`` flag runs one
  16×16×2 plate and verifies the whole pipeline works in under 10 s — useful
  after installing or upgrading MKL / CHOLMOD.

.. contents::
   :local:
   :depth: 2

Effort levels
-------------

Users pick one of three presets, each scoped to a time budget. The runner
trims its size ladder as rows accumulate, so a slow box sees fewer rows
rather than overshooting the wall-clock budget.

.. list-table::
   :widths: 12 14 14 14 46
   :header-rows: 1

   * - Level
     - Sizes
     - Mode counts
     - Budget
     - When to use
   * - ``basic``
     - 3 (16²-64²)
     - 10
     - ≤ 15 min
     - Issue triage; CI smoke after install. One-row ``--fast`` variant lands
       in ~5 s.
   * - ``full``
     - 5 (16²-96²)
     - 10, 20
     - ≤ 1 h
     - Per-host scorecard. Shows the backend ordering that ``auto`` would
       pick on your machine, plus how eigsh scales with mode count.
   * - ``exhaustive``
     - 8 (16²-192²)
     - 10, 20, 50
     - multi-hour
     - Overnight training-data pass for the TA-2 estimator. Includes OOC rows
       at sizes ≥ 500 k DOFs so the trade-off scatter is complete.

Each level enumerates a full matrix of ``(size, n_modes, linear_solver,
eigen_solver)``; unavailable backends (no MKL, no CHOLMOD, ...) are skipped
cleanly instead of failing the sweep.

Running from Python
-------------------

.. code-block:: python

   from femorph_solver.benchmark import run_benchmark, BenchmarkLevel

   result = run_benchmark(
       BenchmarkLevel.BASIC,
       out_dir="./bench-out/",
   )
   print(result.json_path)  # → bench-out/femorph-benchmark-basic-.json
   print(result.html_path)  # → bench-out/femorph-benchmark-basic-.html
   print(f"{len(result.rows)} rows in {result.total_wall_s:.1f} s")

Running from the command line
-----------------------------

.. code-block:: bash

   python -m femorph_solver.benchmark --level basic --out ./bench-out/

   # --fast trims to one row + 60 s cap (smoke / CI).
   python -m femorph_solver.benchmark --level basic --out . --fast

A ``femorph-solver benchmark`` console-script wrapper will land alongside
TA-7 — same interface.

Output format
-------------

**JSON** (``femorph-benchmark--.json``) — primary machine-readable output.
Top-level keys:

.. code-block:: text

   schema_version   "1.0" — bumped additively when fields change
   level            "basic" | "full" | "exhaustive"
   description      one-liner rendered at the top of the HTML
   started_at       ISO-8601 timestamp
   finished_at      ISO-8601 timestamp
   total_wall_s     aggregate sweep duration
   budget_s         preset's wall budget
   budget_reached   bool — did the runner bail early?
   host_report      the ``femorph_solver.Report()`` string
   preset           full preset config echoed back for reproducibility
   rows             list of per-row measurements (see below)

Each row has:

.. code-block:: text

   spec           dict of (nx, ny, nz, n_modes, linear_solver,
                  eigen_solver, ooc, timeout_s)
   ok             True / False — did the subprocess exit cleanly?
   wall_s         total wall-time including assembly + BC reduce
   eig_s          eigsh-only wall-time
   assemble_s     _bc_reduce_and_release wall-time
   peak_rss_mb    ru_maxrss in MB (monotonic within the subprocess)
   n_dof          free DOF count
   frequencies    first up-to-50 modal frequencies (Hz)
   error          None on success; one-line error on failure

**HTML** (``femorph-benchmark--.html``) — styled standalone report.
Includes a summary box, a full row table with fail highlighting, and the host
Report embedded verbatim for copy-paste into issues.

Schema guarantees
-----------------

The JSON schema is **append-only** across releases:

- Every field documented in the table above will still be present and carry
  its current semantics in future versions.
- New fields may be added; older loaders should ``.get()`` them with a
  sensible default.
- ``schema_version`` bumps only when a field's semantics change (which has
  never happened yet).

This stability is what lets the TA-2 estimator load every historical
``femorph-benchmark-*.json`` that users contribute without care-and-feeding
for version drift.

Relation to the perf/ benches
-----------------------------

:mod:`femorph_solver.benchmark` is the **user-facing** benchmark. For
developer-facing tooling see :file:`perf/bench_pipeline.py` (size-sweep
microbench with per-stage timing) and :file:`perf/bench_ooc_vs_incore.py`
(live MAPDL OOC-vs-in-core head-to-head). Both of those emit their own
JSON + markdown, but they aren't part of the stable schema — the benchmark
module is.
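A minimal sketch of a loader that honours the append-only guarantee above.
``load_benchmark`` and the ``notes`` field are illustrative, not part of the
library: documented fields are read directly, while anything newer is fetched
with ``.get()`` and a default.

.. code-block:: python

   import json

   def load_benchmark(path):
       """Forward-compatible loader for one benchmark JSON file (sketch)."""
       with open(path) as f:
           data = json.load(f)
       # A semantic change would bump schema_version; flag it loudly.
       if not data["schema_version"].startswith("1."):
           raise ValueError(f"unknown schema_version: {data['schema_version']!r}")
       rows = data["rows"]
       # Failed rows carry a one-line error per the row schema above.
       failed = [r for r in rows if not r["ok"]]
       # "notes" is a hypothetical future field -- default if absent.
       notes = data.get("notes", "")
       return rows, failed, notes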
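As a worked example of consuming the per-row data, a ``full`` run's rows can
be folded into a small table of eigsh time versus mode count. This sketch
uses only the documented row fields (``spec``, ``ok``, ``eig_s``); the helper
name is illustrative.

.. code-block:: python

   import json
   from collections import defaultdict

   def eig_s_by_size(path):
       """Group successful rows by mesh size, mapping n_modes -> eig_s."""
       with open(path) as f:
           rows = json.load(f)["rows"]
       table = defaultdict(dict)
       for row in rows:
           if not row["ok"]:
               continue  # failed rows carry no timing worth comparing
           spec = row["spec"]
           key = (spec["nx"], spec["ny"], spec["nz"])
           table[key][spec["n_modes"]] = row["eig_s"]
       return dict(table)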