Test-suite layout
=================

Where each kind of test lives, what it asserts, and the conventions that keep
the suite scaling. Companion to :doc:`verification_manual_spec` (which covers
VM-corpus tests specifically); this page covers the broader test landscape.

.. contents:: Page contents
   :local:
   :depth: 2

The five test categories
------------------------

Every test in ``tests/`` falls into exactly one of five categories. Choosing
the right category is the most important authoring decision; it determines run
cadence, fixture conventions, and what the test is allowed to assume.

.. list-table::
   :header-rows: 1
   :widths: 18 16 18 48

   * - Category
     - Path
     - Default ``pytest`` sweep
     - What it asserts
   * - Unit
     - ``tests/`` (top level + most subdirs)
     - runs by default
     - One function / one method / one tightly-scoped behaviour. Constructs
       the smallest possible model in Python; no fixture files. Fast
       (< 1 s typical).
   * - Element kernel
     - ``tests/elements/<element>/``
     - runs by default
     - One element kernel against an analytical or per-element sanity check.
       May build a 1-element or N-element model in Python.
   * - Interop / reader
     - ``tests/interop/<format>/``
     - runs by default
     - One reader function against a fixture deck under
       ``tests/interop/<format>/fixtures/``. Asserts model structure (cells,
       materials, real constants), **not** solve results.
   * - Cross-solver / verification harness
     - ``tests/cross_solver/``
     - runs by default
     - One published benchmark; reads a fixture pair through the reader,
       solves, and asserts the published reference within tolerance. Driven
       by the registry in ``_verification_registry.py``.
   * - Validation / convergence study
     - ``tests/validation/``
     - excluded by default
     - Multi-refinement convergence study against a textbook closed form.
       Slow (minutes); runs in main-branch CI only.

The five categories don't overlap. A test that ingests a fixture **and**
asserts a closed form belongs in ``tests/cross_solver/`` (via the registry),
not in ``tests/interop/`` *and* ``tests/validation/``.

Test discovery + exclusion
--------------------------

The suite uses ``pytest`` defaults plus a ``norecursedirs`` rule in
``pyproject.toml`` that excludes ``tests/validation/`` from the default sweep.
The exclusion exists because the validation suite runs multi-mesh convergence
studies and easily hits 10-minute wall time; running it on every PR is
wasteful. Validation runs:

#. **Main-branch CI** — a dedicated job runs ``pytest tests/validation`` on
   every push to ``main``. Failures block the docs build but do not block PR
   merges (to keep the PR lane fast).
#. **Ad-hoc locally** — when you touch ``src/femorph_solver/validation/`` or
   add a new problem to the catalogue, run the suite manually:

   .. code-block:: bash

      pytest tests/validation -v

Cross-solver harness:

* Registry-driven, parametrised; auto-discovers fixture pairs.
* Runs in the regular ``pytest`` sweep. 138+ cases at the time of writing.
  Each case is fast (≤ 2 s), so the whole sweep finishes in tens of seconds.
* The harness is split out into its own CI job (since #538) so failures
  surface independently.

Fixture conventions
-------------------

Vendor-format fixtures live under ``tests/interop/<format>/fixtures/`` and
are governed by the rules in :doc:`fixtures_and_decks`. Two hard rules:

#. **Fixtures are immutable** in the test repo. If a deck doesn't round-trip,
   the fix is on the *reader* or *kernel* side, not the deck.
#. Either **re-author** from the problem statement or **preserve** the vendor
   ``/COM`` provenance. Don't half-edit a vendor deck.
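To make the interop-test shape concrete, here is a minimal sketch of a reader
test that loads one of these fixtures and asserts structure only. The import
path, reader function, fixture stem, and model attributes are illustrative
assumptions, not the actual ``femorph_solver`` API.

.. code-block:: python

   from pathlib import Path

   # Assumed import path and reader name -- check the real interop package
   # before copying; only the shape of the test matters here.
   from femorph_solver.interop import read_bdf

   FIXTURES = Path(__file__).parent / "fixtures"


   def test_bdf_reader_builds_expected_structure():
       # Structural assertions only (cells, materials, real constants);
       # interop tests never call a solver.
       model = read_bdf(FIXTURES / "cantilever_beam.bdf")  # fixture stem is illustrative
       assert model.element_count() == 10   # counts and attribute names below are assumed
       assert len(model.materials) == 1
       assert len(model.real_constants) == 1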
Python-built models inside test files (no fixture file) are fine for unit
tests and element-kernel tests — the smallest model you can construct that
exercises the behaviour under test. Don't import a fixture-file path into a
unit test; if you find yourself doing that, the test belongs in cross-solver
or interop.

Naming
------

* Unit tests: ``tests/<subdir>/test_<module>.py``, mirroring the source
  layout. Function: ``test_<behaviour>``.
* Element-kernel tests: ``tests/elements/<element>/test_<element>_<behaviour>.py``
  (e.g. ``test_beam2_distributed_load.py``, ``test_quad4_shell_drilling.py``).
* Interop unit tests: one file per reader-card / phase
  (e.g. ``test_bdf_reader_phase2b.py``, ``test_inp_reader_phase2.py``).
* Cross-solver harness: ``test_verification_round_trip.py`` is the *single*
  parametrised file. Don't add per-problem files in ``tests/cross_solver/``
  unless the registry can't yet express the assertion (multi-quantity, stress,
  every-node check); see :ref:`vm-spec-reader-pending` in the spec.
* Validation: one file per problem; ``tests/validation/test_<problem>.py``.

Authoring patterns
------------------

Parametrisation (preferred over loop-and-assert)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Write parametrised tests so each input is its own case and pytest prints the
failing input. Auto-discovered parametrisation (from a registry / YAML file /
fixture directory) is preferred over hand-listing inputs:

.. code-block:: python

   @pytest.mark.parametrize(("stem", "fmt"), _params())
   def test_verification_round_trip(stem, fmt):
       ...

vs. the discouraged form:

.. code-block:: python

   # avoid — a failing case prints an id like "test_foo[2]" that says
   # nothing about what the input means
   @pytest.mark.parametrize("input", [1, 2, 3])
   def test_foo(input):
       ...

Tolerances
~~~~~~~~~~

Every numerical assertion should declare both a relative and an absolute
tolerance (or pick the one that the test's invariant clearly warrants).
``rtol=1e-9, atol=1e-12`` is appropriate for "identical up to floating-point
round-off". Engineering tolerances (1 %, 5 %) are appropriate when comparing
to a closed form on a discretised mesh. Never use ``assert a == b`` for
floats; use ``np.testing.assert_allclose`` or ``pytest.approx``.

XFAIL vs. skip
~~~~~~~~~~~~~~

* ``pytest.mark.xfail(reason=..., strict=False)`` — the test is expected to
  fail because of a tracked bug or feature gap. Use it when the test is
  *meaningful* and should run, but failure should not block the suite.
  ``strict=False`` so accidental fixes don't surprise-fail. (A sketch of both
  markers follows below.)
* ``pytest.mark.skipif(condition, reason=...)`` — the test cannot run in this
  environment (missing optional dependency, missing local data file). Don't
  use ``skipif`` to mask known bugs.

The verification harness has a custom split: ``xfail`` for the round-trip
assertion, ``xfail_agreement`` for the cross-format agreement test (see
:doc:`verification_manual_spec`). Mirror the pattern when adding new test
families that have multiple independent assertions per row.
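For ordinary (non-harness) tests, the two stock markers look like this; the
reason strings, the optional-dependency name, and the test bodies are
illustrative only.

.. code-block:: python

   import importlib.util

   import pytest

   # Optional-dependency probe for the environment gate; the package name is
   # illustrative.
   HAS_H5PY = importlib.util.find_spec("h5py") is not None


   # Tracked bug / feature gap: the test stays meaningful and keeps running,
   # but its failure doesn't block the suite. strict=False so an accidental
   # fix doesn't turn into a surprise failure.
   @pytest.mark.xfail(reason="tracked gap: see issue tracker", strict=False)
   def test_known_gap_still_exercised():
       ...


   # Environment gate, not a bug mask: the test simply cannot run without
   # the optional dependency.
   @pytest.mark.skipif(not HAS_H5PY, reason="optional dependency h5py not installed")
   def test_export_needs_h5py():
       ...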
What to assert (and what not to)
--------------------------------

A good test asserts *one thing*, and that thing is the smallest behaviour that
can fail in a recognisable way.

Bad:

.. code-block:: python

   def test_solve():
       m = build_cantilever()
       r = m.solve_static()
       assert r.displacement.shape == (n,)
       assert np.allclose(r.displacement, expected, rtol=1e-3)
       assert r.reaction.shape == (n,)
       assert m.element_count() == 10

Good:

.. code-block:: python

   def test_cantilever_tip_displacement_matches_eb():
       m = build_cantilever()
       r = m.solve_static()
       np.testing.assert_allclose(r.displacement[-1, 1], 3.2e-3, rtol=1e-3)


   def test_cantilever_reaction_balances_load():
       m = build_cantilever()
       r = m.solve_static()
       np.testing.assert_allclose(r.reaction.sum(axis=0), -applied_total, atol=1e-9)

Each test names its invariant; when one fails you immediately know which
physical assumption broke.

Pre-commit + CI
---------------

The repo runs ``pre-commit`` hooks on every commit (``ruff``, trailing
whitespace, end-of-file fixers, large-file check, etc.). CI runs the full test
matrix on every PR:

* ``Tests`` workflow — unit + interop + element-kernel + cross-solver
  harness. Pytest runs under ``-n auto`` (xdist) for parallelism.
* ``docs`` workflow — Sphinx build in strict mode (warnings as errors) plus
  linkcheck.
* ``Verification manual tests`` workflow (since #538) — split out so the
  cross-solver harness has its own status check.
* ``Auto-update PRs behind main`` workflow — keeps green PRs current with
  main; see ``.github/workflows/auto-update-prs.yml`` (a ``ci_and_merge``
  page is planned for this section).

When a CI step fails:

* Reproduce locally: ``pytest tests/ -v``.
* Diagnose the failure on its own merits — never just retry CI.
* If the failure is platform-specific, raise it on the CI workflow rather
  than ``skipif``-masking the test.

Performance baselines
---------------------

The repo carries a running performance log at ``PERFORMANCE.md`` plus detailed
snapshots under ``perf/``. When you add or change a hot path:

* Drop a ``perf/snapshots/<...>.md`` capturing before/after (one way to
  capture the numbers is sketched below).
* Update ``perf/latest_<...>.md`` if the change moves a tracked metric.
* Add a comparison plot to ``perf/trend/`` if the change is large enough to
  warrant a series.

These files are Markdown by design (machine-greppable, not built by Sphinx)
and live outside ``doc/source/``. See ``perf/README.md`` for the full layout.
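One simple way (a sketch, not a repo convention) to capture comparable
before/after numbers for a snapshot is plain ``pytest --durations`` on the
affected suite, run once before and once after the change; the output paths
below are illustrative.

.. code-block:: bash

   # Before the change: record the slowest cases of the affected suite.
   pytest tests/cross_solver -q --durations=25 | tee /tmp/harness_before.txt

   # After the change: re-run identically, then paste both tables into the
   # new perf/snapshots/ entry with a short interpretation.
   pytest tests/cross_solver -q --durations=25 | tee /tmp/harness_after.txt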