Test-suite layout
=================

Where each kind of test lives, what it asserts, and the conventions that keep
the suite scaling. Companion to :doc:`verification_manual_spec` (which covers
VM-corpus tests specifically); this page covers the broader test landscape.

.. contents:: Page contents
   :local:
   :depth: 2

The five test categories
------------------------

Every test in ``tests/`` falls into exactly one of five categories. Choosing
the right category is the most important authoring decision; it determines run
cadence, fixture conventions, and what the test is allowed to assume.

.. list-table::
   :header-rows: 1
   :widths: 18 16 18 48

   * - Category
     - Path
     - Default ``pytest`` sweep
     - What it asserts
   * - Unit
     - ``tests/`` (top level + most subdirs)
     - runs by default
     - One function / one method / one tightly-scoped behaviour. Constructs
       the smallest possible model in Python; no fixture files. Fast
       (< 1 s typical).
   * - Element kernel
     - ``tests/elements/<element>/``
     - runs by default
     - One element kernel against an analytical or per-element sanity check.
       May build a 1-element or N-element model in Python.
   * - Interop / reader
     - ``tests/interop/<format>/``
     - runs by default
     - One reader function against a fixture deck under
       ``tests/interop/<format>/fixtures/``. Asserts model structure (cells,
       materials, real constants), **not** solve results.
   * - Cross-solver / verification harness
     - ``tests/cross_solver/``
     - runs by default
     - One published benchmark; reads a fixture pair through the reader,
       solves, and asserts the published reference within tolerance. Driven
       by the registry in ``_verification_registry.py``.
   * - Validation / convergence study
     - ``tests/validation/``
     - excluded by default
     - Multi-refinement convergence study against a textbook closed form.
       Slow (minutes); runs in main-branch CI only.

The five categories don't overlap. A test that ingests a fixture **and**
asserts a closed form belongs in ``tests/cross_solver/`` (via the registry),
not in ``tests/interop/`` *and* ``tests/validation/``.

Test discovery + exclusion
--------------------------

The suite uses ``pytest`` defaults plus a ``norecursedirs`` rule in
``pyproject.toml`` that excludes ``tests/validation/`` from the default sweep.
The exclusion exists because the validation suite runs multi-mesh convergence
studies and easily hits 10-minute wall time; running it on every PR is
wasteful. Validation runs:

#. **Main-branch CI** — a dedicated job runs ``pytest tests/validation`` on
   every push to ``main``. Failures block the docs build but do not block PR
   merges (to keep the PR lane fast).
#. **Ad-hoc locally** — when you touch ``src/femorph_solver/validation/`` or
   add a new problem to the catalogue, run the suite manually:

   .. code-block:: bash

      pytest tests/validation -v

Cross-solver harness:

* Registry-driven, parametrised; auto-discovers fixture pairs.
* Runs in the regular ``pytest`` sweep. 138+ cases at the time of writing.
  Each case is fast (≤ 2 s), so the whole sweep finishes in tens of seconds.
* The harness is split out into its own CI job (since #538) so failures
  surface independently.

Fixture conventions
-------------------

Vendor-format fixtures live under ``tests/interop/<format>/fixtures/`` and
are governed by the rules in :doc:`fixtures_and_decks`. Two hard rules:

#. **Fixtures are immutable** in the test repo. If a deck doesn't round-trip,
   the fix is on the *reader* or *kernel* side, not the deck.
#. Either **re-author** from the problem statement or **preserve** the vendor
   ``/COM`` provenance. Don't half-edit a vendor deck.
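To make the interop-test shape concrete, here is a minimal sketch of a reader
test that loads one of these fixtures and asserts structure only. The import
path, reader function, fixture stem, and model attributes are illustrative
assumptions, not the actual ``femorph_solver`` API.

.. code-block:: python

   from pathlib import Path

   # Assumed import path and reader name -- check the real interop package
   # before copying; only the shape of the test matters here.
   from femorph_solver.interop import read_bdf

   FIXTURES = Path(__file__).parent / "fixtures"


   def test_bdf_reader_builds_expected_structure():
       # Structural assertions only (cells, materials, real constants);
       # interop tests never call a solver.
       model = read_bdf(FIXTURES / "cantilever_beam.bdf")  # fixture stem is illustrative
       assert model.element_count() == 10   # counts and attribute names below are assumed
       assert len(model.materials) == 1
       assert len(model.real_constants) == 1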
Python-built models inside test files (no fixture file) are fine for unit
tests and element-kernel tests — the smallest model you can construct that
exercises the behaviour under test. Don't import a fixture-file path into a
unit test; if you find yourself doing that, the test belongs in cross-solver
or interop.

Naming
------

* Unit tests: ``tests/<subdir>/test_<module>.py``, mirroring the source
  layout. Function: ``test_<behaviour>``.
* Element-kernel tests: ``tests/elements/<element>/test_<element>_<behaviour>.py``
  (e.g. ``test_beam2_distributed_load.py``, ``test_quad4_shell_drilling.py``).
* Interop unit tests: one file per reader-card / phase
  (e.g. ``test_bdf_reader_phase2b.py``, ``test_inp_reader_phase2.py``).
* Cross-solver harness: ``test_verification_round_trip.py`` is the *single*
  parametrised file. Don't add per-problem files in ``tests/cross_solver/``
  unless the registry can't yet express the assertion (multi-quantity, stress,
  every-node check); see :ref:`vm-spec-reader-pending` in the spec.
* Validation: one file per problem; ``tests/validation/test_<problem>.py``.

Authoring patterns
------------------

Parametrisation (preferred over loop-and-assert)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Write parametrised tests so each input is its own case and pytest prints the
failing input. Auto-discovered parametrisation (from a registry / YAML file /
fixture directory) is preferred over hand-listing inputs:

.. code-block:: python

   @pytest.mark.parametrize(("stem", "fmt"), _params())
   def test_verification_round_trip(stem, fmt):
       ...

vs. the discouraged form:

.. code-block:: python

   # avoid — a failing case prints an id like "test_foo[2]" that says
   # nothing about what the input means
   @pytest.mark.parametrize("input", [1, 2, 3])
   def test_foo(input):
       ...

Tolerances
~~~~~~~~~~

Every numerical assertion should declare both a relative and an absolute
tolerance (or pick the one that the test's invariant clearly warrants).
``rtol=1e-9, atol=1e-12`` is appropriate for "identical up to floating-point
round-off". Engineering tolerances (1 %, 5 %) are appropriate when comparing
to a closed form on a discretised mesh. Never use ``assert a == b`` for
floats; use ``np.testing.assert_allclose`` or ``pytest.approx``.

XFAIL vs. skip
~~~~~~~~~~~~~~

* ``pytest.mark.xfail(reason=..., strict=False)`` — the test is expected to
  fail because of a tracked bug or feature gap. Use it when the test is
  *meaningful* and should run, but failure should not block the suite.
  ``strict=False`` so accidental fixes don't surprise-fail. (A sketch of both
  markers follows below.)
* ``pytest.mark.skipif(condition, reason=...)`` — the test cannot run in this
  environment (missing optional dependency, missing local data file). Don't
  use ``skipif`` to mask known bugs.

The verification harness has a custom split: ``xfail`` for the round-trip
assertion, ``xfail_agreement`` for the cross-format agreement test (see
:doc:`verification_manual_spec`). Mirror the pattern when adding new test
families that have multiple independent assertions per row.
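For ordinary (non-harness) tests, the two stock markers look like this; the
reason strings, the optional-dependency name, and the test bodies are
illustrative only.

.. code-block:: python

   import importlib.util

   import pytest

   # Optional-dependency probe for the environment gate; the package name is
   # illustrative.
   HAS_H5PY = importlib.util.find_spec("h5py") is not None


   # Tracked bug / feature gap: the test stays meaningful and keeps running,
   # but its failure doesn't block the suite. strict=False so an accidental
   # fix doesn't turn into a surprise failure.
   @pytest.mark.xfail(reason="tracked gap: see issue tracker", strict=False)
   def test_known_gap_still_exercised():
       ...


   # Environment gate, not a bug mask: the test simply cannot run without
   # the optional dependency.
   @pytest.mark.skipif(not HAS_H5PY, reason="optional dependency h5py not installed")
   def test_export_needs_h5py():
       ...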
What to assert (and what not to)
--------------------------------

A good test asserts *one thing*, and that thing is the smallest behaviour that
can fail in a recognisable way.

Bad:

.. code-block:: python

   def test_solve():
       m = build_cantilever()
       r = m.solve_static()
       assert r.displacement.shape == (n,)
       assert np.allclose(r.displacement, expected, rtol=1e-3)
       assert r.reaction.shape == (n,)
       assert m.element_count() == 10

Good:

.. code-block:: python

   def test_cantilever_tip_displacement_matches_eb():
       m = build_cantilever()
       r = m.solve_static()
       np.testing.assert_allclose(r.displacement[-1, 1], 3.2e-3, rtol=1e-3)


   def test_cantilever_reaction_balances_load():
       m = build_cantilever()
       r = m.solve_static()
       np.testing.assert_allclose(r.reaction.sum(axis=0), -applied_total, atol=1e-9)

Each test names its invariant; when one fails you immediately know which
physical assumption broke.

Pre-commit + CI
---------------

The repo runs ``pre-commit`` hooks on every commit (``ruff``, trailing
whitespace, end-of-file fixers, large-file check, etc.). CI runs the full test
matrix on every PR:

* ``Tests`` workflow — unit + interop + element-kernel + cross-solver
  harness. Pytest runs under ``-n auto`` (xdist) for parallelism.
* ``docs`` workflow — Sphinx build in strict mode (warnings as errors) plus
  linkcheck.
* ``Verification manual tests`` workflow (since #538) — split out so the
  cross-solver harness has its own status check.
* ``Auto-update PRs behind main`` workflow — keeps green PRs current with
  main; see ``.github/workflows/auto-update-prs.yml`` (a ``ci_and_merge``
  page is planned for this section).

When a CI step fails:

* Reproduce locally: ``pytest tests/ -v``.
* Diagnose the failure on its own merits — never just retry CI.
* If the failure is platform-specific, raise it on the CI workflow rather
  than ``skipif``-masking the test.

Performance baselines
---------------------

The repo carries a running performance log at ``PERFORMANCE.md`` plus detailed
snapshots under ``perf/``. When you add or change a hot path:

* Drop a ``perf/snapshots/<...>.md`` capturing before/after (one way to
  capture the numbers is sketched below).
* Update ``perf/latest_<...>.md`` if the change moves a tracked metric.
* Add a comparison plot to ``perf/trend/`` if the change is large enough to
  warrant a series.

These files are Markdown by design (machine-greppable, not built by Sphinx)
and live outside ``doc/source/``. See ``perf/README.md`` for the full layout.
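One simple way (a sketch, not a repo convention) to capture comparable
before/after numbers for a snapshot is plain ``pytest --durations`` on the
affected suite, run once before and once after the change; the output paths
below are illustrative.

.. code-block:: bash

   # Before the change: record the slowest cases of the affected suite.
   pytest tests/cross_solver -q --durations=25 | tee /tmp/harness_before.txt

   # After the change: re-run identically, then paste both tables into the
   # new perf/snapshots/ entry with a short interpretation.
   pytest tests/cross_solver -q --durations=25 | tee /tmp/harness_after.txt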