Test-suite layout#

Where each kind of test lives, what it asserts, and the conventions that keep the suite manageable as it grows. Companion to the Verification-manual spec (which covers VM-corpus tests specifically); this page covers the broader test landscape.

The five test categories#

Every test in tests/ falls into exactly one of five categories. Choosing the right category is the most important authoring decision; it determines run cadence, fixture conventions, and what the test is allowed to assume.

| Category | Path | Default pytest | What it asserts |
|---|---|---|---|
| Unit | tests/ (top level + most subdirs) | runs by default | One function / one method / one tightly-scoped behaviour. Constructs the smallest possible model in Python; no fixture files. Fast (< 1 s typical). |
| Element kernel | tests/elements/<element>/ | runs by default | One element kernel against an analytical or per-element sanity check. May build a 1-element or N-element model in Python. |
| Interop / reader | tests/interop/<vendor>/ | runs by default | One reader function against a fixture deck under tests/interop/<vendor>/fixtures/. Asserts model structure (cells, materials, real constants), not solve results. |
| Cross-solver / verification harness | tests/cross_solver/ | runs by default | One published benchmark; reads a fixture pair through the reader, solves, and asserts the published reference within tolerance. Driven by the registry in _verification_registry.py. |
| Validation / convergence study | tests/validation/ | excluded by default | Multi-refinement convergence study against a textbook closed form. Slow (minutes); runs in main-branch CI only. |

The five categories don’t overlap. A test that ingests a fixture and asserts a closed form belongs in tests/cross_solver/ (via the registry), not in tests/interop/ or tests/validation/.
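
Mapped onto the tree, the layout looks roughly like this (illustrative; only the directories named above are shown):

tests/
├── <module>/test_<thing>.py                # unit tests mirroring the source layout
├── elements/<element>/                     # element-kernel tests
├── interop/<vendor>/
│   └── fixtures/                           # vendor-format fixture decks
├── cross_solver/
│   └── test_verification_round_trip.py     # registry-driven harness
└── validation/                             # excluded from the default sweep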

Test discovery + exclusion#

The suite uses pytest defaults plus a norecursedirs rule in pyproject.toml that excludes tests/validation/ from the default sweep. The exclusion exists because the validation suite runs multi-mesh convergence studies and easily hits 10-min wall time; running it on every PR is wasteful.

Validation runs:

  1. Main-branch CI — a dedicated job runs pytest tests/validation on every push to main. Failures block the docs build but do not block PR merges (to keep the PR lane fast).

  2. Ad-hoc locally — when you touch src/femorph_solver/validation/ or add a new problem to the catalogue, run the suite manually:

    pytest tests/validation -v
    

Cross-solver harness:

  • Registry-driven, parametrised; auto-discovers fixture pairs (sketched after this list).

  • Runs in the regular pytest sweep; 138+ cases at the time of writing. Each case is fast (≤ 2 s), so the whole sweep finishes in tens of seconds.

  • The harness is split out into its own CI job (since #538) so failures surface independently.
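
A minimal sketch of what the registry-driven discovery can look like. The row shape and the stems here are assumptions for illustration; the real rows live in _verification_registry.py:

import pytest

# Stand-in for the real registry in _verification_registry.py;
# the row shape (stem + supported formats) is assumed here.
_REGISTRY = [
    {"stem": "vm1", "formats": ("bdf", "inp")},
    {"stem": "vm2", "formats": ("bdf",)},
]


def _params():
    # One pytest case per (stem, format) pair, so a failure prints
    # e.g. "test_verification_round_trip[vm1-bdf]".
    return [
        pytest.param(row["stem"], fmt, id=f"{row['stem']}-{fmt}")
        for row in _REGISTRY
        for fmt in row["formats"]
    ]


@pytest.mark.parametrize(("stem", "fmt"), _params())
def test_verification_round_trip(stem, fmt):
    ...  # read the fixture pair, solve, assert the published reference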

Fixture conventions#

Vendor-format fixtures live under tests/interop/<vendor>/fixtures/ and are governed by Fixtures and decks. Two hard rules:

  1. Fixtures are immutable in the test repo. If a deck doesn’t round-trip, the fix is on the reader or kernel side, not the deck.

  2. Either re-author from the problem statement or preserve the vendor /COM provenance. Don’t half-edit a vendor deck.

Python-built models inside test files (no fixture file) are fine for unit tests and element-kernel tests — the smallest model you can construct that exercises the behaviour under test. Don’t import a fixture-file path into a unit test; if you find yourself doing that, the test belongs in cross-solver or interop.
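
For scale, a Python-built unit-level model might look like the sketch below. The two-node bar builder is a hypothetical helper; the solve_static / displacement names mirror the examples later on this page:

import numpy as np


def test_bar_tip_displacement_matches_closed_form():
    # Smallest model that exercises the behaviour: one element, two
    # nodes, built inline with no fixture file.
    m = build_two_node_bar(E=210e9, A=1e-4, L=1.0)  # hypothetical helper
    m.fix(node=0)
    m.load(node=1, fx=1000.0)
    r = m.solve_static()
    # Closed form for an axial bar: u = F * L / (E * A)
    np.testing.assert_allclose(r.displacement[1, 0], 1000.0 / (210e9 * 1e-4), rtol=1e-9)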

Naming#

  • Unit tests: tests/<module>/test_<thing>.py mirroring the source layout. Function: test_<behaviour_under_test>.

  • Element-kernel tests: tests/elements/<element>/test_<element>_<aspect>.py (e.g. test_beam2_distributed_load.py, test_quad4_shell_drilling.py).

  • Interop unit tests: one file per reader-card / phase (e.g. test_bdf_reader_phase2b.py, test_inp_reader_phase2.py).

  • Cross-solver harness: test_verification_round_trip.py is the single parametrised file. Don’t add per-problem files in tests/cross_solver/ unless the registry can’t yet express the assertion (multi-quantity, stress, every-node check); see Reader-pending fallback in the spec.

  • Validation: one file per problem; tests/validation/test_<problem>.py.

Authoring patterns#

Parametrisation (preferred over loop-and-assert)#

Write parametrised tests so each input is its own case and pytest prints the failing input. Auto-discovered parametrisation (from a registry, YAML file, or fixture directory) is preferred over hand-listing inputs:

@pytest.mark.parametrize(("stem", "fmt"), _params())
def test_verification_round_trip(stem, fmt):
    ...

vs. the discouraged form:

# avoid — failure prints "test_foo[case0]" with no useful name
@pytest.mark.parametrize("case", [{"load": 1.0}, {"load": 2.0}])
def test_foo(case):
    ...

Tolerances#

Every numerical assertion should declare both a relative and an absolute tolerance (or pick the single one that the test’s invariant clearly warrants). rtol=1e-9, atol=1e-12 is appropriate when two results should agree to floating-point round-off. Engineering tolerances (1 %, 5 %) are appropriate when comparing to a closed form on a discretised mesh.

Never use assert a == b for floats. Use np.testing.assert_allclose or pytest.approx.
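
For example (the values here are illustrative):

import numpy as np
import pytest

# Determinism check: two code paths that should agree to round-off.
k_a = np.array([1.0, 2.0])
k_b = k_a.copy()
np.testing.assert_allclose(k_a, k_b, rtol=1e-9, atol=1e-12)

# Engineering check: discretised result against a closed form.
tip_deflection = 3.21e-3
closed_form = 3.2e-3
assert tip_deflection == pytest.approx(closed_form, rel=0.01)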

XFAIL vs. skip#

  • pytest.mark.xfail(reason=..., strict=False) — the test is expected to fail because of a tracked bug or feature gap. Use when the test is meaningful and should run, but failure should not block the suite. strict=False so accidental fixes don’t surprise-fail.

  • pytest.mark.skipif(condition, reason=...) — the test cannot run in this environment (missing optional dependency, missing local data file). Don’t use skipif to mask known bugs.

The verification harness has a custom split: xfail for the round-trip assertion, xfail_agreement for the cross-format agreement test (see Verification-manual spec). Mirror the pattern when adding new test families that have multiple independent assertions per row.
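
A sketch of both markers in use, plus the per-row form that a registry-driven family can emit via pytest.param (the test bodies, reasons, and case IDs are illustrative):

import sys

import pytest


@pytest.mark.xfail(reason="tracked feature gap: drilling DOF not assembled", strict=False)
def test_quad4_shell_drilling_stiffness():
    ...


@pytest.mark.skipif(sys.platform == "win32", reason="fixture relies on POSIX symlinks")
def test_fixture_symlink_round_trip():
    ...


# Per-row expectation in a parametrised family: mark one case xfail
# without touching the others.
_CASES = [
    pytest.param("vm1", "bdf", id="vm1-bdf"),
    pytest.param(
        "vm7", "inp", id="vm7-inp",
        marks=pytest.mark.xfail(reason="reader gap", strict=False),
    ),
]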

What to assert (and what not to)#

A good test asserts one thing, and that thing is the smallest behaviour that can fail in a recognisable way.

Bad:

def test_solve():
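    # four unrelated assertions: a failure doesn't name which invariant broke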
    m = build_cantilever()
    r = m.solve_static()
    assert r.displacement.shape == (n,)
    assert np.allclose(r.displacement, expected, rtol=1e-3)
    assert r.reaction.shape == (n,)
    assert m.element_count() == 10

Good:

def test_cantilever_tip_displacement_matches_eb():
    m = build_cantilever()
    r = m.solve_static()
    np.testing.assert_allclose(r.displacement[-1, 1], 3.2e-3, rtol=1e-3)

def test_cantilever_reaction_balances_load():
    m = build_cantilever()
    r = m.solve_static()
    np.testing.assert_allclose(r.reaction.sum(axis=0), -applied_total, atol=1e-9)

Each test names its invariant; when one fails you immediately know which physical assumption broke.

Pre-commit + CI#

The repo runs pre-commit hooks on every commit (ruff, trailing whitespace, end-of-file fixers, large-file check, etc.). CI runs the full test matrix on every PR:

  • Tests workflow — unit + interop + element-kernel + cross-solver harness. Runs pytest with -n auto (pytest-xdist) for parallelism.

  • docs workflow — Sphinx build with strict mode (warnings as errors) + linkcheck.

  • Verification manual tests workflow (since #538) — split out so the cross-solver harness has its own status check.

  • Auto-update PRs behind main workflow — keeps green PRs current with main; see .github/workflows/auto-update-prs.yml (a ci_and_merge page is planned for this section).

When a CI step fails:

  • Reproduce locally: pytest tests/<failing-path> -v.

  • Diagnose the failure on its own merits — never just retry CI.

  • If the failure is platform-specific, raise it on the CI workflow rather than skipif-masking the test.

Performance baselines#

The repo carries a running performance log at PERFORMANCE.md plus detailed snapshots under perf/. When you add or change a hot path:

  • Drop a perf/snapshots/<short-name>.md capturing before/after.

  • Update perf/latest_<area>.md if the change moves a tracked metric.

  • Add a comparison plot to perf/trend/ if the change is large enough to warrant a series.

These files are Markdown by design (machine-greppable, deliberately kept out of the Sphinx tree) and live outside doc/source/. See perf/README.md for the full layout.