Test-suite layout#

Where each kind of test lives, what it asserts, and the conventions that keep the suite manageable as it grows. Companion to the Verification-manual spec (which covers VM-corpus tests specifically); this page covers the broader test landscape.

The five test categories#

Every test in tests/ falls into exactly one of five categories. Choosing the right category is the most important authoring decision; it determines run cadence, fixture conventions, and what the test is allowed to assume.

| Category | Path | Default pytest | What it asserts |
|---|---|---|---|
| Unit | tests/ (top level + most subdirs) | runs by default | One function / one method / one tightly-scoped behaviour. Constructs the smallest possible model in Python; no fixture files. Fast (< 1 s typical). |
| Element kernel | tests/elements/<element>/ | runs by default | One element kernel against an analytical or per-element sanity check. May build a 1-element or N-element model in Python. |
| Interop / reader | tests/interop/<vendor>/ | runs by default | One reader function against a fixture deck under tests/interop/<vendor>/fixtures/. Asserts model structure (cells, materials, real constants), not solve results. |
| Cross-solver / verification harness | tests/cross_solver/ | runs by default | One published benchmark; reads a fixture pair through the reader, solves, and asserts the published reference within tolerance. Driven by the registry in _verification_registry.py. |
| Validation / convergence study | tests/validation/ | excluded by default | Multi-refinement convergence study against a textbook closed form. Slow (minutes); runs in main-branch CI only. |

The five categories don’t overlap. A test that ingests a fixture and asserts a closed form belongs in tests/cross_solver/ (via the registry), not in tests/interop/ or tests/validation/.
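
Mapped onto the tree, the layout looks roughly like this (illustrative; only the directories named above are shown):

tests/
├── <module>/test_<thing>.py                # unit tests mirroring the source layout
├── elements/<element>/                     # element-kernel tests
├── interop/<vendor>/
│   └── fixtures/                           # vendor-format fixture decks
├── cross_solver/
│   └── test_verification_round_trip.py     # registry-driven harness
└── validation/                             # excluded from the default sweep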

Test discovery + exclusion#

The suite uses pytest defaults plus a norecursedirs rule in pyproject.toml that excludes tests/validation/ from the default sweep. The exclusion exists because the validation suite runs multi-mesh convergence studies and easily hits 10-min wall time; running it on every PR is wasteful.

Validation runs:

  1. Main-branch CI — a dedicated job runs pytest tests/validation on every push to main. Failures block the docs build but do not block PR merges (to keep the PR lane fast).

  2. Ad-hoc locally — when you touch src/femorph_solver/validation/ or add a new problem to the catalogue, run the suite manually:

    pytest tests/validation -v
    

Cross-solver harness:

  • Registry-driven, parametrised; auto-discovers fixture pairs (sketched after this list).

  • Runs in the regular pytest sweep; 138+ cases at the time of writing. Each case is fast (≤ 2 s), so the whole sweep finishes in tens of seconds.

  • The harness is split out into its own CI job (since #538) so failures surface independently.
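
A minimal sketch of what the registry-driven discovery can look like. The row shape and the stems here are assumptions for illustration; the real rows live in _verification_registry.py:

import pytest

# Stand-in for the real registry in _verification_registry.py;
# the row shape (stem + supported formats) is assumed here.
_REGISTRY = [
    {"stem": "vm1", "formats": ("bdf", "inp")},
    {"stem": "vm2", "formats": ("bdf",)},
]


def _params():
    # One pytest case per (stem, format) pair, so a failure prints
    # e.g. "test_verification_round_trip[vm1-bdf]".
    return [
        pytest.param(row["stem"], fmt, id=f"{row['stem']}-{fmt}")
        for row in _REGISTRY
        for fmt in row["formats"]
    ]


@pytest.mark.parametrize(("stem", "fmt"), _params())
def test_verification_round_trip(stem, fmt):
    ...  # read the fixture pair, solve, assert the published reference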

Fixture conventions#

Vendor-format fixtures live under tests/interop/<vendor>/fixtures/ and are governed by Fixtures and decks. Two hard rules:

  1. Fixtures are immutable in the test repo. If a deck doesn’t round-trip, the fix is on the reader or kernel side, not the deck.

  2. Either re-author from the problem statement or preserve the vendor /COM provenance. Don’t half-edit a vendor deck.

Python-built models inside test files (no fixture file) are fine for unit tests and element-kernel tests — the smallest model you can construct that exercises the behaviour under test. Don’t import a fixture-file path into a unit test; if you find yourself doing that, the test belongs in cross-solver or interop.
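
For scale, a Python-built unit-level model might look like the sketch below. The two-node bar builder is a hypothetical helper; the solve_static / displacement names mirror the examples later on this page:

import numpy as np


def test_bar_tip_displacement_matches_closed_form():
    # Smallest model that exercises the behaviour: one element, two
    # nodes, built inline with no fixture file.
    m = build_two_node_bar(E=210e9, A=1e-4, L=1.0)  # hypothetical helper
    m.fix(node=0)
    m.load(node=1, fx=1000.0)
    r = m.solve_static()
    # Closed form for an axial bar: u = F * L / (E * A)
    np.testing.assert_allclose(r.displacement[1, 0], 1000.0 / (210e9 * 1e-4), rtol=1e-9)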

Naming#

  • Unit tests: tests/<module>/test_<thing>.py mirroring the source layout. Function: test_<behaviour_under_test>.

  • Element-kernel tests: tests/elements/<element>/test_<element>_<aspect>.py (e.g. test_beam2_distributed_load.py, test_quad4_shell_drilling.py).

  • Interop unit tests: one file per reader-card / phase (e.g. test_bdf_reader_phase2b.py, test_inp_reader_phase2.py).

  • Cross-solver harness: test_verification_round_trip.py is the single parametrised file. Don’t add per-problem files in tests/cross_solver/ unless the registry can’t yet express the assertion (multi-quantity, stress, every-node check); see Reader-pending fallback in the spec.

  • Validation: one file per problem; tests/validation/test_<problem>.py.

Authoring patterns#

Parametrisation (preferred over loop-and-assert)#

Write parametrised tests so each input is its own case and pytest prints the failing input. Auto-discovered parametrisation (from a registry, YAML file, or fixture directory) is preferred over hand-listing inputs:

@pytest.mark.parametrize(("stem", "fmt"), _params())
def test_verification_round_trip(stem, fmt):
    ...

vs. the discouraged form:

# avoid — failure prints "test_foo[case0]" with no useful name
@pytest.mark.parametrize("case", [{"load": 1.0}, {"load": 2.0}])
def test_foo(case):
    ...

Tolerances#

Every numerical assertion should declare both a relative and an absolute tolerance (or pick the single one that the test’s invariant clearly warrants). rtol=1e-9, atol=1e-12 is appropriate when two results should agree to floating-point round-off. Engineering tolerances (1 %, 5 %) are appropriate when comparing to a closed form on a discretised mesh.

Never use assert a == b for floats. Use np.testing.assert_allclose or pytest.approx.
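
For example (the values here are illustrative):

import numpy as np
import pytest

# Determinism check: two code paths that should agree to round-off.
k_a = np.array([1.0, 2.0])
k_b = k_a.copy()
np.testing.assert_allclose(k_a, k_b, rtol=1e-9, atol=1e-12)

# Engineering check: discretised result against a closed form.
tip_deflection = 3.21e-3
closed_form = 3.2e-3
assert tip_deflection == pytest.approx(closed_form, rel=0.01)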

XFAIL vs. skip#

  • pytest.mark.xfail(reason=..., strict=False) — the test is expected to fail because of a tracked bug or feature gap. Use when the test is meaningful and should run, but failure should not block the suite. strict=False so accidental fixes don’t surprise-fail.

  • pytest.mark.skipif(condition, reason=...) — the test cannot run in this environment (missing optional dependency, missing local data file). Don’t use skipif to mask known bugs.

The verification harness has a custom split: xfail for the round-trip assertion, xfail_agreement for the cross-format agreement test (see Verification-manual spec). Mirror the pattern when adding new test families that have multiple independent assertions per row.
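
A sketch of both markers in use, plus the per-row form that a registry-driven family can emit via pytest.param (the test bodies, reasons, and case IDs are illustrative):

import sys

import pytest


@pytest.mark.xfail(reason="tracked feature gap: drilling DOF not assembled", strict=False)
def test_quad4_shell_drilling_stiffness():
    ...


@pytest.mark.skipif(sys.platform == "win32", reason="fixture relies on POSIX symlinks")
def test_fixture_symlink_round_trip():
    ...


# Per-row expectation in a parametrised family: mark one case xfail
# without touching the others.
_CASES = [
    pytest.param("vm1", "bdf", id="vm1-bdf"),
    pytest.param(
        "vm7", "inp", id="vm7-inp",
        marks=pytest.mark.xfail(reason="reader gap", strict=False),
    ),
]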

What to assert (and what not to)#

A good test asserts one thing, and that thing is the smallest behaviour that can fail in a recognisable way.

Bad:

def test_solve():
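    # four unrelated assertions: a failure doesn't name which invariant broke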
    m = build_cantilever()
    r = m.solve_static()
    assert r.displacement.shape == (n,)
    assert np.allclose(r.displacement, expected, rtol=1e-3)
    assert r.reaction.shape == (n,)
    assert m.element_count() == 10

Good:

def test_cantilever_tip_displacement_matches_eb():
    m = build_cantilever()
    r = m.solve_static()
    np.testing.assert_allclose(r.displacement[-1, 1], 3.2e-3, rtol=1e-3)

def test_cantilever_reaction_balances_load():
    m = build_cantilever()
    r = m.solve_static()
    np.testing.assert_allclose(r.reaction.sum(axis=0), -applied_total, atol=1e-9)

Each test names its invariant; when one fails you immediately know which physical assumption broke.

Pre-commit + CI#

The repo runs pre-commit hooks on every commit (ruff, trailing whitespace, end-of-file fixers, large-file check, etc.). CI runs the full test matrix on every PR:

  • Tests workflow — unit + interop + element-kernel + cross-solver harness. Runs pytest with -n auto (pytest-xdist) for parallelism.

  • docs workflow — Sphinx build with strict mode (warnings as errors) + linkcheck.

  • Verification manual tests workflow (since #538) — split out so the cross-solver harness has its own status check.

  • Auto-update PRs behind main workflow — keeps green PRs current with main; see .github/workflows/auto-update-prs.yml (a ci_and_merge page is planned for this section).

When a CI step fails:

  • Reproduce locally: pytest tests/<failing-path> -v.

  • Diagnose the failure on its own merits — never just retry CI.

  • If the failure is platform-specific, raise it on the CI workflow rather than skipif-masking the test.

Performance baselines#

The repo carries a running performance log at PERFORMANCE.md plus detailed snapshots under perf/. When you add or change a hot path:

  • Drop a perf/snapshots/<short-name>.md capturing before/after.

  • Update perf/latest_<area>.md if the change moves a tracked metric.

  • Add a comparison plot to perf/trend/ if the change is large enough to warrant a series.

These files are Markdown by design (machine-greppable, deliberately kept out of the Sphinx tree) and live outside doc/source/. See perf/README.md for the full layout.