Performance — profiling, snapshots, the perf tracker#

How performance work flows through the repo: where to profile, how to capture a snapshot, when to commit one to the perf tracker, and how to read the running history.

The repo treats performance as a first-class artefact — every optimisation that lands gets a dated row in PERFORMANCE.md so the history stays visible. Old rows are not collapsed; the point is to see numbers move over time.

Two harnesses#

Performance numbers come from one of two harnesses, both checked into the repo:

| Harness | Path | Drives |
| --- | --- | --- |
| Micro / assembly | tests/benchmarks/ | pytest --benchmark-only over the standard 2 × 40 × 40 SOLID185 flat plate (5 043 nodes, 3 200 hex cells, 15 129 DOF — see tests/integration/_flat_plate.py). Per-element kernels run on a single unit cube / unit tet. |
| End-to-end pipeline | perf/bench_pipeline.py | Drives a parameterised SOLID185 flat plate (clamped at X = 0, 10 modes) through the full Model.solve_modal pipeline with per-stage timing and peak-RSS capture. |
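For orientation, a micro benchmark in tests/benchmarks/ follows the usual pytest-benchmark shape and lives in a bench_*.py file so the -o python_files override picks it up. A minimal sketch — the kernel and fixture names here are hypothetical stand-ins, not the repo's actual API:

```python
import numpy as np
import pytest


def element_stiffness(coords):
    """Stand-in kernel; the real SOLID185 element routine lives in the package."""
    return coords.T @ coords  # placeholder arithmetic, not the actual formulation


@pytest.fixture
def unit_cube():
    # Eight corner nodes of a unit hex, mirroring the single-element setup above.
    return np.array(
        [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
    )


def test_element_stiffness_bench(benchmark, unit_cube):
    # pytest-benchmark calls the kernel repeatedly over several rounds and reports the median.
    benchmark(element_stiffness, unit_cube)
```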

The micro harness reports the median wall time across rounds (≥ 5 rounds via pytest-benchmark), so the noise floor is bounded. The pipeline harness reports a per-stage breakdown, so a slowdown landing in one stage is visible without re-running the whole suite.
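The per-stage breakdown boils down to a simple pattern: time each stage and note the process peak RSS as it goes. A minimal sketch of that idea, with placeholder callables standing in for the real mesh / assembly / solve stages (the actual harness is perf/bench_pipeline.py, which wraps memray when invoked with --memory):

```python
import resource
import time


def run_stage(name, fn, results):
    """Time one stage and note the process peak RSS once it has finished."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    # ru_maxrss is kilobytes on Linux, bytes on macOS; not available on Windows.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    results.append((name, elapsed, peak))


# Placeholder stages; the real pipeline runs meshing, assembly, and the modal solve.
stages = [
    ("mesh", lambda: sum(range(1_000_000))),
    ("assembly", lambda: [i * i for i in range(500_000)]),
    ("modal solve", lambda: sorted(range(500_000), reverse=True)),
]

results = []
for name, fn in stages:
    run_stage(name, fn, results)

print("| Stage | Time (s) | Peak RSS |")
print("| --- | --- | --- |")
for name, elapsed, peak in results:
    print(f"| {name} | {elapsed:.3f} | {peak} |")
```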

The perf tracker — PERFORMANCE.md#

A markdown changelog at the repo root. Every meaningful perf change gets a new dated section.

What goes in:

  • Each landed optimisation — date, PR ref, the metric it moved (assembly time, peak RSS, modal solve, …), before / after numbers.

  • Cross-platform / cross-backend baselines — when a new solver backend lands, capture its numbers against the default so future comparisons start from a known floor.

  • Regression diagnoses — when a regression surfaces, the diagnosis lands in PERFORMANCE.md alongside the fix so the trail is auditable.

What does not go in:

  • Speculative numbers from a half-implemented PR. Wait until it merges, capture the final number, then write the row.

  • Numbers from a non-standard machine without explicit call-out. The flat-plate baseline assumes the maintainer’s reference machine; community contributions noting different hardware go in a separate “external timings” subsection.
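Putting those rules together, a landed-optimisation entry might look something like the following skeleton. The exact layout is set by the existing sections in PERFORMANCE.md; every value here is a placeholder, not a real measurement:

```markdown
## <YYYY-MM-DD> <one-line description of the change> (PR #<number>)

- Metric moved: <assembly time / peak RSS / modal solve / ...>
- Before: <value>  After: <value>  (reference machine)
- Snapshot: perf/snapshots/<short-name>_<YYYY-MM-DD>.md (if one was taken)
```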

How to capture a snapshot#

Snapshots are dated markdown files under perf/snapshots/ that pin a moment-in-time set of numbers. They’re cited from PERFORMANCE.md rows that need a richer breakdown than fits in a one-line summary.

Naming: perf/snapshots/<short-name>_<YYYY-MM-DD>.md (the short name describes the change — mem_after_triu_k, solvers_baseline, etc.).

When to take one:

  • The change touches the assembly inner loop or any path that runs once per element. Even a 1 % shift compounds across millions of elements.

  • The change moves peak RSS by more than 5 %.

  • A new solver backend lands.

To take one:

```bash
# Micro / per-element kernel times
pytest tests/benchmarks --benchmark-only \
    --benchmark-save=<short-name> \
    -o python_files=bench_*.py

# End-to-end pipeline (timing + peak RSS)
python perf/bench_pipeline.py --output perf/snapshots/<short-name>_$(date +%F).md
```

The bench_pipeline output is a markdown table; drop it into a file under perf/snapshots/, add a one-paragraph summary at the top citing the change that moved the numbers, then link from a row in PERFORMANCE.md.
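If the kernel numbers need to be lined up against an earlier run saved with --benchmark-save, pytest-benchmark can do the comparison itself (saved runs live under pytest-benchmark's default .benchmarks/ storage; the run ID below is whatever that earlier save produced):

```bash
# List previously saved runs, then benchmark again and diff against one of them
pytest-benchmark list
pytest tests/benchmarks --benchmark-only \
    --benchmark-compare=<saved-run-id> \
    -o python_files=bench_*.py
```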

The trend tracker#

perf/trend/ holds longer-running series — one file per metric, appended to over many releases. Use it when:

  • The metric you’re tracking has more than two data points and belongs on a chart.

  • The series spans multiple PRs (e.g. assembly time over a refactor that lands in three steps).

The format is one row per measurement with date, commit, value, and any notes. perf/trend/README.md codifies the conventions.
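perf/trend/README.md is the source of truth for the exact column set; as a rough illustration of the one-row-per-measurement shape described above (all values are placeholders):

```markdown
| Date | Commit | Value | Notes |
| --- | --- | --- | --- |
| <YYYY-MM-DD> | <short SHA> | <value + unit> | <PR ref, what changed> |
```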

Profiling#

Three tools cover what the team profiles:

  • cProfile / pyinstrument — Python-side bottlenecks, per-function timing. Wrap the smallest reproducer that shows the slowdown; don’t profile the whole test suite.

  • py-spy — sampling profiler for processes you don’t want to restart. Useful for diagnosing a stuck solve_modal that’s been running for hours.

  • memray — peak-RSS and allocation tracking. The pipeline harness already wraps this when invoked with --memory.

Don’t commit profile traces. Save them under /tmp/ or your scratch dir; if the diagnosis is interesting, distil to a written summary in the PR description and a PERFORMANCE.md row.
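For reference, typical invocations look roughly like this; reproducer.py and the PID are stand-ins for your own smallest reproducer and the process you are attaching to:

```bash
# Python-side hotspots on a small reproducer
python -m cProfile -o /tmp/solve.prof reproducer.py
pyinstrument reproducer.py

# Attach to an already-running, stuck process without restarting it
py-spy dump --pid <PID>
py-spy record -o /tmp/solve_flame.svg --pid <PID>

# Allocation tracking / peak memory
memray run -o /tmp/solve.bin reproducer.py
memray flamegraph /tmp/solve.bin
```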

The release-readiness regression suite#

The release-readiness GitHub Action (see .github/workflows/release-readiness.yml) runs on every push to main and emits a diff against the previous tagged release’s pipeline-bench numbers. Regressions ≥ 10 % block the docs deploy until investigated.

When a regression alert fires:

  1. Pull the most recent snapshot under perf/snapshots/ for the regressed metric.

  2. Bisect with git bisect against the perf/bench_pipeline.py target (see the sketch after this list).

  3. The diagnosis lands in PERFORMANCE.md with the fix.
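Step 2 can be driven by git bisect run. A rough sketch, assuming a small hypothetical wrapper (check_regression.sh) that reinstalls, runs perf/bench_pipeline.py, and exits non-zero when the regressed metric crosses your chosen threshold (bench_pipeline.py itself reports numbers rather than pass/fail):

```bash
git bisect start
git bisect bad HEAD
git bisect good <last-good-tag>
# check_regression.sh: hypothetical wrapper that reinstalls (uv pip install -e .),
# runs perf/bench_pipeline.py, and exits 1 if the metric is over the threshold.
git bisect run ./check_regression.sh
git bisect reset
```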

Common pitfalls#

  • Testing on a debug build. Always use the optimised install (uv pip install -e .). A debug build’s numbers bear no relation to production.

  • Letting the scheduler interfere. Before benchmarking, pin the process: taskset -c 0 ... on Linux (full example after this list). Background noise easily moves a 1-second benchmark by 10 %.

  • Using a stale virtualenv after dependency changes. Reinstall (uv pip install -e .) after touching pyproject.toml so the benchmark sees the same wheel as CI does.

  • Capturing a single run. The micro harness runs ≥ 5 rounds because single-run noise is real. Don’t paste a one-shot number into PERFORMANCE.md.

  • Not citing the change that moved the numbers. Every perf row references the PR / commit that caused the move. A row without provenance is folklore.
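As a concrete example of the pinning advice above, the pipeline capture from the snapshot section can be run pinned to a single core on Linux (core 0 is an arbitrary choice):

```bash
taskset -c 0 python perf/bench_pipeline.py \
    --output perf/snapshots/<short-name>_$(date +%F).md
```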

Where things live#

| Concern | Path |
| --- | --- |
| Running tracker | PERFORMANCE.md (repo root) |
| Micro / assembly benchmarks | tests/benchmarks/ (pytest-benchmark targets) |
| End-to-end pipeline harness | perf/bench_pipeline.py |
| Snapshots | perf/snapshots/<short>_<YYYY-MM-DD>.md |
| Trend series | perf/trend/<metric>.md |
| Latest run-of-record | perf/latest_<area>.md |
| Release-readiness CI | .github/workflows/release-readiness.yml |