Performance — profiling, snapshots, the perf tracker#

How performance work flows through the repo: where to profile, how to capture a snapshot, when to commit one to the perf tracker, and how to read the running history.

The repo treats performance as a first-class artefact — every optimisation that lands gets a dated row in PERFORMANCE.md so the history stays visible. Old rows are not collapsed; the point is to see numbers move over time.

Two harnesses#

Performance numbers come from one of two harnesses, both checked into the repo:

| Harness | Path | Drives |
| --- | --- | --- |
| Micro / assembly | tests/benchmarks/ | pytest --benchmark-only over the standard 2 × 40 × 40 SOLID185 flat plate (5 043 nodes, 3 200 hex cells, 15 129 DOF — see tests/integration/_flat_plate.py). Per-element kernels run on a single unit cube / unit tet. |
| End-to-end pipeline | perf/bench_pipeline.py | Drives a parameterised SOLID185 flat plate (clamped at X = 0, 10 modes) through the full Model.solve_modal pipeline with per-stage timing and peak-RSS capture. |
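For orientation, a micro benchmark in tests/benchmarks/ follows the usual pytest-benchmark shape and lives in a bench_*.py file so the -o python_files override picks it up. A minimal sketch — the kernel and fixture names here are hypothetical stand-ins, not the repo's actual API:

```python
import numpy as np
import pytest


def element_stiffness(coords):
    """Stand-in kernel; the real SOLID185 element routine lives in the package."""
    return coords.T @ coords  # placeholder arithmetic, not the actual formulation


@pytest.fixture
def unit_cube():
    # Eight corner nodes of a unit hex, mirroring the single-element setup above.
    return np.array(
        [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
    )


def test_element_stiffness_bench(benchmark, unit_cube):
    # pytest-benchmark calls the kernel repeatedly over several rounds and reports the median.
    benchmark(element_stiffness, unit_cube)
```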

The micro harness reports the median wall time across rounds (≥ 5 rounds via pytest-benchmark), so the noise floor is bounded. The pipeline harness reports a per-stage breakdown, so a slowdown landing in one stage is visible without re-running the whole suite.
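The per-stage breakdown boils down to a simple pattern: time each stage and note the process peak RSS as it goes. A minimal sketch of that idea, with placeholder callables standing in for the real mesh / assembly / solve stages (the actual harness is perf/bench_pipeline.py, which wraps memray when invoked with --memory):

```python
import resource
import time


def run_stage(name, fn, results):
    """Time one stage and note the process peak RSS once it has finished."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    # ru_maxrss is kilobytes on Linux, bytes on macOS; not available on Windows.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    results.append((name, elapsed, peak))


# Placeholder stages; the real pipeline runs meshing, assembly, and the modal solve.
stages = [
    ("mesh", lambda: sum(range(1_000_000))),
    ("assembly", lambda: [i * i for i in range(500_000)]),
    ("modal solve", lambda: sorted(range(500_000), reverse=True)),
]

results = []
for name, fn in stages:
    run_stage(name, fn, results)

print("| Stage | Time (s) | Peak RSS |")
print("| --- | --- | --- |")
for name, elapsed, peak in results:
    print(f"| {name} | {elapsed:.3f} | {peak} |")
```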

The perf tracker — PERFORMANCE.md#

A markdown changelog at the repo root. Every meaningful perf change gets a new dated section.

What goes in:

  • Each landed optimisation — date, PR ref, the metric it moved (assembly time, peak RSS, modal solve, …), before / after numbers.

  • Cross-platform / cross-backend baselines — when a new solver backend lands, capture its numbers against the default so future comparisons start from a known floor.

  • Regression diagnoses — when a regression surfaces, the diagnosis lands in PERFORMANCE.md alongside the fix so the trail is auditable.

What does not go in:

  • Speculative numbers from a half-implemented PR. Wait until it merges, capture the final number, then write the row.

  • Numbers from a non-standard machine without explicit call-out. The flat-plate baseline assumes the maintainer’s reference machine; community contributions noting different hardware go in a separate “external timings” subsection.
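Putting those rules together, a landed-optimisation entry might look something like the following skeleton. The exact layout is set by the existing sections in PERFORMANCE.md; every value here is a placeholder, not a real measurement:

```markdown
## <YYYY-MM-DD> <one-line description of the change> (PR #<number>)

- Metric moved: <assembly time / peak RSS / modal solve / ...>
- Before: <value>  After: <value>  (reference machine)
- Snapshot: perf/snapshots/<short-name>_<YYYY-MM-DD>.md (if one was taken)
```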

How to capture a snapshot#

Snapshots are dated markdown files under perf/snapshots/ that pin a moment-in-time set of numbers. They’re cited from PERFORMANCE.md rows that need a richer breakdown than fits in a one-line summary.

Naming: perf/snapshots/<short-name>_<YYYY-MM-DD>.md (the short name describes the change — mem_after_triu_k, solvers_baseline, etc.).

When to take one:

  • The change touches the assembly inner loop or any path that runs once per element. Even a 1 % shift compounds across millions of elements.

  • The change moves peak RSS by more than 5 %.

  • A new solver backend lands.

To take one:

```bash
# Micro / per-element kernel times
pytest tests/benchmarks --benchmark-only \
    --benchmark-save=<short-name> \
    -o python_files=bench_*.py

# End-to-end pipeline (timing + peak RSS)
python perf/bench_pipeline.py --output perf/snapshots/<short-name>_$(date +%F).md
```

The bench_pipeline output is a markdown table; drop it into a file under perf/snapshots/, add a one-paragraph summary at the top citing the change that moved the numbers, then link from a row in PERFORMANCE.md.
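If the kernel numbers need to be lined up against an earlier run saved with --benchmark-save, pytest-benchmark can do the comparison itself (saved runs live under pytest-benchmark's default .benchmarks/ storage; the run ID below is whatever that earlier save produced):

```bash
# List previously saved runs, then benchmark again and diff against one of them
pytest-benchmark list
pytest tests/benchmarks --benchmark-only \
    --benchmark-compare=<saved-run-id> \
    -o python_files=bench_*.py
```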

The trend tracker#

perf/trend/ holds longer-running series — one file per metric, appended to over many releases. Use it when:

  • The metric you’re tracking has more than two data points and belongs on a chart.

  • The series spans multiple PRs (e.g. assembly time over a refactor that lands in three steps).

The format is one row per measurement with date, commit, value, and any notes. perf/trend/README.md codifies the conventions.
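perf/trend/README.md is the source of truth for the exact column set; as a rough illustration of the one-row-per-measurement shape described above (all values are placeholders):

```markdown
| Date | Commit | Value | Notes |
| --- | --- | --- | --- |
| <YYYY-MM-DD> | <short SHA> | <value + unit> | <PR ref, what changed> |
```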

Profiling#

Three tools cover what the team profiles:

  • cProfile / pyinstrument — Python-side bottlenecks, per-function timing. Wrap the smallest reproducer that shows the slowdown; don’t profile the whole test suite.

  • py-spy — sampling profiler for processes you don’t want to restart. Useful for diagnosing a stuck solve_modal that’s been running for hours.

  • memray — peak-RSS and allocation tracking. The pipeline harness already wraps this when invoked with --memory.

Don’t commit profile traces. Save them under /tmp/ or your scratch dir; if the diagnosis is interesting, distil to a written summary in the PR description and a PERFORMANCE.md row.
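For reference, typical invocations look roughly like this; reproducer.py and the PID are stand-ins for your own smallest reproducer and the process you are attaching to:

```bash
# Python-side hotspots on a small reproducer
python -m cProfile -o /tmp/solve.prof reproducer.py
pyinstrument reproducer.py

# Attach to an already-running, stuck process without restarting it
py-spy dump --pid <PID>
py-spy record -o /tmp/solve_flame.svg --pid <PID>

# Allocation tracking / peak memory
memray run -o /tmp/solve.bin reproducer.py
memray flamegraph /tmp/solve.bin
```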

The release-readiness regression suite#

The release-readiness GitHub Action (see .github/workflows/release-readiness.yml) runs on every push to main and emits a diff against the previous tagged release’s pipeline-bench numbers. Regressions ≥ 10 % block the docs deploy until investigated.

When a regression alert fires:

  1. Pull the most recent snapshot under perf/snapshots/ for the regressed metric.

  2. Bisect with git bisect against the perf/bench_pipeline.py target (see the sketch after this list).

  3. The diagnosis lands in PERFORMANCE.md with the fix.
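Step 2 can be driven by git bisect run. A rough sketch, assuming a small hypothetical wrapper (check_regression.sh) that reinstalls, runs perf/bench_pipeline.py, and exits non-zero when the regressed metric crosses your chosen threshold (bench_pipeline.py itself reports numbers rather than pass/fail):

```bash
git bisect start
git bisect bad HEAD
git bisect good <last-good-tag>
# check_regression.sh: hypothetical wrapper that reinstalls (uv pip install -e .),
# runs perf/bench_pipeline.py, and exits 1 if the metric is over the threshold.
git bisect run ./check_regression.sh
git bisect reset
```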

Common pitfalls#

  • Testing on a debug build. Always use the optimised install (uv pip install -e .). A debug build’s numbers bear no relation to production.

  • Letting the scheduler interfere. Before benchmarking, pin the process: taskset -c 0 ... on Linux (full example after this list). Background noise easily moves a 1-second benchmark by 10 %.

  • Using a stale virtualenv after dependency changes. Reinstall (uv pip install -e .) after touching pyproject.toml so the benchmark sees the same wheel as CI does.

  • Capturing a single run. The micro harness runs ≥ 5 rounds because single-run noise is real. Don’t paste a one-shot number into PERFORMANCE.md.

  • Not citing the change that moved the numbers. Every perf row references the PR / commit that caused the move. A row without provenance is folklore.
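As a concrete example of the pinning advice above, the pipeline capture from the snapshot section can be run pinned to a single core on Linux (core 0 is an arbitrary choice):

```bash
taskset -c 0 python perf/bench_pipeline.py \
    --output perf/snapshots/<short-name>_$(date +%F).md
```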

Where things live#

| Concern | Path |
| --- | --- |
| Running tracker | PERFORMANCE.md (repo root) |
| Micro / assembly benchmarks | tests/benchmarks/ (pytest-benchmark targets) |
| End-to-end pipeline harness | perf/bench_pipeline.py |
| Snapshots | perf/snapshots/<short>_<YYYY-MM-DD>.md |
| Trend series | perf/trend/<metric>.md |
| Latest run-of-record | perf/latest_<area>.md |
| Release-readiness CI | .github/workflows/release-readiness.yml |