# Performance — profiling, snapshots, the perf tracker
How performance work flows through the repo: where to profile, how to capture a snapshot, when to commit one to the perf tracker, and how to read the running history.
The repo treats performance as a first-class artefact — every optimisation that lands gets a dated row in `PERFORMANCE.md` so the history stays visible. Old rows are not collapsed; the point is to see numbers move over time.
## Two harnesses
Performance numbers come from one of two harnesses, both checked into the repo:
| Harness | Path | Drives |
|---|---|---|
| Micro / assembly | `tests/benchmarks/` (`bench_*.py`) | Per-element kernel timings, median over ≥ 5 rounds via pytest-benchmark |
| End-to-end pipeline | `perf/bench_pipeline.py` | A parameterised SOLID185 flat plate (clamped at …) with a per-stage breakdown |
The micro harness reports the median wall time per round (≥ 5 rounds via `pytest-benchmark`) so the noise floor is bounded. The pipeline harness reports a per-stage breakdown, so a slowdown landing in one stage is visible without re-running the whole suite.
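To diff two saved micro runs, pytest-benchmark ships a small CLI; the run names below are whatever `--benchmark-save` recorded:

```bash
# Saved runs land under .benchmarks/; list them, then diff two.
# <before> and <after> are placeholder run names.
pytest-benchmark list
pytest-benchmark compare <before> <after>
```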
## The perf tracker — PERFORMANCE.md
A markdown changelog at the repo root. Every meaningful perf change gets a new dated section.
What goes in:
- Each landed optimisation — date, PR ref, the metric it moved (assembly time, peak RSS, modal solve, …), before / after numbers. A sketch of such an entry follows this list.
- Cross-platform / cross-backend baselines — when a new solver backend lands, capture its numbers against the default so future comparisons start from a known floor.
- Regression diagnoses — when a regression surfaces, the diagnosis lands in `PERFORMANCE.md` alongside the fix so the trail is auditable.
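A minimal sketch of such an entry (the date, PR ref, and every figure below are invented; where new sections go in the real file follows its existing layout):

```bash
# Hypothetical: recording a landed optimisation after it merges.
# All numbers and the PR ref are placeholders, not real measurements.
cat >> PERFORMANCE.md <<'EOF'

## 2025-01-15: assembly, store K upper-triangular (PR #NNN)

- Metric moved: assembly wall time on the flat-plate pipeline
- Before: 41.2 s / After: 33.5 s (reference machine)
- Peak RSS: 9.8 GB -> 7.1 GB
- Breakdown: perf/snapshots/mem_after_triu_k_2025-01-15.md
EOF
```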
What does not go in:
- Speculative numbers from a half-implemented PR. Wait until it merges, capture the final number, then write the row.
- Numbers from a non-standard machine without explicit call-out. The flat-plate baseline assumes the maintainer’s reference machine; community contributions noting different hardware go in a separate “external timings” subsection.
## How to capture a snapshot
Snapshots are dated markdown files under `perf/snapshots/` that pin a moment-in-time set of numbers. They’re cited from `PERFORMANCE.md` rows that need a richer breakdown than fits in the one-line summary.
Naming: `perf/snapshots/<short-name>_<YYYY-MM-DD>.md` (the short name describes the change — `mem_after_triu_k`, `solvers_baseline`, etc.).
When to take one:
- The change touches the assembly inner loop or any path that runs once per element. Even a 1 % shift compounds across millions of elements.
- The change moves peak RSS by more than 5 %.
- A new solver backend lands.
To take one:
```bash
# Micro / per-element kernel times
pytest tests/benchmarks --benchmark-only \
    --benchmark-save=<short-name> \
    -o python_files=bench_*.py

# End-to-end pipeline (timing + peak RSS)
python perf/bench_pipeline.py --output perf/snapshots/<short-name>_$(date +%F).md
```
The `bench_pipeline` output is a markdown table; drop it into a file under `perf/snapshots/`, add a one-paragraph summary at the top citing the change that moved the numbers, then link to it from a row in `PERFORMANCE.md`.
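For shape only, the head of a finished snapshot might read as follows; the file name, summary wording, and column layout are assumptions, not the committed format:

```console
$ head perf/snapshots/mem_after_triu_k_2025-01-15.md
Summary: storing K upper-triangular (PR #NNN) cut peak RSS on the
flat-plate pipeline; the table below is the bench_pipeline output.

| stage | wall time (s) | peak RSS (GB) |
|---|---|---|
```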
## The trend tracker
`perf/trend/` holds longer-running series — one file per metric, appended to over many releases. Use when:

- The metric you’re tracking has more than two data points and belongs on a chart.
- The series spans multiple PRs (e.g. assembly time over a refactor that lands in three steps).
The format is one row per measurement with date, commit, value, and any notes; `perf/trend/README.md` codifies the conventions.
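Purely as a sketch (invented file name, commits, and values; the committed conventions in `perf/trend/README.md` win), one series might look like:

```console
$ head -n 4 perf/trend/assembly_time.md
| date       | commit  | value (s) | notes                    |
|------------|---------|-----------|--------------------------|
| 2025-01-15 | abc1234 | 33.5      | triu K storage (PR #NNN) |
| 2025-02-02 | def5678 | 33.1      | post-release re-measure  |
```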
## Profiling
Three tools cover what the team profiles:
- `cProfile` / `pyinstrument` — Python-side bottlenecks, per-function timing. Wrap the smallest reproducer that shows the slowdown; don’t profile the whole test suite.
- `py-spy` — sampling profiler for processes you don’t want to restart. Useful for diagnosing a stuck `solve_modal` that’s been running for hours.
- `memray` — peak-RSS and allocation tracking. The pipeline harness already wraps this when invoked with `--memory`.
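Typical invocations, as a sketch: `repro_assembly.py` and `<pid>` are stand-ins for your own reproducer and process, and the output paths follow the scratch-dir rule below.

```bash
# Per-function timing on the smallest reproducer (pyinstrument)
pyinstrument -r html -o /tmp/assembly.html repro_assembly.py

# Peek at a running process without restarting it (py-spy)
py-spy dump --pid <pid>    # one-shot stack dump
py-spy top  --pid <pid>    # live sampling view

# Allocation tracking with memray, then render a flamegraph
memray run -o /tmp/repro.bin repro_assembly.py
memray flamegraph /tmp/repro.bin
```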
Don’t commit profile traces. Save them under `/tmp/` or your scratch dir; if the diagnosis is interesting, distil it to a written summary in the PR description and a `PERFORMANCE.md` row.
## The release-readiness regression suite
The release-readiness GitHub Action (see `.github/workflows/release-readiness.yml`) runs on every push to `main` and emits a diff against the previous tagged release’s pipeline-bench numbers. Regressions ≥ 10 % block the docs deploy until investigated.
When a regression alert fires:
1. Pull the most recent `perf/snapshots/` file for the regressed metric.
2. Bisect with `git bisect` against the `perf/bench_pipeline.py` target (a sketch follows below).
3. Land the diagnosis in `PERFORMANCE.md` alongside the fix.
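A minimal bisect sketch, assuming a hypothetical `check_perf.sh` wrapper that runs `perf/bench_pipeline.py` and exits non-zero when the regressed metric crosses your threshold (the tag name is a placeholder):

```bash
git bisect start
git bisect bad HEAD
git bisect good <last-good-tag>
# git bisect run treats exit 0 as good, non-zero (except 125) as bad;
# check_perf.sh is a hypothetical wrapper, not a committed script
git bisect run ./check_perf.sh
git bisect reset
```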
## Common pitfalls
- Testing on a debug build. Always use the optimised install (`uv pip install -e .`). A debug build’s numbers bear no relation to production.
- Letting the scheduler interfere. Before benchmarking, pin the process: `taskset -c 0 ...` on Linux. Background noise easily moves a 1-second benchmark by 10 %. A pinned invocation is sketched after this list.
- Using a stale virtualenv after dependency changes. Reinstall (`uv pip install -e .`) after touching `pyproject.toml` so the benchmark sees the same wheel as CI does.
- Capturing a single run. The micro harness runs ≥ 5 rounds because single-run noise is real. Don’t paste a one-shot number into `PERFORMANCE.md`.
- Not citing the change that moved the numbers. Every perf row references the PR / commit that caused the move. A row without provenance is folklore.
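Putting the pinning advice together with the micro-harness command from earlier (Linux only; core 0 is an arbitrary choice):

```bash
# Pin the benchmark process to one core so the scheduler
# can't migrate it mid-run
taskset -c 0 pytest tests/benchmarks --benchmark-only \
    --benchmark-save=<short-name> \
    -o python_files=bench_*.py
```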
## Where things live
| Concern | Path |
|---|---|
| Running tracker | `PERFORMANCE.md` |
| Micro / assembly benchmarks | `tests/benchmarks/` |
| End-to-end pipeline harness | `perf/bench_pipeline.py` |
| Snapshots | `perf/snapshots/` |
| Trend series | `perf/trend/` |
| Latest run-of-record | |
| Release-readiness CI | `.github/workflows/release-readiness.yml` |