Performance — profiling, snapshots, the perf tracker
=======================================================

How performance work flows through the repo: where to profile, how to capture
a snapshot, when to commit one to the perf tracker, and how to read the
running history.

The repo treats performance as a **first-class artefact** — every optimisation
that lands gets a dated row in ``PERFORMANCE.md`` so the history stays
visible. Old rows are not collapsed; the point is to see numbers move over
time.

.. contents:: Page contents
   :local:
   :depth: 2

Two harnesses
-------------

Performance numbers come from one of two harnesses, both checked into the
repo:

.. list-table::
   :header-rows: 1
   :widths: 22 30 48

   * - Harness
     - Path
     - Drives
   * - **Micro / assembly**
     - ``tests/benchmarks/``
     - ``pytest --benchmark-only`` over the standard 2 × 40 × 40 SOLID185
       flat plate (5 043 nodes, 3 200 hex cells, 15 129 DOF — see
       ``tests/integration/_flat_plate.py``). Per-element kernels run on a
       single unit cube / unit tet.
   * - **End-to-end pipeline**
     - ``perf/bench_pipeline.py``
     - A parameterised SOLID185 flat plate (clamped at ``X = 0``, 10 modes)
       driven through the full ``Model.solve_modal`` pipeline with per-stage
       timing and peak-RSS capture.

The micro harness reports median wall time per round (≥ 5 rounds via
pytest-benchmark) so the noise floor is bounded. The pipeline harness reports
a per-stage breakdown so a slowdown landing in one stage is visible without
re-running the whole suite.

The perf tracker — ``PERFORMANCE.md``
----------------------------------------

A markdown changelog at the repo root. Every meaningful perf change gets a
new dated section.

What goes in:

* **Each landed optimisation** — date, PR ref, the metric it moved (assembly
  time, peak RSS, modal solve, …), before / after numbers.
* **Cross-platform / cross-backend baselines** — when a new solver backend
  lands, capture its numbers against the default so future comparisons start
  from a known floor.
* **Regression diagnoses** — when a regression surfaces, the diagnosis lands
  in ``PERFORMANCE.md`` alongside the fix so the trail is auditable.

What does **not** go in:

* Speculative numbers from a half-implemented PR. Wait until it merges,
  capture the final number, then write the row.
* Numbers from a non-standard machine without an explicit call-out. The
  flat-plate baseline assumes the maintainer's reference machine; community
  contributions noting different hardware go in a separate "external timings"
  subsection.

How to capture a snapshot
-------------------------

Snapshots are dated markdown files under ``perf/snapshots/`` that pin a
moment-in-time set of numbers. They're cited from ``PERFORMANCE.md`` rows
that need a richer breakdown than a one-line row can carry.

Naming: ``perf/snapshots/<short-name>_<date>.md`` (the short name describes
the change — ``mem_after_triu_k``, ``solvers_baseline``, etc.).

When to take one:

* The change touches the **assembly inner loop** or any path that runs once
  per element. Even a 1 % shift compounds across millions of elements.
* The change moves **peak RSS** by more than 5 %.
* A **new solver backend** lands.

To take one:

.. code-block:: bash

   # Micro / per-element kernel times
   pytest tests/benchmarks --benchmark-only \
       --benchmark-save=<name> \
       -o python_files="bench_*.py"

   # End-to-end pipeline (timing + peak RSS)
   python perf/bench_pipeline.py --output perf/snapshots/<short-name>_$(date +%F).md

The ``bench_pipeline`` output is a markdown table; drop it into a file under
``perf/snapshots/``, add a one-paragraph summary at the top citing the change
that moved the numbers, then link from a row in ``PERFORMANCE.md``.

The trend tracker
-----------------

``perf/trend/`` holds longer-running series — one file per metric, appended
to over many releases.

Use when:

* The metric you're tracking has more than two data points and belongs on a
  chart.
* The series spans multiple PRs (e.g. assembly time over a refactor that
  lands in three steps).

The format is one row per measurement with date, commit, value, and any
notes. ``perf/trend/README.md`` codifies the conventions.

Profiling
---------

Three tools cover what the team profiles:

* ``cProfile`` / ``pyinstrument`` — Python-side bottlenecks, per-function
  timing. Wrap the smallest reproducer that shows the slowdown; don't profile
  the whole test suite.
* ``py-spy`` — sampling profiler for processes you don't want to restart.
  Useful for diagnosing a stuck ``solve_modal`` that's been running for
  hours.
* ``memray`` — peak-RSS and allocation tracking. The pipeline harness already
  wraps this when invoked with ``--memory``.

Don't commit profile traces. Save them under ``/tmp/`` or your scratch dir;
if the diagnosis is interesting, distil it to a written summary in the PR
description and a ``PERFORMANCE.md`` row.
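For orientation, typical invocations look like the following sketch. None of
it is repo tooling: ``<pid>`` stands for the stuck process and
``repro_assembly.py`` for your smallest reproducer.

.. code-block:: bash

   # Per-function timing on the smallest reproducer (pyinstrument)
   pyinstrument -r html -o /tmp/assembly_profile.html repro_assembly.py

   # Sample a running process without restarting it (py-spy)
   py-spy dump --pid <pid>                            # one-shot stack of every thread
   py-spy record -o /tmp/solve_modal.svg --pid <pid>  # sample into a flamegraph until Ctrl-C

   # Allocation tracking outside the harness's --memory mode (memray)
   memray run -o /tmp/repro.bin repro_assembly.py
   memray flamegraph /tmp/repro.bin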
The release-readiness regression suite
---------------------------------------

The ``release-readiness`` GitHub Action (see
``.github/workflows/release-readiness.yml``) runs on every push to ``main``
and emits a diff against the previous tagged release's pipeline-bench
numbers. Regressions ≥ 10 % block the docs deploy until investigated.

When a regression alert fires:

1. Pull the most recent snapshot under ``perf/snapshots/`` for the regressed
   metric.
2. Bisect with ``git bisect``, using ``perf/bench_pipeline.py`` as the test
   target (see the sketch at the end of this page).
3. Land the diagnosis in ``PERFORMANCE.md`` alongside the fix.

Common pitfalls
---------------

* **Testing on a debug build.** Always use the optimised install
  (``uv pip install -e .``). A debug build's numbers bear no relation to
  production.
* **Letting the scheduler interfere.** Before benchmarking, pin the process:
  ``taskset -c 0 ...`` on Linux. Background noise easily moves a 1-second
  benchmark by 10 %.
* **Using a stale virtualenv after dependency changes.** Reinstall
  (``uv pip install -e .``) after touching ``pyproject.toml`` so the
  benchmark sees the same wheel as CI does.
* **Capturing a single run.** The micro harness runs ≥ 5 rounds because
  single-run noise is real. Don't paste a one-shot number into
  ``PERFORMANCE.md``.
* **Not citing the change that moved the numbers.** Every perf row references
  the PR / commit that caused the move. A row without provenance is folklore.

Where things live
-----------------

.. list-table::
   :header-rows: 1
   :widths: 32 68

   * - Concern
     - Path
   * - Running tracker
     - ``PERFORMANCE.md`` (repo root)
   * - Micro / assembly benchmarks
     - ``tests/benchmarks/`` (pytest-benchmark targets)
   * - End-to-end pipeline harness
     - ``perf/bench_pipeline.py``
   * - Snapshots
     - ``perf/snapshots/<short-name>_<date>.md``
   * - Trend series
     - ``perf/trend/<metric>.md``
   * - Latest run-of-record
     - ``perf/latest_<name>.md``
   * - Release-readiness CI
     - ``.github/workflows/release-readiness.yml``
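To make step 2 of the regression workflow concrete, here is a minimal bisect
sketch. Nothing in it exists in the repo: the commit bounds are placeholders,
and ``check_threshold.py`` is a hypothetical helper that exits non-zero when
the regressed metric in the bench output crosses your threshold.

.. code-block:: bash

   # Walk the suspect range; `git bisect run` marks a commit bad whenever the
   # command exits non-zero. Reinstall per commit so each build sees its own
   # wheel (see "Common pitfalls" above).
   git bisect start <bad-commit> <good-commit>
   git bisect run sh -c '
       uv pip install -e . &&
       python perf/bench_pipeline.py --output /tmp/bisect.md &&
       python check_threshold.py /tmp/bisect.md
   '
   git bisect reset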