Estimating wall time + memory before you solve#

Model.estimate_solve() returns a predicted (wall_s, peak_rss_mb) pair for a modal or static solve without running it. It is useful when you want to:

  • Check a job fits in your SLURM time / memory budget before submitting.

  • Pick between in-core and OOC up front.

  • Compare how a problem scales with n_dof without paying for the actual solves.

How it works#

The estimator fits a simple log-log power law per (host_signature, linear_solver) bucket — on 3D SPD meshes under sparse-Cholesky ordering, factor fill and solve cost both scale as \(n^{4/3}\) (George 1973), so log(wall_s) is nearly linear in log(n_dof) for any fixed host + backend.
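Concretely, each bucket's fit reduces to ordinary least squares in log space. A minimal sketch of the idea (hypothetical helper on synthetic rows, not the library's actual internals):

```python
import numpy as np

def fit_power_law(n_dof, wall_s):
    """Fit wall_s ~= c * n_dof**alpha by least squares in log-log space."""
    alpha, log_c = np.polyfit(np.log(n_dof), np.log(wall_s), 1)
    return np.exp(log_c), alpha

# Synthetic rows following an exact n^(4/3) law
n = np.array([1e4, 3e4, 1e5, 3e5])
t = 2e-7 * n ** (4 / 3)
c, alpha = fit_power_law(n, t)
print(round(alpha, 2))  # → 1.33
```

Because the relationship is linear in log space, two clean rows already pin down both coefficients, which is why the estimator accepts buckets with as few as two training rows.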

Training data comes from the TA-6 benchmark module (femorph_solver.benchmark): every femorph-benchmark-*.json file in the current working directory is loaded, split into training rows, and grouped by (host_signature, linear_solver). Buckets with ≥ 2 non-OOC rows get their own coefficient fit; everything else falls back to a shared cross-host fit. If no training data is available at all, the estimator uses a shape-of-universe prior tuned from the repo's own Intel Core i9-14900KF benchmark sweeps, with a p95 = 2 × p50 confidence band to signal that you are looking at an uncalibrated prediction.
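The bucket-then-fallback selection can be sketched as follows (hypothetical row schema and function name; the real module's internals may differ):

```python
from collections import defaultdict

def select_fit(rows, host_signature, linear_solver, min_rows=2):
    """Pick the most specific usable training set for one prediction.

    Each row is a dict with (hypothetical) keys: host_signature,
    linear_solver, ooc, n_dof, wall_s.
    """
    buckets = defaultdict(list)
    for r in rows:
        if not r.get("ooc", False):  # OOC rows are excluded from fitting
            buckets[(r["host_signature"], r["linear_solver"])].append(r)

    key = (host_signature, linear_solver)
    if len(buckets.get(key, ())) >= min_rows:
        return "per-host", buckets[key]   # dedicated coefficient fit
    pooled = [r for rs in buckets.values() for r in rs]
    if pooled:
        return "shared", pooled           # cross-host fallback fit
    return "prior", []                    # uncalibrated built-in prior

kind, _ = select_fit([], "hostA|Linux|x86_64", "mkl_direct")
print(kind)  # → prior
```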

Usage#

import pyvista as pv
import femorph_solver

# Build your model as usual (grid: any pyvista UnstructuredGrid)
grid = pv.examples.load_hexbeam()
m = femorph_solver.Model.from_grid(grid)
m.et(1, "SOLID185")
m.mp("EX", 1, 2.0e11); m.mp("PRXY", 1, 0.3); m.mp("DENS", 1, 7850)
# ... apply BCs

est = m.estimate_solve(n_modes=10)
print(est)
# Estimate(wall_s=p50 3.47 / p95 4.16,
#          peak_rss_mb=p50 892 / p95 1070,
#          bucket='Intel(R) Core(TM) i9-14900KF|94|Linux|x86_64|mkl_direct',
#          n_training=8)

if est.wall_s_p95 > 60:
    print("This is a long-running solve — consider OOC or a bigger host.")

Training on your own host#

Run the TA-6 benchmark once to seed training data:

python -m femorph_solver.benchmark --level basic --out ./

Subsequent estimate_solve calls then read the femorph-benchmark-basic-*.json files and fit per-host coefficients for the backends the basic level exercised (auto + arpack). Run the full level to cover more backends:

python -m femorph_solver.benchmark --level full --out ./
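To confirm the benchmark actually wrote training data where estimate_solve will look for it (the current working directory), a quick check:

```python
import glob

# estimate_solve loads every file matching this pattern in the CWD
files = sorted(glob.glob("femorph-benchmark-*.json"))
print(f"{len(files)} benchmark file(s) found")
for f in files:
    print(" ", f)
```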

Extending the feature vector#

HostSpec carries an extras dict — arbitrary key/value strings from the TA-6 benchmark’s host report. Future retraining passes can add features (memory bandwidth benchmark, BLAS thread count, disk type for OOC) by extending that dict and bumping the estimator’s feature-extraction step; historical JSON files stay loadable because the loader reads every field via .get() with defaults.
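The forward-compatible loading pattern can be sketched as follows; field names here are illustrative, not the actual HostSpec schema:

```python
def load_host_spec(raw: dict) -> dict:
    """Load a host record, tolerating fields added in later schema versions."""
    return {
        "host_signature": raw.get("host_signature", "unknown"),
        # extras may be absent in JSON written by older benchmark versions;
        # the default keeps historical files loadable without a migration
        "extras": raw.get("extras", {}),
    }

old_record = {"host_signature": "Intel i9|94|Linux|x86_64"}
print(load_host_spec(old_record)["extras"])  # → {}
```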

Limitations#

  • Per-host fits need data. On a first run with no training JSON, the estimator falls back to the repo’s prior, which was tuned on an i9-14900KF with Intel MKL. Expect wide p95 bands (2× p50) until you’ve run the benchmark at least once on your hardware.

  • Extrapolation beyond training range is unreliable. If your training rows top out at 200 k DOFs and you ask for 1 M, the power-law prediction might still be in the ballpark — but the confidence band doesn’t widen enough. Run the benchmark at a size close to your target problem for a tight estimate.

  • Only n_dof is currently used as a feature. Future revisions will add nnz / mesh aspect ratio / mode count as features; the schema is already append-only, so expanded training data flows through without a loader change.
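The extrapolation caveat can be illustrated numerically (synthetic data, made-up constants): a small error in the fitted exponent, harmless inside the training range, is amplified multiplicatively the further you extrapolate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = np.array([5e4, 1e5, 1.5e5, 2e5])    # training tops out at 200k DOFs
true_cost = lambda n: 3e-7 * n ** 1.38        # "true" scaling, unknown to the fit
t_train = true_cost(n_train) * np.exp(rng.normal(0, 0.05, n_train.size))

slope, intercept = np.polyfit(np.log(n_train), np.log(t_train), 1)
predict = lambda n: np.exp(intercept) * n ** slope

for n in (2e5, 1e6):  # in-range vs 5x beyond the training data
    print(f"n_dof={n:.0e}  predicted/true = {predict(n) / true_cost(n):.2f}")
```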

References#

  • George, A. “Nested dissection of a regular finite element mesh.” SIAM J. Numer. Anal. 10 (1973), pp. 345–363. Introduces the nested-dissection ordering; its extension to regular 3D grids yields the \(n^{4/3}\) factor-fill bound that underpins the log-log fit.