Estimating wall time + memory before you solve#
`Model.estimate_solve()` returns a predicted `(wall_s, peak_rss_mb)` for a modal or static solve without running it. Useful when you want to:

- Check that a job fits in your SLURM time / memory budget before submitting.
- Pick between in-core and OOC up front.
- Compare how a problem scales with `n_dof` without the measurement overhead.
How it works#
The estimator fits a simple log-log power law per `(host_signature, linear_solver)` bucket — on 3D SPD meshes under sparse-Cholesky ordering, factor fill and solve cost both scale as \(n^{4/3}\) (George 1973), so `log(wall_s)` is nearly linear in `log(n_dof)` for any fixed host + backend.
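The fit itself is ordinary least squares in log-log space. A minimal standalone sketch of the idea, with illustrative numbers rather than the library's internals:

```python
import numpy as np

# Hypothetical training rows for one (host_signature, linear_solver) bucket:
# measured wall time roughly follows wall_s ≈ c * n_dof**alpha.
n_dof = np.array([1e4, 3e4, 1e5, 3e5])
wall_s = np.array([0.12, 0.55, 2.8, 12.0])

# Fit log(wall_s) = alpha * log(n_dof) + log(c) by least squares.
alpha, log_c = np.polyfit(np.log(n_dof), np.log(wall_s), 1)

def predict_wall_s(n):
    """Predicted wall time (s) at n DOFs from the fitted power law."""
    return np.exp(log_c) * n ** alpha
```

On real data for a 3D SPD solve, `alpha` should land near the theoretical 4/3.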
Training data comes from the TA-6 benchmark module (`femorph_solver.benchmark`): every `femorph-benchmark-*.json` file in the current working directory is loaded, broken into training rows, and grouped by `(host_signature, linear_solver)`. Buckets with ≥ 2 non-OOC rows get their own coefficient fit; everything else falls back to a shared cross-host fit, and if no training data is available at all, the estimator uses a shape-of-universe prior tuned from the repo's own Intel Core i9-14900KF benchmark sweeps (with a p95 = 2 × p50 confidence band to signal you're looking at an uncalibrated prediction).
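The cascade above (per-bucket fit, then shared cross-host fit, then prior) can be sketched as follows; the helper names are hypothetical, not the library's:

```python
from collections import defaultdict

MIN_ROWS = 2  # per-bucket threshold from the docs above

def pick_fit(rows, host_signature, linear_solver, fit_fn, prior):
    """Choose a per-bucket fit, a shared cross-host fit, or the built-in prior."""
    buckets = defaultdict(list)
    for r in rows:
        if not r.get("ooc", False):           # OOC rows are excluded from fits
            buckets[(r["host_signature"], r["linear_solver"])].append(r)

    mine = buckets.get((host_signature, linear_solver), [])
    if len(mine) >= MIN_ROWS:
        return fit_fn(mine), "per-bucket"

    pooled = [r for rs in buckets.values() for r in rs]
    if pooled:
        return fit_fn(pooled), "cross-host"

    return prior, "prior"                     # uncalibrated: p95 = 2 x p50
```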
Usage#
```python
import numpy as np, pyvista as pv
import femorph_solver

# Build your model as usual...
m = femorph_solver.Model.from_grid(grid)
m.et(1, "SOLID185")
m.mp("EX", 1, 2.0e11); m.mp("PRXY", 1, 0.3); m.mp("DENS", 1, 7850)
# ... apply BCs

est = m.estimate_solve(n_modes=10)
print(est)
# Estimate(wall_s=p50 3.47 / p95 4.16,
#          peak_rss_mb=p50 892 / p95 1070,
#          bucket='Intel(R) Core(TM) i9-14900KF|94|Linux|x86_64|mkl_direct',
#          n_training=8)

if est.wall_s_p95 > 60:
    print("This is a long-running solve — consider OOC or a bigger host.")
```
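The same pattern works for the in-core vs OOC decision: compare the memory estimate against what the host has free. A Linux-only sketch, assuming a `peak_rss_mb_p95` attribute that mirrors `wall_s_p95` above:

```python
import os

def fits_in_core(est, headroom=0.8):
    """Return True if the p95 RSS estimate fits in available memory.

    Assumes `est.peak_rss_mb_p95` mirrors the `est.wall_s_p95` attribute
    shown above; `headroom` leaves a safety margin for the rest of the process.
    Linux-only: uses sysconf to read available physical pages.
    """
    avail_mb = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**20
    return est.peak_rss_mb_p95 <= headroom * avail_mb
```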
Training on your own host#
Run the TA-6 benchmark once to seed training data:
```shell
python -m femorph_solver.benchmark --level basic --out ./
```
Then subsequent `estimate_solve` calls read the `femorph-benchmark-basic-*.json` file and fit per-host coefficients for the backends the basic level exercised (`auto` + `arpack`). Run the `full` level to cover more backends:
```shell
python -m femorph_solver.benchmark --level full --out ./
```
Extending the feature vector#
`HostSpec` carries an `extras` dict — arbitrary key/value strings from the TA-6 benchmark's host report. Future retraining passes can add features (memory bandwidth benchmark, BLAS thread count, disk type for OOC) by extending that dict and bumping the estimator's feature-extraction step; historical JSON files stay loadable because the loader reads every field via `.get()` with defaults.
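The append-only loading convention can be sketched as follows; the exact field set and defaults are hypothetical beyond the names the docs mention:

```python
def load_row(raw: dict) -> dict:
    """Read one benchmark JSON row; every field via .get() so old files load."""
    return {
        "host_signature": raw.get("host_signature", "unknown"),
        "linear_solver": raw.get("linear_solver", "auto"),
        "n_dof": raw.get("n_dof", 0),
        "wall_s": raw.get("wall_s"),
        "extras": raw.get("extras", {}),  # future features land here
    }
```

Because unknown keys are ignored and missing keys get defaults, a new benchmark field never breaks loading of files written before it existed.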
Limitations#
- **Per-host fits need data.** On a first run with no training JSON, the estimator falls back to the repo's prior, which was tuned on a 14900KF + Intel MKL. Expect wide p95 bands (2 × p50) until you've run the benchmark at least once on your hardware.
- **Extrapolation beyond the training range is unreliable.** If your training rows top out at 200 k DOFs and you ask for 1 M, the power-law prediction might still be in the ballpark — but the confidence band doesn't widen enough. Run the benchmark at a size close to your target problem for a tight estimate.
- **Only `n_dof` is currently used as a feature.** Future revisions will add nnz / mesh aspect ratio / mode count as features — the schema is already append-only, so expanded training data flows through without a loader change.
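The extrapolation caveat follows directly from the model form: a fitted power law extends smoothly to any size, so nothing in the math flags a query far outside the training range. A sketch with synthetic numbers:

```python
import numpy as np

# Fit on rows that top out at 200 k DOFs (synthetic, exact 4/3 scaling).
n_train = np.array([2e4, 5e4, 1e5, 2e5])
t_train = 2e-7 * n_train ** (4 / 3)
alpha, log_c = np.polyfit(np.log(n_train), np.log(t_train), 1)

def predict(n):
    return np.exp(log_c) * n ** alpha

# The model happily predicts 1 M DOFs, 5x past the training range,
# with no change in its confidence band, so guard for it yourself:
n_query = 1e6
if n_query > n_train.max():
    print(f"warning: extrapolating {n_query / n_train.max():.0f}x past training range")
```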
References#
George, A. "Nested dissection of a regular finite element mesh." SIAM J. Numer. Anal. 10 (1973), pp. 345–363. Origin of the \(n^{4/3}\) factor-fill bound for structured 3D grids — the power-law basis for the log-log fit.