Verification methodology — resolving discrepancies#

This page describes the standard approach for working through a vendor verification manual. It is a companion to the Verification-manual spec (which covers how to author a VM row); this page covers how to resolve a discrepancy when a row’s first cross-check fails.

The methodology is deliberately not “do whatever it takes to match the vendor’s number”. Vendor numbers can be wrong; even when they are right, they may rest on assumptions different from ours. The goal is to understand every discrepancy down to the level where we can name its source — published material or first principles, never hand-waving.

When the goal is 100 % coverage of an external corpus, the discipline below is what makes the coverage trustworthy. Every landed benchmark must leave a written record of which branch of the methodology it traversed.

The standard approach#

Step 1 — Read the example, reproduce it from our features#

Read the vendor problem statement (geometry, material, loading, analysis type, reference solution). Author a deck from the problem statement — not by lifting deck text — and run it through from_bdf / from_inp / from_dat into a femorph-solver Model.

If the deck format requires a card we don’t yet support, that’s its own work item — open a child issue tracking the missing interop card and continue with a hand-built Model until the card lands. See Reader-pending fallback in the spec for the convention.
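A minimal sketch of the Step 1 flow. The import path, the reader signature, and the solve entry point below are all assumptions for illustration, not the confirmed femorph-solver API; only from_bdf itself is named on this page:

    # Step 1 sketch: hand-author a deck from the problem statement, then read it
    # back through the interop layer. Names and paths are illustrative.
    from femorph_solver import from_bdf   # assumed import location

    deck_path = "tests/interop/msc/fixtures/vm_cantilever.bdf"  # hypothetical row
    model = from_bdf(deck_path)           # interop reader -> femorph-solver Model
    result = model.solve()                # hypothetical solve entry point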

Step 2 — Compare the result to the published reference#

Pick the same quantity of interest the vendor reports (tip displacement, frequency, peak stress, …) and compute it at the same evaluation point on our mesh. Tabulate three numbers:

  • Closed form / NAFEMS reference — the analytical or consensus answer cited in the problem statement.

  • Vendor-reported value — what the verification manual claims the vendor’s solver produces.

  • Our value — what femorph-solver produces on the hand-authored deck.

A “match” is when our value is within engineering tolerance of the closed form (typically 1 % for analytical references, 5 % for NAFEMS converged values, looser when the vendor manual explicitly notes coarse-mesh discretisation).
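As a concrete illustration of the three-number tabulation and the tolerance tiers (all values below are placeholders, not from any real benchmark):

    # Step 2 sketch with placeholder numbers. The quantity of interest and the
    # tolerance tier come from the problem statement, per the text above.
    closed_form = 0.1081   # analytical / NAFEMS reference
    vendor      = 0.1080   # value the vendor manual reports
    ours        = 0.1079   # femorph-solver result at the same evaluation point

    rel_err = abs(ours - closed_form) / abs(closed_form)
    tol = 0.01             # 1 % analytical tier; 0.05 for NAFEMS converged values
    assert rel_err < tol, f"off by {rel_err:.2%} against the closed form"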

Step 3 — When we don’t match, name the source#

There are exactly four reasons our number can disagree with the manual’s, and each is resolved differently. Walk through them in order for any benchmark whose first cross-check fails.

(a) Our deck author misread the problem#

Most discrepancies fall here. Re-read the problem statement end-to-end with a fresh eye:

  • units (psi vs. Pa, in vs. m, lb vs. N, and whether the deck uses one consistent system throughout);

  • boundary conditions (one of “simply supported” / “clamped” / “guided” — the published reference value is very sensitive to which);

  • loading direction and distribution (point vs. line vs. surface load; consistent vs. lumped equivalent);

  • the reference quantity itself (tip deflection vs. centre-span deflection; peak stress at fibre vs. neutral axis; natural frequency in Hz vs. rad/s).

Most authoring mistakes surface on that re-read. When you catch one, fix the deck and the prose, and link the fix from the benchmark page so future readers see the trap.
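Exact-factor offsets usually point at the units item on the checklist above; a quick sanity check (illustrative numbers):

    import math

    # Two classic exact-factor traps.
    f_hz = 12.5
    omega = 2 * math.pi * f_hz     # rad/s; a 6.283x "discrepancy" is Hz vs rad/s

    E_psi = 30.0e6                 # steel, imperial
    E_pa = E_psi * 6894.757        # Pa; a ~6895x offset is psi vs Pa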

(b) Our solver assumes something different from theirs#

These are real, principled assumption mismatches. The most common ones:

  • Element kernel formulation — Bathe-Dvorkin MITC vs. displacement-based Reissner-Mindlin shells; standard hex vs. enhanced-strain (EAS) hex; reduced-integration vs. full integration; selective reduced integration on the volumetric term.

  • Stress recovery point — extrapolated from Gauss points vs. evaluated at nodes; bottom-fibre vs. top-fibre vs. mid-plane.

  • Mass matrix — lumped vs. consistent (shifts modal / dynamic results by roughly 5 % on typical meshes).

  • Plane stress vs. plane strain vs. axisymmetric — for the same QUAD4 mesh, three completely different physical problems.

  • Isotropic-elastic constitutive law — the same Hooke’s law admits several numeric implementations (Lamé form vs. Voigt-vector form vs. matrix form), but they should agree to machine precision; if they don’t, the bug is ours.

When you identify the assumption mismatch, write it into the benchmark page with a citation to the textbook source (Bathe, Finite Element Procedures; Hughes, The Finite Element Method; Cook et al., Concepts and Applications of Finite Element Analysis; etc.) and pin the expected magnitude of the offset. The benchmark then asserts the right tolerance — wide enough to absorb the documented offset, tight enough to catch a real regression.
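The mass-matrix item, for instance, can be pinned from first principles on a one-element model. This is textbook algebra, not femorph-solver code, with illustrative numbers:

    import math

    # Branch (b) demo: first axial frequency of a fixed-free bar modelled with a
    # single 2-node element, lumped vs. consistent mass. Illustrative SI values.
    E, rho, L, A = 210e9, 7800.0, 1.0, 1e-4

    k = E * A / L                          # axial stiffness at the free DOF
    m_consistent = rho * A * L / 3         # consistent mass seen by the free node
    m_lumped = rho * A * L / 2             # half the element mass lumped there

    w_exact = (math.pi / 2) * math.sqrt(E / rho) / L
    w_consistent = math.sqrt(k / m_consistent)   # high on this 1-element mesh
    w_lumped = math.sqrt(k / m_lumped)           # low on this 1-element mesh
    print(w_consistent / w_exact, w_lumped / w_exact)   # ~1.103 and ~0.900

On realistic meshes the gap shrinks toward the few-percent range cited above, but the sign of the offset per convention is what you document and cite.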

(c) The vendor’s published number is wrong#

Rare, but it does happen — vendor manuals carry typos in reference values, off-by-factor unit conversions, and occasionally outright wrong reference solutions. Before declaring this branch, walk all three of:

  • the closed-form derivation from first principles (do the algebra yourself; don’t trust a copy-paste from another manual);

  • the NAFEMS or textbook target if the problem has one;

  • a second vendor’s published value for the same problem (Ansys VM, Abaqus Benchmarks, etc., often republish the same NAFEMS problem and we can sanity-check Ansys against MSC).

When all three agree against the manual’s reference value, the manual is wrong. Document it in the benchmark page and assert against the closed form, not the manual’s number. Do not just “loosen the tolerance to absorb the discrepancy” — that hides the fact and pollutes the record.
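Re-deriving the closed form is usually a few lines. For a hypothetical cantilever tip-deflection row (all numbers invented, shown only to illustrate the habit):

    # Euler-Bernoulli cantilever with a tip load: delta = P*L^3 / (3*E*I).
    # The point is to recompute the reference, not copy it from another manual.
    P, L, E = 1000.0, 2.0, 200e9       # tip load [N], length [m], modulus [Pa]
    b, h = 0.05, 0.10                  # rectangular section [m]
    I = b * h**3 / 12                  # second moment of area [m^4]
    delta = P * L**3 / (3 * E * I)     # = 3.2e-3 m for these numbers
    print(f"{delta * 1e3:.4f} mm")     # check manual and second vendor against this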

(d) We’re missing a feature#

Some discrepancies cannot be resolved with the kernels we ship — the problem genuinely needs:

  • a new element formulation (axisymmetric, layered shell, beam with warping, …);

  • an analysis type we don’t have (linear buckling, transient dynamics, contact, thermal, …);

  • a load card (gravity, line pressure, line moment, …) the interop reader doesn’t parse yet;

  • a constitutive option (plasticity, hyperelasticity, creep, …).

These get child issues linked back to the corpus tracker (currently #345 for MSC VG, #511 for MAPDL VM). The benchmark sits in the ☐-blocked column on the tracker until the feature lands; on landing, the same author flips the box. This is the only acceptable way to “skip” a problem — the gap is named, tracked, and unblocked over time.

Step 4 — Land the benchmark with a written record#

Every benchmark page that ships under doc/source/verification/ includes:

  • the four-section template (problem, analytical reference, femorph-solver result, cross-references);

  • if the cross-vendor numbers disagreed, a short “Discrepancy resolution” subsection naming which branch above applied and citing the source that pinned it;

  • a row in vendor_matrix.rst flipped to ☑ with a link to the PR.

This is what makes 100 % coverage of an external corpus actually mean something — every problem is either matched, explained, or named-as-blocked, and every choice has a paper trail back to published material or first principles.

Working contract for ingest agents#

When picking a row from a tracker (#345, #511, etc.):

  1. Comment on the tracker claiming the row.

  2. Read the published problem statement, run the methodology above, and produce the deck pair (BDF + INP, .dat + .cdb, etc.) and a closed-form BenchmarkProblem class (a sketch of such a class follows this list).

  3. If you hit branch (a) or (b), fix-and-document.

  4. If you hit branch (c), assert against the truth source and document the manual error in the benchmark page.

  5. If you hit branch (d), open the feature gap as its own issue and leave the row on the tracker as ☐-blocked with the blocking-issue link.

  6. Open the PR; on merge, tick the row and update doc/source/verification/vendor_matrix.rst.
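One hypothetical shape for the closed-form BenchmarkProblem class mentioned in step 2. The real class under src/femorph_solver/validation/_problems/ may look quite different; every field name here is an assumption:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class BenchmarkProblem:          # hypothetical fields, for illustration only
        name: str
        reference_value: float       # closed-form / NAFEMS target
        vendor_value: float          # number printed in the vendor manual
        tolerance: float             # tier from Step 2, widened per branch (b)
        citation: str                # source that pins the reference

    cantilever_tip = BenchmarkProblem(
        name="vm_cantilever_tip_deflection",
        reference_value=3.2e-3,      # delta = P*L^3 / (3*E*I), derived above
        vendor_value=3.2e-3,
        tolerance=0.01,
        citation="closed form, Euler-Bernoulli beam theory",
    )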

What this methodology is not#

  • It is not “match the vendor’s published number at all costs”. Matching the wrong number is worse than not matching.

  • It is not “skip what’s hard”. The “hard” rows reveal real feature gaps and become the most useful direction-setters once surfaced.

  • It is not “trust the closed form blindly”. Closed forms carry their own assumptions (small-deflection theory, plane stress, Kirchhoff vs. Mindlin, …) that must match the problem you’re actually computing.

The only invariant: every number in our verification corpus is backed by published material or first-principles derivation, and every discrepancy has a documented owner.

Where to put the paper trail#

  • Hand-authored deck (BDF / INP / .dat): tests/interop/<vendor>/fixtures/vm_<name>.<ext>

  • Closed-form BenchmarkProblem: src/femorph_solver/validation/_problems/<name>.py

  • Registry row: tests/cross_solver/_verification_registry.py

  • Public-facing benchmark page: doc/source/verification/<name>.rst

  • Coverage row: doc/source/verification/vendor_matrix.rst

  • Tracker row: GitHub #345 (MSC VG), #511 (MAPDL VM), #601 (umbrella), #322 (live count)

  • Methodology branch (a/b/c/d): inline note in the benchmark page

  • Discrepancy citation: inline note in the benchmark page + git commit message

  • Feature gap (branch d): dedicated GitHub issue cross-linked from the tracker

Everything else — the agent-to-agent context, the per-session debugging, the why-we-chose-this-mesh — belongs in the PR description and the related issue thread, not scattered across the benchmark pages themselves.