Source code for femorph_solver.elements.solid186_degenerate

"""Degenerate SOLID186 shapes — 15-node wedge and 13-node pyramid.

MAPDL exposes these as SOLID186 KEYOPT variants with collapsed corners;
``mapdl-archive`` splits them out as VTK_QUADRATIC_WEDGE (15 pts) and
VTK_QUADRATIC_PYRAMID (13 pts), strips the duplicated node IDs, and
emits the compact connectivity on the grid.

Two implementations live here for historical reasons:

* ``SOLID186W`` — dedicated 15-node quadratic serendipity wedge kernel
  with its own shape functions and Gauss rules (9-pt for stiffness,
  21-pt for consistent mass).  This is the formulation MAPDL Theory
  §13.186 specifies for degenerate-wedge SOLID186 elements and keeps
  the code cleaner / the mass integration correctly full-order.
  Numerically it produces the same K and M as the hex-collapse fold on
  the meshes we've tested, which is expected (the wedge formulation is
  mathematically equivalent to a 20-node hex with K=L and O=P collapsed
  — the dedicated kernel is just the compact path to that result).

* ``SOLID186P`` — still a hex-fold wrapper ``K = Tᵀ K_hex T`` pending a
  dedicated 13-node pyramid implementation.  Pyramids are the long pole
  in ptr.cdb parity (~5e-4 rel modal error, all from this path), and
  closing the gap requires proper apex-singular shape functions
  (Bedrosian / Zgainski-Bossavit / Wachspress) plus a pyramid-domain
  Gauss rule — tracked as a dedicated follow-up PR.

References
----------
* Degenerate hex → wedge / pyramid topology collapse: Zienkiewicz,
  O.C. and Taylor, R.L.  *The Finite Element Method: Its Basis and
  Fundamentals*, 7th ed., Butterworth-Heinemann, 2013, §8.8.1
  (discussion of collapsed hex nodes producing wedge / pyramid
  shapes).
* **15-node serendipity wedge shape functions** (base corners on
  barycentric ``(ξ₁, ξ₂, ξ₃) = L₁, L₂, L₃`` with ``ξ₃ = 1 − ξ₁ − ξ₂``,
  quadratic through thickness in ``ζ``): Bedrosian, G., "Shape
  functions and integration formulas for three-dimensional finite
  element analysis," *IJNME* 35 (1992), pp. 95–108.  Equivalent
  derivation in Zienkiewicz & Taylor §8.8.3 Table 8.8-1.
* **9-point wedge Gauss rule** (3-point triangular rule × 3-point
  Gauss-Legendre through thickness): triangular rule from Dunavant,
  D.A., "High degree efficient symmetrical Gaussian quadrature
  rules for the triangle," *IJNME* 21 (1985), pp. 1129–1148
  (degree-4 triangle rule on 3 symmetric points); tensor-product
  with the 1-D Gauss-Legendre rule follows Zienkiewicz & Taylor
  §5.10.
* **21-point wedge mass rule** (6-point triangular rule × 3-point
  through-thickness, integrates the degree-4 Bᵀ B polynomial on
  the wedge): Dunavant degree-4 triangle rule × Gauss-Legendre 3.
* **13-node pyramid shape functions** (``SOLID186P``, currently a
  hex-fold wrapper pending a dedicated kernel): Bedrosian 1992
  (as above) for the apex-singular polynomial basis, alternative
  formulations in Zgainski, F.X., Coulomb, J.L., and Marechal, Y.,
  "A new family of finite elements: the pyramidal elements,"
  *IEEE Trans. Magn.* 32 (1996), pp. 1393–1396 and Wachspress,
  E.L., *A Rational Finite Element Basis*, Academic Press, 1975.

MAPDL compatibility — specification source
------------------------------------------
* Ansys, Inc., *Ansys Mechanical APDL Element Reference*,
  Release 2022R2, section "SOLID186 — 3-D 20-Node Structural
  Solid" (KEYOPT-driven degenerate-form dispatch).

Short factual summary (paraphrased):

* The 20-node hex kernel accepts a "collapsed-corner" layout
  that MAPDL interprets as a 15-node wedge (two adjacent corners
  coincident on one face) or a 13-node pyramid (four corners
  collapsed to a single apex).  ``mapdl-archive`` splits these
  out on load and emits them as VTK_QUADRATIC_WEDGE / VTK_QUADRATIC_PYRAMID.
* Reduced-integration default; mass integration via the 14-point
  Irons rule — matches the parent 20-node hex behaviour.

femorph-solver reimplements the degenerate forms from the
Bedrosian 1992 / Dunavant 1985 / Wachspress 1975 references
above; the Ansys Element Reference is consulted only as the
compatibility specification.
"""

from __future__ import annotations

import numpy as np

from femorph_solver.elements._base import ElementBase
from femorph_solver.elements._registry import register
from femorph_solver.elements.solid185 import _elastic_matrix
from femorph_solver.elements.solid186 import SOLID186

try:
    from femorph_solver import _elements as _EXT
except ImportError:  # pragma: no cover
    _EXT = None

# =============================================================================
# 15-node wedge — dedicated shape functions + Gauss rules
# =============================================================================

_WEDGE_N = 15
_WEDGE_NDOF = 3 * _WEDGE_N  # 45


def _wedge_shape_and_derivs(xi1: float, xi2: float, zeta: float) -> tuple[np.ndarray, np.ndarray]:
    """Return ``(N (15,), dN/dξ (15, 3))`` at one reference point.

    ``dN`` columns are ``∂/∂ξ₁``, ``∂/∂ξ₂``, ``∂/∂ζ`` with ``ξ₃`` eliminated
    via ``ξ₃ = 1 − ξ₁ − ξ₂`` (chain-rule for the ξ₃ corner / mid-edges
    that involve ξ₃).

    Node ordering matches what ``mapdl-archive`` emits for VTK_QUADRATIC_WEDGE
    from a MAPDL degenerate SOLID186 (verified against ptr.cdb): the
    *degenerate* base corner (collapsed K=L in the original hex slot layout)
    lands at VTK index 0, so the barycentric-coord mapping runs

        VTK[0] ↔ ξ₃ corner   VTK[1] ↔ ξ₂ corner   VTK[2] ↔ ξ₁ corner

    for the base, with the top triangle, base/top midsides, and vertical
    midsides following the same flipped orientation.  Keeping the
    orientation this way makes the Jacobian positive on the physical
    wedges that mapdl-archive actually produces.
    """
    xi3 = 1.0 - xi1 - xi2
    one_m_zeta = 1.0 - zeta
    one_p_zeta = 1.0 + zeta
    one_m_zeta_sq = 1.0 - zeta * zeta

    N = np.empty(_WEDGE_N, dtype=np.float64)
    dN = np.empty((_WEDGE_N, 3), dtype=np.float64)

    # --- Base corners (ζ = -1) ---------------------------------------------
    # VTK[0] = ξ₃ corner (degenerate K=L on the hex side).  ξ₃ = 1 − ξ₁ − ξ₂,
    # so ∂/∂ξ₁ and ∂/∂ξ₂ pick up -1 from the chain rule.
    N[0] = 0.5 * xi3 * (2.0 * xi3 - 1.0) * one_m_zeta - 0.5 * xi3 * one_m_zeta_sq
    d_xi3 = 0.5 * (4.0 * xi3 - 1.0) * one_m_zeta - 0.5 * one_m_zeta_sq
    dN[0, 0] = -d_xi3
    dN[0, 1] = -d_xi3
    dN[0, 2] = -0.5 * xi3 * (2.0 * xi3 - 1.0) + zeta * xi3

    # VTK[1] = ξ₂ corner
    N[1] = 0.5 * xi2 * (2.0 * xi2 - 1.0) * one_m_zeta - 0.5 * xi2 * one_m_zeta_sq
    dN[1, 0] = 0.0
    dN[1, 1] = 0.5 * (4.0 * xi2 - 1.0) * one_m_zeta - 0.5 * one_m_zeta_sq
    dN[1, 2] = -0.5 * xi2 * (2.0 * xi2 - 1.0) + zeta * xi2

    # VTK[2] = ξ₁ corner
    N[2] = 0.5 * xi1 * (2.0 * xi1 - 1.0) * one_m_zeta - 0.5 * xi1 * one_m_zeta_sq
    dN[2, 0] = 0.5 * (4.0 * xi1 - 1.0) * one_m_zeta - 0.5 * one_m_zeta_sq
    dN[2, 1] = 0.0
    dN[2, 2] = -0.5 * xi1 * (2.0 * xi1 - 1.0) + zeta * xi1

    # --- Top corners (ζ = +1) ----------------------------------------------
    N[3] = 0.5 * xi3 * (2.0 * xi3 - 1.0) * one_p_zeta - 0.5 * xi3 * one_m_zeta_sq
    d_xi3 = 0.5 * (4.0 * xi3 - 1.0) * one_p_zeta - 0.5 * one_m_zeta_sq
    dN[3, 0] = -d_xi3
    dN[3, 1] = -d_xi3
    dN[3, 2] = 0.5 * xi3 * (2.0 * xi3 - 1.0) + zeta * xi3

    N[4] = 0.5 * xi2 * (2.0 * xi2 - 1.0) * one_p_zeta - 0.5 * xi2 * one_m_zeta_sq
    dN[4, 0] = 0.0
    dN[4, 1] = 0.5 * (4.0 * xi2 - 1.0) * one_p_zeta - 0.5 * one_m_zeta_sq
    dN[4, 2] = 0.5 * xi2 * (2.0 * xi2 - 1.0) + zeta * xi2

    N[5] = 0.5 * xi1 * (2.0 * xi1 - 1.0) * one_p_zeta - 0.5 * xi1 * one_m_zeta_sq
    dN[5, 0] = 0.5 * (4.0 * xi1 - 1.0) * one_p_zeta - 0.5 * one_m_zeta_sq
    dN[5, 1] = 0.0
    dN[5, 2] = 0.5 * xi1 * (2.0 * xi1 - 1.0) + zeta * xi1

    # --- Base midsides (ζ = -1) --------------------------------------------
    # VTK[6] = midside on edge (VTK[0], VTK[1]) = (ξ₃, ξ₂) edge.
    N[6] = 2.0 * xi3 * xi2 * one_m_zeta
    dN[6, 0] = -2.0 * xi2 * one_m_zeta
    dN[6, 1] = 2.0 * (xi3 - xi2) * one_m_zeta
    dN[6, 2] = -2.0 * xi3 * xi2

    # VTK[7] = midside on edge (VTK[1], VTK[2]) = (ξ₂, ξ₁) edge.
    N[7] = 2.0 * xi2 * xi1 * one_m_zeta
    dN[7, 0] = 2.0 * xi2 * one_m_zeta
    dN[7, 1] = 2.0 * xi1 * one_m_zeta
    dN[7, 2] = -2.0 * xi2 * xi1

    # VTK[8] = midside on edge (VTK[2], VTK[0]) = (ξ₁, ξ₃) edge.
    N[8] = 2.0 * xi1 * xi3 * one_m_zeta
    dN[8, 0] = 2.0 * (xi3 - xi1) * one_m_zeta
    dN[8, 1] = -2.0 * xi1 * one_m_zeta
    dN[8, 2] = -2.0 * xi1 * xi3

    # --- Top midsides (ζ = +1) ---------------------------------------------
    N[9] = 2.0 * xi3 * xi2 * one_p_zeta
    dN[9, 0] = -2.0 * xi2 * one_p_zeta
    dN[9, 1] = 2.0 * (xi3 - xi2) * one_p_zeta
    dN[9, 2] = 2.0 * xi3 * xi2

    N[10] = 2.0 * xi2 * xi1 * one_p_zeta
    dN[10, 0] = 2.0 * xi2 * one_p_zeta
    dN[10, 1] = 2.0 * xi1 * one_p_zeta
    dN[10, 2] = 2.0 * xi2 * xi1

    N[11] = 2.0 * xi1 * xi3 * one_p_zeta
    dN[11, 0] = 2.0 * (xi3 - xi1) * one_p_zeta
    dN[11, 1] = -2.0 * xi1 * one_p_zeta
    dN[11, 2] = 2.0 * xi1 * xi3

    # --- Vertical midsides (ζ = 0) -----------------------------------------
    # VTK[12] = vertical at VTK[0] = ξ₃ column
    N[12] = xi3 * one_m_zeta_sq
    dN[12, 0] = -one_m_zeta_sq
    dN[12, 1] = -one_m_zeta_sq
    dN[12, 2] = -2.0 * xi3 * zeta

    # VTK[13] = vertical at VTK[1] = ξ₂ column
    N[13] = xi2 * one_m_zeta_sq
    dN[13, 0] = 0.0
    dN[13, 1] = one_m_zeta_sq
    dN[13, 2] = -2.0 * xi2 * zeta

    # VTK[14] = vertical at VTK[2] = ξ₁ column
    N[14] = xi1 * one_m_zeta_sq
    dN[14, 0] = one_m_zeta_sq
    dN[14, 1] = 0.0
    dN[14, 2] = -2.0 * xi1 * zeta

    return N, dN


def _line_gauss_3pt() -> tuple[np.ndarray, np.ndarray]:
    """3-pt Gauss-Legendre on [-1, +1] (degree 5 exact)."""
    sqrt35 = np.sqrt(3.0 / 5.0)
    pts = np.array([-sqrt35, 0.0, +sqrt35])
    wts = np.array([5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0])
    return pts, wts


def _tri_gauss_3pt() -> tuple[np.ndarray, np.ndarray]:
    """3-pt barycentric triangle Gauss (degree 2 exact).

    Weights sum to ½ (reference-triangle area).
    """
    pts = np.array(
        [
            [2.0 / 3.0, 1.0 / 6.0],
            [1.0 / 6.0, 2.0 / 3.0],
            [1.0 / 6.0, 1.0 / 6.0],
        ]
    )
    wts = np.array([1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0])
    return pts, wts


def _tri_gauss_7pt() -> tuple[np.ndarray, np.ndarray]:
    """7-pt barycentric triangle Gauss (degree 5 exact, Radon/Hammer).

    Required for the consistent-mass integrand ``N_i · N_j`` (degree 4
    in area coords for 15-node serendipity) — 3-pt under-integrates and
    leaves M rank-deficient.  Weights sum to ½.
    """
    sqrt15 = np.sqrt(15.0)
    a1 = (6.0 - sqrt15) / 21.0
    b1 = (9.0 + 2.0 * sqrt15) / 21.0
    a2 = (6.0 + sqrt15) / 21.0
    b2 = (9.0 - 2.0 * sqrt15) / 21.0
    w_center = 9.0 / 80.0
    w1 = (155.0 - sqrt15) / 2400.0
    w2 = (155.0 + sqrt15) / 2400.0
    pts = np.array(
        [
            [1.0 / 3.0, 1.0 / 3.0],
            [b1, a1],
            [a1, b1],
            [a1, a1],
            [b2, a2],
            [a2, b2],
            [a2, a2],
        ]
    )
    wts = np.array([w_center, w1, w1, w1, w2, w2, w2])
    return pts, wts


def _product_wedge_rule(
    tri_pts: np.ndarray, tri_wts: np.ndarray, line_pts: np.ndarray, line_wts: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Tensor-product triangle × line → prism Gauss rule."""
    n = len(tri_wts) * len(line_wts)
    pts = np.empty((n, 3), dtype=np.float64)
    wts = np.empty(n, dtype=np.float64)
    i = 0
    for (x1, x2), tw in zip(tri_pts, tri_wts, strict=True):
        for zeta, lw in zip(line_pts, line_wts, strict=True):
            pts[i] = (x1, x2, zeta)
            wts[i] = tw * lw
            i += 1
    return pts, wts


# Stiffness: 3-pt triangle × 3-pt line = 9 pts — matches MAPDL's SOLID186
# degenerate-wedge rule.
_WEDGE_GAUSS_POINTS, _WEDGE_GAUSS_WEIGHTS = _product_wedge_rule(
    *_tri_gauss_3pt(), *_line_gauss_3pt()
)

# Mass: 7-pt triangle × 3-pt line = 21 pts (exact for N_i·N_j = degree 4 in
# area × degree 4 in ζ) — required to keep consistent mass SPD.
_WEDGE_MASS_GAUSS_POINTS, _WEDGE_MASS_GAUSS_WEIGHTS = _product_wedge_rule(
    *_tri_gauss_7pt(), *_line_gauss_3pt()
)


def _wedge_b_matrix(dN_dx: np.ndarray) -> np.ndarray:
    """Return the 6×45 strain-displacement matrix from ``dN/dx (15, 3)``."""
    B = np.zeros((6, _WEDGE_NDOF), dtype=np.float64)
    for i in range(_WEDGE_N):
        bx, by, bz = dN_dx[i]
        c = 3 * i
        B[0, c + 0] = bx
        B[1, c + 1] = by
        B[2, c + 2] = bz
        B[3, c + 0] = by
        B[3, c + 1] = bx
        B[4, c + 1] = bz
        B[4, c + 2] = by
        B[5, c + 0] = bz
        B[5, c + 2] = bx
    return B


def _wedge_shape_matrix(N: np.ndarray) -> np.ndarray:
    """Return the 3×45 shape-function matrix ``Φ`` with three rows per DOF."""
    Phi = np.zeros((3, _WEDGE_NDOF), dtype=np.float64)
    for i in range(_WEDGE_N):
        Phi[0, 3 * i + 0] = N[i]
        Phi[1, 3 * i + 1] = N[i]
        Phi[2, 3 * i + 2] = N[i]
    return Phi


def _wedge_ke(coords: np.ndarray, E: float, nu: float) -> np.ndarray:
    """Assemble Ke for a 15-node wedge (9-pt Gauss)."""
    C = _elastic_matrix(E, nu)
    K = np.zeros((_WEDGE_NDOF, _WEDGE_NDOF), dtype=np.float64)
    for gp, w in zip(_WEDGE_GAUSS_POINTS, _WEDGE_GAUSS_WEIGHTS, strict=True):
        _, dN_dxi = _wedge_shape_and_derivs(*gp)
        J = dN_dxi.T @ coords
        det_J = float(np.linalg.det(J))
        if det_J <= 0:
            raise ValueError(f"SOLID186W: non-positive Jacobian {det_J:.3e} — check node ordering")
        dN_dx = np.linalg.solve(J, dN_dxi.T).T
        B = _wedge_b_matrix(dN_dx)
        K += B.T @ C @ B * det_J * w
    return K


def _wedge_me(coords: np.ndarray, rho: float) -> np.ndarray:
    """Assemble consistent mass for a 15-node wedge (21-pt Gauss).

    3-pt triangle under-integrates ``N_i · N_j`` (degree 4 in area) and
    leaves M rank-deficient; 7-pt triangle × 3-pt line fixes that.
    """
    M = np.zeros((_WEDGE_NDOF, _WEDGE_NDOF), dtype=np.float64)
    for gp, w in zip(_WEDGE_MASS_GAUSS_POINTS, _WEDGE_MASS_GAUSS_WEIGHTS, strict=True):
        N, dN_dxi = _wedge_shape_and_derivs(*gp)
        J = dN_dxi.T @ coords
        det_J = float(np.linalg.det(J))
        Phi = _wedge_shape_matrix(N)
        M += rho * (Phi.T @ Phi) * det_J * w
    return M


# =============================================================================
# 13-node pyramid — dedicated Bedrosian serendipity kernel
# =============================================================================

# Peterson / LibMesh / Bedrosian node numbering on the reference pyramid with
# square base (ξ, η) ∈ [-1, 1]² at ζ = 0 and apex at (0, 0, 1) (1-indexed):
#   1:( 1, 1, 0)  2:( 0, 1, 0)  3:(-1, 1, 0)  4:(-1, 0, 0)
#   5:(-1,-1, 0)  6:( 0,-1, 0)  7:( 1,-1, 0)  8:( 1, 0, 0)
#   9:(.5,.5,.5) 10:(-.5,.5,.5) 11:(-.5,-.5,.5) 12:(.5,-.5,.5)
#   13:( 0, 0, 1)   # apex
# VTK_QUADRATIC_PYRAMID layout (0-indexed): corners 0-3 CCW at ζ=0, apex 4,
# base midsides 5-8 (edges 0-1, 1-2, 2-3, 3-0), apex midsides 9-12
# (edges 0-4, 1-4, 2-4, 3-4).
_PETERSON_TO_VTK = np.array([0, 5, 1, 6, 2, 7, 3, 8, 9, 10, 11, 12, 4], dtype=np.int64)
_VTK_TO_PETERSON = np.argsort(_PETERSON_TO_VTK)


def _pyramid_shape(xi: float, eta: float, zeta: float) -> np.ndarray:
    """13-node Bedrosian shape functions in Peterson order.

    Inputs are scalar floats; the `s = 1 − ζ` rational factor has a
    removable singularity at the apex (ζ=1) that's outside any
    interior Gauss point, so we don't handle the apex limit here.
    """
    s = 1.0 - zeta
    N = np.empty(13, dtype=np.float64)
    N[0] = 0.25 * (xi + eta - 1.0) * ((1 + xi) * (1 + eta) - zeta + xi * eta * zeta / s)
    N[1] = 0.5 * (1 + xi - zeta) * (1 - xi - zeta) * (1 + eta - zeta) / s
    N[2] = 0.25 * (-xi + eta - 1.0) * ((1 - xi) * (1 + eta) - zeta - xi * eta * zeta / s)
    N[3] = 0.5 * (1 + eta - zeta) * (1 - eta - zeta) * (1 - xi - zeta) / s
    N[4] = 0.25 * (-xi - eta - 1.0) * ((1 - xi) * (1 - eta) - zeta + xi * eta * zeta / s)
    N[5] = 0.5 * (1 + xi - zeta) * (1 - xi - zeta) * (1 - eta - zeta) / s
    N[6] = 0.25 * (xi - eta - 1.0) * ((1 + xi) * (1 - eta) - zeta - xi * eta * zeta / s)
    N[7] = 0.5 * (1 + eta - zeta) * (1 - eta - zeta) * (1 + xi - zeta) / s
    N[8] = zeta * (1 + xi - zeta) * (1 + eta - zeta) / s
    N[9] = zeta * (1 - xi - zeta) * (1 + eta - zeta) / s
    N[10] = zeta * (1 - xi - zeta) * (1 - eta - zeta) / s
    N[11] = zeta * (1 + xi - zeta) * (1 - eta - zeta) / s
    N[12] = zeta * (2 * zeta - 1)
    return N


def _pyramid_shape_and_derivs(xi: float, eta: float, zeta: float) -> tuple[np.ndarray, np.ndarray]:
    """Analytical (Peterson-ordered) shape functions + gradients."""
    s = 1.0 - zeta
    s2 = s * s
    N = _pyramid_shape(xi, eta, zeta)
    dN = np.empty((13, 3), dtype=np.float64)
    # Base corners (Peterson 1, 3, 5, 7):
    dN[0, 0] = (
        -eta * eta
        - 2 * eta * xi
        + 2 * eta * zeta
        - eta
        + 2 * xi * zeta
        - 2 * xi
        - zeta * zeta
        + zeta
    ) / (4 * (zeta - 1))
    dN[0, 1] = (
        -2 * eta * xi + 2 * eta * zeta - 2 * eta - xi * xi + 2 * xi * zeta - xi - zeta * zeta + zeta
    ) / (4 * (zeta - 1))
    dN[0, 2] = (-(eta + xi - 1) * (-eta * xi * zeta + eta * xi * (zeta - 1) + (zeta - 1) ** 2)) / (
        4 * (zeta - 1) ** 2
    )
    dN[2, 0] = (
        eta * eta
        - 2 * eta * xi
        - 2 * eta * zeta
        + eta
        + 2 * xi * zeta
        - 2 * xi
        + zeta * zeta
        - zeta
    ) / (4 * (zeta - 1))
    dN[2, 1] = (
        2 * eta * xi + 2 * eta * zeta - 2 * eta - xi * xi - 2 * xi * zeta + xi - zeta * zeta + zeta
    ) / (4 * (zeta - 1))
    dN[2, 2] = ((-eta + xi + 1) * (eta * xi * zeta - eta * xi * (zeta - 1) + (zeta - 1) ** 2)) / (
        4 * (zeta - 1) ** 2
    )
    dN[4, 0] = (
        eta * eta
        + 2 * eta * xi
        + 2 * eta * zeta
        - eta
        + 2 * xi * zeta
        - 2 * xi
        + zeta * zeta
        - zeta
    ) / (4 * (zeta - 1))
    dN[4, 1] = (
        2 * eta * xi + 2 * eta * zeta - 2 * eta + xi * xi + 2 * xi * zeta - xi + zeta * zeta - zeta
    ) / (4 * (zeta - 1))
    dN[4, 2] = ((eta + xi + 1) * (-eta * xi * zeta + eta * xi * (zeta - 1) + (zeta - 1) ** 2)) / (
        4 * (zeta - 1) ** 2
    )
    dN[6, 0] = (
        -eta * eta
        + 2 * eta * xi
        - 2 * eta * zeta
        + eta
        + 2 * xi * zeta
        - 2 * xi
        - zeta * zeta
        + zeta
    ) / (4 * (zeta - 1))
    dN[6, 1] = (
        -2 * eta * xi + 2 * eta * zeta - 2 * eta + xi * xi - 2 * xi * zeta + xi + zeta * zeta - zeta
    ) / (4 * (zeta - 1))
    dN[6, 2] = ((eta - xi + 1) * (eta * xi * zeta - eta * xi * (zeta - 1) + (zeta - 1) ** 2)) / (
        4 * (zeta - 1) ** 2
    )
    # Base midsides (Peterson 2, 4, 6, 8):
    dN[1, 0] = xi * (eta - zeta + 1) / (zeta - 1)
    dN[1, 1] = (xi - zeta + 1) * (xi + zeta - 1) / (2 * (zeta - 1))
    dN[1, 2] = (
        -eta * xi * xi
        - eta * zeta * zeta
        + 2 * eta * zeta
        - eta
        + 2 * zeta**3
        - 6 * zeta * zeta
        + 6 * zeta
        - 2
    ) / (2 * s2)
    dN[3, 0] = -(eta - zeta + 1) * (eta + zeta - 1) / (2 * zeta - 2)
    dN[3, 1] = eta * (-xi - zeta + 1) / (zeta - 1)
    dN[3, 2] = (
        eta * eta * xi
        + xi * zeta * zeta
        - 2 * xi * zeta
        + xi
        + 2 * zeta**3
        - 6 * zeta * zeta
        + 6 * zeta
        - 2
    ) / (2 * s2)
    dN[5, 0] = xi * (-eta - zeta + 1) / (zeta - 1)
    dN[5, 1] = -(xi - zeta + 1) * (xi + zeta - 1) / (2 * zeta - 2)
    dN[5, 2] = (
        eta * xi * xi
        + eta * zeta * zeta
        - 2 * eta * zeta
        + eta
        + 2 * zeta**3
        - 6 * zeta * zeta
        + 6 * zeta
        - 2
    ) / (2 * s2)
    dN[7, 0] = (eta - zeta + 1) * (eta + zeta - 1) / (2 * (zeta - 1))
    dN[7, 1] = eta * (xi - zeta + 1) / (zeta - 1)
    dN[7, 2] = (
        -eta * eta * xi
        - xi * zeta * zeta
        + 2 * xi * zeta
        - xi
        + 2 * zeta**3
        - 6 * zeta * zeta
        + 6 * zeta
        - 2
    ) / (2 * s2)
    # Apex midsides (Peterson 9, 10, 11, 12):
    dN[8, 0] = zeta * (-eta + zeta - 1) / (zeta - 1)
    dN[8, 1] = zeta * (-xi + zeta - 1) / (zeta - 1)
    dN[8, 2] = (
        zeta * (eta - zeta + 1) * (xi - zeta + 1)
        + (zeta - 1)
        * (zeta * (eta - zeta + 1) + zeta * (xi - zeta + 1) - (eta - zeta + 1) * (xi - zeta + 1))
    ) / s2
    dN[9, 0] = zeta * (eta - zeta + 1) / (zeta - 1)
    dN[9, 1] = zeta * (xi + zeta - 1) / (zeta - 1)
    dN[9, 2] = (
        -zeta * (eta - zeta + 1) * (xi + zeta - 1)
        + (zeta - 1)
        * (zeta * (eta - zeta + 1) - zeta * (xi + zeta - 1) + (eta - zeta + 1) * (xi + zeta - 1))
    ) / s2
    dN[10, 0] = zeta * (-eta - zeta + 1) / (zeta - 1)
    dN[10, 1] = zeta * (-xi - zeta + 1) / (zeta - 1)
    dN[10, 2] = (
        zeta * (eta + zeta - 1) * (xi + zeta - 1)
        - (zeta - 1)
        * (zeta * (eta + zeta - 1) + zeta * (xi + zeta - 1) + (eta + zeta - 1) * (xi + zeta - 1))
    ) / s2
    dN[11, 0] = zeta * (eta + zeta - 1) / (zeta - 1)
    dN[11, 1] = zeta * (xi - zeta + 1) / (zeta - 1)
    dN[11, 2] = (
        -zeta * (eta + zeta - 1) * (xi - zeta + 1)
        + (zeta - 1)
        * (-zeta * (eta + zeta - 1) + zeta * (xi - zeta + 1) + (eta + zeta - 1) * (xi - zeta + 1))
    ) / s2
    # Apex (Peterson 13):
    dN[12, 0] = 0.0
    dN[12, 1] = 0.0
    dN[12, 2] = 4 * zeta - 1
    return N, dN


def _pyramid_shape_vtk(xi: float, eta: float, zeta: float) -> tuple[np.ndarray, np.ndarray]:
    """Peterson shape functions remapped to VTK_QUADRATIC_PYRAMID order."""
    N_pet, dN_pet = _pyramid_shape_and_derivs(xi, eta, zeta)
    N_vtk = np.empty(13, dtype=np.float64)
    dN_vtk = np.empty((13, 3), dtype=np.float64)
    for pet_idx, vtk_idx in enumerate(_PETERSON_TO_VTK):
        N_vtk[vtk_idx] = N_pet[pet_idx]
        dN_vtk[vtk_idx] = dN_pet[pet_idx]
    return N_vtk, dN_vtk


def _pyramid_duffy_rule(n_base: int = 3, n_apex: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Collapsed-hex (Duffy) Gauss rule on the reference pyramid.

    Unit cube (u, v, w) ∈ [0, 1]³ → pyramid via
        ξ = (2u − 1)(1 − w),   η = (2v − 1)(1 − w),   ζ = w
    with Jacobian 4(1 − w)² (the "4" from the (2u-1) and (2v-1) scaling
    to [-1, 1]²).  Product of `n_base`-pt Gauss-Legendre in u and v
    with `n_apex`-pt GL in w — simple and exact for degree-(2n-1)
    polynomials on the reference pyramid when `n_base = n_apex = n`.
    """

    def gl_01(n: int) -> tuple[np.ndarray, np.ndarray]:
        pts, wts = np.polynomial.legendre.leggauss(n)
        return 0.5 * (pts + 1.0), 0.5 * wts

    u_pts, u_wts = gl_01(n_base)
    v_pts, v_wts = u_pts, u_wts
    w_pts, w_wts = gl_01(n_apex)

    pts = np.empty((n_base * n_base * n_apex, 3), dtype=np.float64)
    wts = np.empty(n_base * n_base * n_apex, dtype=np.float64)
    i = 0
    for u, wu in zip(u_pts, u_wts, strict=True):
        for v, wv in zip(v_pts, v_wts, strict=True):
            for w, ww in zip(w_pts, w_wts, strict=True):
                pts[i] = ((2 * u - 1) * (1 - w), (2 * v - 1) * (1 - w), w)
                wts[i] = wu * wv * ww * 4.0 * (1 - w) ** 2
                i += 1
    return pts, wts


_PYRAMID_GAUSS_POINTS, _PYRAMID_GAUSS_WEIGHTS = _pyramid_duffy_rule(3, 3)
# MAPDL SOLID186 pyramids integrate Ke and Me with a 2×2×2 Duffy rule
# (8 points, matching the 2×2×2 URI rule MAPDL uses for the 20-node hex
# — not a higher-order "consistent" pyramid quadrature).  Verified
# bit-exact (~1 × 10⁻¹¹ rel Frobenius) against the ptr.cdb pyramid
# EMAT on every tested element.
_PYRAMID_MAPDL_POINTS, _PYRAMID_MAPDL_WEIGHTS = _pyramid_duffy_rule(2, 2)


def _pyramid_b_matrix(dN_dx: np.ndarray) -> np.ndarray:
    """Return the 6×39 strain-displacement matrix from ``dN/dx (13, 3)``."""
    B = np.zeros((6, 39), dtype=np.float64)
    for i in range(13):
        bx, by, bz = dN_dx[i]
        col = 3 * i
        B[0, col + 0] = bx
        B[1, col + 1] = by
        B[2, col + 2] = bz
        B[3, col + 0] = by
        B[3, col + 1] = bx
        B[4, col + 1] = bz
        B[4, col + 2] = by
        B[5, col + 0] = bz
        B[5, col + 2] = bx
    return B


def _pyramid_ke(coords: np.ndarray, E: float, nu: float, rule: str = "mapdl") -> np.ndarray:
    """Bedrosian 13-node pyramid stiffness.

    ``rule="mapdl"`` uses the 2×2×2 Duffy rule (8 points) — bit-exact
    match to MAPDL's SOLID186 pyramid EMAT.  ``rule="consistent"`` uses
    27-pt Duffy (degree-5 exact in ξ/η × degree-5 in ζ); mathematically
    a more accurate integration of the quartic ``Bᵀ C B`` integrand but
    diverges from MAPDL by ~13% in Frobenius norm.
    """
    if rule == "mapdl":
        pts, wts = _PYRAMID_MAPDL_POINTS, _PYRAMID_MAPDL_WEIGHTS
    elif rule == "consistent":
        pts, wts = _PYRAMID_GAUSS_POINTS, _PYRAMID_GAUSS_WEIGHTS
    else:
        raise ValueError(f"unknown rule {rule!r}; expected 'mapdl' or 'consistent'")
    C = _elastic_matrix(E, nu)
    K = np.zeros((39, 39), dtype=np.float64)
    for gp, w in zip(pts, wts, strict=True):
        _, dN_dxi = _pyramid_shape_vtk(*gp)
        J = dN_dxi.T @ coords
        det_J = float(np.linalg.det(J))
        if det_J <= 0:
            raise ValueError(f"SOLID186P: non-positive Jacobian {det_J:.3e} — check node ordering")
        dN_dx = np.linalg.solve(J, dN_dxi.T).T
        B = _pyramid_b_matrix(dN_dx)
        K += B.T @ C @ B * det_J * w
    return K


def _pyramid_me(coords: np.ndarray, rho: float, rule: str = "mapdl") -> np.ndarray:
    """Bedrosian 13-node pyramid consistent mass.

    Same rule options as :func:`_pyramid_ke`.  Default ``"mapdl"``
    uses the 2×2×2 Duffy rule (matches MAPDL ptr_em.emat to ~1e-11
    rel Frobenius); ``"consistent"`` uses 27-pt Duffy.
    """
    if rule == "mapdl":
        pts, wts = _PYRAMID_MAPDL_POINTS, _PYRAMID_MAPDL_WEIGHTS
    elif rule == "consistent":
        pts, wts = _PYRAMID_GAUSS_POINTS, _PYRAMID_GAUSS_WEIGHTS
    else:
        raise ValueError(f"unknown rule {rule!r}; expected 'mapdl' or 'consistent'")
    M = np.zeros((39, 39), dtype=np.float64)
    Phi = np.zeros((3, 39), dtype=np.float64)
    for gp, w in zip(pts, wts, strict=True):
        N, dN_dxi = _pyramid_shape_vtk(*gp)
        J = dN_dxi.T @ coords
        det_J = float(np.linalg.det(J))
        for i in range(13):
            Phi[0, 3 * i + 0] = N[i]
            Phi[1, 3 * i + 1] = N[i]
            Phi[2, 3 * i + 2] = N[i]
        M += rho * (Phi.T @ Phi) * det_J * w
    return M


# =============================================================================
# 13-node pyramid — hex-fold legacy path (kept as opt-in for regressions)
# =============================================================================

# 20-slot hex slot → 13-node pyramid index (M=N=O=P collapse to apex).
# VTK_QUADRATIC_PYRAMID order: 4 base corners, apex (idx 4), 4 base
# midsides (0-1, 1-2, 2-3, 3-0), 4 base-to-apex midsides (0-4, 1-4, 2-4, 3-4).
_PYRAMID_SLOT_MAP = np.array(
    [
        0,  # I
        1,  # J
        2,  # K
        3,  # L
        4,  # M = apex
        4,  # N = apex
        4,  # O = apex
        4,  # P = apex
        5,  # Q=IJ base 0-1
        6,  # R=JK base 1-2
        7,  # S=KL base 2-3
        8,  # T=LI base 3-0
        4,  # U=MN collapsed apex
        4,  # V=NO collapsed apex
        4,  # W=OP collapsed apex
        4,  # X=PM collapsed apex
        9,  # Y=IM base-to-apex 0-4
        10,  # Z=JN base-to-apex 1-4 (N=M, so 1→apex)
        11,  # A=KO base-to-apex 2-4
        12,  # B=LP base-to-apex 3-4
    ],
    dtype=np.int64,
)


def _fold_matrix(slot_map: np.ndarray, n_compact: int) -> np.ndarray:
    """Return the sparse 60×(3*n_compact) indicator ``T`` for 3-DOF folding."""
    T = np.zeros((60, 3 * n_compact), dtype=np.float64)
    for hex_slot, compact_slot in enumerate(slot_map):
        for d in range(3):
            T[3 * hex_slot + d, 3 * compact_slot + d] = 1.0
    return T


_T_PYRAMID = _fold_matrix(_PYRAMID_SLOT_MAP, 13)


def _expand_to_hex(coords_compact: np.ndarray, slot_map: np.ndarray) -> np.ndarray:
    """Duplicate compact coords back to the 20-node hex layout."""
    return coords_compact[slot_map]


def _fold(K_hex: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply the slot-collapse fold ``Tᵀ K T`` (exact, no approximation)."""
    return T.T @ K_hex @ T


# =============================================================================
# Element classes
# =============================================================================



[docs]
@register
class SOLID186W(ElementBase):
    """15-node degenerate SOLID186 wedge (K=L, O=P collapse).

    Dedicated quadratic serendipity kernel with shape functions in area
    coords ``(ξ₁, ξ₂, ζ)``, stiffness at 9-pt Gauss (3-pt triangle × 3-pt
    line), consistent mass at 21-pt Gauss (7-pt triangle × 3-pt line).
    """

    name = "SOLID186W"
    n_nodes = _WEDGE_N
    n_dof_per_node = 3
    vtk_cell_type = 26  # VTK_QUADRATIC_WEDGE


[docs]
    @staticmethod
    def ke(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
    ) -> np.ndarray:
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        if coords.shape != (_WEDGE_N, 3):
            raise ValueError(f"SOLID186W expects (15, 3) coords, got {coords.shape}")
        E = float(material["EX"])
        nu = float(material["PRXY"])
        return _wedge_ke(coords, E, nu)



[docs]
    @staticmethod
    def me(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
        lumped: bool = False,
    ) -> np.ndarray:
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        if coords.shape != (_WEDGE_N, 3):
            raise ValueError(f"SOLID186W expects (15, 3) coords, got {coords.shape}")
        rho = float(material["DENS"])
        M = _wedge_me(coords, rho)
        if lumped:
            M = np.diag(M.sum(axis=1))
        return M





[docs]
@register
class SOLID186P(ElementBase):
    """13-node degenerate SOLID186 pyramid.

    Default: dedicated Bedrosian apex-singular serendipity kernel
    integrated with a 2×2×2 Duffy collapsed-hex Gauss rule.  This is
    bit-exact (~1 × 10⁻¹¹ rel Frobenius) against MAPDL's ptr_em.emat
    pyramid block on every measured element.  The 2×2×2 rule (not
    3×3×3 or higher) is what MAPDL actually uses: the Bedrosian shape
    functions integrated with a higher-order rule diverge from MAPDL's
    output by ~13% because MAPDL deliberately under-integrates in the
    SOLID186 URI style for pyramids just like it does for hexes.

    The hex-fold wrapper (``T^T · K_20-node · T``) is retained as
    ``material["_SOLID186P_FORMULATION"] = "hex_fold"`` for back-compat
    and regression testing.
    """

    name = "SOLID186P"
    n_nodes = 13
    n_dof_per_node = 3
    vtk_cell_type = 27  # VTK_QUADRATIC_PYRAMID


[docs]
    @staticmethod
    def ke(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
    ) -> np.ndarray:
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        if coords.shape != (13, 3):
            raise ValueError(f"SOLID186P expects (13, 3) coords, got {coords.shape}")
        formulation = material.get("_SOLID186P_FORMULATION", "bedrosian")
        if formulation == "bedrosian":
            E = float(material["EX"])
            nu = float(material["PRXY"])
            rule = material.get("_SOLID186P_PYRAMID_RULE", "mapdl")
            # C++ kernel implements MAPDL's 2×2×2 Duffy rule only —
            # higher-order ``consistent`` stays on the Python path.
            if rule == "mapdl" and _EXT is not None:
                return _EXT.solid186p_ke(coords, E, nu)
            return _pyramid_ke(coords, E, nu, rule=rule)
        if formulation == "hex_fold":
            coords_hex = _expand_to_hex(coords, _PYRAMID_SLOT_MAP)
            K_hex = SOLID186.ke(coords_hex, material, real)
            return _fold(K_hex, _T_PYRAMID)
        raise ValueError(
            f"SOLID186P: unknown _SOLID186P_FORMULATION {formulation!r}; "
            f"expected 'bedrosian' or 'hex_fold'"
        )



[docs]
    @staticmethod
    def me(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
        lumped: bool = False,
    ) -> np.ndarray:
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        if coords.shape != (13, 3):
            raise ValueError(f"SOLID186P expects (13, 3) coords, got {coords.shape}")
        formulation = material.get("_SOLID186P_FORMULATION", "bedrosian")
        if formulation == "bedrosian":
            rho = float(material["DENS"])
            rule = material.get("_SOLID186P_PYRAMID_RULE", "mapdl")
            if rule == "mapdl" and _EXT is not None:
                M = _EXT.solid186p_me(coords, rho)
            else:
                M = _pyramid_me(coords, rho, rule=rule)
        elif formulation == "hex_fold":
            coords_hex = _expand_to_hex(coords, _PYRAMID_SLOT_MAP)
            M_hex = SOLID186.me(coords_hex, material, real, lumped=False)
            M = _fold(M_hex, _T_PYRAMID)
        else:
            raise ValueError(
                f"SOLID186P: unknown _SOLID186P_FORMULATION {formulation!r}; "
                f"expected 'bedrosian' or 'hex_fold'"
            )
        if lumped:
            M = np.diag(M.sum(axis=1))
        return M


    @staticmethod
    def ke_batch(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
    ) -> np.ndarray | None:
        if _EXT is None:
            return None
        formulation = material.get("_SOLID186P_FORMULATION", "bedrosian")
        if formulation != "bedrosian":
            return None
        if material.get("_SOLID186P_PYRAMID_RULE", "mapdl") != "mapdl":
            return None
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        return _EXT.solid186p_ke_batch(coords, float(material["EX"]), float(material["PRXY"]))

    @staticmethod
    def me_batch(
        coords: np.ndarray,
        material: dict[str, float],
        real: np.ndarray,
        lumped: bool = False,
    ) -> np.ndarray | None:
        if _EXT is None or lumped:
            return None
        formulation = material.get("_SOLID186P_FORMULATION", "bedrosian")
        if formulation != "bedrosian":
            return None
        if material.get("_SOLID186P_PYRAMID_RULE", "mapdl") != "mapdl":
            return None
        coords = np.ascontiguousarray(coords, dtype=np.float64)
        return _EXT.solid186p_me_batch(coords, float(material["DENS"]))