discretization.tex

% !TEX root = main.tex

This section describes Trilinos packages that provide tools for spatial and temporal discretization of integro-differential equations. Most of Trilinos discretization efforts have been devoted to implement tools for mesh-based discretizations of partial differential equations (PDEs) with a focus on high-order finite elements. A notable exception is constituted by the research package Compadre, which provides tools for meshless approximation of linear operators that can be used for the discretization of differential equations and for data transfer.
Trilinos discretization packages have been adopted by many applications addressing a wide range of physics problems, including solid mechanics, earth system modeling, semiconductor devices and electro-magnetics. These applications have taken different approaches in adopting Trilinos mesh-based discretization tools. The less intrusive approach is the adoption of Intrepid2 tools to perform local finite element assembly. The application has to manage the global assembly possibly using the \code{DoFManager} provided by Panzer and FE Crs matrix and vector structures provided by Tpetra.
A more intrusive approach is to additionally use the Phalanx package for managing dependencies of field evaluations in conjunction with the Thyra Model Evaluator and Sacado algorithmic differentiations, and possibly Tempus for time integration. This approach is particularly useful when developing complex multiphysics problems because it allows easy re-use of computational kernels and automates the computation of Jacobians and sensitivities.
The most intrusive approach is to build the application around the Panzer package, which provides all of the above, plus the handling of linear and nonlinear solvers and integrated constrained optimization capabilities.
In the following we describe some of the Trilinos discretization packages. For brevity, we do not include the description of these packages: Sierra ToolKit (STK), Krino and Percept, which provide mesh and level-set tools, and Shards, which provides tools for mesh topology. We refer to Trilinos website (\url{https://trilinos.github.io}) for a brief description of these packages.


\subsection{Local Assembly: Intrepid2}
Intrepid2 provides interoperable tools for compatible discretizations of PDEs; it is a performance-portable re-implementation and extension of the legacy Intrepid package \cite{bochev2012}. Intrepid2 mainly focuses on local assembly of continuous and discontinuous finite elements. It also provides limited capabilities for finite volume discretization.  Intrepid2 works on batches of elements (cells), and provides tools to efficiently compute discretized linear functionals (e.g., right-hand-side vectors) and differential operators (e.g., stiffness matrices) at the element level. Intrepid2 implements compatible finite element spaces of various polynomial orders for $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ function spaces on triangles, quadrilaterals, tetrahedrons, hexahedrons, wedges and pyramids. It provides both Lagrangian basis functions and hierarchical basis functions \cite{fuentes2015} and it implements performance optimizations (e.g., sum factorizations) exploiting the underlying structure of the problem (e.g., tensor-product elements or other symmetries).  The degrees of freedom of $H({\rm div})$ and $H({\rm curl})$ finite elements as well as high-order $H({\rm grad})$ finite elements depend on the global orientation of edges and faces and Intrepid2 provides orientation tools for matching the degrees of freedom on shared edges and faces. It also provides interpolation-based projection tools for projecting functions in $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ to the respective discrete spaces. Intrepid2 implements these capabilities through the following classes:
\begin{itemize}
\item \code{CellTools}: This class provides geometric operations on the reference and physical frame. This includes computation of tangents and normals to edges/faces in the physical frame, computation of Jacobian of the reference-to-physical frame maps, and other metric computations. 
\item \code{CubatureFactory}: This class provides several quadrature rules (called \emph{cubatures} in Intrepid2) of various degrees of accuracy for approximating integrals over the elements and their boundaries.
\item \code{Basis:} This is the base class for a variety of basis functions for compatible finite element spaces. Each class includes a \code{getValues()} method that computes the value of the basis functions or their derivatives (e.g., gradient for $H({\rm grad})$ functions, curl for $H({\rm curl})$ functions) at a set of input points. The implementation of \code{getValues()} can be very different depending on the basis. Specific optimizations are available for tensor-product elements.  Additionally, there is a \code{BasisFamily} class with a convenience method, \code{getBasis()}, which constructs a basis depending on a template argument specifying the type of basis (e.g., hierarchical or nodal), the cell topology and function space on which it is defined, and its polynomial degree.
\item \code{OrientationTools}: This class provides methods to orient the basis functions based on the global orientation of edges and faces, determined by the global numbering of the cell vertices. This is achieved by building a linear operator (a permutation for tensor-product elements) that encodes the orientation of a particular cell, and applying that operator to the reference basis functions.
\item \code{ProjectionTools}: This class provides methods for interpolation-based projections of a given function into a compatible finite element space or between compatible finite element spaces~\cite{demkowicz2007}.  The provided projections commute with the corresponding differential operators if the quadrature rules can exactly integrate the functions being projected. As an example, projecting an $H({\rm grad})$ function into the $H({\rm grad})$ finite element space and then taking its gradient gives the same result as taking the gradient of the function first and then projecting the gradient into the $H({\rm curl})$ finite element space.
\item \code{FunctionSpaceTools}: This class provides transformations of fields from reference to physical frame and back, computation of measures on edges, faces and cells, scalar/vector/tensor multiplications and contractions for computing integrals.
\item \code{IntegrationTools}: This class provides integration methods that can take advantage of tensor product structures in basis values, providing mechanisms for performance-portable, \emph{sum-factorized} assembly across $H({\rm grad})$, $H({\rm curl})$, $H({\rm div})$ and $L^2$ function spaces.  In the future, we plan to provide similar interfaces to support matrix-free discretizations.
\end{itemize}
Intrepid2 makes use of Kokkos containers to enable memory layouts that are adapted to the computational platform. Intrepid2 also uses Kokkos for its core computational kernels, enabling threaded execution across a variety of architectures. The data types used by Intrepid2 are templated; it is therefore possible to propagate Sacado types through Intrepid2 to perform automatic differentiation. Current development of Intrepid2 focuses on providing efficient matrix-free discretizations to enhance efficiency on GPU architectures. 

\subsection{Local Field Evaluation: Phalanx}
Phalanx is a local field evaluation library designed for equation assembly in PDE applications. The goal of Phalanx is to decompose a complex problem into a number of simpler problems with managed dependencies to support rapid development and extensibility of PDE codes \cite{Notz2012,pawlowski2012automating,pawlowski2012automatingpart2}. The data structures use Kokkos \cite{trott2021kokkos} for performance portability. Through the use of template metaprogramming, Phalanx supports arbitrary user defined data types and evaluation types. This feature allows for simple integration of automatic differentiation tools via operator-overloaded scalar types from the Sacado package \cite{phipps2022automatic}; cf.~Section~\ref{sec:sacado}. From a simple equation definition, quantities such as Jacobians, Jacobian-vector products, Hessians and parameter sensitivities can be evaluated to machine precision. These quantities can be used by other Trilinos packages for operations including Newton-based nonlinear solves, gradient-based optimization, constraint enforcement, and bifurcation analysis \cite{pawlowski2012automating,pawlowski2012automatingpart2}.

Phalanx uses a graph-based design to manage data dependencies. The runtime-defined directed acyclic graph (DAG) allows for rapid prototyping in a production environment where simple interfaces for analysts, flexible models/data structures, and integration of non-trivial third-party libraries are paramount. Phalanx is used in a number of large-scale parallel codes including Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd} and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.

In recent years, Phalanx has been extended to provide utilities for performance portability under automatic differentiation. For example, Phalanx provides tools for building and managing a Kokkos view-of-views on device without the use of unified virtual memory and provides utilities for running virtual functions on device. In the future, these utilities may be split out into a separate package.

\subsection{Time Integration: Tempus}
Tempus (Latin meaning time as in ``tempus fugit'' $\rightarrow$ ``time flies'') is the Trilinos time-integration package for advanced transient
analysis.  It includes various time integrators and embedded
sensitivity analysis for next-generation code architectures.  Tempus
provides “out-of-the-box” time-integration capabilities, which
allows users to quickly and easily incorporate time-integration
capabilities to their applications and switch between various time
integrators depending on the simulation needs.  Additionally, Tempus
provides “build-your-own” capabilities, which allows applications
to incorporate various Tempus components to augment or replace
an application's transient capabilities. Other capabilities include
embedded error analysis, sensitivity analysis, and transient optimization
with ROL.

Tempus offers a general infrastructure for the time evolution of
solutions through a variety of general integration schemes.  Tempus
provides time integrators for explicit and implicit methods and for
first- and second-order ODEs.  It can be used from small systems of
equations (e.g., single ODEs for the time evolution of plasticity
models, and multiple ODEs for coupled chemical reactions) to
large-scale transient simulations requiring exascale computing
(e.g., flow fields around reentry vehicles and magneto-hydrodynamics).

Tempus has several components that can be used in concert or
individually, depending on the needs of the application.
\begin{itemize}
  \item Integrators are the time-loop structure for time integration
  and provide several features, e.g., control the advancement of
  the solution, selection of the next timestep size and solution
  output.

  \item Time steppers are individual methods that advance the
  solution from one step to the next.  A variety of time steppers
  are available:
  \begin{itemize}
    \item \emph{Classic one-step methods} (e.g., forward Euler and trapezoidal methods)
    \item \emph{Explicit Runge-Kutta (RK) methods} (e.g., RK explicit 4 Stage)
    \item \emph{Diagonally Implicit Runge-Kutta (DIRK) methods} (e.g.,
    general tableau DIRK and many specific DIRK/SDIRK methods)
    \item \emph{Implicit-Explicit (IMEX) Runge-Kutta methods} (e.g., IMEX
    RK SSP2, IMEX RK SSP3, and general tableau IMEX RK methods)
    \item \emph{Multi-Step methods} (i.e., BDF2)
    \item \emph{Second-order ODE methods} (e.g., Leapfrog, Newmark methods
    and HHT-Alpha)
    \item \emph{Steppers with subSteppers} (e.g., operator-split and
    subcycling methods)
  \end{itemize}

  \item Solution history is used to maintain the solution during
  time-step failure, solution restart/output, interpolation of
  solution between time steps, and to provide the solution for
  transient adjoint sensitivities.

  \item Timestep control and strategies provide methods to select
  the time-step size based on user input and/or temporal error
  control (e.g., bounding min/max time-step size, relative/absolute
  maximum error, and timestep adjustments for output and checkpointing).
\end{itemize}

Additionally, Tempus has several mechanisms which allow users to
insert application-specific algorithms into Tempus components (e.g.,
through observers and creating derived classes).

\subsection{Finite Element Analysis: Panzer}
The Panzer package provides global tools for finite element analysis. The package is targeted for large-scale parallel implicit PDEs using continuous and discontinuous compatible finite elements as well as control volume finite elements. Panzer enables the solution of nonlinear problems by interfacing with several Trilinos linear and nonlinear solvers. It uses the Intrepid2 package for arbitrary order finite element bases. It computes derivatives and sensitivities with the Sacado automatic differentiation (AD) package. Panzer supports both Epetra and Tpetra data structures and achieves performance portability through the Kokkos programming model.

Panzer is designed for multiphysics systems. The assembly engine allows for different equation sets in each element block of the mesh and allows for mixed bases for degrees of freedom (DOF) within each element block (e.g., mixed $H(\rm{div})$ and $H(\rm{curl})$ system for electro-magnetics). The assembly tools can create a single fully coupled Jacobian matrix with all off-block dependencies (as a single Tpetra::CrsMatrix), or it can create a blocked system, grouping sets of DOFs into separate explicit matrices. This allows for physics-based block preconditioning strategies using the Teko block preconditioner package in Trilinos \cite{Bonilla2023,Cyr2016a} or MueLu's fully coupled AMG hierarchies for multiphysics systems~\cite{Ohm2022a}. The assembly process relies on the Phalanx package for efficient assembly of the multiphysics systems. Panzer additionally can wrap the assembly in a \code{Thyra::ModelEvaluator}, providing direct interfaces to the linear and nonlinear analysis packages in Trilinos.

Panzer can provide low level utilities for application codes to build on, or can be used as a high level application framework. Important capabilities include the following:
\begin{itemize}
\item \code{DOFManager}: Panzer provides a stand-alone DOF manager class in the dof-mgr subpackage. Given a list of DOFs and their corresponding basis and element blocks, the DOF manager can provide the mapping from DOFs on the mesh entities to the entries in linear algebra objects such as a residual vector or Jacobian matrix. The DOF manager can return the objects required to build distributed Tpetra 
%and Epetra 
maps and graphs for both uniquely owned global indices and ghosted indices used during assembly. 
%\todo{Remove ``Epetra''?}
It provides the local indexing used during assembly as well. The DOF manager contains a mesh abstraction called a connection manager that provides information about mesh connectivity, global numbering of mesh topological entities and element block groups. It is designed to support any underlying mesh database, allowing applications to use the DOF manager with any finite element application.
\item \code{STK\_Interface}: Panzer contains a concrete implementation of the connection manager API for the STK mesh database package in Trilinos. The \code{STK\_Interface} object wraps a STK mesh database. It provides a simple interface for accessing global indices on the mesh and can be used to read/write associated solution data to the database. It additionally supports SEACAS use for writing mesh data to disk. This \code{STK\_Interface} also includes support for periodic boundary conditions.  This capability can match topological entities on periodic parallel distributed faces. Once matched, the DOFs are unified to enforce periodicity for the DOF manager. This capability is in the adapters-stk subpackage.
\item \emph{Linear Object Factory:} Panzer provides a linear object factory and linear object container designed to support parallel distributed assembly. 
%It supports both Epetra and Tpetra objects. 
The returned containers hold the linear algebra objects for either a uniquely owned DOF map used for solving the linear system or a ghosted version of the linear objects that can be used for assembly. The containers support export and import operations between the unique and ghosted containers. These are used to simplify the assembly process and abstract the underlying Tpetra linear algebra objects. %(i.e., Epetra or Tpetra). 
The linear object factory can also create DOF gather and scatter functors for the corresponding (and possibly blocked) matrices. The gather and scatter operations are specialized on the assembly types, including residuals, Jacobians, and Tangents.
Support for Hessians will be implemented as needed.
The capability is found in the disc-fe subpackage.

\item \emph{Worksets:} The Panzer assembly process provides workset containers to hold static data that can be reused in finite element computations. For example, when using a static mesh, these containers could hold the Jacobian, Jacobian inverse and all finite element basis values at the quadrature points. The workset concept breaks the local elements on an MPI process into smaller blocks of uniform computations that can be dispatched to a CPU or GPU. The workset helps to control total memory use for temporaries in the Phalanx directed acyclic graph. This capability is part of the disc-fe subpackage.
\item \emph{Assembly Kernels:} Panzer provides a number of assembly kernels that are optimized for use with Sacado AD data types. When assembling on GPUs, the assembly kernels were found to be an order of magnitude faster if the AD evaluation was parallelized over the internal derivative dimension \cite{phipps2022automatic}. This required using kernels with a \code{Kokkos::TeamPolicy} for finite element assembly where the lowest level of parallelism is used internally by the Sacado AD type. When building Panzer for GPUs, the \code{DFad} hierarchic parallelism flag in Sacado should be enabled for kernel performance. The kernels can be found in the disc-fe subpackage. 
\item Panzer provides an example miniapp for implicit electro-magnetics that is used for benchmarking linear solver performance and for acceptance testing of new high performance computing systems. It demonstrates an $H(\rm{curl})$-$H(\rm{div})$ formulation for the electric and magnetic fields. This is found in the MiniEM subpackage.
\end{itemize}

The Panzer package is intended to provide both low- and high-level tools for implicit finite elements discretizations. The high level tools aggregate many Trilinos discretization and solver packages. While the high level tools can be used as a rapid prototyping environment, the front end is fairly complex to set up, as opposed to true rapid prototyping frameworks such as deal.ii \cite{dealII95}, FEniCSx \cite{BarattaEtal2023} or Firedrake \cite{FiredrakeUserManual}. Panzer is not user friendly in this regard, however the examples and miniapps are a good starting point and can be quickly adapted to other physics. A number of performance portable applications are using Panzer tools at different levels of adoption; these include Albany \cite{Salinger2016}, Charon \cite{CharonUsersManual2020}, Drekar \cite{Crockatt2022,Miller2019,Shadid2016mhd} and EMPIRE \cite{BettencourtBrownEtAl2021_EmpirePic}.

\subsection{Approximation of Linear Operators: Compadre}

The Compadre package provides tools for the approximation of linear operators (including point evaluation and derivatives), given the location of samples of a function over an unstructured cloud of points. The resulting stencils, when applied directly to samples of the function at these locations, provide an approximation of the linear operator acting on the function at the point(s) queried. This is useful for meshed and meshless data transfer applications (remap). Samples of the function at the specified locations can also be viewed as unknowns, in which case the solution returned by Compadre can be used as a stencil for meshless discretization of PDEs \cite{REBAR2024}.

The package uses generalized moving least squares (GMLS) for approximating functionals. GMLS generalizes classic moving least squares (MLS) in the sense that selecting point evaluations for the sampling functionals and the target operator in GMLS provides a traditional moving least squares reconstruction. However, GMLS provides much more flexibility than MLS by allowing one to possibly replace the evaluation of the target function at points with \emph{sampling} functionals of a function (e.g., integrals of the function over local domains),  and, instead of approximating the target function, to approximate a target \emph{operator} (e.g., gradient, curl, divergence) of a function. A detailed description of GMLS can be found in \cite{mirzaei2012generalized,wendland2004scattered}.

With this increased flexibility, exotic examples of remap are possible. For instance, it is possible to use an average vector normal integral over edges (Raviart-Thomas type representation) or a cell-average integral (finite-volume representation) as the sampling functionals or the target operator \cite{gmd-15-6601-2022}. Compadre supports full space reconstruction in 1D, 2D, and 3D, and also supports select sampling functionals and target operators on manifolds \cite{GROSS2020109340}.

Compadre's stencil generation involves independent problems to be solved in parallel at the team level with loops over the thread and vector level within each problem. This hierarchical parallelism is achieved with performance portability by using the Kokkos programming model and leveraging the batched QR with pivoting algorithm implemented in Kokkos Kernels.

Future plans include the implementation of other meshless methods like locally-supported radial basis functions.