You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.
Go to headerGo to navigationGo to searchGo to contentsGo to footer
In content section. Select this link to jump to navigation

MINI Element for the Navier–Stokes System in 3D: Vectorized Codes and Superconvergence

Abstract

A fast vectorized codes for assembly mixed finite element matrices for the generalized Navier–Stokes system in three space dimensions in the MATLAB language are proposed by the MINI element. Vectorization means that the loop over tetrahedra is avoided. Numerical experiments illustrate computational efficiency of the codes. An experimental superconvergence rate for the pressure component is established.

1Introduction

Numerical solution algorithms for the Navier Stokes equations is the rapidly developing field, in which various cost-effective methods are being proposed. This can be domain decomposition methods (Rønquist, 1996; Girault and Wheeler, 2008), methods based on splitting (Henriksen and Holmen, 2002; Viguerie and Veneziani, 2018), asymptotic expansions (Panasenko, 1998; Hoanga and Martinez, 2018), multigrid methods (Griebel et al., 1998; Pernice and Tocci, 2001) etc. In this paper we use a simple Schur complement algorithm. A survey of the Schur complement methods can be found in Loghin and Wathen (2002).

The paper focuses on the generalized Navier–Stokes system in three space dimensions (3D) approximated by the mixed finite element method using the MINI element called also the P1-bubble/P1 pair (Arnold et al., 1984). The main contribution of the paper consists in the development of the vectorized codes for fast assembly finite element matrices in the MATLAB language so that the loop over tetrahedra is avoided. These codes are very fast and enable to perform experiments with relatively large scale problems. Vectorized codes were proposed for different differential operators, e.g. see Rahman and Valdman (2015) and references therein. Our research is inspired by Koko (2019), where the vectorized codes for the generalized Stokes problem are proposed. However, an extension to the Navier–Stokes problem is not trivial. Formally, it consists of adding the nonlinear convective term in the momentum equation. This nonlinearity is typically treated iteratively using the Oseen or the Newton type linearization (Elman et al., 2014). In the first case, the Oseen convection matrix depends on the nodal velocity field from the previous iteration that is presented in computations. The situation for the Newton convective matrix is more involved, since it depends, in addition, on the partial derivatives of the velocity field that are not immediately presented. We approximate them from appropriate directional derivatives so that computed approximations are invariant with respect to local renumbering of nodes and can by easily vectorized. Another ingredient in the assembling process is the bubble component elimination that is performed by the Schur complement reduction on the element level. This elimination requires inverting blocks of the local matrices that is done by the vectorized Cramer rule. The elimination itself uses a vectorized variant of linear combinations of vectors and of sums of vector outer products. Note that the vectorized codes for the Navier–Stokes problem play an important role in the whole solution process, since the finite element matrices for the linearized subproblems are repeatedly assembled in each iterative step that have to be fast.

Numerical experiments with our codes show superconvergence rate of the finite element approximation of the pressure component that is close to O(h3/2) or higher. Similar results were observed for the pure Stokes problem by Cioncolini and Boffi (2022).

The rest of the paper is organized as follows. Section 2 presents the classical and weak formulation of the problem and introduces the basic iterative schemes. In Section 3, we describe mixed finite element approximation based on the MINI element, for which we derive local linear systems that are split on the bubble and non-bubble components. We present also an idea how to approximate partial derivatives from the discrete vector field of the previous iteration. In Section 4, we discuss element matrices in more details so that their final forms may be coded by vectorized operations. Section 5 introduces ideas of the vectorization and refers to our free available codes. In Section 6, we describe the dual implementation of the iterative scheme that we use in computations. Section 7 is devoted to numerical experiments. First, we demonstrate low time requirements of the vectorized codes and than we compute the convercence rates. Finally, we conclude with some remarks and comments in Section 8.

To better understand our presentation we use different font styles. The mathematics bold symbols are used for the vector functions or their vector arguments, e.g. u, p. The “text up” symbols are used for matrices, vectors and scalars on the level of finite elements, e.g. Akk, Bl, p. The “bold up” symbols are used for global matrices and vectors, e.g. A, BU, p. Finally, the typewriter style is used for codes, e.g. for p = 1:4.

2Formulation

Let ΩR3 be a bounded domain with a sufficiently smooth boundary Γ=Ω. We consider the steady Navier–Stokes system with the homogeneous Dirichlet boundary condition:

(1)
νΔu+u·u+αu+p=finΩ,·u=0inΩ,u=uDonΓ,
where ν>0 is the kinematic viscosity, α>0 is a constant from a time discretization of the unsteady problem, and f:ΩR3 represents the external forces. We are searching for the vector velocity field u:ΩR3 and the scalar pressure field p:ΩR. The existence and uniqueness of a weak solution to (1) are discussed in Girault and Raviart (1986).

We will consider two iterative methods for solving (1) with different linearizations of the convection term u·u. Let w be an approximation of u and δw be an increment such that u=w+δw. The Oseen type linearization is based on the omission of the linear term with δw:

u·u=(w+δw)·u=w·u+δw·uw·u.
In the Newton type linearization we omit the quadratic term with δw:
u·u=w·u+δw·(w+δw)=w·u+(uw)·w+δw·δww·u+u·ww·w.
We arrive at the following linearizations of (1):
(2)
νΔu+w·u+ρu·w+αu+p=f+ρw·winΩ,·u=0inΩ,u=0onΓ,
where w:ΩR3 is given. For ρ=0/1, we get the Oseen/Newton linearization of (1).

The weak formulation of (2) requires the following spaces:

V=(H01(Ω))3,Q={qL2(Ω):Ωqdx=0}
and the forms:
a(u,v)=νi=13Ωui·vidx,c(w;u,v)=Ω(w·u)·vdx,m(u,v)=αΩu·vdx,b(v,q)=Ωq(·v)dx,aρ(w;u,v)=a(u,v)+c(w;u,v)+ρc(u;w,v)+m(u,v),lρ(w;v)=Ωf·vdx+ρc(w;w,v),
where u,v,w(H1(Ω))3, qL2(Ω), and u=(u1,u2,u3), v=(v1,v2,v3).

The weak formulations of (2) read as follows:

(3)
Find(u,p)V×Qsuch that for all(v,q)V×Q,aρ(w;u,v)+b(v,p)=lρ(w;v),b(u,q)=0.
The problem (1) can be solved by the following iterative scheme:
(4)
Given(u(0),p(0))V×Q.Forκ1,find(u(κ),p(κ))V×Qsolving (3) withw=u(κ1).
This scheme will be called the Oseen/Newton iteration for ρ=0/1, respectively. A comprehensive discussion of convergence results can be found in Elman et al. (2014), pp. 344–346. The Oseen iteration are also caled the Picard iteration.

3Mixed Finite Element Approximation with the MINI Element

We approximate (3) by the mixed finite element method. It requires to choose a finite element pair satisfying the inf-sup condition (Brezzi and Fortin, 1991). Here, we use the P1-bubble/P1 pair called also the MINI element proposed by Arnold et al. (1984).

We suppose that Ω is a polyhedral domain. Let Th be a regular partition of Ω given by tetrahedra TjTh, 1jnt, Let xiΩ, 1inp, be finite element nodes. Each tetrahedron T=Tj has 4 vertices xi1,xi2,xi3,xi4T. Each vertex of T is associated with the local linear basis function ϕj=ϕj(x) such that ϕj(xik)=δjk, 1j,k4. The bubble function is defined on T by the product: ϕb=ϕb(x)=44ϕ1(x)ϕ2(x)ϕ3(x)ϕ4(x), xT. We denote also ϕ5=ϕb. Integrals over T can be evaluated by the formula:

Tϕ1α1(x)ϕ2α2(x)ϕ3α3(x)ϕ4α4(x)dx=6|T|α1!α2!α3!α4!(α1+α2+α3+α4)!,
where |T| is the tetrahedron volume.

On Th we introduce the space of the bubble functions and the piecewise linear functions:

Bh={vhC(Ω¯):vh|T=cTϕb,cTRTTh},Wh={vhC(Ω¯):vh|TP1(T)TTh},
respectively, where P1(T) is the space of all polynomials on T of the degree at least one. We approximate V and Q as follows:
Vh={vh(WhBh)3:vh(xi)=0xiΓ},Qh={qhWh:Ωqhdx=0},
respectively. The finite element approximation of (3) reads as follows:
(5)
Find(uh,ph)Vh×Qhsuch that for all(vh,ph)V0h×Qh,aρ(wh;uh,vh)+b(vh,ph)=lρ(wh;vh),b(uh,qh)=0,
where whVh is an approximation of w from (3).

On TTh the components of uh=(uh1,uh2,uh3) are the linear combinations of the basis functions:

uhk(x)=j=14ukjϕj(x)+ukbϕb(x),1k3,andph(x)=j=14pjϕj(x).
The linear systems arising from (5) over one element T read as:
(6)
A¯11A¯12A¯13B¯1A¯21A¯22A¯23B¯2A¯31A¯32A¯33B¯3B¯1B¯2B¯30u¯1u¯2u¯3p=f¯1f¯2f¯30,
where u¯k=(uk1,uk2,uk3,uk4,ukb), 1k3, and p=(p1,p2,p3,p4). The matrices A¯klR5×5 are algebraic counterparts of the form aρ:
A¯kk=R¯+C¯+ρN¯kk+M¯,A¯kl=ρN¯klforkl,
where R¯ is the diffusion matrix, C¯ is the Oseen convection matrix, N¯kl are the Newton convection matrices, 1k,l3, and, M¯ is the mass matrix. Further, B¯kR4×5 are the components of the divergence matrix representing the form b and the right-hand sides f¯kR5, 1k3, correspond to the form lρ. The entries are given as follows:
(7)
(R¯)ij=νTϕj(x)·ϕi(x)dx,(C¯)ij=k=13Twhk(x)kϕj(x)ϕi(x)dx,(N¯kl)ij=Tlwhk(x)ϕj(x)ϕi(x)dx,(M¯)ij=αTϕj(x)ϕi(x)dx,(f¯k)i=Tfk(x)ϕi(x)dx+ρl=13Twhl(x)lwhk(x)ϕi(x)dx
for 1i,j5, where whk are the components of wh, and
(8)
(B¯k)ij=Tkϕj(x)ϕi(x)dx
for 1i4 and 1j5.

Recall that wh represents the velocity field from the previous iteration of discretized analogy of (4). In the discrete case we know only nodal values of wh at the vertices xi of T, 1i4. In (C¯)ij, (N¯kl)ij, and (f¯k)i we will use constant approximations of whk and lwhk on T. The approximation of whk is defined by the arithmetic mean:

wk=14(wk1+wk2+wk3+wk4),1k3,
where wki=whk(xi), 1i4. The approximation of lwhk denoted by δlwk will be taken by their value at x1: δlwk=lwhk(x1). The relations between the gradient whk(x1)=(δ1wk,δ2wk,δ3wk) and the derivatives in the directions xi+1x1 lead to three linear systems of the form:
(9)
whk(x1)·(xi+1x1)=whk(xi+1)whk(x1),1i3,
from which δlwk, 1k,l3, can be computed. Note that the solutions are invariant with respect to (local) renumbering of the vertices xi of T. Using the same symbols for the respective approximations of (C¯)ij, (N¯kl)ij, and (f¯k)i, we get:
(10)
(C¯)ij=k=13wkTkϕj(x)ϕi(x)dx,(N¯kl)ij=δlwkTϕj(x)ϕi(x)dx,
(11)
(f¯k)i=Tfk(x)ϕi(x)dx+ρl=13wlδlwkTϕi(x)dx.
Hence, C¯ is the linear combination of B¯k and N¯kl are multiples of M¯ so that the Ossen and Newton matrices may be assembled from the matrices of the Stokes problem.

The element matrices and vectors will be divided on the non-bubble and bubble components:

R¯=RrrωR,C¯=CcUcLωC,N¯kl=NklnklnklωNkl,M¯=M(α)m(α)m(α)ωM(α),B¯k=(Bk,Bkb),f¯k=fkfkb,u¯k=ukukb,
where R,M(α),Bk,C,NklR4×4, r,m(α),Bkb,cU,cL,nkl,fk,ukR4, and ωR, ωM(α),ωC,ωNkl,fkb,ukbR for 1k,l3.

4Element Matrices

The tetrahedron TTh is represented by four vertices xi=(xi,yi,zi), 1i4. We denote the entries of xixj by x[ij]=xixj, y[ij]=yiyj, z[ij]=zizj and wk[ij]=wkiwkj, ij.

4.1Derivatives of the Basis Functions

The constant values of the basis functions derivatives on T are given by the following formulas (see Arzt, 2019):

1ϕ11ϕ21ϕ31ϕ4=1det(X)x,2ϕ12ϕ22ϕ32ϕ4=1det(X)y,3ϕ13ϕ23ϕ33ϕ4=1det(X)z,
where
x=y[42]z[32]y[32]z[42]y[31]z[41]y[41]z[31]y[41]z[21]y[21]z[41]y[21]z[31]y[31]z[21],y=x[32]z[42]x[42]z[32]x[41]z[31]x[31]z[41]x[21]z[41]x[41]z[21]x[31]z[21]x[21]z[31],z=x[42]y[32]x[32]y[42]x[31]y[41]x[41]y[31]x[41]y[21]x[21]y[41]x[21]y[31]x[31]y[21],X=x[21]x[31]x[41]y[21]y[31]y[41]z[21]z[31]z[41].
The volume |T| of the tetrahedron T can be computed by:
|T|=16|det(X)|=16|x[21]x2+x[31]x3+x[41]x4|.

4.2Approximation of the Velocity Derivatives

The linear systems (9) read as follows:

x[21]y[21]z[21]x[31]y[31]z[31]x[41]y[41]z[41]δ1wkδ2wkδ3wk=wk[21]wk[31]wk[41]for1k3.
Using Cramer’s rule, we get:
δ1wk=(wk[21]x2+wk[31]x3+wk[41]x4)/det(X),δ2wk=(wk[21]y2+wk[31]y3+wk[41]y4)/det(X),δ3wk=(wk[21]z2+wk[31]z3+wk[41]z4)/det(X)
for 1k3.

4.3Stokes Matrices

The following formulas are adopted from Koko (2019). For the diffusion matrix we get:

R=ν36|T|(xx+yy+zz),ωR=2048ν8505|T|(x12+y12+z12x2(x3+x4)x3x4y2(y3+y4)y3y4ωR=z2(z3+z4)z3z4),
and r=0. For the mass matrix we get:
M(α)=α|T|202111121111211112,m(α)=8α|T|1051111,ωM(α)=8192α|T|51975.
For the divergence matrix we get:
B1=S24xxxx,B2=S24yyyy,B3=S24zzzz,B1b=16S315x,B2b=16S315y,B3b=16S315z,
where S=sgn(det(X)). The signum-function is due to |det(X)|/det(X).

4.4Oseen and Newton Convection Matrices

Using Green’s formula and ϕb=ϕ5=0 on T, we obtain:

(C¯)i,5=k=13wkTkϕb(x)ϕi(x)dx=k=13wkTkϕi(x)ϕb(x)dx=(C¯)5,i
for 1i4. It implies cL=cU. Further:
ωC=(C¯)5,5=k=13wkTkϕb(x)ϕb(x)dx=k=13wki=14kϕi(x)Ii=0,
since i=04kϕi(x)=0 on T and Ii=44Tjiϕj(x)ϕb(x)dx does not depend on i. Comparing the first equality in (10) with (8), we can write C=k=13wkBk and cU=k=13wkBkb. For the Oseen convection matrix we get:
C=S24w1xxxx+w2yyyy+w3zzzz,cU=16S315(w1x+w2y+w2z),
cL=cU, and ωC=0.

Comparing N¯ in (10) with M¯ in (7), we get for the Newton convection matrices:

Nkl=M(δlwk),nkl=m(δlwk),ωNkl=ωM(δlwk)for1k,l3.

4.5Right-Hand Side Vectors

The functions fk are approximated on T by the mean values. Denoting:

f˜k=14i=14fk(xi)+ρl=13wlδlwk,
we get from (11):
fk=|T|4f˜k1111,fkb=32|T|105f˜kfor1k3.

4.6Bubble Components Elimination

Let us permute the system (6) as follows:

(12)
A11A12A13ZU,11ZU,12ZU,13B1A21A22A23ZU,21ZU,22ZU,23B2A31A32A33ZU,31ZU,32ZU,33B3ZL,11TZL,21ZL,31ω11ω12ω13B1bZL,12ZL,22ZL,32ω21ω22ω23B2bZL,13ZL,23ZL,33ω31ω32ω33B3bB1B2B3B1bB2bB3b0u1u2u3u1bu2bu3bp=f1f2f3f1bf2bf3b0,
where Akk=R+C+ρNkk+M(α), ZU,kk=cU+ρnkk+m(α), ZL,kk=cU+ρnkk+m(α), ωkk=ωR+ρωNkk+ωM(α), and Akl=ρNkl, ZU,kl=ZL,kl=ρnkl, ωkl=ρωNkl, kl, for 1k,l3. For the Oseen linearization (ρ=0), the blocks (1,1), (1,2), (2,1), (2,2) (denoted by lines) of the matrix in (12) are block diagonal. For this simpler case the bubble component elimination is described in Koko (2019). Let us consider the Newton linearization (ρ0). The middle block equations in (12) gives:
(13)
u1bu2bu3b=W1f1bf2bf3bZL,11ZL,21ZL,31ZL,12ZL,22ZL,32ZL,13ZL,23ZL,33u1u2u3B1bB2bB3bp,
where W=(ωij)R3×3. The inverse W1=(ωˆij) is given by Cramer’s rule:
det(W)=ω11ω22ω33+ω12ω23ω31+ω13ω21ω32ω13ω22ω31det(W)=ω12ω21ω33ω11ω23ω32,ωˆ11=(ω22ω33ω23ω32)/det(W),ωˆ12=(ω13ω32ω12ω33)/det(W),ωˆ13=(ω12ω23ω13ω22)/det(W),ωˆ21=(ω23ω31ω21ω33)/det(W),ωˆ22=(ω11ω33ω13ω31)/det(W),ωˆ23=(ω13ω21ω11ω23)/det(W),ωˆ31=(ω21ω32ω22ω31)/det(W),ωˆ32=(ω12ω31ω11ω32)/det(W),ωˆ33=(ω11ω22ω12ω21)/det(W).
Applying (13) in the first and the last block equations of (12) we arrive at:
(14)
Aˆ11Aˆ12Aˆ13BˆU,1Aˆ21Aˆ22Aˆ23BˆU,2Aˆ31Aˆ32Aˆ33BˆU,3BˆL,1TBˆL,2BˆL,3Eˆu1u2u3p=fˆ1fˆ2fˆ3gˆ,
where
Aˆkl=Aklj=13ZˆU,kjZL,lj,fˆk=fkj=13ZˆU,k1fkb,BˆU,k=Bkj=13BjbZˆU,kj,BˆL,k=Bkj=13BˆjbZL,kj,Eˆ=j=13BˆbjBbj,gˆ=j=13Bˆbjfbj,ZˆU,kl=j=13ZU,kjωˆjl,Bˆkb=j=13Bjbωˆjk
for 1k,l3. These formulas can be easily vectorized, since they are given by linear combinations of vectors or by outer products of vectors.

5Vectorized Coding

The local matrices and vectors Aˆkl, BˆU,k, BˆL,k, Eˆ, fˆk, gˆ on TjTh, 1jnt, derived in (14), are associated with the global ones Akl, BU,k, BL,k, E, fk, g, 1k,l3, respectively, through the ordered index-set J=[j1,j2,j3,j4] determined by the vertices xj1,xj2,xj3,xj4 of Tj. For instance:

(Akl)JJ:=(Akl)JJ+Aˆkl,(fk)J:=(fk)J+fˆk,
and analogously for other matrices and vectors. An usual assembly procedure uses three loops: the outer loop over all Tj and two (one in case of vectors) inner loops over indices of J. In vectorized coding we interchange positions of the loops so that the short loops over J are the outer ones while the inner loop over all Tj is replaced by appropriate vectorized operations. To demonstrate this process, we show in an abstract setting how to vectorize outer products of vectors appearing in Aˆkl, BˆU,k, BˆL,k, and Eˆ.

We assume that for TjTh, 1jnt, the indices of J are stored in the j-th row of the array t of the size nt×4 so that:

J=t(j,1:4).
Let vj,wjR4 be column vectors that define the local contribution to the (abstract) global matrix A corresponding to Tj so that:
AJJ:=AJJ+vjwj.
We assume also that vj,wj are stored in the j-th rows of the arrays v, w of the size nt×4 so that:
vj=v(j,1:4),wj=w(j,1:4),
respectively. Using three loops we update A (the MATLAB representation of A) as follows:

infor543_g001.jpg

The same effect can be achieved by the following vectorized code:

infor543_g002.jpg

Here, the dot-product “.*” and the addition “+” are the vectorized MATLAB operations that are performed on the low level and, therefore, are fast. For more details, we refer to Koko (2019) and to our free available codes (Kučera et al., 2023).

6Algebraic Iterative Scheme

The algebraic version of the iterative scheme (4) reads as follows:

(15)
Given(u(0),p(0))R3np×Rnp.Forκ1,solveA(u(κ1))BU(u(κ1))BL(u(κ1))E(u(κ1))u(κ)p(κ)=f(u(κ1))g(u(κ1)),
where A=A(u(κ1)), BU=BU(u(κ1)), BL=BL(u(κ1)), and f=f(u(κ1)) are composed from the blocks Akl, BU,k, BL,k, and fk, 1k,ld, respectively. Dependency of all blocks on the velocity field from the previous iteration u(κ1) is due to the bubble component elimination (on the element level). Other artefacts of this elimination are the presence of the blocks E=E(u(κ1)), g=g(u(κ1)) and the fact that BUBL. The linear system in (15) is naturally adapted by the homogeneous Dirichlet boundary data of (1)3. We initialize (15) by u(0)=0, p(0)=0.

Our implementation of (15) is based on an inexact dual strategy. In each step of (15) we solve iteratively the Schur complement linear systems:

(16)
Sp(κ)=d,
where S=BLA1BU+E, d=BLA1fg. The precision of p(κ) computed from (16) is driven adaptively with respect to the precision achieved in the outer iterations of (15). One iteration in (15) requires the following points:
  • Assembling A, BU, BL, E, f, and g by a vectorized code.

  • Computing LU-factorization of A with the complete pivoting that results in the lower, upper triangular matrices L, U, respectively, and in two permutation matrices P, Q such that PAQ=LU.

  • Assembling d=BL(Q(U1(L1(Pf))))g and activating the procedure for matrix-vector products Sq based on Sq=BL(Q(U1(L1(P(BUq)))))+Eq.

  • Solving (16) using the BiCGSTAB method (Elman et al., 2014) which applies the matrix-vector procedure from the previous point. The BiCGSTAB iterations start from the initial approximation p(κ1) and end if the adaptive inner terminating tolerance is achieved:

    tolBiCGSTAB(κ)={rtol×err(κ1),cfact×tolBiCGSTAB(κ1)},
    where 0<rtol<1, 0<cfact<1, and err(κ1) is the outer terminating criterion (err(0)=1 and tolBiCGSTAB(0)=rtol/cfact). The mass matrix is the preconditioner in the BiCGSTAB method (Elman et al., 2014).

  • Computing u(κ)=Q(U1(L1(P(fBUp(κ))))).

  • Stop if the outer terminating criterion is sufficiently small:

    err(κ):=(u(κ),p(κ))(u(κ1),p(κ1))(u(κ),p(κ))+1ε,
    otherwise perform the next outer iteration with κ:=κ+1.

7Numerical Experiments

We consider three test problems defined on the unit cube with known analytic solutions. They are 3D extensions of the well-known test problems of computational fluid dynamics in 2D. First, we examine time requirements of the assembly operations for different discretizations. Then we investigate convergence properties of the finite element approximation. All computations are done in MATLAB R2021a on supercomputer Karolina (IT4Innovations, 2023). Meshes are generated by free available iso2mesh generator (Fang, 2018). The structured partition Th of Ω=[0,1]3 is defined so that Ω is first divided onto n3 cubes of the same size and, then, each of them is divided onto five tetrahedra. In this case we have the mesh norm h=3/n. The mesh norms for unstructured partitions are computed as maxima over all tetrahedra. See Fig. 1 for examples of structured and unstructured meshes. Recall that np is the number of the finite element nodes and nt is the number of the tetrahedra. The iterative scheme (15) uses the following parameters: ε=105 and rtol=cfact=0.9.

Fig. 1

Structured (left) and unstructured (right) mesh for test problems.

Structured (left) and unstructured (right) mesh for test problems.

7.1Test Problem #1

We consider the functions u=(u1,u2,u3) and p in Ω=[0,1]3 as follows:

u1(x,y,z)=4(1cos(2πx))sin(2πy)z(1z),u2(x,y,z)=4sin(2πx)(cos(2πy)1)z(1z),u3(x,y,z)=0,p(x,y,z)=2π(cos(2πx)+2cos(2πy)cos(2πz)).
These functions solve the problem (1) with f:=νΔu+u·u+αu+p. The problem corresponds to a flow formed by a vortex rotating around the central axis of the cube so that the velocity field is parallel to the xy plane with the maximal magnitude in the central plan z=0.5.

7.2Test Problem #2

We consider the functions u=(u1,u2,u3) and p in Ω=[0,1]3 as follows:

u1(x,y,z)=(1cos(2πx))sin(2πy)sin(2πz),u2(x,y,z)=2sin(2πx)(cos(2πy)1)sin(2πz),u3(x,y,z)=sin(2πx)sin(2πy)(1cos(2πz)),p(x,y,z)=2π(cos(2πx)+2cos(2πy)cos(2πz)).
These functions solve the problem (1) with f defined as in the problem #1. The flow is a vortex of a fusiform character around the central axis of the cube, i.e. rising and falling spirals above the xy plane.

7.3Test Problem #3

We consider the functions u=(u1,u2,u3) and p in Ω=[0,1]3 as follows:

u1(x,y,z)=x2(1x)22y(1y)(12y)2z(1z)(12z),u2(x,y,z)=2y2(1y)22x(1x)(2x1)2z(1z)(12z),u3(x,y,z)=z2(1z)22y(1y)(12y)2x(1x)(12x),p(x,y,z)=x(1x)y(1y)(1z).
These functions solve the problem (1) with f defined as in the problem #1. The flow character is a vortex of the form as in the problem #2 but now described by the polynomial functions instead of the trigonometric ones.

7.4Example 1: Time Demands

In Tables 12 we denote by A_time(V), A_time(L) the CPU time for assembly operations when the vectorized code or the loop over tetrahedra is used, respectively. S_time is the CPU time for solving the respective linear system.

In Table 1 we report assembly time of the vectorized operations and the loop over tetrahedra on meshes of the cube Ω=[0,1]3 computed by:

ratio_1=A_time(L)/A_time(V).
One can see that the loop over tetrahedra is extremely unefficient for large scale problems. These tests are computed for the Oseen linearization with ν=0.5 and α=1.

Table 1

Loop over tetrahedra versus vectorized code.

np72921974913926115625297915065391125
nt25608640204804000069120135000233280425920
A_time(V)8.1e−032.8e−025.9e−021.3e−012.4e−016.4e−011.2e+002.1e+00
A_time(L)2.4e−011.9e+002.0e+011.4e+024.7e+021.9e+036.0e+032.0e+04
ratio_129.666.8333.61079.61952.33020.95079.39299.0
Table 2

Loop over tetrahedra: ν=0.5.

Oseen linearizationNewton linearization
np491315625491315625
nt20480691202048069120
A_time(L)2.17e+013.22e+022.56e+015.05e+02
S_time1.57e−011.26e+004.28e−013.41e+00
ratio_2(L)0.9930.9960.9840.993

Tables 23 are obtained by solving the problem #1. We report the ratio of the assembly operations per iteration of the scheme (15) computed by:

ratio_2(X)=A_time(X)/(A_time(X)+S_time),X=L,V.
Table 2 is computed by the loop over tetrahedra. It shows that about 99% computational time per iteration takes assembling of matrices. Similar behaviour was observed by Koko (2016) for linear elasticity problems.

Table 3 is computed by the vectorized codes. It is seen that the relative efficiency of the vectorized assembly operations is higher for large scale problems. Comparing S_time we see that the first order Oseen linearization is faster than the second order Newton linearization. A heuristic explanation of this fact consists in more complicated structure of the Newton matrices.

Table 3

Vectorized code: ν=0.5;0.05.

Oseen linearizationNewton linearization
np103823166375250047103823166375250047
nt48668078732011916404866807873201191640
A_time(V)7.60e+001.52e+012.50e+016.77e+001.46e+012.18e+01
S_time3.45e+017.53e+011.58e+029.21e+012.30e+025.13e+02
ratio_2(V)0.1800.1680.1370.0690.0600.041
A_time(V)7.66e+001.50e+012.19e+017.19e+001.52e+012.28e+01
S_time5.47e+011.12e+022.04e+021.21e+022.74e+026.78e+02
ratio_2(V)0.1230.1180.0970.0560.0520.032

7.5Example 2: Convergence Rate on Structured Meshes

In this example, we investigate experimentally convergence rates of finite element approximations computed by our vectorized codes on structured meshes. The following optimal convergence result is proved in Boffi et al. (2013) for the Stokes problem and the MINI element:

(17)
uuhH1+pphL2Ch(uH2+pH1),
where C>0 does not dependent on h. Cioncolini and Boffi (2019) and again Cioncolini and Boffi (2022) studied experimentally convergence rates of the Stokes problem in 2D and 3D, respectively. Note that the bound (17) is valid also for the Navier–Stokes problem as it follows from Girault and Raviart (1986) but under more complicated assumptions (pp. 101, Theorem 4.1). The formula (17) indicates that uuhH1 as well as pphL2 converge linearly. In Tables 49, we compute convergence rates for the test problems #1– #3 with ν=0.5,0.05 and α=0. From the obtained results we can conclude that the experimental convergence rates are close to the following ones: uuhL2=O(h2), pphL2=O(h3/2), and uuhH1=O(h) and, in some cases, they are higher. Especially, the convergence rate of the pressure component shows the superconvergence property. This result was theoretically proved for the pure Stokes problem in 2D by Eichel et al. (2011). It was experimentally confirmed in above mentioned papers of Cioncolini and Boffi but, again, for the Stokes problem. Our observation for the Navier–Stokes problem is new.

Table 4

Convergence rates for the test problem #1 with ν=0.5, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖221.749e−22.846e−13.507e−1
5.774e−2‖309.410e−32.001.785e−11.502.469e−11.13
4.558e−2‖385.856e−32.011.254e−11.501.909e−11.09
3.608e−2‖463.990e−32.019.422e−21.441.559e−11.06
3.093e−2‖542.891e−32.017.414e−21.501.318e−11.05
2.794e−2‖622.190e−32.016.030e−21.501.142e−11.04
Table 5

Convergence rates for the test problem #1 with ν=0.05, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖221.583e−26.059e−23.515e−1
5.774e−2‖308.452e−32.023.379e−21.882.467e−11.14
4.558e−2‖385.239e−32.022.180e−21.851.907e−11.09
3.608e−2‖463.560e−32.021.538e−21.831.557e−11.06
3.093e−2‖542.575e−32.021.151e−21.811.316e−11.05
2.794e−2‖621.949e−32.028.996e−31.791.141e−11.04
Table 6

Convergence rates for the test problem #2 with ν=0.5, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖223.998e−27.449e−18.546e−1
5.774e−2‖302.184e−21.954.726e−11.475.974e−11.15
4.558e−2‖381.370e−21.973.336e−11.474.604e−11.10
3.608e−2‖469.383e−31.982.514e−11.483.751e−11.07
3.093e−2‖546.822e−31.991.981e−11.493.168e−11.05
2.794e−2‖625.182e−31.991.613e−11.492.743e−11.04
Table 7

Convergence rates for the test problem #2 with ν=0.05, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖223.505e−21.826e−18.442e−1
5.774e−2‖301.898e−21.981.024e−11.875.927e−11.14
4.558e−2‖381.187e−21.996.576e−21.874.580e−11.09
3.608e−2‖468.117e−31.994.556e−21.923.738e−11.06
3.093e−2‖545.898e−31.993.435e−21.763.160e−11.05
2.794e−2‖624.481e−31.992.668e−21.832.738e−11.04
Table 8

Convergence rates for the test problem #3 with ν=0.5, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖223.648e−51.386e−31.114e−3
5.774e−2‖301.990e−51.958.827e−41.457.985e−41.07
4.558e−2‖381.248e−51.986.257e−41.466.233e−41.15
3.608e−2‖468.539e−61.994.733e−41.465.115e−41.03
3.093e−2‖546.205e−61.993.740e−41.474.339e−41.03
2.794e−2‖624.711e−61.993.052e−41.473.769e−41.02
Table 9

Convergence rates for the test problem #3 with ν=0.05, structured mesh.

hnuuhL2RatepphL2RateuuhH1Rate
7.873e−2‖223.649e−51.409e−41.114e−3
5.774e−2‖301.990e−51.958.934e−51.477.986e−41.07
4.558e−2‖381.248e−51.986.309e−51.476.233e−41.05
3.608e−2‖468.539e−61.994.773e−51.465.115e−41.03
3.093e−2‖546.205e−61.993.769e−51.474.339e−41.03
2.794e−2‖624.711e−61.993.073e−51.483.769e−41.02

7.6Example 3: Convergence Rate on Unstructured Meshes

In Tables 1011, we present results analogous to Tables 67 computed for the test problem #2 but now on unstructured meshes. Since the convergence rates are scattered, we characterize them by the geometrical mean in the rows labelled G_mean. Surprisingly, the conference rates are in many cases better than on the structured meshes.

Table 10

Convergence rates for the test problem #2 with ν=0.5, unstructured mesh.

huuhL2RatepphL2RateuuhH1Rate
9.4515e−22.275e−24.502e−13.572e−1
7.6129e−21.435e−22.133.044e−11.812.706e−11.28
6.2010e−28.915e−32.322.297e−11.372.060e−11.33
4.8458e−25.542e−31.931.677e−11.281.595e−11.04
3.8869e−23.481e−32.111.210e−11.481.242e−11.16
3.1102e−22.177e−32.119.098e−21.289.756e−21.08
G_mean2.121.431.17
Table 11

Convergence rates for the test problem #2 with ν=0.05, unstructured mesh.

huuhL2RatepphL2RateuuhH1Rate
9.4515e−22.275e−24.502e−13.572e−1
7.6129e−21.435e−22.133.044e−11.812.706e−11.28
6.2010e−28.915e−32.322.297e−11.372.060e−11.33
4.8458e−25.542e−31.931.677e−11.281.595e−11.04
3.8869e−23.481e−32.111.210e−11.481.242e−11.14
3.1102e−22.177e−32.119.098e−21.289.756e−21.08
G_mean2.111.431.17

8Conlusions and Comments

In this paper, we present main ideas for vectorized coding of matrices and vectors describing mixed finite element approximation based on the MINI element for the Navier–Stokes system in 3D. It is shown that the vectorized operations are considerably faster than the loop over tetrahedra. This allows to experiment with this problem in the user-friendly Matlab environment. Note that our codes are freely available (Kučera et al., 2023) and include also 2D case.

The dual implementation of the basic iterative schemes in Section 6 is a starting point for more sophisticated problem with the stick-slip boundary condition, describing hydrophobia effect, e.g. in which the dual formulation is a natural tool (see Haslinger et al., 2021). This scheme works well for small Reynold’s numbers.

Finally, we should point out that the results of our experiments are in agreement with the theoretical convergence rates of the finite element approximation. Moreover, it extends observations of other authors on a superconvergence rate of the pressure component. It confirms, among others, correctness of our codes.

Acknowledgements

This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90140) and the internal VSB-TUO SGS-project.

References

1 

Arnold, D.N., Brezzi, F., Fortin, M. ((1984) ). A stable finite element for the Stokes equations. Calcolo, 21: , 337–344.

2 

Arzt, V. (2019). Finite Element Meshes and Assembling of Stiffness Matrices. Master’s thesis, VŠB-TU Ostrava, Czech Republic (in Czech). https://dspace.vsb.cz/bitstream/handle/10084/137486/ARZ0009_USP_B3968_3901R076_2019.pdf?sequence=1&isAllowed=y [online].

3 

Boffi, D., Brezzi, F., Fortin, M. ((2013) ). Mixed Finite Element Methods and Applications, Springer Series in Computational Mathematics. Springer Verlag, Heidelberg, New York, Dordrecht, London.

4 

Brezzi, F., Fortin, M. ((1991) ). Mixed and Hybrid Finite Element Methods, Springer Series in Computational Mathematics. Springer Verlag, Berlin.

5 

Cioncolini, A., Boffi, D. ((2019) ). The MINI mixed finite element for the Stokes problem: an experimental investigation. Computers and Mathematics with Applications, 77: (9), 2432–2446.

6 

Cioncolini, A., Boffi, D. ((2022) ). Superconvergence of the MINI mixed finite element discretization of the Stokes problem: an experimental study in 3D. Finite Elements in Analysis and Design, 201: , 103706.

7 

Eichel, H., Tobiska, L., Xie, H. ((2011) ). Supercloseness and superconvergence of stabilized low-order finite element discretizations of the Stokes problem. Mathematics of Computations, 80: , 697–722.

8 

Elman, H.C., Silvester, D.J., Wathen, A.J. ((2014) ). Finite Elements and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics, Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford.

9 

Fang, Q. (2018). Iso2mesh: a 3D surface and volumetric mesh generator for MATLAB/Octave. http://iso2mesh.sourceforge.net [online].

10 

Girault, V., Raviart, P.A. ((1986) ). Finite Element Methods for Navier-Stokes Equations, Springer Series in Computational Mathematics. Springer Verlag, Berlin.

11 

Girault, V., Wheeler, M.F. ((2008) ). Discontinuous Galerkin methods. In: Glowinski, R., Neittaanmäki, P. (Eds.), Partial Differential Equations. Computational Methods in Applied Sciences, Vol. 16: . Springer, Dordrecht, pp. 2–26.

12 

Griebel, M., Neunhoeffer, T., Regler, H. ((1998) ). Algebraic multigrid methods for the solution of the Navier–Stokes equations in complicated geometries. International Journal for Numerical Methods in Fluids, 26: (3), 281–301.

13 

Haslinger, J., Kučera, R., Sassi, T., Šátek, V. ((2021) ). Dual strategies for solving the Stokes problem with stick-slip boundary conditions in 3D. Mathematics and Computers in Simulation, 189: , 191–206.

14 

Henriksen, M.O., Holmen, J. ((2002) ). Algebraic splitting for incompressible Navier–Stokes equations. Journal of Computational Physics, 175: (2), 438–453.

15 

Hoanga, L.T., Martinez, V.R. ((2018) ). Asymptotic expansion for solutions of the Navier–Stokes equations with non-potential body forces. Journal of Mathematical Analysis and Applications, 462: , 84–113.

16 

IT4Innovations (2023). https://www.it4i.cz/en/infrastructure/karolina [online].

17 

Koko, J. ((2016) ). Fast MATLAB assembly of FEM matrices in 2D and 3D using cell array approach. International Journal of Modeling, Simulation, and Scientific Computing, 7: (2), 1650010.

18 

Koko, J. ((2019) ). Efficient MATLAB codes for the 2D/3D Stokes equation with the mini-element. Informatica, 30: (2), 243–268.

19 

Kučera, R., Arzt, V., Koko, J. (2023). Free available vectorized codes. https://homel.vsb.cz/~kuc14/programs/ReferenceAssembling.zip [online].

20 

Loghin, D., Wathen, A.J. ((2002) ). Schur complement preconditioners for the Navier–Stokes equations. International Journal for Numerical Methods in Fluids, 40: (3–4), 403–412.

21 

Panasenko, G.P. ((1998) ). Asymptotic expansion of the solution of Navier-Stokes equation in a tube structure. Comptes Rendus De L Academie Des Sciences Serie Ii Fascicule B-mecanique Physique Astronomie, 326: (12), 867–872.

22 

Pernice, M., Tocci, M.D. ((2001) ). A multigrid-preconditioned Newton–Krylov method for the incompressible Navier–Stokes equations. SIAM Journal on Scientific Computing, 23: , 398–418.

23 

Rahman, T., Valdman, J. ((2015) ). Fast MATLAB assembly of FEM matrices in 2D and 3D: edge elements. Applied Mathematics and Computation, 267: , 252–263.

24 

Rønquist, E.M. ((1996) ). A domain decomposition solver for the steady Navier-Stokes equations. In: Ilin, A., Scott, R. (Eds.), Proceedings of the 3rd International Conference on Spectral and High-Order Methods. HJM, Houston, pp. 469–485.

25 

Viguerie, A., Veneziani, A. ((2018) ). Algebraic splitting methods for the steady incompressible Navier–Stokes equations at moderate Reynolds numbers. Computer Methods in Applied Mechanics and Engineering, 330: , 271–291.