FEBio on Apple Silicon

Overview

Apple introduced a line of chips (e.g., M1, M2, M3) based on the arm64 architecture to supersede its prior reliance on Intel's x86 architecture. Currently, FEBio and FEBioStudio executables for Mac computers continue to be compiled for the x86 architecture, and they run smoothly on Apple Silicon machines using Apple's Rosetta 2 dynamic binary translator. The reason for compiling FEBio and FEBioStudio on the x86 architecture for Mac is that Intel's MKL library, which includes the Pardiso linear solver for sparse systems (among the fastest linear solvers available to FEBio), is only available for the x86 architecture.

However, the original developer of the Pardiso linear solver, Dr. Olaf Schenk (who licensed the Pardiso solver to Intel many years ago), has released updated versions, now called Panua Pardiso, that run on the arm64 architecture (including Linux and Mac). Licenses for Panua Pardiso are not free for all users; therefore, we (the FEBio developers) plan to continue providing Mac versions of FEBio and FEBioStudio on the x86 architecture with Intel's MKL Pardiso until Apple stops supporting Rosetta 2.

For now, we are able to compile FEBio and FEBioStudio on the arm64 architecture for Mac and link them to Panua's Pardiso library. Here, we report benchmark results for various runs of the FEBio finite element software, comparing Panua's Pardiso 8.2 library running natively on arm64 to the Intel oneMKL Pardiso (2023.2) running under Rosetta 2 on the same Apple Silicon M2 machine. The purpose of this comparison is to assist users in deciding whether to license and download the Panua Pardiso library for their own applications.

Benchmark Problems

The benchmark problems employed here are generally available from the FEBioStudio Model Repository, under the Tutorials heading. In some cases the models were modified to employ a different quasi-Newton method, or to switch from symmetric to non-symmetric stiffness matrix format, as indicated in the table below, or to run for fewer time steps (Problem 3 in the tables below was run for 50 steps instead of 500).

The benchmark problems span the range from small problems (13.6 K equations for CupDrawingNUMISHEET93) to large problems (3.1 M equations for PressureOnMRblock10M). They include symmetric and non-symmetric stiffness matrices, and they employ full Newton updates as well as BFGS and Broyden quasi-Newton updates.

Results are presented for representative problems that use the fluid, fluid-FSI and solid (structural mechanics) modules. When the solver is set up to perform full Newton iterations, the stiffness matrix is factorized and solved at each iteration of each time point, therefore the number of stiffness reformations matches the number of equilibrium iterations. In contrast, when using quasi-Newton schemes such as BFGS (for symmetric stiffness matrix) and Broyden (for non-symmetric stiffness matrix), the number of stiffness reformations is much smaller than the number of equilibrium iterations.

For the purpose of assessing the performance of the linear solver, one should focus on the Solver Time and the number of Stiffness Reformations (the ratio of these two entries gives the average solver time per stiffness reformation). For the purpose of assessing the overall performance of the finite element software, one may focus on the Total Elapsed Time, which accounts for the additional time required to perform quasi-Newton updates and to save results to the plot file.

| No. | Problem (Panua Pardiso, 24 threads) | Module | Matrix | Quasi-Newton | Equations | Stiffness Reformations | Equilibrium Iterations | Solver Time (s) | Total Elapsed Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | UnsteadyFlowPastCylinder | Fluid | non-symm | full Newton | 75598 | 2933 | 2933 | 2306 | 2429 |
| 2 | UnsteadyFlowPastCylinder | Fluid | non-symm | Broyden | 75598 | 85 | 5715 | 126 | 389 |
| 3 | AortaV3 | Fluid | non-symm | Broyden | 51829 | 34 | 237 | 101 | 303 |
| 4 | Flow Past Flap | Fluid-FSI | non-symm | Broyden | 253121 | 26 | 6696 | 54 | 207 |
| 5 | SqueezeFilmLubricationNewtonian2D | Fluid-FSI | non-symm | Broyden | 47695 | 252 | 12032 | 446 | 1118 |
| 6 | CupDrawingNUMISHEET93 | Solid | non-symm | full Newton | 13630 | 4207 | 4207 | 626 | 2563 |
| 7 | L BRACKET FILLET Very Fine | Solid | symmetric | BFGS | 562989 | 1 | 4 | 18 | 20 |
| 8 | PressureOnMRblock | Solid | symmetric | BFGS | 669780 | 1 | 4 | 25 | 27 |
| 9 | PressureOnMRblock | Solid | non-symm | Broyden | 669780 | 1 | 5 | 39 | 52 |
| 10 | PressureOnMRblock10M | Solid | symmetric | BFGS | 3060300 | 1 | 4 | 399 | 407 |
Panua Pardiso solver results, using 24 threads on Mac Studio 2023, Apple M2 Ultra Chip, 192 GB Memory, macOS Sequoia 15.1, with native arm64 architecture.

Using the same computer (Mac Studio 2023, Apple M2 Ultra Chip, 192 GB Memory, macOS Sequoia 15.1), results were compared between the Panua Pardiso linear solver, running natively on the arm64 architecture, and the Intel MKL Pardiso running under Rosetta 2 with simulated x86 architecture.

| No. | Problem (Intel MKL Pardiso, 24 threads) | Module | Matrix | Quasi-Newton | Equations | Stiffness Reformations | Equilibrium Iterations | Solver Time (s) | Total Elapsed Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | UnsteadyFlowPastCylinder | Fluid | non-symm | full Newton | 75598 | 2933 | 2933 | 2825 | 3088 |
| 2 | UnsteadyFlowPastCylinder | Fluid | non-symm | Broyden | 75598 | 85 | 5715 | 88 | 322 |
| 3 | AortaV3 | Fluid | non-symm | Broyden | 51829 | 34 | 237 | 81 | 202 |
| 4 | Flow Past Flap | Fluid-FSI | non-symm | Broyden | 253121 | 26 | 7747 | 56 | 216 |
| 5 | SqueezeFilmLubricationNewtonian2D | Fluid-FSI | non-symm | Broyden | 47695 | 252 | 12318 | 437 | 1189 |
| 6 | CupDrawingNUMISHEET93 | Solid | non-symm | full Newton | 13630 | 4207 | 4277 | 989 | 6787 |
| 7 | L BRACKET FILLET Very Fine | Solid | symmetric | BFGS | 562989 | 1 | 4 | 28 | 30 |
| 8 | PressureOnMRblock | Solid | symmetric | BFGS | 669780 | 1 | 4 | 39 | 43 |
| 9 | PressureOnMRblock | Solid | non-symm | Broyden | 669780 | 1 | 5 | 60 | 66 |
| 10 | PressureOnMRblock10M | Solid | symmetric | BFGS | 3060300 | 1 | 4 | 911 | 926 |
Intel MKL Pardiso solver results, using 24 threads on Mac Studio 2023, Apple M2 Ultra Chip, 192 GB Memory, macOS Sequoia 15.1, using x86 architecture running on Apple’s Rosetta 2 dynamic binary translator.

The results of this comparison demonstrate that Panua’s Pardiso outperforms or matches Intel MKL’s Pardiso for seven of the ten problems tested in this benchmark analysis (Problems 1, 4, 6, 7, 8, 9, and 10). In general, when the number of equations is small or when a quasi-Newton method is employed (Problems 2, 3 and 5), Intel MKL’s Pardiso performs better than Panua’s Pardiso. When a very large number of equations is employed (more than three million in Problem 10), Panua’s Pardiso solver (399 s) is more than twice as fast as Intel’s MKL solver (911 s).

Performance with Increasing Number of Threads

To examine how these parallel sparse matrix solvers perform with an increasing number of threads, we examined two problems more closely (Problems 2 and 8). Problem 2 uses Broyden's quasi-Newton updates with only 85 stiffness reformations (out of 5715 equilibrium iterations) on a problem with a relatively small number of equations (~76 K). Problem 8 uses a single matrix reformation (out of 4 equilibrium iterations) on a problem with a relatively large number of equations (~0.67 M). Both problems were analyzed using 1, 2, 4, 8, 16, and 24 threads (set via the OMP_NUM_THREADS environment variable). Results are presented as Solver Time versus No. of Threads and Total Elapsed Time versus No. of Threads, on a semi-log graph.
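As a concrete illustration, a thread-scaling sweep like the one described above can be scripted from a Terminal window. This is a minimal sketch, not the exact procedure we used: it assumes the FEBio executable is named febio4 and is on your PATH, and that the model file sits in the current directory; adjust both names to your installation.

```shell
# Hypothetical sweep over thread counts for one benchmark model.
# "febio4" and the model filename are assumptions; substitute your own.
for n in 1 2 4 8 16 24; do
  echo "Running with $n threads"
  OMP_NUM_THREADS=$n febio4 -i UnsteadyFlowPastCylinder.feb
done
```

The Solver Time and Total Elapsed Time for each run can then be read from the FEBio log files and plotted against the thread count.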

The results for Problem 2 show that increasing the number of threads does not decrease the solver time monotonically: Above a certain number of threads, the computational overhead of shared-memory communications among threads becomes more expensive than the gain from distributing the solver tasks to multiple threads. Interestingly, this threshold occurs earlier for the Panua Pardiso solver, compared to the Intel MKL Pardiso solver.

However, when the number of equations is increased considerably, as for Problem 8, distributing the computational load among more threads does provide a monotonically decreasing solver time.

For Problem 8, the Panua Pardiso solver shows a considerable computational advantage over the Intel MKL Pardiso solver, even with a single thread. In fact, Panua Pardiso using only 2 threads matches the performance of Intel MKL Pardiso using 24 threads. This result highlights the considerable computational efficiency of the Panua Pardiso solver relative to Intel MKL on a Mac computer with an Apple Silicon chip, where Panua Pardiso uses the native arm64 architecture while the Intel solver runs x86 code under Rosetta 2.

Conclusion for FEBio on Apple Silicon

Based on these results, it appears that Mac users who use Apple Silicon machines with arm64 architecture would benefit from installing an arm64 version of FEBio and FEBioStudio, though this would require them to acquire a Panua Pardiso license for that architecture.

We have not performed benchmark comparisons of Panua Pardiso on an Intel Mac versus a Silicon Mac because that would require using two different machines (from different generations) and the comparison would not be appropriate. There is also no benefit to comparing Intel’s MKL Pardiso to Panua’s Pardiso on an Intel Mac since Apple discontinued Intel chips in their computers.

Installation of Panua Pardiso on Mac

Instructions for installing the Panua Pardiso library can be found on the Panua website. (You can install Panua Pardiso for the Intel x86_64 architecture or for the Apple Silicon arm64 architecture; the procedures are the same.) To associate a unique node and user with a license, these instructions require users to run an executable called "get_fingerprint"; however, the first execution of this code will be blocked by macOS with an error message.

To override this error, open System Settings, navigate to Privacy & Security, and scroll down to the Security section.

Click on Allow Anyway, then rerun the ./get_fingerprint command. You will be prompted once more.

Click on Open Anyway and, after you enter your username and password for that computer, the get_fingerprint command will finally run to completion. Continue with the instructions provided on the Panua website. After downloading the Panua-Pardiso folder, place it in a familiar location (e.g., your home directory). Also follow Panua's instructions for creating a panua.lic file in your home directory, into which you paste the license key that should have been emailed to you.
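From a Terminal window, the steps above can be summarized as follows. This is a sketch under assumptions: the folder name panua-pardiso and the location of get_fingerprint inside it depend on the version you downloaded, and YOUR-LICENSE-KEY stands in for the key emailed to you.

```shell
# Assumed layout; adjust the folder name to match your download.
cd ~/panua-pardiso
./get_fingerprint     # blocked at first; allow it under System Settings >
                      # Privacy & Security, then rerun as described above

# Create the license file in your home directory and paste in your key.
printf '%s\n' "YOUR-LICENSE-KEY" > ~/panua.lic
```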

Compiling FEBio with Panua Pardiso

Instructions for building FEBio on any machine (Windows, Linux, Mac) can be found on GitHub. To compile FEBio with the Panua Pardiso library, modify the CMake configuration as follows: (a) check the box for USE_PDL (PDL stands for Pardiso Library and refers specifically to Panua Pardiso); and (b) in the PDL_LIB field, specify the full path and filename of libpardisoXXX.dylib (where XXX depends on the architecture you use), which you downloaded and placed in a familiar location. For example, if you placed the downloaded folder in your home directory, the path to the library should include the subfolders that lead to the lib folder, such as /Users/myusername/panua-pardisoXXX/lib/libpardiso.dylib. Whether compiling for the x86_64 or arm64 architecture, you should compile exclusively for that architecture (set CMAKE_OSX_ARCHITECTURES to x86_64 or arm64).
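For those who configure from the command line rather than the CMake GUI, the two settings above can be passed as cache variables. This is a sketch, not the exact build recipe: the PDL_LIB path is an example (it depends on where you placed the download and on the XXX suffix of your dylib), and any other options required by the FEBio build instructions on GitHub still apply.

```shell
# Example configure step from a build directory inside the FEBio source tree.
# The PDL_LIB path below is illustrative; point it at your actual dylib.
cmake .. \
  -DUSE_PDL=ON \
  -DPDL_LIB=$HOME/panua-pardiso/lib/libpardiso.dylib \
  -DCMAKE_OSX_ARCHITECTURES=arm64   # or x86_64; compile for one architecture only
cmake --build . --config Release
```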

Assuming that you built FEBio successfully for your desired architecture, the first time that you run it (e.g., from a Terminal window) you will get the same warning as when you first tried to run ./get_fingerprint. Therefore, follow the same instructions as in that section to run FEBio successfully. This will be needed only once.

When running FEBio with the Panua Pardiso linear solver, the febio.xml configuration file located in the bin/Release/ (or bin/Debug/) folder of your FEBio project should use <default_linear_solver type="pardiso-project"/>. Alternatively, within FEBioStudio you can modify the Step of your analysis to set linear_solver to Pardiso-project. (Historically, the website URL for the Panua Pardiso used to be pardiso-project.org.)
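In context, the relevant fragment of febio.xml looks like the sketch below. This assumes the usual febio_config root element of recent FEBio configuration files; keep whatever other settings your existing file already contains.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of febio.xml selecting the Panua Pardiso linear solver. -->
<febio_config version="3.0">
	<default_linear_solver type="pardiso-project"/>
</febio_config>
```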

Apple’s Accelerate Framework

All Mac versions of FEBio are compiled to use the sparse solvers from Apple’s Accelerate Framework. While Apple’s sparse solvers suite includes a number of iterative solvers, we have determined that only the direct solvers can solve all categories of problems (i.e., all modules) of the FEBio finite element analysis suite. Therefore, by default FEBio is configured to use the direct solvers suitable for symmetric and non-symmetric matrices (users can override these defaults and try the iterative solvers from the Accelerate framework as explained in the User’s Manual).

In our experience with FEBio, the direct solvers from Apple's Accelerate Framework compare very poorly with either Intel's MKL Pardiso or Panua's Pardiso. FEBio problems running with the Accelerate framework (set the default_linear_solver to "accelerate") are comparatively so slow that we only performed an analysis of the benchmark problems that have fewer than 100 K equations (Problems 2, 4, 5, and 6).

| No. | Problem (Apple Accelerate framework, 24 threads) | Module | Matrix | Quasi-Newton | Equations | Stiffness Reformations | Equilibrium Iterations | Solver Time (s) | Total Elapsed Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | UnsteadyFlowPastCylinder | Fluid | non-symm | Broyden | 75598 | 85 | 5715 | 1242 | 3219 |
| 4 | Flow Past Flap | Fluid-FSI | non-symm | Broyden | 253121 | 40 | 7181 | 177 | 600 |
| 5 | SqueezeFilmLubricationNewtonian2D | Fluid-FSI | non-symm | Broyden | 47695 | fails | fails | fails | fails |
| 6 | CupDrawingNUMISHEET93 | Solid | non-symm | full Newton | 13630 | 4207 | 4207 | 3985 | 6086 |
Apple Accelerate framework direct solver results, using 24 threads on Mac Studio 2023, Apple M2 Ultra Chip, 192 GB Memory, macOS Sequoia 15.1, with native arm64 architecture.

Results show that Problem 2 requires 1242 s of solver time with the Accelerate framework (compared to 126 s for Panua Pardiso and 88 s for Intel MKL Pardiso), which is ten to fourteen times slower than Pardiso. Results are slightly less drastic for Problem 4 (177 s versus 54 s and 56 s). However, Problem 5 fails to run on the Accelerate direct solvers. Problem 6 runs four to six times slower on Accelerate than on the Pardiso solvers (3985 s on Accelerate, versus 626 s on Panua Pardiso and 989 s on Intel MKL Pardiso).

In summary, Apple’s direct solvers in its Accelerate framework are not well suited for finite element modeling in FEBio.
