
<!-- source-page: 561 -->

For a single-user machine that is used to run both Abaqus/Standard and other applications simultaneously, setting a lower memory limit makes sense. If an analysis requires more than the specified value, you can decide to increase memory and continue the job. However, Abaqus/Standard will have to contend with the other applications for memory, which will impair the efficiency of both Abaqus/Standard and the other applications. If the other applications are interactive, the performance degradation could be problematic. In such a case you might decide to delay continuing the analysis until the machine can be dedicated to running Abaqus/Standard alone.

# Setting memory on multi-user machines

The guidelines for setting memory on a multi-user machine are very similar to those for single-user machines, except that a judgement must be made as to the amount of memory that each user on the machine can expect to have for a single analysis. A reasonable approach might be to divide the machine’s physical memory by the number of expected simultaneous jobs. Another sensible approach is to divide the machine’s physical memory by the total number of CPUs and then multiply by the number of CPUs used for the current job. If the memory requirement among the simultaneous jobs is not even, you might want to divide the machine’s physical memory in an uneven way accordingly. In general, to ensure acceptable performance, users on multi-user machines need to coordinate with each other to properly set the memory limit.
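The two even-split rules above can be written out as a short calculation. This is a minimal sketch only: the function names and the machine size in the example are hypothetical, and the result is a starting point for coordination among users rather than a value prescribed by Abaqus.

```python
def memory_share_by_jobs(physical_memory_gb: float, expected_jobs: int) -> float:
    """Divide physical memory evenly among the expected simultaneous jobs."""
    if expected_jobs < 1:
        raise ValueError("expected_jobs must be at least 1")
    return physical_memory_gb / expected_jobs


def memory_share_by_cpus(physical_memory_gb: float, total_cpus: int, job_cpus: int) -> float:
    """Scale the per-CPU share of physical memory by the CPUs used for this job."""
    if not 1 <= job_cpus <= total_cpus:
        raise ValueError("job_cpus must be between 1 and total_cpus")
    return physical_memory_gb / total_cpus * job_cpus


# Hypothetical machine: 256 GB of physical memory and 32 CPUs.
print(memory_share_by_jobs(256, 4))      # four simultaneous jobs
print(memory_share_by_cpus(256, 32, 8))  # one job using 8 of the 32 CPUs
```

If the jobs have uneven memory requirements, the even split would be replaced by weights agreed on among the users.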

# Setting memory when using queues

Often queues have an associated memory limit, and determining the appropriate queue for a job requires some judgement. You are advised to run a data check analysis and select a queue based on the estimates provided in the printed output file. However, for large analyses even a data check analysis can require a large amount of memory. Choosing an appropriate queue for a data check analysis requires some experience with particular classes of problems. You may want to submit data check runs initially to queues with very large memory limits to get the necessary estimates. An appropriate queue can then be chosen to actually run the job. If the jobs are to be submitted to shared memory machines, it makes sense to set memory to about 90% of the memory limit for the queue. If the jobs are to be submitted to computer clusters, it is reasonable to use the default memory setting.
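The queue guidance above can be sketched as a small helper. The function name, the platform labels, and the queue limit in the example are assumptions made for illustration; only the 90% rule for shared memory machines and the use of the default setting on clusters come from the text.

```python
from typing import Optional


def suggested_memory_mb(queue_limit_mb: int, platform: str) -> Optional[int]:
    """Return a memory setting in MB, or None to keep the default setting.

    platform is a hypothetical label: "shared" for a shared memory machine,
    "cluster" for a computer cluster.
    """
    if platform == "shared":
        # About 90% of the queue's memory limit (integer arithmetic).
        return queue_limit_mb * 90 // 100
    if platform == "cluster":
        # On clusters the default memory setting is a reasonable choice.
        return None
    raise ValueError("platform must be 'shared' or 'cluster'")


print(suggested_memory_mb(64000, "shared"))
print(suggested_memory_mb(64000, "cluster"))
```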

<!-- source-page: 562 -->

<!-- source-page: 563 -->

# 3.5 Parallel execution

• “Parallel execution: overview,” Section 3.5.1
• “Parallel execution in Abaqus/Standard,” Section 3.5.2
• “Parallel execution in Abaqus/Explicit,” Section 3.5.3
• “Parallel execution in Abaqus/CFD,” Section 3.5.4

<!-- source-page: 564 -->

<!-- source-page: 565 -->

# 3.5.1 PARALLEL EXECUTION: OVERVIEW

Products: Abaqus/Standard, Abaqus/Explicit, Abaqus/CFD

# References

• “Obtaining information,” Section 3.2.1
• “Using the Abaqus environment settings,” Section 3.3.1
• “Parallel execution in Abaqus/Standard,” Section 3.5.2
• “Parallel execution in Abaqus/Explicit,” Section 3.5.3
• “Parallel execution in Abaqus/CFD,” Section 3.5.4

# Overview

Parallel execution of Abaqus is implemented using two different schemes: threads and message passing. Threads are lightweight processes that can perform different tasks simultaneously within the same application. Threads can communicate relatively easily by sharing the same memory pool. Thread-based parallelization is readily available on all shared memory platforms.

Parallelization with message passing uses multiple analysis processes that communicate with each other via the Message Passing Interface (MPI). This requires MPI components to be installed. On the command line you can set mp\_mode=mpi to indicate that MPI components are available on the system. Alternatively, set mp\_mode=MPI in the environment file (see “Using the Abaqus environment settings,” Section 3.3.1). The MPI-based implementation is the default on all platforms where it is supported.

Abaqus/CFD is implemented using only the MPI mode and does not support threads. The parallel linear solvers used in Abaqus/CFD require that MPI components be installed even for single-processor calculations.

Output the local installation notes for your system to learn about local multiprocessing capabilities (see “Obtaining information,” Section 3.2.1). From the Support page at www.3ds.com/simulia, refer to the System Information page for the current release of Abaqus for complete information about parallel processing support on various platforms, including information about MPI requirements and availability.

# Parallel processing support for Abaqus features

The following Abaqus/Standard features can be executed in parallel: analysis input preprocessing, the direct sparse solver, the iterative solver, and element operations. Analysis input preprocessing uses only MPI-based parallelization and will not be executed in parallel if only data checking is performed. For Abaqus/Explicit all of the computations other than those involving the analysis input preprocessor and the packager can be executed in parallel. Each of the features that are available for parallel execution has certain limitations, which are documented in detail; see “Parallel execution in Abaqus/Standard,” Section 3.5.2, and “Parallel execution in Abaqus/Explicit,” Section 3.5.3. All features in Abaqus/CFD are available for parallel execution without restrictions.

<!-- source-page: 566 -->

# Parallel execution on shared memory computers

Abaqus/Standard and Abaqus/Explicit can be executed in parallel on shared memory computers by using threads or the MPI. When the MPI is available, Abaqus runs all available parallel features with MPI-based parallelization and activates thread-based parallel implementations for cases where an equivalent MPI-based implementation does not exist (e.g., direct sparse solver). Abaqus/CFD can also be executed on shared memory computers but only with the MPI.

# Parallel execution on computer clusters

Abaqus can be executed in parallel on computer clusters by using MPI-based parallelization. For parallel execution on computer clusters, the list of machines or hosts is given with the mp\_host\_list environment file parameter. This parameter also defines the number of processors to be used on each host.

# Parallel execution using GPGPU hardware

The direct solver in Abaqus/Standard can be executed in parallel on computers equipped with compute-capable GPGPU cards.

# Use with user subroutines

User subroutines can be used when running jobs in parallel. In a distributed run, the entire model is decomposed into separate domains (partitions). Each domain is serviced by a separate MPI process. Abaqus provides well-defined synchronization points at which it is possible to exchange information across all MPI ranks, using the MPI communications facilities. All native MPI calls are supported, in both Fortran and C++. In addition, for cases of hybrid execution, user subroutines and any subroutines called by them must be thread safe. This precludes the use of common blocks, data statements, and save statements. To work around these limitations and for guidelines and techniques, see “Ensuring thread safety,” Section 2.1.22 of the Abaqus User Subroutines Reference Guide.

<!-- source-page: 567 -->

# 3.5.2 PARALLEL EXECUTION IN Abaqus/Standard

Products: Abaqus/Standard, Abaqus/CAE

# References

• “Obtaining information,” Section 3.2.1
• “Using the Abaqus environment settings,” Section 3.3.1
• “Controlling job parallel execution,” Section 19.8.8 of the Abaqus/CAE User’s Guide, in the HTML version of this guide

# Overview

Parallel execution in Abaqus/Standard:

• reduces run time for large analyses;
• is available for shared memory computers and computer clusters for the element operations, direct sparse solver, and iterative linear equation solver; and
• can use compute-capable GPGPU hardware on shared memory computers for the direct sparse solver.

# Parallel equation solution with the default direct sparse solver

The direct sparse solver (“Direct linear equation solver,” Section 6.1.5) supports both shared memory computers and computer clusters for parallelization. On shared memory computers or a single node of a computer cluster, thread-based parallelization is used for the direct sparse solver, and high-end graphics cards that support general processing (GPGPUs) can be used to accelerate the solution. On multiple compute nodes of a computer cluster, a hybrid MPI and thread-based parallelization is used.

The direct sparse solver cannot be used on multiple compute nodes of a computer cluster if:

• the analysis also includes an eigenvalue extraction procedure, or
• the analysis requires features for which MPI-based parallel execution of element operations is not supported.

In addition, the direct sparse solver cannot be used on multiple nodes of a computer cluster for analyses that include any of the following:

• multiple load cases with changing boundary conditions (“Multiple load case analysis,” Section 6.1.4), and
• the quasi-Newton nonlinear solution technique (“Convergence criteria for nonlinear problems,” Section 7.2.3).

<!-- source-page: 568 -->

To execute the parallel direct sparse solver on computer clusters, the environment variable mp\_host\_list must be set to a list of host machines (see “Using the Abaqus environment settings,” Section 3.3.1). MPI-based parallelization is used between the machines in the host list. Thread-based parallelization is used within a host machine if more than one processor is available on that machine in the host list and if the model does not contain cavity radiation using parallel decomposition (see “Decomposing large cavities in parallel” in “Cavity radiation,” Section 41.1.1). For example, if the environment file has the following:

```python
cpus=8
mp_host_list=[['maple',4],['pine',4]]
```

Abaqus/Standard will use four processors on each host through thread-based parallelization. A total of two MPI processes (equal to the number of hosts) will be run across the host machines so that all eight processors are used by the parallel direct sparse solver.
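The process layout described above can be derived from the host list with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, each host is assumed to appear once in the list, the cpus value is assumed to equal the sum of the per-host counts, and the model is assumed not to use parallel cavity decomposition.

```python
def direct_solver_layout(mp_host_list):
    """Summarize the hybrid layout for the parallel direct sparse solver.

    One MPI process runs per host; the processors listed for a host are
    used by that process through thread-based parallelization.
    """
    threads_per_host = {host: count for host, count in mp_host_list}
    return {
        "mpi_processes": len(mp_host_list),
        "threads_per_host": threads_per_host,
        "total_processors": sum(threads_per_host.values()),
    }


layout = direct_solver_layout([["maple", 4], ["pine", 4]])
print(layout["mpi_processes"])     # number of MPI processes (one per host)
print(layout["total_processors"])  # processors used across all hosts
```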

Models containing parallel cavity decomposition use only MPI-based parallelization. Therefore, MPI is used on both shared memory parallel computers and distributed memory compute clusters. The number of processes is equal to the number of CPUs requested during job submission. Element operations are executed in parallel using MPI-based parallelization when parallel cavity decomposition is enabled.

Input File Usage: Use the following option in conjunction with the command line input to execute the parallel direct sparse solver:

\*STEP

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “beam” on two processors:

abaqus job=beam cpus=2

Abaqus/CAE Usage: Step module: step editor: Other: Method: Direct

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n

# GPGPU acceleration of the direct sparse solver

The direct sparse solver supports GPGPU acceleration.

Input File Usage: Enter the following input on the command line to activate GPGPU direct sparse solver acceleration:

abaqus job=job-name gpus=n

Abaqus/CAE Usage: Step module: step editor: Other: Method: Direct

Job module: job editor: Parallelization: toggle on Use GPGPU acceleration, and specify the number of GPGPUs

# Memory requirements for the parallel direct sparse solver

<!-- source-page: 569 -->

The parallel direct sparse solver processes multiple fronts in parallel in addition to parallelizing the solution of individual fronts. Therefore, the direct parallel solver requires more memory than the serial solver. The memory requirements cannot be predicted exactly in advance because it is not known a priori which fronts will actually be processed simultaneously.

# Equation ordering for minimum solve time

Direct sparse solvers require the system of equations to be ordered for minimum floating point operation count. The ordering procedure is performed in parallel when multiple host machines are used on a computer cluster. In a shared memory configuration the ordering procedure is not performed in parallel. The parallel ordering procedure will compute different orders when run on different numbers of host machines, which will affect the floating point operation count for the direct solver. Parallel ordering can offer performance improvements, particularly for large models using many host machines, by significantly reducing the time to compute the order. Parallel ordering may cause performance degradation if the order determined results in a higher floating point operation count for the direct solver.

The serial ordering procedure can be used in cases where the variability in the ordering inherent in the parallel ordering procedure is not acceptable. You can deactivate parallel solver ordering from the command line or by using the order\_parallel environment file parameter (see “Command line default parameters” in “Using the Abaqus environment settings,” Section 3.3.1).

Input File Usage: Enter the following input on the command line to deactivate parallel solver ordering:

abaqus job=job-name order\_parallel=OFF

Abaqus/CAE Usage: Deactivation of parallel solver ordering is not supported in Abaqus/CAE.
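The command line options used in this section for the direct sparse solver (cpus, gpus, and order\_parallel=OFF) can be assembled programmatically. The helper below is a hypothetical sketch: its name and keyword arguments are not part of Abaqus, and it only builds the argument list without launching a job.

```python
def build_abaqus_command(job, cpus=None, gpus=None, disable_parallel_ordering=False):
    """Build an argument list for the abaqus command (the job is not run here)."""
    args = ["abaqus", f"job={job}"]
    if cpus is not None:
        args.append(f"cpus={cpus}")  # number of processors
    if gpus is not None:
        args.append(f"gpus={gpus}")  # number of GPGPUs for solver acceleration
    if disable_parallel_ordering:
        args.append("order_parallel=OFF")  # fall back to serial ordering
    return args


print(" ".join(build_abaqus_command("beam", cpus=2)))
print(" ".join(build_abaqus_command("beam", cpus=8, disable_parallel_ordering=True)))
```

On a machine with Abaqus installed, the resulting list could be passed to a process launcher such as Python's `subprocess.run`.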

# Parallel equation solution with the iterative solver

The iterative solver (“Iterative linear equation solver,” Section 6.1.6) uses only MPI-based parallelization. Therefore, MPI is used on both shared memory parallel computers and distributed memory compute clusters. To execute the parallel iterative solver, specify the number of CPUs for the job. The number of processes is equal to the number of CPUs requested during job submission. Element operations are executed in parallel using MPI-based parallelization when the parallel iterative solver is used.

Input File Usage: Use the following option in conjunction with the command line input to execute the parallel iterative solver:

\*STEP, SOLVER=ITERATIVE

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “cube” on four processors with the iterative solver:

abaqus job=cube cpus=4

Abaqus/CAE Usage: Step module: step editor: Other: Method: Iterative

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n

<!-- source-page: 570 -->

# Parallel execution of the element operations

Parallel execution of the element operations is the default on all supported platforms. The standard\_parallel command line option and environment file parameter can be used to control the parallel execution of the element operations (see “Using the Abaqus environment settings,” Section 3.3.1, and “Abaqus/Standard, Abaqus/Explicit, and Abaqus/CFD execution,” Section 3.2.2). If parallel execution of the element operations is used, the solvers also run in parallel automatically. For analyses using the direct sparse solver and not containing parallel cavity decomposition, thread-based parallelization of the element operations is used on shared memory computers and a hybrid MPI and thread parallel scheme is used on computer clusters. For analyses using the iterative solver or if parallel cavity decomposition is enabled, only MPI-based parallelization of element operations is supported.
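The selection rules in the paragraph above can be summarized as a small decision function. This is an illustrative sketch: the function name, argument labels, and return strings are hypothetical, and the function ignores the procedure-specific restrictions that apply to individual analysis types.

```python
def element_operations_scheme(solver: str, cavity_decomposition: bool, platform: str) -> str:
    """Return the parallelization scheme used for the element operations.

    solver: "direct" or "iterative"
    platform: "shared" (shared memory computer) or "cluster"
    """
    if solver not in ("direct", "iterative"):
        raise ValueError("solver must be 'direct' or 'iterative'")
    if platform not in ("shared", "cluster"):
        raise ValueError("platform must be 'shared' or 'cluster'")
    # The iterative solver and parallel cavity decomposition are MPI only.
    if solver == "iterative" or cavity_decomposition:
        return "mpi"
    # Direct sparse solver without parallel cavity decomposition.
    return "threads" if platform == "shared" else "hybrid mpi+threads"


print(element_operations_scheme("direct", False, "shared"))
print(element_operations_scheme("direct", False, "cluster"))
print(element_operations_scheme("iterative", False, "shared"))
```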

When MPI-based parallelization of element operations is used, element sets are created for each domain and can be inspected in Abaqus/CAE. The sets are named STD\_PARTITION\_n, where n is the domain number.
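When post-processing, it may be convenient to pick the domain sets out of a longer list of set names by their STD\_PARTITION\_n pattern. The sketch below works on a plain list of strings; the example names and the helper are hypothetical, and how the set names are obtained from a model or output database is not shown.

```python
import re

# Matches the domain element sets named STD_PARTITION_n, where n is the domain number.
PARTITION_SET = re.compile(r"^STD_PARTITION_(\d+)$")


def partition_domains(set_names):
    """Map each domain number to its partition set name."""
    domains = {}
    for name in set_names:
        match = PARTITION_SET.match(name)
        if match:
            domains[int(match.group(1))] = name
    return domains


names = ["STD_PARTITION_1", "FIXED_END", "STD_PARTITION_2"]  # hypothetical set names
print(partition_domains(names))
```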

Parallel execution of the element operations (thread or MPI-based parallelization) is not supported for the following procedures:

• eigenvalue buckling prediction (“Eigenvalue buckling prediction,” Section 6.2.3),
• natural frequency extraction (“Natural frequency extraction,” Section 6.3.5) that does not use the SIM architecture,
• response spectrum analysis (“Response spectrum analysis,” Section 6.3.10),
• random response analysis (“Random response analysis,” Section 6.3.11), and
• mode-based linear dynamics (“Transient modal dynamic analysis,” Section 6.3.7; “Mode-based steady-state dynamic analysis,” Section 6.3.8; “Subspace-based steady-state dynamic analysis,” Section 6.3.9; and “Complex eigenvalue extraction,” Section 6.3.6) that do not use the SIM architecture.

Parallel execution of element operations is available only through MPI-based parallelization for analyses that include any of the following:

• steady-state transport (“Steady-state transport analysis,” Section 6.4.1),
• static, implicit dynamic, or direct-solution steady-state dynamic analyses for models using substructures, if recovering results within substructures is not requested (“Static stress analysis,” Section 6.2.2; “Implicit dynamic analysis using direct integration,” Section 6.3.2; “Direct-solution steady-state dynamic analysis,” Section 6.3.4; “Substructuring,” Section 10.1).

Analyses using the direct sparse solver and any of the procedures above that support only MPI-based parallelization of element operations can be run on computer clusters. However, only one processor per compute node is used for the element operations since thread-based parallelization is not supported.

Parallel execution of element operations is available only through thread-based parallelization for:

• cavity radiation analyses where parallel decomposition of the cavity is not allowed and writing of restart data is requested (“Cavity radiation,” Section 41.1.1),