INTEL 2017

notes from (intel).

Compilation options

-align

Tells the compiler how to align certain data items.

-align [keyword[, keyword...]]

-noalign

-align none : Tells the compiler not to add padding bytes anywhere in common blocks or structures. This is the same as specifying noalign.

-align arraynbyte : Aligns the start of arrays on an n-byte boundary. n can be 8, 16, 32, 64, 128, or 256. The default value for n is 8. This affects the starting alignment for all arrays except for arrays in COMMON. Arrays do not have padding between their elements.

-align commons : Aligns all common block entities on natural boundaries up to 4 bytes, by adding padding bytes as needed.

The align nocommons option adds no padding to common blocks. In this case, unaligned data can occur unless the order of data items specified in the COMMON statement places the largest numeric data item first, followed by the next largest numeric data (and so on), followed by any character data.
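The ordering rule above can be checked with a small model. This is only a sketch: it assumes each numeric item's natural boundary equals its size in bytes, and the item sizes are a hypothetical COMMON block, not taken from any real program.

```python
def packed_offsets(sizes):
    """Return the byte offset of each item when items are stored back to back."""
    offsets, position = [], 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets


def misaligned(sizes):
    """List (offset, size) pairs whose offset is not a multiple of the item size."""
    return [(offset, size)
            for offset, size in zip(packed_offsets(sizes), sizes)
            if offset % size != 0]


# Hypothetical COMMON block: INTEGER*2, REAL*8, INTEGER*4 (sizes in bytes).
declared_order = [2, 8, 4]
largest_first = sorted(declared_order, reverse=True)

print(misaligned(declared_order))  # items that land off their natural boundary
print(misaligned(largest_first))   # empty: every item is naturally aligned
```

With no padding, the REAL*8 item in the declared order starts at byte 2, which is the kind of unaligned access the note warns about.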

-align dcommons : Aligns all common block entities on natural boundaries up to 8 bytes, by adding padding bytes as needed. This option is useful for applications that use common blocks, unless your application has no unaligned data or, if the application might have unaligned data, all data items are four bytes or smaller. For applications that use common blocks where all data items are four bytes or smaller, you can specify align commons instead of align dcommons.

-align qcommons : Aligns all common block entities on natural boundaries up to 16 bytes, by adding padding bytes as needed. This option is useful for applications that use common blocks, unless your application has no unaligned data or, if the application might have unaligned data, all data items are eight bytes or smaller. For applications that use common blocks where all data items are eight bytes or smaller, you can specify align dcommons instead of align qcommons.

-align zcommons : Aligns all common block entities on natural boundaries up to 32 bytes, by adding padding bytes as needed. This option is useful for applications that use common blocks, unless your application has no unaligned data or, if the application might have unaligned data, all data items are 16 bytes or smaller. For applications that use common blocks where all data items are 16 bytes or smaller, you can specify align qcommons instead of align zcommons.

-align norecords : Aligns components of derived types and fields within record structures on arbitrary byte boundaries with no padding. The align records option requests that multiple data items in record structures and derived-type structures without the SEQUENCE statement be naturally aligned, by adding padding as needed.

-align recnbyte : Aligns components of derived types and fields within record structures on the smaller of the size boundary specified (n) or the boundary that will naturally align them. n can be 1, 2, 4, 8, 16, or 32. The default value for n is 8. When you specify this option, each structure member after the first is stored on either the size of the member type or n-byte boundaries, whichever is smaller. For example, to specify 16 bytes as the packing boundary (or alignment constraint) for all structures and unions in the file prog1.f, use the following command:

ifort -align rec16byte prog1.f

This option does not affect whether common blocks are naturally aligned or packed.
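As a rough illustration of the recnbyte rule (each member is placed on the smaller of its own size or n), the sketch below computes member offsets for a hypothetical derived type. The helper name record_offsets and the member sizes are assumptions made for the example; the real compiler layout may differ in details such as trailing padding.

```python
def record_offsets(member_sizes, n):
    """Offsets of members when each is aligned on min(its own size, n) bytes."""
    offsets, position = [], 0
    for size in member_sizes:
        boundary = min(size, n)
        # Round position up to the next multiple of the boundary (adds padding).
        position = -(-position // boundary) * boundary
        offsets.append(position)
        position += size
    return offsets


# Hypothetical members: CHARACTER(1), REAL*8, INTEGER*4, INTEGER*2.
members = [1, 8, 4, 2]
for n in (1, 4, 8):
    print(f"rec{n}byte ->", record_offsets(members, n))
```

With n = 1 the members are packed back to back, which corresponds to the norecords behavior described earlier.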

-align sequence : Aligns components of a derived type declared with the SEQUENCE statement (sequenced components) according to the alignment rules that are currently in use. The default alignment rules are to align unsequenced components on natural boundaries.

The align nosequence option requests that sequenced components be packed regardless of any other alignment rules. Note that align none implies align nosequence.

If you specify an option for standards checking, align sequence is ignored.

-align all : Tells the compiler to add padding bytes whenever possible to obtain the natural alignment of data items in common blocks, derived types, and record structures. Specifies align dcommons, align records, align nosequence. This is the same as specifying align with no keyword.

-assume

Tells the compiler to make certain assumptions.

-assume none : Disables all the assume options.

-assume byterecl : Specifies that the units for the OPEN statement RECL specifier (record length) value are in bytes for unformatted data files, not longwords (four-byte units). For formatted files, the RECL value is always in bytes.

If a file is open for unformatted data and assume byterecl is specified, INQUIRE returns RECL in bytes; otherwise, it returns RECL in longwords. An INQUIRE returns RECL in bytes if the unit is not open.
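The difference in units is easy to get wrong when sizing direct-access records. The sketch below shows the arithmetic only; recl_value is a hypothetical helper, not a compiler or run-time function.

```python
LONGWORD_BYTES = 4  # one longword is four bytes


def recl_value(record_bytes, byterecl):
    """RECL to pass to OPEN for an unformatted record of record_bytes bytes."""
    if byterecl:
        return record_bytes
    # Without byterecl the unit is the longword; round up to a whole number.
    return -(-record_bytes // LONGWORD_BYTES)


# Hypothetical record: 100 REAL*8 values, that is 800 bytes.
record_bytes = 100 * 8
print(recl_value(record_bytes, byterecl=True))
print(recl_value(record_bytes, byterecl=False))
```

Passing the byte count to a program compiled without assume byterecl would make each record four times larger than intended.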

-assume [many more keywords]

-auto

Causes all local, non-SAVEd variables to be allocated to the run-time stack.

This option places local variables (scalars and arrays of all types), except those declared as SAVE, on the run-time stack. It is as if the variables were declared with the AUTOMATIC attribute.

It does not affect variables that have the SAVE attribute or ALLOCATABLE attribute, or variables that appear in an EQUIVALENCE statement or in a common block.

This option may provide a performance gain for your program, but if your program depends on variables having the same value as the last time the routine was invoked, your program may not function properly.

If you want to cause variables to be placed in static memory, specify option [Q]save. If you want only scalar variables of certain intrinsic types to be placed on the run-time stack, specify option auto-scalar.

-auto-scalar

Scalar variables of intrinsic types INTEGER, REAL, COMPLEX, and LOGICAL are allocated to the run-time stack. Note that the default changes to auto if one of the following options is specified:

• recursive
• [q or Q]openmp

-big_endian

Useful when you want to:

• read a file coming from an external machine (IBM in particular)
• provide a binary file to an external machine (IBM in particular)

Controls how floats and integers are written by specifying the order in which their bytes are written.

little_endian (Datarmor)
• -i4: bytes written from 0 to 3
• -i8: bytes written from 0 to 7
big_endian (IBM, historical)
• -i4: bytes written from 3 to 0
• -i8: bytes written from 7 to 0

Avoid unless strictly necessary (for example, to read a file coming from an external machine).
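To make the byte orders concrete, here is a short sketch using Python's standard struct module. It is independent of the compiler and only shows what the same 4-byte integer looks like on disk in each convention.

```python
import struct

value = 1

little = struct.pack("<i", value)  # little-endian 4-byte integer
big = struct.pack(">i", value)     # big-endian 4-byte integer

print(little.hex())  # least significant byte first
print(big.hex())     # most significant byte first

# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = struct.unpack(">i", little)[0]
print(misread)
```

A file read with the wrong convention does not fail; it silently yields values such as the misread one above, which is why the conversion should be enabled only when the file really comes from a machine with the other byte order.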

-c

Prevents linking. Compilation stops after the object file is generated. The compiler generates an object file for each Fortran source file.

-fast

Maximizes speed across the entire program.

Default OFF

It sets the following options:

-ipo, -O3, -no-prec-div, -static, -fp-model fast=2, and -xHost

-falias

This option specifies whether or not the compiler can assume that during a procedure call, local variables in the caller that are not present in the actual argument list and not visible by host association, are not referenced or redefined due to hidden aliasing. The Fortran standard generally prohibits such aliasing.

Default -fno-alias

-fno-alias

Procedure calls do not alias local variables.

-ffnalias

Aliasing is assumed within functions.

-fno-fnalias

Aliasing is not assumed within functions, but it is assumed across calls.

-fp-model

Controls the semantics of floating-point calculations.

-fp-model precise : Tells the compiler to strictly adhere to value-safe optimizations when implementing floating-point calculations. It disables optimizations that can change the result of floating-point calculations.

These semantics ensure the reproducibility of floating-point computations for serial code, including code vectorized or auto-parallelized by the compiler, but they may slow performance. They do not ensure value safety or run-to-run reproducibility of other parallel code. Run-to-run reproducibility for floating-point reductions in OpenMP* code may be obtained for a fixed number of threads through the KMP_DETERMINISTIC_REDUCTION environment variable. For more information about this environment variable, see topic “Supported Environment Variables”.

The compiler assumes the default floating-point environment; you are not allowed to modify it.

Note that option fp-model precise implies fp-model source and option fp:precise implies fp:source.

Floating-point exception semantics are disabled by default. To enable these semantics, you must also specify -fp-model except or /fp:except.
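The kind of transformation that value-safe models forbid is the regrouping of floating-point operations. The sketch below uses plain Python doubles and is not compiler output: it shows that regrouping an addition changes the result, and that splitting a sum into chunks (as a parallel reduction might) can change it too. The chunking into two halves is an assumption made for illustration.

```python
# Regrouping: floating-point addition is not associative.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)
print(left_to_right, regrouped)


def running_sum(values):
    """Add values strictly left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, 1.0, -1e16, 1.0]

sequential = running_sum(values)
# Hypothetical two-way reduction: each half is summed separately, then combined.
chunked = running_sum(values[:2]) + running_sum(values[2:])
print(sequential, chunked)
```

Neither result is wrong in the IEEE sense; they simply differ, which is why reproducibility requires fixing the order of operations.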

-fp-model fast[=1|2] : Tells the compiler to use more aggressive optimizations when implementing floating-point calculations. These optimizations increase speed, but may affect the accuracy or reproducibility of floating-point computations.

Specifying fast is the same as specifying fast=1. fast=2 may produce faster and less accurate results.

Floating-point exception semantics are disabled by default and they cannot be enabled because you cannot specify fast and except together in the same compilation. To enable exception semantics, you must explicitly specify another keyword (see other keyword descriptions for details).

-fp-model consistent : Tells the compiler to generate code that will give consistent, reproducible floating-point results for different optimization levels or between different processors of the same architecture. For more information, see the article titled: Consistency of Floating-Point Results using the Intel® Compiler, located at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

-fp-model strict : Tells the compiler to strictly adhere to value-safe optimizations when implementing floating-point calculations and enables floating-point exception semantics. This is the strictest floating-point model. The compiler does not assume the default floating-point environment; you are allowed to modify it. Floating-point exception semantics can be disabled by explicitly specifying -fp-model no-except or /fp:except-.

-fp-model source : This option causes intermediate results to be rounded to the precision defined in the source code. It also implies keyword precise unless it is overridden by a keyword from Group A. The compiler assumes the default floating-point environment; you are not allowed to modify it.

-fp-model except : Tells the compiler to follow strict floating-point exception semantics.

-fpe-all

Allows some control over floating-point exception handling for each routine in a program at run-time.

-fpe-all=n (default -fpe-all=3)

n

Specifies the floating-point exception handling level. Possible values are:

0 : Floating-point invalid, divide-by-zero, and overflow exceptions are enabled. If any such exceptions occur, execution is aborted. This option sets the [Q]ftz option; therefore underflow results will be set to zero unless you explicitly specify -no-ftz (Linux and OS X) or /Qftz- (Windows).

To get more detailed location information about where the error occurred, use option traceback.

1 : All floating-point exceptions are disabled. Underflow results from SSE instructions, as well as x87 instructions, will be set to zero.

3 : All floating-point exceptions are disabled. Floating-point underflow is gradual, unless you explicitly specify a compiler option that enables flush-to-zero, such as [Q]ftz, O3, or O2. This setting provides full IEEE support.


-ftz

Flushes denormal results to zero when the application is in the gradual underflow mode. It may improve performance if the denormal values are not critical to your application’s behavior.

The [Q]ftz option has no effect during compile-time optimization.

The [Q]ftz option sets or resets the FTZ and the DAZ hardware flags. If FTZ is ON, denormal results from floating-point calculations will be set to the value zero. If FTZ is OFF, denormal results remain as is. If DAZ is ON, denormal values used as input to floating-point instructions will be treated as zero. If DAZ is OFF, denormal instruction inputs remain as is. Systems using Intel® 64 architecture have both FTZ and DAZ. FTZ and DAZ are not supported on all IA-32 architectures.
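The effect of the FTZ flag can be modeled in a few lines. This is only a software model in Python, assuming the interpreter itself runs with gradual underflow (the usual case); flush_to_zero is a hypothetical helper, not something the compiler provides.

```python
import math
import sys

smallest_normal = sys.float_info.min  # smallest positive normal double

# Under gradual underflow, halving the smallest normal gives a nonzero denormal.
gradual = smallest_normal / 2


def flush_to_zero(x):
    """Model of FTZ: replace a denormal result with a zero of the same sign."""
    if 0.0 < abs(x) < smallest_normal:
        return math.copysign(0.0, x)
    return x


print(gradual)                 # tiny but nonzero
print(flush_to_zero(gradual))  # flushed to zero
print(flush_to_zero(1.0))      # normal values are untouched
```

Code that tests a difference against zero, or divides by a very small quantity, is the typical place where the two modes give visibly different behavior.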

When the [Q]ftz option is used in combination with an SSE-enabling option on systems using IA-32 architecture (for example, the [Q]xSSE2 option), the compiler will insert code in the main routine to set FTZ and DAZ. When [Q]ftz is used without such an option, the compiler will insert code to conditionally set FTZ/DAZ based on a run-time processor check. The negative form of option [Q]ftz will prevent the compiler from inserting any code that might set FTZ or DAZ.

Option [Q]ftz only has an effect when the main program is being compiled. It sets the FTZ/DAZ mode for the process. The initial thread and any threads subsequently created by that process will operate in FTZ/DAZ mode.

If this option produces undesirable results in the numerical behavior of your program, you can turn the FTZ/DAZ mode off by using -no-ftz or /Qftz- in the command line while still benefiting from the O3 optimizations.

Every optimization option O level, except O0, sets [Q]ftz.

Value 0 for the [Q]fpe option sets [Q]ftz.

-ipo

Enables interprocedural optimization between files.

-ipo[n]    # n=0 if not specified
-no-ipo    # default


n is an optional integer that specifies the number of object files the compiler should create. The integer must be greater than or equal to 0.

This option enables interprocedural optimization between files. This is also called multifile interprocedural optimization (multifile IPO) or Whole Program Optimization (WPO).

When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

You cannot specify the names for the files that are created.

If n is 0, the compiler decides whether to create one or more object files based on an estimate of the size of the application. It generates one object file for small applications, and two or more object files for large applications.

If n is greater than 0, the compiler generates n object files, unless n exceeds the number of source files (m), in which case the compiler generates only m object files.

If you do not specify n, the default is 0.

In summary:

• -ipo allows subroutines to be inlined (the subroutine body is inserted in place of the call) wherever the routine is originally defined (in a separate file, for example)
• -ip allows subroutines to be inlined when they are available in the same .F90 file as the call

-fma

-no-fma

Determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor.

a ← a + b × c

If the instructions exist on the target processor, the compiler generates fused multiply-add (FMA) instructions.

However, if you specify -fp-model strict (Linux* OS and OS X*), but do not explicitly specify -fma, the default is -no-fma.

Description:

This option determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. When the [Q]fma option is specified, the compiler may generate FMA instructions for combining multiply and add operations. When the negative form of the [Q]fma option is specified, the compiler must generate separate multiply and add instructions with intermediate rounding.

This option has no effect unless setting CORE-AVX2 or higher is specified for option [Q]x, -march (Linux OS and OS X), or /arch (Windows OS).
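The difference between a fused and a separate multiply-add is the number of roundings: one for FMA, two otherwise. The sketch below emulates the single rounding with exact rational arithmetic from Python's fractions module. It is a model of the arithmetic, not of the generated instructions, and the operand values are chosen only to make the difference visible.

```python
from fractions import Fraction

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate operations: the product is rounded to double, then the sum is rounded.
separate = a * b + c

# Fused emulation: compute a*b + c exactly, then round once to double.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(separate)
print(fused)
```

Here the exact product is 1 - 2^-60, which rounds to 1.0 as a double, so the separate form loses the small term entirely while the fused form keeps it. This is why enabling or disabling FMA can change results between machines or builds.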

-prec-div

Improves precision of floating-point divides. It has a slight impact on speed.

This option inhibits any optimizations that can adversely affect the precision of a floating-point division. The result is fully precise division, with some loss of performance.

With some optimizations, such as -msse2 (Linux*) or /arch:SSE2 (Windows*), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.

However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, use this option to disable the floating-point division-to-multiplication optimization. The result is more accurate, with some loss of performance.
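The loss of accuracy comes from the extra rounding of the reciprocal. The sketch below, in plain Python doubles, searches small integers for cases where B * (1/B) is not exactly 1 even though B / B always is. The search range is arbitrary and chosen only for illustration.

```python
def divide(a, b):
    """Full IEEE division: a single correctly rounded operation."""
    return a / b


def divide_by_reciprocal(a, b):
    """Reciprocal form: 1/b is rounded first, then the product is rounded."""
    return a * (1.0 / b)


# Values of B for which B * (1/B) differs from B / B.
mismatches = [b for b in range(1, 100)
              if divide(float(b), float(b)) != divide_by_reciprocal(float(b), float(b))]

print(mismatches)
print(divide(49.0, 49.0), divide_by_reciprocal(49.0, 49.0))
```

The differences are at the level of the last bit, which matters mainly for code that compares results exactly or that must reproduce reference output bit for bit.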

default -prec-div

-no-prec-div

If you specify -no-prec-div (Linux* and OS X*) or /Qprec-div- (Windows*), it enables optimizations that give slightly less precise results than full IEEE division.

-prec-sqrt

Improves precision of square root implementations.

Default -no-prec-sqrt

-no-prec-sqrt

The compiler uses a faster but less precise implementation of square root.

However, the default is -prec-sqrt if any of the following options are specified: -O0, -fltconsistency, or -mp1

-traceback

Tells the compiler to generate extra information in the object file to provide source file traceback information when a severe error occurs at run time.

-fpe

Allows some control over floating-point exception handling for the main program at run-time.

-warn

Specifies diagnostic messages to be issued by the compiler.

-warn all : This is the same as specifying warn. This option does not set options warn errors or warn stderrors. To enable all the additional checking to be performed and force the severity of the diagnostic messages to be severe enough to not generate an object file, specify warn all warn errors or warn all warn stderrors.

-warn none : Disables all warning messages. This is the same as specifying -nowarn, -w, -W0, or -warn nogeneral.

See the Intel documentation for more keywords.

-xCORE-AVX2

Tells the compiler which processor features it may target, including which instruction sets and optimizations it may generate.

May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel(R) processors. Optimizes for Intel(R) processors that support Intel(R) AVX2 instructions.

Debugging

notes from (intel_debug).

-O0

Disables optimizations so you can debug your program before any optimization is attempted. This is the default behavior when debugging.

Note : On Linux* and OS X*, -fno-omit-frame-pointer is set if either option -O0 or -g is specified.

-O1, -O2, or -O3

Specifies the code optimization level for applications. If you use any of these options, it is recommended that you use -debug extended when debugging.

-debug

Enables or disables generation of debugging information.

See man ifort for keywords.

Default: varies

Normally, the default is -debug none and no debugging information is generated. However, on Linux* OS, the -debug inline-debug-info option will be enabled by default if you compile with optimizations (option -O2 or higher) and debugging is enabled (option -g).

Alternate option for -debug full, -debug all, or -debug with no keyword: -g

-g

Generates symbolic debugging information and line numbers in the object code for use by the source-level debuggers. Turns off O2 and makes -O0 (Linux* and OS X*) or /Od (Windows*) the default. The exception to this is if options O1, O2, or O3 are explicitly specified in the command line.

Debugging information is produced and -O0 (Linux* and OS X*) or /Od (Windows*) is enabled (meaning optimizations are disabled).

For Linux* and OS X*, -fp is also enabled for compilations targeted for IA-32 architecture.

-debug

Specifies settings that enhance debugging.

-debug extended (Linux* and OS X*)

-fp

Disables the ebp register in optimizations and sets the ebp register to be used as the frame pointer.

-traceback

Causes the compiler to generate extra information in the object file, which allows a symbolic stack traceback.