In computing, especially digital signal processing, the multiply–add operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier–accumulator. The operation modifies an accumulator a:
a ← a + ( b × c )
When done with floating-point numbers, it might be performed with two roundings (typical in many DSPs) or with a single rounding. When performed with a single rounding, it is called a fused multiply–add (FMA).
In floating-point arithmetic
When done with integers, the operation is typically exact (computed modulo some power of two). Floating-point numbers, however, carry only a limited number of significand bits, so digital floating-point arithmetic is generally not associative or distributive. It therefore makes a difference to the result whether the multiply–add is performed with two roundings or in one operation with a single rounding (a fused multiply–add). IEEE 754-2008 specifies a fused multiply–add operation performed with a single rounding, yielding a more accurate result.
Fused multiply–add
A fused multiply–add (FMA or fmadd) is a floating-point multiply–add operation performed in one step, with a single rounding. That is, where an unfused multiply–add would compute the product b × c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire expression a + (b × c) to its full precision before rounding the final result down to N significant bits.
A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products:
Dot product
Matrix multiplication
Polynomial evaluation (e.g., with Horner’s rule)
Newton’s method for evaluating functions (from the inverse function)
Convolutions and artificial neural networks
Multiplication in double-double arithmetic
Fused multiply–add can usually be relied on to give more accurate results. However, William Kahan has pointed out that it can cause problems if used unthinkingly.
When implemented inside a microprocessor, an FMA can be faster than a multiply operation followed by an add.
Another benefit of including this instruction is that it allows an efficient software implementation of division and square root operations, thus eliminating the need for dedicated hardware for those operations.
Support
The FMA operation is included in IEEE 754-2008. The 1999 standard of the C programming language supports the FMA operation through the fma() standard math library function, and standard pragmas (#pragma STDC FP_CONTRACT) controlling optimizations based on FMA.
The fused multiply–add operation was introduced as "multiply–add fused" in the IBM POWER1 (1990) processor, and has since been added to numerous other processors.
x86 processors with FMA3 and/or FMA4 instruction set