A floating-point number $x$ is represented by three quantities: the sign, the mantissa, and the exponent:

$x = \sigma \cdot a \cdot B^e$, with $\sigma \in \{+1, -1\}$ and $a = \sum_{i=0}^{t} \alpha_i B^{-i}$, $\alpha_i \in \{0, 1, \dots, B-1\}$.

$a$ is called the mantissa, $B$ the basis, and $e$ the exponent, with $e_{\min} \le e \le e_{\max}$. $t$ is called the mantissa length. The condition $\alpha_0 \neq 0$ makes the representation unique and saves, in the binary case ($B = 2$), one bit: the leading digit is then always $\alpha_0 = 1$ and need not be stored.
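The representation above can be evaluated directly for a small hand-picked example; the base, sign, exponent, and digit values below are illustrative assumptions, not taken from the text:

```python
# Evaluate x = sigma * a * B^e for B = 2 with a normalized mantissa
# a = (1.101)_2 = 1 + 1/2 + 1/8 (leading digit alpha_0 = 1, hence nonzero).
# sigma, e, and the digits are illustrative choices.
B, sigma, e = 2, -1, 3
digits = [1, 1, 0, 1]                      # alpha_0, alpha_1, alpha_2, alpha_3
a = sum(d * B ** -i for i, d in enumerate(digits))
x = sigma * a * B ** e

assert a == 1.625                          # mantissa lies in [1, B)
assert x == -13.0                          # -1.625 * 2^3
```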
Two floating-point zeros, $+0$ and $-0$, exist, both represented by the mantissa $a = 0$; they differ only in the sign $\sigma$.
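A minimal check of the two zeros, using Python's `struct` module to expose the underlying 64-bit pattern (the helper `bits` is an illustrative assumption, not part of the text):

```python
import math
import struct

def bits(x: float) -> str:
    """Return the 64-bit pattern of a double as a string of 0s and 1s."""
    return format(struct.unpack('<Q', struct.pack('<d', x))[0], '064b')

pz, nz = 0.0, -0.0
assert pz == nz                          # the two zeros compare equal
assert math.copysign(1.0, nz) == -1.0    # but their signs differ

# Only the leading sign bit distinguishes +0 from -0:
assert bits(pz) == '0' * 64
assert bits(nz) == '1' + '0' * 63
```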
On a typical Intel processor, $B = 2$. To represent a number in the 64-bit double type, 64 bits are used, namely, 1 bit for the sign, $t = 52$ bits for the mantissa, and $11$ bits for the exponent $e$. The upper bound for the exponent is consequently $e_{\max} = 1023$, with $e_{\min} = -1022$ (the remaining exponent patterns are reserved for special values).
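This $1 + 11 + 52$ bit layout can be inspected directly. The following sketch splits a double into its three fields; the `decode` helper is a hypothetical illustration, not a standard API:

```python
import struct

def decode(x: float):
    """Split a 64-bit double into sign bit, unbiased exponent, and the
    52 explicitly stored mantissa bits (illustrative helper)."""
    u = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign = u >> 63                               # 1 bit
    biased_exponent = (u >> 52) & 0x7FF          # 11 bits, bias 1023
    fraction = u & ((1 << 52) - 1)               # 52 bits
    return sign, biased_exponent - 1023, fraction

# 1.0 = +1 * (1.0)_2 * 2^0
assert decode(1.0) == (0, 0, 0)
# -2.5 = -1 * (1.01)_2 * 2^1; the fraction stores the digits after the
# implicit leading 1, here 0.25 = 2^-2
assert decode(-2.5) == (1, 1, 1 << 50)
```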
With this data, the smallest positive representable (normalized) number is
$x_{\min} = 2^{-1022} \approx 2.2 \cdot 10^{-308}$, and the largest $x_{\max} = (2 - 2^{-52}) \cdot 2^{1023} \approx 1.8 \cdot 10^{308}$.
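These two bounds can be verified against the values Python reports for the double type:

```python
import math
import sys

x_min = 2.0 ** -1022                      # smallest positive normalized double
x_max = (2.0 - 2.0 ** -52) * 2.0 ** 1023  # largest finite double

assert x_min == sys.float_info.min        # approx. 2.2e-308
assert x_max == sys.float_info.max        # approx. 1.8e308

# One step beyond x_max overflows to infinity:
assert math.isinf(x_max * 2.0)
```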
Note that floating-point numbers are not equally spaced in $\mathbb{R}$. There is, in particular, a gap at zero (see also [29]). The distance between $0$ and the first positive number $x_{\min}$ is $2^{-1022}$, while the distance between the first and the second is smaller by a factor of $2^{-52}$. This effect, caused by the normalization $\alpha_0 \neq 0$, is visualized in Figure 2.1.
This gap is filled equidistantly with subnormal floating-point numbers, to which a result falling into the gap is rounded. Subnormal floating-point numbers have the smallest possible exponent $e_{\min} = -1022$ and do not follow the normalization convention, i.e., $\alpha_0 = 0$.
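The equidistant filling of the gap can be observed numerically; `math.ulp` (available in Python 3.9 and later) reports the spacing at a given value:

```python
import math
import sys

x_min = sys.float_info.min               # smallest normalized double, 2^-1022
tiny = 2.0 ** -1074                      # smallest positive subnormal

# Throughout the subnormal range the spacing is constant, 2^-1074:
assert math.ulp(x_min) == tiny
assert math.ulp(tiny) == tiny

# x_min / 2 no longer fits the normalized form, but it is still
# exactly representable as a subnormal number:
assert x_min / 2.0 == 2.0 ** -1023

# Halving the smallest subnormal underflows to zero:
assert tiny / 2.0 == 0.0
```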