We have already learned how to store both positive and negative integers. Now we are going to look at fractional numbers.
Fractional numbers, or as we commonly call them "numbers with decimals", are those that are not whole numbers (for example, 0.5, 2.31, and 7.353 are fractional numbers).
Representing fractional numbers is more complicated than representing positive or negative integers. So we have different representation methods, with floating-point being the most common.
Fixed-Point Representation
One of the simplest ways to represent numbers with decimals in binary is the fixed-point method. In this approach, a fixed number of bits is assigned for the integer part and another fixed number for the fractional part of the number.
For example, in an 8-bit system with 4 bits for the integer part and 4 for the decimal part, the number 5.75 would be represented as 0101.1100.
This technique is straightforward and easy to implement, but it has significant limitations. The precision is limited by the number of bits dedicated to the fractional part.
Furthermore, the range is fixed: numbers that exceed the range defined by the assigned bits simply cannot be represented.
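As a rough illustration, here is a minimal sketch of the unsigned 4.4 encoding from the example above. The helper names (`to_fixed_4_4`, `from_fixed_4_4`) are made up for this post; the core idea is just multiplying or dividing by 2⁴.

```python
# Minimal sketch: unsigned 4.4 fixed-point (4 integer bits, 4 fractional
# bits), matching the 5.75 example above. Helper names are illustrative.

def to_fixed_4_4(value: float) -> int:
    """Scale by 2**4 and round, producing an 8-bit pattern."""
    raw = round(value * 16)  # shift the binary point 4 places to the right
    if not 0 <= raw <= 0xFF:
        raise ValueError("out of range for unsigned 4.4 fixed-point")
    return raw

def from_fixed_4_4(raw: int) -> float:
    """Undo the scaling to recover the original value."""
    return raw / 16

bits = to_fixed_4_4(5.75)
print(f"{bits:08b}")         # 01011100 -> 0101.1100, i.e. 5.75
print(from_fixed_4_4(bits))  # 5.75
```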
Floating-Point Representation
The floating-point method is the de facto standard for representing numbers with decimals in most modern computers. This method allows greater flexibility and precision when handling numbers of different magnitudes.
The IEEE 754 standard is the most widely used for representing floating-point numbers.
In floating-point representation, a number is divided into three parts: the sign, the exponent, and the mantissa.
- The sign indicates whether the number is positive or negative.
- The exponent determines where the binary point sits.
- The mantissa (also called the significand) holds the significant bits of the number; in IEEE 754, it stores the bits that follow the implicit leading 1.
For example, the number 5.75 in 32-bit floating-point format would be:
0 10000001 01110000000000000000000
Since 5.75 is 101.11 in binary, that is, 1.0111 × 2², the stored exponent is 2 plus the bias of 127, and the mantissa holds the bits after the leading 1:

- Sign: 0 (positive)
- Exponent: 10000001 (129 in decimal, i.e. 2 + 127)
- Mantissa: 01110000000000000000000
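We can verify this breakdown with a few lines of Python, using the standard struct module to reinterpret the float's bytes as an integer (a minimal sketch, not production code):

```python
import struct

# Reinterpret the 4 bytes of a 32-bit float as an unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", 5.75))[0]

sign     = bits >> 31           # 1 bit
exponent = (bits >> 23) & 0xFF  # 8 bits
mantissa = bits & 0x7FFFFF      # 23 bits

print(sign)               # 0 (positive)
print(f"{exponent:08b}")  # 10000001 (129)
print(f"{mantissa:023b}") # 01110000000000000000000

# Reconstruct the value: (1 + mantissa/2**23) * 2**(exponent - 127)
print((1 + mantissa / 2**23) * 2 ** (exponent - 127))  # 5.75
```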
This method allows handling numbers of very different magnitudes by adjusting the exponent. But it also has its limitations, especially in terms of precision: not every value has an exact binary representation, and the gaps between representable numbers grow as magnitudes increase.
The representation is more complex than fixed-point representation and requires greater computational cost. But, in return, it allows us to cover a huge range of numbers.
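A couple of lines of Python make these precision limits concrete; nothing here is specific to Python, and the same happens in any language that uses IEEE 754 doubles:

```python
# 0.1 and 0.2 have no exact binary representation, so rounding error
# shows up even in trivial arithmetic.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# At large magnitudes the gap between adjacent doubles grows: near 1e16
# it is 2, so adding 1 can be lost entirely.
print(1e16 + 1 == 1e16)  # True
```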
Other Representations
There are other techniques for representing numbers with decimals in binary that are much less common, but equally interesting. Some of them include:
Midpoint notation: In this technique, a number is represented as the sum of two fixed-point numbers. This can be useful in situations where high precision is required and range is not a primary concern (see the sketch after this list).
Fixed-comma method: Similar to fixed-point, but with a variable number of bits for the fractional part. This can allow for greater precision for certain numbers, but at the cost of flexibility in range.
Normalized floating-point: A variant of floating-point that guarantees the most significant bit of the mantissa is always 1, so that bit can be left implicit, gaining one extra bit of precision. This is, in fact, the trick IEEE 754 itself uses for normal numbers.
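As a hedged sketch of the first item, "midpoint notation", here is one way to read "the sum of two fixed-point numbers": a coarse component plus a fine correction. The field widths (8.8 plus a 24-bit correction) and the helper names are assumptions made for this illustration, not a standard format:

```python
# Illustrative only: represent a value as a coarse 8.8 fixed-point number
# plus a fine correction with 24 fractional bits. Widths are made up.

def split(value: float) -> tuple[int, int]:
    coarse = round(value * 2**8)     # 8.8 fixed-point component
    residue = value - coarse / 2**8  # what the coarse part missed
    fine = round(residue * 2**24)    # correction, 24 fractional bits
    return coarse, fine

def join(coarse: int, fine: int) -> float:
    return coarse / 2**8 + fine / 2**24

c, f = split(3.14159265)
print(c / 2**8)    # 3.140625     (the coarse component alone)
print(join(c, f))  # ~3.14159265  (far closer with the correction)
```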
