How to represent fractional numbers in binary

We have already learned how to store both positive and negative integers. Now we are going to look at fractional numbers.

Fractional numbers, or “numbers with decimals” as we usually call them, are those that do not correspond to whole quantities. For example, 0.5, 2.31, and 7.353 are fractional numbers.

Representing fractional numbers is more complicated than representing positive or negative integers, so several forms of representation exist, the most common being floating point.

Fixed Point Representation

One of the simplest ways to represent numbers with decimals in binary is the fixed point method. In this approach, a fixed number of bits is assigned for the integer part and another fixed number for the fractional part of the number.

For example, in an 8-bit system with 4 bits for the integer part and 4 for the decimal part, the number 5.75 would be represented as 0101.1100.

This technique is direct and easy to implement, but it has many limitations. The precision is limited by the number of bits dedicated to the fractional part.

Also, it is not dynamic and cannot handle numbers that exceed the range defined by the assigned bits.
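
As an illustration, here is a minimal Python sketch of this 8-bit format (4 integer bits, 4 fractional bits, unsigned; the names encode_fixed and decode_fixed are just for this example). The trick is to scale by 2^4 = 16 so that the whole bit pattern can be stored as an ordinary integer.

def encode_fixed(value, frac_bits=4, total_bits=8):
    # Scale so the fractional part becomes integer bits, rounding to the nearest step
    raw = round(value * (1 << frac_bits))
    if not 0 <= raw < (1 << total_bits):
        raise ValueError("value out of range for this format")
    return raw

def decode_fixed(raw, frac_bits=4):
    # Undo the scaling
    return raw / (1 << frac_bits)

raw = encode_fixed(5.75)
print(f"{raw >> 4:04b}.{raw & 0b1111:04b}")  # 0101.1100
print(decode_fixed(raw))                     # 5.75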

Floating Point Representation

The floating point method is the de facto standard for representing numbers with decimals in most modern computers. This method allows for greater flexibility and precision when handling numbers of different magnitudes.

The IEEE 754 standard is the most widely used for representing floating point numbers.

In floating point representation, a number is divided into three parts: the sign, the exponent, and the mantissa.

  • The sign indicates whether the number is positive or negative.
  • The exponent determines where the binary point is placed.
  • The mantissa (or significand) contains the significant digits of the number.

For example, the number 5.75 in 32-bit floating point format would be:

0 10000001 01110000000000000000000
  • Sign: 0 (positive)
  • Exponent: 10000001 (129 in decimal; subtracting the bias of 127 gives 2, since 5.75 = 1.0111 × 2² in binary)
  • Mantissa: 01110000000000000000000
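
You can check this breakdown yourself. Here is a minimal sketch in Python, using the standard struct module to reinterpret the 32-bit pattern of 5.75 as an integer and split it into its three fields:

import struct

# Pack 5.75 as a 32-bit IEEE 754 float and reinterpret the bytes as an integer
bits = struct.unpack("!I", struct.pack("!f", 5.75))[0]

sign = bits >> 31               # bit 31
exponent = (bits >> 23) & 0xFF  # bits 30 to 23
mantissa = bits & 0x7FFFFF      # bits 22 to 0

print(sign, f"{exponent:08b}", f"{mantissa:023b}")
# 0 10000001 01110000000000000000000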

This method allows handling numbers of very different magnitudes by adjusting the exponent. But it also has its limitations, especially in terms of precision for very small or very large numbers.

The representation is more complex than fixed point and carries a higher computational cost. But, in return, it allows us to cover an enormous range of numbers.

Precision Problems

Despite its versatility, floating point representation of decimal numbers can lead to precision problems. This is because, in reality, we are not encoding the exact number but “a very close number.”

For example, the decimal number 0.1 cannot be represented exactly in binary: its binary expansion is periodic (repeating).

In 32-bit floating point, this number is:

  • Sign: 0
  • Exponent: 123 in decimal
  • Mantissa: 5033165 in decimal (the value of the 23 mantissa bits)

That is, the number that you are really representing is not 0.1, but

0.100000001490116119384765625
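
You can reproduce this in Python with the same struct trick: packing 0.1 rounds it to the nearest 32-bit float, and Decimal then reveals the exact value that was actually stored.

import struct
from decimal import Decimal

bits = struct.unpack("!I", struct.pack("!f", 0.1))[0]
print((bits >> 23) & 0xFF)  # 123      (the exponent field)
print(bits & 0x7FFFFF)      # 5033165  (the mantissa field)

# The exact value of the nearest 32-bit float to 0.1
f32 = struct.unpack("!f", struct.pack("!f", 0.1))[0]
print(Decimal(f32))         # 0.100000001490116119384765625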

It is a small difference, but it causes many problems and apparent contradictions when programming.

Real examples of “weird things” that can happen to you

For example, adding 0.1 ten times and then subtracting 1.0 does not give zero.

0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 - 1.0 = −2.77556×10^−17

Or that when subtracting these two numbers you do not get 0.00001, as you would expect.

1.00001 - 1.0 = 0.00000999999

Or even that the result is different depending on the order in which you do the operations.

(0.1 + 0.2) + 0.3 = 0.6000000000000001
0.1 + (0.2 + 0.3) = 0.6
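
These results are easy to reproduce. For instance, in Python, which uses 64-bit doubles, the exact residues will not match the figures above; they depend on the precision and the order of evaluation. The comments show what CPython prints:

print(0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 - 1.0)
# -1.1102230246251565e-16  (tiny, but not zero)

print(1.00001 - 1.0)      # 1.0000000000065512e-05, not exactly 0.00001

print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6  (same numbers, different grouping)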

That is, floating point numbers must be handled with caution.

Other Representations

There are other techniques that are much less common, but equally interesting, for representing fractional numbers in binary. Some of these include:

Midpoint notation

In this technique, a number is represented as the sum of two numbers in fixed point. This can be useful in situations where high precision is required and range is not a major concern.

Variable point method

Similar to fixed point, but with a variable number of bits for the fractional part. This can allow for greater precision for certain numbers, but at the cost of flexibility in the range.

Normalized floating point

A variant of floating point that guarantees that the most significant bit of the mantissa is always 1, which makes better use of the available precision than a non-normalized representation.
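
For example, instead of storing 0.0101 x 2^0, a normalized representation stores the same value as 1.01 x 2^-2. Since the leading bit is then always 1, it does not even need to be stored; this is the “implicit bit” trick that IEEE 754 itself uses to gain one extra bit of precision.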