# Numerical Precision



Numerical precision refers to the number of digits with which a numerical quantity is represented; in computing it is determined by the amount of storage given to the value, usually quoted in bytes. Floating point is a way of representing numbers across a wide range in an approximate manner. Floating point numbers are, in essence, numbers with a fractional part. Such numbers are expressed using a fixed number of significant figures scaled by an exponent of a base, which may be binary, decimal or hexadecimal. For instance, 0.00235 would be stored as 2.35×10^{-3}, and 45893.4 would be represented as 4.58934×10^{4} (Poland, 1997).

Computer memory cannot store a floating point number such as 101.011 exactly as it is written, because memory holds only bits and cannot record the point itself, and writing out every digit of a very large or very small number would take too much space. We therefore break such a number into two parts, the mantissa and the exponent, and write it as M×B^{e}. M is the mantissa; B is the base, which can be binary, decimal or hexadecimal; and e is the exponent, the power to which the base is raised (Poland et al., 1997).
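As a rough illustration of the M×B^{e} split, the sketch below uses `math.frexp` from Python's standard library, which returns a binary mantissa and exponent for a float. It assumes base 2 and reads 101.011 as a binary number (5.375 in decimal); the variable names are illustrative only.

```python
import math

# 101.011 in binary is 4 + 1 + 0.25 + 0.125 = 5.375 in decimal.
value = 5.375

# frexp splits a float into a mantissa m and an exponent e such that
# value == m * 2**e, with 0.5 <= m < 1 for positive values.
mantissa, exponent = math.frexp(value)
print(mantissa, exponent)   # 0.671875 3  (0.671875 is 0.101011 in binary)

# ldexp reverses the split and rebuilds the original value.
rebuilt = math.ldexp(mantissa, exponent)
print(rebuilt == value)     # True
```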

For instance, to store a floating point number such as 3.25, we begin by converting it into binary form, which gives 011.01×2^{0}. We then normalize the mantissa, which removes any significant digits to the left of the binary point. In this case we move the point two places to the left, so the mantissa becomes 0.1101, and we record the shift in the exponent: each move to the left adds one to the exponent, and each move to the right subtracts one. The representation is then 0.1101×2^{2}. The exponent is converted to binary as well (2 in binary is 010), so 3.25 becomes 0.1101×2^{010}. Finally, we express this in a form that memory can store, that is, a word or string of bits. In an 8-bit format, we might use one bit for the sign of the mantissa, three bits for the exponent in 2’s complement, and the remaining four bits for the normalized mantissa. Since the mantissa is positive, the sign bit is 0; the exponent in 2’s complement is 010; and the mantissa bits are 1101. Hence the number would be stored as 0 010 1101.
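The steps above can be sketched in Python. This is a minimal sketch of a toy 8-bit layout, not a standard format: it assumes one sign bit, a 3-bit two's-complement exponent, and four mantissa bits taken directly from the normalized fraction 0.MMMM. The function names `encode_toy_float` and `decode_toy_float` are hypothetical.

```python
import math

def encode_toy_float(value: float) -> str:
    """Encode value as 'S EEE MMMM': sign bit, 3-bit two's-complement
    exponent, and 4 mantissa bits read as 0.MMMM (an assumed toy layout)."""
    sign = 0 if value >= 0 else 1
    # abs(value) == m * 2**e with 0.5 <= m < 1, so m is already normalized.
    m, e = math.frexp(abs(value))
    if not -4 <= e <= 3:
        raise ValueError("exponent does not fit in 3 bits")
    mantissa_bits = int(m * 16)   # keep 4 fractional bits, drop the rest
    exponent_bits = e & 0b111     # 3-bit two's complement
    return f"{sign} {exponent_bits:03b} {mantissa_bits:04b}"

def decode_toy_float(word: str) -> float:
    """Rebuild the value stored in an 'S EEE MMMM' word."""
    sign, exp, man = word.split()
    e = int(exp, 2)
    if e >= 4:                    # undo the two's-complement wrap
        e -= 8
    value = int(man, 2) / 16 * 2.0 ** e
    return -value if sign == "1" else value

print(encode_toy_float(3.25))                    # 0 010 1101
print(decode_toy_float("0 010 1101"))            # 3.25
# Four mantissa bits cannot hold 0.1, so the stored value is only close.
print(decode_toy_float(encode_toy_float(0.1)))   # 0.09375
```

With only four mantissa bits, 3.25 survives the round trip but 0.1 does not, which is the kind of rounding loss a small mantissa implies.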

In calculation, floating point numbers have a few limitations. They are represented in computer hardware in binary form, and most decimal fractions cannot be represented exactly as binary fractions. Some can: the decimal fraction 0.125 has the value 1/8, and the binary fraction 0.001 has the same value of 1/8. In general, however, the decimal floating point numbers entered into the computer are only approximated by the binary floating point numbers stored in it. Floating point numbers therefore carry representation errors, which occur because some decimal fractions cannot be represented exactly as binary fractions. This is the main reason why Python, Perl, C, C++, Java and Fortran often do not display the exact decimal number one expects. Sometimes the results of a computation differ considerably from what was expected even though the floating point arithmetic was carried out correctly (Inacio & Ombres, 1996). Designing an efficient and reliable hardware or software floating point system is a tedious and difficult task. The other limitation is that only a few real numbers can be represented exactly in floating point. Most have to be rounded to the nearest number that can be represented, and the resulting rounding error can easily compromise calculations (Inacio & Ombres, 1996).
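A short Python session makes the representation error visible. The sketch uses the standard `fractions`, `decimal` and `math` modules to inspect the value a float actually holds; the choice of 0.125, 0.1, 0.2 and 0.3 as test values is an assumption made for illustration.

```python
from decimal import Decimal
from fractions import Fraction
import math

# 0.125 is 1/8, a power of two, so the stored binary value is exact.
print(Fraction(0.125) == Fraction(1, 8))   # True

# 0.1 has no finite binary expansion, so the stored value is only close.
print(Fraction(0.1) == Fraction(1, 10))    # False
print(Decimal(0.1))                        # the exact stored value, slightly above 0.1

# The small errors surface in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)                    # False
print(0.1 + 0.2)                           # 0.30000000000000004

# Comparing within a tolerance is the usual workaround.
print(math.isclose(0.1 + 0.2, 0.3))        # True
```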

Floating point numbers have a number of advantages that account for their continued and widespread use. The accuracy of a calculation can usually be established. A wide range of values can be represented and worked with. Given the storage method, it is easy to compare two numbers to see which is the larger and which is the smaller. The method also allows a large range of numbers to be stored using relatively few bits. I am strongly of the opinion that the floating point format is memory efficient, especially when applying mixed precision, that is, using single precision in some parts of a computation and double precision in others (Overton, 2001).
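The storage trade-off between single and double precision can be sketched with Python's standard `struct` module. This assumes the IEEE 754 single and double formats, which the essay does not name; the format codes `<f` and `<d` pack a value into 4 and 8 bytes respectively.

```python
import struct

# Pack the same value as a single-precision and a double-precision float.
single = struct.pack("<f", 0.1)
double = struct.pack("<d", 0.1)
print(len(single), len(double))            # 4 8

# Single precision halves the storage but keeps fewer significant digits.
single_value = struct.unpack("<f", single)[0]
double_value = struct.unpack("<d", double)[0]
print(double_value == 0.1)                 # True
print(single_value == 0.1)                 # False
print(abs(single_value - 0.1) < 1e-8)      # True: still close to 0.1
```

A mixed-precision design along these lines would store bulk data in the 4-byte form and reserve the 8-byte form for the steps where the extra digits matter.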

Binary Coded Decimal (BCD) is an encoding that assigns a four-bit binary code to each digit from 0 through 9 of a decimal (base 10) numeral. There are six BCD instructions, and they are divided between two formats: packed and unpacked BCD. In the unpacked format, one byte is allocated for every decimal digit. The byte value ranges only from 0 to 9, even though a byte can hold values from 0 to 255. The numbers are mostly stored in memory in consecutive bytes in decreasing order of digit significance, the same order in which they would appear in a string (Inacio & Ombres, 1996). They can then easily be converted to their ASCII equivalents for display. Digits can also be stored in the reverse order, with the advantage that the number can easily be increased without shifting the other digits when a new significant digit arises.
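A minimal Python sketch of the unpacked format follows. The helper names (`to_unpacked_bcd`, `unpacked_bcd_to_ascii`, `increment_reversed`) are hypothetical and the code only models the byte layout; it does not reproduce any processor's BCD instructions.

```python
def to_unpacked_bcd(number: int) -> bytes:
    """Store one decimal digit per byte, most significant digit first."""
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    return bytes(int(ch) for ch in str(number))

def unpacked_bcd_to_ascii(digits: bytes) -> str:
    """Adding 0x30 (ASCII '0') turns each digit byte into its character."""
    return "".join(chr(d + 0x30) for d in digits)

def increment_reversed(digits: bytearray) -> None:
    """Add one to a number stored least significant digit first.
    A new top digit is appended; no existing digit has to move."""
    for i in range(len(digits)):
        if digits[i] < 9:
            digits[i] += 1
            return
        digits[i] = 0          # carry into the next digit
    digits.append(1)

print(list(to_unpacked_bcd(1997)))                    # [1, 9, 9, 7]
print(unpacked_bcd_to_ascii(to_unpacked_bcd(1997)))   # 1997

reversed_digits = bytearray([9, 9, 9])                # 999, least significant first
increment_reversed(reversed_digits)
print(list(reversed_digits))                          # [0, 0, 0, 1], i.e. 1000
```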

In the packed BCD format, one byte is allocated for every pair of decimal digits, beginning with the least significant digit. Usually, the lower-order digit of the pair is kept in the lower 4 bits of the byte, while the higher-order digit is stored in the upper 4 bits. In this format, a number is usually stored in memory in consecutive bytes in decreasing order of digit significance. The digits can also be kept in the reverse order, as required by FPU instructions such as FBLD (Nikmehr, Phillips & Lim, 2006). BCD storage is not memory efficient: an 8-bit packed BCD element can represent the values from 0 to 99, while the same 8 bits holding a binary element can represent the values from 0 to 255.
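The packed layout and its memory cost can be sketched in Python as well. The helpers `to_packed_bcd` and `from_packed_bcd` are hypothetical names, and the sketch assumes the most-significant-byte-first ordering described above.

```python
def to_packed_bcd(number: int) -> bytes:
    """Pack two decimal digits per byte: high digit in the upper 4 bits,
    low digit in the lower 4 bits, most significant byte first."""
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    digits = str(number)
    if len(digits) % 2:               # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )

def from_packed_bcd(data: bytes) -> int:
    """Rebuild the integer from packed BCD bytes."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value

print(to_packed_bcd(1997).hex())          # 1997
print(from_packed_bcd(b"\x19\x97"))       # 1997

# One byte of packed BCD tops out at 99; plain binary reaches 255.
print(len(to_packed_bcd(99)))             # 1
print(len(to_packed_bcd(255)))            # 2
print(len((255).to_bytes(1, "big")))      # 1
```

The hex dump of packed BCD reads the same as the decimal number, which is what makes the format convenient for display even though it wastes part of each byte.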

The floating point format is more efficient in memory usage than the binary coded decimal format. In carrying out calculations, floating point is more accurate than binary coded decimal, provided due care is taken. Floating point also gives more precision than BCD for the same amount of storage.

I would highly recommend the floating point format, as it is efficient in operation. It is much faster than the other format, its calculations are accurate, its precision is high, and it makes better use of memory.

Knowingly benefitting from the errors that result from rounding is unethical within the banking sector. With time, it would lead customers to question their trust in the bank. Given that a bank is considered the most secure of places, such errors would raise concerns about the security of customers and of their investments and savings (Overton, 2001). To minimize and put a stop to such problems, we can use the Round() function to round numbers to the number of decimal places the calculation calls for. The other option is to use the precision as displayed method (Nikmehr, Phillips & Lim, 2006). Both methods help prevent floating point errors from affecting one’s work.
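The rounding remedy can be sketched in Python, whose built-in `round()` plays the role of the Round() function mentioned above. The second half uses the standard `decimal` module with an explicit rounding mode; that is a swapped-in alternative assumed here for illustration, not a method taken from the cited sources.

```python
from decimal import Decimal, ROUND_HALF_UP

# Ten deposits of 0.1 accumulate a small binary rounding error.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)              # False
print(round(total, 2) == 1.0)    # True: rounding to 2 places removes it

# round() still works on the binary value, so some cases surprise:
# 2.675 is stored slightly below 2.675 and rounds down.
print(round(2.675, 2))           # 2.67

# Decimal arithmetic with a stated rounding rule avoids that case.
amount = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(amount)                    # 2.68
```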

When comparing the floating point format with the exponential format, the exponential format is more accurate, as the errors connected with rounding do not occur, while the floating point format is prone to many such errors. Processing in the floating point format, however, is much faster than in the exponential format.
