Quote:

>Subject: Conversion of double precision

>Date: Sat, Nov 15, 1997 19:32 EST

>I have a file in which numbers are stored in double precision format. I'm

>trying to convert a these numbers to a number. In the file the numbers

>are represented by 8 bytes (64 bits). I'm looking to convert the 64bits

>into the corresponding number. I already know the bits 0 to 51 represent

>a mantissa, 52 to 62 the exponent and the 63th bit represent the sign

>bit. Can someone explain how to construct the original number. I'm

>programming in Clipper and Clipper does not have a function who does the

>conversion for me. If I can understand how the calculation with the 64

>bits work, I'll write a function to do the job.

>Thanks in advance,

>Ronny

All floating point numbers in computers are basically a binary version of the

familiar base-10 scientific notation.

Let e be the stored exponent field, m the mantissa field, and n the value of the

number. And let ^ denote exponentiation. The sign bit only sets the sign: n is

negative when bit 63 is 1. For the magnitude you have 4 cases:

1. if 0 < e < 2047, then n = 1.m * 2^(e-1023) ["normal" numbers]

Note that the m is really only the fractional part (the part to the right of

the binary point) of the true mantissa; there's an implied 1 to the left of

the binary point. For example, if the number to be represented is

1.0 * 2^0, then e = 1023, and m = 0, *not 1*, because the 1 to the left of the

binary point is already implied. Numbers of this form will always

have 53 bits of accuracy (52 bits of m + the implied digit of 1).

Most double precision numbers will fall into case 1.
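
As a rough illustration of case 1, here is a minimal sketch in Python (used only because it is easy to read; the same arithmetic should carry over to a Clipper function). It assumes the 8 bytes are stored least significant byte first, as is usual on PC hardware; if your file uses the opposite order, reverse the bytes first. The names split_fields and decode_normal are made up for this example.

```python
import struct

def split_fields(raw: bytes):
    """Split 8 bytes into (sign, e, m). Assumes little-endian byte order."""
    (bits,) = struct.unpack("<Q", raw)   # the 8 bytes as one 64-bit unsigned integer
    sign = bits >> 63                    # bit 63
    e = (bits >> 52) & 0x7FF             # bits 52..62 (11 bits)
    m = bits & ((1 << 52) - 1)           # bits 0..51 (52 bits)
    return sign, e, m

def decode_normal(sign: int, e: int, m: int) -> float:
    """Case 1 only: n = 1.m * 2^(e-1023), negated when the sign bit is set."""
    if not 0 < e < 2047:
        raise ValueError("not a case 1 (normal) number")
    mantissa = 1.0 + m / 2.0**52         # implied 1 plus the fractional part
    value = mantissa * 2.0 ** (e - 1023)
    return -value if sign else value

# The example from the text: 1.0 is stored with e = 1023 and m = 0.
print(split_fields(struct.pack("<d", 1.0)))   # (0, 1023, 0)
```

Dividing m by 2^52 is what turns the 52-bit integer field into the fraction to the right of the binary point.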

2. if e = 0, then n = 0.m * 2^(-1022) [denormals]

Note that, unlike the first case, the e=0 cases have an implied 0 rather than a

1 to the left of the binary point, and like case 1, m is only the

fractional part of the true mantissa. Also note that the exponent is fixed to

-1022. So for example, the number 0 will be represented with e=0 and m=0.

Note that since it's a 0 rather than a 1 to the left of the binary

point, the leading zeros in m, along with the implied 0, only serve as

placeholders rather than significant bits. And case 2 numbers are also smaller

than case 1 numbers (the true mantissa is < 1 in case 2 while >= 1 in case 1

because of the difference in the implied bit, and the smallest true exponent

in case 1 is 1-1023 = -1022 = the exponent in case 2). All of this means

that the case 2 numbers are numbers too small to be represented with 53 bits of

accuracy as in case 1. These "underflowed" numbers are called denormals.
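
The case 2 rule can be sketched the same way. This is only an illustration in Python, with a made-up function name; it takes the sign bit and the 52-bit m field as already extracted integers.

```python
def decode_denormal(sign: int, m: int) -> float:
    """Case 2 (e == 0): n = 0.m * 2^(-1022), negated when the sign bit is set."""
    mantissa = m / 2.0**52               # implied 0, so the fraction is the whole mantissa
    value = mantissa * 2.0 ** -1022      # the exponent is fixed at -1022
    return -value if sign else value

print(decode_denormal(0, 0))             # 0.0, the e=0, m=0 example from the text
print(decode_denormal(0, 1) == 2.0 ** -1074)   # True: the smallest positive denormal
```

Even with every bit of m set, the result stays below 2^(-1022), the smallest case 1 number.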

3. if e = 2047 and m = 0, then n = infinity.

Believe it or not, you could represent infinity as a double precision value!

The sign bit tells +infinity from -infinity. However, since normally we don't

do math with infinities, any case 3 number could be thrown out.

4. if e = 2047 and m > 0, then n = NaN (Not a Number).

This is the value for representing results like Sqrt(-1) and ln(-1) that are

not real numbers. Since they aren't valid numbers, these case 4 numbers must

be thrown out.
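
If you would rather detect cases 3 and 4 than silently skip them, a small check along these lines may help. It is a sketch in Python with a hypothetical function name, and it assumes the caller has already found e = 2047.

```python
import math

def decode_special(sign: int, m: int) -> float:
    """Cases 3 and 4 (e == 2047): infinity when m == 0, NaN otherwise."""
    if m == 0:
        return -math.inf if sign else math.inf   # the sign bit picks +inf or -inf
    return math.nan                              # any nonzero m means Not a Number

print(decode_special(0, 0))   # inf
print(decode_special(1, 0))   # -inf
print(decode_special(0, 1))   # nan
```

A reader of the file could use such a result to flag the record as invalid instead of converting it.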

In real life, using the case 1 definition is good enough for nearly all

nonzero numbers. The one exception worth handling is zero itself, which goes

into the case 2 definition (e=0, m=0) and is likely to turn up in real data.

Other case 2 numbers are rare, and almost no numbers should go into the case 3

and 4 definitions.
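
Putting the four cases together, one possible shape for the whole conversion is sketched below in Python. It is not the only way to do it: the function name is invented, and it assumes the 8 bytes are stored least significant byte first (usual on PC hardware, but worth checking against your file).

```python
import math
import struct

def decode_double(raw: bytes) -> float:
    """Decode 8 little-endian bytes by hand, following cases 1 to 4."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    bits = int.from_bytes(raw, "little")
    sign = bits >> 63                    # bit 63
    e = (bits >> 52) & 0x7FF             # bits 52..62
    m = bits & ((1 << 52) - 1)           # bits 0..51
    if e == 2047:                        # cases 3 and 4
        if m != 0:
            return math.nan
        value = math.inf
    elif e == 0:                         # case 2: implied 0, exponent fixed at -1022
        value = (m / 2.0**52) * 2.0 ** -1022
    else:                                # case 1: implied 1
        value = (1.0 + m / 2.0**52) * 2.0 ** (e - 1023)
    return -value if sign else value

# Cross-check against Python's built-in conversion for a few sample values.
for x in (0.0, 1.0, -2.5, 0.1, 5e-324):
    assert decode_double(struct.pack("<d", x)) == x
```

The loop at the end is only a sanity check; in Clipper you would compare against a few values whose contents you already know.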