Conversion of double precision 

>Subject: Conversion of double precision

>Date: Sat, Nov 15, 1997 19:32 EST

>I have a file in which numbers are stored in double precision format. I'm
>trying to convert these stored values to numbers. In the file the numbers
>are represented by 8 bytes (64 bits). I'm looking to convert the 64 bits
>into the corresponding number. I already know that bits 0 to 51 represent
>the mantissa, bits 52 to 62 the exponent, and bit 63 the sign
>bit. Can someone explain how to construct the original number? I'm
>programming in Clipper, and Clipper does not have a function that does the
>conversion for me. If I can understand how the calculation with the 64
>bits works, I'll write a function to do the job.

>Thanks in advance,


All floating point numbers in computers are basically a binary version of the
 familiar base-10 scientific notation.

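Let s be the sign bit, e the exponent (read as an unsigned integer), m the
 mantissa, and n the value of the number.  And let ^ denote exponentiation.
 The sign works the same way in every case: if s = 1, negate n.  For the
 magnitude you have 4 cases: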

1.    if 0 < e < 2047, then n = 1.m * 2^(e-1023)   ["normal" numbers]

Note that m is really only the fractional part (the part to the right of
 the binary point) of the true mantissa; there's an implied 1 to the left of
 the binary point.  For example, if the number to be represented is 1.0
 * 2^0, then e = 1023 and m = 0, *not 1*, because the 1 to the left of the
 binary point is already implied.  Numbers of this form always have
 53 bits of precision (52 bits of m + the implied leading 1).
Most double precision numbers will fall into case 1.
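
As a rough illustration of case 1, here is a minimal sketch in Python (not Clipper, but the arithmetic should carry over). It assumes the 8 bytes are stored little-endian, as on x86 PCs, which the original post does not state. The names split_fields and decode_normal are made up for this example.

```python
import struct

def split_fields(raw):
    """Split 8 bytes (assumed little-endian) into (sign, e, m)."""
    bits = int.from_bytes(raw, "little")   # the 64 bits as one integer
    sign = bits >> 63                      # bit 63
    e = (bits >> 52) & 0x7FF               # bits 52..62 (11 bits)
    m = bits & ((1 << 52) - 1)             # bits 0..51 (52 bits)
    return sign, e, m

def decode_normal(sign, e, m):
    """Case 1 only: n = 1.m * 2^(e-1023), negated if the sign bit is set."""
    if not 0 < e < 2047:
        raise ValueError("not a normal number")
    # m / 2^52 is the fraction to the right of the binary point;
    # adding 1 supplies the implied leading bit.
    value = (1 + m / 2**52) * 2.0 ** (e - 1023)
    return -value if sign else value

# struct.pack is used here only to produce sample bytes to decode.
raw = struct.pack("<d", -6.25)
sign, e, m = split_fields(raw)
print(sign, e, hex(m), decode_normal(sign, e, m))
```

In a language without shift and mask operators, the same fields can be obtained with integer division and remainder by powers of two.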

2. if e = 0, then n = 0.m * 2^(-1022)               [denormals]

Note that, unlike the first case, the e=0 cases have an implied 0 rather than a
 1 to the left of the binary point, and like case 1, m is only the
 fractional part of the true mantissa.  Also note that the exponent is fixed at
 -1022.  So for example, the number 0 is represented with e=0 and m=0.

Note that since it's a 0 rather than a 1 to the left of the binary
 point, the leading zeros in m, along with the implied 0, only serve as
 placeholders rather than significant bits.  Case 2 numbers are also smaller
 in magnitude than case 1 numbers (the true mantissa is < 1 in case 2 but
 >= 1 in case 1 because of the difference in the implied bit, and the smallest
 true exponent in case 1 is 1-1023 = -1022, the same as the exponent in
 case 2).  All of this means that case 2 numbers are too small to be
 represented with 53 bits of precision as in case 1.  These "underflowed"
 numbers are called denormals.
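
To make case 2 concrete, here is a small Python sketch of the denormal formula n = 0.m * 2^(-1022). The function name decode_denormal is hypothetical, and m is assumed to have already been extracted as the 52-bit integer from bits 0 to 51.

```python
def decode_denormal(sign, m):
    """Case 2 (e = 0): n = 0.m * 2^(-1022), negated if the sign bit is set."""
    # m / 2^52 is the fraction with an implied 0 (not 1) in front of it.
    value = (m / 2**52) * 2.0 ** -1022
    return -value if sign else value

# m = 0 gives zero; m = 1 gives the smallest positive denormal, 2^(-1074).
print(decode_denormal(0, 0))
print(decode_denormal(0, 1))
# The largest denormal sits just below the smallest normal number, 2^(-1022).
print(decode_denormal(0, 2**52 - 1) < 2.0 ** -1022)
```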

3. if e = 2047 and m = 0, then n = infinity.

Believe it or not, you can represent infinity as a double precision value!
 However, since normally we don't do math with infinities, any case 3 number
 could be thrown out.

4. if e = 2047 and m > 0, then n = NaN (Not a Number).

This is the value for representing results like sqrt(-1) and ln(-1) that are
 not real numbers.  Since they aren't valid numbers, these case 4 values must
 be thrown out.

In real life, using the case 1 definition is good enough, since nearly all
 numbers go in the case 1 definition.  A few numbers, if any, might go into the
 case 2 definition.  Almost no numbers should go into the case 3 and 4
 definitions.

Good luck and hope it helps.

