Lucas Monte #1 / 25
 va_ macros
hi,
i tried some uses of va_arg but i can't figure out how i can get a double arg correctly: all double or float args are 'converted' like this:
6.45 -> 6.4499897569845646542164848...
124.1 -> 124.0894874562354415465454...
some help would be very nice :)
thx
---
Regards,
lucas
Sun, 27 Jun 2004 00:38:57 GMT
Mark A. Odel #2 / 25
 va_ macros
Quote:
> hi,
> i tried some uses of va_arg but i can't figure out how i can get a
> double arg correctly : all double or float args are 'converted' like
> this: 6.45 -> 6.4499897569845646542164848...
> 124.1 -> 124.0894874562354415465454...
> some help would be very nice :)
Looks right to me. Remember, floating point is not precise.
--
Mark A. Odell
Embedded Firmware Design, Inc.
http://www.embeddedfw.com
Sun, 27 Jun 2004 00:37:40 GMT
Ryan Henness #3 / 25
 va_ macros
Quote:
> hi,
> i tried some uses of va_arg but i can't figure out how i can get a
> double arg correctly : all double or float args are 'converted' like
> this: 6.45 -> 6.4499897569845646542164848...
> 124.1 -> 124.0894874562354415465454...
> some help would be very nice :)
> thx
> ---
> Regards,
> lucas
Welcome to the world of floating point representations. The real number 6.45 has no exact binary representation. It has an approximation, something close to 6.4499897569845646542164848 with your particular implementation. This problem has nothing to do with the va_* macros.

In general, floating point numbers have to be taken with a grain of salt. Ask yourself what your floats are representing -- you can then make a decision as to the proper way to deal with them. For example, to represent money, use an integer number of pennies, to avoid rounding errors entirely. To represent thousands of feet, you would make sure to round off the float at one to three decimal places, as opposed to fourteen.

For the same reason, don't try to compare two floats for equality, either. (6.45 != 6.44998975...) Find a reasonable threshold at which you would call two similar numbers equal, and test against that.

Hope this helps,
Ryan.
Sun, 27 Jun 2004 01:00:15 GMT
Lawrence Kir #4 / 25
 va_ macros
On 8 Jan, in article
Quote:
>> hi,
>> i tried some uses of va_arg but i can't figure out how i can get a
>> double arg correctly : all double or float args are 'converted' like
>> this: 6.45 -> 6.4499897569845646542164848...
>> 124.1 -> 124.0894874562354415465454...
>> some help would be very nice :)
>Looks right to me. Remember, floating point is not precise.
But doubles should be accurate to more than 5 sig figs for simple operations. If these are really the numbers encountered, Lucas should post his code so we can take a look at it.
--
Sun, 27 Jun 2004 07:05:19 GMT
Joe Wrigh #5 / 25
 va_ macros
Quote:
> On 8 Jan, in article
> >> hi,
> >> i tried some uses of va_arg but i can't figure out how i can get a
> >> double arg correctly : all double or float args are 'converted' like
> >> this: 6.45 -> 6.4499897569845646542164848...
> >> 124.1 -> 124.0894874562354415465454...
> >> some help would be very nice :)
> >Looks right to me. Remember, floating point is not precise.
> But doubles should be accurate to more than 5 sig figs for simple
> operations. If these are really the numbers encountered Lucas should
> post his code so we can take a look at it.
Can't guess where those numbers came from. Here's what I get.

double d = 6.45;
01000000 00011001 11001100 11001100 11001100 11001100 11001100 11001101
Exp = 1025 (3) 000 00000011
Man = .11001 11001100 11001100 11001100 11001100 11001100 11001101
6.4500000000000002e+00

double d = 124.1;
01000000 01011111 00000110 01100110 01100110 01100110 01100110 01100110
Exp = 1029 (7) 000 00000111
Man = .11111 00000110 01100110 01100110 01100110 01100110 01100110
1.2409999999999999e+02

The "%.16e" format specifier is used here because fewer than 16 digits can mis-represent the double value. More than 16 is nonsense, not representable in a double. (I use DJGPP (gcc) on an x86 with 64-bit doubles.)
--
"Everything should be made as simple as possible, but not simpler." --- Albert Einstein ---
Sun, 27 Jun 2004 12:33:41 GMT
Gabriel Sech #6 / 25
 va_ macros
Quote:
> hi,
> i tried some uses of va_arg but i can't figure out how i can get a
> double arg correctly : all double or float args are 'converted' like
> this: 6.45 -> 6.4499897569845646542164848...
> 124.1 -> 124.0894874562354415465454...
> some help would be very nice :)
> thx
The others explained why it didn't work; I'll explain floating point a bit more.

Floating-point numbers are stored with 1 bit for the sign, a certain number of bits called the exponent, and a certain number called the mantissa. The value of a float is mantissa*2**exponent, where ** is "to the power of". This is why most floating-point values are not exact (the exceptions: some integers, and fractions with power-of-2 denominators (1/2, 1/4, 3/4, 1/8, etc.) can be stored exactly). The idea is that if you use enough bits in the mantissa, the inaccuracy is very small. The difference for 6.45 was .0000102..., so you had 5 sig figs of accuracy. That's more than enough for almost anything. And if you need more, use a double or even a long double.

Some other useful stuff about floats:
- The inaccuracy is additive. If you add 2 floats, the inaccuracy of the sum can be anywhere from 0 to the sum of the 2 previous inaccuracies. Always assume the worst case, so every add doubles the inaccuracy.
- Multiplication multiplies the inaccuracies. So multiplication ends up with less inaccurate results than addition does. As a result, if accuracy is more important than performance, multiply as much as possible. For example, do x*y+x*z instead of x*(y+z).

And as a free gift, here is a function for you:

int IsEqual(float x, float y, float tolerance)
{
    float difference = x - y;
    return (difference <= tolerance && difference >= -tolerance);
}
This will return 1 if the 2 numbers are equal within the given tolerance, and 0 if not. So say your work has a tolerance of .01. Using

if (IsEqual(x, y, .01)) {
    /* blah */
}

will do blah if x and y are within .01 of each other, inclusive.

Gabe
Sun, 27 Jun 2004 17:08:11 GMT
Pai-Yi HSIA #7 / 25
 va_ macros
Quote:
> >> i tried some uses of va_arg but i can't figure out how i can get a
> >> double arg correctly : all double or float args are 'converted' like
> >> this: 6.45 -> 6.4499897569845646542164848...
> >> 124.1 -> 124.0894874562354415465454...
> >> some help would be very nice :)
> But doubles should be accurate to more than 5 sig figs for simple
> operations. If these are really the numbers encountered Lucas should
> post his code so we can take a look at it.
Right. DBL_DIG must be greater than or equal to 10, so there is some problem in Lucas' code.

paiyi
Sun, 27 Jun 2004 20:12:51 GMT
Lucas Monte #8 / 25
 va_ macros
Quote:
> On 8 Jan, in article
>>>hi,
>>>i tried some uses of va_arg but i can't figure out how i can get a
>>>double arg correctly : all double or float args are 'converted' like
>>>this: 6.45 -> 6.4499897569845646542164848...
>>> 124.1 -> 124.0894874562354415465454...
>>>some help would be very nice :)
>>Looks right to me. Remember, floating point is not precise.
> But doubles should be accurate to more than 5 sig figs for simple
> operations. If these are really the numbers encountered Lucas should
> post his code so we can take a look at it.
in fact, i have to recode the exact printf function (for school). i can get the exact value for some doubles but i get wrong values for others; i'm now aware of the problems of storing doubles and floats in memory, but i'd really like to know how the real printf function can print the correct value for all double args we pass to it? there must be a way to get the correct value of doubles even when they've lost precision... this would help me a lot, so thx;
---
lucas
Mon, 28 Jun 2004 00:48:16 GMT
CBFalconer #9 / 25
 va_ macros
Quote:
> > >> i tried some uses of va_arg but i can't figure out how i can get a
> > >> double arg correctly : all double or float args are 'converted' like
> > >> this: 6.45 -> 6.4499897569845646542164848...
> > >> 124.1 -> 124.0894874562354415465454...
> > >> some help would be very nice :)
> > But doubles should be accurate to more than 5 sig figs for simple
> > operations. If these are really the numbers encountered Lucas should
> > post his code so we can take a look at it.
> Right.
> DBL_DIG should be greater or equal to 10.
> There is some problem in Lucas' code.
results here:

c:\dnld\scratch>type junk.c

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    double d;
    float f;
    int i;

    if (argc < 2)
        printf("Usage: %s %s\n", argv[0], " number number ...");
    else {
        for (i = 1; i < argc; i++) {
            if (1 != sscanf(argv[i], "%lf", &d))
                return EXIT_FAILURE;
            f = d;
            printf("%d: %25.20f, %15.10f %s\n", i, d, (double)f, argv[i]);
        }
    }
    return 0;
}
c:\dnld\scratch>gcc junk.c
c:\dnld\scratch>a 6.45 124.1
1:    6.45000000000000017764,    6.4499998093 6.45
2:  124.09999999999999431566,  124.0999984741 124.1
--
Available for consulting/temporary embedded and systems. (Remove "XXXX" from reply address. yahoo works unmodified)
Mon, 28 Jun 2004 02:47:27 GMT
Pai-Yi HSIA #10 / 25
 va_ macros
Quote:
> > DBL_DIG should be greater or equal to 10.
> double d;
> float f;
> f = d;
> printf("%d: %25.20f, %15.10f %s\n",
>        i, d, (double)f, argv[i]);
I think CBFalconer has given a possible error that Lucas makes in the above example. The precision of f is less than that of d, and converting f back to double cannot increase the precision.

One question I encountered before, under the gcc compiler at some optimization level: when assigning a double variable to a float variable and then assigning the float variable to another double variable, the precision digits of the two doubles stayed the same. That is not logical, because we do NOT expect the assignment from float to double to increase the precision. However, a compiler with this kind of behavior is still standard conforming, isn't it?

paiyi
Mon, 28 Jun 2004 17:13:34 GMT
Lawrence Kir #11 / 25
 va_ macros
On Thursday, in article
Quote:
>> > DBL_DIG should be greater or equal to 10.
>> double d;
>> float f;
>> f = d;
>> printf("%d: %25.20f, %15.10f %s\n",
>>        i, d, (double)f, argv[i]);
>I think CBFalconer have given a possible error which lucas makes in the
>above example.
>The precision of f is less than that of d.
>And turning back f to double type can not increase the precision.
It does increase the precision used to represent the value (assuming double has greater precision than float), but not the accuracy of the value.

Quote:
>One question I encountered before under gcc compiler with
>some optimization level is that
> while assigning a double variable to a float variable and than
> assigning the float variable to another double variable, the
> precision digits for the two double rest the same.
This is a confusing use of the term "precision".

Quote:
>It is not logical because we DO NOT expect the assignment from
>float to double can increase the precision figures.
The issue here seems to be whether the assignment from double to float reduces the precision.

Quote:
>However, a compiler with this kind of behavior is still standard
>comforming, isn't it?
No, a value stored in an object must be stored at the correct precision for that object. Intermediate floating point values in expressions, except the results of casts, can be represented at higher precision than the underlying type indicates. However, implementing this correctly is expensive on architectures like the x86 FPU, and many compilers that target these architectures don't conform, except possibly with some extra options. gcc has -ffloat-store but I'm not sure if that addresses the issue fully or not.
--
Mon, 28 Jun 2004 20:00:19 GMT
Pai-Yi HSIA #12 / 25
 va_ macros
Quote:
> >One question I encountered before under gcc compiler with
> >some optimization level is that
> > while assigning a double variable to a float variable and than
> > assigning the float variable to another double variable, the
> > precision digits for the two double rest the same.
> >However, a compiler with this kind of behavior is still standard
> >comforming, isn't it?
> No, a value stored in an object must be stored at the correct precision
> for that object. Intermediate floating point values in expressions
> except the results of casts can be represented at higher precision than
> the underlying type indicates.
Suppose the mantissa part of the bit representation of some real number is 'abcdefghijkl' under double precision and 'abcde' under single precision. (Here the letters a, b, ..., and l represent 0 or 1.) When assigning the float object to a double object, is the compiler required to "fill up" the extra bits of the mantissa with '0'? If not, the implementation is free to add any bit pattern after 'abcde', which includes 'fghijkl'. Isn't it?

paiyi

Quote:
> However implementing this correctly
> is expensive on architectures like x86 FPU and many compilers that
> target these architectures don't conform except possibly with some
> extra options. gcc has -ffloat-store but I'm not sure if that addresses
> the issue fully or not.
Tue, 29 Jun 2004 03:00:44 GMT
Stephen Montgomery-Smith #13 / 25
 va_ macros
Quote:
> Some other useful stuff about floats:
> -The inaccuracy is additive. If you add 2 floats, the inaccuracy of
> the sum can be anywhere from 0 to the sum of the 2 previous
> inaccuracies. Always assume worsecase, so every add doubles the
> inaccuracies
> -Multiplication multiplies the inaccuracies. So multiplication ends
> up with less inaccurate results than addition does. As a result, if
> accuracy is more important than performance multiple as much as
> possible. For example, do x*y+x*z instead of x*(y+z)
Actually multiplication (and also division) adds the "relative errors". Example:

x = 2 with error 0.01, so relative error is 0.01/2 = 0.005 (some people might say 0.5% error).
y = 20 with error 0.01, so relative error is 0.01/20 = 0.0005.
x+y = 22 has error 0.02.
x*y = 40 has relative error 0.0055; that is, the error is 40*0.0055 = 0.22.

Caveats:
1. The formula for adding relative errors is really an approximation, but works with reasonable tolerance as long as the relative errors are much less than 1.
2. The act of performing the floating point arithmetic has the potential to add a small amount of extra error.

Also, I don't see how x*y+x*z should be any more accurate than x*(y+z).

--
Stephen Montgomery-Smith
http://www.math.missouri.edu/~stephen
Tue, 29 Jun 2004 05:38:00 GMT
Eric Smit #14 / 25
 va_ macros
Quote:
> Suppose the mantissa part of the bit representation of some real number
> is 'abcdefghijkl' under double precesion and 'abcde' under single
> precision. (Here the letters a,b,...,and l represent 0 or 1)
> While assigning the float object to an double object, is the compiler
> required to "fill" up the extra bit part of the mantissa with '0'?
> If no, the implementation is free to add any bit pattern after 'abcde',
> which includes 'fghijkl'. Isn't it ?
No. The compiler is not required to fill it up with zeros, nor is it allowed to add "any bit pattern". It has to produce the *correct* bit pattern to represent the double value that corresponds to the original float.

Even if some real number R is represented as "abcde" in single precision and "abcdefghijkl" in double precision, there's no guarantee that the double precision "abcde0000000" represents a real number anywhere close to R. In fact, "abcde0000000" could be a trap representation, and not correspond to any representable real number.

Also, it is entirely possible that the double representation does NOT contain any storage units with the same values as the float representation. In other words, if the single precision representation of some real number is "abcde", the double precision representation can be "fghijklmnopq". In this case, there's no guarantee that a single precision "fghij" represents anything; it may be a trap representation.
Tue, 29 Jun 2004 05:39:59 GMT
Gabriel Secha #15 / 25
 va_ macros
Quote:
> > -Multiplication multiplies the inaccuracies. So multiplication ends
> > up with less inaccurate results than addition does.
> Actually multiplication (and also division) adds the "relative errors".
> Example
Doh. Right. Still, multiplication has less error than addition, which was the point I wanted to make, but you're right, it isn't multiplicative.

Quote:
> Caveats:
> 1. The formula for adding relative errors is really an approximation,
> but works with reasonable tolerance as long as the relative errors are
> much less than 1.
> 2. The act of performing the floating point arithmetic has the potential
> to add a small amount of extra error.
> Also, I don't see how x*y+x*z should be any more accurate than x*(y+z).
I'm not sure of the math behind it, but in several engineering classes in college I was told to do my calculations multiplication first, addition second. It has to do with the percent errors and the fact that multiplying small errors gives a result with less error than summing the 2 numbers would. But I do remember one prof proving it.

Gabe
Tue, 29 Jun 2004 10:57:25 GMT