Floating Point arithmetic problem

Quote:

>This addition usually results in a few insignificant digits getting

>accumulated in my double variable. For eg.

> f = 12.99 is represented as 12.9900000000000002131628207

> f = 11.111 is represented as 11.1110000000000006536993169

>and so on. I am interested in comparisons with 4 digits precision.

>Now coming to the problem. The insignificant digits due to the

>floating point representation keep accruing and there comes a stage

>when the accrued value exceeds 0.0001 which results in failure of the

>if condition in the above block of code, when ideally no such thing

>should have occurred.

You are running into a fundamental problem with floating point

numbers. Assume that your values have a magnitude of 1e4 (which you

say is the limit). Assume further that 8-byte IEEE-754 arithmetic is

used. With approximately 17 digits of precision, the lsb is

approximately 1e-13. If you add 1e6 numbers and the contributed

rounding error from each addition is 1/2 lsb (a hand-waving, but

probably reasonable sort of assumption), the accumulated error would

be on the order of 5e-8. That is better than your tolerance of 1e-4.

The retained precision also depends on the order in which the numbers

are added. The above analysis assumes that the accumulated value

stays less than 1e4. If, instead, you added 5e5 positive values of

approximately 1e4, THEN 5e5 negative values of opposite value which

result in a theoretical sum near zero, the accumulated value will

temporarily go to approximately 5e9, where the typical 1/2 lsb error

would then be about 5e-9. Multiply that by 1e6 and get 5e-3, which is

greater than you tolerance.

Quote:

>All I need is a solution that will overcome this problem. Please

bear

>in mind that the loop is executed millions of times and hence any

>costly operation within the loop with drastically bring down

>performance.

Avoid accumulating large values which will eventually cancel. See if

your implementation supports a long double type with greater

precision than double.

Quote:

>I have a few options to overcome this problem :-

>* After each addition, covert 'd' to an unsigned long after

> multiplying by say 1e8, (thus truncating the unnecessary digits),

> and divide it by 1e8 to get back the original value.

That would, for arbitrary operands, make matters much worse by adding

additional noise. Your examples, though, suggest that all your values

might be multiples of .001 or .0001. If that is the case, you would

be much better off scaling your values by a multiplier that results in

all values being integers, i.e., multiplying by 10000, before summing

them. This will reduce the noise produced when adding the numbers,

probably enough to give you a perfect solution. Obviously you need to

then divide the final sum by the scaling value to get your intended

sum.

Thad