
Style Question (slightly O.T.)
Quote:
[snip]
> Eli - I agree that it's a religious issue. I also agree that
> for (i=0; i<n; i++)
> is simpler by convention to discern that the loop executes n times.
OK, this is based on really ancient stuff (C-64 6502 assembler). I always
figured that i<n would get compiled to something like:
LDA i
CMP n
BCC address ; carry clear means i < n (unsigned compare)
Essentially, the CMP instruction performed a subtract (i-n) without actually
storing the result--it just set the N (negative), Z (zero) and C (carry)
flags to describe the result, and for unsigned values a clear carry means
that i is less than n.
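For anyone who doesn't have the 6502 flag rules memorized, here is a small C model of what CMP does to the flags. It's a sketch from memory, not a cycle-exact emulator, and the names Flags6502 and cmp6502 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the three flags that the 6502 CMP instruction affects. */
typedef struct {
    int n; /* negative: bit 7 of the 8-bit difference */
    int z; /* zero: the operands were equal */
    int c; /* carry: set when a >= m (unsigned), clear when a < m */
} Flags6502;

/* Model of CMP: compute a - m in 8 bits, keep only the flags. */
static Flags6502 cmp6502(uint8_t a, uint8_t m)
{
    uint8_t diff = (uint8_t)(a - m);
    Flags6502 f;
    f.n = (diff & 0x80) != 0;
    f.z = (diff == 0);
    f.c = (a >= m);
    return f;
}
```

With this model, cmp6502(3, 5) leaves carry clear, so a BCC would be taken. Note that cmp6502(0, 200) also leaves carry clear but does not set the negative flag, which is why the carry flag rather than the negative flag is the one to branch on for unsigned loop counters.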
Now, i<=n-1 would have to do something like:
LDA n
SEC
SBC #1 ; the original 6502 has no DEC for the accumulator
STA temp ; or maybe TAX
LDA i
CMP temp ; or maybe CMP x, if such an instruction was available.
BEQ address
BCC address
This would lead to bulkier and slower code. Now, let me say that
I never actually had a C compiler for my C-64, and often programmed it in
assembler directly, as that was the only way to get any speed out of it at
all. But I always like to remind myself that the C is getting translated
into machine code somehow, and even if the compiler is intelligent enough to
optimize i<=n-1 into i<n, why bet on it?
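Just to pin down that the two loop conditions really are interchangeable for a signed int, here is a minimal sketch that counts iterations for each form. The helper names count_lt and count_le are hypothetical, invented for this example.

```c
/* Count how many times a loop with the condition i < n runs. */
static int count_lt(int n)
{
    int i;
    int count = 0;
    for (i = 0; i < n; i++) {
        count++;
    }
    return count;
}

/* Same loop, written with the condition i <= n-1.
   Assumes n is greater than INT_MIN so that n-1 does not overflow. */
static int count_le(int n)
{
    int i;
    int count = 0;
    for (i = 0; i <= n - 1; i++) {
        count++;
    }
    return count;
}
```

Both helpers return n for any positive n and 0 for zero or negative n. One thing to keep in mind: if n were unsigned and zero, n-1 would wrap around to UINT_MAX and the <= form would run for a very long time, while the < form would not run at all.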
So, I got curious just now and decided to generate a COD listing using MSVC
6.0 and the following function:
int test_function(int arg)
{
    int i;
    int n=arg;
    int result1=0;
    int result2=0;
    if (arg<2)
    {
        return 0;
    }
    for (i=0;i<=arg-1;i++)
    {
        result2+=i;
    }
    for (i=0;i<arg;i++)
    {
        result1+=i;
    }
    return result1+result2;
}
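As a sanity check on what this function should return: each loop sums 0 through arg-1, which is arg*(arg-1)/2, so the total is arg*(arg-1) whenever arg is at least 2. The sketch below repeats the function (minus the unused n) next to a closed-form helper; the name closed_form is made up for this example.

```c
/* The function from the post, minus the unused local n. */
int test_function(int arg)
{
    int i;
    int result1 = 0;
    int result2 = 0;
    if (arg < 2) {
        return 0;
    }
    for (i = 0; i <= arg - 1; i++) {
        result2 += i;
    }
    for (i = 0; i < arg; i++) {
        result1 += i;
    }
    return result1 + result2;
}

/* Hypothetical helper: what the two loops add up to, without looping.
   Each loop sums 0..arg-1, which is arg*(arg-1)/2; there are two of them. */
int closed_form(int arg)
{
    if (arg < 2) {
        return 0;
    }
    return arg * (arg - 1);
}
```

For example, test_function(5) adds 0+1+2+3+4 twice, which is 20, the same as 5*4.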
The assembly listing is this:
PUBLIC _test_function
; COMDAT _test_function
_TEXT SEGMENT
_arg$ = 8
_test_function PROC NEAR ; COMDAT
; 257 : int i;
; 258 : int n=arg;
; 259 : int result1=0;
; 260 : int result2=0;
; 261 :
; 262 : if (arg<2)
mov edx, DWORD PTR _arg$[esp-4]
push esi
push edi
xor esi, esi
xor edi, edi
cmp edx, 2
jge SHORT $L2048
pop edi
; 263 : {
; 264 : return 0;
xor eax, eax
pop esi
; 278 : }
ret 0
$L2048:
; 265 : }
; 266 :
; 267 : for (i=0;i<=arg-1;i++)
lea ecx, DWORD PTR [edx-1]
xor eax, eax
test ecx, ecx
jl SHORT $L2051
$L2049:
; 268 : {
; 269 : result2+=i;
add edi, eax
inc eax
cmp eax, ecx
jle SHORT $L2049
$L2051:
; 270 : }
; 271 :
; 272 : for (i=0;i<arg;i++)
xor eax, eax
test edx, edx
jle SHORT $L2054
$L2052:
; 273 : {
; 274 : result1+=i;
add esi, eax
inc eax
cmp eax, edx
jl SHORT $L2052
$L2054:
; 275 : }
; 276 :
; 277 : return result1+result2;
lea eax, DWORD PTR [edi+esi]
pop edi
pop esi
; 278 : }
ret 0
_test_function ENDP
_TEXT ENDS
Now, x86 assembly is a LOT more complicated than 6502 was, which is why I
don't program it directly. However, if you study this for a few minutes,
what you see is:
The <= loop performs one subtraction at the beginning (the lea that computes
arg-1 into ecx) and uses a jle.
The < loop performs no subtraction at the beginning and uses a jl.
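If the listing is hard to follow, here is my reading of the two compiled loops written back as C. This is a hand transcription under the assumption that I've matched the registers correctly, not compiler output, and the function names are invented for the example.

```c
/* The <= loop as compiled: the limit is computed once, then a
   bottom-tested loop runs. Assumes arg > INT_MIN so arg-1 is defined. */
int sum_le_as_compiled(int arg)
{
    int limit = arg - 1;      /* lea ecx, [edx-1]      */
    int i = 0;                /* xor eax, eax          */
    int sum = 0;
    if (limit < 0) {          /* test ecx, ecx / jl    */
        return sum;
    }
    do {
        sum += i;             /* add edi, eax          */
        i++;                  /* inc eax               */
    } while (i <= limit);     /* cmp eax, ecx / jle    */
    return sum;
}

/* The < loop as compiled: same shape, but it compares against arg directly. */
int sum_lt_as_compiled(int arg)
{
    int i = 0;                /* xor eax, eax          */
    int sum = 0;
    if (arg <= 0) {           /* test edx, edx / jle   */
        return sum;
    }
    do {
        sum += i;             /* add esi, eax          */
        i++;                  /* inc eax               */
    } while (i < arg);        /* cmp eax, edx / jl     */
    return sum;
}
```

Both versions have the same four-instruction loop body; the only difference is the single lea that sets up the limit for the <= form.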
Now, the subtraction adds one instruction of setup, which makes the code
slightly bigger, so that's an obvious disadvantage. The loops themselves are
the same size.
More compelling is whether or not there is a performance difference between
jl and jle, because they will be executed many times during the loop.
Eliminating the extra setup code should be enough to convince anybody that
the < loop is better than the <= loop. I'd be curious to hear what some
real x86 hackers have to say about the performance of jl vs. jle.
As it stands, it looks like the usual C way of doing things is better, at
least for this particular compiler.
--Steve
[snip]