How to Push Funny Sized Parameters 
 How to Push Funny Sized Parameters

I need to generate code that efficiently pushes
value parameters stored in memory subject to
the following constraints:

(1) The number of bytes pushed is always an
      even multiple of four bytes (Win32/efficiency
      requirements).

(2) The program does not touch any memory locations
      other than the actual values being passed as the
      parameter.

(3) The process of passing the data must not (ultimately)
      disturb any register values.

Note: if additional padding bytes are necessary to achieve
constraint (1), above, the extra bytes may contain any
value.

For example, if I have a one-byte variable "b" and a word
variable "w", one way to pass these items is as follows:

// Push a byte:

    push( eax );
    mov( b, al );
    xchg( eax, [esp] );

// Push a word:

    push( eax );
    mov( w, ax );
    xchg( eax, [esp] );

Pushing three bytes is a lot messier.  I'll ask for your
examples rather than post my own (btw, I'm looking
for efficient solutions, not simple "proof of concept"
solutions -- I want to incorporate this code into
the output of the HLA compiler).

The way I see it, there are two general cases:  one
case where the total number of bytes is four or less,
and the second case where there are five or more
bytes to be passed.  In this latter case, the code would
push all the data in groups of four bytes until there are
less than four bytes left to push, then it would use special
purpose code to push the remaining bytes.  This case is
different than the former case because the code can touch
up to four bytes before the last values to push.

Under no circumstances must the program "touch" any memory
locations before or after the data to push.  That could cause
a general protection fault since there is no guarantee that
memory outside the data's bounds is valid for reading.
Thanks,
Randy Hyde



Mon, 27 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters

Quote:

>I need to generate code that efficiently pushes
>value parameters stored in memory subject to
>the following constraints:

>(1) The number of bytes pushed is always an
>      even multiple of four bytes (Win32/efficiency
>      requirements).

>(2) The program does not touch any memory locations
>      other than the actual values being passed as the
>      parameter.

>(3) The process of passing the data must not (ultimately)
>      disturb any register values.

Except flags?

Quote:
>Pushing three bytes is a lot messier.  I'll ask for your
>examples rather than post my own (btw, I'm looking
>for efficient solutions, not simple "proof of concept"
>solutions -- I want to incorporate this code into
>the output of the HLA compiler).

Is there any assumption about minimum processor?  The examples above
seem to imply 386+.

Quote:
>Under no circumstances must the program "touch" any memory
>locations before or after the data to push.  That could cause
>a general protection fault since there is no guarantee that
>memory outside the data's bounds is valid for reading.

I assume that last word should have been "writing?"  Two more perverse
questions:  is it acceptable to ignore the possibility of stack frame
relative (ebp) addressing for passed variables?  Are variables
guaranteed to be memory addresses?  E.g. are explicit #define style
constants not covered?

Ed



Mon, 27 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters

Quote:

> I need to generate code that efficiently pushes
> value parameters stored in memory subject to
> the following constraints:

> (1) The number of bytes pushed is always an
>       even multiple of four bytes (Win32/efficiency
>       requirements).

> (2) The program does not touch any memory locations
>       other than the actual values being passed as the
>       parameter.

Unless this is going to work directly with memory mapped io, you can
safely relax that requirement to:

"The program does not touch any memory locations outside the dword(s)
(or even cache line(s)) that contain the value."

This is important because the compiler will know the actual alignment of
the variable, right?

Quote:
> (3) The process of passing the data must not (ultimately)
>       disturb any register values.

> Note: if additional padding bytes are necessary to achieve
> constraint (1), above, the extra bytes may contain any
> value.

> For example, if I have a one-byte variable "b" and a word
> variable "w", one way to pass these items is as follows:

> // Push a byte:

>     push( eax );
>     mov( b, al );
>     xchg( eax, [esp] );

That is short code, an alternative would be to use MOVZX instead of MOV,
and possibly get rid of the XCHG as well, since XCHG reg,[mem] will
always generate a BUS LOCK, which is very costly:

  push eax
  push eax
  movzx eax, byte ptr [b]
  mov [esp+4],eax
  pop eax

For a block of code pushing multiple parameters, I would use an explicit
SUB ESP,total_size instead:

  sub esp,8
  push eax
  movzx eax, byte ptr [b]
  mov [esp+8], eax
  movzx eax, word ptr [w]
  mov [esp+4], eax
  pop eax

Quote:
> Pushing three bytes is a lot messier.  I'll ask for your
> examples rather than post my own (btw, I'm looking
> for efficient solutions, not simple "proof of concept"
> solutions -- I want to incorporate this code into
> the output of the HLA compiler).

As I noted above, an aligned dword access which contains the needed item
can never generate a fault.

Quote:
> The way I see it, there are two general cases:  one
> case where the total number of bytes is four or less,
> and the second case where there are five or more
> bytes to be passed.  In this latter case, the code would
> push all the data in groups of four bytes until there are
> less than four bytes left to push, then it would use special
> purpose code to push the remaining bytes.  This case is
> different than the former case because the code can touch
> up to four bytes before the last values to push.

> Under no circumstances must the program "touch" any memory
> locations before or after the data to push.  That could cause
> a general protection fault since there is no guarantee that
> memory outside the data's bounds is valid for reading.

Again, this is not correct.

The code generated could be like this:

  allocate stack space, save eax & any other temp regs needed

  for each variable:
    if the variable does not straddle a dword boundary:
      Use MOV directly on the aligned base address, followed by a SHR
      if needed.

    if it does span one or more dword boundaries:
      if the base address is aligned:
        loop while pushing dwords

      else
        put least significant aligned dword in eax
        Using both eax and ebx for temps, loop, while alternating eax
        and ebx use:
          put the next dword into ebx
          shrd eax,ebx,8*bytes to shift
          mov [esp+current_offset],eax

This would be very close to optimal for the given constraints.
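
The SHRD step in the outline above can be modelled in C to check the byte arithmetic. This is only a sketch: the function names shrd32 and unaligned_dword are made up for illustration, and it assumes little-endian dwords that have already been loaded from aligned addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "shrd eax, ebx, count" for 0 < count < 32: shift lo right by
   count bits and fill the vacated high bits from the low bits of hi. */
uint32_t shrd32(uint32_t lo, uint32_t hi, unsigned count)
{
    assert(count > 0 && count < 32);
    return (uint32_t)((lo >> count) | (hi << (32 - count)));
}

/* Extract the dword that starts byte_offset bytes (1..3) into the aligned
   dword `first`, given the next aligned dword `second`. */
uint32_t unaligned_dword(uint32_t first, uint32_t second, unsigned byte_offset)
{
    assert(byte_offset >= 1 && byte_offset <= 3);
    return shrd32(first, second, 8 * byte_offset);
}
```

With memory bytes 00 11 22 33 44 55 66 77, the two aligned dwords are 0x33221100 and 0x77665544, and the dword starting one byte in comes out as 0x44332211.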

Terje

--

Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"



Mon, 27 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters

Hi Randall,

Quote:
> I need to generate code that efficiently pushes
> value parameters stored in memory subject to
> the following constraints:

    Both of the following examples use 2 dwords of stack space, the highest
of which gets over-written by the variable to be "pushed", and the lower one
would get over-written by the return address if the "push" is followed by a
call instruction.

;--------------------------------------------------
    The following is pretty generic code to push a one to four byte memory
variable onto the stack (padded to a dword).  BTW, I suspect you'll get a
lot of solutions similar to this.

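PushMemVar  MACRO  memVar, SizeInBytes
    push    ebp                 ; reserve the dword that will hold the value
    mov     ebp, esp            ; [ebp] = that dword (holds caller's ebp for now)
    push    eax                 ; save eax
IF SizeInBytes EQ 1
    mov     al, BYTE PTR [memVar]
ELSEIF SizeInBytes EQ 2
    mov     ax, WORD PTR [memVar]
ELSEIF SizeInBytes EQ 3
    mov     ah, BYTE PTR [memVar + 2]
    bswap   eax                 ; third byte moves to bits 16-23 (486+)
    mov     ax, WORD PTR [memVar]
ELSEIF SizeInBytes EQ 4
    mov     eax, DWORD PTR [memVar]
ENDIF
    xchg    eax, [ebp]          ; store the value, fetch caller's ebp
    mov     ebp, eax            ; restore ebp
    pop     eax                 ; restore eax
ENDM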
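
The three-byte branch of the macro is the least obvious one, so here is a small C model of what ends up in eax. It is a sketch only: bswap32 and load_three_bytes are hypothetical names, and eax_in stands for whatever eax held before the sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BSWAP: reverse the four bytes of a 32-bit value. */
uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Model of: mov ah,[mem+2] / bswap eax / mov ax,[mem].
   Only mem[0], mem[1] and mem[2] are read. */
uint32_t load_three_bytes(const uint8_t *mem, uint32_t eax_in)
{
    uint32_t eax = eax_in;
    assert(mem != 0);
    eax = (eax & 0xFFFF00FFu) | ((uint32_t)mem[2] << 8);   /* mov ah, [mem+2] */
    eax = bswap32(eax);                                    /* ah -> bits 16-23 */
    eax = (eax & 0xFFFF0000u) | mem[0] | ((uint32_t)mem[1] << 8); /* mov ax, [mem] */
    return eax;
}
```

For bytes AA BB CC and an incoming eax of 0x12345678 the result is 0x78CCBBAA: the low three bytes are the parameter, and the top byte is leftover padding (the old al), which the original constraints allow.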

;---------------------------------------
    The following will only work if 12 (or more) bytes of stack space are
required, and will be efficient *if* the amount of work required to get the
data on the stack justifies the overhead.  ie.  where the user *has* to pass
that 1000000 byte array by value rather than by reference. :-/

'VarSize' is the amount of stack space (including padding) required for the
variables to be pushed.
[ebp - VarSize + 4] is the address of the lowest byte of the reserved stack
space, and
[ebp + 4] is the address of the highest byte of the reserved stack space.
push    ebp
mov     ebp, esp
mov     [ebp - VarSize + 4], eax
mov     eax, [ebp]
sub     esp, VarSize
mov     [ebp - VarSize], eax
    At this point eax has been "freed", and can be used to move the memory
variables, except for the last one (ie. the one to be stored at the lowest
address of the "reserved" space), to the appropriate locations.
    Use the "IF / ELSE" construct from the macro above to move the last
dword into eax, and the following code will store it (at the lowest
"reserved" address) and restore the registers (eax and ebp).
xchg    eax, [ebp - VarSize + 4]
pop     ebp

hope this helps,
-Brent
doomsday AT optusnet DOT com DOT au



Mon, 27 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters
IMHO, you already have the optimal solutions for 1-byte and 2-byte
parameters. As you say, n-byte parameters can be broken down to multiple
4-byte blocks and an optional 1-, 2- or 3-byte block. My problem with your
approach is that it rarely, if ever, generates optimal ASM code for
Pentium-class processors. Unless HLA has a post-processor that generates
properly-interleaved code -  any commercial C/C++ compiler is likely to
generate faster code, which sort of defeats the reason for using ASM in the
first place.

For the record, my implementation of the push operations would be:

1-byte:
    push    eax
    mov     al,byte ptr mem
    xchg    eax,[esp]                        ; non-pairable; asserts LOCK signal!

2-byte:
    push    word ptr mem                ; operand-size prefix!

3-byte:
    push    eax
    mov     ah,byte ptr mem+2

    shl     eax,8                                     ; 386, Pentium+
    bswap   eax                                    ; 486 (not pairable)

    mov     ax,word ptr mem
    xchg    eax,[esp]

As I said above, I would try to interleave this code with other operations
in order to get better performance on Pentium-class machines.

Daniel Pfeffer


Quote:
> I need to generate code that efficiently pushes
> value parameters stored in memory subject to
> the following constraints:

> (1) The number of bytes pushed is always an
>       even multiple of four bytes (Win32/efficiency
>       requirements).

> (2) The program does not touch any memory locations
>       other than the actual values being passed as the
>       parameter.

> (3) The process of passing the data must not (ultimately)
>       disturb any register values.

> Note: if additional padding bytes are necessary to achieve
> constraint (1), above, the extra bytes may contain any
> value.

> For example, if I have a one-byte variable "b" and a word
> variable "w", one way to pass these items is as follows:

> // Push a byte:

>     push( eax );
>     mov( b, al );
>     xchg( eax, [esp] );

> // Push a word:

>     push( eax );
>     mov( w, ax );
>     xchg( eax, [esp] );

> Pushing three bytes is a lot messier.  I'll ask for your
> examples rather than post my own (btw, I'm looking
> for efficient solutions, not simple "proof of concept"
> solutions -- I want to incorporate this code into
> the output of the HLA compiler).

> The way I see it, there are two general cases:  one
> case where the total number of bytes is four or less,
> and the second case where there are five or more
> bytes to be passed.  In this latter case, the code would
> push all the data in groups of four bytes until there are
> less than four bytes left to push, then it would use special
> purpose code to push the remaining bytes.  This case is
> different than the former case because the code can touch
> up to four bytes before the last values to push.

> Under no circumstances must the program "touch" any memory
> locations before or after the data to push.  That could cause
> a general protection fault since there is no guarantee that
> memory outside the data's bounds is valid for reading.
> Thanks,
> Randy Hyde



Tue, 28 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters

Quote:

> Unless this is going to work directly with memory mapped io, you can
> safely relax that requirement to:

> "The program does not touch any memory locations outside the dword(s)
> (or even cache line(s)) that contain the value."

> This is important because the compiler will know the actual alignment of
> the variable, right?

Currently, no, it does not.  HLA emits MASM code.  For static variables
it refers to the object by name and has no idea what its alignment will
be in memory.

Quote:

>   push eax
>   push eax
>   movzx eax, byte ptr [b]
>   mov [esp+4],eax
>   pop eax

Since there is no requirement that the H.O. bytes be zero, wouldn't
"mov al, byte ptr [b]" and "mov [esp+4], al" be even better?

Quote:

> For a block of code pushing multiple parameters, I would use an explicit
> SUB ESP,total_size instead:

>   sub esp,8
>   push eax
>   movzx eax, byte ptr [b]
>   mov [esp+8], eax
>   movzx eax, word ptr [w]
>   mov [esp+4], eax
>   pop eax

Certainly above some limit (yet to be determined by experimentation),
the stack allocation scheme is better.  At some point it is probably better
to use MOVSD, as well (I haven't thought about it in years, so I'm not
sure where the break-even point is; I will have to figure this out, though).

Quote:
> As I noted above, an aligned dword access which contains the needed item
> can never generate a fault.

Since I don't have control over the variables (HLA is assembly, after all,
the programmer can do lots of ugly things to trip me up), I'm not at all
sure this is true.

E.g., consider the following:

type
    pt3d:
        record
            x:byte;
            y:byte;
            z:byte;
        endrecord;

Now suppose that someone has dynamically allocated an array of
pt3d records, perhaps as part of some other structure, so that
x, y, and z occupy bytes 4093, 4094, and 4095 on some page in
memory.  Let's also assume that the next page is unreadable
and will generate a fault.  Granted, this is very unlikely and takes
some work to set it up, but I can assure you that with appropriate
linker options I can cause this to happen.  Now suppose I do
something like the following:

    lea( ebx, pt3dVarAtEndOfPage); // Or some other code that computes this.
    passByVal( (type pt3d [ebx]) );

I can assure you that with the appropriate code (including
reading data from the user), the compiler will not be able
to determine the alignment of the variable statically.
So unless I'm dense and I'm missing something about Win32
memory organization or the 80x86, I'm not sure I can make
any assumptions about data alignment.  Even if I could, an
assembly language programmer could easily do some address
arithmetic that would invalidate my assumptions.
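
The page-boundary case above can be checked with a little address arithmetic. The following C sketch assumes 4096-byte pages; PAGE_BYTES, crosses_page and aligned_dword are illustrative names, not anything from HLA or Win32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed 80x86 page size */

/* True if reading len bytes starting at addr touches more than one page. */
bool crosses_page(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    return (addr / PAGE_BYTES) != ((addr + len - 1) / PAGE_BYTES);
}

/* Address of the aligned dword that contains the byte at addr. */
uint32_t aligned_dword(uint32_t addr)
{
    return addr & ~3u;
}
```

For the record at offsets 4093..4095, crosses_page(4093, 3) is false and crosses_page(4093, 4) is true, so a dword read starting at the record runs into the next page. The aligned dword containing it starts at 4092 and stays inside the page, but it also reads byte 4092, which is not part of the record.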

Quote:
> > Under no circumstances must the program "touch" any memory
> > locations before or after the data to push.  That could cause
> > a general protection fault since there is no guarantee that
> > memory outside the data's bounds is valid for reading.

> Again, this is not correct.

> The code generated could be like this:

>   allocate stack space, save eax & any other temp regs needed

>   for each variable:
>     if the variable does not straddle a dword boundary:
>       Use MOV directly on the aligned base address, followed by a SHR if
> needed.

>     if it does span one or more dword boundaries:
>       if the base address is aligned:
>         loop while pushing dwords

>       else
>         put least significant aligned dword in eax
>         Using both eax and ebx for temps, loop, while alternating eax
> and ebx use:
>           put the next dword into ebx
>           shrd eax,ebx,8*bytes to shift
>           mov [esp+current_offset],eax

> This would be very close to optimal for the given constraints.

> Terje

I think my big problem is that I cannot, at compile time, determine
if the variable is aligned.  So I guess we have to add one more
constraint.

Most likely the solution will look like:

(1) Allocate storage on the stack.
(2) push temporary registers.
(3) Move dwords until less than one dword available.
(4) move remaining bytes into a register and move
      the register into stack storage.

E.g.,

    sub( 4, esp );
    push( eax );
    mov( (type word pt3dVar), ax );
    mov( ax, [esp+4] );
    mov( (type byte pt3dVar[2]), al );    // third byte, at offset 2
    mov( al, [esp+6] );
    pop( eax );
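
The four steps above generalize to any size. As a rough C model (padded_size and copy_param are hypothetical names, and the slot stands for the stack space reserved in step 1), the copy reads only the parameter's own bytes and leaves the padding untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a parameter size up to the next multiple of four bytes. */
size_t padded_size(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Copy n source bytes into a slot of padded_size(n) bytes, reading only
   src[0..n-1]. Any padding bytes in the slot keep their old contents. */
void copy_param(uint8_t *slot, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(n > 0);
    for (; i + 4 <= n; i += 4)           /* step 3: whole dwords */
        memcpy(slot + i, src + i, 4);
    if (n - i >= 2) {                    /* step 4: one word, if present */
        memcpy(slot + i, src + i, 2);
        i += 2;
    }
    if (n - i == 1)                      /* step 4: one byte, if present */
        slot[i] = src[i];
}
```

A seven-byte parameter takes one dword move, one word move and one byte move into an eight-byte slot; a three-byte parameter takes the word and byte moves shown in the HLA sequence above.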

I was hoping for something slightly more efficient (this was "my"
code I had alluded to in the original post), but perhaps this
is just going to be disgusting.

Oh well, I guess I can always tell HLA programmers to make sure
they choose their variable sizes appropriately.  After all, it is
assembly language and they need to take responsibility
for something.
Randy Hyde



Tue, 28 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters



Quote:
> IMHO, you already have the optimal solutions for 1-byte and 2-byte
> parameters.

Well, according to Terje, the XCHG is not optimal... Oh well.

Quote:
> As you say, n-byte parameters can be broken down to multiple
> 4-byte blocks and an optional 1-, 2- or 3-byte block. My problem with your
> approach is that it rarely, if ever, generates optimal ASM code for
> Pentium-class processors. Unless HLA has a post-processor that generates
> properly-interleaved code -  any commercial C/C++ compiler is likely to
> generate faster code, which sort of defeats the reason for using ASM in the
> first place.

Passing parameters in HLA is a convenience for the programmer.  For time
critical code (and I leave that up to the programmer to determine when
it is necessary) the HLA programmer has two options:

(1) Use explicit PUSH and CALL instructions, e.g.,

        push( (type dword ThreeByteObject) );
        call SubWithThreeByteParm;

(2) Replace the parameter with a code sequence (good if there are many
parameters):

    SubWithThreeByteParm( code( push( (type dword ThreeByteParm))));

HLA is currently a prototype compiler that generates MASM assembly output.
The roadmap I have planned out has a true (object code generation) compiler
for version 2.  Data flow analysis (necessary for optimization) is planned
for version three.  Therefore, I will have to rely upon the end programmer for
producing optimal code for quite some time yet to come.  Fortunately,
HLA is assembly language and the programmer can ignore all the nice
HLL features when performance dictates.

Quote:

> For the record, my implementation of the push operations would be:

> 1-byte:
>     push    eax
>     mov     al,byte ptr mem
>     xchg    eax,[esp]                        ; non-pairable; asserts LOCK signal!

Terje recommends:

    push eax
    push eax
    mov al, byte ptr mem
    mov [esp+4], al
    pop eax

I'd be interested in seeing the timing differences.

Quote:

> 2-byte:
>     push    word ptr mem                ; operand-size prefix!

Does this not push only two bytes?
Actually, HLA currently generates:

    push 0
    push word ptr mem

which is more efficient than the XCHG version.

Quote:

> 3-byte:
>     push    eax
>     mov     ah,byte ptr mem+2

>     shl     eax,8                                     ; 386, Pentium+
>     bswap   eax                                    ; 486 (not pairable)

>     mov     ax,word ptr mem
>     xchg    eax,[esp]

I'm assuming you mean shl -or- bswap, not the sequence of the two (?)
Is bswap better than shl?  I seem to recall it was pretty ugly (though I'm
not up to speed on cycle timings for modern processors).

Quote:

> As I said above, I would try to interleave this code with other operations
> in order to get better performance on Pentium-class machines.

> Daniel Pfeffer

I wonder if you have any suggestions for code emission if the
four bytes immediately before the end of the data are known to be readable,
e.g., when pushing seven bytes, we can treat the last three as part
of a four-byte sequence as follows:

    push eax
    push dword ptr mem
    mov eax, mem+3
    xchg eax, [esp+3]        ;Ignore lock issue for now.

Clearly there is a misalignment problem here,
but I wonder if some trick like this could, ultimately,
be most efficient.

On a different subject, I was putting together a lab
for my class next quarter to demonstrate the cost
of misaligned data.  I was sequencing through
1,000,000 dword values backwards (to waste
the cache) as well as accessing a single dword
variable umpteen million times (about 500,000,000
to be exact).  Granted, multitasking and other issues
are present, but I found a very small difference in
execution time between aligned and misaligned
access (i.e., 26 seconds vs. 33 seconds on a
266 MHz PII).  I expected this to be a little greater.
Does the PII cache mechanism do something
to overcome problems with misaligned data?
Randy Hyde



Wed, 29 May 2002 03:00:00 GMT  
 How to Push Funny Sized Parameters



Quote:
>to be exact).  Granted, multitasking and other issues
>are present, but I found a very small difference in
>execution time between aligned and misaligned
>access (i.e., 26 seconds vs. 33 seconds on a
>266 MHz PII).  I expected this to be a little greater.
>Does the PII cache mechanism do something
>to overcome problems with misaligned data?

        It is my understanding that the PII only suffers
from misalignment penalties if the data crosses a cache
line boundary.  Cache lines are 32 bytes wide.  So you
could try reading one dword per cache line several
thousand times and then one dword straddling 2 cache
lines.
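
To get a feel for how often a misaligned dword actually straddles a line, here is a small C sketch. It assumes the 32-byte line size mentioned above; straddles_line and count_straddles are made-up names, and the one-byte misalignment used in the note below is an assumption, since the offset used in the lab was not stated.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* line size assumed from the text above */

/* True if a 4-byte access at addr spans two cache lines. */
int straddles_line(uint32_t addr)
{
    return (addr % CACHE_LINE) + 4 > CACHE_LINE;
}

/* Count straddling accesses among `count` dwords at base, base+4, ... */
uint32_t count_straddles(uint32_t base, uint32_t count)
{
    uint32_t i, hits = 0;
    assert(count > 0);
    for (i = 0; i < count; i++)
        hits += (uint32_t)straddles_line(base + 4 * i);
    return hits;
}
```

For 1,000,000 consecutive dwords starting one byte past an aligned address, count_straddles(1, 1000000) gives 125000, i.e. one access in eight crosses a line; with an aligned base the count is zero.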

                        John Stewart



Wed, 29 May 2002 03:00:00 GMT  
 