Threading speed 

Some new results on the threading speeds of various processors:

                                        gcc     sub-
Machine                 Processor       version routine direct  indir.  switch
DECStation 5000/125     R3000 25MHz     2.2.2   19.8s   17.7s   25.8s   46.5s
HP/Apollo 425           68040 25MHz     2.2.2   38.3s+  22.3s   30.1s   63.5s
HP/Apollo 720           HP-PA 50MHz     2.3.2   15.54s  10.99s  15.05s  19.74s
SPARCStation 1          Cypress 20MHz   2.2.2   31.2s   33.2s   47.9s   73.6s
486                     486DX2 50MHz    2.2.2d  20.3s*  14.4s   14.6s   21.5s

+assembly code manually unoptimized to make it realistic; see below
*with -fomit-frame-pointer; otherwise: 34.6s
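Dividing each time by the 100,000,000 NEXTs per run that the post gives turns the table into a cost per NEXT. Below is a minimal C sketch of that arithmetic. The helper names are hypothetical, and the loop overhead is left in, so the results slightly overstate the cost of a bare NEXT:

```c
#include <assert.h>

/* NEXTs per benchmark run, as stated in the post. Loop overhead is
   not subtracted, so the per-NEXT figures are slight overestimates. */
#define NEXTS_PER_RUN 100000000.0

/* Nanoseconds per NEXT for a run that took `user_seconds` of user time. */
double ns_per_next(double user_seconds)
{
    assert(user_seconds >= 0.0);
    return user_seconds * 1e9 / NEXTS_PER_RUN;
}

/* Clock cycles per NEXT on a processor clocked at `mhz` MHz
   (at 1 MHz, one nanosecond is 1/1000 of a cycle). */
double cycles_per_next(double user_seconds, double mhz)
{
    assert(mhz > 0.0);
    return ns_per_next(user_seconds) * mhz / 1000.0;
}
```

For example, the R3000's direct-threaded 17.7s works out to 177 ns, or about 4.4 cycles at 25 MHz, per NEXT. The 486's 14.4s is 144 ns, or about 7.2 cycles at 50 MHz.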

The benchmark consists of a loop that contains nine NEXTs and a
looping instruction (a termination test and a jump back for subroutine
threaded code), i.e. it primarily measures NEXT speed. This loop is
executed 10,000,000 times (resulting in 100,000,000 NEXTs and a bit of
overhead). It fits completely into the caches of the measured
machines.
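The following sketch gives a rough idea of the benchmark's shape using the switch-dispatch variant in standard C. It is not the posted code. The opcode names and `run_switch` are hypothetical. The sketch assumes that the looping instruction ends with a NEXT of its own, which is how ten dispatches per iteration produce the stated 100,000,000 NEXTs for 10,000,000 iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical virtual-machine opcodes for this sketch. */
enum opcode { OP_NOOP, OP_LOOP };

/* Benchmark body: nine no-op primitives, then a looping instruction
   that decrements a counter and jumps back to the start. */
static const enum opcode program[] = {
    OP_NOOP, OP_NOOP, OP_NOOP, OP_NOOP, OP_NOOP,
    OP_NOOP, OP_NOOP, OP_NOOP, OP_NOOP, OP_LOOP
};

/* Switch-dispatch interpreter: each trip around for(;;) is one NEXT
   (fetch the next opcode, dispatch through the switch).
   Returns the number of dispatches performed. */
uint64_t run_switch(uint64_t iterations)
{
    assert(iterations > 0);
    const enum opcode *ip = program;   /* virtual instruction pointer */
    uint64_t count = iterations;       /* counter tested by OP_LOOP */
    uint64_t dispatches = 0;

    for (;;) {
        dispatches++;
        switch (*ip++) {               /* NEXT */
        case OP_NOOP:
            break;
        case OP_LOOP:
            if (--count == 0)          /* termination test */
                return dispatches;
            ip = program;              /* jump back to the loop start */
            break;
        }
    }
}
```

`run_switch(10000000)` returns 100000000. A direct-threaded variant would replace the `switch` with a jump through a code address stored in the thread (in GNU C, via labels as values). An indirect-threaded variant adds one more load per NEXT to fetch the code address from the word's code field.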

I posted the code a few months ago. If you are interested, I will mail
it to you.

The numbers are user times measured with the "time" command. The
assembly code generated by the GNU C Compiler was inspected and found
realistic, with one exception: I had to unoptimize the assembly code
for subroutine threading on the 68040, since the compiler allocated
the address of the function "next" to a register.
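With subroutine threading, the threaded code is simply a sequence of native calls. The following minimal C sketch assumes a primitive named `next`, as in the post. Its body here is hypothetical. `noinline` is a GCC/Clang attribute that keeps every call a real call instruction. Current GCC documents `-fno-function-cse` as telling the compiler not to keep function addresses in registers, which is the transformation described above. The post does not say whether gcc 2.2.2 offered the same control.

```c
#include <assert.h>

/* Counts primitive executions so that the calls have a visible effect. */
static volatile unsigned long executed;

/* Stand-in primitive; noinline (GCC/Clang) keeps each call a real call. */
static __attribute__((noinline)) void next(void)
{
    executed++;
}

/* Subroutine-threaded benchmark body: nine direct calls per iteration,
   then an ordinary termination test and jump back (the for loop). */
unsigned long run_subroutine(unsigned long iterations)
{
    assert(iterations > 0);
    executed = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        next(); next(); next(); next(); next();
        next(); next(); next(); next();
    }
    return executed;
}
```

Compiling this with and without `-fno-function-cse` and comparing the assembly (`gcc -O2 -S`) shows whether the calls go through a register.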

Thanks to Bernd Paysan for the values on the SPARCStation and the HP 700.
Bernd does not know whether the SPARCStation is a 1 or 2; I guess from
its slowness that it's a SPARCStation 1. Thanks to Franz Puntigam for
the values on the 486.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed



Mon, 05 Jun 1995 22:23:51 GMT  
:
: The benchmark consists of a loop that contains nine NEXTs and a
: looping instruction (a termination test and a jump back for subroutine
: threaded code), i.e. it primarily measures NEXT speed. This loop is
: executed 10,000,000 times (resulting in 100,000,000 NEXTs and a bit of
: overhead). It fits completely into the caches of the measured
: machines.
:
: I posted the code a few months ago. If you are interested, I will mail
: it to you.
:

I am new to this group and have not seen the original posting. I'd
be interested to know what this test measures.

I get the following measurements on my machine:

: DUMMY ;
CODE CDUMMY ret ok
: DD  FOR DUMMY NEXT ; ok
: CC FOR CDUMMY NEXT ;
: TARA  FOR NEXT ; ok
COUNTER 100000000 TARA TIMER 13000 ok
COUNTER 100000000 DD   TIMER 31000 ok
COUNTER 100000000 CC   TIMER 25000 ok

The timer reports milliseconds, but its resolution is only 1 second.

These are elapsed times, taken via the date-and-time system call on an
otherwise unloaded machine.

The machine: SGI Indigo, R3000, 33 MHz.
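If the empty loop TARA is used as a tare, the net cost of a call is the difference in run time divided by the iteration count. The following C sketch shows that arithmetic. It uses hypothetical helper names and the 33 MHz clock given above:

```c
#include <assert.h>

/* Net cost per call in nanoseconds: subtract the empty-loop (tare)
   time from the loop-with-call time, then divide by the iteration
   count. Times are in milliseconds, as reported by TIMER. */
double net_ns_per_call(double loop_ms, double tare_ms, double iterations)
{
    assert(iterations > 0.0);
    return (loop_ms - tare_ms) * 1e6 / iterations;
}

/* Convert nanoseconds to clock cycles at a clock rate of `mhz` MHz. */
double ns_to_cycles(double ns, double mhz)
{
    assert(mhz > 0.0);
    return ns * mhz / 1000.0;
}
```

With these numbers, DD costs about 180 ns (roughly 6 cycles) per call beyond the empty loop, and CC about 120 ns (roughly 4 cycles). The empty loop itself takes about 130 ns per iteration. With a 1-second timer over 100,000,000 iterations, each figure is only accurate to about ±10 ns.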
Here is what the object code looks like (viewed in dbx, with a breakpoint
at b for easy entry):

' DUMMY b [3] Process  2944 (pf) stopped at [b, :$10000028]
*[b, 0x10000028]        

>ua $s0

 [RETRY, 0x100109e4]    addiu   sp,sp,-4                        PP> DUMMY
 [RETRY, 0x100109e8]    addiu   sp,sp,4                         PP> ;
 [RETRY, 0x100109ec]    jr      ra
 [RETRY, 0x100109f0]    lw      ra,4(sp)
 [RETRY, 0x100109f4]    spec05  zero,zero,zero
 [RETRY, 0x100109f8]    b       0x100103ec
 [RETRY, 0x100109fc]    bgezl   s2,0x10021b54
 [RETRY, 0x10010a00]    c3.10   0
 [RETRY, 0x10010a04]    jr      ra                              PP> CODE CDUMMY ret
 [RETRY, 0x10010a08]    lw      ra,4(sp)
 [RETRY, 0x10010a0c]    srl     zero,zero,0
 [RETRY, 0x10010a10]    beq     zero,at,0x10013174
 [RETRY, 0x10010a14]    sll     t0,a0,16
 [RETRY, 0x10010a18]    addiu   sp,sp,-4                        PP> : DD
 [RETRY, 0x10010a1c]    move    ra,s0                           PP> FOR
 [RETRY, 0x10010a20]    addiu   sp,sp,-4
>u

 [RETRY, 0x10010a24]    lw      s0,0(s8)
 [RETRY, 0x10010a28]    addiu   s8,s8,4
 [RETRY, 0x10010a2c]    addiu   ra,ra,-1
 [RETRY, 0x10010a30]    sw      ra,4(sp)                        PP> here we branch from NEXT
 [RETRY, 0x10010a34]    jal     RETRY                           PP> DUMMY
 [RETRY, 0x10010a38]    sw      ra,0(sp)
 [RETRY, 0x10010a3c]    nop                                     PP> NEXT. Note that the nop cannot be optimised.
 [RETRY, 0x10010a40]    bne     ra,zero,0x10010a30
 [RETRY, 0x10010a44]    addiu   ra,ra,-1
 [RETRY, 0x10010a48]    lw      ra,8(sp)                        
 [RETRY, 0x10010a4c]    addiu   sp,sp,4
 [RETRY, 0x10010a50]    addiu   sp,sp,4                         PP> ; An optimisation to save space is possible here
 [RETRY, 0x10010a54]    jr      ra
 [RETRY, 0x10010a58]    lw      ra,4(sp)
 [RETRY, 0x10010a5c]    nop
 [RETRY, 0x10010a60]    add     a0,t3,v0

The PP> comments are mine (Penio Penev).

jal     Jump and link
ra      Return Address - register, where jal stores the return address
s0      Top of Stack
sp      Return stack pointer
at      Temp register

-- Penio



Tue, 06 Jun 1995 08:18:49 GMT  
 