ICM vs. L and LTR 
 ICM vs. L and LTR

I've noticed lately that some recent IBM code uses
the sequence:

    L     Rx,AREA
    LTR   Rx,Rx
    BNZ   SOMEWHERE

and I was wondering if

    ICM   Rx,15,AREA
    BNZ   SOMEWHERE

would be faster?  It looks to me like the two code segments
would produce identical results (ignoring for the moment that
the LOAD instruction could take an index register, and that
ICM wasn't available on the 360, and that LOAD on a 360 could
cause a specification exception).
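
To make the "identical results" claim concrete, here is a minimal Python sketch that models only the condition code each sequence leaves behind. It assumes the usual S/370 definitions (LTR: 0 zero, 1 negative, 2 positive; ICM: 0 all inserted bits zero or mask zero, 1 leftmost inserted bit one, 2 otherwise) and that BNZ branches on any nonzero condition code. The function names are made up for illustration; this is not a simulator and says nothing about timing.

```python
def to_signed32(word):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word & 0x80000000 else word


def cc_l_ltr(word):
    """Condition code after L Rx,AREA / LTR Rx,Rx (hypothetical helper)."""
    signed = to_signed32(word)
    if signed == 0:
        return 0                      # zero
    return 1 if signed < 0 else 2     # negative / positive


def cc_icm(storage, mask=0b1111):
    """Condition code after ICM Rx,mask,AREA (hypothetical helper).

    ICM takes one consecutive byte from storage per one-bit in the mask.
    """
    count = bin(mask & 0b1111).count("1")
    inserted = storage[:count]
    if count == 0 or not any(inserted):
        return 0                      # all inserted bits zero, or mask zero
    return 1 if inserted[0] & 0x80 else 2


def bnz_taken(cc):
    """BNZ is assumed to branch whenever the condition code is not 0."""
    return cc != 0


samples = [0x00000000, 0x00000001, 0x00000100,
           0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
for word in samples:
    area = word.to_bytes(4, "big")    # the fullword at AREA
    assert cc_l_ltr(word) == cc_icm(area, 0b1111)
```

With a mask of 15 the leftmost inserted bit is the sign bit of the fullword, which is why the two condition codes line up in this model.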

I'm sure that LTR is very fast, but is L + LTR faster than
ICM?  What, if any, advantages does the L + LTR sequence offer
above ICM?

Thanks.
--

IBM Systems Programmer          | UUCP:      ...uunet!ingr!b30!dwayneb!dwayne
Intergraph Corp., M.S. GD3002   | Voice:     (205) 730-3795
Huntsville, AL  35894-0001      | FAX:       (205) 730-3300



Mon, 30 Jan 1995 00:37:57 GMT  
 ICM vs. L and LTR

Quote:
>I've noticed lately that some recent IBM code uses
>the sequence:
>    L     Rx,AREA
>    LTR   Rx,Rx
>    BNZ   SOMEWHERE
>and I was wondering if
>    ICM   Rx,15,AREA
>    BNZ   SOMEWHERE
>would be faster?  It looks to me like the two code segments
>would produce identical results (ignoring for the moment that
>the LOAD instruction could take an index register, and that
>ICM wasn't available on the 360, and that LOAD on a 360 could
>cause a specification exception).

Sounds right.

Quote:
>I'm sure that LTR is very fast, but is L + LTR faster than
>ICM?  What, if any, advantages does the L + LTR sequence offer
>above ICM?

In a pipelined implementation, it would seem reasonable that the L/LTR would
execute faster, because the LTR would execute immediately after the result of
the L is available (via bypass circuitry).  The exception would be if ICM 15
were implemented to be equivalent to L/LTR, which would require decoding ICM 15
differently from the general case of ICM (by skipping masking of the loaded
result before a compare); I find that unlikely.  In defense of ICM 15: it does
save two bytes :-).

In support of my interpretation... there was a thread recently in comp.arch on
claims that use of RISC-y ISA subsets on CISC processors can trade somewhat
increased code space for much higher performance.

--
John R. Grout
University of Illinois, Urbana-Champaign
Center for Supercomputing Research and Development




Mon, 30 Jan 1995 03:06:10 GMT  
 ICM vs. L and LTR

Quote:
>>    L     Rx,AREA
>>    LTR   Rx,Rx
>>    BNZ   SOMEWHERE
>>and I was wondering if
>>    ICM   Rx,15,AREA
>>    BNZ   SOMEWHERE

>>would be faster?  It looks to me like the two code segments

Sorry, I can't supply references, but I read an article recently that
discussed this very issue and several related myths.  According to the
article, the L, LTR was actually faster on older (much older) machines,
but on machines since about the 370/168 the ICM is actually faster.  If
it's really important, I could see if I can dig up the article.

--Kevin

+=====================================+=====================================+

+-------------------------------------+-------------------------------------+
| No Affiliations, No Disclaimers, No Apologies                             |



Mon, 30 Jan 1995 12:28:52 GMT  
 ICM vs. L and LTR

I tested this on a 3084Q once: an empty BCT loop takes 100ns, ICM 15,15,0
takes 100ns, and L 15,0; LTR 15,15 takes 74ns. I didn't try to test any
of the subtle pipelining effects so these figures should be taken with a
pinch of salt.



Mon, 30 Jan 1995 19:25:18 GMT  
 ICM vs. L and LTR

 (Dwayne A. Blumenberg) writes:

Quote:
>I've noticed lately that some recent IBM code uses
>the sequence:

>    L     Rx,AREA
>    LTR   Rx,Rx
>    BNZ   SOMEWHERE

>and I was wondering if

>    ICM   Rx,15,AREA
>    BNZ   SOMEWHERE

>would be faster?

A paper in the CME Newsletter #88 entitled "Measuring 370 Instructions"
(by Ed Stewart) talks about the author's effort to measure how fast
individual instructions are on a 3081K CPU.  Pipelining and alignment
and god knows what else all affect instruction execution rates, so
IBM doesn't quote these much anymore.

Anyway, he clocked the following instructions at the indicated MIPS:

   ICM   0,15,W1             10.4287
   L     0,W1                41.8938
   LTR   0,1                 41.7416

So it appears that -- at least on a 3081K -- the L/LTR sequence is
roughly twice as fast as the ICM.
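
As a rough check on that "twice as fast" figure, here is a small Python sketch that converts the MIPS rates quoted above into nanoseconds per instruction. It assumes that the rates for L and LTR, measured separately, can simply be added to get the cost of the pair; that ignores any overlap or interlock between the two instructions, so treat it as arithmetic on the published numbers and nothing more.

```python
# MIPS figures as quoted from the newsletter article (3081K).
RATES_MIPS = {"ICM": 10.4287, "L": 41.8938, "LTR": 41.7416}


def ns_per_instruction(mips):
    """One million instructions per second is 1000 ns per instruction."""
    return 1000.0 / mips


icm_ns = ns_per_instruction(RATES_MIPS["ICM"])

# Assumption: the pair costs the sum of the two separately measured times.
pair_ns = (ns_per_instruction(RATES_MIPS["L"])
           + ns_per_instruction(RATES_MIPS["LTR"]))

ratio = icm_ns / pair_ns

print(f"ICM      : {icm_ns:6.2f} ns")
print(f"L + LTR  : {pair_ns:6.2f} ns")
print(f"ICM / (L + LTR) = {ratio:.2f}")
```

Under that assumption ICM comes out near 96 ns against about 48 ns for the pair, a ratio of about 2.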

- David Andrews



Mon, 30 Jan 1995 23:30:01 GMT  
 