RISC vs. CISC memory usage 


        >Does anybody have any information on the relative memory usage
        >of a given program (e.g., a C++ program) on a RISC processor (e.g., PowerPC)
        >versus a CISC processor (e.g., 68k)?  The assertion that is being challenged
        >here is that the RISC processor uses up to 2x the memory due to word
        >alignment issues.

---Where a significant discrepancy between RISC and CISC does occur
is in character instructions.

   On CISC, a _single_ character instruction can move n bytes of data.  The
instruction is held in the instruction register, requiring no memory
traffic other than data movements.  On the other hand, a similar
operation on a RISC machine requires instructions to move a byte,
to perform 2 increments, and a test.  The execution of these
instructions generates memory traffic, which must also compete
with memory traffic generated by the data movement.  On some RISC
systems, an instruction buffer is used, or cache/caches, which can
considerably reduce memory demands.
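
For illustration, the byte-at-a-time loop described above might look like this in C. This is only a sketch: the name copy_bytes is hypothetical, and a real compiler or library routine would likely move more than one byte per iteration.

```c
#include <stddef.h>

/* Byte-at-a-time block move. Each statement in the loop corresponds to
   one of the steps named above: a byte move, two increments, and a test.
   The name copy_bytes is hypothetical. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    const unsigned char *end = src + n;   /* setup, executed once */

    while (src != end) {                  /* the test */
        *dst = *src;                      /* move one byte (a load and a store) */
        src++;                            /* first increment */
        dst++;                            /* second increment */
    }
}
```

Every statement inside the loop is executed once per byte moved, which is where the extra instruction fetches come from when there is no instruction buffer or cache.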

   On the more complex of the CISC moves, where blank filling may
occur, RISC would require even more instructions.
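
A rough C sketch of a move with blank filling follows. It assumes a fixed-length destination field that is padded when the source is shorter and truncated when the source is longer; the name move_padded and its parameters are hypothetical, not taken from any particular instruction set.

```c
#include <stddef.h>

/* Move src into a fixed-length destination field.
   If src is shorter than the field, the rest is filled with pad;
   if src is longer, the excess is dropped.
   The name and signature are hypothetical. */
void move_padded(char *dst, size_t dst_len,
                 const char *src, size_t src_len, char pad)
{
    size_t n = src_len < dst_len ? src_len : dst_len;
    size_t i = 0;

    for (; i < n; i++)          /* copy the part that fits */
        dst[i] = src[i];
    for (; i < dst_len; i++)    /* blank-fill the remainder */
        dst[i] = pad;
}
```

Written out this way there are two loops instead of one, which is the "even more instructions" cost mentioned above.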

   Character instructions on CISC machines are used for searching,
translating, moving, comparing, and logical operations, etc.

   On some systems, there are CISC instructions for integer
operations, such as block load/store of registers.  Others include
loop control (increment, compare, test, branch).  Such instructions
again tend to reduce code space (and memory traffic), compared to
RISC.



Tue, 07 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage
Look at PowerMac vs. 680x0 Mac. The OS calls are the same, the environment
is the same, but a PowerMac needs 50% to 100% more RAM to run the same
stuff (according to software vendor claims for minimum memory required).




Wed, 08 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

Quote:

[snip]
>>   On CISC, a _single_ character instruction can move n bytes of data.  The
>>instruction is held in the instruction register, requiring no memory
>>traffic other than data movements.  On the other hand, a similar
>>operation on a RISC machine requires instructions to move a byte,
>>to perform 2 increments, and a test.  The execution of these
>>instructions generates memory traffic, which must also compete
>>with memory traffic generated by the data movement.  On some RISC
>>systems, an instruction buffer is used, or cache/caches, which can
>>considerably reduce memory demands.

>On RISC you would not transfer a block a byte at a time. You would use
>word or multi-word moves for this. It is true that you still need a
>loop for this, but what you lose in number of executed instructions
>you win in using word or multi-word transfers instead of byte
>transfers.

But a well-designed CISC machine will similarly perform word or multi-word
transfers when doing "string" operations (eg, the 370 MVC instruction).  And
setup for one of these instructions is pretty much by definition no worse than
the setup for a RISC loop.

One interesting point is that early (and optimistic) RISC efficiency and
compactness estimates were weighted in favor of RISC since they compared to
the 370 instruction set, and the 370 instruction set has a major deficiency in
that it lacks a reasonable complement of immediate instructions (whereas most
RISC designs have a rich set of immediate instructions).  When compared to a
machine with a richer set of immediate instructions RISC didn't do nearly as
well.

Dan Hicks



Mon, 13 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage



        >>---Where a significant discrepancy between RISC and CISC does occur
        >>is in character instructions.

        >>   On CISC, a _single_ character instruction can move n bytes of data.  The
        >>instruction is held in the instruction register, requiring no memory
        >>traffic other than data movements.  On the other hand, a similar
        >>operation on a RISC machine requires instructions to move a byte,
        >>to perform 2 increments, and a test.  The execution of these
        >>instructions generates memory traffic, which must also compete
        >>with memory traffic generated by the data movement.  On some RISC
        >>systems, an instruction buffer is used, or cache/caches, which can
        >>considerably reduce memory demands.

        >On RISC you would not transfer a block a byte at a time. You would use
        >word or multi-word moves for this. It is true that you still need a
        >loop for this, but what you lose in number of executed instructions
        >you win in using word or multi-word transfers instead of byte
        >transfers.

---Character strings tend not to have lengths that are multiples of
2 or 4, nor do they tend to commence on word boundaries.

---As for multi-word transfers, CISC does this as well when you
move/compare (i.e., it transfers in blocks), so you retain the
advantages of byte operations.  (I.e., RISC provides no
particular advantage, and you still have the problems
of word transfers rather than byte transfers.)

        >>   On the more complex of the CISC moves, where blank filling may
        >>occur, RISC would require even more instructions.

        >True, but you forget to count the instructions needed to set up the
        >parameters for the block-move instructions.

---No, I haven't forgotten.  But there's one *mighty* important
difference: the instructions to set up such a CISC move/compare
are executed just *once* and only once, not for *every* byte moved.
However, the same instructions are needed to set up RISC (load
addresses of source/destination/length of transfer).
   The move/compare itself requires just one instruction to be
held in control -- not a loop.

        >Also, if you just have a
        >tiny bit of cache the limiting factor is not the instruction fetching
        >(which is cacheable) but the memory moving (which generally is not).

---Instruction caching doesn't help if the loop won't fit in the cache.

        >>   On some systems, there are CISC instructions for integer
        >>operations, such as block load/store of registers.  Others include
        >>loop control (increment, compare, test, branch).  Such instructions
        >>again tend to reduce code space (and memory traffic), compared to
        >>RISC.

        >Some RISCs (at least ARM and PowerPC) have load/store multiple
        >registers.

---If you can't beat them, join them, eh!



Fri, 17 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage


        >>On RISC you would not transfer a block a byte at a time. You would use
        >>word or multi-word moves for this. It is true that you still need a
        >>loop for this, but what you lose in number of executed instructions
        >>you win in using word or multi-word transfers instead of byte
        >>transfers.

        >But a well-designed CISC machine will similarly perform word or multi-word
        >transfers when doing "string" operations (eg, the 370 MVC instruction).  And
        >setup for one of these instructions is pretty much by definition no worse than
        >the setup for a RISC loop.

---Quite so.  With the IBM MVC/CLC family, it's even better --
there's no setup.  Just do the move.  For a variable-length move
with MVC/CLC etc, the setup is 2 instructions (IC, EX) if minimum
string length is 1, and 4 instructions otherwise.

        >One interesting point is that early (and optimistic) RISC efficiency and
        >compactness estimates were weighted in favor of RISC since they compared to
        >the 370 instruction set, and the 370 instruction set has a major deficiency in
        >that it lacks a reasonable complement of immediate instructions (whereas most
        >RISC designs have a rich set of immediate instructions).  When compared to a
        >machine with a richer set of immediate instructions RISC didn't do nearly as
        >well.

---And while the S/370 isn't over-endowed with immediate
instructions, let's not forget SR (for generating zero),
LA for generating 1 <= c <= 4095, BCTR for decrementing,
and SR/BCTR for -1, & LA for small increments.

   The lack of immediate instructions on S/370 is surprising, for
some first-generation machines provided many more: -1, 1, 0, 2**31,
2**16, 2**17, 2**4, 2**9, etc.



Fri, 17 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage


: >---Character strings tend not to have lengths that are multiples of
: >2 or 4, nor do they tend to commence on word boundaries.

: No, but (typical) RISC code will start with a few instructions to
: handle transferral up to the first word/multi-word boundary, then a
: loop for the main part and finally a few instructions to handle the
: unaligned tail end.
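
The head/loop/tail structure described in the quoted text can be sketched in C roughly as follows. It assumes a 32-bit word, and the name copy_head_body_tail is hypothetical. The word-sized memcpy calls stand in for word loads and stores; the loads are aligned only when source and destination share the same offset within a word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes in three phases: head, word loop, tail.
   The name is hypothetical; a 32-bit word is assumed. */
void copy_head_body_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Head: byte moves until dst reaches a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: one 32-bit word per iteration. The fixed-size memcpy is the
       portable C way to express a word load followed by a word store. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
        src += 4;
        dst += 4;
        n -= 4;
    }

    /* Tail: the last 0 to 3 bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```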

        But what happens when source and target are at
        different "unalignments" ?

--
   -----------------------------------------------------

   "Just because it worked doesn't mean it works." -- me



Fri, 17 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

Quote:


>: >---Character strings tend not to have lengths that are multiples of
>: >2 or 4, nor do they tend to commence on word boundaries.
>: No, but (typical) RISC code will start with a few instructions to
>: handle transferral up to the first word/multi-word boundary, then a
>: loop for the main part and finally a few instructions to handle the
>: unaligned tail end.
>    But what happens when source and target are at
>    different "unalignments" ?

You load in a word, shift it and store it. The remaining part (that
was shifted out of the word) is combined with part of the next word
etc. Again, this is something that also has to be done in a CISC if it
wants to exploit word moves (does x86 exploit word moves if source and
destination have different alignments?).
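
A minimal C sketch of the load, shift and combine step, under stated assumptions: memory is modelled as an array of 32-bit words, bytes within a word are numbered big-endian (byte 0 is the most significant, as on 68k, PowerPC and S/370), the destination is word-aligned, and the source starts off bytes into src[0]. The name copy_shifted is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Produce nwords aligned destination words from a source that begins
   off bytes (0..3) into src[0]. When off is nonzero this reads
   src[0] .. src[nwords], so the caller must make sure that last word
   exists. Big-endian byte numbering within a word is assumed. */
void copy_shifted(uint32_t *dst, const uint32_t *src, unsigned off, size_t nwords)
{
    if (off == 0) {                       /* same alignment: plain word moves */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
        return;
    }

    unsigned left = 8u * off;             /* bits to discard from the current word */
    unsigned right = 32u - left;          /* bits to borrow from the next word */
    uint32_t cur = src[0];

    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src[i + 1];       /* one load per word stored */
        dst[i] = (cur << left) | (next >> right);
        cur = next;                       /* the shifted-out part is reused */
    }
}
```

For example, with src = {0x11223344, 0x55667788, 0x99AABBCC}, off = 1 and nwords = 2, the destination receives 0x22334455 and 0x66778899.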

Granted, this increases the code size for the RISC code. But that can
be regained by using a subroutine call to an optimized block-move
routine, at the cost of a subroutine call. On most RISCs subroutine
calls are not more costly than normal jumps (as the return address is
stored in a register instead of memory). A good way of optimizing
block-move is to generate specialized code for each alignment
combination (16 total for byte boundaries in two 32 bit words).
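
As a sketch of where the 16 comes from: with 32-bit words there are four possible byte offsets for the destination and four for the source, so an index for selecting a specialized routine could be computed as below. The name alignment_case is hypothetical, and the code assumes the low bits of a pointer converted to uintptr_t reflect its byte alignment, which holds on mainstream platforms.

```c
#include <stdint.h>

/* Return a value in 0..15 identifying the alignment combination of
   dst and src relative to a 4-byte word boundary. The name is hypothetical. */
unsigned alignment_case(const void *dst, const void *src)
{
    unsigned d = (unsigned)((uintptr_t)dst & 3u);   /* destination offset, 0..3 */
    unsigned s = (unsigned)((uintptr_t)src & 3u);   /* source offset, 0..3 */
    return d * 4u + s;
}
```

A block-move routine could use this value to index a table of 16 specialized copy loops.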




Sat, 18 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

[snip]

Quote:
>Note that a CISC implementation of block-move will have to do the same
>thing if it wants to exploit word/multi-word moves, only in micro-code
>instead of machine code. Hence, there is no obvious speed advantage of
>the CISC approach, only space.

This is only true of low-end CISC machines.  The high-end machines have
dedicated storage interface logic that can optimize multi-byte moves in
hardware, meaning that little if any time is spent figuring out how to do the
move.

Dan Hicks



Sat, 18 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage



        >>        >On RISC you would not transfer a block a byte at a time. You would use
        >>        >word or multi-word moves for this. It is true that you still need a
        >>        >loop for this, but what you lose in number of executed instructions
        >>        >you win in using word or multi-word transfers instead of byte
        >>        >transfers.

        >>---Character strings tend not to have lengths that are multiples of
        >>2 or 4, nor do they tend to commence on word boundaries.

        >No, but (typical) RISC code will start with a few instructions to
        >handle transferral up to the first word/multi-word boundary, then a
        >loop for the main part and finally a few instructions to handle the
        >unaligned tail end.

---That sounds like a subroutine's worth of work.

        >Note that a CISC implementation of block-move will have to do the same
        >thing if it wants to exploit word/multi-word moves, only in micro-code
        >instead of machine code. Hence, there is no obvious speed advantage of
        >the CISC approach, only space.

---While CISC might well do that, it doesn't have to.  It might,
for example, hand the move over to a sub-controller while it gets
on with some other work.

   The point about CISC is that the overall operation is specified
by a SINGLE request.  That single request can be optimized in a
variety of ways (about which, incidentally, the user doesn't
have to worry).  In the case of RISC, however, the overall task
is not obvious to the processor -- hence optimization is not
feasible at the processor level (other than in a general way
as is performed for all instructions).

        >>        >>   On the more complex of the CISC moves, where blank filling may
        >>        >>occur, RISC would require even more instructions.

        >>        >True, but you forget to count the instructions needed to set up the
        >>        >parameters for the block-move instructions.

        >>---No, I haven't forgotten.  But there's one *mighty* important
        >>difference: the instructions to set up such a CISC move/compare
        >>are executed just *once* and only once, not for *every* byte moved.
        >>However, the same instructions are needed to set up RISC (load
        >>addresses of source/destination/length of transfer).
        >>   The move/compare itself requires just one instruction to be
        >>held in control -- not a loop.

        >If the argument is space, the once/many execution counts are
        >irrelevant. And I agree, yes, RISCs also have to initialize. But the
        >initialization code is no longer than on a CISC, so the ratio between
        >block moves is not 1 on CISC to many on RISC, but some on CISC to more
        >on RISC.

---A whole family of CISC character instructions on S/370 do NOT require
ANY initialization.  The comparison of 1 CISC to many RISC is perfectly
valid.

        >I don't claim that RISC will have equally good code density
        >to CISC _in_this_particular_case_, I just claim that _overall_ the
        >difference is not much. I can give examples of single RISC
        >instructions that would require many CISC instructions (on typical
        >CISCs). One such is the (32 bit) ARM instruction

        >    BICGE R0,R1,R2,LSR R3

        >Which, if the GE condition is set, ands R1 to the negation of R2
        >shifted right by a number of bits specified by R3 and puts the result
        >in R0. How many instructions will that take on an i486 (or your
        >favourite CISC)?

---Perhaps an interesting instruction, but how often is it
executed?  I can't recall ever having had a need for
such an instruction.  I don't mind having to write a few
extra lines for this operation if ever I should have the
need for it (which is probably never).

        >>        >Also, if you just have a
        >>        >tiny bit of cache the limiting factor is not the instruction fetching
        >>        >(which is cacheable) but the memory moving (which generally is not).

        >>---Instruction caching doesn't help if the loop won't fit in the cache.

        >If you can't fit a block-move loop into your cache, you have a _very_
        >small cache indeed.

        >>        >Some RISCs (at least ARM and PowerPC) have load/store multiple
        >>        >registers.

        >>---If you can't beat them, join them, eh!

        >Who says load/store multiple registers are more CISC than RISC? You
        >can, of course, argue that it was in CISCs before RISCs, but that is
        >true for e.g. addition too. Is addition more CISC than RISC?

---Multiple load/store generally require more than one machine cycle,
and therefore don't fit in the category of the RISC mould.



Tue, 21 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

Quote:


>>        >    BICGE R0,R1,R2,LSR R3

>>        >Which, if the GE condition is set, ands R1 to the negation of R2
>>        >shifted right by a number of bits specified by R3 and puts the result
>>        >in R0. How many instructions will that take on an i486 (or your
>>        >favourite CISC)?

>And, even if you wanted to do this, how could you possibly specify it in C?

The instruction would be used to (conditionally) extract a set of bits from
R1. A C compiler might use it for accessing bit fields.
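
For reference, the effect of the quoted BICGE instruction can be written in C roughly as follows. This is a sketch: the function name is hypothetical, the GE condition is passed in as a flag, and the shift amount is taken from the low byte of R3, with amounts of 32 or more giving zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* C model of BICGE R0,R1,R2,LSR R3. Returns the new value of R0.
   ge is the state of the GE condition; the name bicge is hypothetical. */
uint32_t bicge(bool ge, uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    if (!ge)
        return r0;                        /* condition false: R0 unchanged */

    unsigned amount = r3 & 0xFFu;         /* only the low byte of R3 is used */
    uint32_t shifted = amount < 32 ? r2 >> amount : 0;

    return r1 & ~shifted;                 /* bit clear: AND with the complement */
}
```

So one way to reach it from C is an expression of the form r1 & ~(r2 >> r3) guarded by a comparison.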

+--------------------------------------------+
|              Michael Quinlan               |

|       http://www.primenet.com/~mikeq       |
|     If it doesn't fit, you must acquit!    |
+--------------------------------------------+



Wed, 22 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

Quote:




>    >>        >On RISC you would not transfer a block a byte at a time. You would use
>    >>        >word or multi-word moves for this. It is true that you still need a
>    >>        >loop for this, but what you lose in number of executed instructions
>    >>        >you win in using word or multi-word transfers instead of byte
>    >>        >transfers.

>    >>---Character strings tend not to have lengths that are multiples of
>    >>2 or 4, nor do they tend to commence on word boundaries.

>    >No, but (typical) RISC code will start with a few instructions to
>    >handle transferral up to the first word/multi-word boundary, then a
>    >loop for the main part and finally a few instructions to handle the
>    >unaligned tail end.

>    >Note that a CISC implementation of block-move will have to do the same
>    >thing if it wants to exploit word/multi-word moves, only in micro-code
>    >instead of machine code. Hence, there is no obvious speed advantage of
>    >the CISC approach, only space.

>---While CISC might well do that, it doesn't have to.  It might,
>for example, hand the move over to a sub-controller while it gets
>on with some other work.

>   The point about CISC is that the overall operation is specified
>by a SINGLE request.  That single request can be optimized in a
>variety of ways (about which, incidentally, the user doesn't
>have to worry).  In the case of RISC, however, the overall task
>is not obvious to the processor -- hence optimization is not
>feasible at the processor level (other than in a general way
>as is performed for all instructions).

 You are not considering the fact that the definition of the high-level
language might not map well to the CISC instruction you're considering.

 Thus, although that CISC instruction was a nice/large semantic unit,
which itself could be optimized, it's never used or, even worse, not used
by the high-level compilers in a way that takes advantage of the
optimization.

 The advantage of RISC here is that it moves the optimization from
the machine level to the language-implementation level.  So, the burden
of optimization falls to the compiler writer, not the chip designer.
A major goal of most RISC designs is to present a regular instruction
set that can be cheaply implemented, and can be highly optimized by
a compiler.

 It must then be realized that most RISC instruction sets are not designed
to be easy for a "human" to write to.  But, at the time of the 370, many
instructions were designed to be easy for a programmer to use,
and (sometimes) hard for a compiler, because the majority of code
was written in ASM.

        - Dave Rivers -
--
Yoiks and Away!



Fri, 24 Apr 1998 03:00:00 GMT  
 RISC vs. CISC memory usage



        >>>    >    BICGE R0,R1,R2,LSR R3
        >>>
        >>>    >Which, if the GE condition is set, ands R1 to the negation of R2
        >>>    >shifted right by a number of bits specified by R3 and puts the result
        >>>    >in R0. How many instructions will that take on an i486 (or your
        >>>    >favourite CISC)?
        >>
        >>And, even if you wanted to do this, how could you possibly specify it in C?

        >The instruction would be used to (conditionally) extract a set of bits from
        >R1. A C compiler might use it for accessing bit fields.

---It's the AND with the negation part that gets me.  The negative?
I can think of lots better ways to extract a subfield.

Incidentally, as it's already in a register, a shift and an AND
will do it on S/370.



Fri, 01 May 1998 03:00:00 GMT  
 RISC vs. CISC memory usage

Quote:

>>The instruction would be used to (conditionally) extract a set of bits from
>>R1. A C compiler might use it for accessing bit fields.

>---It's the AND with the negation part that gets me.  The negative?
>I can think of lots better ways to extract a subfield.

>Incidentally, as it's already in a register, a shift and an AND
>will do it on S/370.

You would have to build the mask for the N or NR instruction (there is no
'AND' instruction mnemonic on the S/370). I assume the reason it took the
negative (one's complement?) was so it could use the same mask for setting
the field as for extracting the field.
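
A small C sketch of the "same mask" idea, assuming the field is described by a mask and a shift count below 32. The names field_extract and field_insert are hypothetical.

```c
#include <stdint.h>

/* Extract the field selected by mask, right-justified.
   shift is the bit position of the field's lowest bit (must be < 32). */
uint32_t field_extract(uint32_t word, uint32_t mask, unsigned shift)
{
    return (word & mask) >> shift;        /* AND with the mask */
}

/* Store value into the field selected by the same mask. */
uint32_t field_insert(uint32_t word, uint32_t mask, unsigned shift, uint32_t value)
{
    uint32_t cleared = word & ~mask;      /* AND with the complement (bit clear) */
    return cleared | ((value << shift) & mask);
}
```

The extract path ANDs with the mask, and the store path ANDs with its one's complement, which is the operation a bit-clear instruction provides directly.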

+--------------------------------------------+
|              Michael Quinlan               |

|       http://www.primenet.com/~mikeq       |
|     If it doesn't fit, you must acquit!    |
+--------------------------------------------+



Sat, 02 May 1998 03:00:00 GMT  
 