Instruction Timings 

The last time I saw published instruction timings was for machines like the System/360 Models 30, 40, 50, and 65.  These machines had no cache and little or no pipelining, so fairly accurate instruction
timings could be published.  BTW, this was about 1970.

With modern machines having caches between the ALU and main memory, multiple engines, internal pipelines, parallel execution, instruction lookahead, etc., published timings are a thing of the
past.  First, the formulae would be so complex as to defy reasonable calculation, since one would need to allow for each of the above mechanisms.  Second, the timings would be further influenced by
whether the machine runs in BC (basic control) or EC (extended control) mode.  Third, the timings could be drastically affected by the paging mechanism, TLBs, and other mechanisms when operating in EC mode.  Fourth, IBM and the other vendors just haven't
published instruction timings for decades.

Yup, I am interested, since certain portions of my product line are extremely instruction-stream intensive: the dynamic disassembler/code pattern recognizer portion of the Edge Portfolio Analyzer
dims the lights when running.  But the only approach is to do actual code timings in real-world environments, and even then the CPU you are running on can make a significant difference, e.g. P/390 vs
3090J vs 9021 vs CMOS (which of 3 versions), etc.
Rex Widmer
Builder of software archeology tools and other strange programs to help survive in a legacy based world.



Tue, 14 Sep 1999 03:00:00 GMT  

Does anyone know of a publication which describes instruction timings
for the 370 instructions across various machines?  Examples of those
which may provide them for individual machines?

Thanks for the assistance.



Tue, 14 Sep 1999 03:00:00 GMT  

Quote:

> Does anyone know of a publication which describes instruction timings
> for the 370 instructions across various machines?  Examples of those
> which may provide them for individual machines?

> Thanks for the assistance.

For many of the old 370 series machines, the Functional Characteristics
manuals would include timing tables for the instructions.  However,
these were only nominal values that made assumptions about cache hits
and relocation overhead, so the tables were mostly useful
for comparison purposes.
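The simple one-level average-access-time formula shows why a table built on an assumed cache hit ratio is only a nominal figure. Below is a minimal Python sketch of it. The function name and the timing numbers are hypothetical and chosen only for illustration. They are not taken from any Functional Characteristics manual.

```python
def average_access_ns(hit_ratio: float, cache_ns: float, miss_penalty_ns: float) -> float:
    """Average storage access time under a simple one-level cache model.

    Every access pays the cache access time; the fraction of accesses
    that miss (1 - hit_ratio) additionally pays the miss penalty.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return cache_ns + (1.0 - hit_ratio) * miss_penalty_ns


# Hypothetical figures: 80 ns cache access, 400 ns extra on a miss.
for ratio in (0.99, 0.95, 0.80):
    print(f"hit ratio {ratio:.2f}: {average_access_ns(ratio, 80.0, 400.0):.0f} ns")
```

With these made-up numbers, a table that assumes a 95% hit ratio implies 100 ns per access. A loop that actually runs at 80% pays 160 ns, which is why such tables were better for comparing machines than for predicting run times.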

Back in the 370/158 timeframe the company I was working for was getting
ready to sue the Itel computer leasing company to get out of a contract,
using as justification their misrepresentation of a 370/155 with a DAT
as being equivalent to a 370/158.  We spent a lot of effort measuring
the elapsed time of instruction sequences; the effect of the cache gave
us trouble for a while, but we eventually came up with good proof for
our argument.  Unfortunately, in the meantime Itel went into bankruptcy
and never came out.



Tue, 14 Sep 1999 03:00:00 GMT  

I don't know the current publication, but IBM used to publish a book
called the S/370 System Summary which had the cycle time for each
machine and storage fetch times and width of access.

That didn't go into each instruction, but could give an idea of relative
times.

Actually, the basic principle behind S/360 was NOT to get into
individual timings as that would limit a program to one machine.  S/360
programs are supposed to be able to run on any machine of the family as
long as there's enough hardware to support it.  CPU timing isn't
supposed to matter.



Wed, 15 Sep 1999 03:00:00 GMT  

Quote:

> Back in the 370/158 timeframe the company I was working for was getting
> ready to sue the Itel computer leasing company to get out of a contract,
> using as justification their misrepresentation of a 370/155 with a DAT
> as being equivalent to a 370/158.  We spent a lot of effort measuring
> the elapsed time of instruction sequences; the effect of the cache gave
> us trouble for a while, but we eventually came up with good proof for
> our argument.  Unfortunately, in the meantime Itel went into bankruptcy
> and never came out.

Eric, before I came to Candle I was with CSC. We installed a number of
370/155s with DAT boxes and faster memory, through CDC. These were for
a number of facility management contracts. We never claimed they were
the equivalent of a 158 in performance. In fact, they were about 10%
slower and 35% cheaper.

If you went with the IBM upgrade they were about 20% slower and 10%
more expensive. The only justification for the IBM upgrade was that you:
  1. Already had the 155.
  2. Had an "only blue" mentality

The 165 upgrade was so expensive it was cheaper to buy another machine.

Only folks with a 145 were lucky, because the machine had all the virtual
hardware; you only needed a new "bios" on a floppy disk. The 145
announcement should have been a giveaway to people: it had the
Execute Local instruction documented and was obviously a virtual
machine.
Later versions of the PoP removed the Execute Local description, though
the instruction was still used for all the emulators.

Have fun



Wed, 15 Sep 1999 03:00:00 GMT  

Quote:

> Actually, the basic principle behind S/360 was NOT to get into
> individual timings as that would limit a program to one machine.  S/360
> programs are supposed to be able to run on any machine of the family as
> long as there's enough hardware to support it.  CPU timing isn't
> supposed to matter.

But certain coding practices do. For example, boundary alignment can make
a big difference in instruction execution. Granted, the boundaries have
gotten even larger (fullword, doubleword, cache frame, etc.), but it
still makes a difference.

Keeping a small reference set, so that code stays on the same page, can affect
system performance. This is locality of reference: it
eliminates a lot of paging, and thus improves performance.
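Here is a minimal Python sketch of locality of reference. It counts the distinct pages touched by a trace of addresses, assuming a 4K page size (S/370 supported both 2K and 4K pages). The fewer pages a routine touches, the smaller its working set and the less paging it can cause. The function name and both address traces are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4K pages


def pages_touched(addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers referenced by a trace of byte addresses."""
    return {addr // page_size for addr in addresses}


# Hypothetical traces: a loop and its subroutine packed together,
# versus the same code scattered across the load module.
packed = [0x1000, 0x1040, 0x1080, 0x1200, 0x1240]
scattered = [0x1000, 0x1040, 0x5000, 0x9200, 0xD240]

print(len(pages_touched(packed)), "page(s) for the packed layout")
print(len(pages_touched(scattered)), "page(s) for the scattered layout")
```

Both traces make the same five references. The packed layout keeps them all on one page, while the scattered layout needs four pages resident to run without faulting.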

"Pure structured programming" can be hazardous to the performance of your
system (this was my opening line in a performance seminar I gave). If you
define structured programming as code you write today that I have to
maintain a year from now without cursing you (the second line I used in
that seminar), then you can bend "pure" to make it useful. I took a CICS
application that was consuming 63% of the machine in its own TCB, and
reduced it to 17%, plus system overhead. I saved half a machine without
changing the application: just tweaking, recompiling, and in one case,
replacing a COBOL table search with an assembler binary search.
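The gain from that last change comes from the number of comparisons. Below is a minimal Python sketch that counts probes for a serial search and a binary search over the same sorted table. It is not the author's assembler routine, and the table contents are hypothetical. The binary search works only because the table is in key order.

```python
def serial_search(table, key):
    """Check entries in order; return (index, probes), index -1 if absent."""
    probes = 0
    for index, entry in enumerate(table):
        probes += 1
        if entry == key:
            return index, probes
    return -1, probes


def binary_search(table, key):
    """Halve the search range on each probe; table must be sorted ascending."""
    low, high = 0, len(table) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one three-way comparison against table[mid]
        if table[mid] == key:
            return mid, probes
        if table[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes


table = list(range(0, 2000, 2))  # hypothetical 1,000-entry table in key order
print("serial:", serial_search(table, 1998))
print("binary:", binary_search(table, 1998))
```

For a 1,000-entry table, the serial search needs up to 1,000 probes. The binary search never needs more than 10, because 2^10 > 1,000.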



Wed, 15 Sep 1999 03:00:00 GMT  

Quote:

> Does anyone know of a publication which describes instruction timings
> for the 370 instructions across various machines?  Examples of those
> which may provide them for individual machines?

> Thanks for the assistance.

Basically, ALL of the more common instructions will execute in
one machine cycle, depending on several factors:
1) If RR instruction, then ALL but MR and DR = 1 cycle,
2) MR/DR: variable depending on operand factors, i.e., there is
   a "worst case" scenario for both of them.
3) RX Instructions are one cycle if the storage (2nd) operand
   is in cache, two if not. Again, M and D being the exceptions.
   CVB/CVD may also take longer. I've never tested them...
4) SI Instructions = 1 cycle if cache hit, else 2.
5) SS Instructions = one or two cycles per bus width fetch.
   In other words, if you do a MVC from FOO to BAR and the length
   of the move is evenly divisible by the width of the bus, mostly
   8 or 16 bytes, AND neither operand is in cache at the time of
   initial instruction fetch, then it's two cycles for the first
   block, and one cycle for each subsequent block...
6) Floating-point operations without either the vector facility or an
   accelerator (such as the one provided in the older 4361) are
   generally one or two cycles, depending upon operand cache
   availability, etc. Again, multiply, divide, and the new
   square root instruction will be notable exceptions.
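Rules 3 through 5 amount to a small cost model. The Python toy below encodes only rule 5, as stated, for MVC. The bus widths, the rounding of a partial last block, and the function name are assumptions for illustration. No real processor is this regular.

```python
import math


def mvc_cycle_estimate(length: int, bus_width: int = 8, in_cache: bool = False) -> int:
    """Toy estimate of MVC cycles following rule 5's rule of thumb.

    The move is split into bus-width blocks (a partial last block is
    rounded up, which is an assumption). The first block costs 2 cycles
    when the operands are not in cache and 1 when they are; every
    later block costs 1 cycle.
    """
    if not 1 <= length <= 256:
        raise ValueError("MVC moves 1 to 256 bytes")
    blocks = math.ceil(length / bus_width)
    first_block = 1 if in_cache else 2
    return first_block + (blocks - 1)


for width in (8, 16):
    print(f"256-byte MVC, {width}-byte bus, operands not in cache:",
          mvc_cycle_estimate(256, width), "cycles")
```

Under this toy model, doubling the bus width from 8 to 16 bytes roughly halves the cost of a long move: 33 cycles versus 17.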

Finally, the following kind of sequence can lead to massive
debugging headaches unless you realize that one instruction that
follows another may in fact complete BEFORE the first one.

   DR  R0,R2
   LR  R3,R4

Note that there is NO operand dependency from the DR to the LR.
So, it is quite likely that the LR will complete before the DR.
If the DR experiences some kind of problem, like a fixed-point divide exception,
the PSW may point to the instruction AFTER the LR!? This
phenomenon is known as the "Pipeline Effect" and has been one
of the major criticisms of certain RISC processors like the SPARC
and MIPS chip-sets.
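The Python toy model below illustrates that effect. It assumes one instruction issues per cycle, and each starts as soon as the registers it uses are ready, then finishes a fixed number of cycles later. The class, the function, and the 30-cycle DR latency are hypothetical. The register sets follow the DR even/odd pair convention: DR R0,R2 divides the R0-R1 pair by R2 and writes R0 and R1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset
    writes: frozenset
    latency: int  # hypothetical cycle count


def completion_order(program):
    """Issue one instruction per cycle; return instruction texts in completion order.

    An instruction waits only for registers it reads or writes that an
    earlier instruction has not finished writing; otherwise it starts at
    its issue cycle.
    """
    ready = {}  # register -> cycle at which its latest value is available
    finished = []
    for issue_cycle, ins in enumerate(program):
        start = max([issue_cycle] + [ready.get(r, 0) for r in ins.reads | ins.writes])
        end = start + ins.latency
        for r in ins.writes:
            ready[r] = end
        finished.append((end, issue_cycle, ins.text))
    return [text for _, _, text in sorted(finished)]


DR = Instr("DR R0,R2", frozenset({"R0", "R1", "R2"}), frozenset({"R0", "R1"}), 30)
LR_INDEPENDENT = Instr("LR R3,R4", frozenset({"R4"}), frozenset({"R3"}), 1)
LR_DEPENDENT = Instr("LR R3,R1", frozenset({"R1"}), frozenset({"R3"}), 1)

print(completion_order([DR, LR_INDEPENDENT]))
print(completion_order([DR, LR_DEPENDENT]))
```

In the first case the LR finishes long before the DR. An interruption reported when the DR completes would therefore find execution already past the LR. In the second case the LR reads R1, so it has to wait for the DR.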

Hope this helps to clarify somehow - but I doubt it! It's a very
confusing mess, and something that ALL vendors that I've worked
with are loath to divulge! The best suggestion I can offer is to
write a tiny little test program that executes a sequence of
instructions many thousands of times, for many (up to 100)
iterations.

Something like this pseudo-code:

   DO i=1 to 100
      Clock1 = STCK
      DO 1000
         <AnInstructionSequenceToBeTimed>
      END
      Clock2 = STCK
      ClockStack[i] = Clock2 - Clock1
   END
   <Display All Entries in ClockStack>
   <Pick the lowest value you find as the fastest execution time>
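Here is the same harness in runnable form, as a Python sketch. `time.perf_counter_ns()` stands in for STCK, and a callable stands in for the instruction sequence. It also times an empty loop body so the loop overhead can be subtracted, a step the pseudocode leaves out. Python's interpreter overhead dwarfs a single machine instruction, so this shows the measurement method, not 370 instruction speeds. The function names are hypothetical.

```python
import time


def time_trials(sequence, inner=1000, outer=100):
    """Run `sequence` `inner` times per trial for `outer` trials.

    Returns a list of elapsed nanoseconds, one entry per trial
    (the ClockStack of the pseudocode).
    """
    clock_stack = []
    for _ in range(outer):
        clock1 = time.perf_counter_ns()
        for _ in range(inner):
            sequence()
        clock2 = time.perf_counter_ns()
        clock_stack.append(clock2 - clock1)
    return clock_stack


def fastest_per_call_ns(sequence, inner=1000, outer=100):
    """Lowest observed time per call, minus the cost of an empty loop body."""
    best = min(time_trials(sequence, inner, outer))
    overhead = min(time_trials(lambda: None, inner, outer))
    return max(best - overhead, 0) / inner


if __name__ == "__main__":
    print(f"{fastest_per_call_ns(lambda: sum(range(100))):.1f} ns per call")
```

Taking the minimum rather than the average follows the advice above. Interference from other work can only make a trial slower, so the fastest trial is the closest to the undisturbed cost.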

Good luck.....
Ciao,
Bill B.



Wed, 15 Sep 1999 03:00:00 GMT  

Quote:

>Only folks with 145 were lucky, because the machine had all the virtual
>hardware, you only needed a new "bios" with a floppy disk. The 145
>announcement should have been a giveaway for people, it had the
>Execute Local instruction documented and was obviously a virtual
>machine.
>Later versions of the pop removed the Execute Local description, which
>was still used for all the emulators.

I learned a lot from our 145.  It came with a listing of
the microcode, so you could actually see how the instructions
were implemented.  I don't recall the Execute Local instruction.
Do you remember the op-code, mnemonic, format and purpose?
(I try to collect these things, more for trivia than anything
else.)

David Bond



Thu, 16 Sep 1999 03:00:00 GMT  


Quote:

> Basically, ALL of the more common instructions will execute in
> one machine cycle. Depending on several factors:

               etc...

Thanks to all for the fascinating responses, especially this one.

I am considering methods of optimizing a mainframe assembler module I may
rewrite, and raw cpu utilization is a primary concern.

It's nice to see I am not the only remaining practitioner who deals with
the nuances of mainframe assembler language.  I have actually considered
purchasing a good 370 assembler text, just as a reference, and find there
are precious few still being produced... Most are out of print.  There
are a few remaining, so if anyone has a suggestion I'd be interested.

Actually, I suppose I should just order IBM's technical publications
directly.  IBM did a superb job with the original Principles of
Operation manuals for the 360/370 line, and, though I don't know what
the corresponding manuals cost now (anyone know?), they used to be a
bargain to buy as well (not so for the textbook variety!).

Thanks again for the assistance.



Thu, 16 Sep 1999 03:00:00 GMT  

Quote:


> >Only folks with 145 were lucky, because the machine had all the virtual
> >hardware, you only needed a new "bios" with a floppy disk. The 145
> >announcement should have been a giveaway for people, it had the
> >Execute Local instruction documented and was obviously a virtual
> >machine.
> >Later versions of the pop removed the Execute Local description, which
> >was still used for all the emulators.

> I learned a lot from our 145.  It came with a listing of
> the microcode, so you could actually see how the instructions
> were implemented.  I don't recall the Execute Local instruction.
> Do you remember the op-code, mnemonic, format and purpose?
> (I try to collect these things, more for trivia than anything
> else.)

> David Bond

The operand was the address of a table layout. The table included
base address, offset, range, etc. It was a page/segment setup,
but not described in those words. It actually set one of the
control registers.

The instruction could be used by each of the emulators (e.g., 1401, 1410,
7010, etc.). The Model 65, even earlier, had DIL (do interpretive
loop) for the 7094 emulation. That was close, but no cigar on the
virtual capability.

The 145 announcement was in 1970, so I really don't remember
much about it. I know I was reading the details and yelled out to
my boss, "Hey, IBM just discovered virtual in their production machines!"
(The 360/67 was already in existence, running that nice code from
the Cambridge Scientific Center, CP and its Cambridge Monitor System,
CMS.)
Of course, the DAT boxes on that beast were bigger than the new CMOS
machines in total.

I am sitting at my desktop with the laptop plugged into it, realizing
that the laptop has more horsepower than the 360/91 (or 95). That
machine had 750ns memory (the 95 had the first 2MB as 120ns),
and a 60ns machine cycle with 6 floating point boxes, 3 fixed point
boxes, and an asynchronous storage mover. The disks were originally
2314s (29MB per spindle).

My laptop has 32MB RAM, 1GB disk, ..... How times change. I get to
hate myself in about 4-6 months.



Thu, 16 Sep 1999 03:00:00 GMT  

Quote:

> Finally, the following kind of sequence can lead to massive
> debugging headaches unless you realize that one instruction that
> follows another may in fact complete BEFORE the first one.

>    DR  R0,R2
>    LR  R3,R4

> Note that there is NO operand dependency from the DR to the LR.
> So, it is quite likely that the LR will complete before the DR.
> If the DR experiences some kind of problem, like an underflow,
> the PSW may point to the instruction AFTER the LR!? This
> phenomenon is known as the "Pipeline Effect" and has been one
> of the major criticisms of certain RISC processors like the SPARC
> and MIPS chip-sets.

S0CA (I believe) is the imprecise interrupt. I got one four subroutines
away from the MVC that got a storage violation, near the end of the
360/91's life. Fun to find.


Thu, 16 Sep 1999 03:00:00 GMT  

Quote:
> The instruction could be used by each of emulators (eg 1401, 1410,
> 7010, etc). The model 65, even earlier, had DIL (do interpretive
> loop) for the 7094 emulation. That was close, but no cigar on the
> virtual capability.

I wonder how many people are still doing 14xx or 70xx emulation.  My own
employer did so up to just a few years ago.  Anybody have any idea?

Quote:
> I am sitting at my desk top with the laptop plugged into it. Sitting
> here realizing the laptop has more horsepower than the 360/91 (or

While the CPU of a x86 PC today has more horsepower than the older S/360
mainframes, how would one compare _total throughput_?  Once we added a
spooler and a simple CICS region (MTCS?) on our 360-40, we were cranking
quite a bit of work through it.  I think the channel structure of the old
S/360 would outperform the "bus" structure of the x86 PC design.
  Comments?


Fri, 17 Sep 1999 03:00:00 GMT  

Quote:

> > programs are supposed to be able to run on any machine of the family as
> > long as there's enough hardware to support it.  CPU timing isn't
> > supposed to matter.
Bob wrote...
> But certain coding practices do. For example, boundary alignment can make
> a big difference in instruction execution. Granted, the boundaries have
> gotten even larger (full word, double word, cache frame, etc), but it
> still makes a difference.

I'm confused.  Wouldn't the practices you mention (i.e., boundary alignment)
apply to all machines?  Are boundaries different on different CPUs?  (I
would think not.)

Indeed, I thought a lot of those environmental considerations can vary at
run time on any given machine, depending on what else is running and what
resources are available.

In any event, if you were to code an application specific to a particular
CPU model (i.e., a 3090J), aren't you then locking yourself in to that model?
Again, I thought the whole premise of S/360 was CPU independence.

At my company, we have 3 big IBM mainframes (actually one is a Hitachi).
We shuffle programs between them all the time as our work needs require.
Today, the application programmers don't even know what kind of machine
they're running on...various upgrades go on all the time.



Fri, 17 Sep 1999 03:00:00 GMT  



 >In any event, if you were to code an application specific to a particular
 >CPU model (i.e., a 3090J), aren't you then locking yourself in to that model?

Nope. You're simply optimizing for that model.

Let's face it: if you're not out to save cycles, why the heck are you
coding it in assembler?

It's been (in the past) pretty safe to optimize for a machine; most
optimizations carry over fairly well.

There've been a few surprises, of course, like packed decimal on the CMOS
machines. Then again, if you're using decimal it's probably because it was
specced out with decimal, and you can't really change that anyhow.

 >Again, I thought the whole premise of S/360 was CPU independence.

You mean model-independence?

The only thing I can recall was the guarantee that your code today would
run on tomorrow's machine.



Fri, 17 Sep 1999 03:00:00 GMT  
 Instruction Timings

Quote:

> Bob wrote...
> > But certain coding practices do. For example, boundary alignment can make
> > a big difference in instruction execution. Granted, the boundaries have
> > gotten even larger (full word, double word, cache frame, etc), but it
> > still makes a difference.

> I'm confused.  Wouldn't the practices you mention (i.e., boundary alignment)
> apply to all machines?  Are boundaries different on different CPUs (I
> would think not.)

One of the key issues turns out to be alignment within a cache segment.
If you can keep a loop from crossing a cache boundary, you will avoid a
performance penalty.  These cache segments would be larger on the more
powerful machines.  This was not something one would ordinarily worry
about.
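Here is a minimal Python sketch of that check. It assumes a loop occupies one contiguous byte range and that cache lines are aligned on their own size. The line sizes used below (64 and 256 bytes) and the loop placement are hypothetical examples, not the figures for any particular model. The point is that the same loop can cross a boundary on one machine and not on another.

```python
def crosses_line(start: int, length: int, line_size: int) -> bool:
    """True if the bytes [start, start + length) span more than one cache line."""
    if length <= 0 or line_size <= 0:
        raise ValueError("length and line_size must be positive")
    return start // line_size != (start + length - 1) // line_size


def align_up(address: int, boundary: int) -> int:
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


loop_start, loop_length = 0x30, 40  # hypothetical 40-byte loop at offset X'30'
for line in (64, 256):
    print(f"{line}-byte lines: crosses = {crosses_line(loop_start, loop_length, line)}")
print(f"aligned start for 64-byte lines: {align_up(loop_start, 64):#x}")
```

Moving the loop to the next 64-byte boundary keeps it inside one line on both hypothetical machines, at the cost of a few bytes of padding.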

Quote:

> Indeed, I thought a lot of those environmental considerations can vary at
> run time on any given machine, depending on what else is running and what
> resources are available.

That makes getting instruction timings very interesting.  If you want
precise results, you either control the environment (run stand-alone) or
perform enough trials to get statistical significance.
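For the "enough trials" route, here is a Python sketch that summarizes trial times with the minimum, the median, and an approximate 95% confidence interval for the mean. The interval uses the normal approximation (1.96 standard errors). That is an assumption, and it holds reasonably only with a fair number of trials. The function name and the sample values are hypothetical.

```python
import math
import statistics


def summarize_trials(samples):
    """Summarize elapsed times from repeated trials (needs at least two samples)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(len(samples))
    half_width = 1.96 * std_error  # normal approximation to a 95% interval
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": mean,
        "ci95": (mean - half_width, mean + half_width),
    }


trials_ns = [1210, 1185, 1190, 1402, 1188, 1195, 1187, 1650, 1192, 1189]  # hypothetical
print(summarize_trials(trials_ns))
```

A mean pulled well above the median, as the two outliers do in these made-up samples, suggests other work on the machine is interfering. In that case the minimum or the median is the steadier figure to report.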

Quote:

> In any event, if you were to code an application specific to a particular
> CPU model (i.e., a 3090J), aren't you then locking yourself in to that model?
> Again, I thought the whole premise of S/360 was CPU independence.

Yes and no.  The code will still run when you implement a more powerful
machine, and if the hardware speeds things up enough, there will often
be no point in re-optimizing.

Optimization is a bad thing to do; it makes the code harder to
understand and maintain, it is very expensive in terms of programmer
time, and the result sometimes applies only to specific hardware as you
note.  You only do optimization when there is an economic justification
that overrides these penalties.



Fri, 17 Sep 1999 03:00:00 GMT  
 