Test and Set (TS) vs Compare and Swap (CS) 
Author Message
 Test and Set (TS) vs Compare and Swap (CS)

Disclaimer: I'm an absolute novice at assembler, thus I'm
asking this question, regarding some inherited code.

Consider the following:

LOOP TS FLAG
BNZ LOOP
...
...
...
FLAG DC '0'

My understanding of the TS instruction is that it tests the high-order
bit, and then sets the entire byte to 1's. Doesn't that mean that on
the second interation of the loop it's always going to pass? If so,
shouldn't the code really be written like this?

LOOP TS FLAG
BZ WORK
NI FLAG,'80'
B LOOP
MORE blah blah blah

But I'm not convinced this works. What if another task actually has the
flag?  Is this a situation where I should be using the Compare and Swap
(CS) instruction?

Thanks

Carl



Sat, 29 Nov 2003 01:33:35 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

> Disclaimer: I'm an absolute novice at assembler, thus I'm
> asking this question, regarding some inherited code.

> Consider the following:

> LOOP TS FLAG
> BNZ LOOP
> ...
> ...
> ...
> FLAG DC '0'

> My understanding of the TS instruction is that it tests the high-order
> bit, and then sets the entire byte to 1's. Doesn't that mean that on
> the second interation of the loop it's always going to pass? If so,
> shouldn't the code really be written like this?

> LOOP TS FLAG
> BZ WORK
> NI FLAG,'80'
> B LOOP
> MORE blah blah blah

> But I'm not convinced this works. What if another task actually has the
> flag?  Is this a situation where I should be using the Compare and Swap
> (CS) instruction?

> Thanks

> Carl

TS was defined for multiprocessor work ... the thread/processor that
actually sets the flag is the only one that clears the flag (when it
is done).  The byte can be used as a "lock" for thread/processor
serialization.

other threads/processors that don't set the flag .... can either do TS
"spin-loop" on the flag ... attempting to catch it when it has been
"cleared" (by the thread/processor that actually holds/set the
flag/lock) ... or go off into some fancier serialization code
(possibly some form of wait ... or some combination; spin-loop a
maximum number of times before going off into more complex
serialization).

TS was defined in the 60s and used on the 360 model 65 and 360 model
67 multiprocessors.

Charlie Salisbury's work on fine-grain locking resulted in the Compare
& Swap instruction (aka his initials are CAS ... which was the
original mnemonic for compare & swap).

random ref:
http://www.garlic.com/~lynn/2001e.html#73

the task given by the "owners" of POP was to come up for a programming
paradigm for CAS where it was useful in a single processor environment
(not just multiprocessor) ... which gave rise to the programming notes
about multi-threaded serialization that works on both single and multi
processor configurations.

CAS can be used in a manner similar to TS (i.e. for setting/obtaining
a lock) ... however CAS can also be used for various operations
involving atomic storage update avoiding having to perform
serialization via a separate locking operation.

--



Sat, 29 Nov 2003 02:49:54 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

>Disclaimer: I'm an absolute novice at assembler, thus I'm
>asking this question, regarding some inherited code.
>Consider the following:
>LOOP TS FLAG
>BNZ LOOP
>...
>...
>...
>FLAG DC '0'
>My understanding of the TS instruction is that it tests the high-order
>bit, and then sets the entire byte to 1's. Doesn't that mean that on
>the second interation of the loop it's always going to pass? If so,
>shouldn't the code really be written like this?
>LOOP TS FLAG
>BZ WORK
>NI FLAG,'80'
>B LOOP
>MORE blah blah blah
>But I'm not convinced this works. What if another task actually has the
>flag?  Is this a situation where I should be using the Compare and Swap
>(CS) instruction?

Typical code would be

LOOP    TS  FLAG
        BNZ LOOP
*** Do protected computation
        MVI FLAG,0

The idea is that the first process to hit the code flows through.
Second and latter processes spin on the loop.  When the first process
has finished the protected computation, the MVI resets the flag, and
that allows one of the other processes to enter the critical region.

Yes CS is more flexible, and can often be used more efficiently.  But
to be maximally effective, it requires a different coding approach
from the above.



Sat, 29 Nov 2003 08:46:29 GMT  
 Test and Set (TS) vs Compare and Swap (CS)
There used to be a problem with CS in that if two lock words in the same
cache line were hit by two different engines, you'd get incorrect results.
The second lock word wouldn't be caught as the line was destaged (written
from cache back to memory) by the first engine. The immediate solution was
to separate the lock words by an amount sufficiently large enough so they
would fall in different cache lines...



Quote:

> Yes CS is more flexible, and can often be used more efficiently.  But
> to be maximally effective, it requires a different coding approach
> from the above.



Sat, 29 Nov 2003 21:50:34 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

>TS was defined in the 60s and used on the 360 model 65 and 360 model
>67 multiprocessors.

TS was part of the "Standard Instruction Set" that was implemented on
all 360 models. Although it was essential in multiprocessors, it was
sometimes useful even on uniprocessors. I used it on a uniprocessor in
DOS/360 as a cross-partition lock, since DOS/360 did not have OS/360's
ENQ/DEQ.

- Jim Saum



Sun, 30 Nov 2003 00:18:03 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:
> The idea is that the first process to hit the code flows through.
> Second and latter processes spin on the loop.  When the first process
> has finished the protected computation, the MVI resets the flag, and
> that allows one of the other processes to enter the critical region.

> Yes CS is more flexible, and can often be used more efficiently.  But
> to be maximally effective, it requires a different coding approach
> from the above.

a typical CS operation loads the current value of the storage location
and the new value of the storage location (which can be something as
simple as incrementing the current value) and then does an atomic
replace, iff the current value hasn't been replaced. The original idea
for CS was that storage locations requiring simple atomic update could
be done w/o implementing a blocking lock bracketing the operation (aka
incrementing/decrementing counters, management of control blocks on
LIFO, push/pop chains, etc).

to implement a CS locking scheme, adopt the convention that the
unlocked "state" is zero ..  so the code first zeros a register
... and to be realy fancy load the "thread id" of the current process
(assuming never equal to zero; tcb address or whatever) into the
replacement register and then execute compare & swap (doing a atomic
replace, iff the storage location value is currently zero). This
is logically equivalent to TS (although in TS the storage location
"unlocked state" is implicitly zero, while it has to be explicitly
specified in a CS convention).

This CS approach not only "acquires" the lock if nobody currently
"owns" the lock ... but acquires ownership using the current
thread/process identifier. The CS will fail if the storage location
isn't already zero and the code can spin (as possible in the TS case)
... or do whatever else it deems necessary. When the thread/process
that "owns" the lock is complete ... it zeros the storage location.
There are various kinds of debugging that can be done ... for instance
it can double check that it actually still "owns" the lock in question
by checking its current thread/process id with the value stored in the
lock.

--



Thu, 04 Dec 2003 07:00:58 GMT  
 Test and Set (TS) vs Compare and Swap (CS)
One critical point to mention, in this is that ALL update access to this
field MUST be done via CS, ie when the owner of the lock is finished, it
must load 0 into the replacement register, its "thread id" (or whatever was
stored into the lockword) then use CS to reset the lockword back to zero.



Quote:

> > The idea is that the first process to hit the code flows through.
> > Second and latter processes spin on the loop.  When the first process
> > has finished the protected computation, the MVI resets the flag, and
> > that allows one of the other processes to enter the critical region.

> > Yes CS is more flexible, and can often be used more efficiently.  But
> > to be maximally effective, it requires a different coding approach
> > from the above.

> a typical CS operation loads the current value of the storage location
> and the new value of the storage location (which can be something as
> simple as incrementing the current value) and then does an atomic
> replace, iff the current value hasn't been replaced. The original idea
> for CS was that storage locations requiring simple atomic update could
> be done w/o implementing a blocking lock bracketing the operation (aka
> incrementing/decrementing counters, management of control blocks on
> LIFO, push/pop chains, etc).

> to implement a CS locking scheme, adopt the convention that the
> unlocked "state" is zero ..  so the code first zeros a register
> ... and to be realy fancy load the "thread id" of the current process
> (assuming never equal to zero; tcb address or whatever) into the
> replacement register and then execute compare & swap (doing a atomic
> replace, iff the storage location value is currently zero). This
> is logically equivalent to TS (although in TS the storage location
> "unlocked state" is implicitly zero, while it has to be explicitly
> specified in a CS convention).

> This CS approach not only "acquires" the lock if nobody currently
> "owns" the lock ... but acquires ownership using the current
> thread/process identifier. The CS will fail if the storage location
> isn't already zero and the code can spin (as possible in the TS case)
> ... or do whatever else it deems necessary. When the thread/process
> that "owns" the lock is complete ... it zeros the storage location.
> There are various kinds of debugging that can be done ... for instance
> it can double check that it actually still "owns" the lock in question
> by checking its current thread/process id with the value stored in the
> lock.

> --




Tue, 09 Dec 2003 12:13:12 GMT  
 Test and Set (TS) vs Compare and Swap (CS)
On Fri, 22 Jun 2001 04:13:12 GMT, "Wayne Driscoll"

Quote:

>One critical point to mention, in this is that ALL update access to this
>field MUST be done via CS, ie when the owner of the lock is finished, it
>must load 0 into the replacement register, its "thread id" (or whatever was
>stored into the lockword) then use CS to reset the lockword back to zero.

And the reason for this is...?  The proposed scheme was a spin, and
since no one mentioned recovery let's assume that only the holder of
the lock ever releases (zeroes) it.  Why must that be done with CS?
----------
Bill Manry - IBM Products Division - Oracle Corp. USA
These are my opinions, not Oracle's.


Wed, 10 Dec 2003 11:48:26 GMT  
 Test and Set (TS) vs Compare and Swap (CS)
[ posted and emailed ]



Quote:
> And the reason for this is...?  The proposed scheme was a spin, and
> since no one mentioned recovery let's assume that only the holder of
> the lock ever releases (zeroes) it.  Why must that be done with CS?

In theory, if setting the value back to "cleared" isn't atomic, the waiting
process could detect the semaphore's availability and proceed to acquire it,
and have the original process re-clear it as part of the non-atomic clearing
process.

For example, suppose the semaphore word were the last word of a page, and
the clearing process was clearing it with an MVC that was 5 bytes long, also
clearing the first byte of the next page. A page fault would interrupt the
MVC, but since the MVC is not specified as atomic, the zeroed value might be
available long enough for the waiting process to believe it had acquired the
semaphore. When the second page becomes available, the MVC would re-run, and
re-clear that semaphore, leaving the critical code unprotected.

I'm not sure if the S/390 architecture would protect against this particular
scenario, but in theory, the clearing needs to be done in an atomic fashion.
On S/390, relatively few memory-update instructions are guaranteed to be
atomic (though many are, in practice), so using CS is safest.

--



Wed, 10 Dec 2003 17:27:34 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

>[ posted and emailed ]


>> And the reason for this is...?  The proposed scheme was a spin, and
>> since no one mentioned recovery let's assume that only the holder of
>> the lock ever releases (zeroes) it.  Why must that be done with CS?
>In theory, if setting the value back to "cleared" isn't atomic, the waiting
>process could detect the semaphore's availability and proceed to acquire it,
>and have the original process re-clear it as part of the non-atomic clearing
>process.

As long as it is cleared in a block-coherent operation, there should
be no problem.  A store of a word from a register should do fine.

Quote:
>For example, suppose the semaphore word were the last word of a page, and
>the clearing process was clearing it with an MVC that was 5 bytes long, also
>clearing the first byte of the next page.

I expect that MVC updates physical memory in blocks that are multiples
of the bandwidth of the memory bus.


Wed, 10 Dec 2003 23:21:29 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

> In theory, if setting the value back to "cleared" isn't atomic, the waiting
> process could detect the semaphore's availability and proceed to acquire it,
> and have the original process re-clear it as part of the non-atomic clearing
> process.

> For example, suppose the semaphore word were the last word of a page, and
> the clearing process was clearing it with an MVC that was 5 bytes long, also
> clearing the first byte of the next page. A page fault would interrupt the
> MVC, but since the MVC is not specified as atomic, the zeroed value might be
> available long enough for the waiting process to believe it had acquired the
> semaphore. When the second page becomes available, the MVC would re-run, and
> re-clear that semaphore, leaving the critical code unprotected.

> I'm not sure if the S/390 architecture would protect against this particular
> scenario, but in theory, the clearing needs to be done in an atomic fashion.
> On S/390, relatively few memory-update instructions are guaranteed to be
> atomic (though many are, in practice), so using CS is safest.

two possible conventions for CS use

1) updating counters, pointers, etc. ... which can only be done
by atomic operations

2) locking conventions that are typically zero/non-zero

The issue here is can a non-atomic instruction be used to clear a
"lock" to zero?

The issue would be a spinning atomic instruction see the results of a
storage update by a non-atomic instruction, believe the lock to be
"free" (because it has been set to zero), change it to non-zero, and
then have the non-atomic instruction wipe out the recently set
non-zero value.

Note that MVC is a not a partially executable instruction (like MVCL,
at least in all the 360 & 370 hardware manuals that I dealt with))
... it is supposedly has to either be able to completely execute or
not execute (i.e. if the ending address would cause a page fault then
that has to be pretested before starting the instruction).

Now, normal MVC shouldn't be able to cause the problem referenced
(i.e.  it couldn't set the word to zero, have an atomic instruction
spinning on "zero" change the value to non-zero, and then the MVC
rerun and wipe out the value set by the atomic CS instruction) ...
the general instruction retry facility didn't preclude such events (at
least in 370 hardware manuals that I dealt with) ... aka because of
various fault conditions leading to instruction retry ... any of the
non-atomic instructions could result in a storage location being
changed multiple times (and other processor operations wouldn't
necessarily be locked out).

So, at least at the time of the introduction of CS on 370
... non-atomic instructions were defined to not result in the effect
described ... but (at least) instruction retry of non-atomic
instructions could result in the condition.

Now, 370/125 when initially shipped had a bug that was just the
opposite (to the mvc crossing page boundary). The MVCL microcode did
the ending address pretest (like done for a MVC) and didn't initiate
the instruction at all if the ending-address check failed (i.e. the
long instructions were suppose to incrementally execute a byte at a
time as well as be interruptable). The 370/125 MVCL microcode had to
be retrofitted in the field to correctly correspond to the definition
of MVCL.

--



Thu, 11 Dec 2003 00:40:22 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:
> So, at least at the time of the introduction of CS on 370
> ... non-atomic instructions were defined to not result in the effect
> described ... but (at least) instruction retry of non-atomic
> instructions could result in the condition.

modulo the scenerio like MVCL where most of the non-zero value was
already zero except for some leading bytes/bits ... the leading bits
became zero (via something like MVCL), which satisified the CS
requirements, the value replaced, and then the instruction proceeded
to zero the remaining bits/bytes (wipeing out portion of the value set
by CS).

To the extent the CS convention had at least a 1) non-zero bit in the
low-order bit and 2) checked for zero it shouldn't be a problem
(except under the instruction retry scenerio). It could be a problem
in my original scenerio of using an address (rather than having specific
non-zero bit pattern) ... which could randomly have any amount of zero
bits in the lower order byte positions.

--



Thu, 11 Dec 2003 00:52:50 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:

>On Fri, 22 Jun 2001 04:13:12 GMT, "Wayne Driscoll"

>>One critical point to mention, in this is that ALL update access to this
>>field MUST be done via CS, ie when the owner of the lock is finished, it
>>must load 0 into the replacement register, its "thread id" (or whatever was
>>stored into the lockword) then use CS to reset the lockword back to zero.
>And the reason for this is...?  The proposed scheme was a spin, and
>since no one mentioned recovery let's assume that only the holder of
>the lock ever releases (zeroes) it.  Why must that be done with CS?

The POP seems to say that this is required for multiple CPU access.
If you are just doing spin locks, TS should work.  It might be that CS
is faster than TS, though.  

Another thing about CS that has been said in one of these newsgroups,
is that you must use the new value that CS loads when the compare fails.
Because of the way that the cache works, it is possible that the wrong
value will be loaded by a subsequent load instruction.  This may also
cause problems clearing without using CS.

-- glen



Fri, 12 Dec 2003 06:52:08 GMT  
 Test and Set (TS) vs Compare and Swap (CS)

Quote:
>Note that MVC is a not a partially executable instruction (like MVCL,
>at least in all the 360 & 370 hardware manuals that I dealt with))

Just a note of triva, the MVC instruction would have been interruptable
on 360/44 machines that had the optional feature (I've forgotten what it
was called) that supported variable length instructions.

This isn't relevant to the current context (CS vs. TS), of course, but
it is an oddity of the S/360 line of computers.

--
No electrons were injured in the preparation of this message.





Fri, 12 Dec 2003 10:32:02 GMT  
 Test and Set (TS) vs Compare and Swap (CS)
In the proposed design only the holder of a held lock can update it,
so the only benefit of using CS to clear it is that CS includes the
so-called checkpoint serialization function, the need for which is
doubtful in this situation.  (If desired, it can be done separately as
a 'BCR 15,0' instruction.)  S/390 operand consistency rules mean
either a ST or a 4-byte MVC of zeroes is sufficiently atomic for this
purpose.  Note that the lockword must be aligned because it is the
target of a CS when the lock is being acquired.

Besides, if you use CS to clear it, there is the nagging little
problem of whether (and what) to code for the following conditional
branch. ;-)

I'm not suggesting that the proposed spin lockword is a good design,
by the way.  Just commenting on the suggestion that CS was required
for clearing the lock...it isn't.

/b

On Sat, 23 Jun 2001 16:40:22 GMT, Anne & Lynn Wheeler

Quote:


>> In theory, if setting the value back to "cleared" isn't atomic, the waiting
>> process could detect the semaphore's availability and proceed to acquire it,
>> and have the original process re-clear it as part of the non-atomic clearing
>> process.

>> For example, suppose the semaphore word were the last word of a page, and
>> the clearing process was clearing it with an MVC that was 5 bytes long, also
>> clearing the first byte of the next page. A page fault would interrupt the
>> MVC, but since the MVC is not specified as atomic, the zeroed value might be
>> available long enough for the waiting process to believe it had acquired the
>> semaphore. When the second page becomes available, the MVC would re-run, and
>> re-clear that semaphore, leaving the critical code unprotected.

>> I'm not sure if the S/390 architecture would protect against this particular
>> scenario, but in theory, the clearing needs to be done in an atomic fashion.
>> On S/390, relatively few memory-update instructions are guaranteed to be
>> atomic (though many are, in practice), so using CS is safest.

>two possible conventions for CS use

>1) updating counters, pointers, etc. ... which can only be done
>by atomic operations

>2) locking conventions that are typically zero/non-zero

>The issue here is can a non-atomic instruction be used to clear a
>"lock" to zero?

>The issue would be a spinning atomic instruction see the results of a
>storage update by a non-atomic instruction, believe the lock to be
>"free" (because it has been set to zero), change it to non-zero, and
>then have the non-atomic instruction wipe out the recently set
>non-zero value.

>Note that MVC is a not a partially executable instruction (like MVCL,
>at least in all the 360 & 370 hardware manuals that I dealt with))
>... it is supposedly has to either be able to completely execute or
>not execute (i.e. if the ending address would cause a page fault then
>that has to be pretested before starting the instruction).

>Now, normal MVC shouldn't be able to cause the problem referenced
>(i.e.  it couldn't set the word to zero, have an atomic instruction
>spinning on "zero" change the value to non-zero, and then the MVC
>rerun and wipe out the value set by the atomic CS instruction) ...
>the general instruction retry facility didn't preclude such events (at
>least in 370 hardware manuals that I dealt with) ... aka because of
>various fault conditions leading to instruction retry ... any of the
>non-atomic instructions could result in a storage location being
>changed multiple times (and other processor operations wouldn't
>necessarily be locked out).

>So, at least at the time of the introduction of CS on 370
>... non-atomic instructions were defined to not result in the effect
>described ... but (at least) instruction retry of non-atomic
>instructions could result in the condition.

>Now, 370/125 when initially shipped had a bug that was just the
>opposite (to the mvc crossing page boundary). The MVCL microcode did
>the ending address pretest (like done for a MVC) and didn't initiate
>the instruction at all if the ending-address check failed (i.e. the
>long instructions were suppose to incrementally execute a byte at a
>time as well as be interruptable). The 370/125 MVCL microcode had to
>be retrofitted in the field to correctly correspond to the definition
>of MVCL.

>--


/b
--
Bill Manry - Oracle Corporation USA
These are my opinions, not necessarily Oracle's.


Fri, 12 Dec 2003 23:07:51 GMT  
 
 [ 37 post ]  Go to page: [1] [2] [3]

 Relevant Pages 

1. TEST-AND-SET (or COMPARE-AND-SWAP)

2. Strange use of compare-and-swap

3. Strange use of compare-and-swap

4. Manually setting CR0.TS bit?

5. good name for GET-CURRENT SWAP SET-CURRENT

6. Ts 3.1 src in Ts 1.17

7. Swapping values [XOR][SWAP]

8. Will setting $test change $_REQUEST['test']

9. Refactored Plus All Missing CS ANSI Tests In SIF

10. Refactored Plus All Missing CS ANSI Tests In SIF

11. to CS: or not to CS: in F-PC assembler

12. AKCL vs Fortran vs ML time test

 

 
Powered by phpBB® Forum Software