New Mathematical bit-twiddle - stage on of a isPerfectSquare()
Author Message
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

I have a pseudo-code idea (it maps directly onto Alpha assembly)
If anyone can turn it into optimal x86 code so that I can submit that
too to the GMP guys, I'd be most grateful.

A basic issquare test has 3 stages:
1) shift out pairs of training zeroes, and then check the lowest3 bits
are '001'. If not, bail out. (Bail out ratio 5/6)
2) check if the number is a quadratic residue modulo a handful of
small primes or prime powers. If not, bail out. (Bail out ratio =
(p-1)/2p per prime, so with a dozen primes that's most of them)
3) find the square root, is it exact?

Anyway - I've think I worked out how to do stage 1 in 5 clock ticks on
a superscalar architecture! I'd like some feedback on whether you
think it's a goer or not, please.

In glorious annotated pseudo assembler :
R1 <- input x
-------------
R2 <- Rzero - R1   ; -x (what do you mean you have no Rzero?)
R3 <- 0b1010...010 ; odd numbered bits

R4 <- R2 & R1      ; lowest set bit, the '1'!
R1 >>= 1           ; get the bit above the '1' to line up with it

R2 <- 3*R4         ; turns the '1' into a '11' mask
R3 &= R4           ; is the '1' an odd numbered bit?

; * could bailout here if R3 is non-0
R1 &= R2           ; isolate the two bits above the '1'

R1 |= R3           ; has either gone wrong?
--------------
R1 -> output 0 if potential square
else not

Of course there are 2 bailout conditions (R1 has a set bit or R3 has a
set bit) and you can quit at the * line if you want.

O(1) time, 4 or 5 ticks, even for 64-bit values, where the
'traditional' method could sometimes loop dozens of times!

Phil

Sat, 24 Jan 2004 20:38:52 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

Quote:

> I have a pseudo-code idea (it maps directly onto Alpha assembly)
> If anyone can turn it into optimal x86 code so that I can submit that
> too to the GMP guys, I'd be most grateful.

> A basic issquare test has 3 stages:
> 1) shift out pairs of training zeroes, and then check the lowest3 bits
> are '001'. If not, bail out. (Bail out ratio 5/6)

Why do you check the lowest *three* bits ?
If you shift out pairs of trailing zeroes (divide by a power of 4),
then I guess you want to check the quotient modulo 4,
i.e. look at the least *two* bits.

Let n be the input number.
Let k be the largest power of 4 dividing n.

n = m * 4^k     m > 0, k >= 0
m != 1 mod 4  ==>  n is not a square

In C:

/* assumes unsigned long is 32 bits */

int
is_square (unsigned long n)
{
unsigned long mask = n & -n;    /* mask = least bit of n */

Quote:
}

In x86 assembler:

Input:    EAX
Output:   ZF = 0    iff input is not a square
Clobbers: EAX, ECX, EDX

mov  edx, eax
neg  eax
and  eax, edx
lea  ecx, [2*eax]
and  ecx, edx
or   eax, ecx
mov  edx, eax
and  eax, 55555555h
cmp  eax, edx

---
Joe Leherbauer             Leherbauer at telering dot at

"Somewhere something incredible is waiting to be known."
-- Isaac Asimov

Sat, 24 Jan 2004 22:45:24 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

Quote:

>> I have a pseudo-code idea (it maps directly onto Alpha assembly)
>> If anyone can turn it into optimal x86 code so that I can submit that
>> too to the GMP guys, I'd be most grateful.

<snip>

I couldn't quite follow the original pseudo code, it didn't make sense to me
based on the math that I did at university (not saying it wouldn't work, just
that it's based on an algorithm I never came across and I ahve trouble seeing
how it would work without a reasonable example showing how it would work with
some real numbers).

What I can see is that your code would always return z. I'll show you why with

Quote:
>/* assumes unsigned long is 32 bits */

>int
>is_square (unsigned long n)
>{
>  unsigned long mask = n & -n;    /* mask = least bit of n */

n & -n = 0 (taking -n as the same as neg in assembly)
This is because each bit that is set in n will be unset in -n and vice versa.
Quote:
>  mask |= n & (mask << 1);        /* mask = least 2 bits of n */

mask still equals 0, because you are orring it with 0 (orignal number AND 0)
Quote:

return 0 (0 AND 0x55555555)
Quote:
>}

>In x86 assembler:

>Input:    EAX
>Output:   ZF = 0    iff input is not a square
>Clobbers: EAX, ECX, EDX

>  mov  edx, eax
edx = eax
>  neg  eax
edx = -eax
>  and  eax, edx

eax = 0
you just anded 0s with 1s and 1s with 0s
Quote:
>  lea  ecx, [2*eax]
ecx = 0 (ecx = 2*0)
>  and  ecx, edx
ecx still = 0
>  or   eax, ecx
eax = 0 (0 OR 0)
>  mov  edx, eax
edx = 0
>  and  eax, 55555555h

eax = 0 (0 AND 55555555h)

Quote:
>  cmp  eax, edx
z flag set (0 = 0)

I can't offer a better way to implement the algorithm without taking more time
to try and understand it, becasue it just didn't make sense to me, but I know
your code would always return 0 or z (C or asm versions respectively)

debs

Sun, 25 Jan 2004 07:30:03 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

I just spotted a mistake in my logic when correcting your code, I read NEG as
NOT.

I'm going to have to take a good look at the algorithm tomorrow and check it
out properly, it would be a real fast test for a square number if it works :)

I just tested your code (in a de{*filter*}) with an input of 5, and it gives the
response that 5 is a square number, so there is a flaw in there somwhere.
Unfortunately I am not up to date on my algorithms, so I cannot say where the
error is, but I can see something is wrong. I'd be interested to see when (and
if) you have a working version of the code, because I am interested in
producing a simple arbitrary precision calculator for my OS when I get farther
on in the development, as a way of doing soem real testing, and fast
algorithms would help a lot in producing something that is worth talking about
:)

Quote:

>>In x86 assembler:

>>Input:    EAX
>>Output:   ZF = 0    iff input is not a square
>>Clobbers: EAX, ECX, EDX

>>  mov  edx, eax
>edx = eax
>>  neg  eax
>edx = -eax
>>  and  eax, edx
>eax = 0
>you just anded 0s with 1s and 1s with 0s
>>  lea  ecx, [2*eax]
>ecx = 0 (ecx = 2*0)
>>  and  ecx, edx
>ecx still = 0
>>  or   eax, ecx
>eax = 0 (0 OR 0)
>>  mov  edx, eax
>edx = 0
>>  and  eax, 55555555h
>eax = 0 (0 AND 55555555h)
>>  cmp  eax, edx
>z flag set (0 = 0)

debs

Sun, 25 Jan 2004 08:03:26 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

:n & -n = 0 (taking -n as the same as neg in assembly)
:This is because each bit that is set in n will be unset in -n and vice
:versa.

The above is not correct. You are describing the behavior of the "not"
instruction. The "neg" instruction returns the value of the operand
subtracted from zero. For example, let n = 3h. Then -n = 0fffffffdh, and n
& -n = 1, which is indeed the least significant bit of n.

-- Chuck Crayne
-----------------------------------------------------------

-----------------------------------------------------------

Sun, 25 Jan 2004 08:30:42 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

Quote:

> I just spotted a mistake in my logic when correcting your code, I read NEG as
> NOT.

> I'm going to have to take a good look at the algorithm tomorrow and check it
> out properly, it would be a real fast test for a square number if it works :)

> I just tested your code (in a de{*filter*}) with an input of 5, and it gives the
> response that 5 is a square number, so there is a flaw in there somwhere.

Perhaps I haven't properly named my function.
So you have misunderstood the purpose of my code.
It does *not* rigorously determine whether a number is a perfect square.
It is just step 1 of such a procedure.

A function is_perfect_square() is implemented like this:

Step 1: reject quadratic non-residues mod 4
Step 2: reject quadratic non-residues modulo small primes or prime powers
(2^k, 3^2, 5, 7, ...)
Step 3: extract square root

In order to avoid an expensive square root computation,
steps 1 and 2 *quickly* filter out many non-squares
while letting all squares and a few non-squares through.

Step 1:
Let n = m * 2^(2k) be the input number.
Obviously, 2^(2k) = 4^k is a perfect square.
If m == 0 or 1 mod 4 then m, and therefore also n *might* be a square.
But if m == 2 or 3 mod 4 then m, hence n is a *proven* non-square.

So step 1 can be implemented like this
(assuming 32-bit word size and 2's complement arithmetic):

int is_probable_square (unsigned long n)
{
unsigned long lsb = n & -n;    /* locate least non-zero bit of n */
lsb = n & (3*lsb);             /* lsb = least 2 bits of n */
return (lsb & 0x55555555) == lsb;

Quote:
}

So 0, 4, 5, 9, 36 are *probably* squares, 3, 7, 28 are *definitely* non-squares.

The x86 assembler version (slightly improved):

Input:    EAX
Output:   ZF (zero flag) = 0  if input is a non-square
Clobbers: EAX, EDX

mov  edx, eax
neg  eax
and  eax, edx
lea  eax, [2*eax+eax]
and  eax, edx
mov  edx, eax
and  eax, 55555555h
cmp  eax, edx

---
Joe Leherbauer             Leherbauer at telering dot at

"Somewhere something incredible is waiting to be known."
-- Isaac Asimov

Sun, 25 Jan 2004 17:50:26 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

Quote:
>Perhaps I haven't properly named my function.
>So you have misunderstood the purpose of my code.
>It does *not* rigorously determine whether a number is a perfect square.
>It is just step 1 of such a procedure.

The confusion came from your use of the following line:

Quote:
>>Output:   ZF = 0    iff input is not a square

which says "zero flag is set if and only if input is not a square". iff is
much more specific than if :)

Quote:
>A function is_perfect_square() is implemented like this:

>Step 1: reject quadratic non-residues mod 4
>Step 2: reject quadratic non-residues modulo small primes or prime powers
>        (2^k, 3^2, 5, 7, ...)
>Step 3: extract square root

does indicate that this is only a part of the overall function. What I have
I'll explain:

Quote:

>int is_probable_square (unsigned long n)

that makes more sense, it shows that the function is not intended to determine
on it's own for all numbers if they are squares (if I'm being picky, it's
because I am considering writing an arbitrary precision calculator, and I need
to be aware of problems like this in algorithms in order to try and avoid them
myself)

Quote:
>{
>  unsigned long lsb = n & -n;    /* locate least non-zero bit of n */

/* locates VALUE of least non-zero bit, which makes a big difference */
/* to the overall result. If it had been intended to be the position */
/* of the bit (as I read it) then it would have been wrong <G> */

Quote:
>  lsb = n & (3*lsb);             /* lsb = least 2 bits of n */
>  return (lsb & 0x55555555) == lsb;
>}

>So 0, 4, 5, 9, 36 are *probably* squares, 3, 7, 28 are *definitely* non-squares.

yes, that works for me :)

Quote:
>The x86 assembler version (slightly improved):

>Input:    EAX
>Output:   ZF (zero flag) = 0  if input is a non-square

If ZF = 0 then input is a non square.
The way you put it, every non-square would set the zero flag (that would
matter if someone came back to this code snippet at another time and thought
it worked as commented, and the OP was asking how to implement an algorithm in
x86 as he does not know x86 assembly).

Quote:
>Clobbers: EAX, EDX

>  mov  edx, eax
>  neg  eax
>  and  eax, edx
>  lea  eax, [2*eax+eax]
>  and  eax, edx
>  mov  edx, eax
>  and  eax, 55555555h
>  cmp  eax, edx

I would add a couple of things to the initial algorithm (not sure how to
describe them in the same mathematical notation you use to describe them)

If (x mod 8)==1 then x might be a square
if (x mod 4)==0 then x might be a square

this would produce the following changes to your code(in nasm format):

Input:    EAX
Output:   if Carry flag set then number is not a square
if not set, then it might be a square

Is_possible_square:

mov     edx,eax         ;
and     edx,06h         ; mask out bits 1 and 2
or      edx,edx         ; does edx == 0?
jnz     .1              ; if not, jump and set carry

; this is where your code would come in

.0
mov     edx,eax
neg     eax
and     eax,edx
lea     eax,[2*eax+eax]
and     eax,edx
mov     edx,eax
and     eax,55555555h
cmp     eax,edx

; end of original code

jnz     .2
.1
stc
.2
ret

The code I added basically spots all even numbers which are not multiples of 4
and odd numbers which do not take the form 8n+1 (although it doesn't
ditinguish between them all), so it counts out a lot of numbers instantly.

This code does not include the code for removing trailing pairs of zeros (that
would only need 4 more lines of code, with the number of pairs stored in CL)

the first four lines of code take out 5/6 of all numbers, so although it uses
more code overall than your routine on it's own it does help to speed up the
routine by removing 3 clock cycles from 5/6 of calculations (on a pentium) and
only adds 5-7 clock cycles to the remainder (5 if your code doesn't detect a
definite non square, 7 if it does). Hence it would (assuming a worstt case
scenario of your routine detecting all numbers as non squares, which it
obviously wouldn't) take an average of 6.5 cycles per number instead of 8 and
find more non squares.

Of course, this code doesn't allow for the possibility that the original
number might have been a power of 4 (something I would consider coding for).

It made a big difference when I understood what you were doing :)

debs

Mon, 26 Jan 2004 02:35:44 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()

Quote:

> The confusion came from your use of the following line:

> >>Output:   ZF = 0    iff input is not a square

> which says "zero flag is set if and only if input is not a square". iff is
> much more specific than if :)

You are right.  Sorry.

If you are interested, here is a C version of is_probable_square()
which checks quadratic-residuosity modulo small primes and prime powers.
It uses tables of quadratic residues which are encoded in bit vectors.

Example:
The quadratic residues mod 5 are 0, 1, 4.
These are encoded in the number 2^0 + 2^1 + 2^4 = 19 = 0x13.

#define pow2(n) (1 << (n))

int is_probable_square (unsigned long N)
{
if ( (pow2(N % 32) & 0x2030213) == 0 )   /* mod 2^5 */
return 0;
if ( (pow2(N % 31) & 0x121D47B7) == 0 )
return 0;
if ( (pow2(N % 29) & 0x13D122F3) == 0 )
return 0;
if ( (pow2(N % 23) & 0x5335F) == 0 )
return 0;
if ( (pow2(N % 19) & 0x30AF3) == 0 )
return 0;
if ( (pow2(N % 17) & 0x1A317) == 0 )
return 0;
if ( (pow2(N % 13) & 0x161B) == 0 )
return 0;
if ( (pow2(N % 11) & 0x23B) == 0 )
return 0;
if ( (pow2(N % 7) & 0x017) == 0 )
return 0;
if ( (pow2(N % 9) & 0x93) == 0 )
return 0;
if ( (pow2(N % 5) & 0x13) == 0 )
return 0;

return 1;

Quote:
}

---
Joe Leherbauer             Leherbauer at telering dot at

"Somewhere something incredible is waiting to be known."
-- Isaac Asimov

Mon, 26 Jan 2004 16:22:21 GMT
New Mathematical bit-twiddle - stage on of a isPerfectSquare()
??????

--
[1;32m Origin:  [33m  [37mcs.npttc.edu.tw   [31m From:  [36m163.24.247.1 [m

Mon, 26 Jan 2004 21:38:49 GMT

 Page 1 of 1 [ 9 post ]

Relevant Pages