Quote:

>Perhaps I haven't properly named my function.

>So you have misunderstood the purpose of my code.

>It does *not* rigorously determine whether a number is a perfect square.

>It is just step 1 of such a procedure.

The confusion came from your use of the following line:

Quote:

>>Output: ZF = 0 iff input is not a square

which says "zero flag is set if and only if input is not a square". iff is

much more specific than if :)

Your explanation in your last post:

Quote:

>A function is_perfect_square() is implemented like this:

>Step 1: reject quadratic non-residues mod 4

>Step 2: reject quadratic non-residues modulo small primes or prime powers

> (2^k, 3^2, 5, 7, ...)

>Step 3: extract square root

does indicate that this is only a part of the overall function. What I have

found hard to follow in your code is that some of the comments are not clear.

I'll explain:

Quote:

>int is_probable_square (unsigned long n)

that makes more sense, it shows that the function is not intended to determine

on it's own for all numbers if they are squares (if I'm being picky, it's

because I am considering writing an arbitrary precision calculator, and I need

to be aware of problems like this in algorithms in order to try and avoid them

myself)

Quote:

>{

> unsigned long lsb = n & -n; /* locate least non-zero bit of n */

/* locates VALUE of least non-zero bit, which makes a big difference */

/* to the overall result. If it had been intended to be the position */

/* of the bit (as I read it) then it would have been wrong <G> */

Quote:

> lsb = n & (3*lsb); /* lsb = least 2 bits of n */

> return (lsb & 0x55555555) == lsb;

>}

>So 0, 4, 5, 9, 36 are *probably* squares, 3, 7, 28 are *definitely* non-squares.

yes, that works for me :)

Quote:

>The x86 assembler version (slightly improved):

>Input: EAX

>Output: ZF (zero flag) = 0 if input is a non-square

If ZF = 0 then input is a non square.

The way you put it, every non-square would set the zero flag (that would

matter if someone came back to this code snippet at another time and thought

it worked as commented, and the OP was asking how to implement an algorithm in

x86 as he does not know x86 assembly).

Quote:

>Clobbers: EAX, EDX

> mov edx, eax

> neg eax

> and eax, edx

> lea eax, [2*eax+eax]

> and eax, edx

> mov edx, eax

> and eax, 55555555h

> cmp eax, edx

I would add a couple of things to the initial algorithm (not sure how to

describe them in the same mathematical notation you use to describe them)

If (x mod 8)==1 then x might be a square

if (x mod 4)==0 then x might be a square

this would produce the following changes to your code(in nasm format):

Input: EAX

Output: if Carry flag set then number is not a square

if not set, then it might be a square

Is_possible_square:

mov edx,eax ;

and edx,06h ; mask out bits 1 and 2

or edx,edx ; does edx == 0?

jnz .1 ; if not, jump and set carry

; this is where your code would come in

.0

mov edx,eax

neg eax

and eax,edx

lea eax,[2*eax+eax]

and eax,edx

mov edx,eax

and eax,55555555h

cmp eax,edx

; end of original code

jnz .2

.1

stc

.2

ret

The code I added basically spots all even numbers which are not multiples of 4

and odd numbers which do not take the form 8n+1 (although it doesn't

ditinguish between them all), so it counts out a lot of numbers instantly.

This code does not include the code for removing trailing pairs of zeros (that

would only need 4 more lines of code, with the number of pairs stored in CL)

the first four lines of code take out 5/6 of all numbers, so although it uses

more code overall than your routine on it's own it does help to speed up the

routine by removing 3 clock cycles from 5/6 of calculations (on a pentium) and

only adds 5-7 clock cycles to the remainder (5 if your code doesn't detect a

definite non square, 7 if it does). Hence it would (assuming a worstt case

scenario of your routine detecting all numbers as non squares, which it

obviously wouldn't) take an average of 6.5 cycles per number instead of 8 and

find more non squares.

Of course, this code doesn't allow for the possibility that the original

number might have been a power of 4 (something I would consider coding for).

It made a big difference when I understood what you were doing :)

debs