Crazy bittwiddle algorithms needed!
Author 
Message 
Phil Carmod #1 / 28

Crazy bittwiddle algorithms needed!
I'm trying to throw lots of bits into lots of buckets. It is more of an 'algorithmic' brainstorm I'm after rather than actual code. Say I have a 130 bit bitstring (in memory) with arbitrary bits set. I wish to then count the number of bits set in each of the N cyclic bit posistions. e.g. if N=17, I want Sum[0] = bit[0]+bit[17]+bit[34]+bit[51]+...+bit[119] And of course I want all 17 sums. I want to do this for a whole range of N, but generally N less than the word size will be most common. The length of the bitstring will be constant for the runtime of the code. I am prepared to rebuild my app for every run, but I'd rather have a datadriven approach. I am prepared to have a separate routine for every N (there will be about 15 values used), but again I'd rather keep my L1 clean, and have a simpler data driven approach. (though having say 3 versions of the code for small/medium/large N would be great). I think that having 17 buckets in the above is wasteful, as each sum can only reach 7. So in theory I could do something like the following (forgive the high level notation, I think better with symbols!) Case N=19: Input[0..4] holds the data Mask3s = 01001001001001001001001001001001b Maskto19 = 01111111111111111111b Bucket0 = 0 Bucket1 = 0 Bucket2 = 0 ; First word Bucket0 = Input[0]&Mask3s Bucket1 = (Input[0]>>1)&Mask3s Bucket2 = (Input[0]>>2)&Mask3s ; Now combine the top bits with the low bits Bucket0 = (Bucket0&Maskto19)+(Bucket0>>19) Bucket1 = (Bucket1&Maskto19)+(Bucket1>>19) Bucket1 = (Bucket2&Maskto19)+(Bucket2>>19) ; second word ; Now what? I can't rotate, as the top bits don't belong below the bottom ones. ; So do I need to treat this as 3 chunks? ;  the top bit of the 2nd 19 bits; the whole 3rd span of 19 bits; the bottom of the 4th span? ; etc... The other thing I thought of was trying to make 19bits the basic quantity, rather than the word width. So I'd step through the words more slowly, but have simpler operations at each stage. The other thing I thought of was the 'noddy' method. Scan each bit. If it's set, put it in the approprate bucket. The _other_ thing I though of was to have the numbers in a list rather than as a bitstring. Then I do a mod on each number, and increment the resulting bucket. I want to avoid DIVs, though, as they'll be slow in the processor family I'll be using. I'm sure there must be a 'sum bits in parallel' method, but the multiword nature increases the number of possibilities. What do people advise? Phil

Sat, 05 Jul 2003 22:14:47 GMT 


Christian Schuber #2 / 28

Crazy bittwiddle algorithms needed!
mh... I think that's fairly easy input: esi = pointer to bitstream ecx = number of bits to accumulate edx = distance between bits (=17 if you want to add every 17th bit) edi = first bit to test output: eax = number of set bits xor ebx, ebx xor eax, eax loop: bt [esi], edi // copies bit no. (edi) of bitstream to carryflag setc bl // sets bl=1 if carry set, otherwise 0 add eax, ebx // accumulate to eax add edi, edx // next bit... dec ecx jnz loop assumes of course, you're working in a 32 bit flat memory model... a 386er processor is at least required for the BT and SETcc commands Christian

Sun, 06 Jul 2003 05:48:14 GMT 


Randall Hyd #3 / 28

Crazy bittwiddle algorithms needed!
I suspect I'd mask the large value with a mask containing ones in the desired bit positions (this mask coming from a lookup table) and then using the generic bit counting algorithm. The table lookup is somewhat expensive, but I suspect it would be faster than other simple alternatives. Randy Hyde
Quote: > I'm trying to throw lots of bits into lots of buckets. It is more of an > 'algorithmic' brainstorm I'm after rather than actual code. > Say I have a 130 bit bitstring (in memory) with arbitrary bits set. I > wish to then count the number of bits set in each of the N cyclic bit > posistions. > e.g. if N=17, I want Sum[0] = > bit[0]+bit[17]+bit[34]+bit[51]+...+bit[119] > And of course I want all 17 sums. > I want to do this for a whole range of N, but generally N less than the > word size will be most common. > The length of the bitstring will be constant for the runtime of the > code. I am prepared to rebuild my app for every run, but I'd rather have > a datadriven approach. > I am prepared to have a separate routine for every N (there will be > about 15 values used), but again I'd rather keep my L1 clean, and have a > simpler data driven approach. (though having say 3 versions of the code > for small/medium/large N would be great). > I think that having 17 buckets in the above is wasteful, as each sum can > only reach 7. So in theory I could do something like the following > (forgive the high level notation, I think better with symbols!) > Case N=19: > Input[0..4] holds the data > Mask3s = 01001001001001001001001001001001b > Maskto19 = 01111111111111111111b > Bucket0 = 0 > Bucket1 = 0 > Bucket2 = 0 > ; First word > Bucket0 = Input[0]&Mask3s > Bucket1 = (Input[0]>>1)&Mask3s > Bucket2 = (Input[0]>>2)&Mask3s > ; Now combine the top bits with the low bits > Bucket0 = (Bucket0&Maskto19)+(Bucket0>>19) > Bucket1 = (Bucket1&Maskto19)+(Bucket1>>19) > Bucket1 = (Bucket2&Maskto19)+(Bucket2>>19) > ; second word > ; Now what? I can't rotate, as the top bits don't belong below the > bottom ones. > ; So do I need to treat this as 3 chunks? > ;  the top bit of the 2nd 19 bits; the whole 3rd span of 19 bits; the > bottom of the 4th span? > ; etc... > The other thing I thought of was trying to make 19bits the basic > quantity, rather than the word width. So I'd step through the words more > slowly, but have simpler operations at each stage. > The other thing I thought of was the 'noddy' method. > Scan each bit. If it's set, put it in the approprate bucket. > The _other_ thing I though of was to have the numbers in a list rather > than as a bitstring. Then I do a mod on each number, and increment the > resulting bucket. I want to avoid DIVs, though, as they'll be slow in > the processor family I'll be using. > I'm sure there must be a 'sum bits in parallel' method, but the > multiword nature increases the number of possibilities. What do people > advise? > Phil

Sun, 06 Jul 2003 05:48:19 GMT 


Richard Pavlice #4 / 28

Crazy bittwiddle algorithms needed!
Quote:
> I'm trying to throw lots of bits into lots of buckets. It is more of an > 'algorithmic' brainstorm I'm after rather than actual code. > Say I have a 130 bit bitstring (in memory) with arbitrary bits set. I > wish to then count the number of bits set in each of the N cyclic bit > posistions. > e.g. if N=17, I want Sum[0] = > bit[0]+bit[17]+bit[34]+bit[51]+...+bit[119] > And of course I want all 17 sums. > I want to do this for a whole range of N, but generally N less than the > word size will be most common. > The length of the bitstring will be constant for the runtime of the > code. I am prepared to rebuild my app for every run, but I'd rather have > a datadriven approach. > I am prepared to have a separate routine for every N (there will be > about 15 values used), but again I'd rather keep my L1 clean, and have a > simpler data driven approach. (though having say 3 versions of the code > for small/medium/large N would be great). > I think that having 17 buckets in the above is wasteful, as each sum can > only reach 7. So in theory I could do something like the following > (forgive the high level notation, I think better with symbols!) > Case N=19: > Input[0..4] holds the data > Mask3s = 01001001001001001001001001001001b > Maskto19 = 01111111111111111111b > Bucket0 = 0 > Bucket1 = 0 > Bucket2 = 0 > ; First word > Bucket0 = Input[0]&Mask3s > Bucket1 = (Input[0]>>1)&Mask3s > Bucket2 = (Input[0]>>2)&Mask3s > ; Now combine the top bits with the low bits > Bucket0 = (Bucket0&Maskto19)+(Bucket0>>19) > Bucket1 = (Bucket1&Maskto19)+(Bucket1>>19) > Bucket1 = (Bucket2&Maskto19)+(Bucket2>>19) > ; second word > ; Now what? I can't rotate, as the top bits don't belong below the > bottom ones. > ; So do I need to treat this as 3 chunks? > ;  the top bit of the 2nd 19 bits; the whole 3rd span of 19 bits; the > bottom of the 4th span? > ; etc... > The other thing I thought of was trying to make 19bits the basic > quantity, rather than the word width. So I'd step through the words more > slowly, but have simpler operations at each stage. > The other thing I thought of was the 'noddy' method. > Scan each bit. If it's set, put it in the approprate bucket. > The _other_ thing I though of was to have the numbers in a list rather > than as a bitstring. Then I do a mod on each number, and increment the > resulting bucket. I want to avoid DIVs, though, as they'll be slow in > the processor family I'll be using. > I'm sure there must be a 'sum bits in parallel' method, but the > multiword nature increases the number of possibilities. What do people > advise?
Assuming you have to total each case (from 0 to N1), it seems the 'noddy' method, as you say, is the only practical way to go. I would set up a routine like this: ;esi = bitstring pointer ;ebx = bit bucket pointer ;ch = number of dwords in bit string ;cl = cycle count (N) mov dl,cl ;cycle count
dec ch ;dword counter
mov dh,32 ;32 bits in dword lodsd ;get dword in eax
shr eax,1 ;lowest bit into carry adc cp[edi],0 ;add to total inc edi ;next bit bucket dec dl ;cycle counter
mov dl,cl ;reset cycle counter mov edi,ebx ;reset bit bucket pointer sub dh,cl ;reduce bit count by N
add dh,cl ;restore deficit
shr eax,1 ;lowest bit into carry adc cp[edi],0 ;add to total inc edi ;next bit bucket dec dh ;dword done yet?
I haven't tested this. Hope it helps.  Richard Pavlicek Web site: http://www.rpbridge.net

Sun, 06 Jul 2003 07:42:47 GMT 


Rick C. Hodgi #5 / 28

Crazy bittwiddle algorithms needed!
How about an algorithm where you take 17 bytes (or whatever the maximum number will be under your current Norder) and shift the bits from your source through the carry into each one in sequence (the first bit into the first byte, the 2nd bit into the 2nd byte, 3rd into the 3rd....17th into the 17th, then 18th into 1st, etc. When you're finished you'll have 17 bytes, each one representing its order sum via itself through a fixed table. Logically, there will be a bit in each location in the byte from the correct locations obtained from the source. And to find the sum do a simple lookup based against a fixed table using itself as the index. The result will tell the number of set bits. If your order falls below a certain threshold then your lookup will have to be increased to accommodate. At that point it might be a trade off for storage space and a simple postprocessing countbit algorithm. :) Your lookup table would be such that: 00000000b = 0 00000001b = 1 00000010b = 1 00000011b = 2 00000100b = 1 00000101b = 2 00000111b = 3 00001000b = 1 .. .. .. 11111100b = 6 11111101b = 7 11111111b = 8 mov esi,offset source_bits mov ecx,total_num_of_bits mov edi,offset output_area mov edx,edi add edx,17 ; Whatever order you're wanting top_loop: mov ebx,8 inner_loop: shl byte ptr [esi] rcl byte ptr [edi] inc edi cmp edi,edx jbe still_going mov edi,edx still_going: dec ebx jz next_part loop inner_loop next_part: inc esi loop top_loop mov edi,edx ; Right now the data at [edi] will represent ; seven{*filter*} values that can be applied to the ; lookup table above to determine their total ; Use an algorithm similar to this to extract the totals mov ecx,17 next_lookup: movzx ebx,byte ptr [edi] movzx eax,byte ptr lookup_table[ebx] ; Right now, eax contains the total of the bits for this element ; Display it, do whatever ... inc edi loop next_lookup  Rick C. Hodgin

Sun, 06 Jul 2003 11:51:34 GMT 


Phil Carmod #6 / 28

Crazy bittwiddle algorithms needed!
Quote:
> mh... I think that's fairly easy > input: esi = pointer to bitstream > ecx = number of bits to accumulate > edx = distance between bits (=17 if you want to add every 17th > bit) > edi = first bit to test > output: eax = number of set bits
Hmmm, I did say: Quote: > > And of course I want all 17 sums.
You've assumes that the bit pitch is < register size. I think I forgot to explicitly say that, but that is the case (I intend to port this to my Alpha, where the pitch will reach 63) Phil

Sun, 06 Jul 2003 21:11:54 GMT 


Christian Schuber #7 / 28

Crazy bittwiddle algorithms needed!
Quote: > Hmmm, I did say: > > > And of course I want all 17 sums.
Mh... apply the algorithm 17 times in a row :) Quote: > You've assumes that the bit pitch is < register size
not at all, should work with any given pitch (<2^32)...

Mon, 07 Jul 2003 02:36:01 GMT 


Terje Mathise #8 / 28

Crazy bittwiddle algorithms needed!
Quote:
> > mh... I think that's fairly easy > > input: esi = pointer to bitstream > > ecx = number of bits to accumulate > > edx = distance between bits (=17 if you want to add every 17th > > bit) > > edi = first bit to test > > output: eax = number of set bits > Hmmm, I did say: > > > And of course I want all 17 sums. > You've assumes that the bit pitch is < register size. I think I forgot > to explicitly say that, but that is the case (I intend to port this to > my Alpha, where the pitch will reach 63)
I've had problems with my newsfeed, so I didn't see the original request, but I'll assume I understood everything correctly from the context: This is an interesting problem, I think you could do all sums in parallel with bitslice arithmetic: First split the input into blocks of N slices of the bit pitch length (N = 2^n1), then add all together using binary full adders: I.e. starting with 15*17 = 255 bits, you can compress this down to 4 17bit registers. A full adder takes three input bits (a, b and carry_in) and generates two result bits (sum and carry_out). The logic is simple: t = a ^ b; sum = t ^ carry_in; carry_out = (a & b)  (t & carry_in); Doing this you can convert three input slices to two, seven slices to three and 15 slices to 4. If the pitch is less or equal to half the register width, then we'll simply repeat it as many times as will fit, and do the final accumulation at the end. This process could be repeated until it covered the full input array, but it is probably much faster to keep the number of temporary variables at something that will fit in the selected processor. On x86 I'd use 15 slices which will fit inside the 8 64bit MMX regs, on an Alpha you could probably get away with up to 2024 accumulator regs, corresponding to 1 to 16 Mbit arrays, but it would be better to make the input size divisible by the register size, to simplify the alignment code. Terje 
Using selfdiscipline, see http://www.eiffel.com/discipline "almost all programming can be viewed as an exercise in caching"

Mon, 07 Jul 2003 02:36:06 GMT 


Phil Carmod #9 / 28

Crazy bittwiddle algorithms needed!
Quote:
> I've had problems with my newsfeed, so I didn't see the original > request, but I'll assume I understood everything correctly from the > context: > This is an interesting problem, I think you could do all sums in > parallel with bitslice arithmetic: > First split the input into blocks of N slices of the bit pitch length (N > = 2^n1), then add all together using binary full adders: > I.e. starting with 15*17 = 255 bits, you can compress this down to 4 > 17bit registers.
I'm glad you chose that example  the most likely problems I will want to tackle are summing up to 192 bits into between 17 and 59 buckets (all pitches are prime numbers), so we're in a similar ballpark. If I understand correctly, then you're accumulating the counts 'perpendicularly' across several registers. This certainly classifies as a 'crazy bittwiddle', in my book, congratulations! I have prior knowledge that the bitstring will be nowhere near full, so the maximum of 15 in your example will never be attained, and 7 is more than enough. That decreases the register requirement by 1. I do need to do some comparison arithmetic on the sums too, I need to classify them into 3 cases: sum<=1; sum=2, sum>=3. Assuming the LSBs are in register R0, then sum>=3 <=> R2  (R1&R0) sum==2 <=> R1 & ~(R1R2) sum<=1 <=> ~(R1R2) That's fairly neat too. Quote: > A full adder takes three input bits (a, b and carry_in) and generates > two result bits (sum and carry_out). > The logic is simple: > t = a ^ b; > sum = t ^ carry_in; > carry_out = (a & b)  (t & carry_in);
This is 3 => 2. OK, I have no problem with that. Quote: > Doing this you can convert three input slices to two, seven slices to > three and 15 slices to 4.
I don't see directly how to go from 7 to 3. I can grok 9=>6=>4 as two 3=>2s, but a fast 7 to 3 evades me (It's probably a standard technique). The 7=>3 truth table doesn't appear to be as simple as the 3=>2 one, so I'm floundering. Quote: > If the pitch is less or equal to half the register width, then we'll > simply repeat it as many times as will fit, and do the final > accumulation at the end.
Yes, this is a double win as you do fewer 'sums' and each one accumulated fewer bits, Quote: > This process could be repeated until it covered the full input array, > but it is probably much faster to keep the number of temporary variables > at something that will fit in the selected processor.
With an O(n!) algorithm (I know, but maths is sometimes like that, at least it's NP) across spans that seem to be O(n.log(p)^2) I'll never get a chance to tackle spans larger than 500 bits (n=30, and I'm currently looking at n=20 cases, with spans of 200 bits). Quote: > On x86 I'd use 15 slices which will fit inside the 8 64bit MMX regs, on > an Alpha you could probably get away with up to 2024 accumulator regs, > corresponding to 1 to 16 Mbit arrays, but it would be better to make the > input size divisible by the register size, to simplify the alignment > code.
There's no register problem on the Alpha certainly (and I rarely code in assembly on that, I write C and verify that the output is what I would have written anyway!). I've never programmed MMX code, is it supported in recent versions of g++/gas (I'm on Linux)? Phil

Tue, 08 Jul 2003 02:37:26 GMT 


Terje Mathise #10 / 28

Crazy bittwiddle algorithms needed!
Quote:
> > I've had problems with my newsfeed, so I didn't see the original > > request, but I'll assume I understood everything correctly from the > > context: > > This is an interesting problem, I think you could do all sums in > > parallel with bitslice arithmetic: > > First split the input into blocks of N slices of the bit pitch length (N > > = 2^n1), then add all together using binary full adders: > > I.e. starting with 15*17 = 255 bits, you can compress this down to 4 > > 17bit registers. > I'm glad you chose that example  the most likely problems I will want > to tackle are summing > up to 192 bits into between 17 and 59 buckets (all pitches are prime > numbers), so we're in a similar ballpark. > If I understand correctly, then you're accumulating the counts > 'perpendicularly' across several registers.
Yes. Quote: > This certainly classifies as a 'crazy bittwiddle', in my book, > congratulations!
:) Quote: > I have prior knowledge that the bitstring will be nowhere near full, so > the maximum of 15 in your example will never be attained, and 7 is more > than enough. That decreases the register requirement by 1.
This helps a lot! Quote: > I do need to do some comparison arithmetic on the sums too, I need to > classify them into 3 cases: > sum<=1; sum=2, sum>=3. > Assuming the LSBs are in register R0, then > sum>=3 <=> R2  (R1&R0) > sum==2 <=> R1 & ~(R1R2) > sum<=1 <=> ~(R1R2) > That's fairly neat too. > > A full adder takes three input bits (a, b and carry_in) and generates > > two result bits (sum and carry_out). > > The logic is simple: > > t = a ^ b; > > sum = t ^ carry_in; > > carry_out = (a & b)  (t & carry_in); > This is 3 => 2. OK, I have no problem with that. > > Doing this you can convert three input slices to two, seven slices to > > three and 15 slices to 4. > I don't see directly how to go from 7 to 3. > I can grok 9=>6=>4 as two 3=>2s, but a fast 7 to 3 evades me (It's > probably a standard technique). > The 7=>3 truth table doesn't appear to be as simple as the 3=>2 one, so > I'm floundering.
You don't generate a direct 7>3 table, instead you do multiple full adds: Assume we have a macro with three inputs and 2 outputs: (carry_out, sum) = full_add(a, b, carry_in) The inputs are named as (a, b, c, d, e, f, g), so the code becomes: (s1, s0) = full_add(a, b, c); (t1, t0) = full_add(d, e, f); (u1, sum0) = full_add(s0, t0, g); (sum2, sum1) = full_add(s1, t1, u1); I.e. you need 4 full adds to convert 7 bits to 3, and you'll need 11 to go from 15 bits to 4. The pattern is pretty obvious: You need one full_add for each bit compression. With a maximum count of 7, and no need to classify more than (>= 3), you could simply do a few full_adds, and then switch to a different approach, where you work with carry_save numbers, propagating each new input block in parallel, using half adders: carry_out = a & b; sum = a ^ b; This requires 4 accumulator regs: sum0, sum1, sum2 and carry1: The code to propagate an input array looks like this: sum2 = sum1 & carry1; // Saturating adds! sum1 ^= carry1; carry1 = sum0 & input; sum0 ^= input; Quote: > > On x86 I'd use 15 slices which will fit inside the 8 64bit MMX regs, on > > an Alpha you could probably get away with up to 2024 accumulator regs, > > corresponding to 1 to 16 Mbit arrays, but it would be better to make the > > input size divisible by the register size, to simplify the alignment > > code. > There's no register problem on the Alpha certainly (and I rarely code in > assembly on that, I write C and verify that the output is what I would > have written anyway!). I've never programmed MMX code, is it supported > in recent versions of g++/gas (I'm on Linux)?
NASM should handle it, that's all you need! Terje 
Using selfdiscipline, see http://www.eiffel.com/discipline "almost all programming can be viewed as an exercise in caching"

Tue, 08 Jul 2003 08:55:29 GMT 


Rick C. Hodgi #11 / 28

Crazy bittwiddle algorithms needed!
Quote: >> > First split the input into blocks of N slices of the bit pitch length (N >> > = 2^n1), then add all together using binary full adders: >> > I.e. starting with 15*17 = 255 bits, you can compress this down to 4 >> > 17bit registers.
Why add? If you take the bits as they come in and build a sequence of bit patterns (buckets he calls them) then you can use those bit patterns against a lookup table and you'll have your summations. The example I posted the other day does this and it works for up to 8bit patterns.  Rick C. Hodgin

Tue, 08 Jul 2003 12:52:27 GMT 


Terje Mathise #12 / 28

Crazy bittwiddle algorithms needed!
Quote:
> >> > First split the input into blocks of N slices of the bit pitch length (N > >> > = 2^n1), then add all together using binary full adders: > >> > I.e. starting with 15*17 = 255 bits, you can compress this down to 4 > >> > 17bit registers. > Why add? If you take the bits as they come in and build a sequence of > bit patterns (buckets he calls them) then you can use those bit > patterns against a lookup table and you'll have your summations. The > example I posted the other day does this and it works for up to 8bit > patterns.
Rick, as I stated I never got the initial post, so I do not have the complete background. There are obviously many different ways to handle this problem, depending upon the actual input data. With a guaranteed max of 7 for each bit position, it might simply be most efficient to have 3 accumulator regs, and then split each input into 3 blocks: sum0 += input & 0o11111111; // Octal constant! sum1 += input & 0o02222222; sum2 += input & 0o04444444; This can handle up to 30 bits in each input block, to get all the way to 32 would require a shift operation on the the two last lines. Since all bit pitches would be primes, 32 is impossible, so a single shift is enough to handle the case of 31: sum0 += input & 0o11111111; sum1 += input & 0o02222222; sum2 += (input>>2) & 0o11111111; This results in 9 basic twooperand opcodes for each new input block. Using my half adders from yesterday: sum2 = sum1 & carry1; // Saturating adds! sum1 ^= carry1; carry1 = sum0 & input; sum0 ^= input; I get just 7 ops for each new input block, while at the same time automatically handling any overflows in the individual counts. Terje 
Using selfdiscipline, see http://www.eiffel.com/discipline "almost all programming can be viewed as an exercise in caching"

Tue, 08 Jul 2003 22:13:25 GMT 


Rick C. Hodgi #13 / 28

Crazy bittwiddle algorithms needed!
Quote: >With a guaranteed max of 7 for each bit position, it might simply be >most efficient to have 3 accumulator regs, and then split each input >into 3 blocks: > sum0 += input & 0o11111111; // Octal constant! > sum1 += input & 0o02222222; > sum2 += input & 0o04444444;
I either don't understand your solution (more likely) or I don't understand the original problem. The original poster said he had a fixed number of bits in a source string and he needed to sum every Nth bit, such as each 17th bit being equal to string[0] + string[17] + string[37].... + string[whatever the particular iteration would still fit within the maximum length of the original string]. How does your solution do that? My solution is to cycle through the source string in an iterative manner building a series of output bytes/words/dwords (depending on maximum output size) containing the Nth order bit patterns as they originally appeared in the source string. Once that series is built each element can be applied to a simple 256byte lookup table which would provide you the sums. One lookup would be required for 8bit or less quantities, 2 lookups and one add would be required for 16bit to 9bit quantities, 3 lookups and two adds would be required for 24bit to 17bit operations, etc. In the alternative larger lookup table could be provided to reduce the number of adds.  Rick C. Hodgin

Wed, 09 Jul 2003 01:37:29 GMT 


Terje Mathise #14 / 28

Crazy bittwiddle algorithms needed!
Quote:
> >With a guaranteed max of 7 for each bit position, it might simply be > >most efficient to have 3 accumulator regs, and then split each input > >into 3 blocks: > > sum0 += input & 0o11111111; // Octal constant! > > sum1 += input & 0o02222222; > > sum2 += input & 0o04444444; > I either don't understand your solution (more likely) or I don't > understand the original problem. > The original poster said he had a fixed number of bits in a source > string and he needed to sum every Nth bit, such as each 17th bit being > equal to string[0] + string[17] + string[37].... + string[whatever the > particular iteration would still fit within the maximum length of the > original string]. > How does your solution do that? > My solution is to cycle through the source string in an iterative > manner building a series of output bytes/words/dwords (depending on > maximum output size) containing the Nth order bit patterns as they > originally appeared in the source string. Once that series is built > each element can be applied to a simple 256byte lookup table which > would provide you the sums. One lookup would be required for 8bit or > less quantities, 2 lookups and one add would be required for 16bit to > 9bit quantities, 3 lookups and two adds would be required for 24bit > to 17bit operations, etc. In the alternative larger lookup table > could be provided to reduce the number of adds.
OK, I understand! You are basically compressing the bit string, extracting only the wanted bits, and then counting the number of set bits in the result. Right? My code is quite different, I calculate all 17 sums for a pitch 17 calculation in parallel. I.e. first extract each 17bit slice, align into the bottom part of a register, and then accumulate each bit position simultaneously: input = get17bits(); sum2 = sum1 & carry1; sum1 ^= carry1; carry1 = sum0 & input; sum0 ^= input; If you want to avoid the explicit storage of carry, then the code gets some more serial latency: input = get17bits(); carry = sum0 & input; sum0 ^= input; sum2 = sum1 & carry; sum1 ^= carry; Terje 
Using selfdiscipline, see http://www.eiffel.com/discipline "almost all programming can be viewed as an exercise in caching"

Wed, 09 Jul 2003 05:17:02 GMT 


Rick C. Hodgi #15 / 28

Crazy bittwiddle algorithms needed!
Quote: >string[0] + string[17] + string[37]....
That should be [34], not [37]. Typo. Same finger, different hand. :)  Rick C. Hodgin

Wed, 09 Jul 2003 05:17:07 GMT 


Page 1 of 2

[ 28 post ] 

Go to page:
[1]
[2] 
