seeing the bits in a byte... 
 seeing the bits in a byte...

How do I access the bits in a byte (unsigned char)?

As far as I can see, this is only possible using the Boolean operators and
the *toa(source_numeric_value, *dest_string, radix) functions (where radix
is the conversion base).
Is there any other way?

thanks,

Henrique Seganfredo

--



Sat, 16 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

   How do I access the bits in a byte (unsigned char)?

   As far as I can see, this is only possible using the Boolean operators and
   the *toa(source_numeric_value, *dest_string, radix) functions (where radix
   is the conversion base).
   Is there any other way?

There are no *toa functions in C, and the operators most often used
for Boolean operations, !, ||, and &&, aren't too useful for accessing
bits.  I suggest that you take a look at the bitwise operators &, |,
and ^, as well as << and >>.  They're the easiest ways to deal with
bits.
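
For example, here is a minimal sketch that uses only >> and & to print
the bits of a byte, most significant bit first (it assumes 8-bit chars
purely for the fixed loop bound):

#include <stdio.h>

/* Print the bits of a byte, most significant bit first. */
static void print_bits(unsigned char c)
{
    int n;

    for (n = 7; n >= 0; n--)
        putchar(((c >> n) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    print_bits(0x40);   /* prints 01000000 */
    return 0;
}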
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:

> How do I access the bits in a byte (unsigned char)?

> As far as I can see, this is only possible using the Boolean operators and
> the *toa(source_numeric_value, *dest_string, radix) functions (where radix
> is the conversion base).
> Is there any other way?

Say you want the n'th bit from byte.

{
char byte;
short n, bit;
...
byte = (char) something();

/* now you want to do a bitwise AND of byte with n^2
   which gives 0 if n'th bit was 0 and n^2 if it was 1 */
/* then do the right shift to get either 0 or 1 */

bit = ((byte & (1 << n)) >> n);

/* bit is now the value of the n'th bit of byte
   (counting from n=0 to n=7, if a char has 8 bits) */
...

}

My question to clcm: How does the endianness of the machine you're running
this on affect things?
When you run this on a byte with (bin): 01000000 and n=1, does it give
a different result on big-endian machines than on little-endian machines? Or
does the compiler take care of this?

Melle
--
I read it on Usenet - So it must be True
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...


Quote:
>How do I access the bits in a byte (unsigned char)?

>As far as I can see, this is only possible using the Boolean operators and
>the *toa(source_numeric_value, *dest_string, radix) functions (where radix
>is the conversion base).
>Is there any other way?

I am not clear on what you are trying to do.  If you want to access the
bits as some kind of flags, bit-fields often achieve what you want.
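
For example, a minimal sketch of flags as bit-fields (the field names
here are invented):

#include <stdio.h>

/* One-bit flags packed into a struct; the bit order and any
   padding are chosen by the compiler. */
struct flags {
    unsigned int ready  : 1;
    unsigned int error  : 1;
    unsigned int urgent : 1;
};

int main(void)
{
    struct flags f = {0, 0, 0};

    f.error = 1;                    /* set a flag   */
    if (f.error)                    /* test a flag  */
        printf("error flag is set\n");
    f.error = 0;                    /* clear it     */
    return 0;
}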

Francis Glassborow      Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:


> > How do I access the bits in a byte (unsigned char)?

> > As far as I can see, this is only possible using the Boolean operators and
> > the *toa(source_numeric_value, *dest_string, radix) functions (where radix
> > is the conversion base).
> > Is there any other way?

> Say you want the n'th bit from byte.

> {
> char byte;
> short n, bit;
> ...
> byte = (char) something();

> /* now you want to do a bitwise AND of byte with n^2
>    which gives 0 if n'th bit was 0 and n^2 if it was 1 */
> /* then do the right shift to get either 0 or 1 */

> bit = ((byte & (1 << n)) >> n);

> /* bit is now the value of the n'th bit of byte
>    (counting from n=0 to n=7, if a char has 8 bits) */
> ...
> }

> My question to clcm: How does the endianness of the machine you're running
> this on affect things?

It doesn't (not even for int, where sizeof(int)>1).  You're doing
integer arithmetic mod <mumble>.

Quote:
> When you run this on a byte with (bin): 01000000 and n=1, does it give
> a different result on big-endian machines than on little-endian machines? Or
> does the compiler take care of this?

Better still, Mathematics takes care of it for you.
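
As a quick sketch of why: the expression works on values, not on
storage, so this assertion holds on any conforming implementation:

#include <assert.h>

int main(void)
{
    /* pure value arithmetic: 0x40 has "bit 6" set, everywhere */
    assert(((0x40 & (1 << 6)) >> 6) == 1);
    return 0;
}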

--

Compugen Ltd.          |Tel: +972-2-6795059 (Jerusalem) \  NEW IMPROVED URL!
72 Pinhas Rosen St.    |Tel: +972-3-7658520 (Main office)`--------------------
Tel-Aviv 69512, ISRAEL |Fax: +972-3-7658555  http://3w.compugen.co.il/~ariels
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:

>My question to clcm: How does the endianness of the machine you're running
>this on affect things?

"Endianness" is a matter of perception, as it were.  (More on this
in a moment.)

Quote:
>When you run this on a byte with (bin): 01000000 ...

(1 << 1) is always just 2, and (1 << 6) is always just 0x40.

The "byte with value 01000000_base_2" (or 0x40) always has binary
value 0x40.

Thus, if "c" is 0x40, and "n" is 6, then (c & (1 << n)) is 0x40.
If you call that "bit 6", then 0x40 has bit 6 set (and no other
bits).

This holds even for larger values: the value 0x1000 is always
just 0x1000; to get a "1 << n" to mask with it, you need n to
be equal to log2(0x1000) or 12.  (1 << 12) is 0x1000, always.

If what you mean is: "I want to build up a byte from a bit stream,
where I look first at a bit that is 0, then a bit that is 1, then
a bit that is 0, then ...", only *then* does "endianness" come into
play:  you must decide whether the "1" (which you inspected second)
is "second-most-significant" or "second-least".  This is because
you have chosen your own "building block" (a single bit) and decided
to assemble a sequence of those blocks into a new value.

If you choose to write a value out as a sequence of bits, one at
a time, you again go back to having to choose whether to write
"least significant" or "most significant" first.  This is because
you have taken an existing basic building block -- some value, in
some C type -- and are trying to break it down into smaller values.

Thus, "endianness" is something that "appears between the cracks"
whenever you go to take a sequence of "building blocks" (values,
of some size in bits) and build something bigger (almost like
building a house with bricks and mortar) or break something down
(like taking the complete house and removing one brick at a time).
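
To make this concrete, here is a sketch of the assemble-a-byte-from-a-
bit-stream case; next_bit() is a made-up stand-in for whatever supplies
the bits, and treating the first bit as most significant is precisely
the kind of choice described above:

/* Assemble an 8-bit byte from a stream of single bits.  The
   first bit delivered becomes the MOST significant -- a choice
   this code imposes, not something C decides for you. */
unsigned char gather_byte(int (*next_bit)(void))
{
    unsigned char c = 0;
    int i;

    for (i = 0; i < 8; i++)
        c = (unsigned char)((c << 1) | (next_bit() & 1));
    return c;
}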

C always[%] relies on the underlying machine to interpret individual
bits -- the "basic building block" (or "brick size", if you will)
in C is the "char", which C also calls a "byte".  (To confuse
matters slightly, the C "char/byte" could be more than 8 bits --
but usually it is 8.)  That means that if you write data to a file
as a sequence of "char"s, and move that file from one machine to
another -- say, by copying the file on a floppy -- it is up to the
two machines to interpret the bits in those "char"s in the same
way.  That way when you write binary value 0x33 on one machine,
and read it on the other, you get binary value 0x33 again.

If your C-bytes are 8 bits, and your floppy also uses 8-bit bytes,
no one ever has to "look at" the bits one at a time, so endianness
never crops up there.  A basic-building-block with value 0x33 is
always just a "brick #0x33".  As long as all your machines use
"interchangeable bricks" (e.g., all 8-bit bytes), the internal
structure of those bricks themselves stays irrelevant.

On the other hand, suppose you take a sequence of 32-bit values,
break each of those up into four 8-bit bytes, write those bytes on
a floppy, move the floppy to another machine, and then go to
re-assemble those four 8-bit bytes into a new 32-bit value.  In
order to get the *same* value (say, 0x12345678), you had better
reassemble those four bytes in the same order.  This is endianness
cropping up:  You took a "32-bit brick" and broke it into four,
and now you want to glue the four back together, so you had best
do it the same way.  If you rely on the fact that the two computers
(say, an Intel and a SPARC) happen to be able to do the "break int
into 4-chars" and "treat 4-chars as an int" in hardware, you will
also rely implicitly on the *order* that the hardware uses for
breaking-up-and-assembling.

If you do the breaking-and-assembling "manually" -- say, by doing:

        putc((val >> 24) & 0xff, fp);
        putc((val >> 16) & 0xff, fp);
        putc((val >>  8) & 0xff, fp);
        putc((val      ) & 0xff, fp);
        /* check for error */

        ...

        val3 = getc(fp);
        val2 = getc(fp);
        val1 = getc(fp);
        val0 = getc(fp);
        /* check for EOF and error */
        val = ((unsigned long)val3 << 24) | (val2 << 16) | (val1 << 8) | val0;

you eliminate the dependence on the hardware's order.  You have
imposed your own "endianness" on your data format instead -- here,
"big endian", because you putc() the most significant 8-bit-brick
first, and when you glue them back together, you stash that one in
the most significant position.

[% The exception for this "always" lies in bit-fields in C "struct"s.
   These are not individually addressable, but you can always
   inspect memory using "unsigned char" and figure out how your C
   compiler decided to split-up and glue-together the individual
   bits.  At the same time, though, unless the system has some
   "outside pressure" that encourages a specific bit-endian-ness
   -- such as instructions that operate on bits, or an ABI that
   says "when bytes are broken into bits, the bits shall be numbered
   this way" -- it is not at all unusual to have two different C
   compilers use two different bit orders, even on the same machine.
   In this case, then, the "endianness of bits" -- which shows up
   only once you decide to break them up semi-manually using C's
   bitfields -- is determined by the C compiler.  This makes sense:
   the agent doing the "brick splitting" and "brick gluing" always
   decides which piece to use first, and which piece to use last.]
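
Here is a sketch of that kind of bit-field inspection (the field names
and widths are arbitrary):

#include <stdio.h>
#include <string.h>

struct s {
    unsigned int lo : 4;
    unsigned int hi : 4;
};

int main(void)
{
    struct s x;
    unsigned char bytes[sizeof x];

    memset(&x, 0, sizeof x);
    x.lo = 0x1;
    x.hi = 0x2;
    memcpy(bytes, &x, sizeof x);
    /* Whether this prints 21 or 12 -- or something else
       entirely -- is up to the compiler, as noted above. */
    printf("first byte: %02x\n", bytes[0]);
    return 0;
}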
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc


--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...
Henrique Seganfredo wrote in message ...

Quote:
>how access the bits in a byte (unsigned char)?

A byte is an 8-bit variable. If you want to read the status of one bit,
just mask it out and do an equality test; the result will be 0 or 1.

unsigned char o = 0xAA;

/*   bit number:      76543210
     0xAA in binary:  10101010  */

If you want to test bit 5, you need a mask with bit 5 set:

#define bit5 (1<<5)

The operation of testing is a simple & (bitwise AND):

   int val5 = (o & bit5) == bit5;

In this case, val5 will be set to 1.

If you are used to working with bit masks, here is my "bits.h" header:

*************** begin *************
/* bits.h */

#ifndef BITS_H
#define BITS_H

 #define  bit31 0x80000000L
 #define  bit30 0x40000000L
 #define  bit29 0x20000000L
 #define  bit28 0x10000000L
 #define  bit27 0x08000000L
 #define  bit26 0x04000000L
 #define  bit25 0x02000000L
 #define  bit24 0x01000000L
 #define  bit23 0x00800000L
 #define  bit22 0x00400000L
 #define  bit21 0x00200000L
 #define  bit20 0x00100000L
 #define  bit19 0x00080000L
 #define  bit18 0x00040000L
 #define  bit17 0x00020000L
 #define  bit16 0x00010000L

 #define  bit15 0x8000
 #define  bit14 0x4000
 #define  bit13 0x2000
 #define  bit12 0x1000
 #define  bit11 0x0800
 #define  bit10 0x0400
 #define  bit9  0x0200
 #define  bit8  0x0100
 #define  bit7  0x0080
 #define  bit6  0x0040
 #define  bit5  0x0020
 #define  bit4  0x0010
 #define  bit3  0x0008
 #define  bit2  0x0004
 #define  bit1  0x0002
 #define  bit0  0x0001

#endif /* BITS_H */
*************** end *************

It is *very* handy.
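
A more compact alternative is a single parameterized macro (the name
BIT is my own choice, not anything standard):

/* bits.h -- parameterized alternative */
#ifndef BITS_H
#define BITS_H

#define BIT(n)  (1UL << (n))    /* mask with only bit n set */

#endif /* BITS_H */

Then (o & BIT(5)) tests bit 5, exactly like the bit5 macro above.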

--
HS

--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...
On Tue, 31 Aug 1999 20:11:43 GMT, "Henrique Seganfredo"

Quote:
> How do I access the bits in a byte (unsigned char)?

> As far as I can see, this is only possible using the Boolean operators and
> the *toa(source_numeric_value, *dest_string, radix) functions (where radix
> is the conversion base).
> Is there any other way?

> thanks,

> Henrique Seganfredo

Example at http://home.att.net/~jackklein/ctips01.html#binary_out.

Jack Klein
--
Home: http://home.att.net/~jackklein
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...


Quote:
> How do I access the bits in a byte (unsigned char)?

> As far as I can see, this is only possible using the Boolean operators and
> the *toa(source_numeric_value, *dest_string, radix) functions (where radix
> is the conversion base).
> Is there any other way?

  What do you mean by accessing bits? Clearing or setting a specific bit
and checking whether a specific bit is set? For this you have the bitwise
operators (&, | and ^).
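
For instance, the usual idioms, sketched here on an unsigned char x
with a bit number n from 0 to 7:

x |= (1 << n);           /* set bit n    */
x &= ~(1 << n);          /* clear bit n  */
x ^= (1 << n);           /* toggle bit n */
if (x & (1 << n))        /* test bit n   */
    ;  /* bit n was set */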

--
        Regards,
                Alex Krol
Disclaimer: I'm not speaking for Scitex
Corporation Ltd

Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
--



Sun, 17 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:

> Henrique Seganfredo wrote in message ...
> >How do I access the bits in a byte (unsigned char)?

> A byte is an 8-bit variable.

Please stop this.  If you don't know by now that this is wrong, then you
have a power plant to melt down.  (A C byte is CHAR_BIT bits: at least 8,
but possibly more.)  Give my regards to Lisa.

--

__________________________________________________________
Fight spam now!
Get your free anti-spam service: http://www.brightmail.com
--



Sat, 23 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:


> >My question to clcm: How does the endian of the machine you're running
> >this on affect things?

> "Endianness" is a matter of perception, as it were.  (More on this
> in a moment.)

> >When you run this on a byte with (bin): 01000000 ...

> (1 << 1) is always just 2, and (1 << 6) is always just 0x40.

> The "byte with value 01000000_base_2" (or 0x40) always has binary
> value 0x40.

> Thus, if "c" is 0x40, and "n" is 6, then (c & (1 << n)) is 0x40.
> If you call that "bit 6", then 0x40 has bit 6 set (and no other
> bits).

> This holds even for larger values: the value 0x1000 is always
> just 0x1000; to get a "1 << n" to mask with it, you need n to
> be equal to log2(0x1000) or 12.  (1 << 12) is 0x1000, always.

This was what I was not sure about in the first place.
My mistake was thinking that left and right, as used in left or right
shift, are the same as the 'normal' use of left and right in language
when writing numbers (the more significant the digit, the further to
the left), and assuming that big-endianness meant the opposite.
So the compiler treats the most significant bit as always being on the
left in this case (just from a 'human point of view')?
<SNIP>

Thanks for your excellent explanation.

Melle Gerikowski

BTW: in my example code, "n^2" should have read: "2^n" and the "char
byte;" should have been an unsigned char (it was late).
--
This message was sent via Usenet
Usenet - Learn what you know. Share what you don't.
--



Sat, 23 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...
This would be an ideal place to mention the "hton_" and "ntoh_"
functions.  They were invented to allow machines with different
conventions for ordering bytes to communicate multi-byte values.

The idea was that we would define a "network" order for bytes,
and every system transmitting multi-byte integers should send
them in the "network" order, and receive data in "network" order
and translate it to "host" order (host and network are the h and
n).  

There are translations for shorts and longs, with "s" and "l"
suffixes (and, yes, I suppose these really presuppose a short
is two bytes and a long is four).  So, "a = htons(b)" is a
function that takes a short in host order to a short in
network order.  Similarly, "x = ntohl(y)" takes a network
long "y" and converts it to host order.  The htonX functions
are inverses of the corresponding ntohX functions.  These may
be identity functions (with perhaps no code generated) for
machines with the same internal order as "network" order.  So,
to show a "long" as a sequence of bytes in a way that comes out
the same on different machines, do:

#include <stdio.h>
#include <arpa/inet.h>  /* htonl(); the header location varies by system */

void
print_long( long v )
{
  /* unsigned char, so the bytes don't sign-extend when printed */
  union { long l; unsigned char a[4]; } u;

  u.l = htonl( v );
  printf( "%02x %02x %02x %02x\n",
      u.a[0], u.a[1], u.a[2], u.a[3] );
}

I suggest that using this convention is better than doing the
conversions yourself, since every C compiler vendor can make the
translation more efficient (by being machine-specific) than you can
in general code, and there _is_ an established convention about what
network order is that you buy into by using these functions.
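
Going the other way, here is a matching sketch (again presupposing a
four-byte long):

#include <string.h>     /* memcpy() */
#include <arpa/inet.h>  /* ntohl(); header location varies by system */

/* Rebuild a long from four bytes that arrived in network order. */
long
read_long( const unsigned char a[4] )
{
  long l;

  memcpy( &l, a, 4 );
  return ntohl( l );
}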

-Scott David Daniels

Quote:



> >My question to clcm: How does the endianness of the machine
> >you're running this on affect things?

> "Endianness" is a matter of perception, as it were.  (More on this
> in a moment.)
> ... plenty of more than reasonable stuff ...

> C always[%] relies on the underlying machine to interpret individual
> bits -- the "basic building block" (or "brick size", if you will)
> in C is the "char", which C also calls a "byte".  (To confuse
> matters slightly, the C "char/byte" could be more than 8 bits --
> but usually it is 8.)  That means that if you write data to a file
> as a sequence of "char"s, and move that file from one machine to
> another -- say, by copying the file on a floppy -- it is up to the
> two machines to interpret the bits in those "char"s in the same
> way.  That way when you write binary value 0x33 on one machine,
> and read it on the other, you get binary value 0x33 again.
> ...
> On the other hand, suppose you take a sequence of 32-bit values,
> break each of those up into four 8-bit bytes, write those bytes on
> a floppy, move the floppy to another machine, and then go to
> re-assemble those four 8-bit bytes into a new 32-bit value.  In
> order to get the *same* value (say, 0x12345678), you had better
> reassemble those four bytes in the same order.  This is endianness
> cropping up:  You took a "32-bit brick" and broke it into four,
> and now you want to glue the four back together, so you had best
> do it the same way.  If you rely on the fact that the two computers
> (say, an Intel and a SPARC) happen to be able to do the "break int
> into 4-chars" and "treat 4-chars as an int" in hardware, you will
> also rely implicitly on the *order* that the hardware uses for
> breaking-up-and-assembling.

> If you do the breaking-and-assembling "manually" -- say, by doing:

>         putc((val >> 24) & 0xff, fp);
>         putc((val >> 16) & 0xff, fp);
>         putc((val >>  8) & 0xff, fp);
>         putc((val      ) & 0xff, fp);
>         /* check for error */

>         ...

>         val3 = getc(fp);
>         val2 = getc(fp);
>         val1 = getc(fp);
>         val0 = getc(fp);
>         /* check for EOF and error */
>         val = ((unsigned long)val3 << 24) | (val2 << 16) | (val1 << 8) | val0;

> you eliminate the dependence on the hardware's order.  You have
> imposed your own "endianness" on your data format instead -- here,
> "big endian", because you putc() the most significant 8-bit-brick
> first, and when you glue them back together, you stash that one in
> the most significant position.
> ...

--



Sat, 23 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...

Quote:

>My mistake was thinking that left and right, as used in left or right
>shift, are the same as the 'normal' use of left and right in language
>when writing numbers (the more significant the digit, the further to
>the left), and assuming that big-endianness meant the opposite.

Actually, "big endian" is "most significant portion first", and:

Quote:
>So the compiler treats the most significant bit as always being on the
>left in this case (just from a 'human point of view')?

... it is not so much the compilers that use big-endian notation
(although they do) when writing numbers like "1024" and "0x40";
rather, it is the humans who write numbers down on paper, or at a
computer, who do so.  There are a few natural languages (including
the largely obsolete English-language usage preserved in the old
nursery rhyme about "four and twenty blackbirds baked in a pie")
where some numbers are stated "little-endian" -- "four and twenty"
means 24 -- but in general we start out with the most significant
digit, and work down to the least.

In other words, in this case, the compiler is matching our
expectations, so that the constant 24 means "four and twenty", not
"two and forty", and 1024 means what we expect as well.  How "1024"
gets broken up into bytes -- assuming 8-bit bytes, into 0x04 and
0x00 -- is up to the compiler and/or machine, and its own endianness
will remain hidden unless and until we peek at the two separate
bytes.  If you conclude that "the 0x04 was put in first", you
have concluded that the system is using a big-endian notation in
this case; if you conclude that "the 0x00 was put in first", you
have found the system to use little-endian notation.
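
Here is a minimal sketch of that peeking, assuming 8-bit bytes and a
two-byte short purely for the demonstration:

#include <stdio.h>

int main(void)
{
    unsigned short v = 1024;                /* 0x0400 */
    unsigned char *p = (unsigned char *)&v;

    /* A big-endian machine prints 04 00; a little-endian
       machine prints 00 04. */
    printf("%02x %02x\n", p[0], p[1]);
    return 0;
}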

VAX and Intel x86 are classic little-endian architectures; the
680x0 is a classic big-endian architecture.  A lot of modern
microprocessors are "bi-endian", with endianness bits in the CPU
and/or page tables (or equivalent).  The PDP-11, on which C really
took shape, is mostly little-endian, except that a "long" with
value 0x11223344 is not stored in memory as the four unsigned chars
{ 0x44, 0x33, 0x22, 0x11 } (little-endian), nor is it the four byte
sequence { 0x11, 0x22, 0x33, 0x44 } (big-endian).  Rather, 0x11223344
is stored in memory as { 0x22, 0x11, 0x44, 0x33 }.  If you have
the hardware assemble each two-byte group for you, it comes up with
the values { 0x1122, 0x3344 }.  That means that it stores 16-bit
values little-endian when breaking them into 8-bit bytes, and
assembles 8-bit bytes little-endian into 16-bit values, but when
taking 16-bit groups to or from a 32-bit value, it uses big-endian
order!  This is a form of "mixed endianness", and it shows that
you cannot simply divide the world into "big" and "little" endian
and expect to be able to use one byte-swapping routine to
compensate. :-)

(The VAX also uses a peculiar significance-order when working with
D-floating formats, so that any D-float can be treated as an F-float
simply by lopping off its tail.  For various reasons, however, no
one ever seems to find such FP formats odd -- perhaps because FP
formats are already sufficiently complicated to overwhelm other
aesthetic senses. :-) )
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc


--



Sun, 24 Feb 2002 03:00:00 GMT  
 seeing the bits in a byte...
[...]
Quote:
> ... it is not so much the compilers that use big-endian notation
> (although they do) when writing numbers like "1024" and "0x40";
> rather, it is the humans who write numbers down on paper, or at a
> computer, who do so.  There are a few natural languages (including
> the largely obsolete English-language usage preserved in the old
> nursery rhyme about "four and twenty blackbirds baked in a pie")
> where some numbers are stated "little-endian" -- "four and twenty"
> means 24 -- but in general we start out with the most significant
> digit, and work down to the least.

[...]

Here's an obscure historical note on endianness (only vaguely
on-topic).  Our decimal numbering system, referred to as Arabic or
Hindu-Arabic numbers, was inherited by the Europeans from the Arabs,
largely replacing the older and more unwieldy Roman numerals.  The
Arabs wrote numbers like 1024 with the most significant digit on the
left; the Europeans maintained that convention.  The trick is, Arabic
is written right-to-left, whereas European languages are written
left-to-right.  So, the representation magically changed from
little-endian to big-endian, not by swapping the digits, but by
swapping the rest of the written language around them.

Caveat: This is based on a vague recollection of something I read
somewhere or other (probably on the net) an unknown number of years
ago.  I'd be interested if anyone can confirm or deny this.  I'm also
curious how modern Arabic handles this -- or modern Hebrew, for that
matter.

--

San Diego Supercomputer Center           <*>  <http://www.sdsc.edu/~kst>
"Oh my gosh!  You are SO ahead of your time!" -- anon.
--



Mon, 25 Feb 2002 03:00:00 GMT  
 