seeing the bits in a byte...
Henrique Seganfredo #1 / 23
 seeing the bits in a byte...
How do I access the bits in a byte (unsigned char)? As far as I can see, this is only possible using the Boolean operators and the *toa(source_numeric_value, *dest_string, radix) functions (where radix is the conversion base). Is there any other way?

thanks,
Henrique Seganfredo
--
Sat, 16 Feb 2002 03:00:00 GMT
Ben Pfaff #2 / 23
 seeing the bits in a byte...
Quote:
> How do I access the bits in a byte (unsigned char)?
> As far as I can see, this is only possible using the Boolean operators
> and the *toa(source_numeric_value, *dest_string, radix) functions
> (where radix is the conversion base).
> Is there any other way?

There are no *toa functions in C, and the operators most often used for Boolean operations, !, ||, and &&, aren't too useful for accessing bits. I suggest that you take a look at the bitwise operators &, |, and ^, as well as << and >>. They're the easiest way to deal with bits.
--
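[For illustration, here is a minimal compilable sketch of what Ben describes: reading, setting, clearing, and toggling single bits with the bitwise operators. The variable names and sample values are this example's own, not anything from the thread.]

    #include <stdio.h>

    int main(void)
    {
        unsigned char value = 0x40;        /* binary 01000000 */
        int n = 6;
        int bit = (value >> n) & 1;        /* read bit n: shift it down, then mask */

        printf("bit %d of 0x%02X is %d\n", n, value, bit);

        value |= 1u << 3;                  /* set bit 3:    01001000 */
        value &= ~(1u << 6);               /* clear bit 6:  00001000 */
        value ^= 1u << 0;                  /* toggle bit 0: 00001001 */
        printf("value is now 0x%02X\n", value);
        return 0;
    }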
Sun, 17 Feb 2002 03:00:00 GMT
Melle #3 / 23
 seeing the bits in a byte...
Quote:
> How do I access the bits in a byte (unsigned char)?
> As far as I can see, this is only possible using the Boolean operators
> and the *toa(source_numeric_value, *dest_string, radix) functions
> (where radix is the conversion base).
> Is there any other way?
Say you want the n'th bit from byte.

    {
        char byte;
        short n, bit;
        ...
        byte = (char) something();
        /* now you want to do a bitwise AND of byte with n^2,
           which gives 0 if the n'th bit was 0 and n^2 if it was 1 */
        /* then do the right shift to get either 0 or 1 */
        bit = ((byte & (1 << n)) >> n);
        /* bit is now the value of the n'th bit of byte
           (counting from n=0 to n=7, if a char has 8 bits) */
        ...
    }
My question to clcm: How does the endianness of the machine you're running this on affect things? When you run this on a byte with (bin) 01000000 and n=1, does it give a different result on big-endian machines than on little-endian machines? Or does the compiler take care of this?

Melle
--
I read it on Usenet - So it must be True
--
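[For reference, a compilable version of the snippet above, already incorporating the two corrections Melle posts later in this thread: "n^2" should read "2^n", and the byte should be an unsigned char. The something() function is a stand-in invented for this sketch.]

    #include <stdio.h>

    /* stand-in for whatever produces the byte in the original snippet */
    static unsigned char something(void)
    {
        return 0x40;   /* binary 01000000 */
    }

    int main(void)
    {
        unsigned char byte = something();
        int n;

        for (n = 0; n < 8; n++) {
            /* AND with 2^n (that is, 1 << n), then shift right to get 0 or 1 */
            int bit = (byte & (1 << n)) >> n;
            printf("bit %d = %d\n", n, bit);
        }
        return 0;
    }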
Sun, 17 Feb 2002 03:00:00 GMT
Francis Glassborow #4 / 23
 seeing the bits in a byte...
Quote:
> How do I access the bits in a byte (unsigned char)?
> As far as I can see, this is only possible using the Boolean operators
> and the *toa(source_numeric_value, *dest_string, radix) functions
> (where radix is the conversion base).
> Is there any other way?
I am not clear what you are trying to do. If you want to access the bits as some kind of flags, bit fields often achieve what you want.

Francis Glassborow
Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA
+44(0)1865 246490
All opinions are mine and do not represent those of any organisation
--
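[For readers who have not used them, a small sketch of the kind of flag bit fields Francis means. The struct and member names are invented for this example; note that the ordering and packing of bit fields within a storage unit is implementation-defined.]

    #include <stdio.h>

    /* illustrative flag layout; do not rely on the exact bit positions */
    struct flags {
        unsigned int ready   : 1;
        unsigned int error   : 1;
        unsigned int channel : 3;   /* a 3-bit field holds 0..7 */
    };

    int main(void)
    {
        struct flags f = {0};

        f.ready = 1;
        f.channel = 5;
        printf("ready=%u error=%u channel=%u\n",
               (unsigned)f.ready, (unsigned)f.error, (unsigned)f.channel);
        return 0;
    }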
Sun, 17 Feb 2002 03:00:00 GMT
Ariel Scolnicov #5 / 23
 seeing the bits in a byte...
Quote:
> > How do I access the bits in a byte (unsigned char)?
> > As far as I can see, this is only possible using the Boolean operators
> > and the *toa(source_numeric_value, *dest_string, radix) functions
> > (where radix is the conversion base).
> > Is there any other way?
>
> Say you want the n'th bit from byte.
>
>     {
>         char byte;
>         short n, bit;
>         ...
>         byte = (char) something();
>         /* now you want to do a bitwise AND of byte with n^2,
>            which gives 0 if the n'th bit was 0 and n^2 if it was 1 */
>         /* then do the right shift to get either 0 or 1 */
>         bit = ((byte & (1 << n)) >> n);
>         /* bit is now the value of the n'th bit of byte
>            (counting from n=0 to n=7, if a char has 8 bits) */
>         ...
>     }
>
> My question to clcm: How does the endianness of the machine you're running
> this on affect things?
It doesn't (not even for int, where sizeof(int) > 1). You're doing integer arithmetic mod <mumble>.

Quote:
> When you run this on a byte with (bin) 01000000 and n=1, does it give
> a different result on big-endian machines than on little-endian machines?
> Or does the compiler take care of this?
Better still, Mathematics takes care of it for you. --
Compugen Ltd.          | Tel: +972-2-6795059 (Jerusalem)    \ NEW IMPROVED URL!
72 Pinhas Rosen St.    | Tel: +972-3-7658520 (Main office)  `--------------------
Tel-Aviv 69512, ISRAEL | Fax: +972-3-7658555                http://3w.compugen.co.il/~ariels
--
Sun, 17 Feb 2002 03:00:00 GMT
Chris Torek #6 / 23
 seeing the bits in a byte...
Quote:
> My question to clcm: How does the endianness of the machine you're running
> this on affect things?
"Endianness" is a matter of perception, as it were. (More on this in a moment.) Quote: >When you run this on a byte with (bin): 01000000 ...
(1 << 1) is always just 2, and (1 << 6) is always just 0x40. The "byte with value 01000000_base_2" (or 0x40) always has binary value 0x40. Thus, if "c" is 0x40, and "n" is 6, then (c & (1 << n)) is 0x40. If you call that "bit 6", then 0x40 has bit 6 set (and no other bits).

This holds even for larger values: the value 0x1000 is always just 0x1000; to get a "1 << n" to mask with it, you need n to be equal to log2(0x1000) or 12. (1 << 12) is 0x1000, always.

If what you mean is: "I want to build up a byte from a bit stream, where I look first at a bit that is 0, then a bit that is 1, then a bit that is 0, then ...", only *then* does "endianness" come into play: you must decide whether the "1" (which you inspected second) is "second-most-significant" or "second-least". This is because you have chosen your own "building block" (a single bit) and decided to assemble a sequence of those blocks into a new value. If you choose to write a value out as a sequence of bits, one at a time, you again go back to having to choose whether to write "least significant" or "most significant" first. This is because you have taken an existing basic building block -- some value, in some C type -- and are trying to break it down into smaller values.

Thus, "endianness" is something that "appears between the cracks" whenever you go to take a sequence of "building blocks" (values, of some size in bits) and build something bigger (almost like building a house with bricks and mortar) or break something down (like taking the complete house and removing one brick at a time).

C always[%] relies on the underlying machine to interpret individual bits -- the "basic building block" (or "brick size", if you will) in C is the "char", which C also calls a "byte". (To confuse matters slightly, the C "char/byte" could be more than 8 bits -- but usually it is 8.) That means that if you write data to a file as a sequence of "char"s, and move that file from one machine to another -- say, by copying the file on a floppy -- it is up to the two machines to interpret the bits in those "char"s in the same way. That way when you write binary value 0x33 on one machine, and read it on the other, you get binary value 0x33 again.

If your C-bytes are 8 bits, and your floppy also uses 8-bit bytes, no one ever has to "look at" the bits one at a time, so endianness never crops up there. A basic building block with value 0x33 is always just a "brick #0x33". As long as all your machines use "interchangeable bricks" (e.g., all 8-bit bytes), the internal structure of those bricks themselves stays irrelevant.

On the other hand, suppose you take a sequence of 32-bit values, break each of those up into four 8-bit bytes, write those bytes on a floppy, move the floppy to another machine, and then go to re-assemble those four 8-bit bytes into a new 32-bit value. In order to get the *same* value (say, 0x12345678), you had better reassemble those four bytes in the same order. This is endianness cropping up: You took a "32-bit brick" and broke it into four, and now you want to glue the four back together, so you had best do it the same way. If you rely on the fact that the two computers (say, an Intel and a SPARC) happen to be able to do the "break int into 4-chars" and "treat 4-chars as an int" in hardware, you will also rely implicitly on the *order* that the hardware uses for breaking-up-and-assembling.
If you do the breaking-and-assembling "manually" -- say, by doing:

    putc((val >> 24) & 0xff, fp);
    putc((val >> 16) & 0xff, fp);
    putc((val >>  8) & 0xff, fp);
    putc((val      ) & 0xff, fp);
    /* check for error */
    ...
    val3 = getc(fp);
    val2 = getc(fp);
    val1 = getc(fp);
    val0 = getc(fp);
    /* check for EOF and error */
    val = (val3 << 24) | (val2 << 16) | (val1 << 8) | val0;

you eliminate the dependence on the hardware's order. You have imposed your own "endianness" on your data format instead -- here, "big endian", because you putc() the most significant 8-bit-brick first, and when you glue them back together, you stash that one in the most significant position.

[% The exception for this "always" lies in bit-fields in C "struct"s. These are not individually addressable, but you can always inspect memory using "unsigned char" and figure out how your C compiler decided to split up and glue together the individual bits. At the same time, though, unless the system has some "outside pressure" that encourages a specific bit-endianness -- such as instructions that operate on bits, or an ABI that says "when bytes are broken into bits, the bits shall be numbered this way" -- it is not at all unusual to have two different C compilers use two different bit orders, even on the same machine. In this case, then, the "endianness of bits" -- which shows up only once you decide to break them up semi-manually using C's bit-fields -- is determined by the C compiler. This makes sense: the agent doing the "brick splitting" and "brick gluing" always decides which piece to use first, and which piece to use last.]
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
--
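[As a self-contained illustration of the manual approach Chris describes, here is a sketch that wraps the putc/getc fragments above into complete functions. The names write_be32 and read_be32 are invented for this example, not from the post.]

    #include <stdio.h>

    /* Write v as four bytes, most significant first ("big-endian"). */
    static int write_be32(unsigned long v, FILE *fp)
    {
        if (putc((v >> 24) & 0xff, fp) == EOF) return -1;
        if (putc((v >> 16) & 0xff, fp) == EOF) return -1;
        if (putc((v >>  8) & 0xff, fp) == EOF) return -1;
        if (putc( v        & 0xff, fp) == EOF) return -1;
        return 0;
    }

    /* Read four such bytes back and reassemble the value. */
    static int read_be32(unsigned long *out, FILE *fp)
    {
        unsigned long v = 0;
        int i, c;

        for (i = 0; i < 4; i++) {
            if ((c = getc(fp)) == EOF) return -1;
            v = (v << 8) | (unsigned long)c;
        }
        *out = v;
        return 0;
    }

    int main(void)
    {
        unsigned long v = 0x12345678UL, r;
        FILE *fp = tmpfile();

        if (fp == NULL) return 1;
        if (write_be32(v, fp) == 0) {
            rewind(fp);
            if (read_be32(&r, fp) == 0)
                printf("wrote 0x%08lx, read back 0x%08lx\n", v, r);
        }
        fclose(fp);
        return 0;
    }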
Sun, 17 Feb 2002 03:00:00 GMT
Homer Simpson #7 / 23
 seeing the bits in a byte...
Henrique Seganfredo wrote in message ...
Quote:
> How do I access the bits in a byte (unsigned char)?
A byte is a 8 bits variable. If you want to read one bit's status, just do an equality test with a one-bit mask; the result will be 0 or 1.

    unsigned char o = 0xAA;   /* bit:          76543210
                                 AA in binary: 10101010 */

If you want to test bit 5, you need a mask with bit 5:

    #define bit5 (1<<5)

The testing operation is a simple & (bitwise AND):

    int val5 = (o & bit5) == bit5;

In this case, val5 will be set to 1. If you are used to working with bit masks, I give you my "bits.h" include:

    *************** begin *************
    /* bits.h */
    #ifndef BITS_H
    #define BITS_H

    #define bit31 0x80000000L
    #define bit30 0x40000000L
    #define bit29 0x20000000L
    #define bit28 0x10000000L
    #define bit27 0x08000000L
    #define bit26 0x04000000L
    #define bit25 0x02000000L
    #define bit24 0x01000000L
    #define bit23 0x00800000L
    #define bit22 0x00400000L
    #define bit21 0x00200000L
    #define bit20 0x00100000L
    #define bit19 0x00080000L
    #define bit18 0x00040000L
    #define bit17 0x00020000L
    #define bit16 0x00010000L
    #define bit15 0x8000
    #define bit14 0x4000
    #define bit13 0x2000
    #define bit12 0x1000
    #define bit11 0x0800
    #define bit10 0x0400
    #define bit9  0x0200
    #define bit8  0x0100
    #define bit7  0x0080
    #define bit6  0x0040
    #define bit5  0x0020
    #define bit4  0x0010
    #define bit3  0x0008
    #define bit2  0x0004
    #define bit1  0x0002
    #define bit0  0x0001

    #endif /* BITS_H */
    *************** end *************

It is *very* handy.
--
HS
--
Sun, 17 Feb 2002 03:00:00 GMT
Jack Klein #8 / 23
 seeing the bits in a byte...
On Tue, 31 Aug 1999 20:11:43 GMT, "Henrique Seganfredo" wrote:
Quote:
> How do I access the bits in a byte (unsigned char)?
> As far as I can see, this is only possible using the Boolean operators
> and the *toa(source_numeric_value, *dest_string, radix) functions
> (where radix is the conversion base).
> Is there any other way?
> thanks,
> Henrique Seganfredo
Example at http://home.att.net/~jackklein/ctips01.html#binary_out.

Jack Klein
--
Home: http://home.att.net/~jackklein
--
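[The linked example is about printing values in binary. In that spirit, here is one minimal way to do it; this is an illustration only, not the code from Jack's page.]

    #include <stdio.h>
    #include <limits.h>

    /* print the bits of a byte, most significant first */
    static void print_binary(unsigned char value)
    {
        int i;

        for (i = CHAR_BIT - 1; i >= 0; i--)
            putchar(((value >> i) & 1) ? '1' : '0');
        putchar('\n');
    }

    int main(void)
    {
        print_binary(0xAA);   /* prints 10101010 where char is 8 bits */
        return 0;
    }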
Sun, 17 Feb 2002 03:00:00 GMT
Alex Krol #9 / 23
 seeing the bits in a byte...
Quote:
> How do I access the bits in a byte (unsigned char)?
> As far as I can see, this is only possible using the Boolean operators
> and the *toa(source_numeric_value, *dest_string, radix) functions
> (where radix is the conversion base).
> Is there any other way?
What do you mean by accessing bits? Clearing or setting a specific bit, and checking whether a specific bit is set? For this you have the bitwise operators (&, | and ^).
--
Regards,
Alex Krol
Disclaimer: I'm not speaking for Scitex Corporation Ltd

Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
--
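[As a sketch of those operations, plus a test, here are a few illustrative one-line macros. The names are invented for this example; they are not from any standard header.]

    #include <stdio.h>

    #define BIT_SET(x, n)    ((x) |=  (1u << (n)))   /* set bit n         */
    #define BIT_CLEAR(x, n)  ((x) &= ~(1u << (n)))   /* clear bit n       */
    #define BIT_TOGGLE(x, n) ((x) ^=  (1u << (n)))   /* flip bit n        */
    #define BIT_TEST(x, n)   (((x) >> (n)) & 1u)     /* 1 if bit n is set */

    int main(void)
    {
        unsigned char b = 0;

        BIT_SET(b, 3);      /* 00001000 */
        BIT_SET(b, 1);      /* 00001010 */
        BIT_CLEAR(b, 1);    /* 00001000 */
        BIT_TOGGLE(b, 0);   /* 00001001 */
        printf("b = 0x%02X, bit 3 = %u\n", b, BIT_TEST(b, 3));
        return 0;
    }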
Sun, 17 Feb 2002 03:00:00 GMT
Martin Ambuhl #10 / 23
 seeing the bits in a byte...
Quote:
> Henrique Seganfredo wrote in message ...
> > How do I access the bits in a byte (unsigned char)?
>
> A byte is a 8 bits variable.
Please stop this. If you don't know by now that this is wrong, then you have a power plant to melt down. Give my regards to Lisa. --
__________________________________________________________
Fight spam now!
Get your free anti-spam service: http://www.brightmail.com
--
Sat, 23 Feb 2002 03:00:00 GMT
Melle #11 / 23
 seeing the bits in a byte...
Quote:
> > My question to clcm: How does the endianness of the machine you're running
> > this on affect things?
>
> "Endianness" is a matter of perception, as it were. (More on this
> in a moment.)
>
> > When you run this on a byte with (bin) 01000000 ...
>
> (1 << 1) is always just 2, and (1 << 6) is always just 0x40.
> The "byte with value 01000000_base_2" (or 0x40) always has binary
> value 0x40.
> Thus, if "c" is 0x40, and "n" is 6, then (c & (1 << n)) is 0x40.
> If you call that "bit 6", then 0x40 has bit 6 set (and no other
> bits).
> This holds even for larger values: the value 0x1000 is always
> just 0x1000; to get a "1 << n" to mask with it, you need n to
> be equal to log2(0x1000) or 12. (1 << 12) is 0x1000, always.
This was what I was not sure about in the first place. My fault was thinking that left and right, as used in left or right shift, are the same as the 'normal' use of left and right in language concerning numbers (the more significant the digit, the more to the left; assuming that big-endianness meant the opposite). Thus the compiler lets the most significant bit always be on the left in this case (just from a 'human point of view')?

<SNIP>

Thanks for your excellent explanation.

Melle Gerikowski

BTW: in my example code, "n^2" should have read "2^n", and the "char byte;" should have been an unsigned char (it was late).
--
This message was sent via Usenet
Usenet - Learn what you know. Share what you don't.
--
Sat, 23 Feb 2002 03:00:00 GMT
Scott David Daniels #12 / 23
 seeing the bits in a byte...
This would be an ideal place to mention the "hton_" and "ntoh_" functions. They were invented to allow machines with different conventions for ordering bytes to communicate multi-byte values. The idea was that we would define a "network" order for bytes; every system transmitting multi-byte integers should send them in "network" order, and receive data in "network" order and translate it to "host" order (host and network are the h and n). There are translations for shorts and longs with "s" and "l" suffixes (and, yes, I suppose these really presuppose a short is two bytes and a long is four).

So, "a = htons(b)" is a function that takes a short in host order to a short in network order. Similarly, "x = ntohl(y)" takes a network long "y" and converts it to host order. The htonX functions are inverses of the corresponding ntohX functions. These may be identity functions (with perhaps no code generated) for machines with the same internal order as "network" order.

So, to show a "long" as a sequence of bytes in a way that comes out the same on different machines, do:

    void print_long( long v )
    {
        volatile union {
            long l;
            unsigned char a[4];   /* unsigned, so the %02x values print cleanly */
        } u;

        u.l = htonl( v );
        printf( "%02x %02x %02x %02x\n", u.a[0], u.a[1], u.a[2], u.a[3] );
    }
I suggest that using this convention is better than doing the conversions yourself, since every C compiler vendor can make the translation more efficiently (by being machine-specific) than you can in general code, while there _is_ a convention about what network order is that you can buy into by using these functions.

-Scott David Daniels
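[A short usage sketch of print_long as defined above. It assumes a POSIX system, where htonl is declared in <arpa/inet.h>, and, like the post itself, a 32-bit long.]

    #include <stdio.h>
    #include <arpa/inet.h>   /* htonl on POSIX systems */

    void print_long( long v )
    {
        volatile union {
            long l;
            unsigned char a[4];
        } u;

        u.l = htonl( v );
        printf( "%02x %02x %02x %02x\n", u.a[0], u.a[1], u.a[2], u.a[3] );
    }

    int main(void)
    {
        /* prints "12 34 56 78" regardless of host byte order,
           assuming a 32-bit long as the post itself does */
        print_long( 0x12345678L );
        return 0;
    }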
Quote:
> > My question to clcm: How does the endianness of the machine
> > you're running this on affect things?
>
> "Endianness" is a matter of perception, as it were. (More on this
> in a moment.)
> ... plenty of more than reasonable stuff ...
> C always[%] relies on the underlying machine to interpret individual
> bits -- the "basic building block" (or "brick size", if you will)
> in C is the "char", which C also calls a "byte". (To confuse
> matters slightly, the C "char/byte" could be more than 8 bits --
> but usually it is 8.) That means that if you write data to a file
> as a sequence of "char"s, and move that file from one machine to
> another -- say, by copying the file on a floppy -- it is up to the
> two machines to interpret the bits in those "char"s in the same
> way. That way when you write binary value 0x33 on one machine,
> and read it on the other, you get binary value 0x33 again.
> ...
> On the other hand, suppose you take a sequence of 32-bit values,
> break each of those up into four 8-bit bytes, write those bytes on
> a floppy, move the floppy to another machine, and then go to
> re-assemble those four 8-bit bytes into a new 32-bit value. In
> order to get the *same* value (say, 0x12345678), you had better
> reassemble those four bytes in the same order. This is endianness
> cropping up: You took a "32-bit brick" and broke it into four,
> and now you want to glue the four back together, so you had best
> do it the same way. If you rely on the fact that the two computers
> (say, an Intel and a SPARC) happen to be able to do the "break int
> into 4-chars" and "treat 4-chars as an int" in hardware, you will
> also rely implicitly on the *order* that the hardware uses for
> breaking-up-and-assembling.
>
> If you do the breaking-and-assembling "manually" -- say, by doing:
>     putc((val >> 24) & 0xff, fp);
>     putc((val >> 16) & 0xff, fp);
>     putc((val >>  8) & 0xff, fp);
>     putc((val      ) & 0xff, fp);
>     /* check for error */
>     ...
>     val3 = getc(fp);
>     val2 = getc(fp);
>     val1 = getc(fp);
>     val0 = getc(fp);
>     /* check for EOF and error */
>     val = (val3 << 24) | (val2 << 16) | (val1 << 8) | val0;
> you eliminate the dependence on the hardware's order. You have
> imposed your own "endianness" on your data format instead -- here,
> "big endian", because you putc() the most significant 8-bit-brick
> first, and when you glue them back together, you stash that one in
> the most significant position.
> ...
--
Sat, 23 Feb 2002 03:00:00 GMT
Chris Torek #13 / 23
 seeing the bits in a byte...
Quote:
> My fault was thinking that left and right, as used in left or right
> shift, are the same as the 'normal' use of left and right in language
> concerning numbers (the more significant the digit, the more to the
> left; assuming that big-endianness meant the opposite).
Actually, "big endian" is "most significant portion first", and: Quote: >Thus the compiler lets the most significant bit always be on the left in >this case (just from a 'human point of view')?
... it is not so much the compilers that use big-endian notation (although they do) when writing numbers like "1024" and "0x40"; rather, it is the humans who write numbers down on paper, or at a computer, who do so. There are a few natural languages (including the largely obsolete English-language usage preserved in the old nursery rhyme about "four and twenty blackbirds baked in a pie") where some numbers are stated "little-endian" -- "four and twenty" means 24 -- but in general we start out with the most significant digit, and work down to the least.

In other words, in this case, the compiler is matching our expectations, so that the constant 24 means "four and twenty", not "two and forty", and 1024 means what we expect as well. How "1024" gets broken up into bytes -- assuming 8-bit bytes, into 0x04 and 0x00 -- is up to the compiler and/or machine, and its own endianness will remain hidden unless and until we peek at the two separate bytes. If you conclude that "the 0x04 was put in first", you have concluded that the system is using a big-endian notation in this case; if you conclude that "the 0x00 was put in first", you have found the system to use little-endian notation.

VAX and Intel x86 are classic little-endian architectures; the 680x0 is a classic big-endian architecture. A lot of modern microprocessors are "bi-endian", with endianness bits in the CPU and/or page tables (or equivalent).

The PDP-11, on which C really took shape, is mostly little-endian, except that a "long" with value 0x11223344 is not stored in memory as the four unsigned chars { 0x44, 0x33, 0x22, 0x11 } (little-endian), nor is it the four-byte sequence { 0x11, 0x22, 0x33, 0x44 } (big-endian). Rather, 0x11223344 is stored in memory as { 0x22, 0x11, 0x44, 0x33 }. If you have the hardware assemble each two-byte group for you, it comes up with the values { 0x1122, 0x3344 }. That means that it stores 16-bit values little-endian when breaking them into 8-bit bytes, and assembles 8-bit bytes little-endian into 16-bit values, but when taking 16-bit groups to or from a 32-bit value, it uses big-endian order! This is a form of "mixed endianness", and it shows that you cannot simply divide the world into "big" and "little" endian and expect to be able to use one byte-swapping routine to compensate. :-)

(The VAX also uses a peculiar significance-order when working with D-floating formats, so that any D-float can be treated as an F-float simply by lopping off its tail. For various reasons, however, no one ever seems to find such FP formats odd -- perhaps because FP formats are already sufficiently complicated to overwhelm other aesthetic senses. :-) )
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
--
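[A small sketch of the "peek at the separate bytes" experiment described above, inspecting how the host stores a multi-byte value through an unsigned char pointer. It is illustrative only; the output depends on the machine it runs on.]

    #include <stdio.h>

    int main(void)
    {
        unsigned long v = 0x11223344UL;
        unsigned char *p = (unsigned char *)&v;
        size_t i;

        /* print the bytes of v in memory order: 44 33 22 11 ... suggests
           little-endian storage, ... 11 22 33 44 suggests big-endian, and
           the PDP-11 order described above would show up as 22 11 44 33 */
        for (i = 0; i < sizeof v; i++)
            printf("%02x ", p[i]);
        putchar('\n');
        return 0;
    }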
Sun, 24 Feb 2002 03:00:00 GMT
Keith Thompson #14 / 23
 seeing the bits in a byte...
[...]

Quote:
> ... it is not so much the compilers that use big-endian notation
> (although they do) when writing numbers like "1024" and "0x40";
> rather, it is the humans who write numbers down on paper, or at a
> computer, who do so. There are a few natural languages (including
> the largely obsolete English-language usage preserved in the old
> nursery rhyme about "four and twenty blackbirds baked in a pie")
> where some numbers are stated "little-endian" -- "four and twenty"
> means 24 -- but in general we start out with the most significant
> digit, and work down to the least.
[...]

Here's an obscure historical note on endianness (only vaguely on-topic). Our decimal numbering system, referred to as Arabic or Hindu-Arabic numerals, was inherited by the Europeans from the Arabs, largely replacing the older and more unwieldy Roman numerals. The Arabs wrote numbers like 1024 with the most significant digit on the left; the Europeans maintained that convention. The trick is, Arabic is written right-to-left, whereas European languages are written left-to-right. So the representation magically changed from little-endian to big-endian, not by swapping the digits, but by swapping the rest of the written language around them.

Caveat: This is based on a vague recollection of something I read somewhere or other (probably on the net) an unknown number of years ago. I'd be interested if anyone can confirm or deny this. I'm also curious how modern Arabic handles this -- or modern Hebrew, for that matter.
--
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
"Oh my gosh! You are SO ahead of your time!" -- anon.
--
Mon, 25 Feb 2002 03:00:00 GMT