Quote:
>Say I'm writing binary data to a file in my program. I want this binary
>data to be interpreted the same, regardless of platform or hardware. In
Good luck. You'll need it. It might actually be achievable if you
are willing to restrict yourself to machines with 8-bit characters
and two's complement arithmetic. If you can avoid needing to represent
negative numbers, you might be able to drop the two's complement
requirement.
In many cases, the best approach is to turn everything into text
(use printf() on integers) rather than using a binary file.
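A minimal sketch of the text approach, assuming one unsigned decimal value per line. The helper names put_count_text() and get_count_text() are made up for this illustration, and snprintf() needs a C99 library:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format an unsigned value as decimal text,
 * terminated by a newline. Returns 0 on success, -1 if it did not fit. */
int put_count_text(char *out, size_t outsize, unsigned long count)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, outsize, "%lu\n", count);
    return (n > 0 && (size_t)n < outsize) ? 0 : -1;
}

/* Hypothetical helper: parse the decimal text back into a value.
 * Returns 0 on success, -1 if no number could be read. */
int get_count_text(const char *in, unsigned long *count)
{
    assert(in != NULL && count != NULL);
    return sscanf(in, "%lu", count) == 1 ? 0 : -1;
}
```

Nothing here depends on the byte order or integer size of the machine, only on both ends agreeing on the text format.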
Quote:
>particular, I just started playing with encryption, and I want to ensure
>that if I encrypt something under one platform-hardware, I can decrypt
>it under another. (So I'm only dealing with integral types for this
>post.)
Write a specification of what your binary file is supposed to look
like in terms of 8-bit characters. Write your program so that it
will read/write this format REGARDLESS of the native endianness of
the machine. (This often involves extensive use of the << and >>
shift operators and the & and | bitwise operators.) Use no #if
directives based on endianness unless you absolutely have to for
speed. You also have to nail down how big each value can be,
independent of the size of integer types on any individual platform.
My suggestion is to never represent multi-byte integers as big-endian
or little-endian. Too boring.
Example:
A Student Data Record consists of a Student Data Header Record
followed by one Student Record per student.
A Student Data Header Record contains the number of Student
Records to follow (24-bit unsigned integer):
Byte 0: Most significant 8 bits of the number of Student Records
Byte 1: Least significant 8 bits of the number of Student Records
Byte 2: Middle significant 8 bits of the number of Student Records
Byte 3: The letter 'S', in ASCII
Byte 4: The letter 'D', in ASCII
Byte 5: The letter 'R', in ASCII
A Student Record contains the following data:
Byte 0: Day of birth (1-31)
Byte 1: Month of birth (1-12)
A year is a 12-bit unsigned integer (runs out in the
year 4095), represented as follows:
Byte 2: Least significant 3 bits of the Year of birth
Byte 3: Most significant 3 bits of the Year of birth
Byte 4: Middle significant 6 bits of the Year of birth
Bytes 5-24: First 20 characters of student's last name.
If there are fewer than 20 characters, pad with spaces.
If there are more than 20 characters, tough, chop it.
...
Quote:
>How is such binary compatibility ordinarily achieved? Compression
>programs come to mind---a bzip2 file is usable under any platform that
>has the bzip2 utility.
Everything is an 8-bit character, and is treated as such.
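One way to treat a file as nothing but 8-bit characters is to move the data one byte at a time through putc() and getc(). This is only a sketch: write_octets() and read_octets() are hypothetical names, and the stream is assumed to have been opened in binary mode ("wb" or "rb"):

```c
#include <assert.h>
#include <stdio.h>

/* Write len bytes, keeping only the low 8 bits of each element.
 * Returns 0 on success, -1 on a write error. */
int write_octets(FILE *fp, const unsigned char *buf, size_t len)
{
    size_t i;

    assert(fp != NULL && buf != NULL);
    for (i = 0; i < len; i++)
        if (putc(buf[i] & 0xff, fp) == EOF)
            return -1;
    return 0;
}

/* Read exactly len bytes. Returns 0 on success, -1 on EOF or error. */
int read_octets(FILE *fp, unsigned char *buf, size_t len)
{
    size_t i;

    assert(fp != NULL && buf != NULL);
    for (i = 0; i < len; i++) {
        int c = getc(fp);
        if (c == EOF)
            return -1;
        buf[i] = (unsigned char)(c & 0xff);
    }
    return 0;
}
```

Multi-byte values never pass through these functions as integers. They are split into bytes first, so the host byte order has no way to leak into the file.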
Quote:
>It doesn't seem too elegant for my program to have a million #if
>directives for every platform, and makes dealing with new platforms
>ugly.
You shouldn't need #if directives for endianness (incidentally, you
DO know that there are machines that don't use inner-left-endian
or inner-right-endian, don't you?).
Quote:
>So my initial thought is to use the following scheme: commit to always
>using one endianness, say big-endian for this example. Then my code
>will look something like the following:
>my_function(input_data) {
> if (architecture_is_not_big-endian()) {
> convert_data_to_big-endian(input_data);
> }
> // do whatever needs to be done with input_data
Yeech. Do it right the first time, without an intermediate
multibyte form. Example, for our number of students value above:
buffer[0] = (number_of_students >> 16) & 0xff;
buffer[1] = number_of_students & 0xff;
buffer[2] = (number_of_students >> 8) & 0xff;
number_of_students needs at least 24 bits, so an appropriate C type is
unsigned long, which is guaranteed to hold at least 32 bits (a plain long
also works, since the value is never negative). This might even work on
a machine with 11-bit chars if, somehow in the process of outputting, the
most significant 3 bits end up getting discarded. We *DON'T CARE* what
the endianness of the host machine is. The code still works.
Going the other way:
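number_of_students = ((unsigned long)(buffer[0] & 0xff) << 16) |
(unsigned long)(buffer[1] & 0xff) |
((unsigned long)(buffer[2] & 0xff) << 8);
(The casts matter: without them the shift is done in int, and a shift by
16 is undefined on a machine where int has only 16 bits.)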
(Note that the number of students is specified as a non-big-endian
non-little-endian value).
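Putting both directions together for the whole 6-byte Student Data Header Record might look like the following sketch. encode_header() and decode_header() are names invented here, and the magic letters are written as ASCII code values so the file stays the same even if the host character set is not ASCII:

```c
#include <assert.h>

enum { HEADER_LEN = 6 };   /* hypothetical name for the header size */

/* Fill buf[0..5] from the record count, which must fit in 24 bits. */
void encode_header(unsigned char *buf, unsigned long n)
{
    assert(n <= 0xFFFFFFUL);
    buf[0] = (unsigned char)((n >> 16) & 0xff);  /* most significant  */
    buf[1] = (unsigned char)(n & 0xff);          /* least significant */
    buf[2] = (unsigned char)((n >> 8) & 0xff);   /* middle            */
    buf[3] = 0x53;                               /* 'S' in ASCII */
    buf[4] = 0x44;                               /* 'D' in ASCII */
    buf[5] = 0x52;                               /* 'R' in ASCII */
}

/* Recover the record count. Returns 0 on success, -1 if the
 * "SDR" marker is missing. */
int decode_header(const unsigned char *buf, unsigned long *n)
{
    if (buf[3] != 0x53 || buf[4] != 0x44 || buf[5] != 0x52)
        return -1;
    *n = ((unsigned long)(buf[0] & 0xff) << 16) |
          (unsigned long)(buf[1] & 0xff) |
         ((unsigned long)(buf[2] & 0xff) << 8);
    return 0;
}
```

For a count of 0x012345 this puts 0x01, 0x45, 0x23 in bytes 0 to 2, and decoding those bytes gives 0x012345 back on any host.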
Quote:
>}
>Is this the correct way to approach this problem? Or is this too naive?
>Also, doesn't OpenVMS use a scheme that's neither big-endian nor
>little-endian?
I'm not familiar with how OpenVMS does things.
Gordon L. Burditt