slightly OT: cross-platform binary compatibility? 
 slightly OT: cross-platform binary compatibility?

Say I'm writing binary data to a file in my program.  I want this binary
data to be interpreted the same, regardless of platform or hardware.  In
particular, I just started playing with encryption, and I want to ensure
that if I encrypt something under one platform-hardware, I can decrypt
it under another.  (So I'm only dealing with integral types for this
post.)

How is such binary compatibility ordinarily achieved?  Compression
programs come to mind---a bzip2 file is usable under any platform that
has the bzip2 utility.

It doesn't seem too elegant for my program to have a million #if
directives for every platform, and that approach makes dealing with new
platforms ugly.

So my initial thought is to use the following scheme: commit to always
using one endianness, say big-endian for this example.  Then my code
will look something like the following:

my_function(input_data) {
        if (architecture_is_not_big_endian()) {
                convert_data_to_big_endian(input_data);
        }
        // do whatever needs to be done with input_data
}

Is this the correct way to approach this problem?  Or is this too naive?
Also, doesn't OpenVMS use a scheme that's neither big-endian nor
little-endian?

--

The given email address is invalid.  Replace the string
"bogus" with "net" to obtain my correct email address.
Down with spam!



Sun, 27 Nov 2005 22:14:47 GMT  
 slightly OT: cross-platform binary compatibility?


Quote:
> Say I'm writing binary data to a file in my program.  I want this binary
> data to be interpreted the same, regardless of platform or hardware.  In
> particular, I just started playing with encryption, and I want to ensure
> that if I encrypt something under one platform-hardware, I can decrypt
> it under another.  (So I'm only dealing with integral types for this
> post.)

> How is such binary compatibility ordinarily achieved?  Compression
> programs come to mind---a bzip2 file is usable under any platform that
> has the bzip2 utility.

The general approach is to lay out your structures as characters and
assemble the integers from the bytes.

struct CpioHdr {
    uchar magic[2];
    uchar size[4];
    ...

};

Then provide "endian" macros to pull shorts and longs from the elements:

#define getshort(a) (((a)[0] << 8) | (a)[1])
#define getlong(a)  (((unsigned long)(a)[0] << 24) | ((unsigned long)(a)[1] << 16) | \
                     ((unsigned long)(a)[2] << 8) | (a)[3])
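
For instance, reading such a header might look like this (untested
sketch; the file name and the 6-byte layout are just for illustration,
and the macros are repeated so the snippet stands alone):

#include <stdio.h>

typedef unsigned char uchar;

#define getshort(a) (((a)[0] << 8) | (a)[1])
#define getlong(a)  (((unsigned long)(a)[0] << 24) | ((unsigned long)(a)[1] << 16) | \
                     ((unsigned long)(a)[2] << 8) | (a)[3])

int main(void)
{
    uchar buf[6];   /* 2-byte magic + 4-byte size, as in CpioHdr above */
    FILE *fp = fopen("archive.bin", "rb");   /* hypothetical file name */

    if (fp == NULL || fread(buf, 1, sizeof buf, fp) != sizeof buf) {
        perror("archive.bin");
        return 1;
    }
    /* The integers are assembled byte by byte in a fixed order,
       so the result is the same on any host. */
    printf("magic = %u, size = %lu\n",
           (unsigned)getshort(buf), (unsigned long)getlong(buf + 2));
    fclose(fp);
    return 0;
}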

karl m



Sun, 27 Nov 2005 23:32:34 GMT  
 slightly OT: cross-platform binary compatibility?

Quote:

> Say I'm writing binary data to a file in my program.

    All right: "I'm writing binary data to a file in my program."

Quote:
>  I want this binary
> data to be interpreted the same, regardless of platform or hardware.  In
> particular, I just started playing with encryption, and I want to ensure
> that if I encrypt something under one platform-hardware, I can decrypt
> it under another.  (So I'm only dealing with integral types for this
> post.)

    This is Question 20.5 in the comp.lang.c Frequently
Asked Questions (FAQ) list

        http://www.eskimo.com/~scs/C-faq/top.html

--



Mon, 28 Nov 2005 01:01:44 GMT  
 slightly OT: cross-platform binary compatibility?


Quote:
>Say I'm writing binary data to a file in my program.  I want this binary
>data to be interpreted the same, regardless of platform or hardware.  In
>particular, I just started playing with encryption, and I want to ensure
>that if I encrypt something under one platform-hardware, I can decrypt
>it under another.  (So I'm only dealing with integral types for this
>post.)

The most often suggested way of doing this (here, at least) is to
convert everything to ASCII text (with sprintf or fprintf, f'rexample)
before writing and convert it back (with sscanf or strtoul, f'rexample)
after reading.
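
A minimal round trip through text might look like this (untested; the
scratch file name is made up):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long value = 123456789UL, back = 0;
    char line[32];
    FILE *fp = fopen("num.txt", "w+");  /* hypothetical scratch file */

    if (fp == NULL)
        return 1;
    fprintf(fp, "%lu\n", value);        /* write as decimal text */
    rewind(fp);
    if (fgets(line, sizeof line, fp))   /* read it back */
        back = strtoul(line, NULL, 10);
    fclose(fp);
    printf("%lu -> %lu\n", value, back);
    return 0;
}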

Quote:
>How is such binary compatibility ordinarily achieved?  Compression
>programs come to mind---a bzip2 file is usable under any platform that
>has the bzip2 utility.

>It doesn't seem too elegant for my program to have a million #if
>directives for every platform, and that approach makes dealing with new
>platforms ugly.

>So my initial thought is to use the following scheme: commit to always
>using one endianness, say big-endian for this example.

If you want to write binary data and have it portable, then this is
probably the best way to do it.  Big-endian is probably the best choice
there; most implementations have (as an extension) ntohl and friends,
functions that convert a long (.to.l) or short (.to.s) integer into
network byte order (which is big-endian) (hton.) or out of network byte
order (ntoh.).

Quote:
>     Then my code
>will look something like the following:

>my_function(input_data) {
>    if (architecture_is_not_big_endian()) {
>            convert_data_to_big_endian(input_data);
>    }

This would just be:
        input_data_to_work_with = ntohl(input_data_from_file);  /* [1] */
(ntohl will do The Right Thing no matter what the endianness you're
working with is).
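
So a pair of read/write wrappers might look like this (untested;
assumes a 32-bit unsigned long whose in-memory bytes match the four
bytes on disk, and htonl/ntohl as the extensions described above, which
on many Unixish systems are declared in <arpa/inet.h>):

#include <stdio.h>
#include <arpa/inet.h>  /* htonl/ntohl -- an extension, not standard C */

/* Write one 32-bit value in network (big-endian) byte order. */
int put_ulong(FILE *fp, unsigned long value)
{
    unsigned long be = htonl(value);
    return fwrite(&be, 4, 1, fp) == 1 ? 0 : -1;
}

/* Read one 32-bit value back, converting to host order. */
int get_ulong(FILE *fp, unsigned long *value)
{
    unsigned long be;
    if (fread(&be, 4, 1, fp) != 1)
        return -1;
    *value = ntohl(be);
    return 0;
}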

Quote:
>    // do whatever needs to be done with input_data
>}
>Is this the correct way to approach this problem?  Or is this too naive?

It works quite well, and you avoid converting to and from text (which
may or may not be something you want to avoid doing).  It doesn't handle
floating-point values very well - you'll need some format that both
the source and destination architectures understand; if they don't
use the same format natively, this could be Difficult.  (The IEEE
(754?) representation, which most (but not all) systems use natively,
is probably a good choice but I'd expect that if you have to do the
conversion yourself it would be quite nontrivial.)

Quote:
>Also, doesn't OpenVMS use a scheme that's neither big-endian nor
>little-endian?

The implementor of ntohl and friends (which may be you if your
implementation doesn't have it) is responsible for getting it right, even
if you have a weird-endian architecture.  See [1] for a set of examples
that, if I'm not mistaken, should work anywhere.

dave

[1] If you don't have ntohl, something like this should work (with the
    usual disclaimer that I haven't compiled or tested it):
--------
#if BIG_ENDIAN
        /*ntohl is just the identity function.  A smart compiler may be
            able to recognize and inline (=>completely eliminate) it.
        */
        unsigned long ntohl(unsigned long x){return x;}
#elif LITTLE_ENDIAN
        /*Assumes 32-bit long, 8-bit byte*/
        unsigned long ntohl(unsigned long x)
        {
                /*I've seen bit and byte reversals done this way in a
                    bunch of places.  Not sure who to give credit for
                    the original idea to.
                */
                /*The unwanted bits here just slide off the end...*/
                x=(x>>16)|(x<<16);
                /*...but we have to mask them here*/
                x=((x>>8) & 0x00ff00ff) | ((x<<8) & 0xff00ff00);

                return x;
        }
#else   /*weird-endian*/
        /*Assumes 32-bit long, 8-bit byte
          Also requires (implicit in sizes assumed) that there are no trap
            representations for unsigned long.
        */
        unsigned long ntohl(unsigned long src)
        {
                /*If we're on a weird-endian system, we end up having
                    to pick the bytes apart by hand instead of playing
                    games with the values to shift them around.
                  src still holds the four bytes exactly as they came
                    out of the file, so we read them from memory and
                    assemble the host value directly.
                  Note that this version will work anywhere that the size
                    assumptions are valid, no matter what the endianness.
                  Depending on the processor, this may even be faster
                    than the little-endian version.  In the highly
                    unlikely event it's where your program is spending
                    most of its time, you might want to check the
                    generated assembly to see which to use.
                */
                unsigned char *ptr=(unsigned char *)&src;

                return ((unsigned long)ptr[0]<<24)
                     | ((unsigned long)ptr[1]<<16)
                     | ((unsigned long)ptr[2]<<8)
                     |  (unsigned long)ptr[3];
        }
#endif          /*ntohl implementations for various endiannesses*/
--------
--

So they're talking about code re-use. I guess they must be C++ programmers.
C programmers don't talk about code re-use, because they're too busy
actually doing it.                      --Richard Heathfield in comp.lang.c



Mon, 28 Nov 2005 01:20:21 GMT  
 slightly OT: cross-platform binary compatibility?

Quote:
>Say I'm writing binary data to a file in my program.  I want this binary
>data to be interpreted the same, regardless of platform or hardware.  In

Good luck.  You'll need it.  It might actually be achievable if you
are willing to restrict yourself to machines with 8-bit characters
and two's complement arithmetic.  If you can avoid needing to represent
negative numbers, you might be able to drop the two's complement
requirement.

In many cases, the best approach is to turn everything into text
(use printf() on integers) rather than using a binary file.

Quote:
>particular, I just started playing with encryption, and I want to ensure
>that if I encrypt something under one platform-hardware, I can decrypt
>it under another.  (So I'm only dealing with integral types for this
>post.)

Write a specification of what your binary file is supposed to look
like in terms of 8-bit characters.  Write your program so that it
will read/write this format REGARDLESS of the native endianness of
the machine.  (This often involves extensive use of the << and >>
shift operators and the & and | bitwise operators.) Use no #if
directives based on endianness unless you absolutely have to for
speed.  You also have to nail down how big each value can be,
independent of the size of integer types on any individual platform.

My suggestion is to never represent multi-byte integers as big-endian
or little-endian.  Too boring.

Example:

        A Student Data Record consists of a Student Data Header Record
        followed by one Student Record per student.

        A Student Data Header Record contains the number of Student
        Records to follow (24-bit unsigned integer):

        Byte 0:  Most significant 8 bits of the number of Student Records
        Byte 1:  Least significant 8 bits of the number of Student Records
        Byte 2:  Middle significant 8 bits of the number of Student Records
        Byte 3:  The letter 'S', in ASCII
        Byte 4:  The letter 'D', in ASCII
        Byte 5:  The letter 'R', in ASCII

        A Student Record contains the following data:

        Byte 0:  Day of birth (1-31)
        Byte 1:  Month of birth (1-12)
                 A year is a 12-bit unsigned integer (runs out in the
                 year 4095), represented as follows:
        Byte 2:  Least significant 3 bits of the Year of birth
        Byte 3:  Most significant 3 bits of the Year of birth
        Byte 4:  Middle significant 6 bits of the Year of birth
        Bytes 5-24:  First 20 characters of student's last name.
                     If there are fewer than 20 characters, pad with spaces.
                     If there are more than 20 characters, tough, chop it.
        ...
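
Packing the birth-date fields per that layout might look like this
(untested sketch; rec points at the start of one Student Record):

/* Pack day, month, and the 12-bit year into bytes 0-4 of a
   Student Record, per the layout above. */
void pack_birth(unsigned char *rec, unsigned day, unsigned month,
                unsigned year)
{
    rec[0] = day;                 /* 1-31 */
    rec[1] = month;               /* 1-12 */
    rec[2] = year & 0x07;         /* least significant 3 bits */
    rec[3] = (year >> 9) & 0x07;  /* most significant 3 bits */
    rec[4] = (year >> 3) & 0x3f;  /* middle 6 bits */
}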

Quote:
>How is such binary compatibility ordinarily achieved?  Compression
>programs come to mind---a bzip2 file is usable under any platform that
>has the bzip2 utility.

Everything is an 8-bit character, and is treated as such.

Quote:
>It doesn't seem too elegant for my program to have a million #if
>directives for every platform, and that approach makes dealing with new
>platforms ugly.

You shouldn't need #if directives for endianness (incidentally, you
DO know that there are machines that use neither big-endian nor
little-endian byte order, don't you?).

Quote:
>So my initial thought is to use the following scheme: commit to always
>using one endianness, say big-endian for this example.  Then my code
>will look something like the following:

>my_function(input_data) {
>    if (architecture_is_not_big_endian()) {
>            convert_data_to_big_endian(input_data);
>    }
>    // do whatever needs to be done with input_data

Yeech.  Do it right the first time, without an intermediate
multibyte form.  Example, for our number of students value above:

        buffer[0] = (number_of_students >> 16) & 0xff;
        buffer[1] = number_of_students & 0xff;
        buffer[2] = (number_of_students >> 8) & 0xff;

number_of_students needs at least 24 bits, so an appropriate C type is
long.  This might even work on a machine with 11-bit chars, provided the
most significant 3 bits of each char get discarded somewhere in the
output process.  We *DON'T CARE* what the endianness of the host machine
is.  The code still works.

Going the other way:

        number_of_students = (buffer[0] & 0xff) << 16 |
                                (buffer[1] & 0xff) |
                                (buffer[2] & 0xff) << 8 ;

(Note that the number of students is specified as a non-big-endian
non-little-endian value).
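
Wrapped into a function, writing the whole six-byte header record might
look like this (untested; write_sdr_header is just a name I made up, and
the character literals assume an ASCII host, use 0x53 0x44 0x52 to be
strict):

#include <stdio.h>

/* Write a Student Data Header Record as specified above.  The bytes
   go out in the record's own (deliberately odd) order, so host
   endianness never enters into it. */
int write_sdr_header(FILE *fp, unsigned long number_of_students)
{
    unsigned char buffer[6];

    buffer[0] = (number_of_students >> 16) & 0xff;  /* most significant 8 bits */
    buffer[1] = number_of_students & 0xff;          /* least significant 8 bits */
    buffer[2] = (number_of_students >> 8) & 0xff;   /* middle 8 bits */
    buffer[3] = 'S';
    buffer[4] = 'D';
    buffer[5] = 'R';

    return fwrite(buffer, 1, 6, fp) == 6 ? 0 : -1;
}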

Quote:
>}
>Is this the correct way to approach this problem?  Or is this too naive?
>Also, doesn't OpenVMS use a scheme that's neither big-endian nor
>little-endian?

I'm not familiar with how OpenVMS does things.

                                                Gordon L. Burditt



Mon, 28 Nov 2005 06:35:40 GMT  
 