unsigned char and file streams 

Hi,
I don't mean to harp on this but I'm just curious about "best practices" and
how people have dealt with this problem of opening a file of unsigned chars
in C++. I see 2 possibilities:

1) use the C FILE functions
2) use a C++ file stream instantiated on unsigned char, something like this:
   typedef std::basic_ifstream<unsigned char> uc_ifstream;
   uc_ifstream myInputFile(filename);
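
Option (1) might look like the following minimal sketch (the helper name
read_file_c is hypothetical). Because fread() takes a void*, the unsigned
char buffer needs no cast at all:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: read a whole file into a vector of unsigned char.
// fread() takes a void*, so the unsigned char buffer needs no cast.
std::vector<unsigned char> read_file_c(const char* path) {
    std::vector<unsigned char> bytes;
    std::FILE* f = std::fopen(path, "rb");  // "b": no text-mode newline translation
    if (!f) return bytes;                   // empty vector if the file can't be opened
    unsigned char chunk[4096];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        bytes.insert(bytes.end(), chunk, chunk + got);
    std::fclose(f);
    return bytes;
}
```

The "rb" mode matters on Windows, where text mode would translate CR/LF
pairs and could treat 0x1A as end of file.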

The Dinkum online docs seem to lean towards (1) if you go to the "Files and
Streams" section.
(2) seems innocent enough, but I'm not sure how a default-instantiated
char_traits<unsigned char> will behave. Are there any advantages to using
the C++ fstream and its char_traits, perhaps easing cross-platform
implementations? Are there any disadvantages, perhaps performance overhead?

There is also the possibility of using:
mystream.read(reinterpret_cast<char*>(myunsignedcharptr), n);
which was suggested in another thread. That would be fine on a single
machine, but I don't like it: it's a headache waiting to happen if I ever
want to recompile the code on another machine.

Thanks for any advice,
John.



Sun, 16 May 2004 03:38:13 GMT  


Quote:
> Hi,
> I don't mean to harp on this but I'm just curious about "best practices" and
> how people have dealt with this problem of opening a file of unsigned chars
> in C++.

I don't see the problem. The member functions

read(), write()

are for reading and writing binary data. They are the equivalents of C's
fread() and fwrite(). You will need reinterpret_cast if you decide to
read/write ints, shorts, floats, doubles, POD structs etc. in binary, as
read/write are not overloaded. So why is there suddenly a problem
reading/writing unsigned char or signed char when only plain char is
provided? I lump unsigned char and signed char in with the types I
mentioned, so I have no qualms about using reinterpret_cast where
appropriate.
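
A minimal sketch of what is described above, assuming streams opened with
std::ios::binary (write_bytes and read_bytes are hypothetical names). The
cast changes only the pointer type; accessing unsigned char storage through
a char* is permitted because char types may alias any object, so the bytes
that reach the file are exactly the bytes in the buffer:

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <ostream>

// Hypothetical helper: write n raw bytes held in an unsigned char buffer.
// The cast changes only the pointer type, not the bytes written.
bool write_bytes(std::ostream& out, const unsigned char* buf, std::streamsize n) {
    out.write(reinterpret_cast<const char*>(buf), n);
    return !out.fail();
}

// Hypothetical helper: read up to n raw bytes into an unsigned char buffer.
// Returns how many bytes were actually read (fewer than n at end of file).
std::streamsize read_bytes(std::istream& in, unsigned char* buf, std::streamsize n) {
    in.read(reinterpret_cast<char*>(buf), n);
    return in.gcount();
}
```

As long as the buffer is then accessed as unsigned char, a 0xFF byte stays
255; the sign only appears if a plain char copy of it is converted to int.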

Stephen Howe



Sun, 16 May 2004 03:45:11 GMT  


Quote:
> I don't see the problem.

I'm just a little concerned about the following test on an Intel PC.
Although the pointers are preserved, on conversion to a 32-bit number the
bit pattern gets filled differently (the plain char is sign-extended). True,
as long as you never make that conversion while you are manipulating your
buffer there is no problem, but I would still rather avoid the
reinterpret_cast if I can help it.

  // needs <cstdio> for sprintf and <windows.h> for OutputDebugString
  unsigned char* a = new unsigned char[1];
  *a = 255;
  char* b = reinterpret_cast<char*>(a); // b points at the same byte as a

  char* s = new char[500];
  sprintf(s, " a = %.8x, b = %.8x\n a = %d, b = %d\n a = %u, b = %u\n"
             " a = %.8x, b = %.8x\n", a, b, *a, *b, *a, *b, *a, *b);
  OutputDebugString(s);

  delete[] a; // b aliases a, so it must not be deleted separately
  delete[] s;

The output is:
 a = 047a6498, b = 047a6498
 a = 255, b = -1
 a = 255, b = 4294967295
 a = 000000ff, b = ffffffff
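
That output is consistent with the byte in memory being unchanged: only the
conversion of a signed plain char to int differs. A minimal sketch of the
distinction (view_byte and ByteViews are hypothetical names; it assumes
8-bit chars and a two's-complement machine, as on the Intel PC above):

```cpp
#include <cassert>
#include <climits>

// Hypothetical result type: one byte seen through two character types.
struct ByteViews {
    int as_unsigned;           // value after promoting the unsigned char
    int as_plain;              // value after promoting the plain char alias
    unsigned char round_trip;  // plain char converted back to unsigned char
};

// Look at one byte through unsigned char and through a char* alias of the
// same storage (char types may alias any object).
ByteViews view_byte(unsigned char value) {
    unsigned char storage = value;
    const char* alias = reinterpret_cast<const char*>(&storage);
    ByteViews v;
    v.as_unsigned = storage;  // always in 0..UCHAR_MAX
    v.as_plain = *alias;      // negative for 0x80..0xFF when plain char is signed
    // Conversion to unsigned char is modulo 2^CHAR_BIT, so on a
    // two's-complement machine the original byte comes back unchanged.
    v.round_trip = static_cast<unsigned char>(*alias);
    return v;
}
```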

John.



Sun, 16 May 2004 05:00:42 GMT  


Quote:
> Hi,
> I don't mean to harp on this but I'm just curious about "best practices" and
> how people have dealt with this problem of opening a file of unsigned chars
> in C++.

I've become somewhat fascinated with this "problem". Below is a snippet from
revision 19 of the open issues list for the ISO/IEC 14882 standard:
http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2001/n1317.html
It states that "There was strong opposition to requiring that library
implementors provide those specializations ... [char_traits<signed char> and
char_traits<unsigned char>]"

I've been looking for an article that explains what would be required to
provide these specializations, and why there was strong opposition to
requiring implementors to supply them.
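
As a rough illustration of what such a specialization involves, here is a
hypothetical traits class (the name uchar_traits is made up; this is not how
any particular library does it) supplying the types and static members a
character traits class needs. With it, std::basic_string<unsigned char,
uchar_traits> works. As far as I know, a file stream would additionally need
a std::codecvt<unsigned char, char, std::mbstate_t> facet in its locale,
which the standard library does not supply, so traits alone do not make a
basic_ifstream<unsigned char> usable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>  // std::mbstate_t
#include <ios>     // std::streamoff, std::streampos
#include <string>

// Hypothetical traits class for unsigned char (not provided by the standard).
struct uchar_traits {
    typedef unsigned char  char_type;
    typedef int            int_type;  // holds every char_type value plus eof()
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& r, const char_type& a) { r = a; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* s1, const char_type* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;  // count up to the terminating zero
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& a) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a)) return s + i;
        return nullptr;
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n) std::memmove(dst, src, n);  // ranges may overlap
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n) std::memcpy(dst, src, n);  // ranges must not overlap
        return dst;
    }
    static char_type* assign(char_type* s, std::size_t n, char_type a) {
        if (n) std::memset(s, a, n);
        return s;
    }
    static char_type to_char_type(int_type c) { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) { return c; }  // 0..255, never negative
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return -1; }  // distinct from every to_int_type result
    static int_type not_eof(int_type c) { return eq_int_type(c, eof()) ? 0 : c; }
};

typedef std::basic_string<unsigned char, uchar_traits> ustring;
```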

Thanks for any pointers,
John

<snip>
167. Improper use of traits_type::length()
Section: 27.6.2.5.4 [lib.ostream.inserters.character]  Status: Review
Submitter: Dietmar Kühl  Date: 20 Jul 1999

Paragraph 4 states that the length is determined using traits::length(s).
Unfortunately, this function is not defined for example if the character
type is wchar_t and the type of s is char const*. Similar problems exist if
the character type is char and the type of s is either signed char const* or
unsigned char const*.

Proposed resolution:

Change 27.6.2.5.4 paragraph 4 from:

  Effects: Behaves like an formatted inserter (as described in
lib.ostream.formatted.reqmts) of out. After a sentry object is constructed
it inserts characters. The number of characters starting at s to be inserted
is traits::length(s). Padding is determined as described in
lib.facet.num.put.virtuals. The traits::length(s) characters starting at s
are widened using out.widen (lib.basic.ios.members). The widened characters
and any required padding are inserted into out. Calls width(0).

to:

  Effects: Behaves like an formatted inserter (as described in
lib.ostream.formatted.reqmts) of out. After a sentry object is constructed
it inserts characters. The number len of characters starting at s to be
inserted is

  - traits::length((const char*)s) if the second argument is of type const
charT*
  - char_traits<char>::length(s) if the second argument is of type const
char*, const signed char*, or const unsigned char* and and charT is not
char.

  Padding is determined as described in lib.facet.num.put.virtuals. The len
characters starting at s are widened using out.widen
(lib.basic.ios.members). The widened characters and any required padding are
inserted into out. Calls width(0).

[Kona: It is clear to the LWG there is a defect here. Dietmar will supply
specific wording.]

[Post-Tokyo: Dietmar supplied the above wording.]

[Toronto: The original proposed resolution involved char_traits<signed char>
and char_traits<unsigned char>. There was strong opposition to requiring
that library implementors provide those specializations of char_traits.]

[Copenhagen: This still isn't quite right: proposed resolution text got
garbled when the signed char/unsigned char specializations were removed.
Dietmar will provide revised wording.]

</snip>
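
For context, the inserters this issue concerns include the overloads that
let a plain char stream print signed char and unsigned char strings
directly. A small sketch, assuming a conforming <ostream> and an ASCII
execution character set (insert_byte_strings is a hypothetical name):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// basic_ostream<char, traits> has operator<< overloads taking
// const signed char*, const unsigned char*, signed char and unsigned char,
// so these print as characters rather than as pointer values or numbers.
std::string insert_byte_strings() {
    const unsigned char ubytes[] = {'a', 'b', 'c', 0};
    const signed char sbytes[] = {'x', 'y', 0};
    std::ostringstream out;
    out << ubytes << ' ' << sbytes << ' ' << static_cast<unsigned char>(65);
    return out.str();
}
```

The string overloads stop at the terminating zero, which is exactly where
the question of how to compute that length for char const*, signed char
const* and unsigned char const* comes from.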



Mon, 17 May 2004 05:45:48 GMT  

Quote:
> I've become somewhat fascinated with this "problem". Below is a snippet from
> revision 19 of the open issues list for the ISO/IEC 14882 standard:
> http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2001/n1317.html
> It states that "There was strong opposition to requiring that library
> implementors provide those specializations ... [char_traits<signed char> and
> char_traits<unsigned char>]"

John

Raise the question on comp.std.c++.

Many of the members who sat on the standards committee hang out there. They
will be able to give you definitive responses.

Stephen Howe



Mon, 17 May 2004 08:38:45 GMT  

Quote:



> > I don't see the problem.

> I'm just a little concerned about the following test on an Intel PC.
> Although the pointers are preserved, on conversion to a 32-bit number the
> bit pattern gets filled differently (the plain char is sign-extended). True,
> as long as you never make that conversion while you are manipulating your
> buffer there is no problem, but I would still rather avoid the
> reinterpret_cast if I can help it.

C and C++ file IO deals with char and wchar_t, not unsigned char.
fread works with any pointer type at all; the fact that it takes a
void* is simply because it isn't typesafe. Typesafety is the
difference between C and C++ here, not support of different character
types.
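
To illustrate the typesafety point, here is a hypothetical pair of
templates (write_array and read_array are made-up names) that keep the
reinterpret_cast in one place and reject types whose raw bytes are not
meaningful. It assumes C++11 for std::is_trivially_copyable:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical typed wrapper: write count elements of T as raw bytes.
template <class T>
bool write_array(std::ostream& out, const T* p, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O only makes sense for trivially copyable types");
    out.write(reinterpret_cast<const char*>(p),
              static_cast<std::streamsize>(count * sizeof(T)));
    return !out.fail();
}

// Hypothetical typed wrapper: read up to count elements of T as raw bytes.
// Returns the number of whole elements read.
template <class T>
std::size_t read_array(std::istream& in, T* p, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O only makes sense for trivially copyable types");
    in.read(reinterpret_cast<char*>(p),
            static_cast<std::streamsize>(count * sizeof(T)));
    return static_cast<std::size_t>(in.gcount()) / sizeof(T);
}
```

Writing unsigned char arrays this way gives the same file on any platform
with 8-bit bytes; writing int or double arrays does not, because their size
and byte order vary between machines.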

Quote:

>   // needs <cstdio> for sprintf and <windows.h> for OutputDebugString
>   unsigned char* a = new unsigned char[1];
>   *a = 255;
>   char* b = reinterpret_cast<char*>(a); // b points at the same byte as a

>   char* s = new char[500];
>   sprintf(s, " a = %.8x, b = %.8x\n a = %d, b = %d\n a = %u, b = %u\n"
>              " a = %.8x, b = %.8x\n", a, b, *a, *b, *a, *b, *a, *b);
>   OutputDebugString(s);

>   delete[] a; // b aliases a, so it must not be deleted separately
>   delete[] s;

> The output is:
>  a = 047a6498, b = 047a6498
>  a = 255, b = -1
>  a = 255, b = 4294967295
>  a = 000000ff, b = ffffffff

These are the results I would expect on a platform with an 8-bit
signed plain char type, although I am not sure it is safe to pass a
char to a %d argument (I'm far from an expert on C stdio and variable
length argument lists; I never use either).

char c = 255;
int i = c;
assert(i == -1);

If you want the 255, you have to do:
char c = 255;
int i = static_cast<unsigned char>(c);

or, equivalently and very explicitly:
char c = 255;
int i = std::char_traits<char>::to_int_type(c);

Tom



Tue, 18 May 2004 00:52:34 GMT  
 