VC6 to VC7: <locale> bugs? 

Hi,

While trying to recompile/test some old code with VC7, I
found some strange behaviors that might be bugs in the
VC7 locale implementation.

My platform is Win2K Simplified Chinese.

This is the complete program, named widen.cpp:

--------------------------------------------------------
#include <iostream>
#include <locale>
#include <vector>

// the system default locale:
::std::locale defloc( "" );

typedef mbstate_t MbState;
typedef ::std::codecvt<wchar_t,char,MbState> Codecvt;

// widen/narrow a vector<> of char/wchar_t's.
template< typename DstT, typename SrcT, typename CvtF >
void convert( DstT*, SrcT const& src, CvtF cvtf )
{
        typedef typename DstT::value_type DstChar;
        typedef typename SrcT::value_type SrcChar;

        Codecvt const& cvt = ::std::_USE(defloc,Codecvt);

        DstT dst( src.size() * 3 ); // generous upper bound on the output length
        MbState state(0);

        SrcChar const* src_first = &src[0];
        SrcChar const* src_last = src_first + src.size();
        SrcChar const* src_next;

        DstChar* dst_first = &dst[0];
        DstChar* dst_last = dst_first + dst.size();
        DstChar* dst_next;

        (cvt.*cvtf)(
                state,
                src_first, src_last, src_next,
                dst_first, dst_last, dst_next
                );

        dst.resize( dst_next - dst_first );

        ::std::cout << src_next - src_first << " converted to "
                << dst.size() << ", "
                << src_last - src_next << " left.\n";

}

typedef ::std::vector<wchar_t> Wvector;
typedef ::std::vector<char> Nvector;

int main()
{
#ifdef REINIT
        defloc = ::std::locale("");
#endif
#ifdef CHGGLOBAL
        ::std::locale::global( defloc );
#endif
        // convert a unicode character to GB2312:
        convert( (Nvector*)0, Wvector(1, 0x4e2d), &Codecvt::out );
        // convert a two-byte GB2312 character to unicode:
        convert( (Wvector*)0, Nvector(2, '\xd0'), &Codecvt::in );
        return 0;

}

--------------------------------------------------------
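For readers on other compilers: _USE is a Dinkumware-specific macro, and the portable spelling is std::use_facet. The following is only a minimal sketch of the widening half of the test, assuming a string-based helper named widen (a hypothetical name, not part of the program above) and the same codecvt facet type:

```cpp
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> Codecvt;

// Hypothetical helper: widen `src` with the codecvt facet of `loc`.
// Conversion stops at the first byte the facet cannot convert.
std::wstring widen(const std::string& src, const std::locale& loc)
{
    if (src.empty())
        return std::wstring();

    // Standard equivalent of ::std::_USE(loc, Codecvt).
    const Codecvt& cvt = std::use_facet<Codecvt>(loc);

    // Every wide character consumes at least one input byte.
    std::vector<wchar_t> dst(src.size());
    std::mbstate_t state = std::mbstate_t();

    const char* src_next = 0;
    wchar_t* dst_next = 0;
    cvt.in(state,
           src.data(), src.data() + src.size(), src_next,
           &dst[0], &dst[0] + dst.size(), dst_next);

    // Keep only the part of the buffer that was actually written.
    return std::wstring(&dst[0], dst_next);
}
```

Calling widen("abc", std::locale::classic()) exercises the same facet member as the &Codecvt::in call in the test program, just without the pointer-to-member indirection.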

With VC6, every combination of options I've tried prints:

  1 converted to 2, 0 left.
  2 converted to 1, 0 left.

as I expect.

With VC7, these command lines:

  cl -DCHGGLOBAL -EHsc -MT widen.cpp
  cl -DCHGGLOBAL -EHsc -ML widen.cpp
  cl -DCHGGLOBAL -EHsc -MTd -DREINIT widen.cpp
  cl -DCHGGLOBAL -EHsc -MLd -DREINIT widen.cpp

produce the same result as with VC6.

These command lines:

  cl -DCHGGLOBAL -EHsc -MTd widen.cpp
  cl -DCHGGLOBAL -EHsc -MLd widen.cpp

produce:

  0 converted to 0, 2 left.
  2 converted to 2, 0 left.

So it seems that with the debug versions of runtime
libraries, the defloc constructed at global namespace
doesn't behave like the system default locale, while
with the release versions of libraries or constructing
the locale in main works as expected.

These:

  cl -EHsc -MT widen.cpp
  cl -EHsc -ML widen.cpp
  cl -EHsc -MTd -DREINIT widen.cpp
  cl -EHsc -MLd -DREINIT widen.cpp

produce:

  1 converted to 2, 0 left.
  0 converted to 0, 2 left.

The widening conversion doesn't work, and seems to depend
on the global locale.  I traced the program, and found
that codecvt<wchar_t,char,mbstate_t>::in() ultimately
called _cpp_isleadbyte() to check for a DBCS lead byte.
If I read it correctly, _cpp_isleadbyte() uses the global
locale.

Are these two problems:
  1. globally constructed locale + debuglib doesn't work,
  2. codecvt<>::in() depends on the global locale.
real bugs?  What's the best way to solve them?

Regards,
tx



Sat, 02 Apr 2005 17:09:56 GMT  
Wang,

Sorry to hear you're having problems. I tried to reproduce your problem on
VC7, but I always get the same result (different from yours):

    0 converted to 0, 1 left.
    2 converted to 2, 0 left.

Are you running on an operating system for a specific locale, or do you have
a local expansion pack installed on your operating system?

--
Martyn Lovell
Visual C++ Team
This posting is provided "AS IS" with no warranties, and confers no rights.





Tue, 05 Apr 2005 02:48:10 GMT  
Hi,

On Thu, 17 Oct 2002 11:48:10 -0700, "Visual C++ Team" wrote:

: Sorry to hear you're having problems. I tried to reproduce your problem on
: VC7, but I always get the same result (different from yours):
:
:     0 converted to 0, 1 left.
:     2 converted to 2, 0 left.
:
: Are you running on an operating system for a specific locale, or do you have
: a local expansion pack installed on your operating system?

Yes, I'm using codepage 936, which is the system default
locale on Win2K Simplified Chinese.

It seems the default locale on your machine is something
like codepage 1252, which doesn't use multibyte encoding,
so the narrowing conversion failed, and the widening
conversion resulted in two wchar_t's.

If you have any multibyte locales, such as codepage 936
or codepage 950, installed on your machine, you may want
to change the empty locale strings as in defloc("") and
::std::locale("") to something like ".936" and give the
test program another try.
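When trying an explicit name this way, note that std::locale's constructor throws std::runtime_error if the runtime does not recognize the name. A small sketch, assuming a hypothetical helper locale_or_classic that falls back to the classic "C" locale:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: construct the named locale, or fall back to the
// classic "C" locale when the runtime does not know the name.
std::locale locale_or_classic(const std::string& name)
{
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

For the test program this would be used as std::locale defloc = locale_or_classic(".936"); the ".936" form is the MSVC runtime's codepage spelling from this thread, and other platforms name their locales differently.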

Thanks for your concern.

Regards,
tx



Tue, 05 Apr 2005 11:50:28 GMT  

Thanks for the additional information.  You're running into a problem with
the C++ Standard Library in VC7.  The locale subsystem in that library has a
number of global variables which get initialized at startup.  Unfortunately,
these are not being initialized before user global variables, like your
defloc.  Instead, they're grouped with user globals, and the actual order of
initialization is fairly random.  That's why your example works with the
retail library (libcpmt.lib), but not the debug version (libcpmtd.lib).

There are a few possible workarounds:

1) Don't rely on global variables like defloc that use the C++ Std Lib
locale subsystem.  Instead, use local variables, or reinitialize your
globals after main() is entered (like you did with the -DREINIT code).
2) Make sure your global variables are initialized after the C++ Std Lib
locale subsystem is initialized.  Add the following lines to the start of
your program:

    #pragma warning(disable: 4075)
    #pragma init_seg(".CRT$XCV")

The init_seg causes your runtime startups to be associated with a segment
name, .CRT$XCV, which alphabetically comes after the default startup
segment, .CRT$XCU.  That's enough to get your variables initialized later.
Using the init_seg pragma in this way will trigger warning C4075, which you
can disable.
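A variation on workaround 1 is the "construct on first use" idiom: keep the locale in a function-local static so that it is built when first requested, not during static initialization. This is only a sketch; the names default_locale and make_default_locale are hypothetical, and the fallback to the classic locale is an added assumption, not something the original program does:

```cpp
#include <locale>
#include <stdexcept>

namespace {

// Build the system default locale, falling back to the classic "C"
// locale if the environment names a locale the runtime cannot load.
std::locale make_default_locale()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

} // namespace

// Hypothetical accessor replacing the global `defloc`: the locale is
// constructed on the first call, not during static initialization.
const std::locale& default_locale()
{
    static const std::locale loc = make_default_locale();
    return loc;
}
```

Code that used defloc would call default_locale() instead. As long as the first call happens after main() is entered, the construction no longer races with the library's own globals. Compilers that predate C++11 do not guarantee thread-safe initialization of function-local statics, so the first call is best made before any threads are started.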

Hope this helps.

--
Philip Lucido
Ruth Kurniawati
Visual C++ Team
This posting is provided "AS IS" with no warranties, and confers no rights.



Sun, 10 Apr 2005 02:45:27 GMT  
On Tue, 22 Oct 2002 11:45:27 -0700, "Visual C++ Team" wrote:

: Thanks for the additional information.  You're running into a problem with
: the C++ Standard Library in VC7.  The locale subsystem in that library has a
: number of global variables which get initialized at startup.  Unfortunately,
: these are not being initialized before user global variables, like your
: defloc.  Instead, they're grouped with user globals, and the actual order of
: initialization is fairly random.  That's why your example works with the
: retail library (libcpmt.lib), but not the debug version (libcpmtd.lib).

Thank you very much for the explanation and workarounds.

In my original message, I asked two questions.  One is the
initialization order problem, for which you've given a clear
answer.

The other is that codecvt<wchar_t,char,mbstate_t>::in()
depends on the global locale.  I think this is a clear bug
in the Dinkumware library, but I would still like some
confirmation, and would like to see it fixed in future
versions of the compiler/library.

Regards,
tx



Mon, 11 Apr 2005 00:19:49 GMT  

That's not how I read the code (though I'm in the middle of
a standards meeting and have divided attention). The facet
has long captured locale-specific data and passed it to
_Mbrtowc. Been a while since I looked at that function, but
IIRC it uses the captured information to decide how to do
the conversion.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



Mon, 11 Apr 2005 01:08:25 GMT  
On Wed, 23 Oct 2002 13:08:25 -0400, "P.J. Plauger" wrote:

: > The other is that codecvt<wchar_t,char,mbstate_t>::in()
: > is depending on the global locale.
:
: That's not how I read the code (though I'm in the middle of
: a standards meeting and have divided attention). The facet
: has long captured locale-specific data and passed it to
: _Mbrtowc. Been a while since I looked at that function, but
: IIRC it uses the captured information to decide how to do
: the conversion.

Well, it should work that way, but it doesn't.

At the heart of _Mbrtowc(), it calls _cpp_isleadbyte()
to decide whether the char at hand is a lead byte of a
multibyte character:

    else if ( _cpp_isleadbyte((unsigned char)*s) )
    {
      /* multibyte char */
      ...
    }

As I see it, the macro _cpp_isleadbyte() isn't using
any facet-specific data, all it is using is a table
pointed to by a global variable _pctype.
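The same kind of dependency can be seen with standard functions alone. The sketch below is an analogue, not the library's _cpp_isleadbyte(): it uses std::mbrlen, whose answer comes from the global C locale (LC_CTYPE) and not from any std::locale object. The helper name is hypothetical:

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: true when `c`, taken alone, is an incomplete
// multibyte sequence in the current global C locale (LC_CTYPE), i.e.
// when it behaves like a lead byte.
bool is_lead_byte_in_global_locale(unsigned char c)
{
    const char byte = static_cast<char>(c);
    std::mbstate_t state = std::mbstate_t();
    // mbrlen returns (size_t)-2 for a valid but incomplete sequence.
    return std::mbrlen(&byte, 1, &state) == static_cast<std::size_t>(-2);
}
```

Under the startup "C" locale an ASCII byte such as 'a' is never reported as a lead byte. With a double-byte codepage selected through setlocale, a byte like 0xd0 would be expected to report true, and no facet is consulted either way.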

You may want to investigate this problem again when
you have time.  For now, my program works with VC6.

Regards,
tx



Mon, 11 Apr 2005 14:18:04 GMT  

Quote:
> On Wed, 23 Oct 2002 13:08:25 -0400, "P.J. Plauger"

> : > The other is that codecvt<wchar_t,char,mbstate_t>::in()
> : > is depending on the global locale.
> :
> You may want to investigate this problem again when
> you have time.  For now, my program works with VC6.

Your analysis looks correct.  All code relying on _cpp_isleadbyte(),
including _Mbrtowc, _Tolower, and _Toupper, suffers from this issue.
Unfortunately, this bug isn't yet fixed in the upcoming VC++ 7.1 release.

The workaround is as you state: keep the global locale matching the one
in the std::locale object of interest.

The reason this works in VC6, but not VC7, is actually a further bug found
in VC6.  There, constructing a std::locale object, as in your variable
'defloc', would leave the global locale changed to be the same as that in
the new locale object.  VC7 fixed this, saving and restoring the global
locale around the construction of a std::locale object.
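One way to apply that workaround only around the affected calls is a small scope guard. This is a sketch, not something from the library: the class name GlobalLocaleGuard is hypothetical, and it relies only on std::locale::global() returning the previously installed global locale:

```cpp
#include <locale>

// Hypothetical scope guard: installs `loc` as the global locale for the
// lifetime of the guard, then restores the previous global locale.
class GlobalLocaleGuard
{
public:
    explicit GlobalLocaleGuard(const std::locale& loc)
        : previous_(std::locale::global(loc)) {}

    ~GlobalLocaleGuard() { std::locale::global(previous_); }

private:
    GlobalLocaleGuard(const GlobalLocaleGuard&);            // not copyable
    GlobalLocaleGuard& operator=(const GlobalLocaleGuard&); // not assignable

    std::locale previous_;
};
```

Inside convert(), declaring GlobalLocaleGuard guard(defloc); before the (cvt.*cvtf)(...) call would keep the global locale in step with defloc for just that call. The global locale is process-wide state, so this is not safe if other threads use locale-dependent functions at the same time.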

--
Philip Lucido
Ruth Kurniawati
Visual C++ Team
This posting is provided "AS IS" with no warranties, and confers no rights.



Tue, 12 Apr 2005 02:00:32 GMT  
 