Sorting a Huge Unicode File Using strcmp(unsigned char *, unsigned char *)

Quote:

> Hi everyone,
>         I have downloaded a huge text file (in Chinese, 21 MB).

>         I used to use msort to sort big files. It works perfectly
> for English files but not for Chinese ones.

>         This may be due to the strcmp function it uses. strcmp
> takes two (char *) arguments, and char is signed.

Not necessarily. It's up to the implementation whether a plain char is
signed or unsigned.

Quote:
> What I want is a strcmp
> that takes two (unsigned char *) arguments.

>         My question is: where can I download a sorting program
> that works with Unicode and can sort extremely large files? Or where
> can I download the source code so that I can change it to use
> unsigned char?

To compare Unicode strings use wcscmp, prototyped in wchar.h. It's like
strcmp, but it takes two arguments of type wchar_t*, that is, pointer to
wide character.
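
A minimal sketch of how wcscmp could serve as the comparison function
for qsort, assuming the lines of the file have already been decoded into
wchar_t strings (that step depends on the file's encoding and is not
shown). The names compare_wide and sort_wide_lines are made up for
illustration. Note that wcscmp orders strings by wchar_t value, not by
any dictionary order; wcscoll, also in wchar.h, compares according to
the current locale instead.

```c
#include <stdlib.h>
#include <wchar.h>

/* qsort comparator (hypothetical name): each array element is a
   pointer to a wide string, so qsort hands us pointers to pointers. */
static int compare_wide(const void *pa, const void *pb)
{
    const wchar_t *a = *(const wchar_t *const *)pa;
    const wchar_t *b = *(const wchar_t *const *)pb;
    return wcscmp(a, b);   /* orders by wchar_t value */
}

/* Sorts an array of wide strings in place (hypothetical helper). */
void sort_wide_lines(const wchar_t **lines, size_t count)
{
    qsort(lines, count, sizeof *lines, compare_wide);
}
```

Called as sort_wide_lines(lines, n), this reorders the pointers only;
the strings themselves are not copied.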

--
Pete Becker
Dinkumware, Ltd.
http://www.*-*-*.com/



Sun, 31 Dec 2000 03:00:00 GMT  
Hi everyone,
        I have downloaded a huge text file (in Chinese, 21 MB).

        I used to use msort to sort big files. It works perfectly
for English files but not for Chinese ones.

        This may be due to the strcmp function it uses. strcmp
takes two (char *) arguments, and char is signed. What I want is a
strcmp that takes two (unsigned char *) arguments.

        My question is: where can I download a sorting program
that works with Unicode and can sort extremely large files? Or where
can I download the source code so that I can change it to use
unsigned char?

        Please help!

Yick Yan



Mon, 01 Jan 2001 03:00:00 GMT  

Quote:


>> Hi everyone,
>>         I have downloaded a huge text file (in Chinese, 21 MB).

>>         I used to use msort to sort big files. It works perfectly
>> for English files but not for Chinese ones.

>>         This may be due to the strcmp function it uses. strcmp
>> takes two (char *) arguments, and char is signed.

>Not necessarily. It's up to the implementation whether a plain char is
>signed or unsigned.

This doesn't matter here. strcmp() is defined to compare the strings by
interpreting the individual characters as unsigned char values.

Quote:
>> What I want is a strcmp
>> that takes two (unsigned char *) arguments.

The results would be no different if it did.
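
A small sketch to illustrate the point, assuming a conforming C library:
a hand-written comparison that takes unsigned char pointers gives the
same ordering as strcmp, even for bytes of 0x80 and above. The names
ustrcmp and sign_of are invented for this example.

```c
#include <string.h>

/* Hypothetical helper: byte-wise comparison through unsigned char
   pointers, i.e. the function the original poster asked for. */
int ustrcmp(const unsigned char *a, const unsigned char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* Reduces a comparison result to -1, 0 or 1, since strcmp only
   promises the sign of its return value. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}
```

For any two strings s and t, sign_of(strcmp(s, t)) should equal
sign_of(ustrcmp((const unsigned char *)s, (const unsigned char *)t)),
so switching to unsigned char pointers would not change the sort order.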




Mon, 01 Jan 2001 03:00:00 GMT  

: Hi everyone,
:       I have downloaded a huge text file (in Chinese, 21 MB).

:       I used to use msort to sort big files. It works perfectly
: for English files but not for Chinese ones.

:       This may be due to the strcmp function it uses. strcmp
: takes two (char *) arguments, and char is signed. What I want is a
: strcmp that takes two (unsigned char *) arguments.

:       My question is: where can I download a sorting program
: that works with Unicode and can sort extremely large files? Or where
: can I download the source code so that I can change it to use
: unsigned char?

As someone else pointed out, there is a function (I don't remember the
name) to compare Unicode characters.  I have a module that sorts huge
amounts of data using your own comparison function; look for "bigsort.c" in

  http://www.pci.uni-heidelberg.de/tc/usr/joerg/prg/testbigsort.tar.gz

It should be fairly easy to adapt the required solution from this code.
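
For readers unfamiliar with the idea, here is a rough sketch of the merge
step that sorts of this kind are typically built around, with the
comparison passed in as a function pointer. It is not taken from
bigsort.c; the names line_cmp and merge_runs are hypothetical, and a real
big-file sort would merge runs stored in temporary files rather than
arrays in memory.

```c
#include <stddef.h>
#include <string.h>

/* Caller-supplied comparison, same shape as strcmp (hypothetical type). */
typedef int (*line_cmp)(const char *, const char *);

/* Merges two already-sorted runs of lines into out, which must have
   room for na + nb pointers. Returns the number of pointers written. */
size_t merge_runs(const char **a, size_t na,
                  const char **b, size_t nb,
                  const char **out, line_cmp cmp)
{
    size_t i = 0, j = 0, k = 0;

    /* Take the smaller head element each time; <= keeps the merge stable. */
    while (i < na && j < nb)
        out[k++] = (cmp(a[i], b[j]) <= 0) ? a[i++] : b[j++];

    /* One run is exhausted; copy whatever remains of the other. */
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];

    return k;
}
```

Passing strcmp as cmp gives plain byte order; any other function with
the same signature can be swapped in without touching the merge itself.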

Hope that helps, Joerg

:       Please Help!!!

: Yick Yan

--
                                                    \|/

------------------------------------------------oOO-(_)-OOo---------
        Joerg Schoen
E-mail: Joerg.Schoen AT tc DOT pci DOT uni-heidelberg DOT de
Web-Page: http://www.pci.uni-heidelberg.de/tc/usr/joerg
--------------------------------------------------ooO-Ooo-----------



Mon, 01 Jan 2001 03:00:00 GMT  
 