C-FFI: GC questions, wchar_t, Melange redesign 

Eric Kidd wrote on 1999-January-24:

Quote:
> * Harlequin's C FFI allows a "char *" to be extracted from a <byte-string>
>   and passed straight to C "without copying". This obviously has lots of
>   repercussions--byte strings must be represented as NULL-terminated
>   character arrays, and the garbage collector will have to be careful
>   about moving them. What's going on here, and how much of it can or
>   should be made portable?

Interesting... how do other FFIs handle strings? Does the HLQN FFI (sorry, I
don't have it handy right now) really state that the strings must be passed
to C without copying? Or are the semantics of the FFI such that an
implementation could copy if it wanted?
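
To make the "copy if it wanted" alternative concrete, here is a minimal C sketch. It assumes a hypothetical length-counted byte_string struct and a made-up helper name, byte_string_to_c; neither is Harlequin's actual representation or API. The point is only that a runtime whose strings carry no terminator can still hand C a "char *" by copying:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string; NOT any real Dylan runtime's layout. */
typedef struct {
    size_t size;                 /* number of bytes in data */
    const unsigned char *data;   /* no terminator required */
} byte_string;

/* Copy the bytes into a fresh NUL-terminated buffer that the garbage
   collector never sees. The caller frees the result.
   Returns NULL if allocation fails. */
char *byte_string_to_c(const byte_string *s)
{
    assert(s != NULL);
    char *out = malloc(s->size + 1);
    if (out == NULL)
        return NULL;
    if (s->size > 0)
        memcpy(out, s->data, s->size);
    out[s->size] = '\0';
    return out;
}
```

The copy costs an allocation per call, but the collector stays free to move the original, and the string does not need a terminator in its heap representation. A string with an embedded zero byte will still look truncated to C code.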

Quote:
>Since people are looking at internationalization, we should probably also
>think about how wchar_t interacts with <unicode-string>.

This is a bit more tricky: different platforms have different sizes for
wchar_t. For example, GCC on Solaris uses 4 bytes: twice as much as needed
for Unicode in Dylan. It would be tremendously wasteful to use 4-byte
characters in an implementation, so one would either use 2-byte characters
or another encoding, such as UTF-8, internally. In any event, if the C
mapping from <unicode-string> is "wchar_t *" then this probably cannot be
done without copying.
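
A minimal C sketch of why the copy is hard to avoid, assuming the implementation stores 2-byte code units internally. The helper name widen_units is hypothetical. Wherever wchar_t is wider than 16 bits, every unit has to be rewritten into a new array before C can see it as "wchar_t *":

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Widen n 16-bit code units into a fresh, zero-terminated wchar_t array.
   Hypothetical helper; the caller frees the result.
   Units are copied one-for-one, so surrogate pairs are not combined.
   Returns NULL if allocation fails. */
wchar_t *widen_units(const uint16_t *units, size_t n)
{
    assert(units != NULL || n == 0);
    wchar_t *out = malloc((n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (wchar_t)units[i];
    out[n] = L'\0';
    return out;
}
```

On a platform where wchar_t happens to be 16 bits the loop degenerates into a plain copy, which an FFI could in principle skip, but portable code cannot rely on that.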

I'm still thinking about these issues --- stay tuned.

    -tre



Sat, 21 Jul 2001 03:00:00 GMT  

Quote:
>>Since people are looking at internationalization, we should probably also
>>think about how wchar_t interacts with <unicode-string>.
>
>This is a bit more tricky: different platforms have different sizes for
>wchar_t. For example, GCC on Solaris uses 4 bytes: twice as much as needed
>for Unicode in Dylan. It would be tremendously wasteful to use 4-byte
>characters in an implementation, so one would either use 2-byte characters
>or another encoding, such as UTF-8, internally. In any event, if the C
>mapping from <unicode-string> is "wchar_t *" then this probably cannot be
>done without copying.

Oh... wchar_t is typedef'ed to long on Solaris to support ISO/IEC 10646
characters in UCS-4 format.  Unicode 2.1 (with surrogates) is equivalent to
ISO/IEC 10646 in UTF-16 format.
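
The relationship between the two formats can be sketched in C. The function name combine_surrogates is made up for illustration; the arithmetic is the standard UTF-16 rule for turning a high/low surrogate pair into a single UCS-4 code point:

```c
#include <assert.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one UCS-4 code point.
   high must be in 0xD800..0xDBFF and low in 0xDC00..0xDFFF. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}
```

Code units outside the surrogate ranges map to the same value in UCS-4, so only these pairs need special handling when converting between a 2-byte string and a 4-byte wchar_t array.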

For future internationalization (and for present support of XML) the
common-extensions library ought to support an <iso10646-string> class.
There also needs to be a generic function (possibly in a separate
internationalization library),
as-string(class :: <class>, string :: <sequence>, #key encoding),
for converting to and from <byte-string> sequences in different native
encodings.

Depending on how wide wchar_t is, the FFI should be defined to map "wchar_t
*" to either <unicode-string> or <iso10646-string>.
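
One way an FFI generator might make that choice is a compile-time test on WCHAR_MAX from <wchar.h>. This is only a sketch: the enum and the function name wchar_mapping are hypothetical, and the enum tags simply stand in for the two Dylan classes named above.

```c
#include <wchar.h>

/* Hypothetical tags standing in for the two Dylan string classes. */
enum wide_mapping { MAP_UNICODE_STRING, MAP_ISO10646_STRING };

/* Pick the mapping for "wchar_t *" from the width of wchar_t on the
   platform the C compiler targets. */
enum wide_mapping wchar_mapping(void)
{
#if WCHAR_MAX > 0xFFFF
    return MAP_ISO10646_STRING;   /* wchar_t can hold UCS-4 values */
#else
    return MAP_UNICODE_STRING;    /* wchar_t is limited to 16 bits */
#endif
}
```

Because the decision is made per platform, the same Dylan source would see a different class for "wchar_t *" on different targets, so portable code would still need the conversion function discussed above.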




Sat, 21 Jul 2001 03:00:00 GMT  
 