Behavior of newline character in ANS Forth, using WRITE-FILE 

        I am evaluating GFORTH v0.4.0.

        I know that gForth was developed under Linux.

        It seems that I am unable to write a Unix-compatible text
        file using the DOS version of gForth:

        Newline character 0x0A is translated to a 2-character
        CR/LF sequence (0x0D,0x0A)  no matter how it is written.

        WRITE-LINE appends 0x0D,0x0A to the string.

        Attempt to WRITE-FILE a single linefeed character 0x0A to
        the output file inserts a carriage return 0x0D before the
        linefeed.

        WRITE-FILE from a buffer appended with 0x0A is also
        translated to u+1 characters, inserting 0x0D before
        the line feed at c-addr+u.

        Is this correct behavior for WRITE-FILE, and what would
        you recommend the workaround to be?

--
Douglas Beattie Jr.       http://www.*-*-*.com/ ~beattidp/



Mon, 22 Jul 2002 03:00:00 GMT  


Quote:
>.       I am evaluating GFORTH v0.4.0.

>        I know that gForth was developed under Linux.

>        It seems that I am unable to write a Unix-compatible text
>        file using the DOS version of gForth:

>        Newline character 0x0A is translated to a 2-character
>        CR/LF sequence (0x0D,0x0A)  no matter how it is written.

>        WRITE-LINE appends 0x0D,0x0A to the string.

>        Attempt to WRITE-FILE a single linefeed character 0x0A to
>        the output file inserts a carriage return 0x0D before the
>        linefeed.

>        WRITE-FILE from a buffer appended with 0x0A is also
>        translated to u+1 characters, inserting 0x0D before
>        the line feed at c-addr+u..

>        Is this correct behavior for WRITE-FILE, and what would
>        you recommend the workaround to be?

The workaround is to open the file in binary mode, i.e.,

S" file" r/w bin open-file

As far as the standard is concerned, what you described is one
possible correct behaviour, because the standard does not specify what
happens when you WRITE-FILE characters outside the range 32-126
(3.1.2.2).  It has also been the intended behaviour in Gforth for DOS
and Windows (there is no translation under Unix).
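A minimal sketch of that workaround, assuming a system where BIN suppresses the translation as described above. The word names write-lf-line and demo, the buffer lf-buf, and the file name demo-unix.txt are made up for illustration, and THROW assumes the Exception word set is present.

```forth
decimal

create lf-buf 10 c,                \ one character holding LF (0x0A)

\ Write a string followed by a bare LF (no CR) to an open file.
: write-lf-line ( c-addr u fileid -- ior )
   >r  r@ write-file               \ write the text itself
   ?dup if  r> drop exit  then     \ give up on the first write error
   lf-buf 1 r> write-file ;        \ then the LF terminator

\ Create a file in binary mode and write two LF-terminated lines.
: demo ( -- )
   s" demo-unix.txt" w/o bin create-file throw  >r
   s" first line"  r@ write-lf-line throw
   s" second line" r@ write-lf-line throw
   r> close-file throw ;
```

If BIN works as described, the resulting file holds the two strings plus one LF each and nothing else, so the Unix tools should see ordinary text lines.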

However, I am currently contemplating changing this behaviour to
address some problems in connection with FILE-SIZE, FILE-POSITION and
REPOSITION-FILE; the change would work like this: all files would be
opened in binary mode, binary mode on opening would have no effect.
WRITE-LINE would always write CRLF on DOS and Windows; READ-LINE would
probably ignore CRs (not clear about that yet).  READ-FILE and
WRITE-FILE would not perform any translation.

The disadvantages are that it becomes harder to write programs that
use WRITE-FILE and READ-FILE to process native text files under both
Unix and DOS/Windows; i.e., the goal of platform-independence and
portability suffers.  However, I don't know many programs that use
WRITE-FILE for text files (and the programs I know that use READ-FILE
treat all CR as white space).  However, when Gforth becomes available
on the Mac (should happen with MacOS X), this would really cause
problems (hmm, CR<->LF translation for the Mac, no translation for
DOS/Windows?)

What do you (clf readership) think about these issues?

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed

http://www.complang.tuwien.ac.at/anton/home.html



Tue, 23 Jul 2002 03:00:00 GMT  

Quote:



> >.       I am evaluating GFORTH v0.4.0.

> >        I know that gForth was developed under Linux.

> >        It seems that I am unable to write a Unix-compatible text
> >        file using the DOS version of gForth:

> >        Newline character 0x0A is translated to a 2-character
> >        CR/LF sequence (0x0D,0x0A)  no matter how it is written.

> >        WRITE-LINE appends 0x0D,0x0A to the string.

> >        Attempt to WRITE-FILE a single linefeed character 0x0A to
> >        the output file inserts a carriage return 0x0D before the
> >        linefeed.

> >        WRITE-FILE from a buffer appended with 0x0A is also
> >        translated to u+1 characters, inserting 0x0D before
> >        the line feed at c-addr+u..

> >        Is this correct behavior for WRITE-FILE, and what would
> >        you recommend the workaround to be?

> The workaround is to open the file in binary mode, i.e.,

> S" file" r/w bin open-file

> As far as the standard is concerned, what you described is one
> possible correct behaviour, because the standard does not specify what
> happens when you WRITE-FILE characters outside the range 32-126
> (3.1.2.2).  It has also been the intended behaviour in Gforth for DOS
> and Windows (there is no translation under Unix).

As I read section 3.1.2.2, it says:

"Programs that require the ability to send or receive control characters
have an environmental dependency"

My assumption was that they are talking about sending and receiving characters
via communications channels, like KEY and EMIT, not files.

Quote:

> However, I am currently contemplating changing this behaviour to
> address some problems in connection with FILE-SIZE, FILE-POSITION and
> REPOSITION-FILE; the change would work like this: all files would be
> opened in binary mode, binary mode on opening would have no effect.
> WRITE-LINE would always write CRLF on DOS and Windows; READ-LINE would
> probably ignore CRs (not clear about that yet).  READ-FILE and
> WRITE-FILE would not perform any translation.

I looked at what Win32Forth does, and it makes no assumption when opening a
file with the R/W "fam".  The assumption is encoded into WRITE-FILE and
WRITE-LINE.

WRITE-LINE assumes that you are writing ascii text, and WRITE-FILE assumes
you are writing binary data. To be more specific, BIN doesn't do anything in
Win32Forth.

Quote:

> The disadvantages are that it becomes harder to write programs that
> use WRITE-FILE and READ-FILE to process native text files under both
> Unix and DOS/Windows; i.e., the goal of platform-independence and
> portability suffers.  However, I don't know many programs that use
> WRITE-FILE for text files (and the programs I know that use READ-FILE
> treat all CR as white space).  However, when Gforth becomes available
> on the Mac (should happen with MacOS X), this would really cause
> problems (hmm, CR<->LF translation for the Mac, no translation for
> DOS/Windows?)

I agree, it makes things even messier.  The real problem is that, though the
standard defines "bin" as requiring a binary "file access method", it doesn't
also define a mechanism for requiring an ASCII file access method, or what the
default behavior must be for individual file access words.  So, while the
assumption that file transfers not specifying BIN must therefore be ASCII is
valid, the standard does not state that anywhere, at least as far as I can find.

I suppose that the only real way to write portable code would be to always
include the "bin" modifier, and then assume all data transfers were binary.

I can see why an implementer might choose to default file operations to
ASCII instead of binary.  I can only say that my choice to default to binary
has not caused problems of this sort, primarily, I believe, because I, and
probably all of the Win32Forth users, don't use READ-FILE and WRITE-FILE for
ASCII operations requiring line end translation.

Quote:

> What do you (clf readership) think about these issues?

> - anton
> --
> M. Anton Ertl                    Some things have to be seen to be believed

> http://www.complang.tuwien.ac.at/anton/home.html

Just my thoughts,

Tom Zimmer



Tue, 23 Jul 2002 03:00:00 GMT  

Quote:

>As far as the standard is concerned, what you described is one
>possible correct behaviour, because the standard does not specify what
>happens when you WRITE-FILE characters outside the range 32-126
>(3.1.2.2).  It has also been the intended behaviour in Gforth for DOS
>and Windows (there is no translation under Unix).
>However, I am currently contemplating changing this behaviour to
>address some problems in connection with FILE-SIZE, FILE-POSITION and
>REPOSITION-FILE; the change would work like this: all files would be
>opened in binary mode, binary mode on opening would have no effect.
>WRITE-LINE would always write CRLF on DOS and Windows; READ-LINE would
>probably ignore CRs (not clear about that yet).  READ-FILE and
>WRITE-FILE would not perform any translation.
>What do you (clf readership) think about these issues?

READ-LINE should give you a line with no CR or LF on all systems for all
files.  So regardless which of the 4 possible combinations of CR and LF get
used your users should be able to read files.

You should give your users some way to choose what combination of CR LF
WRITE-LINE writes.  Let the default for new files be whatever the current OS
uses.  If a file already has some lines with one version, let that be the
default.  If you can find some way to let your users easily find out how to
change that, without bothering them with that information when they aren't
interested, that would be truly excellent.



Tue, 23 Jul 2002 03:00:00 GMT  
On 4 Feb 2000 09:48:07 GMT,

Quote:


>>.       I am evaluating GFORTH v0.4.0.

>>        I know that gForth was developed under Linux.

>>        It seems that I am unable to write a Unix-compatible text
>>        file using the DOS version of gForth:

>>        Newline character 0x0A is translated to a 2-character
>>        CR/LF sequence (0x0D,0x0A)  no matter how it is written.

[snip]

Quote:
>However, I am currently contemplating changing this behaviour to
>address some problems in connection with FILE-SIZE, FILE-POSITION and
>REPOSITION-FILE; the change would work like this: all files would be
>opened in binary mode, binary mode on opening would have no effect.
>WRITE-LINE would always write CRLF on DOS and Windows; READ-LINE would
>probably ignore CRs (not clear about that yet).  READ-FILE and
>WRITE-FILE would not perform any translation.

Just defer the terminator. Then the user can set it to anything,
0D0A , 0A , FFFF , nothing , etc., as needed.
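One way to read this suggestion, as a sketch only: keep the terminator behind a deferred word that returns the bytes to append. DEFER and IS are not in ANS Forth 94 (they are in Gforth and in Forth 2012). Every other name here (eol, put-line, crlf-eol, lf-eol, no-eol) is hypothetical, and the file is assumed to be opened with BIN so that the system adds no translation of its own.

```forth
decimal

create crlf-bytes 13 c, 10 c,      \ CR LF
create lf-bytes   10 c,            \ LF

: crlf-eol ( -- c-addr u )  crlf-bytes 2 ;
: lf-eol   ( -- c-addr u )  lf-bytes 1 ;
: no-eol   ( -- c-addr u )  lf-bytes 0 ;   \ empty terminator

defer eol  ( -- c-addr u )         \ current line terminator
' lf-eol is eol                    \ default chosen for this sketch

\ Write a string followed by whatever EOL currently returns.
: put-line ( c-addr u fileid -- ior )
   >r  r@ write-file
   ?dup if  r> drop exit  then     \ stop on the first write error
   eol r> write-file ;
```

A program that wants DOS-style output would then say `' crlf-eol is eol` before writing, and `' no-eol is eol` to write no terminator at all.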

--

WasteLand  http://www.dhc.net/~tzegub
|_|_|_|_  
  | | | |  http://www.dhc.net/~tzegub/fop.htm



Tue, 23 Jul 2002 03:00:00 GMT  
For any ANS Forth system, for any ANS Forth program,
there is an environmental dependency on how portability
is understood.

I am used to the fact that to port a text from Unix to DOS,
I need to recode it ( LF --> CR,LF ).

Which program do you call portable:

a) the one that takes the Unix text and works correctly under DOS/Win

b) the one that takes the equivalent DOS/Win text and works correctly
under DOS/Win

?

In the first case, we mean (in the prefix notation) that

(Forth-Unix Source-Unix) = (Forth-Win Source-Unix)

In the second case, we mean that

(Forth-Unix Source-Unix) = (To-Unix (Forth-Win (To-Win Source-Unix)))

IMO, you can have only one of these.

Regards, Michael

Quote:



> >.       I am evaluating GFORTH v0.4.0.

> >        I know that gForth was developed under Linux.

> >        It seems that I am unable to write a Unix-compatible text
> >        file using the DOS version of gForth:

> >        Newline character 0x0A is translated to a 2-character
> >        CR/LF sequence (0x0D,0x0A)  no matter how it is written.

[...]
> >        Is this correct behavior for WRITE-FILE, and what would
> >        you recommend the workaround to be?

> The workaround is to open the file in binary mode, i.e.,

> S" file" r/w bin open-file

> As far as the standard is concerned, what you described is one
> possible correct behaviour, because the standard does not specify what
> happens when you WRITE-FILE characters outside the range 32-126
> (3.1.2.2).  It has also been the intended behaviour in Gforth for DOS
> and Windows (there is no translation under Unix).

> However, I am currently contemplating changing this behaviour to
> address some problems in connection with FILE-SIZE, FILE-POSITION and
> REPOSITION-FILE; the change would work like this: all files would be
> opened in binary mode, binary mode on opening would have no effect.
> WRITE-LINE would always write CRLF on DOS and Windows; READ-LINE would
> probably ignore CRs (not clear about that yet).  READ-FILE and
> WRITE-FILE would not perform any translation.

> The disadvantages are that it becomes harder to write programs that
> use WRITE-FILE and READ-FILE to process native text files under both
> Unix and DOS/Windows; i.e., the goal of platform-independence and
> portability suffers.  However, I don't know many programs that use
> WRITE-FILE for text files (and the programs I know that use READ-FILE
> treat all CR as white space).  However, when Gforth becomes available
> on the Mac (should happen with MacOS X), this would really cause
> problems (hmm, CR<->LF translation for the Mac, no translation for
> DOS/Windows?)

> What do you (clf readership) think about these issues?

> - anton
> --
> M. Anton Ertl                    Some things have to be seen to be believed

> http://www.complang.tuwien.ac.at/anton/home.html

--

To avoid misinterpretation of the Standard, do not write
standard programs, write Standard Systems.
:-)



Wed, 24 Jul 2002 03:00:00 GMT  

Quote:

> WRITE-LINE assumes that you are writing ascii text, and WRITE-FILE
> assumes you are writing binary data. To be more specific, BIN doesn't
> do anything in Win32Forth.

Well then, perhaps it should.  Is BIN an irreversible process?  Doesn't
it imply that binary as "fam" should happen only when BIN is executed?

--
Douglas Beattie Jr.



Wed, 24 Jul 2002 03:00:00 GMT  

Quote:


> > WRITE-LINE assumes that you are writing ascii text, and WRITE-FILE
> > assumes you are writing binary data. To be more specific, BIN doesn't
> > do anything in Win32Forth.

> Well then, perhaps it should..  is BIN an irreversible process?  Doesn't
> it imply that binary as "fam" should happen only when BIN is executed?

The typical use of BIN is

S" filename.ext" R/O BIN OPEN-FILE ...

BIN ( fam1 -- fam2 )
is used only on values returned by constants R/O W/O R/W .
NB: fam is not fileid.

: UNBIN ( fam2 -- fam1 )
        CASE
        R/O BIN OF R/O ENDOF
        R/W BIN OF R/W ENDOF
        W/O BIN OF W/O ENDOF
        DUP                     \ ENDCASE performs DROP
        ENDCASE
;

and then what?

Maybe, you wanted to re-open a file?

Quote:
> --
> Douglas Beattie Jr.

I myself understand the meaning of BIN as:

if the file is opened for char i/o, tabs (ASCII 9's) are allowed
to be expanded to spaces;
if the file is opened for bin i/o, tabs, etc. must be left as is.

OTOH, the fact itself that a file is opened for char i/o, doesn't
mean that tabs will be processed in any peculiar manner,
so you cannot reckon on tab expansion.
But you cannot reckon on tab non-expansion, either.

(And there still are 29 other characters whose codes are < 32 !)

Regards, Michael



Wed, 24 Jul 2002 03:00:00 GMT  

Quote:
> I am evaluating GForth....

AFAIK in every good TCP/IP-Package there is the possibility of copying
files in binary mode or copying in text mode, which automatically
converts the newline character of Unix systems to the cr-lf-sequence of
MS-DOS and vice versa.

You will get a similar problem when you copy binary files between
little-endian and big-endian-machines. Therefore the specs for RPC, the
remote procedure calls, include the specification, that integers have to
be sent big endian, and a standard function exists, which converts from
machine format to the network encoding and back again (which on a
big-endian-system obviously does nothing). I do not remember the specs
for floating point numbers, but the problem was already known when
network software was invented, and it was assumed that the network software
has to do some translation.
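The byte-order half of this can be sketched in Forth as well, assuming cells of at least 32 bits and byte-sized characters. The names >be32, be32> and net-buf are illustrative, not part of any standard; they play roughly the role that htonl and ntohl play in C.

```forth
decimal

create net-buf 4 allot             \ room for one 32-bit value

\ Store the low 32 bits of u at c-addr, most significant byte first.
: >be32 ( u c-addr -- )
   >r
   dup 24 rshift 255 and  r@       c!
   dup 16 rshift 255 and  r@ 1 +   c!
   dup  8 rshift 255 and  r@ 2 +   c!
                 255 and  r> 3 +   c! ;

\ Fetch a big-endian 32-bit value from c-addr.
: be32> ( c-addr -- u )
   0  4 0 do                       ( c-addr acc )
      8 lshift  over i + c@ or     \ shift in the next byte
   loop  nip ;
```

Because the bytes are picked apart with shifts rather than copied from memory, the same source gives the same file contents on little-endian and big-endian hosts.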

This is the _general_ problem of sharing data between systems with
different internal representations for numbers or characters or whatever
we like to encode in bytes.  To solve it you need standards "on the net",
like ASCII is for characters, and you have to convert your internal
representation to the network representation when you copy files to
different systems, as long as you don't use the network representation
internally.

Best regards

kseege



Wed, 24 Jul 2002 03:00:00 GMT  

Quote:


> > WRITE-LINE assumes that you are writing ascii text, and WRITE-FILE
> > assumes you are writing binary data. To be more specific, BIN doesn't
> > do anything in Win32Forth.

> Well then, perhaps it should..  is BIN an irreversible process?  Doesn't
> it imply that binary as "fam" should happen only when BIN is executed?

Hmm...,  I don't find anything in the standard that specifies that anything
other than binary mode has to be supported, or is even an environmental
dependency.

If there is a word for selecting binary, shouldn't it say something about what
the mode is when binary is not used?  I don't find it. So..  Win32Forth only
supports binary operations for WRITE-FILE and READ-FILE, whereas WRITE-LINE
and READ-LINE only support ascii, which is what most people would want to do
anyway.

I don't expect anyone to agree with me on this, however; I realize my
perspective is pretty (well, ok, very) narrow.

Just my thoughts,

Tom Zimmer



Wed, 24 Jul 2002 03:00:00 GMT  


Quote:
>WRITE-LINE assumes that you are writing ascii text, and WRITE-FILE assumes
>you are writing binary data. To be more specific, BIN doesn't do anything in
>Win32Forth.

So Win32Forth already does what I am contemplating for Gforth.
Another reason for changing Gforth in this direction.

Apart from ignoring BIN, how does WRITE-LINE assume text and
WRITE-FILE assume binary data?

Quote:
>I agree, it makes things even messier.  The real problem is that, though the
>standard defines "bin" as requiring a binary "file access method", it doesn't
>also define a mechanism for requiring an ASCII file access method, or what the
>default behavior must be for individual file access words.  So, while the
>assumption that file transfers not specifying BIN must therefore be ASCII is
>valid, the standard does not state that anywhere, at least as far as I can find.

It doesn't state much anyway.  I can imagine standard systems that
translate between UTF-16 (internal) and UTF-8 (external) or between
ASCII (internal) and EBCDIC (external) without BIN.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed

http://www.complang.tuwien.ac.at/anton/home.html



Thu, 25 Jul 2002 03:00:00 GMT  


Quote:
>READ-LINE should give you a line with no CR or LF on all systems for all
>files.  So regardless which of the 4 possible combinations of CR and LF get
>used your users should be able to read files.

It's not so simple.  Consider

abcd<CR>ef<CR><LF>g<LF>h<LF><CR>i

Where should the line ends be on Unix, DOS/Windows, MacOS?

E.g., if we interpret both CR and LF as newlines, we get

abcd
ef

g
h

i

OTOH, if we view only LF as newline (Unixoid), we get

abcd<CR>ef<CR>
g
h
<CR>i

If we view only the CRLF combo as newline (DOSoid), we get

abcd<CR>ef
g<LF>h<LF><CR>i

If we view LF as newline but suppress CR (DOS/Unix hybrid 1), we get

abcdef
g
h
i

Other variants: Suppress CR only before LF (DOS/Unix hybrid 2), use CR
as newline (MacOS), view CR and LF as newline, but CRLF only as one
newline (DOS/Unix/MacOS hybrid?).
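To make the comparison concrete, here is a small sketch that counts how many lines the example string yields under three of these readings. It is only an illustration: all names (cr-char, lf-char, sample, count-char, count-crlf and the lines-... words) are invented here, and a "line" is simply taken to be one more than the number of line breaks found.

```forth
decimal
13 constant cr-char
10 constant lf-char

\ The example string: abcd<CR>ef<CR><LF>g<LF>h<LF><CR>i
create sample
   char a c, char b c, char c c, char d c, cr-char c,
   char e c, char f c, cr-char c, lf-char c,
   char g c, lf-char c,
   char h c, lf-char c, cr-char c,
   char i c,
here sample - constant /sample

\ Count occurrences of char in the string.
: count-char ( c-addr u char -- n )
   0 2swap                         ( char n c-addr u )
   0 ?do                           ( char n c-addr )
      dup i + c@  3 pick = if  swap 1+ swap  then
   loop  drop nip ;

\ Count occurrences of CR immediately followed by LF.
: count-crlf ( c-addr u -- n )
   dup 2 < if  2drop 0 exit  then
   0 swap 1-  0 do                 ( c-addr n )
      over i +  dup c@ cr-char =   ( c-addr n a f1 )
      swap char+ c@ lf-char =  and ( c-addr n f )
      if  1+  then
   loop  nip ;

: lines-lf-only   ( c-addr u -- n )  lf-char count-char 1+ ;
: lines-crlf-only ( c-addr u -- n )  count-crlf 1+ ;
: lines-cr-or-lf  ( c-addr u -- n )
   2dup cr-char count-char >r  lf-char count-char  r> + 1+ ;

sample /sample lines-cr-or-lf  .   \ 7  (CR and LF both end a line)
sample /sample lines-lf-only   .   \ 4  (Unixoid)
sample /sample lines-crlf-only .   \ 2  (DOSoid)
```

The three counts match the three listings above: seven, four and two lines for the same fifteen characters.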

Can someone explain how this is done in C on MacOS?  Is '\n' LF or CR
(and what is '\r')?  Is there a difference between opening a file as
text or binary file (is there translation)?

Quote:
>You should give your users some way to choose what combination of CR LF
>WRITE-LINE writes.  Let the default for new files be whatever the current OS
>uses.  If a file already has some lines with one version, let that be the
>default.  If you can find some way to let your users easily find out how to
>change that, without bothering them with that information when they aren't
>interested, that would be truly excellent.

Too complex, for little gain IMO.  If they want to control it, they
can use WRITE-FILE (with BIN), or can postprocess the file with a
converter.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed

http://www.complang.tuwien.ac.at/anton/home.html



Thu, 25 Jul 2002 03:00:00 GMT  

Quote:

>For any ANS Forth system, for any ANS Forth program,
>there is an environmental dependency on how portability
>is understood.

>I am used to the fact that to port a text from Unix to DOS,
>I need to recode it ( CR --> CR,LF ).

>Which program do you call portable:

>a) the one that takes the Unix text and works correctly under DOS/Win

>b) the one that takes the equivalent DOS/Win text and works correctly
>under DOS/Win

>?

b (and of course on Unix it should understand and produce Unix-style
text files).

Interoperability with other programs on the same system is a must.

I think the users would hate it if I wrote, say, a compiler in Forth,
and the users could not prepare the source files with their favourite
text editor, but would have to convert the file into Unix format
first.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed

http://www.complang.tuwien.ac.at/anton/home.html



Thu, 25 Jul 2002 03:00:00 GMT  

Quote:


>>READ-LINE should give you a line with no CR or LF on all systems for all
>>files.  So regardless which of the 4 possible combinations of CR and LF get
>>used your users should be able to read files.
>It's not so simple.  Consider
>abcd<CR>ef<CR><LF>g<LF>h<LF><CR>i

Would you expect to have a file that does this?  Certainly not in
DOS/Windows, there you'd expect to always have both together.  If only LF is
the newline, what are those <CR>'s doing there?  If only CR is the newline
why are the <LF>'s there?  I had the idea that you could read any of the
three main variants (CRLF CR LF) unambiguously, by:

1.  Read a line up to the next CR or LF.  Put the delimiter in one of the
two spaces after the buffer.
2.  If the next character is the *other* delimiter, put it in the second
space after the buffer.

But if you have weird combinations in the same file, that won't work.
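The two steps can be sketched on a string already in memory, leaving aside the detail of storing the delimiters after the buffer. This is an illustration under that assumption, not a READ-LINE replacement; the names cr-char, lf-char, eol?, scan-eol, skip-eol, split-line and count-lines are all made up.

```forth
decimal
13 constant cr-char
10 constant lf-char

: eol? ( char -- flag )  dup cr-char =  swap lf-char =  or ;

\ Step 1: advance to the first CR or LF, or to the end of the string.
: scan-eol ( c-addr u -- c-addr' u' )
   begin  dup while
      over c@ eol? if  exit  then
      1 /string
   repeat ;

\ Step 2: step past one delimiter, and past the next character too
\ if it is the *other* delimiter (so CR, LF, CRLF and LFCR all work).
: skip-eol ( c-addr u -- c-addr' u' )
   dup 0= if  exit  then
   over c@ >r  1 /string           \ R: the delimiter just consumed
   dup if
      over c@  dup eol?  swap r@ <>  and
      if  1 /string  then
   then
   r> drop ;

\ Split off the first line; return the remainder and the line.
: split-line ( c-addr u -- rest-addr rest-u line-addr line-u )
   2dup scan-eol                   ( c-addr u eol-addr eol-u )
   2swap  2 pick -                 ( eol-addr eol-u c-addr line-u )
   2>r  skip-eol  2r> ;

\ Count the lines found by repeated splitting.
: count-lines ( c-addr u -- n )
   0 >r
   begin  dup while
      split-line 2drop  r> 1+ >r
   repeat  2drop r> ;
```

Applied to the mixed example quoted above, this rule finds five lines (abcd, ef, g, h, i), because it treats both <CR><LF> and <LF><CR> as a single terminator. That is a tidy result, but it is one more interpretation, not something any of the three systems would necessarily agree with.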

If the systems that use only one of CR and LF as a delimiter, also result in
files that have the other of CR and LF scattered through the text like
raisins in a pudding, then all bets are off and there's no consistent way
you can import text from other systems.

Quote:
>Can someone explain how this is done in C on MacOS; Is '\n' LF or CR
>(and what is '\r').  Is there a difference between opening a file as
>text or binary file (is there translation)?
>>You should give your users some way to choose what combination of CR LF
>>WRITE-LINE writes.  Let the default for new files be whatever the current OS
>>uses.  If a file already has some lines with one version, let that be the
>>default.  If you can find some way to let your users easily find out how to
>>change that, without bothering them with that information when they aren't
>>interested, that would be truly excellent.
>Too complex, for little gain IMO.  If they want to control it, they
>can use WRITE-FILE (with BIN), or can postprocess the file with a
>converter.

Your choice.  The standard clearly says this is implementation-defined.


Thu, 25 Jul 2002 03:00:00 GMT  
Anton says:

Quote:
>It's not so simple.  Consider
>abcd<CR>ef<CR><LF>g<LF>h<LF><CR>i
>Where should the line ends be on Unix, DOS/Windows, MacOS?

Also note that LF on a teletype actually does a line feed without a
carriage return.  And a <CR> without a <LF> will cause overstrikes.
Since overstrikes are very difficult to emulate now, one possible
interpretation of the above might be:

efcd
g
 h
i

Particularly note the white space preceding the 'h'.

Bob



Thu, 25 Jul 2002 03:00:00 GMT  
 