Remove trailing newlines from file 

I'm new to awk and looking for some clarification of my understanding
of a bit of code.  I wanted a solution to remove the trailing newlines
from a file, so I hopped over to DejaGoo and ran a search.  I found a
post from Ian Stirling which suggested:

gawk 'BEGIN{RS="\n+";ORS=""}x{print x}x=RT' infile

This works, but I'm having a bit of a time understanding it.  The
BEGIN section sets the record separator to one or more newlines and
the output record separator to nothing.  According to "sed & awk", RT
is the record terminator.  How does this differ from RS?

Here's my interpretation of the body:

x { print x }

This declares a variable x and doesn't assign an initial value.  It
then prints whatever x contains.

x = RT

This assigns the value of whatever RT contains to x.

If I had to pseudo-code this, I think it would look like:

        declare variable x with no initial value
        for each input line
                print x on the output if it contains a value
                print the current record
                assign the value of the record terminator to x

Is this correct?  If so, then it looks like the trailing newlines
don't get printed because x is never printed after the last line of
input.

--
Charles Calvert             |  Software Design/Development
Celtic Wolf, Inc.           |  Project Management
http://www.celticwolf.com/  |  Technical Writing
(703) 580-0210              |  Research



Tue, 03 Feb 2004 03:35:01 GMT  

Quote:

>This works, but I'm having a bit of a time understanding it.  The
>BEGIN section sets the record separator to one or more newlines and
>the output record separator to nothing.  According to "sed & awk", RT
>is the record terminator.  How does this differ from RS?

I'm guessing here, but if RS can be a regular expression, perhaps RT
is the character(s) which matched the expression?

Quote:
>Here's my interpretation of the body:

>x { print x }

>This declares a variable x and doesn't assign an initial value.  It
>then prints whatever x contains.

>x = RT

>This assigns the value of whatever RT contains to x.

>If I had to pseudo-code this, I think it would look like:

> declare variable x with no initial value
> for each input line
> print x on the output if it contains a value
> print the current record
> assign the value of the record terminator to x

>Is this correct?  If so, then it looks like the trailing newlines
>don't get printed because x is never printed after the last line of
>input.

Normally a "print" statement prints what you tell it to (or $0 by default)
and appends the value of ORS, but in this script ORS is set to "".
 "x{print x}" prints the previous record's terminator (one or more
newlines), except on the first record, where x is still empty.
 "x=RT" sets the variable x, and since it is a pattern with no {action},
it performs the default action {print} whenever RT is non-empty.


Tue, 03 Feb 2004 10:11:41 GMT  


% from a file, so I hopped over to DejaGoo and ran a search.  I found a
% post from Ian Stirling which suggested:
%
% gawk 'BEGIN{RS="\n+";ORS=""}x{print x}x=RT' infile
%
% This works, but I'm having a bit of a time understanding it.  The
% BEGIN section sets the record separator to one or more newlines and
% the output record separator to nothing.  According to "sed & awk", RT
% is the record terminator.  How does this differ from RS?

RT (which is gawk-specific) is the actual text that was matched by
the RS regular expression (RS officially is not a regular expression --
if it's more than one character then the processing is undefined,
although several implementations do treat it as an RE). So, if RS
is matched by a dozen new-lines, RT would be that dozen new-lines.
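
To see RT in action, here is a minimal sketch. It assumes gawk is
installed (RT does not exist in other awks), and show_rt is just a
name made up for this illustration:

```shell
# show_rt: print each record together with the length of the text that
# RS actually matched. Hypothetical helper name; needs gawk for RT.
show_rt() {
    gawk 'BEGIN { RS = "\n+" } { printf "[%s] RT length=%d\n", $0, length(RT) }'
}

# "a" is followed by three new-lines, "b" by one.
printf 'a\n\n\nb\n' | show_rt
```

This should report a length of 3 for the first record and 1 for the
second: RS stays the same pattern throughout, while RT changes with
each record.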

% x { print x }
%
% This declares a variable x and doesn't assign an initial value.  It
% then prints whatever x contains.

awk scripts consist of patterns which are evaluated and actions
which are executed if the patterns evaluate to non-zero numeric
values or non-zero-length string values. In this case, x is
evaluated, and if it's non-zero-length, it's printed.
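
A small sketch of that truth test, in case it helps. The function
names (truthiness, show) and the variable name unset are invented for
the example; nothing here is specific to the one-liner:

```shell
# truthiness: report how awk treats a few values when used as a pattern
# or condition. All names here are hypothetical, for illustration only.
truthiness() {
    gawk 'function show(label, v) { print label ": " (v ? "true" : "false") }
    BEGIN {
        show("unset variable", unset)   # never assigned, so empty
        show("empty string", "")
        show("newline string", "\n")    # non-zero length, so true
        show("number 0", 0)
        show("number 1", 1)
    }'
}

truthiness
```

The first two cases come out false and the newline string comes out
true, which is why "x" alone works as a test for "has a previous
terminator been saved yet".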

% x = RT
%
% This assigns the value of whatever RT contains to x.

Yes, and then it's evaluated. If RT is a non-zero number or
a non-zero-length string, then a default action is performed.
The default action is to print the current record $0.
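
Here is the second rule on its own, as a rough sketch (gawk assumed;
print_terminated is a made-up name). ORS is left at its default so the
output is easier to read:

```shell
# print_terminated: the bare pattern "x = RT" assigns RT to x, and the
# default action prints $0 only when that value is non-empty.
# Hypothetical helper name; needs gawk for RT.
print_terminated() {
    gawk 'BEGIN { RS = "\n+" } x = RT'
}

# "a" is terminated by a new-line, "b" is not.
printf 'a\nb' | print_terminated
```

Only "a" is printed. The record "b" has an empty RT, so the pattern is
false and the default action is skipped.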

% If I had to pseudo-code this, I think it would look like:
%
%       declare variable x with no initial value
%       for each input line
%               print x on the output if it contains a value
%               print the current record
%               assign the value of the record terminator to x

No. It would be more like:

       assign an RE to RS so that it matches strings of new-lines
       assign the empty string to ORS
       for each input line
          if the previous record terminator was not zero length, print it
          if there's a current record terminator, print the record

Note that this will fail if the input file doesn't end with a new-line.
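
The pseudo-code above can be spelled out as an ordinary gawk program.
This is only a sketch of the same logic written with explicit if
statements, not a different algorithm; strip_expanded is a name
invented for the example:

```shell
# strip_expanded: long-hand form of
#   gawk 'BEGIN{RS="\n+";ORS=""}x{print x}x=RT'
# Hypothetical helper name; needs gawk for RT.
strip_expanded() {
    gawk 'BEGIN { RS = "\n+"; ORS = "" }
    {
        if (x != "") print x     # previous terminator, held back one record
        x = RT                   # save the terminator of this record
        if (x != "") print $0    # print the record only if it was terminated
    }' "$@"
}

# od -c makes the bytes visible, including any trailing new-lines.
printf 'one\n\ntwo\n\n\n' | strip_expanded | od -c
```

For this input the output is "one", a blank line, and "two" with
nothing after it. The terminator of the last record is saved in x but
never printed, because there is no next record to trigger the print.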
--

Patrick TJ McPhee
East York  Canada



Tue, 03 Feb 2004 11:10:54 GMT  


Quote:


>% from a file, so I hopped over to DejaGoo and ran a search.  I found a
>% post from Ian Stirling which suggested:
>%
>% gawk 'BEGIN{RS="\n+";ORS=""}x{print x}x=RT' infile
>%
>% This works, but I'm having a bit of a time understanding it.  The
>% BEGIN section sets the record separator to one or more newlines and
>% the output record separator to nothing.  According to "sed & awk", RT
>% is the record terminator.  How does this differ from RS?

>RT (which is gawk-specific) is the actual text that was matched by
>the RS regular expression (RS officially is not a regular expression --
>if it's more than one character then the processing is undefined,
>although several implementations do treat it as an RE). So, if RS
>is matched by a dozen new-lines, RT would be that dozen new-lines.

Thanks.  That's what I thought, but it's good to have it confirmed.

[snip]

Quote:
>% If I had to pseudo-code this, I think it would look like:
>%
>%   declare variable x with no initial value
>%   for each input line
>%           print x on the output if it contains a value
>%           print the current record
>%           assign the value of the record terminator to x

>No. It would be more like:

>       assign an RE to RS so that it matches strings of new-lines
>       assign the empty string to ORS
>       for each input line
>          if the previous record terminator was not zero length, print it
>          if there's a current record terminator, print the record

So essentially it's a trick to avoid printing the final record
terminator.  Interesting.

Quote:
>Note that this will fail if the input file doesn't end with a new-line.

I'll look at that later.
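
For the record, one possible way around the missing-final-new-line
case, as an untested-in-anger sketch rather than anything from Ian's
post: always print the record, and hold back only the terminator.
strip_trailing_newlines and prev are names I made up; gawk is still
required for RT.

```shell
# strip_trailing_newlines: print every record, and print each saved
# terminator only once a following record shows up. The last terminator
# (if any) is therefore dropped, and an unterminated last record is kept.
# Hypothetical helper name; needs gawk for RT.
strip_trailing_newlines() {
    gawk 'BEGIN { RS = "\n+"; ORS = "" }
    NR > 1 { print prev }     # terminator of the previous record
    { print; prev = RT }      # the record itself, always
    ' "$@"
}

# Same result whether or not the input ends with new-lines.
printf 'a\n\nb' | strip_trailing_newlines | od -c
printf 'a\n\nb\n\n\n' | strip_trailing_newlines | od -c
```

With the original one-liner the first input would lose "b" entirely,
since its RT is empty and "x=RT" is then false. Here both inputs should
come out as "a", a blank line, and "b" with no new-line at the end.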

Thanks for your help.

--
Charles Calvert             |  Software Design/Development
Celtic Wolf, Inc.           |  Project Management
http://www.celticwolf.com/  |  Technical Writing
(703) 580-0210              |  Research



Tue, 03 Feb 2004 21:48:11 GMT  
 