Rooting out C Comments 
 Rooting out C Comments

Hello.  I am creating a script that roots out comments in C files.  Meaning,
all the text that is in between /* and */ should be substituted with null.
So far, this is what I have:

cat $1 | gawk '{
    BEGIN { RS = "/*" }
    gsub( /'/*'*'*/'/, "", $0 )'

}

I think the regular expressions are incorrect.  But amongst all the
documentation I can find, I cannot find anything that deals with special
characters like *.  Can anyone give me some tips?  Much appreciated.

-Eugene
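
On the question about special characters: inside an awk regex literal, a literal * or / is matched by putting a backslash in front of it. The following is a minimal sketch, not taken from any reply in this thread, that uses gsub to delete comments which open and close on the same line. The name strip_line_comments is made up for illustration, and the sketch deliberately ignores comments that span several lines and comment-like text inside string literals.

```shell
#!/bin/sh
# strip_line_comments: hypothetical helper name, used only for this sketch.
# Deletes /* ... */ comments that open and close on the same input line.
strip_line_comments() {
    awk '{
        # \/ and \* match a literal slash and a literal star.
        # ([^*]|\*+[^\/*])* walks the comment body without running past "*/".
        gsub(/\/\*([^*]|\*+[^\/*])*\*+\//, "")
        print
    }' "$@"
}
```

It could be run as strip_line_comments main.c, or fed from a pipe. Note also that the awk program sits inside one pair of shell single quotes, so the program text itself must not contain a single quote.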



Tue, 30 Jul 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:

> Hello.  I am creating a script that roots out comments in C files.  Meaning,
> all the text that is in between /* and */ should be substituted with null.
> So far, this is what I have:

> cat $1 | gawk '{
>     BEGIN { RS = "/*" }
>     gsub( /'/*'*'*/'/, "", $0 )'
> }

> I think the regular expressions are incorrect.  But amongst all the
> documentation I can find, I cannot find anything that deals with special
> characters like *.  Can anyone give me some tips?  Much appreciated.

> -Eugene

from mawk-1.3.3/man/mawk.doc:

     The following program replaces  each  comment  by  a  single
     space in a C program file,

          BEGIN {
            RS = "/\*([^*]|\*+[^/*])*\*+/"
               # comment is record separator
            ORS = " "
            getline  hold
            }

            { print hold ; hold = $0 }

            END { printf "%s" , hold }

     Buffering one record is needed to avoid terminating the last
     record with a space.

Eiso.



Tue, 30 Jul 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:
> from mawk-1.3.3/man/mawk.doc:

>      The following program replaces  each  comment  by  a  single
>      space in a C program file,

>           BEGIN {
>             RS = "/\*([^*]|\*+[^/*])*\*+/"
>                # comment is record separator
>             ORS = " "
>             getline  hold
>             }

>             { print hold ; hold = $0 }

>             END { printf "%s" , hold }

>      Buffering one record is needed to avoid terminating the last
>      record with a space.

Hi again,

I'm sorry, I am a complete newbie in terms of using awk.  I tried using
mawk, but I do not have it installed on the Unix system where I do my work.
I used the above code with awk, resulting in a file named 'test':

#!/usr/local/bin/bash
cat $1 | awk '{
    BEGIN {
    RS = "/\*([^*]|\*+[^/*])*\*+/"
    ORS = ""
    getline hold
    }
    { print hold; hold = $0 }
    END { printf "%s", hold }

}'

Then, I ran the file with this command line: test main.c
where main.c is a sample C file with comments.

Unfortunately, nothing is outputted to the screen!  What's going on? :-(

Here is some food for thought.  Why would the author of the GNU gawk manual
use an example of rooting out C comments as complicated as this?

http://www.gnu.org/manual/gawk-3.0.3/html_mono/gawk.html#SEC48

I tried out that code, to no avail.  I keep getting errors:

awk: can't set $0
 record number 1

If anyone could help me out, I'd really, really appreciate it.

Also, I would like to learn more about awk.  Does anyone have any reading
material to recommend?

Thanks again,

 Please remove "remove" from e-mail address to reply directly.



Wed, 31 Jul 2002 03:00:00 GMT  
 Rooting out C Comments


   >Then, I ran the file with this command line: test main.c
   >where main.c is a sample C file with comments.
   >Unfortunately, nothing is outputted to the screen!  What's going
   >on? :-(
Since Unix has a "test" command, you might want to pick another name for
your program, or try ./test main.c




Wed, 31 Jul 2002 03:00:00 GMT  
 Rooting out C Comments
First, a minor usage issue. Instead of:
  cat $1 | awk 'blah'
you should use:
  awk 'blah' $1
There is no need for the cat.

The mawk solution offered looks elegant; I tested it and it works fine for
me. It does not work with gawk or onetrueawk, however, and I am not good
enough with awk to offer a solution with either of those. I list below a
perl solution taken from Jeffrey Friedl's O'Reilly book, "Mastering Regular
Expressions", p.293.

First, this is the mawk script I used:

#!/bin/sh
mawk 'BEGIN {
        RS = "/\*([^*]|\*+[^/*])*\*+/"
        ORS = ""
        getline hold
      }
      { print hold; hold = $0 }
      END { printf "%s", hold }
' $1

Second, this is the file rmcom.pl:

# Perl script to remove C comments
# See Mastering Regular Expressions by Jeffrey Friedl, p. 293.
undef $/;              # slurp whole file mode
$_ = join("", <>);     # The join() can handle multiple files
s{
    (
       [^"'/]+
      |
       (?:"[^"\\]*(?:\\.[^"\\]*)*" [^"'/]*)+
      |
       (?:'[^'\\]*(?:\\.[^'\\]*)*' [^"'/]*)+
    )
   |
    / (?:
        \*[^*]*\*+(?:[^/*][^*]*\*+)*/
        |
        /[^\n]*
      )

}{$1}gsx;

print;

Finally, this is the shell driver I used for rmcom.pl:

#!/bin/sh
perl rmcom.pl $1

Both versions ran very fast and both produced identical (correct) output on
my test data.

Andrew Savige.



Thu, 01 Aug 2002 03:00:00 GMT  
 Rooting out C Comments



% > from mawk-1.3.3/man/mawk.doc:
% >
% >      The following program replaces  each  comment  by  a  single
% >      space in a C program file,

[...]

% >             RS = "/\*([^*]|\*+[^/*])*\*+/"

[...]

% I'm sorry, I am a complete newbie in terms of using awk.  I tried using
% mawk, but I do not have it installed on the Unix system where I do my work.
% I used the above code with awk, resulting in a file named 'test':

Unfortunately, using a regular expression for the record separator is a non-
portable extension in mawk. It doesn't really solve your original problem,
anyway, since you're after the contents of the comments, and this gives
everything but.

To solve this problem, you really need to read the input character by
character, which isn't awk's normal mode of operation. This could be
more efficient in the way it pulls out the comment text, but ....

  #!/usr/bin/awk -f
  BEGIN { incomment = 0; inquote = 0; }

  incomment || /\/\*/ {
     for (i = 1; i <= length($0); i++) {
        s2 = substr($0, i, 2)
        s1 = substr($0, i, 1)

        # if in a comment, look for the end of the comment
        if (incomment) {
          if (s2 == "*/") {
             incomment = 0
             print cmt
             i++
          }
          else {
             cmt = cmt s1
          }
        }
        # if in a quote, skip over escaped quotes, and look for the end
        else if (inquote) {
           if (s2 == "\\\"") i++
           else if (s1 == "\"") inquote = 0
        }
        # otherwise, look for the comment start
        else {
          if (s2 == "/*") {
             incomment = 1
             i++
             cmt = ""
          }
        }
     }
  }
--

Patrick TJ McPhee
East York  Canada



Thu, 01 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:
> I read that as meaning he wants to delete the comments; he is not after
> their contents.
> Eugene, can you please clarify?

Hi.  Thanks everyone, I am very grateful for your assistance.  Yes,
basically, I would like to remove the comments entirely.  I am restricted
only to an awk-based solution, so the perl example which I saw in an earlier
message is not feasible. :(

I was wondering what the major differences are between all these awk
variants.  Does each one provide optimizations for specific tasks?  I
thought the variants were just platform specific.  Why would some examples
work with some variants, and not others?

I also renamed my previous example from test to temp, since there could have
been a mixup with Unix's test command.  Instead of getting no output, now I
get:

awk: syntax error near line 2
awk: illegal statement near line 2
awk: syntax error near line 9
awk: bailing out near line 9

this from using:

awk '{
    BEGIN {
    RS = "/\*([^*]|\*+[^/*])*\*+/"
    ORS = ""
    getline hold
    }
    { print hold ; hold = $0 }
    END { printf "%s", hold }

}' $1

Looking at this, I cannot figure out anything wrong in terms of syntax.
Andrew, you refer to Friedl's solution, but who is he, and what solution are
you talking about?

Regards,



Thu, 01 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:


> Looking at this, I cannot figure out anything wrong in terms of syntax.
> Andrew, you refer to Friedl's solution, but who is he, and what solution are
> you talking about?

Oh, you are referring to the perl solution.  I see.  Alas, the professor has
refused to allow perl to be used, although it would make life so much
easier. <sigh>

-Eugene



Thu, 01 Aug 2002 03:00:00 GMT  
 Rooting out C Comments



Quote:

> It doesn't really solve your original problem,
> anyway, since you're after the contents of the comments, and this gives
> everything but.

Yet in the original post, Eugene said:
Quote:
> Hello.  I am creating a script that roots out comments in C files.  Meaning,
> all the text that is in between /* and */ should be substituted with null.

I read that as meaning he wants to delete the comments; he is not after
their contents.
Eugene, can you please clarify?

The semantics of this problem are not quite as simple as first thought
as illustrated by the following test file.

#include <stdio.h>
/* this is a c comment */
/* the quick brown fox
jumps
over the lazy dog. ***/
main()
{
   printf("%s\n", "/* this code is generated -- do not edit! */");
   return 0;

}

The offered mawk and awk solutions strip out the comments inside the
double-quotes in the printf() statement! Jeffrey Friedl's perl solution does
not. Friedl's solution is the correct one since rooting out comments should
not change program behaviour.
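
For an awk-only answer that leaves string literals alone, one option is to walk each line character by character and track whether the scanner is in code, in a comment, or in a literal. What follows is a sketch under stated assumptions, not code posted by anyone in this thread: the name strip_c_comments is made up, each comment is replaced by a single space (as in the mawk manual example) instead of by nothing, only /* */ comments are handled, and a stray apostrophe outside a literal would confuse it.

```shell
#!/bin/sh
# strip_c_comments: hypothetical helper name, used only for this sketch.
# Replaces each /* ... */ comment with one space and copies string and
# character literals through untouched.
strip_c_comments() {
    awk '
    # state: 0 = code, 1 = block comment, 2 = string literal, 3 = char literal
    BEGIN { state = 0 }
    {
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c  = substr($0, i, 1)
            c2 = substr($0, i, 2)
            if (state == 1) {
                # inside a comment: drop everything up to the closing "*/"
                if (c2 == "*/") { state = 0; i++ }
            } else if (state == 2 || state == 3) {
                # inside a literal: copy text, keeping escaped characters whole
                out = out c
                if (c == "\\") { out = out substr($0, i + 1, 1); i++ }
                else if (state == 2 && c == "\"") state = 0
                else if (state == 3 && c == "\047") state = 0
            } else if (c2 == "/*") {
                # comment start: emit one space in place of the comment
                state = 1; i++
                out = out " "
            } else {
                out = out c
                if (c == "\"") state = 2
                else if (c == "\047") state = 3   # \047 is a single quote
            }
        }
        # the state carries over, so comments may span several lines
        print out
    }' "$@"
}
```

Run as strip_c_comments main.c. Lines that lie entirely inside a comment come out empty, so the line count of the file is preserved. On the test file above, the text inside the printf() string is kept.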

Andrew Savige



Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments


Quote:
> I was wondering what the major differences are between all these awk
> variants.  Does each one provide optimizations for specific tasks?  I
> thought the variants were just platform specific.  Why would some examples
> work with some variants, and not others?

There are at least: oawk, nawk, onetrueawk, gawk, mawk, MKS awk, tawk.
These variants come from different C source code bases so there will
inevitably be differences in their behaviour even if they are trying
to be the same:-).  Sometimes, however, the author implements a pet
feature that he/she thinks cool.  The feature may be nice, but if you
use it, your script will not run with the other awks.  There is also a
POSIX standard for awk but I don't know much about it.  The original
awk reference is "The Awk Programming Language" by Aho, Weinberger,
Kernighan. If you write only what is described in that book you can be
fairly confident that your script will run everywhere.
Quote:

> awk '{
>     BEGIN {
>     RS = "/\*([^*]|\*+[^/*])*\*+/"
>     ORS = ""
>     getline hold
>     }
>     { print hold ; hold = $0 }
>     END { printf "%s", hold }
> }' $1

> Looking at this, I cannot figure out anything wrong in terms of syntax.

To get a clean compile, simply delete the leading { and the trailing }.
The script will then work with mawk; it will compile ok with the others
but will not do what you want.
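
The reason the outer braces break the compile is that BEGIN and END are patterns: they have to sit at the top level of the program, next to the main rule, and cannot appear inside an action. A minimal sketch of that layout, using a made-up record counter (count_records is a hypothetical name, not something from the thread):

```shell
#!/bin/sh
# count_records: hypothetical helper name, used only to show the layout.
count_records() {
    awk '
    BEGIN { n = 0 }      # runs once, before any input is read
    { n++ }              # main rule: runs for every input record
    END { print n }      # runs once, after the last record
    ' "$@"
}
```

The three rules are siblings. Wrapping them in one more pair of braces turns the whole text into a single action, and BEGIN is then no longer in a position where a pattern is allowed.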

Andrew Savige



Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:



>> Looking at this, I cannot figure out anything wrong in terms of
>>syntax.  Andrew, you refer to Friedl's solution, but who is he, and
>>what solution are you talking about?

>Oh, you are referring to the perl solution.  I see.  Alas, the professor has
>refused to allow perl to be used, although it would make life so much
>easier. <sigh>

For roughly the same reasons that contributions in French or Greek are not
acceptable in an English Composition class.


Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

...

Quote:
>There are at least: oawk, nawk, onetrueawk, gawk, mawk, MKS awk, tawk.
>These variants come from different C source code bases so there will
>inevitably be differences in their behaviour even if they are trying
>to be the same:-).  Sometimes, however, the author implements a pet
>feature that he/she thinks cool.  The feature may be nice, but if you
>use it, your script will not run with the other awks.  There is also a
>POSIX standard for awk but I don't know much about it.  The original
>awk reference is "The Awk Programming Language" by Aho, Weinberger,
>Kernighan. If you write only what is described in that book you can be
>fairly confident that your script will run everywhere.

Is that book compatible with Sun's "awk" ?  I thought that described
"new AWK" features.  Remember that a significant portion of the traffic in
this group is caused by Sun's "awk".

My point is that if we take your "always do things the old, compatible, way"
philosophy to heart, we'd all write only for the original AWK (aka, oawk).

Which is obviously stoopid...

Quote:
>>     BEGIN {
>>     RS = "/\*([^*]|\*+[^/*])*\*+/"
>>     ORS = ""
>>     getline hold
>>     }
>>     { print hold ; hold = $0 }
>>     END { printf "%s", hold }

Note: Since ORS is "", the END can be simplified to: END { print hold }
The printf is unnecessary.

Quote:
>> Looking at this, I cannot figure out anything wrong in terms of syntax.
>To get a clean compile, simply delete the leading { and the trailing }.
>The script will then work with mawk; it will compile ok with the others
>but will not do what you want.

Verified (that it works only under mawk).  Please remind me what the
specific mawk feature is upon which this depends.


Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:



>...
>>There are at least: oawk, nawk, onetrueawk, gawk, mawk, MKS awk, tawk.
>>These variants come from different C source code bases so there will
>>inevitably be differences in their behaviour even if they are trying
>>to be the same:-).  Sometimes, however, the author implements a pet
>>feature that he/she thinks cool.  The feature may be nice, but if you
>>use it, your script will not run with the other awks.  There is also a
>>POSIX standard for awk but I don't know much about it.  The original
>>awk reference is "The Awk Programming Language" by Aho, Weinberger,
>>Kernighan. If you write only what is described in that book you can be
>>fairly confident that your script will run everywhere.

>Is that book compatible with Sun's "awk" ?  I thought that described
>"new AWK" features.  Remember that a significant portion of the traffic in
>this group is caused by Sun's "awk".

>My point is that if we take your "always do things the old, compatible, way"
>philosophy to heart, we'd all write only for the original AWK (aka, oawk).

>Which is obviously stoopid...

>>>     BEGIN {
>>>     RS = "/\*([^*]|\*+[^/*])*\*+/"
>>>     ORS = ""
>>>     getline hold
>>>     }
>>>     { print hold ; hold = $0 }
>>>     END { printf "%s", hold }

>Note: Since ORS is "", the END can be simplified to: END { print hold }
>The printf is unnecessary.

>>> Looking at this, I cannot figure out anything wrong in terms of syntax.
>>To get a clean compile, simply delete the leading { and the trailing }.
>>The script will then work with mawk; it will compile ok with the others
>>but will not do what you want.

>Verified (that it works only under mawk).  Please remind me what the
>specific mawk feature is upon which this depends.

Someone suggested that mawk supports RE for RS whereas other versions
of awk do not.  I didn't check.  

I posted a simple script using sed that removed C type comments
(and blank lines) a few days ago.

Basically, it put /* and */ found in the text on lines all by themselves,
and then processed the resulting file.

It won't do the comment inside a quoted string properly, but it does
do the other "normal" type stuff right.

Doesn't the preprocessor have a capability to strain out comments?
Seems like it should, and the output should be available somehow.

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

...

Quote:
>>Verified (that it works only under mawk).  Please remind me what the
>>specific mawk feature is upon which this depends.

>Someone suggested that mawk supports RE for RS whereas other versions
>of awk do not.  I didn't check.

Hmmm..  According to the man page, it (gawk) does - and the following seems
to work:

        gawk 1 'RS=[ ,.]'

So, I still want to know why the mawk solution works only in mawk.

Quote:
>I posted a simple script using sed that removed C type comments
>(and blank lines) a few days ago.

>Basically, it put /* and */ found in the text on lines all by themselves,
>and then processed the resulting file.

>It won't do the comment inside a quoted string properly, but it does
>do the other "normal" type stuff right.

>Doesn't the preprocessor have a capability to strain out comments?
>Seems like it should, and the output should be available somehow.

It does, but it inserts a lot of other crud (which could probably be
controlled by various options - I investigated some, but then got bored) and
is just generally overkill for this task.

I think the original poster was looking specifically for an AWK solution -
he even stated that it was homework, and "his prof wanted an AWK solution".
So, Perlies and Seders (and CPP'ers) need not apply...



Fri, 02 Aug 2002 03:00:00 GMT  
 Rooting out C Comments

Quote:

> >Someone suggested that mawk supports RE for RS whereas other versions
> >of awk do not.  I didn't check.
gawk does.

> Hmmm..  According to the man page, it (gawk) does - and the following seems
> to work:

>         gawk 1 'RS=[ ,.]'

> So, I still want to know why the mawk solution works only in mawk.

Apparently gawk treats backslashes in string constants differently from
mawk.  Doubling them,

RS = "/\\*([^*]|\\*+[^/*])*\\*+/"

gives the same result for both.
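
Put together as a complete script, the doubled-backslash form might look like the sketch below. It assumes an awk that accepts a regular expression as RS (gawk and mawk do, per this thread; other awks may not), and strip_with_rs is a made-up name. Since ORS is set to the empty string here, every record can be printed as it arrives, so the one-record buffering from the mawk manual example (which is there only to avoid a trailing space) is left out.

```shell
#!/bin/sh
# strip_with_rs: hypothetical helper name, used only for this sketch.
# Assumes the awk in use accepts a regular expression as RS.
strip_with_rs() {
    awk '
    BEGIN {
        # Each comment acts as a record separator. The backslashes are
        # doubled because the regex is written as a string constant.
        RS  = "/\\*([^*]|\\*+[^/*])*\\*+/"
        ORS = ""     # join the records back together with nothing between
    }
    { print }
    ' "$@"
}
```

Like the other RS-based versions in this thread, this also removes comment-like text inside string literals.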

__________________________________________________________________

           o                     Eiso AB

                 o               Dept. of Biochemistry
                                 University of Groningen                
                                 The Netherlands                      
                  o  
            . .    
         o   ^                  
         |   -   _              
          \__|__/                
             |                  
             |
            / \
           /   \
           |   |
________ ._|   |_. ________________________________________________



Fri, 02 Aug 2002 03:00:00 GMT  
 