AWK script to count LOC (lines of code) 
 AWK script to count LOC (lines of code)

I was about to write my own AWK script to do a simple counting of LOC for
C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
that this has probably been done a thousand times before ...

Anyone have any good leads on such an awk (or even perl) script?

-- Mike



Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)
Probably, but you get to use someone else's specifications.
For example, how many lines of code is the following:

int something(
  char *or_other  //can't forget argument comments
  )
{
  first_one_thing(or_other);
  then_another(or_other);
  return(strlen(or_other));
}

and then how many lines is this:

int obfuscated(char *s) { return(fot(s), ta(s), strlen(s)); }

My point is that 'lines of code' is ambiguous.




Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)

> I was about to write my own AWK script to do a simple counting of LOC for
> C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
> that this has probably been done a thousand times before ...

> Anyone have any good leads on such an awk (or even perl) script?

It seems that there are two basic elements to the problem. One is
swallowing comments, including multiline comments, but being careful
about string literals and stuff. The other is swallowing blank lines.
Is there anything else?
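A minimal sketch of those two steps, written as a shell function around a POSIX awk program (the name count_loc is made up for illustration). It walks each line character by character, skips /* */ and // comments, steps over string and character literals so that comment markers inside them are ignored, and counts the lines that still hold any non-blank code. It assumes no trigraphs and no backslash-newline continuation inside a literal or a // comment.

```shell
# count_loc: count lines that still hold some code once /* */ and //
# comments are removed. String and character literals are stepped over,
# so comment markers inside them are not misread.
# Assumptions: no trigraphs, and no backslash-newline continuation inside
# a literal or a // comment (such a literal is treated as ending at the
# end of its line).
count_loc() {
  awk '
    BEGIN { state = "code" }        # code, block (in /* */), str or chr
    {
      line = $0; n = length(line); code = 0; i = 1
      while (i <= n) {
        c = substr(line, i, 1); c2 = substr(line, i, 2)
        if (state == "block") {     # look only for the end of the comment
          if (c2 == "*/") { state = "code"; i += 2 } else i++
          continue
        }
        if (state == "str" || state == "chr") {
          code = 1
          if (c == "\\") { i += 2; continue }    # skip an escaped character
          if ((state == "str" && c == "\"") || (state == "chr" && c == "\047"))
            state = "code"
          i++
          continue
        }
        if (c2 == "/*") { state = "block"; i += 2; continue }
        if (c2 == "//") break                     # rest of line is a comment
        if (c == "\"") state = "str"
        else if (c == "\047") state = "chr"       # \047 is a single quote
        if (c != " " && c != "\t" && c != "\f" && c != "\r") code = 1
        i++
      }
      if (state == "str" || state == "chr") state = "code"
      if (code) loc++
    }
    END { print loc + 0 }' "$@"
}
```

For example, count_loc *.c *.h prints a single total for all the files named; run it once per file if you want per-file figures.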

One approach (for C, at least) would be to mangle all the preprocessor
directives with an awk script and run cpp on the result - it will
replace the comments with whitespace, which is easy to handle, because
I don't think ANSI C allows multi-line string literals. If you are
going to use it on code that allows it (e.g. gcc-specific) it will be
a bit more involved.  The macros won't be expanded since #define's
won't be there. I have not tried it, but I don't think cpp will
complain about the screwed-up syntax.
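A rough sketch of the cpp route, assuming a cpp on the PATH that accepts -P (suppress line markers), as GNU cpp and clang's cpp do. The function name cpp_loc and the MANGLED_DIRECTIVE prefix are made up. Directive lines are prefixed so cpp sees them as ordinary text instead of acting on them; cpp then replaces the comments with whitespace, and the final awk counts non-blank lines. Predefined macros such as __LINE__ may still be expanded, and backslash-continued lines are joined, so the text changes a little even where the count does not.

```shell
# cpp_loc: let cpp strip the comments, then count non-blank lines.
# Assumes a cpp that understands -P; MANGLED_DIRECTIVE is an arbitrary
# hypothetical prefix that turns "#include ..." into plain text.
cpp_loc() {
  awk '{ if ($0 ~ /^[ \t]*#/) sub(/#/, "MANGLED_DIRECTIVE "); print }' "$@" |
    cpp -P 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'
}
```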

Another approach that comes to mind is as follows. Look into Jeffrey
Friedl's "Mastering Regular Expressions", which devotes quite a bit of
space to the problem of matching C comments. IIRC, Friedl writes that
he came up with the "unrolling" technique while solving precisely the
problem of removing C comments. Now you can either search the net for
his name and see if his home page or another site has the tool (it's
likely to be in Perl), or you can email him, or you can bite the
bullet and write your own. Once you have swallowed the comments, the
rest is swallowing blank lines -- see above.
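The unrolled C-comment regex usually credited to Friedl is /\*[^*]*\*+([^/*][^*]*\*+)*/: the opener, any non-stars, one or more stars, then any number of groups that start with a character that is neither "/" nor "*", then the closing slash. A sketch of putting it to work in awk, assuming POSIX awk (match() with a regex held in a string, and bracket expressions that also match newlines). The whole input is read into one string so multi-line comments can be matched, and each comment is replaced by just the newlines it contained, so the line structure survives for a later blank-line count. String literals are not recognised, so a "/*" inside a string will be mistaken for a comment; strip_c_comments is a hypothetical name.

```shell
# strip_c_comments: delete /* */ comments with the unrolled comment regex,
# keeping the newlines that were inside each comment so the line structure
# (and hence a later blank-line count) is preserved.
strip_c_comments() {
  awk '
    { src = src $0 "\n" }           # slurp the whole input into one string
    END {
      # opener, non-stars, one or more stars, then any number of
      # (char that is neither / nor *, non-stars, stars) groups, then /
      re = "/[*][^*]*[*]+([^/*][^*]*[*]+)*/"
      out = ""
      while (match(src, re)) {
        keep = substr(src, RSTART, RLENGTH)
        gsub(/[^\n]/, "", keep)     # keep only the comment newlines
        out = out substr(src, 1, RSTART - 1) keep
        src = substr(src, RSTART + RLENGTH)
      }
      printf "%s", out src
    }' "$@"
}
```

Swallowing the blank lines afterwards is then a matter of, say, strip_c_comments prog.c | awk 'NF { n++ } END { print n + 0 }'.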

Generalization to C++ and Java is left as an exercise ;-).

--

"... We work by wit, and not by witchcraft;
 And wit depends on dilatory time." - W. Shakespeare.



Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)


>Probably, but you get to use someone else's specifications.

Not a problem.

>My point is that 'lines of code' is ambiguous.

I know.  I just want something better than "wc -l", removing blank lines
and comments.  I am tempted to grep for semi-colons ...
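If you do go the semicolon route, note that grep -c ';' counts lines that contain a semicolon, not semicolons, so a for (;;) header counts once. A small sketch that counts every ';' instead, relying on awk's gsub() returning the number of substitutions it made. Comments and string literals are not excluded here, and count_semicolons is a made-up name.

```shell
# count_semicolons: count every ";" in the input. gsub() returns the
# number of replacements it made, so replacing ";" by itself counts them.
count_semicolons() {
  awk '{ n += gsub(/;/, ";") } END { print n + 0 }' "$@"
}
```

On the two lines "for (i = 0; i < n; i++)" and "sum += i;" it reports 3, where grep -c ';' reports 2.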


Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)



>> I was about to write my own AWK script to do a simple counting of LOC for
>> C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
>> that this has probably been done a thousand times before ...

>> Anyone have any good leads on such an awk (or even perl) script?

>It seems that there are two basic elements to the problem. One is
>swallowing comments, including multiline comments, but being careful
>about string literals and stuff. The other is swallowing blank lines.
>Is there anything else?

Not for me.  Swallowing comments and blank lines would be enough.  
Ignoring string literals would be a bonus ...

I'd be happy enough measuring unpreprocessed code.

Surely someone has written a simple AWK script to do this ... I guess
I can spend a couple of hours doing it myself if this is truly an
earth-shatteringly original idea completely unknown to the awk community
:-) :-) :-)



Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)




>>Probably, but you get to use someone else's specifications.

>Not a problem.

>>My point is that 'lines of code' is ambiguous.

>I know.  I just want something better than "wc -l", removing blank lines
>and comments.  I am tempted to grep for semi-colons ...

If you want to just eliminate lines that are blank or have nothing
before a "//" then this would work:

awk 'NF==0 || ($1 ~ /^\/\//) {next} {cnt++} END{print cnt+0}' infile

If you want to do C style comments using "/*" and "*/" it becomes
more complicated.  I like the idea of running it through the
preprocessor first in that case.

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Sun, 30 Jun 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)


%
% I was about to write my own AWK script to do a simple counting of LOC for
% C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
% that this has probably been done a thousand times before ...
%
% Anyone have any good leads on such an awk (or even perl) script?

I have one somewhere. I got it as an example with either mawk or C-awk
(that was an `old' awk by some fellow from BC, and I think it
later became the basis for gawk).

As I recall, it prints lines of code, lines of pre-processor directives, and
lines of comments, but I haven't run it in 8 years or so, so I don't
really recall. I'll see if I can dig it up and post it this week-end.

--

Patrick TJ McPhee
East York  Canada



Mon, 01 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)



>%
>% I was about to write my own AWK script to do a simple counting of LOC for
>% C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
>% that this has probably been done a thousand times before ...
>%
>% Anyone have any good leads on such an awk (or even perl) script?

>I have one somewhere. I got it as an example with either mawk or C-awk
>(that was an `old' awk by some fellow from BC, and I think it
>later became the basis for gawk).

>As I recall, it prints lines of code, lines of pre-processor directives, and
>lines of comments, but I haven't run it in 8 years or so, so I don't
>really recall. I'll see if I can dig it up and post it this week-end.

There's a book that IIRC is called "A Book on C" or similar by Berry &
Meekings, published by ?Methuen? that has some appendices that use sed, awk
and C code to "pre-process" C source - stripping comments, literal strings
and various other things.  It then outputs stats about the code including
statement, function and other counts plus a style metric.  I don't have
access to my copy at the moment, but I'll post what I can once I've found
it.  You may find the code is on the net - try searching on Amazon for a
start.

Peter
--


Opinions expressed are my own and not necessarily those
of my employer



Mon, 01 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)


>I was about to write my own AWK script to do a simple counting of LOC for
>C/C++/Java code (ignoring comments, blank lines etc) when it occurred to me
>that this has probably been done a thousand times before ...

>Anyone have any good leads on such an awk (or even perl) script?

Just a remark: If I had to measure the complexity of code I
probably would count keywords as well as standard operators of
the given language (and possibly unique identifiers), not lines.
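A crude sketch of the keyword-counting idea: map every non-identifier character to a blank and tally how often each word from a fixed keyword list appears. The list below is an illustrative subset of C keywords, not a complete one, comments and strings are not excluded, and count_keywords is a hypothetical name. Output order follows awk's unspecified array order, so pipe it through sort.

```shell
# count_keywords: tally a few C keywords as a crude complexity measure.
# The keyword list is an illustrative subset; comments and string
# literals are not excluded.
count_keywords() {
  awk '
    BEGIN {
      nk = split("if else for while do switch case return goto", kw, " ")
      for (k = 1; k <= nk; k++) iskw[kw[k]] = 1
    }
    {
      line = $0
      gsub(/[^A-Za-z0-9_]/, " ", line)   # leave only identifier characters
      nw = split(line, w, " ")
      for (j = 1; j <= nw; j++)
        if (w[j] in iskw) count[w[j]]++
    }
    END { for (k in count) print k, count[k] }' "$@"
}
```

Extending it to operators would need a tokenizer that knows about multi-character operators such as "<<=", which is where a plain word split stops being enough.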

It would also be interesting to have a measure for the amount of
comment, e.g. expressed as 'nr of words in comments'.
Especially for awk and C code, which in most cases can hardly be
called 'self-explaining', the amount of comment IMHO adds to the
quality of the code.
Many of my awk scripts have almost as many comment words as raw
code. But then again, I wouldn't call myself a 'real
programmer', so I'd better give myself a helping hand .....
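One way to get the 'nr of words in comments' figure, sketched as a shell function around awk: collect the text of /* */ and // comments, then count the tokens that contain a letter or digit, so decorative "*" columns do not count. It does not recognise string literals, so a comment marker inside a string will confuse it; comment_words is a made-up name.

```shell
# comment_words: count words inside /* */ and // comments.
# Only tokens containing a letter or digit are counted.
comment_words() {
  awk '
    {
      line = $0; text = ""
      while (line != "") {
        if (inblock) {                          # inside a /* */ comment
          p = index(line, "*/")
          if (p == 0) { text = text " " line; line = "" }
          else {
            text = text " " substr(line, 1, p - 1)
            line = substr(line, p + 2); inblock = 0
          }
        } else {                                # in code: find next opener
          b = index(line, "/*"); s = index(line, "//")
          if (s && (!b || s < b)) { text = text " " substr(line, s + 2); line = "" }
          else if (b) { line = substr(line, b + 2); inblock = 1 }
          else line = ""
        }
      }
      n = split(text, tmp, " ")
      for (j = 1; j <= n; j++) if (tmp[j] ~ /[A-Za-z0-9]/) words++
    }
    END { print words + 0 }' "$@"
}
```

Dividing its result by a word count of the code itself would give a rough comment-to-code ratio.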

Just my 2 ct,

--
  (  Kees Nuyt; Rotterdam; Netherlands

c[_] Disclaimer: Any opinions etc. are mine, not necessarily my employer's.



Mon, 01 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)

<snip>

>It would also be interesting to have a measure for the
>amount of comment, e.g. expressed as 'nr of words in
>comments'. Especially for awk and C code, which in most
>cases can hardly be called 'selfexplaining', the amount of
>comment IMHO adds to the quality of the code.

<snip>

Depends on what the comments are. Comments for each & every
statement are almost always excessive. Just a single line
above a 100+ statement function definition is usually too
little. Also, comments like

++n # increment n

are at best a waste of storage. Kernighan & Plauger's 'The
Elements of Programming Style' has very good advice on what
should and shouldn't be in comments (more comprehensive
coverage of comments than in Kernighan & Pike's latest
book). It's an old book with code examples in (pre-77)
FORTRAN and PL/1, but it's filled with very good advice.




Mon, 01 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)




% % Anyone have any good leads on such an awk (or even perl) script?
%
% I have one somewhere. I got it as an example with either mawk or C-awk
% (that was an `old' awk by some fellow from BC, and I think it
% later became the basis for gawk).

% As I recall, it prints lines of code, lines of pre-processor directives, and
% lines of comments, but I haven't run it in 8 years or so, so I don't
% really recall. I'll see if I can dig it up and post it this week-end.

Or even today. I had a bit of a booze-up, but I've returned home much earlier
than I ever thought possible. I must be getting old. Anyway, here are a few
awk scripts, first c_count:

# -8<----8<--- c_count.awk begins
# Date:  09-15-88  12:33
# From:  Dan Kozak

# count lines in a C program, not counting comments,
# blank lines or form feeds.  Does a separate count of
# preprocessor directives.  If a preprocessor directive
# is commented out, it does not count it.

{
 if (file == "") {
  file = FILENAME
 }
 if (file != FILENAME) {
  printf("Number of lines in %s is: %d\n",file,nl+ppd)
  printf("Number of preprocessor directives is: %d\n",ppd)
  printf("Number of lines excluding preprocessor directives is: %d\n\n",nl)
  file = FILENAME
  tnl += nl
  tppd += ppd
  nl = 0
  ppd = 0
 }

 if ($0 ~ /^[ \t\f]*$/) { ; }   # blank, white-space-only or form-feed line
 else if ($1 ~ /^\/\*/ && $NF ~ /\*\/$/) { ; }
 else if ($0 ~ /\/\*/ && $0 !~ /\*\//) { in_comment = 1 }
 else if ($0 !~ /\/\*/ && $0 ~ /\*\//) { in_comment = 0 }
 else if (in_comment) { ; }
 else if ($1 ~ /^#/) { ppd++ }
 else { nl++ }
}

END { printf("Number of lines in %s is: %d\n",file,nl+ppd)
      printf("Number of preprocessor directives is: %d\n",ppd)
      printf("Number of lines excluding preprocessor directives is: %d\n\n",nl)
      file = FILENAME
      tnl += nl
      tppd += ppd
      printf("Total number of lines is: %d\n",tnl+tppd)
      printf("Number of preprocessor directives is: %d\n",tppd)
      printf("Number of lines excluding preprocessor directives is: %d\n",tnl)
    }
# ->8---->8--- c_count.awk ends

and here's Jon Bentley's m1 macro processor:

# -8<----8<--- m1.awk begins

# From Jon Bentley's article in Computer Language June '90 (v7n6)
# LISTING 4

function error(s) {

    print "m1: " s >"CON"                                               #
    exit 1
}

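function dofile(fname, savefile, savebuffer, newstring) {

    if (fname in activefiles)
        error("recursively reading file: " fname)
    activefiles[fname] = 1
    savefile = file
    file = fname
    savebuffer = buffer
    buffer = ""
    while (readline() != EOF) {
        if (index($0, "@") == 0) {
            print $0
        }
        else if ($0 ~ /^@define[ \t]/) {
            dodef(1)                                                    #
        }
        else if ($0 ~ /^@default[ \t]/) {
            dodef(0)                                                    #
        }
        else if ($0 ~ /^@include[ \t]/) {
            if (NF != 2) error("bad include line")
            dofile(dosubs($2))
        }
        else if ($0 ~ /^@if[ \t]/) {
            if (NF != 2) error("bad if line")
            if (!($2 in symtab) || symtab[$2] == 0) gobble()
        }
        else if ($0 ~ /^@unless[ \t]/) {
            if (NF != 2) error("bad unless line")
            if (($2 in symtab) && symtab[$2] != 0) gobble()
        }
        else if ($0 ~ /^@fi([ \t]|$)/) {
            # end of a kept @if/@unless section: no output
        }
        else if ($0 ~ /^@comment([ \t]|$)/) {
            # comment line: no output
        }
        else {
            newstring = dosubs($0)
            if ($0 == newstring || index(newstring, "@") == 0)
                print newstring
            else
                buffer = newstring "\n" buffer
        }
    }
    close(fname)
    delete activefiles[fname]
    file = savefile
    buffer = savebuffer
}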

function readline(      i, status) {

    status = ""
    if (buffer != "") {
        i = index(buffer, "\n")
        $0 = substr(buffer, 1, i - 1)
        buffer = substr(buffer, i + 1)
    }
    else {
        if (file == "-") {                                              #
            i = getline                                                 #
        }                                                               #
        else {                                                          #
            i = getline < file                                          #
        }                                                               #
        status = i <= 0 ? "EOF" : ""                                    #
    }
    return status
}

function gobble(        ifdepth) {

    ifdepth = 1
    while (readline() != EOF) {
        if ($0 ~ /^@(if|unless)[ \t]/) ifdepth++
        if ($0 ~ /^@fi([ \t]|$)/ && --ifdepth <= 0) break
    }
}

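function dosubs(s,      l, r, i, m) {

    if (index(s, "@") == 0) return s
    l = ""  # Left of current pos: ready for output
    r = s   # Right of current: unexamined at this time
    while ((i = index(r, "@")) != 0) {
        l = l substr(r, 1, i - 1)
        r = substr(r, i + 1)    # currently scanning an @
        i = index(r, "@")
        if (i == 0) {
            l = l "@"           # lone @: copy it through
            break
        }
        m = substr(r, 1, i - 1)
        r = substr(r, i + 1)
        if (m in symtab) {
            r = symtab[m] r     # rescan the expansion
        }
        else {
            l = l "@" m         # not a macro: emit "@name" unchanged
            r = "@" r           # and rescan from the closing @
        }
    }
    return l r
}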

function dodef(def,     str, name) {

    name = $2
    sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "")    # $0=$P($0,FS,3,NF)
    str = $0
    while (str ~ /\\$/) {
        if (readline() == EOF)
            error("EOF inside definition")
        sub(/^[ \t]+/, "")
        sub(/[ \t]*\\$/, "", str)   # drop the trailing backslash
        str = str "\n" $0           # then append the continuation line
    }
    if (def || !(name in symtab))                                       #
        symtab[name] = str
}

BEGIN {

    EOF = "EOF"
    if (ARGC == 1) dofile("-")                                          #
    else if (ARGC == 2) dofile(ARGV[1])
    else {
        print "Usage: m1 [file]" >"CON"                                 #
        exit
    }
}

# ->8---->8--- m1.awk ends

and some sample data for m1:




This area was profoundly influenced by the groundbreaking work of Professor

# ->8---->8--- m1.dat ends






this situation many times yourself.

     Sincerely,

# ->8---->8--- sayno.mac ends


# ->8---->8--- sayno.def ends

and finally, a cross-referencing program:

# -8<----8<--- xref.awk begins
# generate cross reference of identifiers in a program

#   Original program courtesy Bruce Feist of Arlington VA

{
# remove non alphanumeric characters
    gsub(/[^A-Za-z0-9_]/, " ")
# convert to upper case
    $0 = toupper($0)
# add reference
    for (i = 1; i <= NF; i++)
    {
        if ($i !~ /^[0-9]+$/ && done[$i] != NR)  # check if number or done
        {
            done[$i] = NR               # mark as done
            xref[$i] = xref[$i] " " NR  # add reference
        }
    }
}

END {
    for (i in xref)
        print i ": ", xref[i]
}

# ->8---->8--- xref.awk ends

--

Patrick TJ McPhee
East York  Canada



Tue, 02 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)



> Especially for awk and C code, which in most cases can hardly be
> called 'self-explaining', the amount of comment IMHO adds to the

I don't know any self-explaining computer language. It doesn't matter if
you code in Visual Basic, C, awk, PostScript, whatever. The individual
expressions and such are self-explaining if you know the language, but
the intentions of the programmer are seldom very obvious once it gets
past 30-40 lines of code without comments. Of course even one or two lines
can be obfuscated if you're "clever" enough. Some languages lend themselves
to obfuscation more than others; it's easy to write completely
incomprehensible code in Perl and PostScript, for instance. And
obfuscated C contests do not exist for nothing.

An anecdote I just can't resist sharing: back in the Commodore VIC-20
years, I remember a 7-line BASIC snippet in an issue of the great
"Compute" magazine which implemented a simple version of the game
"Snake". I kept the snippet for a very long time, hoping that one day I
would understand it. But nah! It just looked like gibberish, and I was
VERY surprised when it actually worked. The obfuscation served two
purposes there: 1) to show that the BASIC language as implemented on the
VIC-20 held quite a few secrets, and 2) any readable BASIC implementation
of that game would have been way slower than this one. Fortunately,
nowadays we seldom need to obfuscate code like that to gain speed. And
anyone who does have to should be shot if she/he doesn't comment that
piece of code heavily. =)

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Tue, 02 Jul 2002 03:00:00 GMT  
 AWK script to count LOC (lines of code)




><snip>
> Kernighan & Plauger's 'The
>Elements of Programming Style' has very good advice on what
>should and shouldn't be in comments (more comprehensive
>coverage of comments than in Kernighan & Pike's latest
>book). It's an old book with code examples in (pre 77)
>FORTRAN and PL/1, but it's filled with very good advice.

Agreed - it's one of my favourites.  I still dip into it from time to time
when writing some new code to remind myself about readability, choice of
constructs and layout.  It is really surprising how much good advice can be
packed into a book that is so thin.

Peter
--


Opinions expressed are my own and not necessarily those
of my employer



Wed, 03 Jul 2002 03:00:00 GMT