trace.py and coverage.py 


Quote:
>The usage interfaces are very different.  Not sure which, if either,
>is better.

I developed the coverage.py interface so that it will support
coverage testing in a variety of testing scenarios.

In the coverage.py model you run tests, generate a report, use the
report to direct your testing activity to fruitful locations in the
code, generate a new report based on all the testing so far, and so
on.

When you're testing things by hand, or when you're developing a test
suite, it's important to be able to accumulate coverage information
over a series of tests.  It may not be cost-effective to go back and
run the whole set of tests again with coverage turned on (either
because the tests are expensive to set up or take too long to
execute).
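
As a rough illustration of accumulating coverage over a series of runs, here is a minimal sketch. It assumes the data is a set of (filename, line number) pairs pickled to a file; the function name merge_coverage and the file name coverage.dat are hypothetical, and this is not necessarily how coverage.py stores its data.

```python
import os
import pickle
import tempfile

def merge_coverage(path, new_lines):
    """Merge newly executed (filename, lineno) pairs into the data at path."""
    accumulated = set()
    if os.path.exists(path):
        # Load whatever earlier test runs have recorded so far.
        with open(path, "rb") as fh:
            accumulated = pickle.load(fh)
    accumulated |= set(new_lines)
    with open(path, "wb") as fh:
        pickle.dump(accumulated, fh)
    return accumulated

# Two separate "test runs" writing to the same (hypothetical) data file.
data_file = os.path.join(tempfile.mkdtemp(), "coverage.dat")
after_first = merge_coverage(data_file, {("foo.py", 1), ("foo.py", 2)})
after_second = merge_coverage(data_file, {("foo.py", 2), ("foo.py", 5)})
```

A report generated after the second run then reflects both runs, without re-running the first.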

Tracing is a different kind of activity from coverage testing and
probably needs a different user interface.

Quote:
>coverage.py is much faster.  In my tests, coverage.py takes less
>than 2 seconds where trace.py takes 30 seconds.

It's obvious where the bottleneck is in a tracing or coverage
application: it's the function that you pass to sys.settrace.  Here's
the tracing function from coverage.py:

c = {}
def t(f, x, y):
     c[(f.f_code.co_filename, f.f_lineno)] = 1
     return t

Here's what led me to implement it this way:

    1. If you try to increment a count of the number of times a line
has been executed then you need to handle the initial case (when
there's no entry for the line of code) and integer overflow.  It's
much cheaper to set the hash entry to 1.

    2. I thought at first that because many lines of code get executed
from each file, it would be better to write

    c[f.f_code.co_filename][f.f_lineno] = 1

But it turns out that the extra code needed to handle the base case
(creating the inner dictionary the first time a file is seen) takes
more time than the construction of the pair, which happens in the
Python core.

    3. You can make Python run faster by giving variables shorter
names!  I presume that this is because variables are looked up by
name in the environment at run time.  So variables with shorter names
can be looked up more quickly.  Hence c, f, t, x, y in my code.
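
For readers who have not used sys.settrace, a trace function of the shape shown earlier can be exercised roughly as follows. This is a sketch with descriptive names (tracer, executed, sample are illustrative and not taken from coverage.py), and it records only whether a line ran, not how often.

```python
import sys

executed = {}

def tracer(frame, event, arg):
    # Record the (filename, line number) pair; the value 1 just marks "seen".
    executed[(frame.f_code.co_filename, frame.f_lineno)] = 1
    # Returning the function keeps tracing active inside this frame.
    return tracer

def sample(n):
    if n > 0:
        return "positive"
    return "non-positive"

sys.settrace(tracer)
try:
    result = sample(1)
finally:
    # Always switch tracing off again, even if the traced code raises.
    sys.settrace(None)

filename = sample.__code__.co_filename
first = sample.__code__.co_firstlineno
lines_hit = sorted(line for (name, line) in executed if name == filename)
```

After this runs, lines_hit holds the lines of sample that were executed; the final return line is absent because the n > 0 branch returned first.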

Quote:
>We consider merging the best features of these two tools and
>replacing the standard distribution's trace.py with the new merged
>tool.

I think code sharing is appropriate between coverage testing and
tracing tools (the parsing code in particular).  The tracing function
itself would need to be different (for reasons of speed as explained
above).  The user interfaces may need to be different (see my notes
at the top of this e-mail), so I'm not 100% sure of the merits of a
tool that tried to do both tasks.

I think the coverage.py licence is flexible enough that you shouldn't
have any trouble re-using my code.  Let me know if you need my help.



Tue, 22 Jun 2004 01:30:44 GMT  

Greetings Pythonismos, Pythonismas, Gareth Rees, and Andrew Dalke:

I learned from Python-URL [1] that Gareth Rees [2] has published a code coverage
tool called coverage.py [3].

On the web pages, Gareth compares coverage.py favorably against trace.py, but
he appears to be using an old version of trace.py [4] from 1999 that is still
sitting around on Andrew Dalke's FTP site.  Python comes with a newer version of
trace.py [5] which has been modified by Skip Montanaro and then by me [6].

Here are the differences that I see right now between current trace.py and
coverage.py:

1.  trace.py can do tracing as well as code coverage.  It can do either or both,
    and also has a "listfuncs" mode which just tracks which functions are
    invoked at least once.

2.  coverage.py actually parses the source so as to annotate lines correctly,
    where trace.py just ignores blank lines and comments.

3.  coverage.py is much faster.  In my tests, coverage.py takes less than
    2 seconds where trace.py takes 30 seconds.

4.  The usage interfaces are very different.  Not sure which, if either, is
    better.

5.  coverage.py only keeps a binary measure of whether each line was invoked or
    not, where trace.py counts how many times each line was invoked.

6.  coverage.py has a nice summary output that looks like this:

$ coverage.py -r -m foo.py bar.py
Name    Stmts   Exec  Cover  Missing
------------------------------------
foo        64     56    87%  23, 57, 85, 119, 125, 133, 137, 152
bar       105     90    86%  78-86, 237-246
------------------------------------
TOTAL     169    146    86%

Here are my ideas:

1.  Andrew Dalke removes that old version of trace.py from his FTP site!

2.  We consider merging the best features of these two tools and replacing the
    standard distribution's trace.py with the new merged tool.

2.b. Who's this "we"?  Well, I understand the trace.py code already, so my
    inclination would be to steal the "real parsing" feature from coverage.py
    and add it to trace.py, and to try to optimize trace.py.  (By profiling it,
    I suppose.  ;-))

Regards,

Zooko

---
             zooko.com
Security and Distributed Systems Engineering
---

[1] http://www.pythonware.com/daily/
[2] http://www.garethrees.org/
[3] http://www.garethrees.org/2001/12/04/python-coverage/
[4] ftp://starship.python.net/pub/crew/dalke/
[5] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src...
[6] http://zooko.com/ [7]
[7] These footnotes are getting a little out of hand.  Footnote footnote!



Tue, 22 Jun 2004 00:53:56 GMT  

Quote:

>     3. You can make Python run faster by giving variables shorter
> names!  

I find this very hard to believe.  Can you provide evidence?

Quote:
> I presume that this is because variables are looked up by name in
> the environment at run time.

Erm, not local variables (which includes function arguments).  They're
assigned a numerical index at compile time, which is what is used to
find the run time value.
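
One way to check this in CPython is to compare the compiled bytecode of two functions that differ only in the names of their locals. The sketch below uses made-up function names; it relies on the __code__ attributes co_code and co_varnames.

```python
def short_names(a, b):
    c = a + b
    return c * 2

def long_names(first_operand_value, second_operand_value):
    intermediate_sum_of_operands = first_operand_value + second_operand_value
    return intermediate_sum_of_operands * 2

# The bytecode refers to locals by index, so the two bodies compile to
# the same instruction bytes; the names live separately in co_varnames.
same_bytecode = short_names.__code__.co_code == long_names.__code__.co_code
names_short = short_names.__code__.co_varnames
names_long = long_names.__code__.co_varnames
```

If same_bytecode is true, the length of a local variable's name cannot be affecting how the function body executes.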

Quote:
> So variables with shorter names can be looked up more quickly.

Erm, not really.  Even for global and builtin objects, the strings
containing the names will be interned at compile time and have their
hash values computed only once; this means that equality testing can be
done by comparing pointers.

The only way variable names can ever so slightly affect performance is
if you manage to have names that end up in the same hash bucket in the
relevant dictionary, and this (a) is pretty unlikely and (b) probably
has a negligible effect.

Quote:
> Hence c, f, t, x, y in my code.

I think you need a better excuse for that :)

Cheers,
M.

--
  MARVIN:  Do you want me to sit in a corner and rust, or just fall
           apart where I'm standing?
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 2



Tue, 22 Jun 2004 02:29:21 GMT  

I, Zooko, wrote the lines prefixed with "> >":


Quote:

> >The usage interfaces are very different.  Not sure which, if either,
> >is better.

> In the coverage.py model you run tests, generate a report, use the
> report to direct your testing activity to fruitful locations in the
> code, generate a new report based on all the testing so far, and so
> on.

I believe trace.py allows the same usage.

Quote:
> >coverage.py is much faster.  In my tests, coverage.py takes less
> >than 2 seconds where trace.py takes 30 seconds.

> It's obvious where the bottleneck is in a tracing or coverage
> application: it's the function that you pass to sys.settrace.  Here's
> the tracing function from coverage.py:

> c = {}
> def t(f, x, y):
>      c[(f.f_code.co_filename, f.f_lineno)] = 1
>      return t

I think you are right.  Here's the function in trace.py:

def localtrace_count(self, frame, why, arg):
    if why == 'line':
        (filename, lineno, funcname, context, lineindex,) = inspect.getframeinfo(frame)
        key = (filename, lineno,)
        self.counts[key] = self.counts.get(key, 0) + 1
    return self.localtrace

I vaguely recall changing from `f.f_code.co_filename' to `inspect.getframeinfo()'
because I thought I had encountered some mysterious problems with the `f_'
members.
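
For comparison, a counting trace function that reads the frame attributes directly might look like the sketch below. This is not the actual trace.py code, and whether it avoids the problems I half remember is untested; the names count_lines, counts and loop are illustrative. It skips the source-context lookup that inspect.getframeinfo() performs on each call.

```python
import sys

counts = {}

def count_lines(frame, why, arg):
    if why == 'line':
        # Build the key from the frame directly instead of getframeinfo().
        key = (frame.f_code.co_filename, frame.f_lineno)
        counts[key] = counts.get(key, 0) + 1
    return count_lines

def loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(count_lines)
try:
    total = loop(3)
finally:
    sys.settrace(None)

filename = loop.__code__.co_filename
first = loop.__code__.co_firstlineno
body_hits = counts[(filename, first + 3)]   # the "total += i" line
```

Here body_hits is the number of times the loop body ran, which is the per-line count that a binary coverage measure does not keep.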

I'll experiment with coverage.py and see if it handles Mojo Nation as well as
trace.py does.

Regards,

Zooko

---
                 zooko.com
Security and Distributed Systems Engineering
---



Tue, 22 Jun 2004 02:26:05 GMT  

    zooko> 2.  We consider merging the best features of these two tools and
    zooko>     replacing the standard distribution's trace.py with the new
    zooko>     merged tool.

In the long run I think you'll be better off basing a code coverage tool on
Jeremy Hylton's compiler package.  During compilation you generate
per-module counters that are incremented at the beginning of each basic
block (not just one for each line) and save the correspondence between each
counter and its location in the code.  At exit you dump the counts into a db
file of some sort which can then be post-processed into an annotated code
listing.
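
A rough sketch of the idea follows, with loud assumptions: it uses the standard ast module as a stand-in for the compiler package, and it places one counter at the start of each statement block (function body, if/else branch, loop body), which only approximates true basic blocks. The names instrument and __counters__ are hypothetical.

```python
import ast

def instrument(source, filename="<instrumented>"):
    counters = []    # counters[i]: number of times block i was entered
    locations = []   # locations[i]: line number where block i starts

    class AddCounters(ast.NodeTransformer):
        def generic_visit(self, node):
            super().generic_visit(node)   # instrument nested blocks first
            for field in ("body", "orelse", "finalbody"):
                block = getattr(node, field, None)
                # Only statement lists count; skip expression bodies (lambda).
                if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                    index = len(counters)
                    counters.append(0)
                    locations.append(block[0].lineno)
                    bump = ast.parse("__counters__[%d] += 1" % index).body[0]
                    block.insert(0, bump)
            return node

    tree = AddCounters().visit(ast.parse(source))
    return compile(tree, filename, "exec"), counters, locations

source = """
def sign(n):
    if n > 0:
        return 1
    else:
        return -1

sign(5)
sign(7)
"""
code, counters, locations = instrument(source)
exec(code, {"__counters__": counters})
hits = dict(zip(locations, counters))   # starting line -> entry count
```

The hits mapping is the kind of data that could be dumped to a file at exit and post-processed into an annotated listing; here the else branch starting on line 6 of the sample source is never entered.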

--



Tue, 22 Jun 2004 03:03:36 GMT  
 