Unit testing data; What to test 
 Unit testing data; What to test

Having decided testing was a Good Thing and that I ought to do it, I've
started to write tests, using PyUnit.

The first question is straightforward: do people have a standard, simple
way of handling data for tests, or do you just tend to stick most of it in
the test classes?  KISS I suppose.  But if the software is going to change
a lot, isn't it a good idea to separate the tests from their input and
expected result data?  Of course with big data -- I have some mathematical
functions that need to be checked for example -- you're obviously not
going to dump it directly into the test code: I'm wondering more about
data of the order of tens of objects (objects in the Python sense).

In fact, do unit tests often end up having to be rewritten as code is
refactored?  Presumably yes, given the (low) level that unit tests are
written at.

The second (third) one is vague: What / How should one test?  Discuss.

I realise these questions are not strictly python-specific, but (let's
see, how do I justify this) it seems most of what is out there on the web
& USENET is either inappropriately formal, large-scale and
business-orientated for me (comp.software.testing and its FAQ, for
example), or merely describes a testing framework.  A few scattered XP
articles are the only exceptions I've found.

I'm sure there must be good and bad ways to test -- for example, I read
somewhere (can't find it now) that you should aim to end up so that each
bug generates, on (mode) average, one test failure, or at least a small
number.  The justification for this was that lots of tests failing as a
result of a single bug are difficult to deal with.  It seems to me that
this goal is a) impossible to achieve and b) pointless, since if multiple
test failures really are due to a single bug, they will all go away when
you fix it, just as compile-time errors often do.  No?

John



Thu, 07 Aug 2003 15:09:59 GMT  
 Unit testing data; What to test


    [snip]

Quote:
> The first question is straightforward: do people have a standard, simple
> way of handling data for tests, or do you just tend to stick most of it in
> the test classes?  KISS I suppose.  But if the software is going to change
> a lot, isn't it a good idea to separate the tests from their input and
> expected result data?  Of course with big data -- I have some mathematical
> functions that need to be checked for example -- you're obviously not
> going to dump it directly into the test code: I'm wondering more about data of
> the order of tens of objects (objects in the Python sense).

If you keep the (stimuli, expected_responses) sets separate from the
module being tested, you gain much the same benefits (and pay much
the same costs) as by keeping documentation separate from code for
other kinds of docs (think of tests as a form of executable docs...!).
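
For instance (all names here are invented for illustration): the
(stimuli, expected_responses) pairs can sit in a small data module of
their own, with the PyUnit code merely replaying them:

# frobnicate_testdata.py -- hypothetical module holding nothing but test data
ADD_CASES = [
    # ((a, b), expected_response)
    ((0, 0), 0),
    ((1, 2), 3),
    ((-1, 1), 0),
]

# test_frobnicate.py -- the test code proper
import unittest
import frobnicate                       # hypothetical module under test
from frobnicate_testdata import ADD_CASES

class AddTest(unittest.TestCase):
    def testKnownValues(self):
        # replay each (stimulus, expected_response) pair
        for (a, b), expected in ADD_CASES:
            self.assertEqual(frobnicate.add(a, b), expected)

if __name__ == '__main__':
    unittest.main()

Refactoring frobnicate's internals then touches neither file; only a
change to its interface or to the expected responses does.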

Keeping some tests/docs together with the code simplifies distribution
issues: anybody who has your code also has this minimal set of tests
(and other docs).  However, in some cases, the total amount of tests
and other docs can easily swamp the amount of code -- and in this case
keeping them together can easily be thought of as "too costly". The
code can become practically unreadable if there are an order of magnitude
more lines of docs (including tests) than lines of actual source in a .py.

My favourite approach (and I do wish I was more systematical in
practising what I preach:-) is to try and strike a balance: keep with
the code (docstrings, comments, special-purpose test functions) a
reasonable minimal amount of tests (& other docs) -- indicatively,
roughly the same order of magnitude as the source code itself;
but _also_ have a separate set of docs (& tests) for more extensive
needs.  The ideal line of separation: is this stuff going to be needed
only by users who are also going to read or change the sources, or is
it going to be more generally useful?  Docs & tests that
only work at the *interface* of the module, without concern for its
*internals*, may allow many users to treat the module as a "black
box", only reading/running/enriching the separate docs-and-tests.

Quote:
> In fact, do unit tests often end up having to be rewritten as code is
> refactored?  Presumably yes, given the (low) level that unit tests are
> written at.

This is another good guideline: the docs and tests best kept with the
source are those that will likely need changing anyway when the code
is refactored.  The unit of reuse is the unit of release: when you
typically have internals changes that leave alone the interface of
the module, then that interface might usefully be documented and
tested "outside" the source code file -- you can imagine releasing
enhanced sources with identical "externals" docs/tests, or richer
tests/docs that require no re-release of the code itself.

Quote:
> The second (third) one is vague: What / How should one test?  Discuss.

> I realise these questions are not strictly python-specific, but (let's

So is much of what gets discussed here, and, as a born rambler, I
love the ethos of this group, which welcomes such 'somewhat OT'
discussions!-)

Quote:
> see, how do I justify this) it seems most of what is out there on the web
> & USENET is either inappropriately formal, large-scale and
> business-orientated for me (comp.software.testing and its FAQ, for
> example), or merely describes a testing framework.  A few scattered XP
> articles are the only exceptions I've found.

If XP fits your needs, you could definitely do worse than adopt it
wholesale!  Yes, much of what gets discussed about software
development deals with large-scale SW (in testing and elsewhere) --
that's because problems grow non-linearly with SW scale... when
you release an application that's about 1,000 SLOC, it does not
really matter much if your process and approach are jumbled; at
10,000 SLOC, it's a problem; at 100,000, an utter nightmare, so
you HAVE to be precise in defining your process then (or, call it
100, 1000, 10,000 FP -- but it's really about SLOC more than it
is about FP, which is why higher-level languages help so much).

Differently from what XP specifies, I think tests should be in two
categories (and, similarly, so should deliverables be, rather than
the just-one-deliverable which so characterizes XP -- that is a
tad too X for my conservative self, even though I buy into 70+%
of XP's tenets!).  Which, again, leads us to the internal/external
tests-and-docs split.  External tests and docs (possibly, in large
scale devt, on several scales: module aka unit, subsystem, whole
system) deal with externals/interfaces (not just GUI's &c -- I'm
talking about, e.g., module interfaces to other software; _of course_
'engine' and 'presentation' SHOULD almost invariably be separate
components, but that's another plane of 'split').  Internal tests
and docs deal with internals -- the kind of thing that needs to be
tweaked at each refactoring.

That's not the same dividing plane as the classic unit vs system
test distinction -- it's slanted differently, and perks up again at
each granularity level in a large-enough project (minimal granule
being the module -- perhaps, at Python level, package -- not the
single function or class, since functions and classes inside one
module _are_ typically coupled too strongly to test/release/doc
independently... there are exceptions, modules that are not very
cohesive but rather collections of somewhat unrelated stuff for
basically 'packaging' purposes, but they should be the exception
rather than the rule).

Quote:
> I'm sure there must be good and bad ways to test -- for example, I read
> somewhere (can't find it now) that you should aim to end up so that each
> bug generates, on (mode) average, one test failure, or at least a small
> number.  The justification for this was that lots of tests failing as a
> result of a single bug are difficult to deal with.  It seems to me that
> this goal is a) impossible to achieve and b) pointless, since if multiple
> test failures really are due to a single bug, they will all go away when
> you fix it, just as compile-time errors often do.  No?

True.  However, if tests are designed in terms of a _sequence_, it IS
often possible to arrange for the most fundamental tests to be run
*first*, ensuring minimal workability of some lower-level, call it
'infrastructural', kind of objects, so that dependent (higher level)
parts can be tested _assuming_ the lower-level stuff works.  This
is more of a consideration for 'internals' kinds of tests, IMHO.
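
With PyUnit that ordering can be arranged explicitly -- the class and
module names below are invented, but the mechanism is just that tests
run in the order they are added to the suite, so the 'infrastructural'
checks go first:

import unittest

# hypothetical test classes: LowLevelTest covers the infrastructural
# objects, HighLevelTest assumes they already work
from test_lowlevel import LowLevelTest
from test_highlevel import HighLevelTest

def suite():
    s = unittest.TestSuite()
    s.addTest(unittest.makeSuite(LowLevelTest))    # fundamental tests first
    s.addTest(unittest.makeSuite(HighLevelTest))   # dependent tests after
    return s

if __name__ == '__main__':
    unittest.TextTestRunner(verbosity=2).run(suite())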

Alex



Thu, 07 Aug 2003 16:16:32 GMT  
 Unit testing data; What to test

Quote:

> Having decided testing was a Good Thing and that I ought to do it, I've
> started to write tests, using PyUnit.

> The second (third) one is vague: What / How should one test?  Discuss.

> I'm sure there must be good and bad ways to test -- for example, I read
> somewhere (can't find it now) that you should aim to end up so that each
> bug generates, on (mode) average, one test failure, or at least a small
> number.  The justification for this was that lots of tests failing as a
> result of a single bug are difficult to deal with.  It seems to me that
> this goal is a) impossible to achieve and b) pointless, since if multiple
> test failures really are due to a single bug, they will all go away when
> you fix it, just as compile-time errors often do.  No?

How I commonly see it done (and how I do it):

common cases
edge cases -- perhaps -1 could be valid in some cases and not others
              or maybe you have a buffer which is only so big
bad cases -- but -2 never is

You want to see whether your code fails or gives the expected output.  The
other big thing is that every time you find a bug, you fix it and then add a
test case that triggers the bug.  So if you run the buggy version you get an
error and the new version is clean.  This way, if you accidentally remove the
fix or cause a similar bug elsewhere, your tests catch it.
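
A sketch of that layout -- the buffer module, its MAXSIZE limit and the
bug number are all made up for illustration:

import unittest
import mybuffer   # hypothetical module under test

class StoreTest(unittest.TestCase):
    def testCommonCase(self):
        # common case: a normal-sized input is stored fine
        self.assertEqual(mybuffer.store("abc"), 3)

    def testEdgeCases(self):
        # edge cases: empty input, and input exactly at the size limit
        self.assertEqual(mybuffer.store(""), 0)
        self.assertEqual(mybuffer.store("x" * mybuffer.MAXSIZE), mybuffer.MAXSIZE)

    def testBadCase(self):
        # bad case: oversized input must raise, not silently truncate
        self.assertRaises(ValueError, mybuffer.store, "x" * (mybuffer.MAXSIZE + 1))

    def testRegressionBug42(self):
        # added after fixing a (hypothetical) bug: store() used to miscount
        # multi-line input -- this failed on the buggy version, passes now
        self.assertEqual(mybuffer.store("a\nb"), 3)

if __name__ == '__main__':
    unittest.main()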



Thu, 07 Aug 2003 17:15:53 GMT  
 Unit testing data; What to test

Quote:

> [snip] But if the software is going to change
> a lot, isn't it a good idea to separate the tests from their input and
> expected result data?

Do whatever is clearest, but try to arrange the tests such that their data
doesn't change often. High-maintenance test cases are either bad test cases
or a sign that the tested code is being changed for no good reason.

Quote:
> In fact, do unit tests often end up having to be rewritten as code is
> refactored?  Presumably yes, given the (low) level that unit tests are
> written at.

Refactoring should normally not break tests -- it means "now the code is
working and passing the tests, let's remove cruft".

XP zealots might (don't quote me) go as far as saying that any change that
breaks the tests is a feature change, not a refactoring.

Refactoring of tests is a separate activity.

Quote:
> The second (third) one is vague: What / How should one test?  Discuss.

Test *everything*, and do it thoroughly!

If you want answers implying less work, there are probably some guidelines
somewhere...

Personally, I like the XP testing approach:

  1. Identify a requirement that some code has to do X
  2. Write a test where we try to do X, and check that it worked
  3. Keep adding or changing code until the test passes, and no other test
     breaks.

Finding a bug is equivalent to identifying a requirement that some code should
not do Y -- write a test, and code until it passes.
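
As a tiny illustration (the module and function names are invented, not
from the post): requirement X, "parse_price() turns '1.50' into 150
cents", becomes a test before the code passes it; bug Y, "it used to
accept an empty string", becomes another test written the same way:

import unittest
import pricing   # hypothetical module; parse_price() may not pass yet

class ParsePriceTest(unittest.TestCase):
    def testParsesDecimalString(self):
        # the requirement, written down as a test first
        self.assertEqual(pricing.parse_price("1.50"), 150)

    def testRejectsEmptyString(self):
        # the "should not do Y" case, born from a bug report
        self.assertRaises(ValueError, pricing.parse_price, "")

if __name__ == '__main__':
    unittest.main()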

A complete set of tests thus written forms a catalogue of what the code is
guaranteed to do correctly; use of the tested code in any other manner is
not guaranteed to work.

Some people make the mistake of wasting a lot of testing effort on checking
that their code does something "sensible" if misused. There's no time-efficient
way to catch mistakes made by people who didn't read the manual.

Think of the tests as a combination of 'programmers manual' and 'example code'.

Quote:
> I'm sure there must be good and bad ways to test

True enough. The bad way is to *not* test.  :-)

Really, the point is that time spent wondering or reading about how best to
test would be better spent simply writing as many tests as possible.

-Steve

--
Steve Purcell, Pythangelist
Get testing at http://pyunit.sourceforge.net/
Get servlets at http://pyserv.sourceforge.net/
"Even snakes are afraid of snakes." -- Steven Wright



Thu, 07 Aug 2003 18:17:07 GMT  
 Unit testing data; What to test

Quote:

> Having decided testing was a Good Thing and that I ought to do it, I've
> started to write tests, using PyUnit.

> The first question is straightforward: do people have a standard, simple
> way of handling data for tests, or do you just tend to stick most of it in
> the test classes?  KISS I suppose.  But if the software is going to change
> a lot, isn't it a good idea to separate the tests from their input and
> expected result data?  Of course with big data -- I have some mathematical
> functions that need to be checked for example -- you're obviously not
> going to dump it directly into the test code: I'm wondering more about data of
> the order of tens of objects (objects in the Python sense).

> In fact, do unit tests often end up having to be rewritten as code is
> refactored?  Presumably yes, given the (low) level that unit tests are
> written at.

> The second (third) one is vague: What / How should one test?  Discuss.

> I realise these questions are not strictly python-specific, but (let's
> see, how do I justify this) it seems most of what is out there on the web
> & USENET is either inappropriately formal, large-scale and
> business-orientated for me (comp.software.testing and its FAQ, for
> example), or merely describes a testing framework.  A few scattered XP
> articles are the only exceptions I've found.

> I'm sure there must be good and bad ways to test -- for example, I read
> somewhere (can't find it now) that you should aim to end up so that each
> bug generates, on (mode) average, one test failure, or at least a small
> number.  The justification for this was that lots of tests failing as a
> result of a single bug are difficult to deal with.  It seems to me that
> this goal is a) impossible to achieve and b) pointless, since if multiple
> test failures really are due to a single bug, they will all go away when
> you fix it, just as compile-time errors often do.  No?

> John

I suggest reading some of the written material about XP
that you can find by searching your favourite online book
dealer for "extreme programming". I get five hits for the
series that was started by Kent Beck with Addison-Wesley,
two of which are not yet published... All of the first
three that I know are *thin* books and easy to read!

In general I'd say only this much: nobody knows better
*what* to test than the people writing the code (after the use
cases / user stories have been determined)!  So, if there
is a lack of knowledge about what to test (I'm not sure this
is the case here, though, but it is the subject line ;-)
there must be some deeper issue in the common understanding
of what the system is expected to do.

As for organizing the tests: see the books I mentioned, but
don't expect a detailed process description. Everything that
works for you will be fine! If something won't: adapt, shake
and reiterate! ;-)

Regards,

Dinu

--
Dinu C. Gherman
ReportLab Consultant - http://www.reportlab.com
................................................................
"The only possible values [for quality] are 'excellent' and 'in-
sanely excellent', depending on whether lives are at stake or
not. Otherwise you don't enjoy your work, you don't work well,
and the project goes down the drain."
                    (Kent Beck, "Extreme Programming Explained")



Sun, 10 Aug 2003 18:10:19 GMT  
 Unit testing data; What to test

Sorry about the formatting etc. of this post -- I had to cut and paste
from Google.

Quote:


[...]
> If you keep the (stimuli, expected_responses) sets separate from the
> module being tested, you gain much the same benefits (and pay much
> the same costs) as by keeping documentation separate from code for
> other kinds of docs (think of tests as a form of executable docs...!).

> Keeping some tests/docs together with the code simplifies distribution
> issues: anybody who has your code also has this minimal set of tests
> (and other docs).

Well, that's not a problem, is it (unless the tests really are huge)?
They can be in the distribution without being mixed in with the code.

Quote:
> However, in some cases, the total amount of tests
> and other docs can easily swamp the amount of code -- and in this case
> keeping them together can easily be thought of as "too costly". The
> code can become practically unreadable if there are an order of magnitude
> more lines of docs (including tests) than lines of actual source in a .py.

Agreed.  My tests are separate from the code -- it hadn't occurred to me
to keep them in the same file.  It was the separation (or not) of test
data and test code I was wondering about, though.

Quote:
> My favourite approach (and I do wish I was more systematical in
> practising what I preach:-) is to try and strike a balance: keep with
> the code (docstrings, comments, special-purpose test functions) a
> reasonable minimal amount of tests (& other docs) -- indicatively,
> roughly the same order of magnitude as the source code itself;
> but _also_ have a separate set of docs (& tests) for more extensive
> needs.

[...]

This is all good advice, but it doesn't actually answer my question.  ;-)
Anyway, I think I will try having some tests in the code, sounds like a
great idea considering the tests-as-docs aspect.

Quote:
> > The second (third) one is vague: What / How should one test?  Discuss.
[...]
> > see, how do I justify this) it seems most of what is out there on the web
> > & USENET is either inappropriately formal, large-scale and
[...]
> > A few scattered XP
> > articles are the only exceptions I've found.
[...]
> If XP fits your needs, you could definitely do worse than adopt it
> wholesale!  Yes, much of what gets discussed about software
> development deals with large-scale SW (in testing and elsewhere) --
> that's because problems grow non-linearly with SW scale... when
> you release an application that's about 1,000 SLOC, it does not
> really matter much if your process and approach are jumbled; at

Let me assure you it _is_ possible to achieve an unmaintainable mess in
~1000 SLOC.  I have personally achieved this in, let's see (C-x C-f
munge.pl RET) ... 832 SLOC.  This is one of many things that Perl makes
easy (this is completely unfair to Perl of course -- what really makes it
easy is starting out thinking 'this is going to be a 100 line script' and
then extending by cut and paste -- I've learned my lesson, honest!).

Quote:
> 10,000 SLOC, it's a problem; at 100,000, an utter nightmare, so
> you HAVE to be precise in defining your process then (or, call it

Yeah, of course, I understand that.  But there is still a place for lots
of testing for smaller efforts, minus the heavyweight formal process.

Quote:
> 100, 1000, 10,000 FP -- but it's really about SLOC more than it
> is about FP, which is why higher-level languages help so much).

> Differently from what XP specifies, I think tests should be in two
> categories (and, similarly, so should deliverables be, rather than
> the just-one-deliverable which so characterizes XP -- that is a
> tad too X for my conservative self, even though I buy into 70+%
> of XP's tenets!).

Which two categories?  External and internal?  How is this different from
the XP unit test versus acceptance tests division?  Is your point just
that the external / internal division applies on more levels than just
final user / everything else -- which is what you seem to be talking about
below?

Here's a snip explaining what XP people mean by acceptance tests, for
anybody who hasn't read about it:

http://www.extremeprogramming.org/rules/functionaltests.html
----------

      Acceptance tests are created from user stories.  During an iteration
the user stories selected during the iteration planning meeting will be
translated into acceptance tests.  The customer specifies scenarios to test
when a user story has been correctly implemented.  A story can have one or
many acceptance tests, what ever it takes to ensure the functionality works.

      Acceptance tests are black box system tests.  Each acceptance test
represents some expected result from the system.  Customers are responsible
for verifying the correctness of the acceptance tests and reviewing test
scores to decide which failed tests are of highest priority.  Acceptance
tests are also used as regression tests prior to a production release.

----------

Quote:
> Which, again, leads us to the internal/external
> tests-and-docs split.  External tests and docs (possibly, in large
> scale devt, on several scales: module aka unit, subsystem, whole
> system) deal with externals/interfaces (not just GUI's &c -- I'm
> talking about, e.g, module interfaces to other software; _of course_
> 'engine' and 'presentation' SHOULD almost invariably be separate
> components, but that's another plane of 'split').  Internal tests
> and docs deal with internals -- the kind of thing that needs to be
> tweaked at each refactoring.

> That's not the same dividing plane as the classic unit vs system
> test distinction -- it's slanted differently, and perks up again at
> each granularity level in a large-enough project (minimal granule
> being the module -- perhaps, at Python level, package -- not the
> single function or class, since functions and classes inside one
> module _are_ typically coupled too strongly to test/release/doc
> independently... there are exceptions, modules that are not very
> cohesive but rather collections of somewhat unrelated stuff for
> basically 'packaging' purposes, but they should be the exception
> rather than the rule).

[...]

John



Mon, 18 Aug 2003 01:41:01 GMT  
 Unit testing data; What to test


Quote:

> Sorry about the formatting etc. of this post -- I had to cut and paste
> from Google.

My sympathy.  I'll even forgive your not mentioning that
I'm the guy whose comments you're responding to:-).

Quote:



> [...]
> > If you keep the (stimuli, expected_responses) sets separate from the
> > module being tested, you gain much the same benefits (and pay much
> > the same costs) as by keeping documentation separate from code for
> > other kinds of docs (think of tests as a form of executable docs...!).

> > Keeping some tests/docs together with the code simplifies distribution
> > issues: anybody who has your code also has this minimal set of tests
> > (and other docs).

> Well, that's not a problem, is it (unless the tests really are huge)?
> They can be in the distribution without being mixed in with the code.

They can be, sure -- just like other documentation can be in your
distribution without being mixed with your code.  Again, I claim
the parallel is very strict.  Maybe people are *supposed* to only
ever distribute your code in the way you packaged it up -- but,
what happens if, e.g., some automated dependency finder picks up
your wonderful bleeppa.py, and stashes it into allyouneed.zip
*without* the accompanying bleeppa.doc and bleeppa_tests.py files,
for example?  Answer: whoever ends up rummaging some time later
in the unpacked allyouneed.zip WILL have your bleeppa.py but
none of the other files that you originally packaged with it.

So, if bleeppa.py itself contains some minimal/essential subset
of its own docs (as docstrings or comments) and unit-tests, you
are covering up for exactly such problems -- making your code
more usable when it happens to start going around without other
files that _should_ always accompany it.
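
A sketch of such a self-carrying module, with invented names in the
spirit of the hypothetical bleeppa.py -- a couple of examples in the
docstring plus a tiny _test() function, so the file still checks itself
even when it travels without bleeppa_tests.py:

# bleeppa.py -- carries a minimal/essential subset of its own docs and tests

def double(x):
    """Return x doubled, e.g. double(2) == 4, double('a') == 'aa'.

    The fuller test suite lives in a separate bleeppa_tests.py; only
    the essentials are embedded here.
    """
    return x + x

def _test():
    # minimal self-checks, deliberately tiny
    assert double(2) == 4
    assert double('a') == 'aa'

if __name__ == '__main__':
    _test()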

Quote:
> > However, in some cases, the total amount of tests
> > and other docs can easily swamp the amount of code -- and in this case
> > keeping them together can easily be thought of as "too costly". The
> > code can become practically unreadable if there are an order of magnitude
> > more lines of docs (including tests) than lines of actual source in a .py.

> Agreed.  My tests are separate from the code -- it hadn't occurred to me
> to keep them in the same file.  It was the separation (or not) of test
> data and test code I was wondering about, though.

I guess I tend not to think of that because my typical 'test _code_'
IS separated from my typical 'test _data_' anyway -- Tim Peters'
doctest.py being the former:-).  So, I don't think of merging test
code and test data, any more than I think of distributing, say, Word
together with my .doc files:-).

Or, if you look at the docstrings which doctest.py runs as "code"
rather than "data", then I guess the issue I have not faced (in
Python) is that of _separating_ the test-data from the code.
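
For anyone who hasn't met doctest.py: the "data" is an interactive
session captured in the docstring, and doctest is the generic code that
replays it (the module and function here are invented):

# mathstuff.py -- hypothetical module whose docstrings double as test data

def square(x):
    """Return x*x.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x

def _test():
    import doctest, mathstuff
    return doctest.testmod(mathstuff)

if __name__ == '__main__':
    _test()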

When I _do_ face it in other languages (in some database-centered
components: the DB on which the tests are to run is unconscionably
BIG, if you think of it in any textual form in which it might be
feasible to merge it with the relatively small amount of code that
needs it), the testcode/testdata separation happens to be well
near inevitable.  The code carries with it the SQL code (as data)
that it sends to the DBMS and the results it expects, but not the
DB itself -- so, it's not really feasible to test the component
(run its standard unit tests, I mean) if you only have its executable
part without the accompanying files (since the starting DB needed
for the tests is just such an accompanying file).
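
In Python terms the same shape might look roughly like this (sqlite3
merely stands in for whatever DB-API module the real component uses,
and the queries and expected rows are invented): the SQL and the
expected results travel with the test code, while the starting database
is the separate, much larger accompanying file:

import unittest
import sqlite3   # stands in for the component's actual DB-API module

TEST_DB = "component_testdata.db"   # the big accompanying file

# the (SQL, expected rows) pairs travel with the test code itself
QUERIES = [
    ("SELECT count(*) FROM customers",          [(42,)]),
    ("SELECT min(id), max(id) FROM customers",  [(1, 42)]),
]

class ComponentDBTest(unittest.TestCase):
    def setUp(self):
        # without the accompanying DB file these tests simply cannot run
        self.conn = sqlite3.connect(TEST_DB)

    def tearDown(self):
        self.conn.close()

    def testKnownQueries(self):
        cur = self.conn.cursor()
        for sql, expected in QUERIES:
            cur.execute(sql)
            self.assertEqual(cur.fetchall(), expected)

if __name__ == '__main__':
    unittest.main()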

Quote:
> > My favourite approach (and I do wish I was more systematical in
> > practising what I preach:-) is to try and strike a balance: keep with
> > the code (docstrings, comments, special-purpose test functions) a
> > reasonable minimal amount of tests (& other docs) -- indicatively,
> > roughly the same order of magnitude as the source code itself;
> > but _also_ have a separate set of docs (& tests) for more extensive
> > needs.
> [...]

> This is all good advice, but it doesn't actually answer my question.  ;-)
> Anyway, I think I will try having some tests in the code, sounds like a
> great idea considering the tests-as-docs aspect.

Serendipity, pure serendipity:-).  I misread your question, or you
miswrote it (or 50-50 -- comes to the same thing in the end:-), yet
you still gain something useful from the resulting discussion...:-).

Quote:
> > you release an application that's about 1,000 SLOC, it does not
> > really matter much if your process and approach are jumbled; at

> Let me assure you it _is_ possible to achieve an unmaintainable mess in
> ~1000 SLOC.  I have personally achieved this in, let's see (C-x C-f
> munge.pl RET) ... 832 SLOC.  This is one of many things that Perl makes

Oh, I'm sure I have beaten that, a LONG time ago back when I
wrote in APL -- but what I'm saying is that it's not really the _process_
that controls that, for a small-enough deliverable.  Good taste,
experience, and some common sense on the coder's part may suffice,
with just-about-nonexistent-process, to keep things under control
_at this level of code-size_.

Quote:
> easy (this is completely unfair to Perl of course -- what really makes it
> easy is starting out thinking 'this is going to be a 100 line script' and
> then extending by cut and paste -- I've learned my lesson, honest!).

> > 10,000 SLOC, it's a problem; at 100,000, an utter nightmare, so
> > you HAVE to be precise in defining your process then (or, call it

> Yeah, of course, I understand that.  But there is still a place for lots
> of testing for smaller efforts, minus the heavyweight formal process.

Absolutely YES.  Testing ain't a bad thing even if done informally!-)

Quote:
> > 100, 1000, 10,000 FP -- but it's really about SLOC more than it
> > is about FP, which is why higher-level languages help so much).

> > Differently from what XP specifies, I think tests should be in two
> > categories (and, similarly, so should deliverables be, rather than
> > the just-one-deliverable which so characterizes XP -- that is a
> > tad too X for my conservative self, even though I buy into 70+%
> > of XP's tenets!).

> Which two categories?  External and internal?  How is this different from
> the XP unit test versus acceptance tests division?  Is your point just
> that the external / internal division applies on more levels than just
> final user / everything else -- which is what you seem to be talking about
> below?

Basically, yes, and I apologize for muddled expression.  The point is
that most often I build components that no 'final' user will ever
notice (unless something goes badly wrong:-) -- call it 'middleware'
or whatever.  But not-so-final users may need to get at parts of
them -- not just future maintainers/extenders, and current and future
re-users, but (at least when scripting is possible:-) current and
future _scripters_ (some 'power-users', 3rd party system integrators,
customer-support/application-engineers, ...).  What's "internal" and
what's "external" varies depending on the target audience.

Quote:
> here's a snip explaining what XP people mean by acceptance tests for
> anybody that hasn't read about it:

> http://www.extremeprogramming.org/rules/functionaltests.html

    [snip, but the whole site IS a recommended read:-)]

Alex



Tue, 19 Aug 2003 00:29:48 GMT  
 