Traversal of a directory tree with housekeeping per directory 

<newbie warning>

I want to traverse a tree but, within each directory, I need to do a fair
amount of housekeeping.  In particular, I need to read through all the files
in the directory before I have the information I need to start doing the
housekeeping in that directory.

Is use File::Find ; still the way to go?

Why I am doubtful is that it seems klutzy to check for a change in directory
for every file.  And, even if I do that, detecting a change of directory
means, "Oops.  Ungracefully scramble back to the directory you just left and
housekeep."

I think that doing an independent find() in each directory is what I want
but not using find() to step through those directories seems stupid.

Any insights welcome.

</newbie warning>



Tue, 10 Feb 2004 02:23:46 GMT  

Quote:

> I want to traverse a tree but, within each directory, I need to do a
> fair amount of housekeeping.  In particular, I need to read through
> all the files in the directory before I have the information I need
> to start doing the housekeeping in that directory.

> Is use File::Find ; still the way to go?

> Why I am doubtful is that it seems klutzy to check for a change in
> directory for every file.  And, even if I do that, detecting a
> change of directory means, "Oops.  Ungracefully scramble back to the
> directory you just left and housekeep."

> I think that doing an independent find() in each directory is what I
> want but not using find() to step through those directories seems
> stupid.

> Any insights welcome.

finddepth() may help.  That way, whenever a directory is processed,
you know that all of the files have already been processed.  Depending
on what exactly you need to do, that may be sufficient.
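The finddepth() idea can be sketched roughly as follows. This is only a minimal illustration, assuming the housekeeping can wait until finddepth() reaches the directory entry itself; walk_with_housekeeping and the %files_in hash are made-up names, not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;

# Walk $top depth-first and call $housekeep->($dir, \@files) once per
# directory, after every non-directory entry in it has been seen.
# (walk_with_housekeeping is a hypothetical helper name.)
sub walk_with_housekeeping {
    my ($top, $housekeep) = @_;
    my %files_in;    # directory path => names of non-directories seen there

    finddepth(sub {
        # lstat() so that a symlink to a directory is not treated as one
        if (lstat($_) && -d _) {
            # finddepth() reports a directory only after its contents
            my $files = delete $files_in{$File::Find::name} || [];
            $housekeep->($File::Find::name, [ sort @$files ]);
        } else {
            push @{ $files_in{$File::Find::dir} }, $_;
        }
    }, $top);
}
```

A call such as walk_with_housekeeping('/path/to/root', sub { my ($dir, $files) = @_; ... }) would then receive each directory together with the names collected in it.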

--
Ren Maddox



Tue, 10 Feb 2004 05:21:03 GMT  

Quote:

> I want to traverse a tree but, within each directory, I need to do a fair
> amount of housekeeping.  In particular, I need to read through all the files
> in the directory before I have the information I need to start doing the
> housekeeping in that directory.

Again, I offer my simple tree-walker as an alternative to File::Find. The
compact version recurses into subdirectories as it finds them, so it may
not quite meet your requirements. The full version separates the names before
processing any of them, so you can do something to all the files, do some
housekeeping, and then start recursing into the subdirectories.

Note that in either case, unlike File::Find, these functions do not chdir()
you to the various directories. Instead, you have the full pathname to each
object to manipulate as needed.
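For comparison, reasonably recent versions of File::Find can also be told not to chdir(), via the no_chdir option. A minimal sketch, assuming a File::Find that supports that option (list_files is a made-up name for illustration):

```perl
use strict;
use warnings;
use File::Find;

# Collect the full path of every non-directory under $top without
# letting find() change the working directory.
# (list_files is a hypothetical helper name.)
sub list_files {
    my ($top) = @_;
    my @found;
    find({
        no_chdir => 1,    # stay in the original working directory
        wanted   => sub {
            # with no_chdir, $_ holds the same path as $File::Find::name
            push @found, $_ unless lstat($_) && -d _;
        },
    }, $top);
    my @sorted = sort @found;
    return @sorted;
}
```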

Enjoy!

-- Dave Tweed

=============================================================================

#!perl -w
# treewalk.pl - example of walking a directory, for comp.lang.perl.misc

&process_directory ('/path/to/root');

# compact version

sub process_directory {
    my ($path) = @_;

    # get all of the names from the directory, excluding "." and ".."
    local (*DIR);
    opendir (DIR, $path) || die "can't open directory $path: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir DIR;
    closedir DIR;

    # the sort is optional
    foreach (sort @names) {
        my $temp = "$path/$_";
        if (-d $temp) {
            &process_directory ($temp);
        } else {
            &process_file ($temp);
        }
    }
}

sub process_file {
    my ($file) = @_;

    # whatever ...
}

=============================================================================

#!perl -w
# treewalk.pl - example of walking a directory, for comp.lang.perl.misc

&process_directory ('/path/to/root');

# full version

sub process_directory {
    my ($path) = @_;

    # get the names out of the current directory and separate them into
    # files and subdirectories
    my @names = &read_directory ($path);
    my (@files, @subdirs);
    foreach (@names) {
        if (-d "$path/$_") {
            push @subdirs, $_;
        } else {
            push @files, $_;
        }
    }

    # process all the files
    foreach (@files) {
        &process_file ("$path/$_");
    }

    # do any housekeeping here, before recursing into subdirectories

    # process all the subdirectories
    foreach (@subdirs) {
        &process_directory ("$path/$_");
    }
}

sub process_file {
    my ($file) = @_;

    # whatever ...
}

# customize the filtering and sorting of names here

sub read_directory {
    my ($path) = @_;

    # get all of the names from a directory, excluding "." and ".."
    local (*DIR);
    opendir (DIR, $path) || die "can't open directory $path: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir DIR;
    closedir DIR;

    # optional - filter out all other names starting with '.'
    @names = grep { !/^\./ } @names;

    # optional - sort the names
    @names = sort @names;

    return @names;
}

=============================================================================


Tue, 10 Feb 2004 09:11:42 GMT  

Dave> Again, I offer my simple tree-walker as an alternative to File::Find. The
Dave> compact version recurses into subdirectories as it finds them, so it may
Dave> not quite meet your requirements.

...

Dave>         if (-d $temp) {
Dave>             &process_directory ($temp);
Dave>         } else {
Dave>             &process_file ($temp);
Dave>         }

Bad.  It chases symlinks.  Please make it not do that, or you will
ruin a good day.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Tue, 10 Feb 2004 14:12:04 GMT  

Quote:

> Bad.  It chases symlinks.  Please make it not do that, or you will
> ruin a good day.

Not an issue for me, since my platform doesn't support them.

The OP may or may not want to follow links.

You, of all people, should be able to insert "unless -l _" where needed.

These were obviously skeletal scripts, not polished platform-independent
applications. Sheesh.

-- Dave Tweed



Wed, 11 Feb 2004 01:50:54 GMT  


Quote:
>> Bad.  It chases symlinks.  Please make it not do that, or you will
>> ruin a good day.

Dave> Not an issue for me, since my platform doesn't support them.

Fine.

Dave> The OP may or may not want to follow links.

Well, if you follow links, and you aren't doing duplicate elimination,
you'll ruin a good day.  That's my point.  Do I need to repeat it?

Dave> You, of all people, should be able to insert "unless -l _" where needed.

Right. *I* can.  But I'm not the only person reading this newsgroup (I
hope :).  The warning was as much for the people reading this group as
it was for you.

Dave> These were obviously skeletal scripts, not polished
Dave> platform-independent applications. Sheesh.

Perhaps a flag that said "tested on DOS where symlinks don't exist"
might have been prudent.  It wasn't clear to me that you were running
on DOS.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Wed, 11 Feb 2004 03:07:30 GMT  

Quote:

> Dave> The OP may or may not want to follow links.

> Well, if you follow links, and you aren't doing duplicate elimination,
> you'll ruin a good day.  That's my point.  Do I need to repeat it?

Well, yes. Help me understand the issue here. You seem to be asserting
that if a filesystem supports symlinks, then people will invariably
use them, and when they do, they cause problems (what, infinite loops?).

Quote:
> The warning was as much for the people reading this group as
> it was for you.

Then you need to be less oblique with your comments. You made it look
like my script was somehow creating a horrendous problem that would
render it completely useless. However, I would note that File::Find does
not address this problem either; the user needs to put the appropriate
test in his wanted() function. Therefore, you should have responded in
a way that made it clear that you were talking about an issue common
to all tree-walkers. In fact, it isn't a Perl issue at all.

Quote:
> Perhaps a flag that said "tested on DOS where symlinks don't exist"
> might have been prudent.  It wasn't clear to me that you were running
> on DOS.

It shouldn't matter. Even if my platform supported symlinks, it isn't
clear that I'd be using them in the trees I'd be running this tool on.

Neither one of us knows why the OP was walking his trees.

-- Dave Tweed



Wed, 11 Feb 2004 07:23:11 GMT  


Dave> The OP may or may not want to follow links.

Quote:

>> Well, if you follow links, and you aren't doing duplicate elimination,
>> you'll ruin a good day.  That's my point.  Do I need to repeat it?

Dave> Well, yes. Help me understand the issue here. You seem to be asserting
Dave> that if a filesystem supports symlinks, then people will invariably
Dave> use them, and when they do, they cause problems (what, infinite loops?).

Yes.  Sorry.  Let me slow down a bit.

If someone creates:

        ln -s .. FOO

in a directory they are searching with your routine, it will go into
an infinite loop, because it'll keep reading FOO then FOO/FOO then
FOO/FOO/FOO, etc etc.  The problem is that a symlink can point to a
directory, especially directories that are above the search point.
And you weren't testing for that.
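A minimal sketch of a recursive walker that does test for it, so that a link like FOO is handed to the file handler instead of being descended into. The name walk_no_symlinks and the callback argument are made up for illustration; this is not the routine posted earlier in the thread.

```perl
use strict;
use warnings;

# Recursive walker that never descends through a symlink.
# (walk_no_symlinks and $process_file are hypothetical names.)
sub walk_no_symlinks {
    my ($path, $process_file) = @_;

    opendir(my $dh, $path) or die "can't open directory $path: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $name (sort @names) {
        my $full = "$path/$name";
        # lstat() examines the link itself, so -d _ is false for a symlink
        if (lstat($full) && -d _) {
            walk_no_symlinks($full, $process_file);
        } else {
            $process_file->($full);    # symlinks land here, unfollowed
        }
    }
}
```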

Quote:
>> The warning was as much for the people reading this group as
>> it was for you.

Dave> Then you need to be less oblique with your comments. You made it look
Dave> like my script was somehow creating a horrendous problem that would
Dave> render it completely useless.

It renders it useless in directories that may contain symlinks. :)

Dave>  However, I would note that File::Find does not address this
Dave> problem either; the user needs to put the appropriate test in
Dave> his wanted() function.

Yes it does.  It avoids following symlinks.  It goes to special pains
to do that.  You do not need to test in the wanted(), because it'll be
avoided in the recursion part, not the wanted part.  You'll get the
symlink in your wanted(), but it won't follow it.

Dave>  Therefore, you should have responded in
Dave> a way that made it clear that you were talking about an issue common
Dave> to all tree-walkers. In fact, it isn't a Perl issue at all.

Well, it's an issue in *YOUR* perl code.  It's not an issue for
tree-walkers written with File::Find (which is also Perl code) or in
properly written Perl tree walkers.  So it's not an issue for *all*
tree walkers, just the ones that don't do the right thing there. :)

Quote:
>> Perhaps a flag that said "tested on DOS where symlinks don't exist"
>> might have been prudent.  It wasn't clear to me that you were running
>> on DOS.

Dave> It shouldn't matter. Even if my platform supported symlinks, it
Dave> isn't clear that I'd be using them in the trees I'd be running
Dave> this tool on.

Symlinks can be anywhere.

Dave> Neither one of us knows why the OP was walking his trees.

Granted. :)

print "Just another Perl hacker,"

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Wed, 11 Feb 2004 09:59:45 GMT  

Quote:

>> Dave> The OP may or may not want to follow links.

>> Well, if you follow links, and you aren't doing duplicate elimination,
>> you'll ruin a good day.  That's my point.  Do I need to repeat it?

>Well, yes. Help me understand the issue here. You seem to be asserting
>that if a filesystem supports symlinks, then people will invariably
>use them, and when they do, they cause problems (what, infinite loops?).

Yes.  It is very easy to create a symlink that will cause a naive
treewalker to go into an infinite loop.

Another example is if you roll your own treewalker to do the
equivalent of "rm -r foo".  If you naively follow the symlink to
a directory somewhere else, you can wipe out a lot more than you
intended.

Quote:
>> The warning was as much for the people reading this group as
>> it was for you.

>Then you need to be less oblique with your comments. You made it look
>like my script was somehow creating a horrendous problem that would
>render it completely useless.

From experience, I can tell you that the horrendous problem does exist.

It doesn't render the script completely useless; the problem makes
the script extremely destructive.

Quote:
> However, I would note that File::Find does
>not address this problem either;

It appears that you are using an out-of-date version of File/Find.pm
as a reference.  To quote from the version included with perl-5.6.1:

NAME
     find - traverse a file tree

     finddepth - traverse a directory structure depth-first

SYNOPSIS
         use File::Find;
         find(\&wanted, '/foo', '/bar');
         sub wanted { ... }

         use File::Find;
         finddepth(\&wanted, '/foo', '/bar');
         sub wanted { ... }

         use File::Find;
         find({ wanted => \&process, follow => 1 }, '.');

Quote:
> the user needs to put the appropriate test in his wanted() function.

No, the appropriate test needs to be in the find function.  Always test
for -l() before testing for -d().  If explicitly following symlinks, use
a hash with device_number&inode_number to make sure that you never
process the same directory twice.
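A rough sketch of the device-and-inode bookkeeping, assuming a platform where stat() returns meaningful device and inode numbers (that is not the case everywhere). The name walk_following_links and its arguments are made up for illustration.

```perl
use strict;
use warnings;

# Walker that deliberately follows symlinks but refuses to enter the
# same directory twice, keyed on "device:inode".
# (walk_following_links, $visit and $seen are hypothetical names.)
sub walk_following_links {
    my ($path, $visit, $seen) = @_;
    $seen ||= {};

    my @st = stat $path;       # stat() follows symlinks to their target
    return unless @st;         # dangling link or vanished entry
    return if $seen->{"$st[0]:$st[1]"}++;    # this directory was already walked

    opendir(my $dh, $path) or die "can't open directory $path: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $name (sort @names) {
        my $full = "$path/$name";
        if (-d $full) {        # true for real directories and links to them
            walk_following_links($full, $visit, $seen);
        } else {
            $visit->($full);
        }
    }
}
```

With this bookkeeping in place, a link made with "ln -s .. FOO" is entered at most once and then recognised as already seen, so the walk terminates.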

Quote:
>Therefore, you should have responded in
>a way that made it clear that you were talking about an issue common
>to all tree-walkers. In fact, it isn't a Perl issue at all.

It was a problem with the sample code that was posted.

Anyone who has been burned by code like that wants to make sure
that no-one else has to suffer such an ignominious fate.

Quote:
>> Perhaps a flag that said "tested on DOS where symlinks don't exist"
>> might have been prudent.  It wasn't clear to me that you were running
>> on DOS.

>It shouldn't matter. Even if my platform supported symlinks, it isn't
>clear that I'd be using them in the trees I'd be running this tool on.

Always check for dangerous conditions, even if you intend to never
run into such a situation.  The people who copy-and-paste the posted
code may not be so diligent.
        -Joe

--
See http://www.inwap.com/ for PDP-10 and "ReBoot" pages.



Wed, 11 Feb 2004 11:36:20 GMT  

Quote:

> Dave>  However, I would note that File::Find does not address this
> Dave> problem either; the user needs to put the appropriate test in
> Dave> his wanted() function.

> Yes it does.  It avoids following symlinks.  It goes to special pains
> to do that.
[snip]
> It's not an issue for tree-walkers written with File::Find (which is
> also Perl code) or in properly written Perl tree walkers.  So it's not
> an issue for *all* tree walkers, just the ones that don't do the right
> thing there. :)

OK, you're right. The latest versions of File::Find do check for symlinks.
However, this was not true of the module supplied with Perls up through
version 5.005, and I'm sure there are lots of machines out there running
old code. People still need to beware.

Let me guess: You were the one who finally fixed it ...

-- Dave Tweed



Wed, 11 Feb 2004 12:16:33 GMT  

Dave> OK, you're right. The latest versions of File::Find do check for
Dave> symlinks.  However, this was not true of the module supplied
Dave> with Perls up through version 5.005, and I'm sure there are lots
Dave> of machines out there running old code. People still need to
Dave> beware.

I think you need to read the older code a bit better.  Here's the code
from 5.005_03:


        (($topdev,$topino,$topmode,$topnlink) =
          ($Is_VMS ? stat($topdir) : lstat($topdir)))
          || (warn("Can't stat $topdir: $!\n"), next);
        if (-d _) {

Notice the "lstat" rather than "stat", followed by -d.  If it's a
symlink, the -d cannot report true at this point, so a symlink
pointing at a directory is *not* followed.  And I'm very very sure
that this is also the behavior all the way back through the find.pl
subroutine included in perl4 (perl3?).  Because following symlinks is
universally a *bad* thing if you don't also keep from looping, and
you've got to give credit to Larry for certainly knowing that.

In fact, I just hunted down 4.036 in the CPAN, and found in find.pl:

                # Get link count and check for directoriness.

                ($dev,$ino,$mode,$nlink) = lstat($_) unless $nlink;

                if (-d _) {

                    # It really is a directory, so do it recursively.

There it is.  lstat() followed by -d _.  Won't pull true for a symlink
pointing to a directory.

Again, since you seem to want everything pointed out to you, the
following are dangerous, because they can report true on a symlink
pointing to a directory:

        -d $foo
        stat($foo) ... -d _

the following are safe:

        not -l $foo and -d $foo
        lstat($foo) ... -d _

The last one is the one used by File::Find (5.005_03) and find.pl (4.036)
above.
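The difference can be checked with a small throwaway script. This is a minimal sketch, assuming a platform where symlink() works; the directory names and variable names are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory holding one real directory and one symlink to it.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/real" or die "mkdir: $!";
symlink "real", "$top/link" or die "symlink: $!";

# stat-based test: follows the link, so it reports "directory"
my $via_stat = -d "$top/link" ? 1 : 0;

# lstat-based test: examines the link itself, so -d _ is false
my $via_lstat = (lstat("$top/link") && -d _) ? 1 : 0;

# explicit guard: rule out symlinks before asking about directories
my $guarded = (!-l "$top/link" && -d "$top/link") ? 1 : 0;

print "stat: $via_stat  lstat: $via_lstat  guarded: $guarded\n";
```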

Does that help?  I'm sorry I'm having to reiterate... I guess I
presumed you had more knowledge than you seem to be showing. :)

print "Just another Perl hacker,"

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Wed, 11 Feb 2004 14:14:08 GMT  

Quote:

> Does that help?  I'm sorry I'm having to reiterate... I guess I
> presumed you had more knowledge than you seem to be showing. :)

Yes, thank you very much. I apologize for not realizing that lstat()
would prevent -d. I guess I didn't think about it very hard.

As I said, I don't generally use links (symbolic or otherwise) in
my own directory structures, even on platforms that support them.
I presumed that people who use them would be aware of the issues,
especially if they're deliberately creating loops.

All bets are off if you're using a tree walker on something that
isn't a tree; what you really need is a generalized directed-graph
walker. I guess that's what File::Find really is.

Now all it needs is the housekeeping hook that the OP was looking
for.
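For what it's worth, newer versions of File::Find document preprocess and postprocess options that come close to such a hook: preprocess receives the names in a directory before wanted() sees any of them, and postprocess runs just before find() leaves the directory. A minimal sketch, assuming a File::Find recent enough to have both options and that symlinks are not being followed (walk_with_hooks and its callbacks are made-up names):

```perl
use strict;
use warnings;
use File::Find;

# Call $before->($dir, \@names) before any entry of $dir is processed,
# and $after->($dir) once find() is done with $dir.
# (walk_with_hooks, $before and $after are hypothetical names.)
sub walk_with_hooks {
    my ($top, $before, $after) = @_;
    find({
        wanted     => sub { },    # per-entry work would go here
        preprocess => sub {
            my @names = grep { $_ ne '.' && $_ ne '..' } @_;
            $before->($File::Find::dir, [ sort @names ]);
            return @_;            # hand the unfiltered list back to find()
        },
        postprocess => sub { $after->($File::Find::dir) },
    }, $top);
}
```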

(But I think we've scared him off ... :-)

-- Dave Tweed



Wed, 11 Feb 2004 22:51:45 GMT  

Dave> All bets are off if you're using a tree walker on something that
Dave> isn't a tree; what you really need is a generalized directed-graph
Dave> walker. I guess that's what File::Find really is.

Well, it *is* a tree if you ignore the symlinks!  Thus, you either
write a tree-walker by ignoring the symlinks, or a
directed-graph-walker by doing a lot of housekeeping.

Dave> Now all it needs is the housekeeping hook that the OP was looking
Dave> for.

Maybe I should fess up that I was thinking a lot about File::Find
because my upcoming column article in Linux Magazine shows a
treewalker that works as an iterator, not a callback.  Can't tip my
hand more than that, but it'll probably end up in the CPAN as my first
real module submission after a bit more polish.

I can't prepublish the article here (work for hire, ya know), but
it'll eventually show up along with the other ones at

        <http://www.stonehenge.com/merlyn/LinuxMag/>

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Thu, 12 Feb 2004 21:50:45 GMT  

Quote:

> Well, it *is* a tree if you ignore the symlinks!  Thus, you
> either write a tree-walker by ignoring the symlinks, or a
> directed-graph-walker by doing a lot of housekeeping.

No, hard links can create loops as well.

-- Dave Tweed



Fri, 13 Feb 2004 02:09:23 GMT  
In comp.lang.perl.moderated,

Quote:


> > Well, it *is* a tree if you ignore the symlinks!  Thus, you
> > either write a tree-walker by ignoring the symlinks, or a
> > directed-graph-walker by doing a lot of housekeeping.

> No, hard links can create loops as well.

But multiple hard links to directories are /expected/ to produce nasal
demons. That's why they require a specific option to ln, and why
they're only do-able by root. The idea is that if you do hard links of
directories, you expect breakage.

On the other hand, symlinks of directories are expected to /work/,
therefore programs that encounter them are expected to deal with them
nicely.

  -Rich

--
Rich Lafferty --------------+-----------------------------------------------
 Montreal, Quebec, Canada   |   Help save the endangered Mountain Walrus!
 http://www.lafferty.ca/    |       http://www.end.com/~jynx/walrus/



Fri, 13 Feb 2004 02:59:39 GMT  
 