search with a twist 
Author Message
 search with a twist

Hello,
   I have two large files each of which is an alphabetized list of
strings, one string on each line. In Perl, Is there any way to see
what is common to both lists, without loading them into memory, but
through reading them one by one. Any help would be appreciated.

Sasha Gusev



Fri, 28 Jan 2005 22:04:30 GMT  
 search with a twist

Quote:

> Hello,
>    I have two large files each of which is an alphabetized list of
> strings, one string on each line. In Perl, Is there any way to see
> what is common to both lists, without loading them into memory, but
> through reading them one by one. Any help would be appreciated.

Consider the case where the first file has lots of items, but the
second only has one line in it.  Read the line from the second file,
and search for it in the first file, stopping as soon as the current
line in the first file is greater than the line from file two.

If you found a match before then, take some appropriate action
(I don't know how you plan to deal with duplicate strings).

To handle the case where the second file has lots of lines, just
reverse the roles of the first and second files. Repeat the
entire process until one of the files has reached an eof.

As you can see, this is more of an algorithm question than
a Perl question.  What sort of perl code have you got so far?

--
Joe Schaefer    "Arguments are to be avoided; they are always vulgar and often
                                         convincing."
                                               -- Oscar Wilde



Fri, 28 Jan 2005 22:35:55 GMT  
 search with a twist

Quote:

>    I have two large files each of which is an alphabetized list of
> strings, one string on each line. In Perl, Is there any way to see
> what is common to both lists, without loading them into memory, but
> through reading them one by one. Any help would be appreciated.

#!/usr/bin/perl -w
use strict;

open F1, $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open F2, $ARGV[1] or die "Cannot open $ARGV[1]: $!";

my $first  = <F1>;
my $second = <F2>;

while ( not eof(F1) and not eof(F2) ) {
    if ( $first lt $second ) {
        $first = <F1>;
        }
    elsif ( $second lt $first ) {
        $second = <F2>;
        }
    else {
        print $first;
        $first  = <F1>;
        $second = <F2>;
        }
    }

__END__

John
--
use Perl;
program
fulfillment



Fri, 28 Jan 2005 23:09:37 GMT  
 search with a twist

Quote:

>   I have two large files each of which is an alphabetized list of
>strings, one string on each line. In Perl, Is there any way to see
>what is common to both lists, without loading them into memory, but
>through reading them one by one. Any help would be appreciated.

You mean they're sorted?

You can read from both files at the same time. Start with one line from
each.

Then, repeat:

 * if both lines are the same, you found one. Mark the solution, and
read a line from each file.

 * otherwise, read a line from the file where the string is the
"smallest".

Repeat until one of the files is exhausted.

--
        Bart.



Fri, 28 Jan 2005 23:27:32 GMT  
 search with a twist

Quote:

> >   I have two large files each of which is an alphabetized list of
> >strings, one string on each line. In Perl, Is there any way to see
> >what is common to both lists, without loading them into memory, but
> >through reading them one by one. Any help would be appreciated.

> You mean they're sorted?

> You can read from both files at the same time. Start with one line from
> each.

> Then, repeat:

>  * if both lines are the same, you found one. Mark the solution, and
> read a line from each file.

>  * otherwise, read a line from the file where the string is the
> "smallest".

> Repeat until one of the files is exhausted.

The unix utility "comm" does that, which may be another alternative.

Anno



Sat, 29 Jan 2005 08:56:27 GMT  
 search with a twist

Quote:

> The unix utility "comm" does that, which may be another
> alternative.

And you can find a Perl-implementation of "comm" at

<http://www.perl.com/language/ppt/src/comm/index.html>

--
felix



Sat, 29 Jan 2005 09:37:29 GMT  
 search with a twist

Quote:


> > Hello,
> >    I have two large files each of which is an alphabetized list of
> > strings, one string on each line. In Perl, Is there any way to see
> > what is common to both lists, without loading them into memory, but
> > through reading them one by one. Any help would be appreciated.

> Consider the case where the first file has lots of items, but the
> second only has one line in it.  Read the line from the second file,
> and search for it in the first file, stopping as soon as the current
> line in the first file is greater than the line from file two.

> If you found a match before then, take some appropriate action
> (I don't know how you plan to deal with duplicate strings).

> To handle the case where the second file has lots of lines, just
> reverse the roles of the first and second files. Repeat the
> entire process until one of the files has reached an eof.

> As you can see, this is more of an algorithm question than
> a Perl question.  What sort of perl code have you got so far?

Thanks, this is what I was doing, except I had the large and small
files switched and was getting bugs.


Sat, 29 Jan 2005 14:12:11 GMT  
 
 [ 7 post ] 

 Relevant Pages 

1. Because twisted minds think twisted things...

2. How about a little twist on the shredding?

3. char to number conversion w/ a twist

4. s2p twisted

5. Sourcing with a twist of Parsing ?

6. Random Numbers with a Twist

7. A problematic twist with =~ and regex

8. Counter with an NT twist

9. Alphabetizing with a twist

10. Perl-search the web quickly with Search Spaniel

11. ANNOUNCE: WWW::Search 1.009 available (web-search-engine API for Perl)

 

 
Powered by phpBB® Forum Software