grep-like search with multiple file output? 
Author Message
 grep-like search with multiple file output?


|Is there a way to do a grep-like search on a file -- a BIG file --
|and output to different files based on different search strings?
|
|I can do this with multiple passes with grep but that takes too
|long. Is a C program the answer? Maybe perl?

Awk, the mild-mannered data manipulation package? Could be ....

/pattern1/ { print > "file1" }
/pattern2/ { print > "file2" }
/pattern3/ { print > "file3" }

I use gawk which supports all manner of redirection like the above, but
I cannot comment on the suitability of vanilla awk.
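For anyone who wants to sanity-check the one-pass approach, here is a small shell run (the input file and patterns are invented for illustration):

```shell
#!/bin/sh
# Stand-in for the BIG file.
printf 'alpha one\nbeta two\nalpha three\ngamma four\n' > input.txt

# One pass: each matching line is redirected to its own output file.
# awk keeps each file open after the first write, so later matches
# within the same run are appended rather than truncating the file.
awk '/alpha/ { print > "file1" }
     /beta/  { print > "file2" }
     /gamma/ { print > "file3" }' input.txt
```

Note that a line matching two patterns is written to both files; add `next` after each `print` if you want first-match-wins behaviour.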

                        Pete

PS. Perl/C would obviously run faster, but awk is faster to code.
--

PO Box 220, Whiteknights, Reading, | Phone: +44-118-9875123 ext 7594
Berkshire, RG6 6AF, United Kingdom | Fax:   +44-118-9750203                
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WWW: http://www.*-*-*.com/ ~spr96phh/pete.html Use lynx - you know you want to!



Mon, 14 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?


...

Quote:
>PS. Perl/C would obviously run faster, but awk is faster to code.

Yes, C would run faster - assembler would run faster still.

Perl (another well-known interpreted language available on many Unix systems)
would run about the same as AWK, but, as you say, is much harder to code.




Mon, 14 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?

Quote:

> Is there a way to do a grep-like search on a file -- a BIG file --
> and output to different files based on different search strings?

#!/usr/bin/awk -f
/pat1/ {
    print > "out1"
}

/pat2/ {
    print > "out2"
}

/pat3/ {
    print > "out3"
}

Quote:
> Thanks in advance for any advice!

Hth,

--



Mon, 14 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?

Quote:

> PS. Perl/C would obviously run faster, but awk is faster to code.

Since you mention Perl:

The only thing making it slower to code (in this case) is that you
have to explicitly open your files.  Of course, that gives you the
ability to throw in some error checking, backing up existing files
etc., almost none of which I do below:

#!/store/bin/perl -wn

BEGIN {
        open OUT1, ">out1"      or die "Couldn't open out1: $!\n";
        open OUT2, ">out2"      or die "Couldn't open out2: $!\n";
        open OUT3, ">out3"      or die "Couldn't open out3: $!\n";
}

/pat1/ && do {
    print OUT1 $_;
    next;
};

/pat2/ && do {
    print OUT2 $_;
    next;
};

/pat3/ && do {
    print OUT3 $_;
    next;
};

Now that wasn't too hard, was it?

Why would you even want to do this, you ask?

Well, there's
 * Perl's speed (since it's (sorta) compiled), there's
 * Perl's wonderfully expressive regexes (for patterns more complex than
   the ones in this trivial example), and there's
 * the sheer addictivity of the
   language itself.

But if you're happy with awk, I'm certainly not going to beg for
converts.  Not in comp.lang.awk, anyway.

Followups set to comp.lang.perl.misc

--



Mon, 14 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?



Quote:


> ....
> >PS. Perl/C would obviously run faster, but awk is faster to code.
> Yes, C would run faster - assembler would run faster still.
> Perl (another well-known interpreted language available on many Unix
> systems) would run about the same as AWK, but, as you say, is much harder to
> code.

Let us not lose sight of the fact that in many real-world cases they'd all be
I/O constrained and the language would make no difference in the run time.

--


Wed, 16 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?


% /pattern1/ { print > "file1" }
% /pattern2/ { print > "file2" }
% /pattern3/ { print > "file3" }

% I use gawk which supports all manner of redirection like the above, but
% I cannot comment on the suitability of vanilla awk.

This is standard awk. gawk has some extensions, but not so many as many
people assume.

% PS. Perl/C would obviously run faster, but awk is faster to code.
sed might be fastest of all in terms of run-time, and I guess faster
than awk to type the script:
 /pattern1/ w file1
 /pattern2/ w file2
 /pattern3/ w file3
 d
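A quick shell check of the sed version (sample input invented; the `w` command itself is standard sed):

```shell
#!/bin/sh
printf 'alpha one\nbeta two\nalpha three\n' > input.txt

# Each w command writes the matching line to the named file; the final
# d deletes every line, so nothing is echoed to standard output.
# One -e per command is the portable way to terminate the w filename.
sed -e '/alpha/w file1' -e '/beta/w file2' -e 'd' input.txt
```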
--

Patrick TJ McPhee
East York  Canada



Thu, 17 Feb 2000 03:00:00 GMT  
 grep-like search with multiple file output?

Quote:

>> Next question: How does one generalize this script to imitate AWK's
>> associative arrays?
>Why imitate them?  Perl has those too.  It just calls them hashes.

Not surprising considering perl's various debts to awk. "One of the
many cool things about Perl is that it is (at least) a semantic
superset of awk."--Randal L. Schwartz, "Learning Perl" ORA.

Quote:
>>   pattern[++i] = "second pattern";outfile[i]=sprintf("out%05d",i)
>Hmm.  Those seem to be arrays, not hashes.  (numerical vs. string indices)

Don't quote me, but I believe awk always indexes on the string
representations. The following short program (MKS AWK) seems to bear
me out since all it prints is "456".

BEGIN {
  test[1] = "123"
  test["1"] = "456"
  for (i in test) print test[i]
  exit
}

Now whether the implementation uses hashes, a linked list, or whatever
is another issue entirely. MKS seems to use a LIFO arrangement.
Thompson Automation insists on sorting at the time--one reason why
their versions compile into massive, slower beasts.
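That string-subscript behaviour is easy to confirm from the shell with any POSIX awk:

```shell
#!/bin/sh
# awk converts every subscript to a string, so test[1] and test["1"]
# name the same array element; the second assignment overwrites the
# first, leaving a single element behind.
awk 'BEGIN {
    test[1]   = "123"
    test["1"] = "456"
    for (i in test) print i, test[i]
}'
```

With only one element left, this prints the single line `1 456`.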

Quote:
>I think I'd use a pattern -> filehandle mapping with a hash here:
>(tested code)

This code is beyond my limited perl knowledge, so let's get back to
AWK. <g>

--
Maynard Hogg
#306, 4-30-10 Yoga, Setagaya-ku, Tokyo, Japan 158
Fax: +81-3-3700-7399

http://www2.gol.com/users/maynard/
http://www2.gol.com/users/maynard/j-learning.htm (Japanese)
Unsolicited commercial electronic mail sent to this address will be
copyedited at a cost of US$200/hour (half-hour minimum).



Thu, 24 Feb 2000 03:00:00 GMT  
 
 [ 7 posts ]
