Vallo> I have a newsgroups file which I want to correct using sed. Actually,
Vallo> it's already done using sed, awk, uniq and ed, but that's not
Vallo> perfect :)
Vallo> The file format is simple: the newsgroup name, followed by at least one
Vallo> space or tab (or both), then the newsgroup topic, which is free-form
Vallo> text until the end of the line. There are lots of duplicate lines,
Vallo> duplicate in the sense of newsgroup name, not topic. I want to remove
Vallo> the duplicate lines; sorting and formatting afterwards is simple. E.g.:
Vallo> comp.os.linux.announce Announcements blabla (Moderated)
Vallo> comp.os.linux.announce Announcements blabla (Moderated) (Moderated)
Vallo> comp.os.linux.x blabla.
Vallo> comp.os.linux.x blabla
Vallo> I've got to the stage where the output contains unique newsgroup
Vallo> lines without the topic tail, but I can't get my head around holding,
Vallo> restoring and printing the original line if the line is unique.
Vallo> What is better for this kind of manipulation: sed, awk, perl, whatever?
I don't know about "better", but one line in Perl is gonna be hard to beat:
perl -ane 'print unless $seen{$F[0]}++' <input >output
That'll keep the first line seen for each newsgroup. If you want to keep the
last one instead, and you don't mind the output being sorted by newsgroup name:
perl -ane '$item{$F[0]} = $_; END { print $item{$_} for sort keys %item }' \
<input >output
If you want to hang on to the original input order (by where each newsgroup
last appears), still printing the last one of each:
perl -ane '$item{$F[0]} = $_; $line{$F[0]} = $.;' \
-e 'END {print $item{$_} for sort {$line{$a} <=> $line{$b}} keys %item}' \
<input >output
I bet you can do all three of these in awk. The first one will
probably take fewer keystrokes in awk, but the last two most certainly
will take more. sed, with nothing but its pattern and hold spaces, won't
have enough state memory to do this conveniently.
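Taking up the awk bet, here is a minimal sketch of all three in POSIX awk, assuming the same whitespace-separated input (the sample data, including the hypothetical alt.test line, and the output file names are illustrative only). awk's default field splitting on runs of blanks and tabs matches what perl -a does with $F[0]. POSIX awk has no sort function, so the second variant hands its output to sort(1) with LC_ALL=C to get byte order like Perl's default sort.

```shell
#!/bin/sh
# Vallo's example lines plus a hypothetical alt.test group.
printf '%s\n' \
  'comp.os.linux.announce Announcements blabla (Moderated)' \
  'comp.os.linux.announce Announcements blabla (Moderated) (Moderated)' \
  'comp.os.linux.x blabla.' \
  'comp.os.linux.x blabla' \
  'alt.test Testing' >input

# 1. First line per newsgroup: seen[$1]++ is 0 (false) only the first
#    time a name appears, so !seen[$1]++ selects just that line.
awk '!seen[$1]++' input >first.out

# 2. Last line per newsgroup, sorted by name via sort(1) on field 1.
awk '{ item[$1] = $0 } END { for (k in item) print item[k] }' input |
  LC_ALL=C sort -k1,1 >sorted.out

# 3. Last line per newsgroup, in the order each name was last seen:
#    file each kept line under its line number, then walk 1..NR.
awk '{ item[$1] = $0; line[$1] = NR }
     END {
       for (k in item) out[line[k]] = item[k]
       for (i = 1; i <= NR; i++) if (i in out) print out[i]
     }' input >ordered.out
```

The first variant is indeed shorter than the Perl one-liner; the other two are longer. If you would rather order the third variant by where each name first appears, changing `line[$1] = NR` to `if (!($1 in line)) line[$1] = NR` should do it.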
print "Just another Perl hacker,"
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!