sort "items" paragraphwise 


Quote:

>I need to sort a file "paragraphwise" (under W2K) - or better said:
>sort the specific points and leave the rest after that point.

>The windows sort command would be fine, but sorts only linewise :-(
>Are the unix tools or other (console) programs able to do the
>following:

>Empty lines can be ignored - deleted or better just be left alone.

>Example input
>=============
>\item[test] This is a test.

>A wonderful line, which belongs to test.

>\item[foo] foo explanation

>\item[bar] bar explanation

>Example output should be
>========================
>\item[bar] bar explanation

>\item[foo] foo explanation

>\item[test] This is a test.

>A wonderful line, which belongs to test.

>I'd prefer a "small" solution with sed/awk or some of the unix
>commands.

You are right - this isn't really an editors question, but is easily solved
in (my favorite tool) AWK.  The only issue is that purists will point out
that the feature that this depends on (ability to sort arrays internally)
isn't in "classic AWK" (e.g., awk or nawk).  So, purists will come out with
weird solutions in pure shell that involve tacking line numbers on and so
on.  My recommendation is to get (at least) GAWK - TAWK if you can afford
it.  I give below a GAWK solution, using GAWK's built-in "asort" function,
but note that it would be even easier in TAWK, where the sorting comes
for free.

A GAWK solution (untested):

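# Append the record collected so far (if any) to the array x.
function p() {
        if (!s) return
        x[++cnt] = s
        }
# An \item line starts a new record: store the previous one first.
/^\\item/ { p();s = $0;next }
# Any other line is a continuation of the current record.
{ s = s "\n" $0 }
END     {
        p()             # store the final record
        asort(x)        # gawk extension: sorts the array values
        for (i=1; i<=cnt; i++)
            print x[i]
        }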
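
For readers without gawk, the same record-collecting idea can be sketched
in Python. This is only a sketch, not part of the script above: the name
sort_items is made up for illustration, and it assumes (as the question
allows) that the blank lines around each record may be normalized to a
single blank line between records.

```python
def sort_items(text):
    """Sort records that start with \\item; other lines stay with their item."""
    records, current = [], []

    def flush():
        # Store the record collected so far, minus surrounding blank lines.
        record = "\n".join(current).strip("\n")
        if record:
            records.append(record)
        current.clear()

    for line in text.splitlines():
        if line.startswith("\\item"):
            flush()              # an \item line closes the previous record
        current.append(line)     # every line joins the current record
    flush()                      # do not forget the last record

    # Plain string comparison, so records are ordered by their first line.
    return "\n\n".join(sorted(records)) + "\n"
```

Called on the example input from the question, this returns the bar, foo
and test items in that order, with the "wonderful line" still attached to
test.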



Wed, 16 Jun 2004 02:15:55 GMT  


...

Quote:
>Simpler than AWK (tested on Cygwin) :

>/tmp$ cat foo.in
>\item[test] This is a test.

>A wonderful line, which belongs to test.

>\item[foo] foo explanation

>\item[bar] bar explanation
>/tmp$ sort -t ']' -k 1,1 foo.in >foo.out
>/tmp$ cat foo.out

>A wonderful line, which belongs to test.
>\item[bar] bar explanation
>\item[foo] foo explanation
>\item[test] This is a test.
>/tmp$

Clearly wrong, since "A wonderful line, which belongs to test." belongs to
test, and your output has it at the top, not down at the bottom (under
test, to which it clearly belongs) where it should be.

The whole point is that you want to support multi-line records, which sort
does not do.



Wed, 16 Jun 2004 06:52:23 GMT  

Quote:

> I need to sort a file "paragraphwise" (under W2K) - or better said:
> sort the specific points and leave the rest after that point.

Ok, Microsoft Windows 2000 it is!

C:\>ver

Microsoft Windows 2000 [Version 5.00.2195]

C:\>

Quote:
> The windows sort command would be fine, but sorts only linewise :-(
> Are the unix tools or other (console) programs able to do the
> following:

> Empty lines can be ignored - deleted or better just be left alone.

> Example input
> =============
> \item[test] This is a test.

> A wonderful line, which belongs to test.

> \item[foo] foo explanation

> \item[bar] bar explanation

> Example output should be
> ========================
> \item[bar] bar explanation

> \item[foo] foo explanation

> \item[test] This is a test.

> A wonderful line, which belongs to test.

> I'd prefer a "small" solution with sed/awk or some of the unix
> commands. Perl would be also an option if the others won't do the job.

It's most easily done in Perl. If the key values (i.e., the substrings
enclosed within square brackets) are unique, then you can slurp the data
into a hash (an associative array) using the keys as, well, keys, and
making the whole sections of text the hash elements.

In the Perl script named sortsect.pl, the entire input file is parsed
into sections using a single, global regular expression pattern match
operation.

C:\>cat sortsect.pl
#!perl -nw0777
%sect = reverse m/^(\\item\[([^]]*)\](?:.*\n(?!\\item))*(?:.*\n))/gm;
print $sect{$_} for sort keys %sect;

C:\>type input.txt
\item[test] This is a test.

A wonderful line, which belongs to test.

\item[foo] foo explanation

\item[bar] bar explanation

C:\>perl sortsect.pl input.txt
\item[bar] bar explanation

\item[foo] foo explanation

\item[test] This is a test.

A wonderful line, which belongs to test.

C:\>

Allowing for non-unique keys would not be difficult--you just wouldn't
be able to use a simple hash as I have demonstrated here.
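
One possible way to allow for non-unique keys, sketched in Python rather
than Perl: keep a list of (key, section) pairs instead of a hash and rely
on a stable sort, so that sections with the same key keep their input
order. The names ITEM and sort_sections are hypothetical, and dropping
any text that precedes the first \item is an assumption of this sketch.

```python
import re

# Matches an \item line and captures the key between the square brackets.
ITEM = re.compile(r"^\\item\[([^\]]*)\]")

def sort_sections(text):
    sections = []  # list of (key, lines) pairs; duplicate keys are allowed
    for line in text.splitlines():
        match = ITEM.match(line)
        if match:
            sections.append((match.group(1), [line]))
        elif sections:
            # Continuation line: attach it to the most recent section.
            sections[-1][1].append(line)
        # Lines before the first \item are ignored (assumption of the sketch).

    # sorted() is stable, so equal keys stay in their original order.
    ordered = sorted(sections, key=lambda pair: pair[0])
    return "\n\n".join("\n".join(lines).strip("\n") for _, lines in ordered) + "\n"
```

Sorting on the captured key alone, rather than on the whole section text,
is what keeps two sections with the same key from being reordered by
their bodies.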

Quote:
> But I like to keep it simple and yet have refused to install Perl to
> do just a little text formatting.

So install Perl to do a *lot* of text formatting! ;-)

Perl represents the state of the art in text processing, so it's silly to
avoid using it for that purpose. You might refuse to install Perl for good
reasons, but this isn't one of them.

Quote:
> I know that this posting isn't strictly an "editors" subject, but
> I didn't know of a better newsgroup, as sed is also covered here. I'm glad
> for any pointers :-)

HTH. HAND.

--
Jim Monty

Tempe, Arizona USA



Wed, 16 Jun 2004 09:45:52 GMT  

...

Quote:
>Perl represents the state of the art in text processing, so it's silly to
>avoid using it for that purpose. You might refuse to install Perl for good
>reasons, but this isn't one of them.

BS!

As I've made clear many a time, P***:
        a) Combines the flexibility of assembler language with the
           readability of assembler language.
        b) is completely OT in this NG.

Anything else I can help you with?



Wed, 16 Jun 2004 11:11:59 GMT  

thanks for your additional suggestion :-)

Quote:
>> I need to sort a file "paragraphwise" (under W2K) - or better said:
>> sort the specific points and leave the rest after that point.

>Ok, Microsoft Windows 2000 it is!

Yep. Isn't W2K the usual abbreviation?

Quote:
>> I'd prefer a "small" solution with sed/awk or some of the unix
>> commands. Perl would be also an option if the others won't do the job.
>It's most easily done in Perl.
>C:\>cat sortsect.pl
>#!perl -nw0777
>%sect = reverse m/^(\\item\[([^]]*)\](?:.*\n(?!\\item))*(?:.*\n))/gm;
>print $sect{$_} for sort keys %sect;

That's pretty short, but not that easy at first glance  ;-)
Thanks for the additional explanations!

Quote:
>> But I like to keep it simple and yet have refused to install Perl to
>> do just a little text formatting.

>So install Perl to do a *lot* of text formatting! ;-)

:-)

Quote:
>Perl represents the state of the art in text processing, so it's silly to
>avoid using it for that purpose. You might refuse to install Perl for good
>reasons, but this isn't one of them.

I have more reasons for avoiding Perl. Ideally I want the tool to fit
easily on a floppy. Perl is much more than "just" a text formatter. So
I'd prefer the smaller tools if they can do the job. Another reason is
that Perl's syntax doesn't seem as easy or fast to learn as that of sed
or awk, for example.

I forgot to mention that using Vim (I guess it's possible to use Vim from
the command line as a filter, like sed?) would also be an option.
My guess was that the sort command from the GNU textutils (I have the
djgpp port) could do the job together with sed or another command.

The awk solution from Kenny is great too. I'll use this script. But
thanks for the additional perl solution (I'll keep it for future
reference).

BTW, as far as I know sed belongs to this newsgroup. Are the textutils
covered here too? Or should I post to another newsgroup for
textutils-specific questions? I knew that my posting was not really on topic,
but not really off topic either, and I know that many here know
sed/awk/perl/vim or other programs which might come to mind to do the
job. So I thought it would be o.k. to ask here for a pointer in which
direction to look :-)

Greetings from Cologne
Peter



Wed, 16 Jun 2004 21:04:09 GMT  

Kenny> As I've made clear many a time, P***:
Kenny>       a) Combines the flexibility of assembler language with the
Kenny>          readability of assembler language.
Kenny>       b) is completely OT in this NG.

Perhaps some comp.lang.awk bigots don't want Perl even mentioned in
the Awk group, but this was also posted to comp.editors.  "this NG" is
therefore ambiguous, and incorrect.  Cut the guy some slack, eh?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Thu, 17 Jun 2004 00:12:59 GMT  


Quote:

>Kenny> As I've made clear many a time, P***:
>Kenny>   a) Combines the flexibility of assembler language with the
>Kenny>      readability of assembler language.
>Kenny>   b) is completely OT in this NG.

>Perhaps some comp.lang.awk bigots don't want Perl even mentioned in
>the Awk group, but this was also posted to comp.editors.  "this NG" is
>therefore ambiguous, and incorrect.  Cut the guy some slack, eh?

It is no more OnTopic in editors than in AWK.  Or does Perl also claim to
be an editor these days?  (As well as waxing your floor and shining your
shoes?)

Besides, it is still OT in AWK, which is where I'm reading it.  It's not
really my problem which other NGs the OP decided to post to.  Or, to put it
another way, if someone (Jim) does believe it is OT in editors, he
should have posted only to editors, removing AWK from the NG list.  He
could even have added comp.lang.perl (or whatever it is this week).



Thu, 17 Jun 2004 00:41:56 GMT  

Quote:



> >I need to sort a file "paragraphwise" (under W2K) - or better said:
> >sort the specific points and leave the rest after that point.

> >The windows sort command would be fine, but sorts only linewise :-(
> >Are the unix tools or other (console) programs able to do the
> >following:

> >Empty lines can be ignored - deleted or better just be left alone.

> >Example input
> >=============
> >\item[test] This is a test.

> >A wonderful line, which belongs to test.

> >\item[foo] foo explanation

> >\item[bar] bar explanation

> >Example output should be
> >========================
> >\item[bar] bar explanation

> >\item[foo] foo explanation

> >\item[test] This is a test.

> >A wonderful line, which belongs to test.

> >I'd prefer a "small" solution with sed/awk or some of the unix
> >commands.

It's not a "small" solution, but Word Perfect 9 (and, maybe,
older versions) can sort by, among other things, first word
in a paragraph. Word Perfect is available for Windows,
Macintosh, and Linux.

Martin Cohen



Thu, 17 Jun 2004 04:07:19 GMT  

Quote:

> It's not a "small" solution, but Word Perfect 9 (and, maybe,
> older versions) can sort by, among other things, first word
> in a paragraph. Word Perfect is available for Windows,
> Macintosh, and Linux.

Macintosh? The last WP version for Mac I can recall is 3.5 (more or
less equivalent to 5.1 for DOS ...) Is it still for sale? Does it work
on newer Macs?

--
Giuseppe "Oblomov" Bilotta

Axiom I of the Giuseppe Bilotta
theory of IT:
Anything is better than MS



Thu, 17 Jun 2004 23:50:02 GMT  

Quote:


> > It's not a "small" solution, but Word Perfect 9 (and, maybe,
> > older versions) can sort by, among other things, first word
> > in a paragraph. Word Perfect is available for Windows,
> > Macintosh, and Linux.

> Macintosh? The last WP version for Mac I can recall is 3.5 (more or
> less equivalent to 5.1 for DOS ...) Is it still for sale? Does it work
> on newer Macs?

You may be right - I just did a Google search for Word Perfect and
Macintosh and, when I got hits, assumed that it was OK.

Anyway, there are two ways I might do the assignment using gawk and
unix:

1. With gawk, make the field separator a newline and the record
separator a null line ("\n\n"). Read in the records, sort them
internally, and output them with a blank line separating them.

2. Find a character that is not in any of the lines. Read in the
records. If a record and its predecessor are not null, concatenate
the special character and the record to the previous record. When a
record is null, write out the previous (possibly concatenated to)
record. Sort the resulting file. Read in the sorted file and break
lines at the special character.

Solution 2 has the advantages that (1) it can handle files too large to
fit into gawk's memory and (2) it can easily be done with any gawk.
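
The join/sort/split idea of solution 2 can be sketched in Python as
follows. This is only an illustration under stated assumptions: SEP is an
arbitrarily chosen control character assumed not to occur in the input,
the function names are made up, and the in-memory list.sort() merely
stands in for the external sort step that would handle large files.

```python
SEP = "\x01"  # assumed: a character that appears in none of the lines

def join_paragraphs(lines):
    """Glue each run of non-blank lines into one line, using SEP."""
    joined, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            # A blank line ends the paragraph: write it out as one line.
            joined.append(SEP.join(current))
            current = []
    if current:
        joined.append(SEP.join(current))
    return joined

def sort_paragraphs(text):
    joined = join_paragraphs(text.splitlines())
    joined.sort()  # stand-in for running an external line-wise sort
    # Break the sorted lines apart again at the special character.
    return "\n\n".join(p.replace(SEP, "\n") for p in joined) + "\n"
```

Because each paragraph travels through the sort as a single line, any
line-wise sort could take the place of list.sort() here. Note that, as in
the description above, a blank line always ends a record.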

Martin Cohen



Fri, 18 Jun 2004 02:33:14 GMT  
 