? text-processing examples 

I'd like to find some tutorials/examples which would guide me to
appropriate techniques to do some text-processing under poplog.

The text IO would be via files.

I'm guessing that ved was written in pop-11 ?

Thanks,
  Chris Glur.



Thu, 20 Nov 2003 01:35:20 GMT  

Quote:

> I'd like to find some tutorials/examples which would guide me to
> appropriate techniques to do some text-processing under

HELP * FORMAT discusses formatting tools within ved.  There is a package
for editing{*filter*}files (for eventual formatting by{*filter*}itself) at
http://www.*-*-*.com/

Quote:
> I'm guessing that ved was written in pop-11 ?

Absolutely.  Look in $usepop/pop/ved/src/ . There's a section of HELP *
VED that gives an introduction to programming ved.

Stephen Isard



Fri, 21 Nov 2003 18:17:37 GMT  
[To reply replace "Aaron.Sloman.XX" with "A.Sloman"]

Quote:

> Date: 2 Jun 2001 17:35:20 GMT

> I'd like to find some tutorials/examples which would guide me to
> appropriate techniques to do some text-processing under poplog.

> The text IO would be via files.

There are facilities for reading, writing, and appending to files in
different modes, and a fair number of string-manipulating mechanisms
in pop11, mostly described in REF STRINGS. File I/O is covered in
REF SYSIO, with sys_file_match (described in REF SYSUTIL) for
pattern-based exploration of directories. There is also a regular
expression matcher described in REF REGEXP, plus some extra string
facilities posted about a year ago by Steve Leach, now available in
    http://www.cs.bham.ac.uk/research/poplog/string_ops/
    http://www.cs.bham.ac.uk/research/poplog/string_ops.tar.gz
        6937 bytes
(probably also at www.poplog.org, with additional goodies.)

If you want to do manipulations not at the character level but at the
level of words and numbers, then incharitem is extremely useful. Give it
a character repeater (such as the one produced by discin) and you'll get
back an item repeater, i.e.
    vars procedure(char_rep, item_rep);
        ;;; declare them as procedure identifiers for efficiency.

    discin('myfile.txt') -> char_rep;

    incharitem(char_rep) -> item_rep;

Then char_rep is a procedure which, each time it is called, returns the
next character from the file myfile.txt, and item_rep is a procedure
which, each time it is called, repeatedly invokes char_rep until it has
enough characters to return a number (integer, ratio, decimal, or
complex number), a word, or a string.

If you'd prefer to manipulate the file as a list of text items do

    vars file_text = pdtolist(item_rep);

See the information on dynamic (lazily evaluated) lists in REF LISTS
or HELP PDTOLIST, or see Chapter 6 of the Pop-11 primer.
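
To make the item-repeater and dynamic-list idea concrete, here is a minimal
sketch. It assumes stringin (which builds a character repeater from a string)
as a stand-in for discin, so that no file is needed; the sample text and the
variable names are invented for illustration.

```pop11
;;; A sketch only: the sample text is made up.
;;; stringin stands in for discin('myfile.txt').
vars items = pdtolist(incharitem(stringin('the cat sat on 3 mats')));
vars count = 0;

;;; null, hd and tl all cope with dynamic lists, expanding them on demand.
until null(items) do
    if isword(hd(items)) then
        count + 1 -> count;
    endif;
    tl(items) -> items;
enduntil;

count =>    ;;; the number of words found (the integer is not counted)
```

Replacing the stringin call with discin('myfile.txt') should give the same
behaviour for a file, with items read only as the list is walked.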

It's going to be hard to know what to point you at unless you can
be a bit more specific about some of the things you might want to
do.

A colleague was having trouble with a file containing 8-bit (graphic)
characters, so I showed him how this procedure could transform those
characters into something else (or omit them), while leaving other
characters unchanged.

        define transform_file(inputfile, outputfile);
                lvars
                        produce = discin(inputfile),
                        consume = discout(outputfile),
                        char;

                repeat
                        produce() -> char;
                        quitif (char == termin);
                        if char < 128 then consume(char)
                        else
                                ;;; whatever you want to go out, e.g. nothing,
                                ;;; or some translation
                        endif;
                endrepeat;
                consume(termin);        ;;; to flush output buffer
        enddefine;

Then
        transform_file('testin', 'testout');

will read the file called 'testin' and write the transformed
version to 'testout'. If it is a multi-megabyte file or you need to do
it often, some simple optimisations are possible to speed that up.
(See HELP EFFICIENCY)
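
As one possible way of filling in the "else" branch above, the following
sketch replaces every 8-bit character with a single replacement character.
The procedure name replace_8bit and its third argument are hypothetical, and
the small input file is created here only so that the sketch is self-contained.

```pop11
define replace_8bit(inputfile, outputfile, replacement);
    lvars
        produce = discin(inputfile),
        consume = discout(outputfile),
        char;

    repeat
        produce() -> char;
        quitif(char == termin);
        if char < 128 then
            consume(char)
        else
            consume(replacement)    ;;; one possible "translation"
        endif;
    endrepeat;
    consume(termin);                ;;; flush the output buffer
enddefine;

;;; Build a small input file containing one 8-bit character (code 233).
vars make_input = discout('testin');
appdata(consstring(`c`, `a`, `f`, 233, 4), make_input);
make_input(termin);

replace_8bit('testin', 'testout', `?`);
```

After this runs, 'testout' should hold the three ASCII characters unchanged,
followed by a question mark in place of the 8-bit character.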

A program by Riccardo Poli that reads in a file of text, builds a
table of transition probabilities, then uses it to produce a sort of
parody of the original is described here:
    http://www.cs.bham.ac.uk/research/poplog/help/summarise
        4302 bytes
with the pop-11 code in here
    http://www.cs.bham.ac.uk/research/poplog/lib/summarise.p
        6735 bytes

REF DISCAPPEND describes another utility

    discappend(<filename or device>) -> <character_consumer>;

The character consumer is a procedure that can be repeatedly applied to
characters (8 bit integers), which will be appended to the original
file (flush and close it by applying the consumer to termin).
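
A minimal sketch of how discappend might be used, assuming it is autoloadable
in your installation. The procedure append_line and the file name 'notes.txt'
are hypothetical. Whether discappend creates a missing file is worth checking
in REF DISCAPPEND; the sketch creates an empty file first to be safe.

```pop11
define append_line(filename, line);
    lvars consume = discappend(filename);
    appdata(line, consume);     ;;; send each character of the string
    consume(`\n`);
    consume(termin);            ;;; flush and close
enddefine;

;;; Start from a known, empty file (hypothetical name).
discout('notes.txt')(termin);

append_line('notes.txt', 'first note');
append_line('notes.txt', 'second note');
```

Each call opens the file, appends one line, and closes it again, so the two
lines should end up in the file in the order the calls were made.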

The code for discappend can be inspected in VED
    ENTER showlib discappend

(I have just noticed that HELP DISCAPPEND is out of date.)

Quote:
> I'm guessing that ved was written in pop-11 ?

Yes. The basic sources for Ved are in
    $usepop/pop/ved/src/

and autoloadable and other extensions in

    $usepop/pop/lib/ved/

though Xved uses a lot of X facilities, as you can see in

    $usepop/pop/x/ved/

In the files
    http://www.cs.bham.ac.uk/research/poplog/auto/mimencode.p
        3647 bytes
    http://www.cs.bham.ac.uk/research/poplog/auto/mimedecode.p
        3456 bytes

You'll find code to read in a text file and write out a mimencoded
version, and code to read in a mimencoded file and write out a
mimedecoded version. Both files have documentation near the top
of the file. They are used in the following files, which read and write
portions of a ved file with decoding or encoding:

    http://www.cs.bham.ac.uk/research/poplog/auto/ved_writemime.p
        861 bytes
    http://www.cs.bham.ac.uk/research/poplog/auto/ved_readmime.p
        873 bytes

And the latter is used in a ved utility to prepare mime attachments
for posting, in
    http://www.cs.bham.ac.uk/research/poplog/auto/ved_attach.p
        3949 bytes

as described in
    http://www.cs.bham.ac.uk/research/poplog/help/ved_attach
        17202 bytes

All that is packaged in
    http://www.cs.bham.ac.uk/research/poplog/attach.tar.gz
        10981 bytes

There are also libraries related to parsing, which might be useful
for some applications.

I hope that helps and is not too overwhelming.

Aaron
====
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK

PAPERS: http://www.cs.bham.ac.uk/research/cogaff/
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html



Fri, 21 Nov 2003 19:11:12 GMT  
---- snip ---
Quote:
>     vars procedure(char_rep, item_rep);
>         ;;; declare them as procedure identifiers for efficiency.

>     discin('myfile.txt') -> char_rep;

>     incharitem(char_rep) -> item_rep;

> Then char_rep is a procedure which, each time it is called returns the
> next character from the file myfile.txt ....

My manual/single-step call to discin apparently re-initialises.
ie. repeated reading of the *first* byte.
Apparently the 'counters' of repeaters are not global/persistent.
Within a single invocation (as per code below) the stepping is observed.
OK, since there is no initialisation code for discin: it must be auto-initialised
on 'start-run' ?

---- snip ---

Quote:

> It's going to be hard to know what to point you at unless you can
> be a bit more specific about some of the things you might want to
> do.

Well, my initial poplog experiments overly impressed me until I realised
that elements of: [man in boat] ; are tokens (items for poplog), and NOT
'text-strings'. Unfortunately, human knowledge is embedded in text-strings.
This 'natural stuff', (which I've warned we shouldn't follow as we don't
follow flapping and feathers to advance aircraft design) has to be handled.
   So the abundant dirty-natural-text needs to be cleaned-up/formalised
to be processed to add value. I'm not much interested/optimistic about
natural language analysis, but favour man-machine co-operation.  

I want to be able to read text files from various sources/styles/formats,
with some guidance/assistance from string-search facilities.  
And manually extract text-sections to various different holders/files;
ie. use human evaluation.  

Modifying the code below, I was not able to send the alternate (to file
'testout') text strings to screen.
     char =>   ;;; produces <numeric value of char>

How would I split a text string eg:
                        if char > 128 then consume(char)  ;;; post to non-ascii file
                        else  <write to screen>
                endif;

My substantial search (which, for the first time, got me to the primer) fails
to show me how to write a character to the screen.

Quote:
>    define transform_file(inputfile, outputfile);
>            lvars
>                    produce = discin(inputfile),
>                    consume = discout(outputfile),
>                    char;

>            repeat
>                    produce() -> char;
>                    quitif (char == termin);
>                    if char < 128 then consume(char)
>                    else
>                            ;;; whatever you want to go out, e.g. nothing,
>                            ;;; or some translation
>                    endif;
>            endrepeat;
>            consume(termin);        ;;; to flush output buffer
>    enddefine;

> Then
>    transform_file('testin', 'testout');

> will read the file called 'testin' and write the transformed
> version to 'testout'.

All this automagically behind the scene eg. opening and closing files, is
frightening to me .  
Apparently that's what hi-level programming is about ?

--- snip ---

Quote:
> I hope that helps and is not too overwhelming.

A bit of both: no pain, no gain.

Thanks,
  Chris Glur.



Tue, 25 Nov 2003 16:36:58 GMT  


Quote:

> Well, my initial poplog experiments overly impressed me until I realised
> that elements of: [man in boat] ; are tokens (items for poplog), and NOT
> 'text-strings'. Unfortunately, human knowledge is embedded in text-strings.

<blinks> It is?

I'm not sure what you mean by that. *Some* human knowledge is encoded
in machine-readable text-strings. An awful lot isn't.

Quote:
> I want to be able to read text files from various sources/styles/formats,
> with some guidance/assistance from string-search facilities.  
> And manually extract text-sections to various different holders/files;
> ie. use human evaluation.  

> Modifying the code below, I was not able to send the alternate (to file
> 'testout') text strings to screen.
>      char =>   ;;; produces <numeric value of char>

`cucharout(aCharacter)` sends the character value [in] `aCharacter` to
the current output stream.

If `myOut` is a variable bound to an output stream (a "character consumer"
in Pop terminology) then, similarly, `myOut(aCharacter)` sends that character
to that output stream.

The default value of `cucharout` sends characters to the "standard output"
that Poplog started with. (It need not be "the screen", since Poplog will
work fine even if all you have is a teletype, or if the output has been
redirected, Unix-style, to a file, socket, serial line, or DAC.)

In Ved, working in immediate mode, the default `cucharout` sends characters
to the current output window.
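
Applied to the question quoted above, a sketch along these lines sends 8-bit
characters to a file and everything else to the current output. The procedure
name split_chars and the file name 'nonascii' are hypothetical, and stringin
is used in place of discin only so that the sketch runs without an input file.

```pop11
define split_chars(produce, file_consume);
    lvars char;
    repeat
        produce() -> char;
        quitif(char == termin);
        if char > 127 then
            file_consume(char)  ;;; 8-bit character: goes to the file
        else
            cucharout(char)     ;;; 7-bit ASCII: goes to the current output
        endif;
    endrepeat;
    file_consume(termin);       ;;; flush and close the file
enddefine;

split_chars(stringin('plain text\n'), discout('nonascii'));
```

To process a real file, pass discin('testin') as the first argument. Note
that `char =>` prints the integer code because characters are integers in
Pop-11, whereas `cucharout(char)` writes the character itself.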

Quote:
>>        define transform_file(inputfile, outputfile);
>>                lvars
>>                        produce = discin(inputfile),
>>                        consume = discout(outputfile),
>>                        char;

>>                repeat
>>                        produce() -> char;
>>                        quitif (char == termin);
>>                        if char < 128 then consume(char)
>>                        else
>>                                ;;; whatever you want to go out, e.g. nothing,
>>                                ;;; or some translation
>>                        endif;
>>                endrepeat;
>>                consume(termin);        ;;; to flush output buffer
>>        enddefine;

>> Then
>>        transform_file('testin', 'testout');

>> will read the file called 'testin' and write the transformed
>> version to 'testout'.
> All this automagically behind the scene eg. opening and closing files, is
> frightening to me .  
> Apparently that's what hi-level programming is about ?

That's what *programming* is about: hiding irrelevant details.

Which bit was frightening, and can you say why? It may help us to suggest
a good way for you to get into this.

Are any Pop11 books still in print?

--
Chris "I don't like `cucharout`, as it happens" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html



Tue, 25 Nov 2003 18:35:58 GMT  

Quote:

> >     discin('myfile.txt') -> char_rep;

> >     incharitem(char_rep) -> item_rep;

> > Then char_rep is a procedure which, each time it is called returns the
> > next character from the file myfile.txt ....
> My manual/single-step call to discin apparently re-initialises.
> ie. repeated reading of the *first* byte.
> Apparently the 'counters' of repeaters are not global/persistent.

Every time you call discin, you create a *new* repeater.  The counter
for each repeater is persistent.  So if you do

discin('myfile.txt') -> char_rep1;
discin('myfile.txt') -> char_rep2;

You get two independent character repeaters, each with its own memory,
that can be at the same or different points in the file at any given
moment, depending on how many times each has been called.

Of course if you do

discin('myfile.txt') -> char_rep;

<other stuff>

discin('myfile.txt') -> char_rep;

Then the second assignment to char_rep replaces the first character
repeater with a new one.  The original character repeater no longer has
the name char_rep.  Unless you somehow gave it another name before the
second call to discin (e.g., char_rep -> old_char_rep;) then it will
have no name at all, and you will have no way of running it, and it will
eventually get picked up by the garbage collector.
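
The difference can be seen in a small experiment. This is only a sketch: the
file name 'repeater_demo.txt' is made up, and the file is written first so
that the example is self-contained.

```pop11
;;; Create a three-character file to read from (hypothetical name).
vars write_demo = discout('repeater_demo.txt');
appdata('abc', write_demo);
write_demo(termin);

vars procedure (rep1, rep2);
discin('repeater_demo.txt') -> rep1;
discin('repeater_demo.txt') -> rep2;

rep1() =>   ;;; 97, the character code of `a`
rep1() =>   ;;; 98: rep1 remembers its position and has moved on to `b`
rep2() =>   ;;; 97: rep2 keeps its own position and starts at `a`

;;; Calling discin afresh each time always yields the first character.
discin('repeater_demo.txt')() =>    ;;; 97
```

The last line is what a "manual single-step" test does if discin is called
again for every character, which would explain seeing the first byte repeated.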

Hope that helps.

Stephen Isard



Tue, 25 Nov 2003 19:04:02 GMT  
 