A useful example for Perl (was Re: Which language can write this but Perl) 

So, looking for a useful example of Perl?
I will propose some specifications.
I am not a Perl programmer, so I cannot propose a solution...

In some programs, you can send the output to another program before seeing
the results.  Let's suppose the program to write is this other program.
In non-English languages, there exist "foreign characters" that cannot
be sent via the Internet because they are 8 bits.  But they can be displayed
either as an 8-bit character local to the receiving terminal, or
as an output string (an escape sequence that displays the character by
switching mode).

Suppose the input to be caught is in column 1, and the output in column 2:


c,  [
e'  &

I used 7-bit characters as output, but they could be longer strings.
The inputs are actual examples, but the codes would include about 20
lines for French, about 4 for German, Spanish, Danish, Norwegian, etc.

So, how to do that in Perl?

--

Mes opinions sont personnelles, pas celles de mon employeur.
My opinions are mine, not those of my employer.
Looking for BEAUREGARD, JARRET, JAREST, VINCENT (I'm also on soc.roots)



Fri, 26 Aug 1994 06:49:10 GMT  
Denis Beauregard:
   In non-English languages, there exist "foreign characters" that
   cannot be sent via the Internet because they are 8 bits.  But they can
   be displayed either as an 8-bit character local to the receiving
   terminal, or as an output string (an escape sequence that displays
   the character by switching mode).

Funny, I have exactly the opposite problem.  There's one machine up on
internet, which I have access to, which insists on sending everything
out with even parity.  This is ok if I just want to display the text, but
if I bring it up in emacs it's impossible to read.

Anyways, I've found no options to rlogin or telnet to get them to
strip the 8th bit.  Nor have I found any unix filters to do it for me.
[Even tr doesn't seem to work.]  I've written my own little filter for
this, but I consider this situation rather odd.

But the point is, if you've got a way of getting a 7 bit filter on an
incoming internet connection, I'm in a position to envy you.

--



Fri, 26 Aug 1994 07:24:53 GMT  
: So, looking for a useful example of Perl?
: I will propose some specifications.
: I am not a Perl programmer, so I cannot propose a solution...
:
: In some programs, you can send the output to another program before seeing
: the results.  Let's suppose the program to write is this other program.
: In non-English languages, there exist "foreign characters" that cannot
: be sent via the Internet because they are 8 bits.  But they can be displayed
: either as an 8-bit character local to the receiving terminal, or
: as an output string (an escape sequence that displays the character by
: switching mode).
:
: Suppose the input to be caught is in column 1, and the output in column 2:
:

: c,  [
: e'  &
:
: I used 7-bit characters as output, but they could be longer strings.
: The inputs are actual examples, but the codes would include about 20
: lines for French, about 4 for German, Spanish, Danish, Norwegian, etc.
:
: So, how to do that in Perl?

I presume you're using "a^" etc. to represent a single 8-bit
character.  I will do likewise.  Perl is 8-bit clean, so you can just
put the substitutions you want:


    s/c,/[/g;
    s/e'/&/g;

(Note that translations from a single character to a single character
could be handled with one pass of a tr///.  Substituting multiple
characters requires s/// though.)
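
For readers who don't write Perl, the two styles can be sketched in Python. This is only an illustration, not the Perl above: `str.translate` stands in for a one-pass tr///, repeated `str.replace` stands in for a series of s/// passes, and the sample text and replacement strings are made up.

```python
# One character to one character, in a single pass (the tr/// style).
one_to_one = str.maketrans({"ç": "[", "é": "&"})
print("garçon fiancé".translate(one_to_one))   # gar[on fianc&

# One character to a longer string, one pass per table entry (the s/// style).
def expand(text, table):
    for char, replacement in table.items():
        text = text.replace(char, replacement)
    return text

one_to_many = {"ç": "c,", "é": "e'"}            # hypothetical 7-bit digraphs
expanded = expand("garçon fiancé", one_to_many)
print(expanded)                                 # garc,on fiance'
```

The repeated-pass version is only safe while no replacement string contains a character that a later pass would rewrite.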

Alternately, if you just want one s/// pass over the string, you can define
an associative array containing the translations, and do the substitution
on a character class containing all characters with the 8th bit set:


    $xc{"c,"} = '[';
    $xc{"e'"} = '&';
    ...
    s/([\200-\377])/$xc{$1}/eg;

Depending on the frequency of characters to be translated, this is probably
the fastest solution in Perl.
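
The same idea, a single substitution pass driven by a lookup table, might look like this in Python. It is a sketch under assumptions: the text is taken to be Latin-1 style characters, the table `xc` holds only two made-up entries, and unlike the Perl above it leaves characters that are missing from the table unchanged.

```python
import re

# Hypothetical table: 8-bit characters -> 7-bit replacement strings.
xc = {"\xe7": "[", "\xe9": "&"}

def translate_high_bit(text):
    # Match every character in the range 0x80-0xFF and look it up;
    # characters without a table entry are passed through as they are.
    return re.sub(r"[\x80-\xff]", lambda m: xc.get(m.group(0), m.group(0)), text)

print(translate_high_bit("gar\xe7on fianc\xe9"))   # gar[on fianc&
```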

Other solutions are possible.  If you load an ordinary array instead of
an associative array, and include the mapping of all 256 characters, you
can do a sliced array lookup:


That's a more APL-ish solution.
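
A table lookup over all 256 byte values can be sketched in Python as follows. This is not the code from the post; the Latin-1 byte values and the two replacements are assumptions chosen for illustration.

```python
# Start from the identity mapping over all 256 byte values,
# then overwrite the entries that need translating.
table = bytearray(range(256))
table[0xE7] = ord("[")   # c-cedilla, assuming Latin-1 input
table[0xE9] = ord("&")   # e-acute, assuming Latin-1 input

def lookup(data):
    # Index the table with every input byte and collect the results.
    return bytes(table[b] for b in data)

print(lookup("garçon fiancé".encode("latin-1")))   # b'gar[on fianc&'
```

Python's built-in `bytes.translate(table)` performs the same lookup in one call.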

Larry Wall



Sat, 27 Aug 1994 03:11:18 GMT  
: Other solutions are possible.  If you load an ordinary array instead of
: an associative array, and include the mapping of all 256 characters, you
: can do a sliced array lookup:
:

:
: That's a more APL-ish solution.

It's also an inadequate solution.  The above can't do anything tr///
doesn't do.  To allow for single to multiple character mapping I should
have said


Larry



Sat, 27 Aug 1994 10:35:30 GMT  

                                            ^^^
I like J. It's the only language I know with a "gagging smiley with moustache"
operator. If they wanted to preserve the basic nature of APL, they succeeded
famously!
--
-- Peter da Silva,  Ferranti International Controls Corporation
-- Sugar Land, TX  77487-5012;  +1 713 274 5180
-- "Have you hugged your wolf today?"


Sun, 28 Aug 1994 08:11:16 GMT  


peter> I like J. It's the only language I know with a "gagging smiley
peter> with moustache" operator.

Actually, it kinda looks like a dedicated smiley creation language.
I've been exposed to dedicated oil-well measurement languages, but
smileys -- beats me.

/Lars
--

CS Dept., Aalborg Univ., DENMARK. | these things.  -- Calvin



Thu, 01 Sep 1994 02:39:22 GMT  
Me:

Peter da Silva:
   I like J. It's the only language I know with a "gagging smiley with
   moustache" operator.

Lars P. Fischer:
   Actually, it kinda looks like a dedicated smiley creation language.
   I've been exposed to dedicated oil-well measurement languages, but
   smileys -- beats me.

I dunno.. j really sucks as a dedicated smiley creation language.  For
example, practically all smilies which contain ')' or '(' are syntax
errors.  And, about half the smilies which contain '8' are syntax
errors.  Of the remainder, there are many, many smilies which don't do
anything particularly useful.

Maybe teco would be better?

Anyways, the real joke is that I'd posted buggy code -- it's got
misplaced parentheses, and a significant logical error.  Which doesn't
do anything good to its readability...  [And makes me look dumb too --
but you already knew that.]

What I should have written is more along the lines of:



   delim=:      short_seq  in not_in esc
   out=:   ; transl {~ alph i. delim <;.2 in

'not_in' and 'short_seq' are functions.  not_in finds those elements
of the left argument which aren't members of the right argument.
short_seq takes the result of a predicate (e.g. not_in), and changes
0s to 1s if they are not followed by 1s.

For example, if the escape character is '/', and we use as a test
case:
   in =: '/this///is/a/test'

Then, 'in not_in esc' is
0 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 1

Therefore, delim is:
0 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 1

So, delim <;.2 in is:
+--+-+-+-+-+-+--+-+--+--+-+-+-+
|/t|h|i|s|/|/|/i|s|/a|/t|e|s|t|
+--+-+-+-+-+-+--+-+--+--+-+-+-+

In other words, the escape character has significance only where it
precedes a non-escape character.  In the last part of the above
example, the boxes around the sub-strings are meant to indicate that
the enclosed characters are "sub-arrays".  For this case (where the
sub-arrays have only one dimension), it might be instructive to
represent the thing using lisp notation:
   ["/t" "h" "i" "s" "/" "/" "/i" "s" "/a" "/t" "e" "s" "t"]

This doesn't provide full generality (the only place an escape
character can be placed, transparently, is in front of an escape
character sequence), but it is easy to code in the sense that it has
very little in the way of serial requirements.
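
For readers without a J interpreter, the splitting step can be mimicked in Python. This is a sketch that follows the prose descriptions of not_in and short_seq and the test case above; the helper `cut` is a hypothetical stand-in for `<;.2`, and none of it is the J code itself.

```python
def not_in(text, esc):
    # 1 for each character of text that is not in esc, else 0.
    return [0 if ch in esc else 1 for ch in text]

def short_seq(mask):
    # Compare each element with its right neighbour (a 0 is assumed past
    # the end); a 0 survives only when a 1 follows it.
    shifted = mask[1:] + [0]
    return [1 if a >= b else 0 for a, b in zip(mask, shifted)]

def cut(delim, text):
    # Close a piece at every position where delim is 1.
    pieces, start = [], 0
    for i, flag in enumerate(delim):
        if flag:
            pieces.append(text[start:i + 1])
            start = i + 1
    return pieces

text = "/this///is/a/test"
delim = short_seq(not_in(text, "/"))
print(cut(delim, text))
# ['/t', 'h', 'i', 's', '/', '/', '/i', 's', '/a', '/t', 'e', 's', 't']
```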

To provide full generality, you'd want to replace the definition of
'short_seq' with something a little more involved (like a state
machine with more than two states)...

--



Thu, 01 Sep 1994 14:08:48 GMT  

>Me:

>Peter da Silva:
>   I like J. It's the only language I know with a "gagging smiley with
>   moustache" operator.

Iverson clearly set out to prove that it was possible to have a less readable
language than APL without using a unique, runic character set. Until J, it
was arguable whether Obfuscated C (that seasonal dialect) could qualify, since
there were a few scattered examples of useful _and_ readable C programs.

>I dunno.. j really sucks as a dedicated smiley creation language.  For
>example, practically all smilies which contain ')' or '(' are syntax
>errors.  And, about half the smilies which contain '8' are syntax
>errors.  Of the remainder, there are many, many smilies which don't do
>anything particularly useful.

Wait a while. As extensible as any language purports to be, as the years go
by, people end up having to extend the language by reclaiming the syntax errors.
If the language survives, the syntax errors will go away.

>Anyways, the real joke is that I'd posted buggy code -- it's got
>misplaced parenthesis, and a significant logical error.  Which doesn't
>do anything good to its readability...  [And makes me look dumb too --
>but you already knew that.]

Never mind. With J, who can tell?

>What I should have written is more along the lines of:



>   delim=:      short_seq  in not_in esc
>   out=:   ; transl {~ alph i. delim <;.2 in

Ah yes. It's all becoming clear to me. This means that you must have gone
wrong somewhere. Should you be using so many alphabetic characters?

>To provide full generality, you'd want to replace the definition of
>'short_seq' with something a little more involved (like a state
>machine with more than two states)...

Yes. Preferably a universal self-replicating multi-tape Turing machine with
proof of undecidability for every input, which rewrites itself into the
shortest possible J representation and composes poetry, too.

Later,
Andrew Mullhaupt



Fri, 02 Sep 1994 03:10:52 GMT  
Andrew Mullhaupt:
   Iverson clearly set out to prove that it was possible to have a
   less readable language than APL without using a unique, runic
   character set.

Funny, that's what most of the APL programmers who I know say about
_any_ language which doesn't use the APL character set.

   Wait a while. As extensible as any language purports to be, as the
   years go by, people end up having to extend the language by
   reclaiming the syntax errors.  If the language survives, the syntax
   errors will go away.

I dunno... what kind of extensions did you have in mind?



   >   delim=:      short_seq  in not_in esc
   >   out=:   ; transl {~ alph i. delim <;.2 in

Andrew Mullhaupt:
   Ah yes. It's all becoming clear to me. This means that you must
   have gone wrong somewhere. Should you be using so many alphabetic
   characters?

Obviously an invitation for more documentation:

e.  is a set membership operator.  It checks each item in the left
argument for membership in the set of objects in the right argument.
The result is boolean, and each 1 or 0 corresponds to an element in
the left argument.  For example,
        'This is a test' e. 'hers'
0 1 0 1 0 0 1 0 0 0 0 1 1 0
Here, the first element of the result is 0 because 'T' does not occur
in 'hers'
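
The membership test is easy to mirror in Python. A minimal sketch, where `member` is a made-up name standing in for e.:

```python
def member(left, right):
    # One 1 or 0 per item of the left argument.
    return [1 if item in right else 0 for item in left]

print(member("This is a test", "hers"))
# [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
```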

-. is logical negation.  It just changes 0s to 1s and vice versa.


not_in takes the result of e. and feeds it into -.  So the line that
defines not_in defines a boolean operation which returns 1 for each
element of the left argument which is not in the right argument.

}. drops an item off the front of an array.  For example,

        }. 1 2 3 4
2 3 4

, is a generic catenate operation.  For example,
        2 3 4 , 0
2 3 4 0

& curries an infix operation by fixing one of the arguments.  The
result is a prefix operation.  Therefore, the part of the code which
appends a 0 with ,&0 and then applies }. defines a left shift
operation.  There are other ways of defining left shift, but that's
not important here.
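
In Python terms the left shift might be sketched like this. The function names are invented for the illustration; they stand in for }. and for , with its right argument fixed at 0.

```python
def behead(items):
    # Drop the first item, like }.
    return items[1:]

def append_zero(items):
    # Catenate a 0 on the right, like , with 0 fixed as the right argument.
    return items + [0]

def shift_left(items):
    # Append the 0 first, then drop the leading item.
    return behead(append_zero(items))

print(shift_left([1, 1, 0, 1]))   # [1, 0, 1, 0]
```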

>: is similar to C's >= (in other words, it returns 1 where the left
argument is greater than or equal to the right argument).  The reason
>: is used instead of >= lies in J's parsing rules -- in J, >: is a
single token while >= is two tokens.  If you feed >: boolean
arguments, the result behaves according to this truth table:

           right
           arg
      >:   0  1

left  0    1  0
arg   1    1  1
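
The table can be checked mechanically. A small Python sketch, with `ge` as a made-up name for the boolean use of >: :

```python
def ge(left, right):
    # 1 where the left argument is greater than or equal to the right one.
    return 1 if left >= right else 0

for left in (0, 1):
    print(left, [ge(left, right) for right in (0, 1)])
# 0 [1, 0]
# 1 [1, 1]
```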

Next, an isolated sequence of two functions results in a derived function.
If f and g are functions, and x is data,
        (f g) x
is equivalent to
        x (f g) x
which is equivalent to
        x f g x
which is equivalent to
        x f (g x)
It's equivalent to other things, but I'll stop here.
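
The rule for a pair of functions can be written as a higher-order function in Python. This is a sketch: `hook` is just a convenient name here, and the multiply and add-one functions are arbitrary stand-ins for f and g.

```python
import operator

def hook(f, g):
    # (f g) x  is  x f (g x): g is applied to x, then f combines x with that.
    return lambda x: f(x, g(x))

times_successor = hook(operator.mul, lambda x: x + 1)
print(times_successor(3))   # 3 * (3 + 1) = 12
```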

Anyways, the function definition of short_seq
applies to a boolean list and returns a 1 for each 1 in the original.
It also returns a 1 for each 0 in the original which has a 0 to the
right [and, because ,&0 shifts a 0 onto the right end of the list,
you're guaranteed that the rightmost element of the result is a 1].
But, short_seq returns a 0 for each 0 in the argument which has a 1 to
the right.  In other words, if 0s in the argument mark each occurrence
of an escape character, 0s in the result mark each occurrence of an
escape character followed by a non-escape character.

For example:
        short_seq  1 1 0 1 0 0 0 1 1 0 1 0 1 0
1 1 0 1 1 1 0 1 1 0 1 0 1 1
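
Putting the comparison and the shift together reproduces that result. The Python below is a sketch assembled from the descriptions in this post, not a transcription of the J definition; `ge_each` and `shift_left` are illustrative names.

```python
def shift_left(mask):
    # Append a 0 on the right and drop the first element.
    return mask[1:] + [0]

def ge_each(left, right):
    # Element-by-element "greater than or equal", giving 1s and 0s.
    return [int(a >= b) for a, b in zip(left, right)]

def short_seq(mask):
    # Compare the mask with its own left shift.
    return ge_each(mask, shift_left(mask))

print(short_seq([1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0]))
# [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
```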

Or, if 'in' is a variable holding the text 'ab/c///de/f/g/', and 'esc'
is a variable holding '/', then
        delim=: short_seq  in not_in esc
will set delim to: 1 1 0 1 1 1 0 1 1 0 1 0 1 1

;. is a functional which will apply a function to each part of a
sequence.  If f is a function, n is a number, x is a boolean array
with a 1 indicating delimiting characters, and y is the sequence to be
parsed,
        x  f;.n  y
will apply f to each of the subsequences in y indicated by x.  The
number n indicates if delimiters are leading or trailing delimiters,
and whether or not the delimiters are to be seen by f.  If n is 2,
delimiters are trailing, and delimiters are seen by f.

< when used as a monadic operation is analogous to & used as a monadic
operation in C.  In other words, it returns a reference to an array.
In J, array references have a print representation which consists of a
box drawn around the contents of the array.

So, with the sample I've been using ('in' defined as
'ab/c///de/f/g/'), monadic < is applied to the following sequences:
        'a'
        'b'
       '/c'
        '/'
        '/'
       '/d'
        'e'
       '/f'
       '/g'
        '/'
And, the print representation of that is:
+-+-+--+-+-+--+-+--+--+-+
|a|b|/c|/|/|/d|e|/f|/g|/|
+-+-+--+-+-+--+-+--+--+-+
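
A rough Python counterpart of applying a function to each delimited piece, covering only the case n = 2 (trailing delimiters that the function gets to see). The name `cut` and its argument order are assumptions made for this sketch.

```python
def cut(f, delim, seq):
    # Close a piece at every position where delim is 1 and apply f to it.
    results, start = [], 0
    for i, flag in enumerate(delim):
        if flag:
            results.append(f(seq[start:i + 1]))
            start = i + 1
    return results

text = "ab/c///de/f/g/"
delim = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]

print(cut(lambda piece: piece, delim, text))
# ['a', 'b', '/c', '/', '/', '/d', 'e', '/f', '/g', '/']
print(cut(len, delim, text))
# [1, 1, 2, 1, 1, 2, 1, 2, 2, 1]
```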

i. is a lookup function, which will look up each item of the right
argument in the left argument, and return the index where it appears.

For example, if 'alph' is a list which contains character sequences
(either single character, or escaped characters), such as:
+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+
|/|a|b|c|d|e|f|g|h|i|j|/a|/b|/c|/d|/e|/f|/g|/h|/i|/j|
+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+
then
        alph i.  delim <;.2 in
would have the result
        1 2 13 0 0 14 5 16 17 0
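
The lookup can be reproduced in Python with `list.index`. One difference worth knowing: for an item that is not found, J's i. returns the length of the left argument, while `list.index` raises `ValueError`. The sample data is the one from this post.

```python
# The alphabet of known sequences: '/', then a-j, then /a-/j.
alph = ["/"] + list("abcdefghij") + ["/" + c for c in "abcdefghij"]
pieces = ["a", "b", "/c", "/", "/", "/d", "e", "/f", "/g", "/"]

indices = [alph.index(piece) for piece in pieces]
print(indices)   # [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]
```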

Ideally, alph would contain every ascii character, and every escape
sequence, but for test cases that is not necessary.

{ is an indexing function.  x{y is analogous to y[x] in C.

~ is a functional which reverses the order of arguments to an
operation.  So, a{~b is equivalent to b{a in J, and is similar to a[b]
in C.

So, if transl is a translate table, for instance
+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+
|/|a|b|c|d|e|f|g|h|i|j|APE|BAT|CAT|DOG|ELF|FLY|GNU|HOT|ILK|JON|
+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+
then
        transl {~ 1 2 13 0 0 14 5 16 17 0
would yield
+-+-+---+-+-+---+-+---+---+-+
|a|b|CAT|/|/|DOG|e|FLY|GNU|/|
+-+-+---+-+-+---+-+---+---+-+
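
Indexing with swapped arguments, sketched in Python. `pick` and `swap` are hypothetical helpers standing in for { and ~; the table is the one shown above.

```python
def pick(indices, table):
    # x { y : select the items of table at the given indices.
    return [table[i] for i in indices]

def swap(f):
    # f~ : the same operation with its two arguments exchanged.
    return lambda a, b: f(b, a)

transl = (["/"] + list("abcdefghij") +
          ["APE", "BAT", "CAT", "DOG", "ELF", "FLY", "GNU", "HOT", "ILK", "JON"])
indices = [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]

picked = swap(pick)(transl, indices)
print(picked)
# ['a', 'b', 'CAT', '/', '/', 'DOG', 'e', 'FLY', 'GNU', '/']
```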

; when used monadically, will take a list of array references and
catenate those arrays together.  For example
        ; transl {~ 1 2 13 0 0 14 5 16 17 0
yields
abCAT//DOGeFLYGNU/
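
The last three steps (look up, index, catenate) collapse into one expression in Python. A sketch using the sample tables from this post, with the already split pieces given as a literal list:

```python
alph = ["/"] + list("abcdefghij") + ["/" + c for c in "abcdefghij"]
transl = (["/"] + list("abcdefghij") +
          ["APE", "BAT", "CAT", "DOG", "ELF", "FLY", "GNU", "HOT", "ILK", "JON"])
pieces = ["a", "b", "/c", "/", "/", "/d", "e", "/f", "/g", "/"]

# Find each piece in alph, take the entry at the same position in transl,
# and join the results into one string.
out = "".join(transl[alph.index(piece)] for piece in pieces)
print(out)   # abCAT//DOGeFLYGNU/
```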

   >To provide full generality, you'd want to replace the definition
   >of 'short_seq' with something a little more involved (like a state
   >machine with more than two states)...

   Yes.  Preferably a universal self-replicating multi-tape Turing
   machine with proof of undecidability for every input, which
   rewrites itself into the shortest possible J representation and
   composes poetry, too.

Um... I don't think that's necessary.  All I was trying to say was
that if you want a slash in the result you might want to make it so
that '//' returns a slash.  If you don't need that (for instance, if
you want '/s' to return a slash), then the above code would do fine.
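
One possible reading of the '//' variant, written as a small explicit scanner in Python. This is an assumption about what such a replacement could look like, not anything from the J code: the `tokenize` function, the two-entry table, and the choice to pass unknown tokens through unchanged are all made up for the sketch.

```python
def tokenize(text, esc="/"):
    # Two states: either the previous character was an unconsumed escape
    # (pending) or it was not.
    tokens, pending = [], False
    for ch in text:
        if pending:
            tokens.append(esc + ch)   # escape sequence, including esc + esc
            pending = False
        elif ch == esc:
            pending = True
        else:
            tokens.append(ch)
    if pending:
        tokens.append(esc)            # a lone escape character at the end
    return tokens

table = {"//": "/", "/c": "CAT"}      # hypothetical translations

def translate(text):
    # Tokens without a table entry are passed through unchanged.
    return "".join(table.get(tok, tok) for tok in tokenize(text))

print(tokenize("a//b/c"))    # ['a', '//', 'b', '/c']
print(translate("a//b/c"))   # a/bCAT
```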

[Yes, I recognize Andrew's comments as sarcasm, but I think he's
laying it on a little thick.  Anyways, hopefully this will have
cleared up any major questions about that section of code...]

--



Fri, 02 Sep 1994 10:28:14 GMT  
 