apl and regular expressions 

I have found that APL+WIN has promising internet features. In
addition to that, I should like to see good regular expression handling
functions. Does anybody know about this topic?


Fri, 16 Jan 2004 17:34:11 GMT  

Quote:
> I have found that apl+win has  promising internet features.
> In addition to that I should like to see good regular
> expression handling functions. Does anybody know about this topic?

The DLL used by J to handle regular expressions is a slightly modified
version of a public-domain library (documentation and source code are
available on the web, along with precompiled versions). I know that it's
a relatively simple (although boring) exercise to write a set of cover
functions in Dyalog APL for the interface. I am not an expert
(read: I don't know anything) about the foreign interface in APL+WIN, but
I believe there should be no problem there either. In fact, I wanted to write
an article for Vector about this, but I never found the time. Since I
hardly have any use for general regexps in my daily work, I could never
justify the time I would need.

--

Homepage: http://www.*-*-*.com/

<<<All I Ever Learned, I Learned From Anime: ---
   If a sister falls in love with her brother, somewhere down the line
you will discover they are not blood related.>>>



Fri, 16 Jan 2004 18:30:02 GMT  
I have occasionally wondered about the
possibility of regular expressions being
added as a standard feature of APL.  They
could be extended to make them better fit
into the language.

It should be simple to extend the regular
expression syntax to allow matches of
mixed vectors with both characters and
numbers, increasing the usefulness of
pattern matching on APL arrays.  For
example, '(IBM' 51 3.5 'May' 1981 ')+'
to match one or more occurrences of the
information enclosed in parentheses. This
could be implemented as a system function
or via an unused APL character.

Another possibility I have been wondering
about is whether "find" would be better
implemented as an operator rather than a
function.  Then it could take as its
argument a function to determine equality,
allowing many types of searches, including
regular expression searches.  For simpler
searches one could simply use the "="
function as its argument.
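A minimal Python sketch of the "find as an operator" idea, assuming a plain list stands in for an APL vector. The name `find_with` and its argument order are hypothetical; like APL's find (epsilon underbar) it returns one boolean per position, marking where the pattern begins, but it compares element pairs with whatever function is passed in instead of fixed equality.

```python
import operator
from typing import Any, Callable, Sequence


def find_with(eq: Callable[[Any, Any], bool], pattern: Sequence, data: Sequence) -> list[bool]:
    """Hypothetical 'find' operator: mark each index of data where pattern begins.

    Element pairs are compared with eq (pattern item first, data item second)
    instead of a built-in equality test.
    """
    n, m = len(data), len(pattern)
    if m == 0:
        return [True] * n  # an empty pattern trivially occurs everywhere
    result = [False] * n
    for i in range(n - m + 1):
        result[i] = all(eq(pattern[j], data[i + j]) for j in range(m))
    return result


# Plain equality behaves like an ordinary find on a mixed vector
print(find_with(operator.eq, [51, 3.5], [3, "IBM", 51, 3.5, "May", 1981]))
# [False, False, True, False, False, False]

# Swapping in "<" finds places where each data item exceeds the pattern item
print(find_with(operator.lt, [50, 3], [10, 60, 5, 70, 1]))
# [False, True, False, False, False]
```

Passing the comparison as an argument is what makes this an operator in the APL sense: the search loop stays the same and only the notion of "match" changes.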

--- Brian



Fri, 16 Jan 2004 21:09:50 GMT  

Quote:

> It should be simple to extend the regular
> expression syntax to allow matches of
> mixed vectors with both characters and
> numbers, increasing the usefulness of
> pattern matching on APL arrays.  For
> example, '(IBM' 51 3.5 'May' 1981 ')+'
> to match one or more occurrences of the
> information enclosed in parentheses. This
> could be implemented as a system function
> or via an unused APL character.

Am I missing the point? My APL+WIN already does this with member.

Fred



Fri, 16 Jan 2004 22:35:05 GMT  

Quote:

...

> It should be simple to extend the regular
> expression syntax to allow matches of
> mixed vectors with both characters and
> numbers, increasing the usefulness of
> pattern matching on APL arrays.  For
> example, '(IBM' 51 3.5 'May' 1981 ')+'
> to match one or more occurrences of the
> information enclosed in parentheses. This
> could be implemented as a system function
> or via an unused APL character.

> Another possibility I have been wondering
> about is whether "find" would be better
> implemented as an operator rather than a
> function.  Then it could take as its
> argument a function to determine equality,
> allowing many types of searches, including
> regular expression searches.  For simpler
> searches one could simply use the "="
> function as its argument.

> --- Brian

While RE matching in the context of characters in (ordered) strings clearly
makes sense, and has been a part of the computational milieu ever since McNaughton
and Yamada published their algorithm, I fail to get a feel for its application
to `objects' of the kind you appear to be describing. While J has rather complete
RE mechanisms, and K has some rudimentary facilities, they generally apply to strings
of characters, not to objects, where (for example) returning a list of
variable-length matches would often be a pain (except in K, where lists of lists
give more flexibility than matrices, as they can be `ragged').
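One common way around the ragged-result problem, sketched here in Python with the standard `re` module (not J's or K's actual facilities): return a start and a length for each match, so the result is always an n-by-2 table no matter how long the individual matches are. The function name `match_table` is hypothetical.

```python
import re


def match_table(pattern: str, text: str) -> list[tuple[int, int]]:
    """One (start, length) row per non-overlapping match.
    Every row has the same shape, so the result is rectangular even though
    the matched substrings themselves differ in length."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, text)]


text = "IBM 51 3.5 May 1981"
rows = match_table(r"[0-9]+", text)
print(rows)                                    # [(4, 2), (7, 1), (9, 1), (15, 4)]
# The ragged alternative: the matched substrings themselves
print([text[s:s + n] for s, n in rows])        # ['51', '3', '5', '1981']
```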

I'm afraid I don't understand what you mean by `This could be implemented ...
via an unused APL character'. You must be picturing something that I don't see
at all.

As to the `find' idea, isn't that easy enough to implement anyplace that
you have access to an `execute' operation? I understand that it might well be the
case that implementation as an operator could speed things up, but the same could be said
for just about any function---and thus this doesn't seem, to me at least, to make
much of an argument that it is worth making a `special case' out of this particular
function.



Fri, 16 Jan 2004 23:38:09 GMT  

Quote:

> > It should be simple to extend the regular
> > expression syntax to allow matches of
> > mixed vectors with both characters and
> > numbers, increasing the usefulness of
> > pattern matching on APL arrays.  For
> > example, '(IBM' 51 3.5 'May' 1981 ')+'
> > to match one or more occurrences of the
> > information enclosed in parentheses. This
> > could be implemented as a system function
> > or via an unused APL character.

> Am I missing the point? My APL+WIN already does this with
> member.

That was my initial reaction, too.  But then I concluded that Brian didn't mean
a simple match on the entire example vector (as in {and}{dot}{match}), but
meant that you would get a match on 'IBM' *or* 51 *or* (1981 3.5 'May'), etc.,
with the parens enclosing allowable alternatives and the + indicating the
repetition.  Well, even with that, the quotes don't seem to be in the right
places.

How about it, Brian?  Can you clarify, please?  I find your ideas intriguing,
but I would like to see a more detailed proposal.  I'm not sure that what I'm
visualizing -- e.g., with the operator idea -- is really what you intend.

            /Jim Lucas



Sat, 17 Jan 2004 03:54:12 GMT  
Perhaps the example was not very good.  I am
just suggesting that if regular expression
pattern matching is implemented in APL there
is no reason why it should be limited to
character strings; there are many times when
it is necessary to search an array of numbers
or a mixed array, and it would be useful to be
able to apply the power of regular expressions
in those cases as well, just as the current
find (epsilon underbar) is more useful than
the earlier #SS function, which was limited to
character strings.

For example, the regular expression '[0-9]+'
will find a sequence of one or more digits;
this could be extended to allow an expression
such as '[' 12.7 '-' 92.4 ']+' to find a
sequence of one or more numbers between 12.7
and 92.4, inclusive.  Or consider something
like '(Ralph|Frank|Harry)', which matches any
of the specified names.  It might be desirable
to be able to specify a pattern such as
'(' 124 '|IBM|' 27 ')' to match 124, or "IBM",
or 27.  While simple examples like these are
not hard to implement in current APL versions,
regular expressions make it easy to perform
much more complicated searches.  Following
Perl's lead and implementing search-and-replace
as well as search would provide a powerful
means of updating arrays.
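As Brian says, these two simple examples are easy to write directly. A Python sketch, assuming a list of mixed items stands in for an APL vector; `in_range`, `runs_in_range` and `any_of` are hypothetical names. The first two return (start, length) pairs for maximal runs of numbers in [lo, hi], roughly what the proposed '[' 12.7 '-' 92.4 ']+' would find; `any_of` marks items equal to any alternative, as in '(' 124 '|IBM|' 27 ')'.

```python
from typing import Any, Sequence


def in_range(x: Any, lo: float, hi: float) -> bool:
    """True only for real numbers in [lo, hi]; strings, lists and booleans never qualify."""
    return isinstance(x, (int, float)) and not isinstance(x, bool) and lo <= x <= hi


def runs_in_range(data: Sequence, lo: float, hi: float) -> list[tuple[int, int]]:
    """Maximal runs of in-range items, as (start, length) pairs."""
    runs, start = [], None
    for i, x in enumerate(data):
        if in_range(x, lo, hi):
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None           # the current run has ended
    if start is not None:
        runs.append((start, len(data) - start))
    return runs


def any_of(data: Sequence, alternatives: Sequence) -> list[bool]:
    """Mark each item that equals any of the alternatives."""
    return [any(x == a for a in alternatives) for x in data]


print(runs_in_range([5, 20, 30.5, "IBM", 92.4, 100, 12.7], 12.7, 92.4))
# [(1, 2), (4, 1), (6, 1)]
print(any_of([124, "IBM", 27, "IBM2", 12], [124, "IBM", 27]))
# [True, True, True, False, False]
```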

One application that immediately comes to mind
is implementing a small relational database as
a set of APL arrays.  Regular expressions
applicable to mixed data would make it easy to
implement sophisticated queries and updates.

My point regarding the use of operators in
place of functions such as find (or index) was
that these functions currently perform only a
direct comparison (equality).  There are times
when it would be useful to perform a similar
search based on more complex criteria, e.g. a
regular expression search.  I was just thinking
that if these functions were implemented as
operators the search function could be specified
as an argument, allowing searches of arbitrary
complexity.  If the argument function was an
intrinsic APL function, such as "=" or a future
regular expression function, these searches could
be performed far more efficiently than if you had
to perform several separate searches, scanning an
array multiple times and generating several
temporary boolean arrays, and then combine the
search results with "and" or "or".
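The efficiency argument can be illustrated with a Python sketch (the names `separate_passes`, `single_pass`, `is_ibm` and `big_number` are hypothetical). Both functions compute the same mask; the first scans the data once per criterion and builds a temporary boolean list for each before OR-ing them, while the second tests all criteria in a single scan. In interpreted Python the point is the number of passes and temporaries, not a timing claim about any APL implementation.

```python
from typing import Any, Callable, Sequence

Pred = Callable[[Any], bool]


def separate_passes(data: Sequence, preds: Sequence[Pred]) -> list[bool]:
    """One full scan and one temporary boolean list per predicate, OR-ed together."""
    result = [False] * len(data)
    for p in preds:
        mask = [p(x) for x in data]                       # full scan, temporary array
        result = [r or m for r, m in zip(result, mask)]   # combine with "or"
    return result


def single_pass(data: Sequence, preds: Sequence[Pred]) -> list[bool]:
    """One scan; for each item, stop at the first predicate that succeeds."""
    return [any(p(x) for p in preds) for x in data]


def is_ibm(x: Any) -> bool:
    return x == "IBM"


def big_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and x > 100


data = ["IBM", 51, 1981, "May"]
print(separate_passes(data, [is_ibm, big_number]))  # [True, False, True, False]
print(single_pass(data, [is_ibm, big_number]))      # [True, False, True, False]
```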

--- Brian



Sat, 17 Jan 2004 23:10:32 GMT  

Quote:

> Perhaps the example was not very good.

I don't think the content was the problem so much as interpreting your
notation.

Quote:
> ...just as the current find (epsilon underbar) is more useful
> than the earlier #SS function,...

And I believe that #SS was limited to a single vendor, while {find} has now
been implemented by all of them.

Quote:
> For example, the regular expression '[0-9]+' will find a
> sequence of one or more digits; this could be extended
> to allow an expression such as '[' 12.7 '-' 92.4 ']+' to
> find a sequence of one or more numbers between 12.7
> and 92.4, inclusive.  Or consider something like
> '(Ralph|Frank|Harry)', which matches any of the
> specified names.  It might be desirable to be able to
> specify a pattern such as '(' 124 '|IBM|' 27 ')' to match
> 124, or "IBM", or 27.

I think the idea has merit.  I'm less enthusiastic about the syntax.  Though I
see how it is directly derived by extending the established character-string
regexp syntax to include heterogeneous arrays, I think even your simple
examples are very difficult to "see" (i.e., for a human to interpret on sight),
and a complete rethinking of the syntax/notation would be more productive.
Specifically, the plethora of quotes makes it difficult to see which elements
are enclosed in quotes and which are not, and the merging of the control syntax
with the character arguments multiplies that difficulty.  Even
    '(' 124 '|' 'IBM' '|' 27 ')'
would be easier to read, in my opinion, than your above version.

While I'm not prepared to propose a particular syntax without considerable
further analysis, I think it would be much more intuitive (and more APL-ish,
whatever that means) if the control syntax could be completely separated from
the arguments, and hopefully not be in the form of quoted strings.  E.g., the
following might be one possibility, though I don't consider it to be a formal
proposal:

With a new syntactic class, distinguished by a notation like K's symbols (or a
monadic version of J's "tie" or "gerund"), one could specify operations such as
`| for "or" (actually, I'd prefer to use the APL {or}, instead of "|", but the
traditional regexp symbol is easier to put in an email).  Then your second
example above could be
    (124 `| 'IBM' `| 27)
(the parens assume this entire expression simply defines one argument of a
larger expression), or even
    (`|/ 124 'IBM' 27)
i.e., using or-reduction on (that segment of) the comparison.  Your first
example is more difficult, because APL lacks a notation for range-generation,
so for the moment I'll invent it as "#".  Then I could construct something like
    [12.7 `# 92.4] `/
(where `/ indicates potential replication), or
    (12.7 `# 92.4) `/
if there's no specific need to require a different kind of bracket in this
particular syntax.

One advantage of this sort of notation is that it would (I think) eliminate the
necessity for escape-character conventions to distinguish between, e.g., "|" as
an element of a string to be searched for and "|" (`| in this syntax) as
a constraint in the search specification.
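To make the idea of separating the control syntax from the data concrete, here is a Python sketch in which pattern operators are objects rather than characters inside quoted strings. The classes `Lit`, `Range`, `Or`, `Seq` and `Plus` are hypothetical stand-ins for the `|, `# and `/ notation above (not a proposal for any APL); `ends` returns every position where a match starting at index i could end. Because "or" exists only as an `Or` object, a literal '|' item in the data needs no escape.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Lit:
    """Matches one item equal to value (a string, a number, ...)."""
    value: Any


@dataclass(frozen=True)
class Range:
    """Matches one real number x with lo <= x <= hi (like `#)."""
    lo: float
    hi: float


@dataclass(frozen=True)
class Or:
    """Matches if any alternative matches (like `|)."""
    alts: tuple


@dataclass(frozen=True)
class Seq:
    """Matches its parts one after another."""
    parts: tuple


@dataclass(frozen=True)
class Plus:
    """Matches one or more repetitions of inner (like `/)."""
    inner: Any


def ends(p: Any, data: Sequence, i: int) -> set[int]:
    """Return every index at which a match of pattern p starting at index i can end."""
    if isinstance(p, Lit):
        return {i + 1} if i < len(data) and data[i] == p.value else set()
    if isinstance(p, Range):
        x = data[i] if i < len(data) else None
        ok = isinstance(x, (int, float)) and not isinstance(x, bool) and p.lo <= x <= p.hi
        return {i + 1} if ok else set()
    if isinstance(p, Or):
        return set().union(*(ends(a, data, i) for a in p.alts))
    if isinstance(p, Seq):
        positions = {i}
        for part in p.parts:
            positions = set().union(*(ends(part, data, j) for j in positions))
        return positions
    if isinstance(p, Plus):
        result, frontier = set(), {i}
        while frontier:
            reached = set().union(*(ends(p.inner, data, j) for j in frontier))
            frontier = reached - result  # only expand positions not seen before
            result |= reached
        return result
    raise TypeError(f"not a pattern: {p!r}")


data = [124, "|", "IBM", 27]
alt = Or((Lit(124), Lit("IBM"), Lit(27)))   # like (124 `| 'IBM' `| 27)
print(ends(alt, data, 0))        # {1}
print(ends(alt, data, 1))        # set(): the '|' item is plain data
print(ends(Lit("|"), data, 1))   # {2}
print(ends(Plus(Range(12.7, 92.4)), [20, 30.5, "IBM"], 0))  # {1, 2}
```

Returning a set of end positions, rather than a single one, keeps repetition and sequencing correct without committing to greedy or lazy matching.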

This is just off the top of my head.  I haven't pursued all of its
ramifications, and I'm actually not fluent enough in standard regexp syntax.
But I hope it gives you some ideas, and I hope that you might come up with a
syntax better than either it or what you have used above.  What do you think?

Quote:
> While simple examples like these are
> not hard to implement in current APL versions,
> regular expressions make it easy to perform
> much more complicated searches.  Following
> Perl's lead and implementing search-and-replace
> as well as search would provide a powerful
> means of updating arrays.

An attractive idea, but I think someone else already mentioned the main problem
with this in APL:  Replacing a string of one length with one of a different
length within a matrix or higher-rank array requires some rule to specify which
elements in one row should line up with (e.g., be in the same column as) which
elements in other rows, if the resultant row lengths differ, or even if the
replacements begin in different columns.  Also, what to do with differing row
lengths in any case?  Fill to the length of the longest?  LENGTH ERROR?  Some
additional syntax for specifying more complex rules?

One might also ask if it shouldn't be possible to search (and possibly replace)
for structures other than strings in a row.  Searching for strings oriented on
another axis -- e.g., down columns -- might require an axis specification, but
can otherwise be simply derived from the row-oriented case by a pair of
transposes.  But what about searching for smaller submatrices in a larger
matrix?  That would seem a natural thing to ask for in APL.  How can this be
included in a regexp syntax?  Actually, I think the syntax I suggest above
could do it (if it's not inconsistent for some other reason), by simply
replacing the individual search elements/strings with names of variables that
contain more complex arrays (e.g., matrices).  However, certain operations
(e.g., repetition) may either be invalid or require additional specification
(e.g., repetition along which axis?).  Presumably the unequal-length problem
could be solved as in the search-for-vector case (*if* it can be solved
*there*) by separate application on each axis, but it might be necessary to
specify an order for the application.
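Searching for a smaller submatrix in a larger one is easy to state when only exact matches are wanted. A Python sketch, assuming lists of lists stand in for APL matrices; `find_submatrix` is a hypothetical name. In the spirit of APL's find applied to matrices, it returns a boolean array shaped like the larger matrix, true at each top-left corner where the pattern occurs. Repetition and the axis-ordering questions raised above are not addressed.

```python
from typing import Any, Sequence


def find_submatrix(pattern: Sequence[Sequence[Any]],
                   matrix: Sequence[Sequence[Any]]) -> list[list[bool]]:
    """Boolean array shaped like matrix, true at each top-left corner where pattern occurs."""
    if not pattern or not pattern[0]:
        raise ValueError("pattern must be a non-empty matrix")
    pr, pc = len(pattern), len(pattern[0])
    mr = len(matrix)
    mc = len(matrix[0]) if matrix else 0
    result = [[False] * mc for _ in range(mr)]
    # only corners that leave room for the whole pattern can match
    for r in range(mr - pr + 1):
        for c in range(mc - pc + 1):
            result[r][c] = all(
                matrix[r + i][c + j] == pattern[i][j]
                for i in range(pr)
                for j in range(pc)
            )
    return result


m = [[1, 2, 3],
     [4, 1, 2],
     [7, 4, 1]]
print(find_submatrix([[1, 2], [4, 1]], m))
# [[True, False, False], [False, True, False], [False, False, False]]
```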

Quote:
> One application that immediately comes to mind
> is implementing a small relational database as
> a set of APL arrays.  Regular expressions
> applicable to mixed data would make it easy to
> implement sophisticated queries and updates.

I don't quite see why this can't already be done with current systems and
nested arrays.  Can you give us an example where regexp's would be a
significant improvement or -- even better -- a necessity?

Quote:
> My point regarding the use of operators in
> place of functions such as find (or index) was
> that these functions currently perform only a
> direct comparison (equality).  There are times
> when it would be useful to perform a similar
> search based on more complex criteria, e.g. a
> regular expression search.  I was just thinking
> that if these functions were implemented as
> operators the search function could be specified
> as an argument, allowing searches of arbitrary
> complexity.  If the argument function was an
> intrinsic APL function, such as "=" or a future
> regular expression function, these searches could
> be performed far more efficiently than if you had
> to perform several separate searches, scanning an
> array multiple times and generating several
> temporary boolean arrays, and then combine the
> search results with "and" or "or".

I think I see your point, as in replacing "=" with ">" in such comparisons,
which I think is easier to visualize than your suggestion of regular
expressions, though I hope the same concept.  Yes?  But I'm not sure that
turning {find} into an operator is necessarily the right way to go.  Neither am
I sure that it's not.  I'll need more time to think about it.  In the meantime,
if you can give us a few specific examples of the syntax you have in mind, it
might help.

This is interesting.  I'm looking forward to your response.

            /Jim



Sun, 18 Jan 2004 04:11:51 GMT  
I was about to write a long piece disagreeing with both the original post and
much of Lucas' response. However, given how unlikely it is that any actual time will be spent
on implementing any of this, I decided it wasn't worth it.

So I'll just make a couple of brief observations instead.

Quote:


> > Perhaps the example was not very good.

> I don't think the content was the problem so much as interpreting your
> notation.

My problem was not with either the example or the notation. I don't think
that the underlying idea is very good.

[snip of a long discussion]

I find the idea `muddled'. The usefulness of the notion of `regular expressions'
is rather more intimately linked to the `free form'-ness of text than has been
considered in this discussion, and it is not a good surrogate for a form of
`object' pattern matching which, while quite meaningful in the context of several
other languages, is not of much importance (IMO) to domains where APL and its
related languages shine.

The idea is already implemented in J, and I would commend anyone thinking about it
here to see how it has been handled there where it has (sensibly) been introduced
in the context of character strings. As to its applicability to other data, J (and K)
are mute, properly mute, AFAICS.

As to `object matching', I would suggest that REBOL, PROLOG or even LISP would be
a more fruitful approach than to attempt to accomplish something through some
twisting of the RE concept into APL. As for RE's on strings, perl is, of course,
the definitive language. It is perhaps revealing that even in perl, where REs are
a most important linguistic construct, they are pretty much _only_ applied to
character strings, not to perl objects of other kinds. This is wisdom on the part
of perl's implementors, not ignorance.

In short, the idea strikes me as a non-starter for APL unless it is applied to
character strings. Applied to character strings, it has some limited usefulness---as
is demonstrated in J---where it has been available for a few years, and is
occasionally used.



Sun, 18 Jan 2004 12:31:13 GMT  

Quote:

> For example, the regular expression '[0-9]+'
> will find a sequence of one or more digits;
> this could be extended to allow an expression
> such as '[' 12.7 '-' 92.4 ']+' to find a
> sequence of one or more numbers between 12.7
> and 92.4, inclusive.  Or consider something

I think the participants in this discussion are losing sight of what a regular expression is and why
various pattern matching facilities use the concept.

In the example above, the notation [0-9] is shorthand or "syntactic sugar" for the regular
expression

  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

assuming, of course, that in the alphabet for the regular expressions at issue 0123456789 form a
contiguous subsequence.
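The "syntactic sugar" point can be checked mechanically. A small Python sketch using the standard `re` module (Python's engine, not any APL's): the class [0-9] and the explicit alternation 0|1|2|...|9 accept exactly the same single characters over the ASCII range. Python, like most engines, interprets a range by code point, which is the "contiguous subsequence" assumption made above.

```python
import re

digit_class = re.compile(r"[0-9]")
# Build the explicit alternation "0|1|2|3|4|5|6|7|8|9"
digit_alternation = re.compile("|".join("0123456789"))


def same_language_on(chars) -> bool:
    """True if both patterns accept exactly the same one-character strings."""
    return all(
        bool(digit_class.fullmatch(ch)) == bool(digit_alternation.fullmatch(ch))
        for ch in chars
    )


print(digit_alternation.pattern)                # 0|1|2|3|4|5|6|7|8|9
print(same_language_on(map(chr, range(128))))   # True
```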

[ 12.7 - 92.4 ] admits of no such simple interpretation.

Trying to extend the alphabet from a reasonably sized set of characters to an extremely large
(albeit still finite, unless you mean to include real numbers as well as machine-representable
ones) mixed bag of characters, numbers as used in APL (machine-representable numbers for the
underlying machine), arrays, and who knows what else, is ill conceived.

Regular expressions are used as the basis of many string pattern matchers for the following reasons:

 1. Regular expressions are sufficient to express many useful patterns users are likely to want to
match

 2. The underlying theory enables an extremely efficient implementation which has near linear
performance in many practical cases.  This is because a) there is an algorithm to convert any
regular expression to a non-deterministic finite automaton that recognizes precisely the strings in
the regular set corresponding to the given regular expression, b) there is another algorithm that
converts any non-deterministic finite automaton to a deterministic finite automaton that recognizes
precisely the same set of strings, and c) a deterministic finite automaton processes its input
string in linear time.  Note that for this to be useful it must be possible to generate the DFA in a
reasonable amount of time (and space) compared to the time that it is in use.  Otherwise it may be
better to use a different approach.
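Point (c) can be illustrated with a hand-built deterministic automaton for [0-9]+. This is a Python sketch; the state names and the two-class alphabet are illustrative choices, not the output of any particular regex compiler. Each input character costs exactly one table lookup, so the running time is linear in the length of the input.

```python
def char_class(ch: str) -> str:
    """Reduce the input alphabet to two classes: ASCII digits and everything else."""
    return "digit" if "0" <= ch <= "9" else "other"


# Transition table of a DFA for [0-9]+; any (state, class) pair
# missing from the table leads to a dead state.
TRANSITIONS = {
    ("start", "digit"): "digits",
    ("digits", "digit"): "digits",
}
ACCEPTING = {"digits"}


def dfa_fullmatch(text: str) -> bool:
    """Accept exactly the strings in the regular set denoted by [0-9]+.
    One table lookup per character, so the running time is linear in len(text)."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False  # dead state: no continuation can be accepted
    return state in ACCEPTING


for s in ["0", "1981", "", "3.5", "a12"]:
    print(repr(s), dfa_fullmatch(s))
```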

All of this goes out the window with the so called "extensions" being discussed here.



Sun, 18 Jan 2004 14:17:46 GMT  

Quote:

> I think the participants in this discussion are losing sight of what a regular expression is and why
> various pattern matching facilities use the concept.

...

> Regular expressions are used as the basis of many string pattern matchers for the following reasons:

>  1. Regular expressions are sufficient to express many useful patterns users are likely to want to
> match

>  2. The underlying theory enables an extremely efficient implementation which has near linear
> performance in many practical cases.  

...

This is well said. I would only add, for completeness, some characterization of the sensible
`domain' of REs---much more closely tied to character strings than has been clear in the
earlier discussion.



Sun, 18 Jan 2004 21:25:50 GMT  

Quote:
> This is well said. I would only add, for completeness, some
> characterization of the sensible `domain' of REs---much more
> closely tied to character strings than has been clear in the
> earlier discussion.

While I won't argue with this statement, which, in my opinion, is quite
sensible, there's more to regular expressions than meets the eye.

First, a reference to something a bit unusual:
http://citeseer.nj.nec.com/matz97regular.html

Second, a couple of considerations from a purely theoretical point of
view: regular expressions are just a way to represent the patterns accepted
by a language (in the abstract meaning of language: sequences of
symbols whose sentences are constrained by rules). For instance:
http://grid.let.rug.nl/~hkuipers/scriptie/node37.html

In particular, finite automata and regular expressions have the same
expressive power and, in fact, as stated already, regular expression
matchers are usually implemented by translating the regexp into a finite
automaton (either table-driven, or hard-coded in a computer language of
choice). Therefore it is correct that the domain of application of
regular expressions is a string, but a string can be interpreted as an
ordered set of symbols. A heterogeneous (but also a nested) array is,
in fact, an ordered set of complex symbols. It's true that some of the
shortcuts currently are harder to interpret in a domain like the real
numbers ("[a-z]" is somewhat easier to expand than "[1.1-5.9]") but I
won't comment on this since I am not an expert in grammars on an
infinite set of symbols. Nevertheless, there is nothing inherently
impossible in building a finite automaton that would be able to
match an interval over the real numbers, and its efficiency wouldn't be
considerably lower (considering the inherent complexity of the domain)
than that of a character matcher.
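One way to read this argument, sketched in Python: if a pattern only tests items with finitely many predicates (here a single interval test), every item, whatever its type, can first be mapped to one of finitely many classes, and an ordinary DFA then runs over the class names. This is roughly the idea behind what are sometimes called symbolic automata; the names `classify_interval` and `all_in_range` are hypothetical. The work moves into the classifier, and the number of classes grows with the number of distinct predicates a pattern uses.

```python
from typing import Any, Sequence


def classify_interval(x: Any, lo: float, hi: float) -> str:
    """Map any item (number, string, nested list, ...) to one of two classes.
    The automaton below only ever sees these class names, so its alphabet is finite."""
    is_real = isinstance(x, (int, float)) and not isinstance(x, bool)
    return "in" if is_real and lo <= x <= hi else "out"


# DFA for "one or more items in [lo, hi]" over the alphabet {"in", "out"};
# any (state, class) pair not listed leads to a dead state.
TRANSITIONS = {("start", "in"): "run", ("run", "in"): "run"}
ACCEPTING = {"run"}


def all_in_range(items: Sequence, lo: float, hi: float) -> bool:
    """Accept a non-empty sequence whose items are all real numbers in [lo, hi]."""
    state = "start"
    for x in items:
        state = TRANSITIONS.get((state, classify_interval(x, lo, hi)))
        if state is None:
            return False
    return state in ACCEPTING


print(all_in_range([12.7, 50, 92.4], 12.7, 92.4))   # True
print(all_in_range([12.7, "IBM"], 12.7, 92.4))      # False
```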

--

Homepage: http://come.to/wildheart/

<<<Omae ga michi ni mayottara hohoemide yamiwo terasou ---
   When you get lost, I shall enlighten you with my smiles>>>



Sun, 18 Jan 2004 23:25:53 GMT  
Quote:

> > This is well said. I would only add, for completeness, some
> > characterization of the sensible `domain' of REs---much more
> > closely tied to character strings than has been clear in the
> > earlier discussion.

> While I won't argue with this statement which, in my opinion, is quite
> sensible, there's more to regular expressions than meets the eye.

[snip]

Thank you. I'll return `the favor' by not arguing much with your characterization
either, other than to make a couple of further observations...

In my reading of the literature, Matz's work didn't `go far', thus suggesting that while
RE's _could_ be applied in this context, there wasn't a whole lot of gain to be had
in doing so. On the other hand this domain of study is well out of my `normal scan range'
so I may not know recent developments at all well. It all reminds me vaguely of some work
we did in the early 70s (largely by my then student Zisman who later ran Lotus) using RE
and Petri Nets to characterize communications flows.

Second, central to my view (though perhaps only implicitly) is not that REs are necessarily
_impossible_ to apply in APL domains, but rather that there is likely little gain to be
had in doing so. From a broad brush overview, I find APLs, Js and Ks tend to be particularly
productive for me in circumstances where there is some natural structuring to the data---stock
ticks, organized data structures, ... and REs are pretty useless in most of these domains,
other than as applied to original (external world) strings. I like REs a lot, and make
heavy use of them, but it tends to be when I am doing string patterns in perl or searches
in Emacs, where the data that I am dealing with is often quite loosely structured.

So I guess in summary I am making principally an `engineering point'. Theoretically, just about
anything can be applied to just about anything. Some vague notion of REs can probably be stretched
to apply to some form of APL structure. I simply regard the likelihood of doing so productively to
be _very small_, and even if it were done it would still likely (IMO) be distinctly inferior to
other solution domains where they are a much more natural construct.



Mon, 19 Jan 2004 02:55:25 GMT  

Quote:

> .... Therefore it is correct that the domain of application of
> regular expressions is a string, but a string can be interpreted as an
> ordered set of symbols. A heterogeneous (but also a nested) array is,
> in fact, an ordered set of complex symbols. It's true that some of the
> shortcuts currently are harder to interpret in a domain like the real
> numbers ("[a-z]" is somewhat easier to expand than "[1.1-5.9]") but I
> won't comment on this since I am not an expert in grammars on an
> infinite set of symbols. Nevertheless, there is nothing implicitly
> impossible in the building of a finite automaton which would be able to
> match an interval over the real numbers and its efficiency wouldn't be
> considerably less (considering the implicit complexity of the domain)
> than that of a character matcher.

An essential characteristic of a finite automaton is that its input alphabet is finite and fixed in
advance.  For the purposes of realizing the efficient implementation of recognizers mentioned in my
previous post, finite can be taken to mean modest in size, no more than a few hundred symbols.
Trying to consider heterogeneous strings of arbitrary apl objects makes the input alphabet the set
of all possible apl objects.  This is indeed a finite set but it is huge and doesn't lend itself to
a straightforward ordering.  It would be impractical in the extreme, I venture to say impossible, to
construct a practical finite automaton over this input alphabet; and if it could be done, the
resulting machine would be anything but efficient.


Mon, 19 Jan 2004 05:47:37 GMT  

Quote:


...
> > For example, the regular expression '[0-9]+' will find a
> > sequence of one or more digits; this could be extended
> > to allow an expression such as '[' 12.7 '-' 92.4 ']+' to
> > find a sequence of one or more numbers between 12.7
> > and 92.4, inclusive.  Or consider something like
> > '(Ralph|Frank|Harry)', which matches any of the
> > specified names.  It might be desirable to be able to
> > specify a pattern such as '(' 124 '|IBM|' 27 ')' to match
> > 124, or "IBM", or 27.

> I think the idea has merit.  I'm less enthusiastic about the syntax.  Though I
> see how it is directly derived by extending the established character-string
> regexp syntax to include heterogeneous arrays, I think even your simple
> examples are very difficult to "see" (i.e., for a human to interpret on sight),
> and a complete rethinking of the syntax/notation would be more productive.
> Specifically, the plethora of quotes makes it difficult to see which elements
> are enclosed in quotes and which are not, and the merging of the control syntax
> with the character arguments multiplies that difficulty.  Even
>     '(' 124 '|' 'IBM' '|' 27 ')'
> would be easier to read, in my opinion, than your above version.

> While I'm not prepared to propose a particular syntax without considerable
> further analysis, I think it would be much more intuitive (and more APL-ish,
> whatever that means) if the control syntax could be completely separated from
> the arguments, and hopefully not be in the form of quoted strings.  E.g., the
> following might be one possibility, though I don't consider it to be a formal
> proposal:

> With a new syntactic class, distinguished by a notation like K's symbols (or a
> monadic version of J's "tie" or "gerund"), one could specify operations such as
> `| for "or" (actually, I'd prefer to use the APL {or}, instead of "|", but the
> traditional regexp symbol is easier to put in an email).  Then your second
> example above could be
>     (124 `| 'IBM' `| 27)
> (the parens assume this entire expression simply defines one argument of a
> larger expression), or even
>     (`|/ 124 'IBM' 27)
> i.e., using or-reduction on (that segment of) the comparison.  Your first
> example is more difficult, because APL lacks a notation for range-generation,
> so for the moment I'll invent it as "#".  Then I could construct something like
>     [12.7 `# 92.4] `/
> (where `/ indicates potential replication), or
>     (12.7 `# 92.4) `/
> if there's no specific need to require a different kind of bracket in this
> particular syntax.

> One advantage of this sort of notation is that it would (I think) eliminate the
> necessity for escape-character conventions to distinguish between, e.g., "|" as
> an element of a string to be searched for and "|" (`| in this syntax) as
> a constraint in the search specification.

> This is just off the top of my head.  I haven't pursued all of its
> ramifications, and I'm actually not fluent enough in standard regexp syntax.
> But I hope it gives you some ideas, and I hope that you might come up with a
> syntax better than either it or what you have used above.  What do you think?

I have no objections to changing the notation if it will make the
expression clearer without losing functionality.  Introducing a new
syntactic class just to support one language extension makes me a
little queasy; if this class is useful for much more than simply
expressing regular expressions cleanly then I don't mind.


Quote:

> > While simple examples like these are
> > not hard to implement in current APL versions,
> > regular expressions make it easy to perform
> > much more complicated searches.  Following
> > Perl's lead and implementing search-and-replace
> > as well as search would provide a powerful
> > means of updating arrays.

> An attractive idea, but I think someone else already mentioned the main problem
> with this in APL:  Replacing a string of one length with one of a different
> length within a matrix or higher-rank array requires some rule to specify which
> elements in one row should line up with (e.g., be in the same column as) which
> elements in other rows, if the resultant row lengths differ, or even if the
> replacements begin in different columns.  Also, what to do with differing row
> lengths in any case?  Fill to the length of the longest?  LENGTH ERROR?  Some
> additional syntax for specifying more complex rules?

If the operation can be considered to temporarily break the array
into subarrays, then the operation of rejoining these into a
single array could be done in the same way as tolerant disclose
(e.g. '>' in Sharp APL) by padding as necessary.
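A Python sketch of that rejoin-by-padding rule, assuming the rows of a character matrix are held as strings and that the fill is a blank, as tolerant disclose would supply for characters. `replace_rows` is a hypothetical name; it pads on the right only, so it does not answer the separate question of how replacements starting in different columns should line up.

```python
from typing import Sequence


def replace_rows(rows: Sequence[str], old: str, new: str, fill: str = " ") -> list[str]:
    """Replace old with new in each row independently, then pad every row on the
    right with fill to the length of the longest result, restoring a rectangle."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    replaced = [row.replace(old, new) for row in rows]   # rows may now differ in length
    width = max((len(r) for r in replaced), default=0)
    return [r.ljust(width, fill) for r in replaced]


result = replace_rows(["IBM 51", "MAY 81"], "IBM", "Intl Bus Mach")
for row in result:
    print(repr(row))
```

Here the first row grows to 16 characters, and the second row, untouched by the replacement, is padded with blanks to the same width.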

Quote:

> One might also ask if it shouldn't be possible to search (and possibly replace)
> for structures other than strings in a row.  Searching for strings oriented on
> another axis -- e.g., down columns -- might require an axis specification, but
> can otherwise be simply derived from the row-oriented case by a pair of
> transposes.  But what about searching for smaller submatrices in a large
> matrix.  That would seem a natural thing to ask for in APL.  How can this be
> included in a regexp syntax?  Actually, I think the syntax I suggest above
> could do it (if it's not inconsistent for some other reason), by simply
> replacing the individual search elements/strings with names of variables that
> contain more complex arrays (e.g., matrices).  

The notation should allow not just variable names but arbitrary APL
expressions that generate arrays to be specified in place of constants.
For example,  (`|/ 124 'IBM' (3 9 {rho}{iota}27))
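For illustration only, here is a Python sketch of what searching for a submatrix inside a larger matrix might mean, in the spirit of extending find to higher-rank arrays. The result is a boolean mask shaped like the data, with a 1 at the top-left corner of each match. `find_submatrix` is a hypothetical name, and exact element equality is assumed as the comparison.

```python
from typing import Any, List, Sequence

def find_submatrix(pattern: Sequence[Sequence[Any]],
                   data: Sequence[Sequence[Any]]) -> List[List[bool]]:
    """Boolean mask shaped like data: True at the top-left corner of
    each place where pattern occurs (element-wise equality assumed)."""
    pr = len(pattern)
    pc = len(pattern[0]) if pr else 0
    dr = len(data)
    dc = len(data[0]) if dr else 0
    mask = [[False] * dc for _ in range(dr)]
    for r in range(dr):
        for c in range(dc):
            # Skip corners where the pattern would run off the edge.
            if r + pr > dr or c + pc > dc:
                continue
            mask[r][c] = all(data[r + i][c + j] == pattern[i][j]
                             for i in range(pr) for j in range(pc))
    return mask
```

Replacing `==` with a caller-supplied function, or allowing `pattern` to contain further patterns, is where the notational questions raised above would come in.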

Quote:
> However, certain operations
> (e.g., repetition) may either be invalid or require additional specification
> (e.g., repetition along which axis?).  Presumably the unequal-length problem
> could be solved as in the search-for-vector case (*if* it can be solved
> *there*) by separate application on each axis, but it might be necessary to
> specify an order for the application.

Perhaps this could be solved by appending an ordered axis list,
e.g. *{1,3,2} would mean search along axis 1, then axis 3, then
axis 2.  The braces might be replaced by some other symbol if
we wanted to use them for specifying the number of times to
match a subpattern, as in Perl.

Quote:

> > One application that immediately comes to mind
> > is implementing a small relational database as
> > a set of APL arrays.  Regular expressions
> > applicable to mixed data would make it easy to
> > implement sophisticated queries and updates.

> I don't quite see why this can't already be done with current systems and
> nested arrays.  Can you give us an example where regexp's would be a
> significant improvement or -- even better -- a necessity?

Regexps simplify searches for patterns that are more complex
than a constant string.  For example, suppose we want to find each
location in an array where a run of five consecutive numbers, each
greater than 1000 or less than 5, is followed by a character string
starting with 'A', 'B', or 'C' and then by a number between 10 and 20.
This can be done in current APL systems, but it could be specified
much more easily as a regular expression, reducing the likelihood
of errors.  Moreover, a conventional APL solution would require
scanning the array multiple times and generating several temporary
boolean arrays.  A regular expression would reduce the number of
scans required and eliminate the need for potentially huge
temporary arrays.
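Here is a minimal Python sketch of the example above. It assumes that the element classes in the pattern are disjoint and that "between 10 and 20" is inclusive; neither assumption comes from the post. A single pass maps each element to a one-letter class code. Python's standard `re` module then matches `H{5}SM` over the resulting code string, using a lookahead so that overlapping matches are also reported. This still builds one temporary (the code string) instead of several boolean arrays. The names `classify` and `find_pattern` are hypothetical.

```python
import re
from typing import Any, List

def classify(x: Any) -> str:
    """Map one array element to a single-character class code.

    Assumption: the classes used in the pattern are disjoint, so each
    element gets exactly one code.
    """
    if isinstance(x, str):
        return "S" if x[:1] in ("A", "B", "C") else "s"
    if isinstance(x, (int, float)):
        if x > 1000 or x < 5:
            return "H"  # number greater than 1000 or less than 5
        if 10 <= x <= 20:
            return "M"  # number in 10..20 (inclusive, assumed)
        return "n"      # any other number
    return "?"          # anything else

def find_pattern(data: List[Any]) -> List[int]:
    """Start indices where five 'H' numbers are followed by a string
    starting with A, B or C and then a number in 10..20."""
    codes = "".join(classify(x) for x in data)  # one pass over the data
    # The zero-width lookahead lets overlapping matches all be found.
    return [m.start() for m in re.finditer(r"(?=H{5}SM)", codes)]
```

For example, `find_pattern([2000, 1, 3, 4000, 0, "Apple", 15, 7])` returns `[0]`.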


Quote:
> > My point regarding the use of operators in
> > place of functions such as find (or index) was
> > that these functions currently perform only a
> > direct comparison (equality).  There are times
> > when it would be useful to perform a similar
> > search based on more complex criteria, e.g. a
> > regular expression search.  I was just thinking
> > that if these functions were implemented as
> > operators the search function could be specified
> > as an argument, allowing searches of arbitrary
> > complexity.  If the argument function was an
> > intrinsic APL function, such as "=" or a future
> > regular expression function, these searches could
> > be performed far more efficiently than if you had
> > to perform several separate searches, scanning an
> > array multiple times and generating several
> > temporary boolean arrays, and then combine the
> > search results with "and" or "or".

> I think I see your point, as in replacing "=" with ">" in such comparisons,
> which I think is easier to visualize than your suggestion of regular
> expressions, though I hope the same concept.  Yes?  But I'm not sure that
> turning {find} into an operator is necessarily the right way to go.  Neither am
> I sure that it's not.  I'll need more time to think about it.  In the meantime,
> if you can give us a few specific examples of the syntax you have in mind, it
> might help.

> This is interesting.  I'm looking forward to your response.

>             /Jim

This is just an idea I was kicking around.  It seems to me that
using an operator allows the find operation to be customized
easily and allows the same symbol to be used for all similar kinds
of searches.  I was thinking that one would type something
like  leftarg ={epsilon_underbar} rightarg  for the standard
find operation, or  leftarg MYCOMPARE {epsilon_underbar} rightarg
for a customized search using the user-written dyadic APL function
MYCOMPARE.  The function would return a boolean explicit result
based on its two arguments and could be arbitrarily complex.
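Here is a sketch of the find-as-operator idea in Python. The comparison is passed in as a function argument and stands in for the operand. `find` and its `compare` parameter are illustrative names, not an existing APL primitive. With the default `operator.eq`, it behaves like an ordinary find on vectors and marks each position where the pattern starts.

```python
import operator
from typing import Any, Callable, List, Sequence

def find(pattern: Sequence[Any], data: Sequence[Any],
         compare: Callable[[Any, Any], bool] = operator.eq) -> List[bool]:
    """Mask the same length as data, True where pattern starts.

    compare(p, d) plays the role of the operand (e.g. MYCOMPARE): it is
    applied to each pattern element p and the aligned data element d.
    """
    m, n = len(pattern), len(data)
    mask = [False] * n
    for i in range(n):
        # Only positions where the whole pattern fits can match.
        if i + m <= n:
            mask[i] = all(compare(p, d) for p, d in zip(pattern, data[i:i + m]))
    return mask
```

For example, `find([2, 2], [3, 1, 4, 5, 9, 2], lambda p, d: d > p)` marks each start of two consecutive values greater than 2. Because the comparison is applied inside a single scan, no intermediate boolean arrays are combined afterwards.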

--- Brian



Mon, 19 Jan 2004 22:51:28 GMT  
 