apl and regular expressions
Juhani V?h?niit #1 / 28
I have found that apl+win has promising internet features. In addition to that I should like to see good regular expression handling functions. Does anybody know about this topic?
Fri, 16 Jan 2004 17:34:11 GMT

Stefano Lanzavecchi #2 / 28
Quote:
> I have found that apl+win has promising internet features. In addition to that I should like to see good regular expression handling functions. Does anybody know about this topic?
The DLL used by J to handle regular expressions is a slightly modified version of a public domain library (documentation and source code are available on the web, along with precompiled versions). I know that it's a relatively simple (although boring) exercise to provide a set of cover functions in Dyalog APL to build the interface. I am not an expert (read: I don't know anything) about the foreign interface in APL+WIN, but I believe there should be no problem there either. In fact, I wanted to write an article for Vector about this but never found the time. Since I hardly have any use for general regexps in my daily work, I could never justify the time I would need.
Fri, 16 Jan 2004 18:30:02 GMT

Brian McGuinne #3 / 28
I have occasionally wondered about the possibility of regular expressions being added as a standard feature of APL. They could be extended to make them better fit into the language. It should be simple to extend the regular expression syntax to allow matches of mixed vectors with both characters and numbers, increasing the usefulness of pattern matching on APL arrays. For example, '(IBM' 51 3.5 'May' 1981 ')+' to match one or more occurrences of the information enclosed in parentheses. This could be implemented as a system function or via an unused APL character.

Another possibility I have been wondering about is whether "find" would be better implemented as an operator rather than a function. Then it could take as its argument a function to determine equality, allowing many types of searches, including regular expression searches. For simpler searches one could simply use the "=" function as its argument.

--- Brian
Fri, 16 Jan 2004 21:09:50 GMT

Fred Hone #4 / 28
Quote:
> It should be simple to extend the regular expression syntax to allow matches of mixed vectors with both characters and numbers, increasing the usefulness of pattern matching on APL arrays. For example, '(IBM' 51 3.5 'May' 1981 ')+' to match one or more occurrences of the information enclosed in parentheses. This could be implemented as a system function or via an unused APL character.
Am I missing the point? My APL+WIN already does this with member. Fred
Fri, 16 Jan 2004 22:35:05 GMT

David Nes #5 / 28
Quote:
...
> It should be simple to extend the regular expression syntax to allow matches of mixed vectors with both characters and numbers, increasing the usefulness of pattern matching on APL arrays. For example, '(IBM' 51 3.5 'May' 1981 ')+' to match one or more occurrences of the information enclosed in parentheses. This could be implemented as a system function or via an unused APL character.
> Another possibility I have been wondering about is whether "find" would be better implemented as an operator rather than a function. Then it could take as its argument a function to determine equality, allowing many types of searches, including regular expression searches. For simpler searches one could simply use the "=" function as its argument.
> --- Brian
While RE matching in the context of characters in (ordered) strings clearly makes sense, and has been a part of the computational milieu ever since McNaughton and Yamada published their algorithm, I fail to get a feel for its application to `objects' of the kind you appear to be describing. While J has rather complete RE mechanisms, and K has some rudimentary facilities, they generally apply to strings of characters, not to objects, where difficulties (for example) of returning a list of variable length matches would often be a pain (except in K, where lists of lists give more flexibility than matrices as they can be `ragged').

I'm afraid I don't understand what you mean by `This could be implemented ... via an unused APL character'. You must be picturing something that I don't see at all.

As to the `find' idea, isn't that easy enough to implement anyplace that you have access to an `execute' operation? I understand that it might well be the case that implementation as an operator could speed things up, but the same could be said for just about any function---and thus this doesn't seem, to me at least, to make much of an argument that it is worth making a `special case' out of this particular function.
Fri, 16 Jan 2004 23:38:09 GMT

Jim Luca #6 / 28
Quote:
> > It should be simple to extend the regular expression syntax to allow matches of mixed vectors with both characters and numbers, increasing the usefulness of pattern matching on APL arrays. For example, '(IBM' 51 3.5 'May' 1981 ')+' to match one or more occurrences of the information enclosed in parentheses. This could be implemented as a system function or via an unused APL character.
> Am I missing the point? My APL+WIN already does this with member.
That was my initial reaction, too. But then I concluded that Brian didn't mean a simple match on the entire example vector (as in {and}{dot}{match}), but meant that you would get a match on 'IBM' *or* 51 *or* (1981 3.5 'May'), etc., with the parens enclosing allowable alternatives and the + indicating the repetition. Well, even with that, the quotes don't seem to be in the right places. How about it, Brian? Can you clarify, please? I find your ideas intriguing, but I would like to see a more detailed proposal. I'm not sure that what I'm visualizing -- e.g., with the operator idea -- is really what you intend. /Jim Lucas
Sat, 17 Jan 2004 03:54:12 GMT

Brian McGuinne #7 / 28
Perhaps the example was not very good. I am just suggesting that if regular expression pattern matching is implemented in APL there is no reason why it should be limited to character strings; there are many times when it is necessary to search an array of numbers or a mixed array, and it would be useful to be able to apply the power of regular expressions in those cases as well, just as the current find (epsilon underbar) is more useful than the earlier #SS function, which was limited to character strings.

For example, the regular expression '[0-9]+' will find a sequence of one or more digits; this could be extended to allow an expression such as '[' 12.7 '-' 92.4 ']+' to find a sequence of one or more numbers between 12.7 and 92.4, inclusive. Or consider something like '(Ralph|Frank|Harry)', which matches any of the specified names. It might be desirable to be able to specify a pattern such as '(' 124 '|IBM|' 27 ')' to match 124, or "IBM", or 27. While simple examples like these are not hard to implement in current APL versions, regular expressions make it easy to perform much more complicated searches. Following Perl's lead and implementing search-and-replace as well as search would provide a powerful means of updating arrays.

One application that immediately comes to mind is implementing a small relational database as a set of APL arrays. Regular expressions applicable to mixed data would make it easy to implement sophisticated queries and updates.

My point regarding the use of operators in place of functions such as find (or index) was that these functions currently perform only a direct comparison (equality). There are times when it would be useful to perform a similar search based on more complex criteria, e.g. a regular expression search. I was just thinking that if these functions were implemented as operators the search function could be specified as an argument, allowing searches of arbitrary complexity. If the argument function was an intrinsic APL function, such as "=" or a future regular expression function, these searches could be performed far more efficiently than if you had to perform several separate searches, scanning an array multiple times and generating several temporary boolean arrays, and then combine the search results with "and" or "or".

--- Brian
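One way to picture the mixed-array patterns described in this post, sketched in Python rather than APL: classify every element into a one-character class label, then run an ordinary character regular expression over the labels. The helper names (`is_number`, `encode`, `find_runs`), the class labels, and the sample data are all made up for illustration; this is not an existing APL or APL+WIN facility.

```python
import re


def is_number(x):
    """True for ints and floats, but not for booleans."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def encode(data, classes, other="_"):
    """Replace each element by the label of the first class that accepts it."""
    labels = []
    for item in data:
        for label, predicate in classes:
            if predicate(item):
                labels.append(label)
                break
        else:
            labels.append(other)  # element belongs to no class
    return "".join(labels)


def find_runs(data, classes, pattern):
    """Return (start, length) for every regex match over the encoded labels."""
    encoded = encode(data, classes)
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, encoded)]


data = ["IBM", 51, 3.5, "May", 1981, 20.0, 92.4, 12.7, "x", 124, 27]

# Analogue of '[' 12.7 '-' 92.4 ']+': one or more numbers in the closed interval.
in_range = [("r", lambda x: is_number(x) and 12.7 <= x <= 92.4)]
print(find_runs(data, in_range, r"r+"))      # [(1, 1), (5, 3), (10, 1)]

# Analogue of '(' 124 '|IBM|' 27 ')': any one of three alternatives.
alternatives = [("a", lambda x: x in (124, "IBM", 27))]
print(find_runs(data, alternatives, r"a"))   # [(0, 1), (9, 1), (10, 1)]
```

The array is classified once and the regular expression then makes a single pass over the labels, which is the kind of saving over repeated boolean scans that the post argues for. Whether a built-in APL primitive would behave this way is an open question in this thread.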
Sat, 17 Jan 2004 23:10:32 GMT

Jim Luca #8 / 28
Quote:
> Perhaps the example was not very good.
I don't think the content was the problem so much as interpreting your notation.
Quote:
> ...just as the current find (epsilon underbar) is more useful than the earlier #SS function,...
And I believe that #SS was limited to a single vendor, while {find} has now been implemented by all of them.
Quote:
> For example, the regular expression '[0-9]+' will find a sequence of one or more digits; this could be extended to allow an expression such as '[' 12.7 '-' 92.4 ']+' to find a sequence of one or more numbers between 12.7 and 92.4, inclusive. Or consider something like '(Ralph|Frank|Harry)', which matches any of the specified names. It might be desirable to be able to specify a pattern such as '(' 124 '|IBM|' 27 ')' to match 124, or "IBM", or 27.
I think the idea has merit. I'm less enthusiastic about the syntax. Though I see how it is directly derived by extending the established character-string regexp syntax to include heterogeneous arrays, I think even your simple examples are very difficult to "see" (i.e., for a human to interpret on sight), and a complete rethinking of the syntax/notation would be more productive. Specifically, the plethora of quotes makes it difficult to see which elements are enclosed in quotes and which are not, and the merging of the control syntax with the character arguments multiplies that difficulty. Even
    '(' 124 '|' 'IBM' '|' 27 ')'
would be easier to read, in my opinion, than your above version.

While I'm not prepared to propose a particular syntax without considerable further analysis, I think it would be much more intuitive (and more APL-ish, whatever that means) if the control syntax could be completely separated from the arguments, and hopefully not be in the form of quoted strings. E.g., the following might be one possibility, though I don't consider it to be a formal proposal:

With a new syntactic class, distinguished by a notation like K's symbols (or a monadic version of J's "tie" or "gerund"), one could specify operations such as `| for "or" (actually, I'd prefer to use the APL {or}, instead of "|", but the traditional regexp symbol is easier to put in an email). Then your second example above could be
    (124 `| 'IBM' `| 27)
(the parens assume this entire expression simply defines one argument of a larger expression), or even
    (`|/ 124 'IBM' 27)
i.e., using or-reduction on (that segment of) the comparison. Your first example is more difficult, because APL lacks a notation for range-generation, so for the moment I'll invent it as "#". Then I could construct something like
    [12.7 `# 92.4] `/
(where `/ indicates potential replication), or
    (12.7 `# 92.4) `/
if there's no specific need to require a different kind of bracket in this particular syntax.

One advantage of this sort of notation is that it would (I think) eliminate the necessity for escape-character conventions to distinguish between, e.g., "|" as an element of a string to be searched for and "|" (`| in this syntax) as a constraint in the search specification.

This is just off the top of my head. I haven't pursued all of its ramifications, and I'm actually not fluent enough in standard regexp syntax. But I hope it gives you some ideas, and I hope that you might come up with a syntax better than either it or what you have used above. What do you think?
Quote:
> While simple examples like these are not hard to implement in current APL versions, regular expressions make it easy to perform much more complicated searches. Following Perl's lead and implementing search-and-replace as well as search would provide a powerful means of updating arrays.
An attractive idea, but I think someone else already mentioned the main problem with this in APL: Replacing a string of one length with one of a different length within a matrix or higher-rank array requires some rule to specify which elements in one row should line up with (e.g., be in the same column as) which elements in other rows, if the resultant row lengths differ, or even if the replacements begin in different columns. Also, what to do with differing row lengths in any case? Fill to the length of the longest? LENGTH ERROR? Some additional syntax for specifying more complex rules?

One might also ask if it shouldn't be possible to search (and possibly replace) for structures other than strings in a row. Searching for strings oriented on another axis -- e.g., down columns -- might require an axis specification, but can otherwise be simply derived from the row-oriented case by a pair of transposes. But what about searching for smaller submatrices in a large matrix? That would seem a natural thing to ask for in APL. How can this be included in a regexp syntax? Actually, I think the syntax I suggest above could do it (if it's not inconsistent for some other reason), by simply replacing the individual search elements/strings with names of variables that contain more complex arrays (e.g., matrices). However, certain operations (e.g., repetition) may either be invalid or require additional specification (e.g., repetition along which axis?). Presumably the unequal-length problem could be solved as in the search-for-vector case (*if* it can be solved *there*) by separate application on each axis, but it might be necessary to specify an order for the application.
Quote:
> One application that immediately comes to mind is implementing a small relational database as a set of APL arrays. Regular expressions applicable to mixed data would make it easy to implement sophisticated queries and updates.
I don't quite see why this can't already be done with current systems and nested arrays. Can you give us an example where regexp's would be a significant improvement or -- even better -- a necessity?
Quote:
> My point regarding the use of operators in place of functions such as find (or index) was that these functions currently perform only a direct comparison (equality). There are times when it would be useful to perform a similar search based on more complex criteria, e.g. a regular expression search. I was just thinking that if these functions were implemented as operators the search function could be specified as an argument, allowing searches of arbitrary complexity. If the argument function was an intrinsic APL function, such as "=" or a future regular expression function, these searches could be performed far more efficiently than if you had to perform several separate searches, scanning an array multiple times and generating several temporary boolean arrays, and then combine the search results with "and" or "or".
I think I see your point, as in replacing "=" with ">" in such comparisons, which I think is easier to visualize than your suggestion of regular expressions, though I hope it is the same concept. Yes? But I'm not sure that turning {find} into an operator is necessarily the right way to go. Neither am I sure that it's not. I'll need more time to think about it. In the meantime, if you can give us a few specific examples of the syntax you have in mind, it might help.

This is interesting. I'm looking forward to your response.

/Jim
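To make the "find as an operator" idea concrete, here is a minimal Python sketch in which the comparison is passed in as an argument. The names `find` and `match` are hypothetical, and the boolean result is only loosely modeled on what APL's {find} returns (a flag at each position where a match starts); no APL vendor's implementation is being described.

```python
import operator


def find(pattern, data, match=operator.eq):
    """Flag each position in `data` where `pattern` matches under `match`.

    `match(element, pattern_element)` decides whether one element fits;
    the default, equality, gives an ordinary subsequence search.
    """
    n, m = len(data), len(pattern)
    return [
        i + m <= n and all(match(data[i + j], pattern[j]) for j in range(m))
        for i in range(n)
    ]


data = [3, 9, 8, 4, 7, 6, 1]

# Plain find: where does the subsequence 9 8 start?
print(find([9, 8], data))
# [False, True, False, False, False, False, False]

# Same search with ">" instead of "=": two consecutive elements both above 5.
print(find([5, 5], data, operator.gt))
# [False, True, False, False, True, False, False]
```

Because the comparison is just a parameter, swapping "=" for ">" needs no second scan and no intermediate boolean arrays to combine afterwards. Whether that would translate into a real speed gain inside an APL interpreter is a separate question that this sketch does not answer.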
Sun, 18 Jan 2004 04:11:51 GMT

David Nes #9 / 28
I was about to go into a long piece disagreeing both with the original post and with much of Lucas' response. However, given how unlikely it is that any actual time will be spent on implementing any of this, I decided it wasn't worth it. So I'll just make a couple of brief observations instead.
Quote:
> > Perhaps the example was not very good.
> I don't think the content was the problem so much as interpreting your notation.
My problem was not with either the example or the notation. I don't think that the underlying idea is very good.

[snip of a long discussion]

I find the idea `muddled'. The usefulness of the notion of `regular expressions' is rather more intimately linked to the `free form'-ness of text than has been considered in this discussion, and it is not a good surrogate for a form of `object' pattern matching which, while quite meaningful in the context of several other languages, is not of much importance (IMO) to domains where APL and its related languages shine.

The idea is already implemented in J, and I would commend anyone thinking about it here to see how it has been handled there, where it has (sensibly) been introduced in the context of character strings. As to its applicability to other data, J (and K) are mute, properly mute, AFAICS. As to `object matching', I would suggest that REBOL, PROLOG or even LISP would be a more fruitful approach than to attempt to accomplish something through some twisting of the RE concept into APL.

As for REs on strings, perl is, of course, the definitive language. It is perhaps revealing that even in perl, where REs are a most important linguistic construct, they are pretty much _only_ applied to character strings, not to perl objects of other kinds. This is wisdom on the part of perl's implementors, not ignorance.

In short, the idea strikes me as a non-starter for APL unless it is applied to character strings. Applied to character strings, it has some limited usefulness---as is demonstrated in J---where it has been available for a few years, and is occasionally used.
Sun, 18 Jan 2004 12:31:13 GMT

James J. Weinka #10 / 28
Quote:
> For example, the regular expression '[0-9]+' will find a sequence of one or more digits; this could be extended to allow an expression such as '[' 12.7 '-' 92.4 ']+' to find a sequence of one or more numbers between 12.7 and 92.4, inclusive. Or consider something
I think the participants in this discussion are losing sight of what a regular expression is and why various pattern matching facilities use the concept. In the example above, the notation [0-9] is shorthand or "syntactic sugar" for the regular expression 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 assuming, of course, that in the alphabet for the regular expressions at issue 0123456789 form a contiguous subsequence. [ 12.7 - 92.4 ] admits of no such simple interpretation. Trying to extend the alphabet from a reasonably sized set of characters to an extremely large (albeit still finite (or are you meaning to include real numbers as well as machine representable ones?)) mixed bag of characters, numbers as used in APL (machine representable numbers - for the underlying machine), arrays, and who knows what else, is ill conceived.

Regular expressions are used as the basis of many string pattern matchers for the following reasons:
1. Regular expressions are sufficient to express many useful patterns users are likely to want to match.
2. The underlying theory enables an extremely efficient implementation which has near linear performance in many practical cases. This is because
a) there is an algorithm to convert any regular expression to a non-deterministic finite automaton that recognizes precisely the strings in the regular set corresponding to the given regular expression,
b) there is another algorithm that converts any non-deterministic finite automaton to a deterministic finite automaton that recognizes precisely the same set of strings, and
c) a deterministic finite automaton processes its input string in linear time.

Note that for this to be useful it must be possible to generate the DFA in a reasonable amount of time (and space) compared to the time that it is in use. Otherwise it may be better to use a different approach.

All of this goes out the window with the so called "extensions" being discussed here.
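The two claims above can be checked with a small Python sketch: that [0-9] is only shorthand for an explicit alternation, and that a deterministic finite automaton reads each input symbol exactly once. The transition table below is written by hand for the pattern [0-9]+ as an illustration; it is not produced by the regex-to-NFA-to-DFA algorithms mentioned in the post, and Python's own `re` module is a backtracking matcher rather than a DFA, so it serves here only as a reference for the expected answers.

```python
import re

DIGITS = "0123456789"

# The character class and its expansion into an explicit alternation.
sugar = re.compile(r"[0-9]+")
expanded = re.compile("(?:" + "|".join(DIGITS) + ")+")

# Hand-built DFA for [0-9]+: state 0 is the start state, state 1 is accepting.
# A missing table entry means the input is rejected.
TRANSITIONS = {(0, d): 1 for d in DIGITS}
TRANSITIONS.update({(1, d): 1 for d in DIGITS})
ACCEPTING = {1}


def dfa_accepts(text):
    """Run the DFA over `text`, doing one table lookup per character."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# All three recognizers agree on every sample string.
for sample in ["2004", "", "12a", "7"]:
    assert bool(sugar.fullmatch(sample)) == bool(expanded.fullmatch(sample)) == dfa_accepts(sample)
```

Note that the table has one entry per (state, symbol) pair, so its size grows with the alphabet. That is the practical concern raised here about alphabets made of arbitrary numbers or APL objects.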
Sun, 18 Jan 2004 14:17:46 GMT

David Nes #11 / 28
Quote:
> I think the participants in this discussion are losing sight of what a regular expression is and why various pattern matching facilities use the concept.
...
> Regular expressions are used as the basis of many string pattern matchers for the following reasons:
> 1. Regular expressions are sufficient to express many useful patterns users are likely to want to match
> 2. The underlying theory enables an extremely efficient implementation which has near linear performance in many practical cases.
...
This is well said. I would only add, for completeness, some characterization of the sensible `domain' of REs---much more closely tied to character strings than has been clear in the earlier discussion.
Sun, 18 Jan 2004 21:25:50 GMT

Stefano Lanzavecchi #12 / 28
Quote:
> This is well said. I would only add, for completeness, some characterization of the sensible `domain' of REs---much more closely tied to character strings than has been clear in the earlier discussion.
While I won't argue with this statement which, in my opinion, is quite sensible, there's more to regular expressions than meets the eye.

First, a reference to something a bit unusual: http://citeseer.nj.nec.com/matz97regular.html

Second, a couple of considerations from a purely theoretical point of view: regular expressions are just a way to represent patterns accepted by a language (in the abstract meaning of language, as a sequence of symbols whose sentences are constrained by rules). For instance: http://grid.let.rug.nl/~hkuipers/scriptie/node37.html

In particular, finite automata and regular expressions have the same expressive power and, in fact, as stated already, regular expression matchers are usually implemented by translating the regexp into a finite automaton (either table driven, or hard-coded in a computer language of choice). Therefore it is correct that the domain of application of regular expressions is a string, but a string can be interpreted as an ordered set of symbols. A heterogeneous (but also a nested) array is, in fact, an ordered set of complex symbols. It's true that some of the shortcuts currently are harder to interpret in a domain like the real numbers ("[a-z]" is somewhat easier to expand than "[1.1-5.9]") but I won't comment on this since I am not an expert in grammars on an infinite set of symbols. Nevertheless, there is nothing implicitly impossible in the building of a finite automaton which would be able to match an interval over the real numbers and its efficiency wouldn't be considerably less (considering the implicit complexity of the domain) than that of a character matcher.
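As a rough illustration of that last point, an automaton can label its transitions with predicates (tests on a symbol) instead of individual symbols, so that an interval such as "[1.1-5.9]" needs one transition rather than one per representable number. The Python sketch below is only one possible reading of the idea; `make_matcher` and the example pattern are invented for this illustration and do not come from the linked references.

```python
def make_matcher(transitions, start, accepting):
    """Build an acceptor from predicate-labelled transitions.

    `transitions` maps a state to a list of (predicate, next_state) pairs.
    The first predicate that accepts the current element wins, so the
    predicates leaving one state should not overlap.
    """
    def accepts(sequence):
        state = start
        for item in sequence:
            for predicate, next_state in transitions.get(state, []):
                if predicate(item):
                    state = next_state
                    break
            else:
                return False  # no transition fits this element
        return state in accepting
    return accepts


def in_interval(x):
    """True for a number in the closed interval [1.1, 5.9]."""
    return isinstance(x, (int, float)) and not isinstance(x, bool) and 1.1 <= x <= 5.9


def is_ibm(x):
    return x == "IBM"


# Pattern: one or more numbers in [1.1, 5.9], followed by the string "IBM".
accepts = make_matcher(
    transitions={0: [(in_interval, 1)], 1: [(in_interval, 1), (is_ibm, 2)]},
    start=0,
    accepting={2},
)

print(accepts([1.1, 3.0, 5.9, "IBM"]))  # True
print(accepts([6.0, "IBM"]))            # False
```

Each element is still examined once, but every step now costs a handful of predicate calls instead of a single table lookup, so how efficient this is compared with a character matcher depends on how many classes the pattern distinguishes.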
Sun, 18 Jan 2004 23:25:53 GMT

David Nes #13 / 28
Quote:
> > This is well said. I would only add, for completeness, some characterization of the sensible `domain' of REs---much more closely tied to character strings than has been clear in the earlier discussion.
> While I won't argue with this statement which, in my opinion, is quite sensible, there's more to regular expressions than meets the eye.
[snip]

Thank you. I'll return `the favor' by not arguing much with your characterization either, other than to make a couple of further observations...

In my reading of the literature, Matz's work didn't `go far', thus suggesting that while REs _could_ be applied in this context, there wasn't a whole lot of gain to be had in doing so. On the other hand this domain of study is well out of my `normal scan range' so I may not know recent developments at all well. It all reminds me vaguely of some work we did in the early 70s (largely by my then student Zisman who later ran Lotus) using RE and Petri Nets to characterize communications flows.

Second, central to (but perhaps only implicitly) my view is not that REs are necessarily _impossible_ to apply in APL domains, but rather that there is likely little gain to be had in doing so. From a broad brush overview, I find APLs, Js and Ks tend to be particularly productive for me in circumstances where there is some natural structuring to the data---stock ticks, organized data structures, ... and REs are pretty useless in most of these domains, other than as applied to original (external world) strings. I like REs a lot, and make heavy use of them, but it tends to be when I am doing string patterns in perl or searches in Emacs where the data that I am dealing with is often quite loosely structured.

So I guess in summary I am making principally an `engineering point'. Theoretically, just about anything can be applied to just about anything. Some vague notion of REs can probably be stretched to apply to some form of APL structure. I simply regard the likelihood of doing so productively to be _very small_, and even if it were done it would still likely (IMO) be distinctly inferior to other solution domains where they are a much more natural construct.
Mon, 19 Jan 2004 02:55:25 GMT

James J. Weinka #14 / 28
Quote:
> .... Therefore it is correct that the domain of application of regular expressions is a string, but a string can be interpreted as an ordered set of symbols. A heterogeneous (but also a nested) array is, in fact, an ordered set of complex symbols. It's true that some of the shortcuts currently are harder to interpret in a domain like the real numbers ("[a-z]" is somewhat easier to expand than "[1.1-5.9]") but I won't comment on this since I am not an expert in grammars on an infinite set of symbols. Nevertheless, there is nothing implicitly impossible in the building of a finite automaton which would be able to match an interval over the real numbers and its efficiency wouldn't be considerably less (considering the implicit complexity of the domain) than that of a character matcher.
An essential characteristic of a finite automaton is that its input alphabet is finite and fixed in advance. For the purposes of realizing the efficient implementation of recognizers mentioned in my previous post, finite can be taken to mean modest in size, no more than a few hundred symbols. Trying to consider heterogeneous strings of arbitrary apl objects makes the input alphabet the set of all possible apl objects. This is indeed a finite set but it is huge and doesn't lend itself to a straightforward ordering. It would be impractical in the extreme, I venture to say impossible, to construct a practical finite automaton over this input alphabet; and if it could be done, the resulting machine would be anything but efficient.
Mon, 19 Jan 2004 05:47:37 GMT

Brian McGuinne #15 / 28
Quote:
...
> > For example, the regular expression '[0-9]+' will find a sequence of one or more digits; this could be extended to allow an expression such as '[' 12.7 '-' 92.4 ']+' to find a sequence of one or more numbers between 12.7 and 92.4, inclusive. Or consider something like '(Ralph|Frank|Harry)', which matches any of the specified names. It might be desirable to be able to specify a pattern such as '(' 124 '|IBM|' 27 ')' to match 124, or "IBM", or 27.
> I think the idea has merit. I'm less enthusiastic about the syntax. Though I see how it is directly derived by extending the established character-string regexp syntax to include heterogeneous arrays, I think even your simple examples are very difficult to "see" (i.e., for a human to interpret on sight), and a complete rethinking of the syntax/notation would be more productive. Specifically, the plethora of quotes makes it difficult to see which elements are enclosed in quotes and which are not, and the merging of the control syntax with the character arguments multiplies that difficulty. Even
>     '(' 124 '|' 'IBM' '|' 27 ')'
> would be easier to read, in my opinion, than your above version.
> While I'm not prepared to propose a particular syntax without considerable further analysis, I think it would be much more intuitive (and more APL-ish, whatever that means) if the control syntax could be completely separated from the arguments, and hopefully not be in the form of quoted strings. E.g., the following might be one possibility, though I don't consider it to be a formal proposal:
> With a new syntactic class, distinguished by a notation like K's symbols (or a monadic version of J's "tie" or "gerund"), one could specify operations such as `| for "or" (actually, I'd prefer to use the APL {or}, instead of "|", but the traditional regexp symbol is easier to put in an email). Then your second example above could be
>     (124 `| 'IBM' `| 27)
> (the parens assume this entire expression simply defines one argument of a larger expression), or even
>     (`|/ 124 'IBM' 27)
> i.e., using or-reduction on (that segment of) the comparison. Your first example is more difficult, because APL lacks a notation for range-generation, so for the moment I'll invent it as "#". Then I could construct something like
>     [12.7 `# 92.4] `/
> (where `/ indicates potential replication), or
>     (12.7 `# 92.4) `/
> if there's no specific need to require a different kind of bracket in this particular syntax.
> One advantage of this sort of notation is that it would (I think) eliminate the necessity for escape-character conventions to distinguish between, e.g., "|" as an element of a string to be searched for and "|" (`| in this syntax) as a constraint in the search specification.
> This is just off the top of my head. I haven't pursued all of its ramifications, and I'm actually not fluent enough in standard regexp syntax. But I hope it gives you some ideas, and I hope that you might come up with a syntax better than either it or what you have used above. What do you think?
I have no objections to changing the notation if it will make the expression clearer without losing functionality. Introducing a new syntactic class just to support one language extension makes me a little queasy; if this class is useful for much more than simply expressing regular expressions cleanly then I don't mind.
Quote:
> > While simple examples like these are not hard to implement in current APL versions, regular expressions make it easy to perform much more complicated searches. Following Perl's lead and implementing search-and-replace as well as search would provide a powerful means of updating arrays.
> An attractive idea, but I think someone else already mentioned the main problem with this in APL: Replacing a string of one length with one of a different length within a matrix or higher-rank array requires some rule to specify which elements in one row should line up with (e.g., be in the same column as) which elements in other rows, if the resultant row lengths differ, or even if the replacements begin in different columns. Also, what to do with differing row lengths in any case? Fill to the length of the longest? LENGTH ERROR? Some additional syntax for specifying more complex rules?
If the operation can be considered to temporarily break the array into subarrays, then the operation of rejoining these into a single array could be done in the same way as tolerant disclose (e.g. '>' in Sharp APL) by padding as necessary.
Quote:
> One might also ask if it shouldn't be possible to search (and possibly replace) for structures other than strings in a row. Searching for strings oriented on another axis -- e.g., down columns -- might require an axis specification, but can otherwise be simply derived from the row-oriented case by a pair of transposes. But what about searching for smaller submatrices in a large matrix. That would seem a natural thing to ask for in APL. How can this be included in a regexp syntax? Actually, I think the syntax I suggest above could do it (if it's not inconsistent for some other reason), by simply replacing the individual search elements/strings with names of variables that contain more complex arrays (e.g., matrices).
The notation should allow not just variable names but arbitrary APL expressions that generate arrays to be specified in place of constants. For example,

    (`|/ 124 'IBM' (3 9 {rho}{iota}27))

Quote:
> However, certain operations (e.g., repetition) may either be invalid or require additional specification (e.g., repetition along which axis?). Presumably the unequal-length problem could be solved as in the search-for-vector case (*if* it can be solved *there*) by separate application on each axis, but it might be necessary to specify an order for the application.
Perhaps this could be solved by appending an ordered axis list, e.g. *{1,3,2} would mean search along axis 1, then axis 3, then axis 2. The braces might be replaced by some other symbol if we wanted to use them for specifying the number of times to match a subpattern, as in Perl.

Quote:
> > One application that immediately comes to mind is implementing a small relational database as a set of APL arrays. Regular expressions applicable to mixed data would make it easy to implement sophisticated queries and updates.
> I don't quite see why this can't already be done with current systems and nested arrays. Can you give us an example where regexp's would be a significant improvement or -- even better -- a necessity?
Regexps simplify searches for patterns that are more complex than a constant string. For example, suppose we want to find each location in an array where a group of five consecutive numbers greater than 1000 or less than 5 is followed by a character string starting with 'A', 'B', or 'C', and then by a number between 10 and 20. This can be done in current APL systems, but it could be specified much more easily as a regular expression, reducing the likelihood of errors. Moreover, a standard APL solution would require scanning the array multiple times and would generate several temporary boolean arrays. A regular expression would reduce the number of scans required and eliminate the need for potentially huge temporary arrays.

Quote:
> > My point regarding the use of operators in place of functions such as find (or index) was that these functions currently perform only a direct comparison (equality). There are times when it would be useful to perform a similar search based on more complex criteria, e.g. a regular expression search. I was just thinking that if these functions were implemented as operators the search function could be specified as an argument, allowing searches of arbitrary complexity. If the argument function was an intrinsic APL function, such as "=" or a future regular expression function, these searches could be performed far more efficiently than if you had to perform several separate searches, scanning an array multiple times and generating several temporary boolean arrays, and then combine the search results with "and" or "or".
> I think I see your point, as in replacing "=" with ">" in such comparisons, which I think is easier to visualize than your suggestion of regular expressions, though I hope the same concept. Yes? But I'm not sure that turning {find} into an operator is necessarily the right way to go. Neither am I sure that it's not. I'll need more time to think about it. In the meantime, if you can give us a few specific examples of the syntax you have in mind, it might help.
> This is interesting. I'm looking forward to your response.
> /Jim
This is just an idea I was kicking around. It seems to me that using an operator allows the find operation to be customized easily and allows the same symbol to be used for all types of similar searches. I was thinking that one would type something like

    leftarg ={epsilon_underbar} rightarg

for the standard find operation, or

    leftarg MYCOMPARE {epsilon_underbar} rightarg

for a customized search using the user-written dyadic APL function MYCOMPARE. The function would return a boolean explicit result based on its two arguments and could be arbitrarily complex.

--- Brian
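
For readers who want to experiment with the idea above outside APL, here is a rough Python sketch of find as an operator, i.e. a search that takes its comparison function as an argument. It is only an illustration under stated assumptions, not a proposal for how an APL interpreter would implement it: the names find_with, apply_predicate, is_number, is_extreme, starts_abc and in_10_20 are all made up for this example, the search is restricted to vectors, and it assumes that each of the five numbers in the mixed-data example must individually be greater than 1000 or less than 5. The result is a 0/1 mask with a 1 at each position where a match starts, in the manner of find.

```python
import operator
from typing import Any, Callable, List, Sequence


def find_with(compare: Callable[[Any, Any], bool],
              pattern: Sequence[Any],
              data: Sequence[Any]) -> List[int]:
    """Return a 0/1 mask over data with 1 wherever a match of pattern starts.

    compare(p, d) plays the role of the dyadic function argument
    (for example "=" or MYCOMPARE) and returns a truthy or falsy value.
    """
    n, m = len(data), len(pattern)
    mask = [0] * n
    if m == 0:
        return mask  # choice made for this sketch: an empty pattern matches nowhere
    for start in range(n - m + 1):
        # A match needs every pattern element to compare true, position by position.
        if all(compare(p, data[start + k]) for k, p in enumerate(pattern)):
            mask[start] = 1
    return mask


# Standard find: the comparison function is plain equality.
eq_mask = find_with(operator.eq, "an", "banana")           # [0, 1, 0, 1, 0, 0]

# Replacing "=" with another comparison: positions where two consecutive
# items both exceed 10 (operator.lt is called as 10 < item).
gt_mask = find_with(operator.lt, [10, 10], [5, 20, 30, 1])  # [0, 1, 0, 0]


# A MYCOMPARE-style function: each pattern element is a predicate that is
# applied to the corresponding data element.
def apply_predicate(predicate: Callable[[Any], bool], item: Any) -> bool:
    return predicate(item)


def is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_extreme(x: Any) -> bool:
    return is_number(x) and (x > 1000 or x < 5)


def starts_abc(x: Any) -> bool:
    return isinstance(x, str) and x[:1] in ("A", "B", "C")


def in_10_20(x: Any) -> bool:
    return is_number(x) and 10 <= x <= 20


# Five "extreme" numbers, then a string starting with A, B or C,
# then a number between 10 and 20.
pattern = [is_extreme] * 5 + [starts_abc, in_10_20]
record = ['x', 2000, 1, 3, 5000, 4, 'Boston', 15, 7]
mixed_mask = find_with(apply_predicate, pattern, record)
```

With this arrangement the same searching loop serves equality, ordering comparisons and predicate patterns; only the function argument changes. Note that this sketch makes a single pass over the data per candidate start position and does not attempt repetition counts or alternation, which a real regular expression facility would need.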
|
Mon, 19 Jan 2004 22:51:28 GMT |
|
|