String manipulation in CLISP
Weiguang S #1 / 21
Hi,

It seems to me that the CLISP that I am learning lacks string manipulation functions (please do correct me if I am wrong). In particular, I am wondering whether there is a function that can split a string, which might just have been read through

(setq string (read-line))

into its space-separated words, returning them as an array or, even better, a list.

Thanks very much,
Weiguang
Sat, 02 Aug 2003 10:49:45 GMT

Vebjorn Ljos #2 / 21
| It seems to me that the CLISP that I am learning lacks string
| manipulation functions (please do correct me if I am wrong). I am
| wondering particularly if there is a function that can split all the
| space-separated words in a string, which might be just read through
|
|   (setq string (read-line)).
|
| into an array or, even better, a list.

Common Lisp has a plethora of functions for manipulating strings and other sequences. a string is a sequence, so the functions for manipulating sequences can also be used for manipulating strings.

I recommend reading the appendix to Graham's "ANSI Common Lisp" from beginning to end. then make it a habit to look up details in the Common Lisp Hyperspec.

here's a function which uses POSITION and SUBSEQ to do what you want:

(defun split-sequence (sequence &key (separator #\space))
  (loop
     with start = 0
     for end = (position separator sequence :start start)
     collect (subseq sequence start end)
     until (null end)
     do
       (setf start (1+ end))))

--
Vebjorn
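As an illustration of the point that a string is a sequence (a sketch added for clarity, not code from the thread), the standard sequence functions below all accept a string directly. The sample text bound to *line* is an arbitrary example.

```lisp
;; Standard sequence functions applied to a string.
;; The sample text is an arbitrary example.
(defparameter *line* "split these words")

(position #\Space *line*)   ; => 5, index of the first space
(subseq *line* 0 5)         ; => "split"
(subseq *line* 6)           ; => "these words", runs to the end
(count #\s *line*)          ; => 3
(remove #\Space *line*)     ; => "splitthesewords"
(search "words" *line*)     ; => 12
(reverse "abc")             ; => "cba"
```

POSITION and SUBSEQ are the two calls the function above is built from; the others are listed only to show that nothing string-specific is needed.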
Sat, 02 Aug 2003 18:40:44 GMT

Marco Antoniott #3 / 21
Quote:
> Hi,
> It seems to me that the CLISP that I am learning lacks string manipulation
> functions (please do correct me if I am wrong). I am wondering particularly if
> there is a function that can split all the space-separated words in a string,
> which might be just read through
> (setq string (read-line)).
> into an array or, even better, a list.

Come on. First of all let's think before we ask such a question.

Suppose you have a string (usually a line from a file) which contains blanks as "field" separators. Suppose the fields are numbers. The simple solution to your question is

(setf the-fields
      (read-from-string (concatenate 'string
                                     "("
                                     (read-line stream)
                                     ")")))

Now 'the-fields' will contain a list of NUMBERs. "Look ma', no parsing!"

Of course if you want a much more general solution, SPLIT-SEQUENCE is just 5 or 6 lines away :)

Cheers

--
Marco Antoniotti =============================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://galt.mrl.nyu.edu/valis
        Like DNA, such a language [Lisp] does not go out of style.
                              Paul Graham, ANSI Common Lisp
Sun, 03 Aug 2003 00:09:41 GMT

Tim Bradsha #4 / 21
Quote:
> Come on. First of all let's think before we ask such a question.
> Suppose you have a string (usually a line from a file) which contains
> blanks as "field" separators. Suppose the fields are numbers. The
> simple solution to your question is
> (setf the-fields
>       (read-from-string (concatenate 'string
>                                      "("
>                                      (read-line stream)
>                                      ")")))
> Now 'the-fields' will contain a list of NUMBERs. "Look ma', no
> parsing!"

This is the Lisp equivalent of the buffer-overflow problems which are so pervasive in C.

--tim
Sun, 03 Aug 2003 01:04:49 GMT

Weiguang S #5 / 21
Quote:
>Common Lisp has a plethora of functions for manipulating strings and
>other sequences. a string is a sequence, so the functions for
>manipulating sequences can also be used for manipulating strings.
>I recommend reading the appendix to Graham's "ANSI Common Lisp" from
>beginning to end. then make it a habit to look up details in the
>Common Lisp Hyperspec.

Thanks. I will.

Quote:
>here's a function which uses POSITION and SUBSEQ to do what you want:
>(defun split-sequence (sequence &key (separator #\space))
>  (loop
>     with start = 0
>     for end = (position separator sequence :start start)
>     collect (subseq sequence start end)
>     until (null end)
>     do
>       (setf start (1+ end))))

Thanks. It worked!

Weiguang
Sun, 03 Aug 2003 01:02:49 GMT

Sashank Var #6 / 21
Quote:
[snip]
>> there is a function that can split all the space-separated words in a string,
>> which might be just read through
>> (setq string (read-line)).
[snip]
> (setf the-fields
>       (read-from-string (concatenate 'string
>                                      "("
>                                      (read-line stream)
>                                      ")")))
>Now 'the-fields' will contain a list of NUMBERs. "Look ma', no
>parsing!"

i have taken to rendering this trick as:

(setf the-fields (read-from-string (format nil "(~A)" (read-line stream))))

sashank
Sun, 03 Aug 2003 23:36:41 GMT

Tim Bradsha #7 / 21
Quote:
> (setf the-fields (read-from-string (format nil "(~A)" (read-line stream))))

Sometimes I feel that I must be some kind of obsessive loony, but whenever I see this kind of thing it just makes me terrified. If I was going to tokenise some string using this technique then before I did so I'd want to go through the string character by character to check it had no bad things in it. While I was doing this I'd tokenise it, since this adds about a line to the checking code. Then I'd take out most of the checks because if you're not using READ you don't need to be so paranoid.

I'm not saying that READ doesn't have its place -- indeed I wrote a whole bunch of stuff a while ago about making READ safe(r), and I regularly use READ/PRINT as a cheap, sane way of doing what XML does in an expensive, insane way. But I think if you want to split some string on whitespace you should, well, split it on whitespace. It's like saying that the way to split a string is to use an XML parser. (Of course this is probably *exactly* the kind of insanity that the XML {*filter*} has in mind for us all, but never mind that.)

I guess partly this horror comes from the fact that in my real life I'm a systems person, and so I have to update some security-critical package to fix some buffer-overflow vulnerability approximately once a week. So I'm kind of biassed I guess.

--tim
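To make "split it on whitespace" concrete, here is a minimal sketch that never calls READ (added for illustration, not code from the thread). The names whitespace-char-p and split-on-whitespace are hypothetical, and the choice of which characters count as whitespace is an assumption.

```lisp
(defun whitespace-char-p (char)
  "True if CHAR is a space, tab, newline, or carriage return."
  ;; #\Tab and #\Return are semi-standard character names; they are
  ;; assumed to be available, as they are in common implementations.
  (member char '(#\Space #\Tab #\Newline #\Return)))

(defun split-on-whitespace (string)
  "Return a list of the maximal runs of non-whitespace characters in STRING."
  (let ((tokens '())
        (start (position-if-not #'whitespace-char-p string)))
    (loop while start
          do (let ((end (or (position-if #'whitespace-char-p string
                                         :start start)
                            (length string))))
               ;; Copy the token, then skip to the start of the next one.
               (push (subseq string start end) tokens)
               (setf start (position-if-not #'whitespace-char-p string
                                            :start end))))
    (nreverse tokens)))

(split-on-whitespace "  foo  bar ")   ; => ("foo" "bar")
(split-on-whitespace "#.(bad) 42")    ; => ("#.(bad)" "42"), just strings
```

Because the tokens stay strings, reader syntax in the input is inert, and runs of whitespace do not produce empty tokens.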
Mon, 04 Aug 2003 01:10:43 GMT

Pierre R. Ma #8 / 21
Quote:
> > (setf the-fields (read-from-string (format nil "(~A)" (read-line stream))))
> Sometimes I feel that I must be some kind of obsessive loony, but
> whenever I see this kind of thing it just makes me terrified. If I

FWIW I feel the same way about such uses of read, so that would make at least two obsessive loonies... ;)

Regs, Pierre.

--
The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents. -- Nathaniel Borenstein
Mon, 04 Aug 2003 02:14:32 GMT

Marco Antoniott #9 / 21
Quote:
> > > (setf the-fields (read-from-string (format nil "(~A)" (read-line stream))))
> > Sometimes I feel that I must be some kind of obsessive loony, but
> > whenever I see this kind of thing it just makes me terrified. If I
> FWIW I feel the same way about such uses of read, so that would make
> at least two obsessive loonies... ;)

Well. I agree. But the truth is that we do not have a *PORTABLE* Lex for Common Lisp. So, if you know that pretty much what is on the line fits what can be fed to READ, it is simpler to go ahead and do the dirty thing.

Cheers.

--
Marco Antoniotti =============================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://galt.mrl.nyu.edu/valis
        Like DNA, such a language [Lisp] does not go out of style.
                              Paul Graham, ANSI Common Lisp
Mon, 04 Aug 2003 06:32:32 GMT

Tim Bradsha #10 / 21
Quote:
> Well. I agree. But the truth is that we do not have a *PORTABLE* Lex
> for Common Lisp. So, if you know that pretty much what is on the line
> fits what can be fed to READ, it is simpler to go ahead and do the
> dirty thing.

But this is just the danger I'm terrified of. The people who wrote sendmail/BIND/blah _knew_ that the stuff they had fit within a buffer of length x and it was all just OK, so they just wrote the obvious code. Unfortunately someone else knew that they knew this too...

Lisp doesn't have these issues, thank God, but it has other issues, and one of them is using READ on data about which you don't know enough. Unless you have a suitably armour-plated READ (and I think such a thing is more-or-less possible to create) then knowing `pretty much' what the data is still leaves you vulnerable to catastrophic magazine explosions. Even when it doesn't leave you at the bottom of the North Sea you still have the problem that it's much too general -- the original question was to tokenize a string -- gratuitously parsing it into symbols, numbers &c may not be what you want.

This isn't to say that I don't think READ has its uses -- since it does all the interesting bits of XML it obviously has uses, I'm just a bit wary of people saying it's a good way of doing things which it's really not good at, and which might leave you vulnerable to bad problems when used without considerable understanding.

--tim
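The "much too general" point can be seen by feeding the parenthesis trick some ordinary text. This is a sketch added for illustration, not code from the thread: read-fields is a hypothetical name, it takes a string rather than a stream, and the results shown assume the standard readtable and a read base of 10.

```lisp
(defun read-fields (line)
  "Wrap LINE in parentheses and READ it, as in the trick quoted above."
  ;; *READ-EVAL* is bound to NIL so that #. signals an error rather than
  ;; evaluating code. That is only one of the precautions READ needs.
  (let ((*read-eval* nil))
    (values (read-from-string (concatenate 'string "(" line ")")))))

(read-fields "10 007 1/2")                     ; => (10 7 1/2)
(read-fields "Hello world")                    ; => (HELLO WORLD)
(every #'symbolp (read-fields "Hello world"))  ; => T
```

The field 007 comes back as the integer 7, and the words come back as upcased symbols interned in the current package rather than as strings, so the original text of each token is lost.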
Mon, 04 Aug 2003 21:23:51 GMT

Hrvoje Niksi #11 / 21
Quote:
> Lisp doesn't have these issues, thank God, but it has other issues,
> and one of them is using READ on data about which you don't know
> enough. Unless you have a suitably armour-plated READ (and I think
> such a thing is more-or-less possible to create)
What's the big deal with READ, if you disable the obvious `#.'?
Mon, 04 Aug 2003 21:26:09 GMT

Tim Bradsha #12 / 21
Quote:
> What's the big deal with READ, if you disable the obvious `#.'?

Disabling #. is the most important thing. Other issues are things like it can return circular structure -- (mapcar ... (read ...)) may fail to terminate. It may intern symbols in random packages &c which may not be desirable. You need to be very sure you know everything about your readtable and the reader-control variables. If you're using it for some constrained purpose, you need to check ruthlessly that what it returns is something like what you expect it to return (if you expect a compound object, this check probably needs to do an occurs check...).

As I said, an armour-plated READ is possible, I think -- there was a thread a while ago (last year? maybe 1999) where I think I posted something that claimed to be such a thing. The trick is to control the readtable &c, and then to do a walk over the result to check it's `good'. And you have to trust the implementation's READ not to blow up in bad ways -- I have a scheme to write a test-harness that fires random data at READ for a few hours, but I've not done that yet -- this should not be a huge problem anyway.

--tim
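A rough sketch of the two steps described above, controlling the reader and then checking the result, for the narrow case of a line of real numbers. It is added for illustration and is not the armour-plated READ mentioned in the post. The name read-reals-carefully is hypothetical, and the sketch assumes that LIST-LENGTH signals an error on a dotted list, which the standard only requires in safe code.

```lisp
(defun read-reals-carefully (line)
  "Return the list of real numbers written on LINE, or NIL if LINE is
anything other than whitespace-separated real numbers."
  (handler-case
      (with-standard-io-syntax           ; known readtable and reader variables
        (let ((*read-eval* nil)          ; #. now signals a reader error
              (text (concatenate 'string "(" line ")")))
          (multiple-value-bind (object position) (read-from-string text)
            (and (= position (length text))  ; nothing was left unread
                 (listp object)
                 (list-length object)        ; NIL for a circular list
                 (every #'realp object)
                 object))))
    ;; Reader errors, premature end of input, dotted lists and the like
    ;; all count as a rejected line.
    (error () nil)))

(read-reals-carefully "1 2 3")             ; => (1 2 3)
(read-reals-carefully "1 #.(+ 1 1)")       ; => NIL
(read-reals-carefully "1 . #1=(2 . #1#)")  ; => NIL, circular tail detected
```

An empty line and a rejected line both yield NIL here, and symbols in rejected input are still interned (in CL-USER, because of WITH-STANDARD-IO-SYNTAX), so this shows the shape of the checks rather than a complete defence.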
Mon, 04 Aug 2003 21:39:58 GMT

Marco Antoniott #13 / 21
Quote:
> > Well. I agree. But the truth is that we do not have a *PORTABLE* Lex
> > for Common Lisp. So, if you know that pretty much what is on the line
> > fits what can be fed to READ, it is simpler to go ahead and do the
> > dirty thing.
> But this is just the danger I'm terrified of. The people who wrote
> sendmail/BIND/blah _knew_ that the stuff they had fit within a buffer
> of length x and it was all just OK, so they just wrote the obvious
> code. Unfortunately someone else knew that they knew this too...

...

Quote:
> This isn't to say that I don't think READ has its uses -- since it
> does all the interesting bits of XML it obviously has uses, I'm just a
> bit wary of people saying it's a good way of doing things which it's
> really not good at, and which might leave you vulnerable to bad
> problems when used without considerable understanding.

Look, I agree with you. But the problem remains. If you want to really "parse" something, you have - at least - to (1) build some form of AST, and (2) check that the parsed input matches a given specification. Anything short of that and you are "unsafe" one way or the other.

I actually judged the intentions of the original poster, by picking up the lurking argument: "how come I can do this in Perl and the latest wheel on the block avec forced indentation, and I cannot do it in CL?" :)

Cheers

--
Marco Antoniotti =============================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://galt.mrl.nyu.edu/valis
        Like DNA, such a language [Lisp] does not go out of style.
                              Paul Graham, ANSI Common Lisp
Mon, 04 Aug 2003 23:08:49 GMT

Pierre R. Ma #14 / 21
Quote:
> Well. I agree. But the truth is that we do not have a *PORTABLE* Lex
> for Common Lisp. So, if you know that pretty much what is on the line
> fits what can be fed to READ, it is simpler to go ahead and do the
> dirty thing.

In the context of this thread, where the goal was to split a string at whitespace boundaries, I don't think anyone uses LEX in the C world, unless all of this is part of some larger task, which does warrant the use of a full-blown lexer generator. So whether a portable lexer generator is available or not for CL seems beside the point.

It is very, very easy to just do the right thing in CL with 4 or 5 lines of code, which is not much more verbose than read, but obviously much less problematic. Furthermore there are numerous implementations of portable partitioning and splitting functions that have been posted to c.l.l over the course of the years, which will reliably solve this and related problems in a simple one-liner.

Now if we were talking about writing a complete lexer for a complex input language, together with a corresponding parser, that would be a completely different matter, and in that context I could see the use of either some lexer/parser generator (like e.g. Zebu) or indeed a suitably set-up call to read.

Regs, Pierre.

--
The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents. -- Nathaniel Borenstein
Tue, 05 Aug 2003 01:38:05 GMT

Johann Hibschma #15 / 21
Quote:
Marco Antoniotti writes:
> I actually judged the intentions of the original poster, by picking up
> the lurking argument: "how come I can do this in Perl and the latest
> wheel on the block avec forced indentation, and I cannot do it in CL?" :)

Speaking of which, there is a split-string function in the CLOCC, at http://clocc.sourceforge.net. I haven't used it, so I can't comment on its efficiency, but it's there.

To start a new subthread, what string functions would people want? I volunteer to collect any code and forward it to the CLOCC people, for possible inclusion. (I've never dealt with them, so I don't know how hard it is to get anything included.)

The obvious string methods that we're missing are strip (remove whitespace), rstrip, lstrip (left and right ends), transform (needs some concept of character sets), find, concatenate (yes, it exists, but string-cat is a nice abbreviation for (concatenate 'string ...)), and so on.

I should look at Olin Shivers' Scheme string utilities; they're well-designed, even if I suspect that judicious use of keyword arguments would help them quite a bit.

--
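For reference, some of the operations on this wish list map fairly directly onto standard functions: STRING-TRIM, STRING-LEFT-TRIM and STRING-RIGHT-TRIM cover the strip family, and SEARCH finds a substring. The sketch below was added for illustration and is not code from the thread. The *whitespace* character bag is an assumption, and string-cat is a hypothetical helper that takes its name from the abbreviation suggested above.

```lisp
(defparameter *whitespace* '(#\Space #\Tab #\Newline)
  "Characters treated as whitespace in this sketch.")

(defun string-cat (&rest strings)
  "Concatenate STRINGS into one fresh string."
  (apply #'concatenate 'string strings))

(string-trim *whitespace* "  some text  ")        ; => "some text"
(string-left-trim *whitespace* "  some text  ")   ; => "some text  "
(string-right-trim *whitespace* "  some text  ")  ; => "  some text"
(search "text" "some text")                       ; => 5
(string-cat "foo" "-" "bar")                      ; => "foo-bar"
```

Since string-cat goes through APPLY, the number of arguments it accepts is bounded by CALL-ARGUMENTS-LIMIT.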
Tue, 05 Aug 2003 03:09:43 GMT