Joys of eval 

Hi Albert,


Quote:

> A few weeks ago I posted a request for help with regexp, split, scan, et
> al., in an attempt to separate a comma-delimited string into its component
> fields.  I just stumbled upon an obvious but previously unseen solution to
> the problem:

> x = '"abc","de,f",123,4.56,"ghi"'
> y = eval('['<<x<<']')

> y is now:  ["abc", "de,f", 123, 4.56, "ghi"]

Just wondering if the regexp version had indeed worked out OK, or have
you encountered data since then it couldn't handle?

(The one in [ruby-talk:22534] didn't handle embedded quotes for instance;
the one in [ruby-talk:22537] can handle embedded quotes, but, I suspect,
probably not the way something like MSAccess might output them, given
what was learned about Excel's seemingly strange way of quoting in
[ruby-talk:22541] . . . :-)

By the way, I recently discovered that the flatten!.compact! used in
the code referenced above should be changed to flatten.compact, without
the exclamation points.  At least 'flatten' must be, because the
'flatten!' version "returns nil if no modifications were made"
(Pickaxe, Ch. 22), whereas we always need the resulting array, modified
or not, to pass on to 'compact'.
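A minimal illustration of that nil return in current Ruby, using a made-up row rather than the code from [ruby-talk:22537]. compact! behaves the same way, returning nil when there is nothing to remove.

```ruby
row = ["abc", nil, "de,f"]   # made-up row: nothing nested, one nil to drop

p row.flatten!               # prints nil: flatten! found nothing to change
p row.flatten.compact        # prints ["abc", "de,f"]

begin
  row.flatten!.compact!      # calls compact! on nil, so this raises
rescue NoMethodError => e
  puts "flatten!.compact! raised #{e.class}"
end

p ["abc", "ghi"].compact!    # prints nil: compact! also returns nil when unchanged
```

The non-bang forms always return an array, so they chain safely.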

If you have need of handling embedded quotes in the CSV parsing, perhaps
you could post some examples here of the quoting syntax MSAccess uses to
encode the data?  I'm pretty sure the regexp can be adapted to handle it.

For instance, Excel had taken:

+-----+---------------+
|blerb|"hey,you","man,|
+-----+---------------+

And had written (to the .csv file):

blerb,"""hey,you"",""man,"

. . . But, odd-looking or not, this appears well within the realm of
regexps to handle.  :-)
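One possible sketch of a regexp-based splitter for Excel-style quoting in current Ruby, using StringScanner from the standard library. It is not the regexp from [ruby-talk:22537]; split_csv_line, QUOTED_FIELD and UNQUOTED_FIELD are hypothetical names, and it works on one line at a time, so quoted fields containing newlines are out of scope.

```ruby
require 'strscan'

# A quoted field: "" inside it stands for one literal quote (Excel style).
QUOTED_FIELD   = /"((?:[^"]|"")*)"/
# An unquoted field: everything up to the next comma.
UNQUOTED_FIELD = /[^,]*/

# Hypothetical helper: split a single CSV line into its fields.
def split_csv_line(line)
  ss = StringScanner.new(line)
  fields = []
  loop do
    if ss.scan(QUOTED_FIELD)
      fields << ss[1].gsub('""', '"')   # undo the doubled quotes
    else
      fields << ss.scan(UNQUOTED_FIELD)
    end
    break if ss.eos?
    # after a field, only a comma may follow
    raise ArgumentError, "unexpected text at offset #{ss.pos}" unless ss.skip(/,/)
  end
  fields
end

p split_csv_line(%q{blerb,"""hey,you"",""man,"})
# prints ["blerb", "\"hey,you\",\"man,"]
```

Trying the quoted alternative first matters: a field that starts with a quote is read up to its matching closing quote, commas included.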

Regards,

Bill



Fri, 23 Apr 2004 04:25:21 GMT  
Hi, Bill.

Yes, I have been using your regexp with no problems.  I had just finally
noticed that Ruby parses the contents of an array literal almost exactly like a
CSV file.  So I started playing with eval.  I still haven't tried the regexp
on a significant sample of files.  I'll get back to you when I get into
trouble.  I still think it is a cool regexp.  Thanks again.
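Albert's observation can be checked directly in current Ruby. This sketch compares the eval trick with the standard csv library (CSV.parse_line with the :numeric converter), and shows why eval should only see trusted text: it runs whatever the line contains, including #{...} inside a quoted field.

```ruby
require 'csv'

line = '"abc","de,f",123,4.56,"ghi"'

# The eval trick: wrap the line in [ ] and let Ruby parse it as an array
# literal. This executes the text as code, so use it only on trusted data.
via_eval = eval("[#{line}]")

# The csv library parses the same line as data; the :numeric converter
# turns "123" and "4.56" into numbers, as eval did.
via_csv = CSV.parse_line(line, converters: :numeric)

p via_eval   # prints ["abc", "de,f", 123, 4.56, "ghi"]
p via_csv    # prints ["abc", "de,f", 123, 4.56, "ghi"]

# Why trust matters: a field containing #{...} is run by eval.
risky = '"#{1 + 1}",3'
p eval("[#{risky}]")                                          # prints ["2", 3]
p CSV.parse_line(risky, converters: :numeric) == ['#{1 + 1}', 3]   # prints true
```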




Fri, 23 Apr 2004 04:40:12 GMT  

Quote:

> > Hi, I'm somewhat of a regexp-enjoying fiend myself; but after
> > Randal Schwartz posted this horrendous thing last March:
> > http://www.ruby-talk.com/cgi-bin/scat.rb/ruby/ruby-talk/12815
> > I've been somewhat wary whenever someone mentions RFC822.  :-)

> Instead of the obfuscated code layout of
>    http://www.ruby-talk.com/cgi-bin/scat.rb/ruby/ruby-talk/12815
> it's definitely better for maintenance, debugging, updating,
> refactoring, to write complex regexen in a kinda BNF style:
>    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65125

> P.S. (they discussed that there in the first URL, but the output still
> is a mess.)

What is this email, beyond simple Perl bashing? Or, if not Perl bashing,
just a pointless mail? Are you saying you prefer reading source code to
compiler output???

You're comparing a *generated* regexp with a regexp *generator* and you
somehow say the former is "obfuscated", but it was never meant to be
hand-edited or even looked at. I haven't read the mentioned Perl book but
it seems to me that the former result was generated with a program very
similar to the latter program.

I've already seen the regexp-interpolation trick (from the Python
article) used in Perl, and I've used it myself in Ruby too.

What I've come up with in Ruby ended up being a bit different:

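class Regexp
  # pattern text only, so a Regexp can be interpolated into another one
  # (source instead of inspect, which mangles /.../i; flags are dropped)
  def to_s
    source
  end
  def +(b); Regexp.new "(?:#{self})(?:#{b})"; end    # sequence
  def -(b); Regexp.new "(?!#{b})(?:#{self})"; end    # negative lookahead
  def &(b); Regexp.new "(?=#{b})(?:#{self})"; end    # positive lookahead
  def |(b); Regexp.new "(?:#{self}|#{b})"; end       # alternative
  def *(n); Regexp.new "(?:#{self}){#{n},#{n}}"; end # fixed repetition
end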

Which allows some infix arithmetic on regexps, respectively: sequence,
negative lookahead, positive lookahead, alternative, fixed repetition.

This could also be done in Perl 5.6 (5.5?) using qr// (which is the
equivalent of Ruby's //) and operator overloading (which is the contorted
equivalent of Ruby's method def with a non-alphanumeric name).

I think Python is quite capable of all this stuff (except for the longish
re.compile(), but it's easy to get around that, as shown in the cited
example)
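A small usage sketch of such operators in current Ruby. To stay self-contained it re-opens Regexp itself with four of the operators, built from Regexp#source, with each operand wrapped in (?:...) so an alternation keeps its scope. The building blocks (a dotted-quad shape, a word that is not a keyword) are illustrative assumptions, not code from the thread.

```ruby
# Sketch only: four infix operators on Regexp, built from Regexp#source.
# Flags such as /i on the operands are not carried into the result.
class Regexp
  def +(other)  # sequence
    Regexp.new("(?:#{source})(?:#{other.source})")
  end

  def -(other)  # self, unless other matches at the same position
    Regexp.new("(?!#{other.source})(?:#{source})")
  end

  def |(other)  # alternative
    Regexp.new("(?:#{source}|#{other.source})")
  end

  def *(n)      # exactly n repetitions
    Regexp.new("(?:#{source}){#{n}}")
  end
end

# Hypothetical building blocks, composed with the operators above.
digit = /\d/
octet = digit * 3 | digit * 2 | digit     # one to three digits
ipv4  = octet + (/\./ + octet) * 3        # shape of a dotted quad only

word    = /[a-z]+\b/
keyword = /(?:if|end)\b/
ident   = word - keyword                  # a word that is not a keyword

p("host 192.168.0.1 up"[ipv4])            # prints "192.168.0.1"
p(/\A#{ident}\z/.match?("endless"))       # prints true
p(/\A#{ident}\z/.match?("end"))           # prints false
```

Ruby's usual operator precedence applies, so `*` binds tighter than `+`, which binds tighter than `|`.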

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Fri, 23 Apr 2004 23:33:03 GMT  
Quote:
----- Original Message -----




Sent: Monday, November 05, 2001 9:33 AM
Subject: [ruby-talk:24395] Re: Joys of eval

> What I've come up with in Ruby ended up to be a bit different:

> class Regexp
>       def to_s
>               x=inspect
>               x[1,x.length-2]
>       end
>       def +(b); Regexp.new "#{self}#{b}"; end
>       def -(b); Regexp.new "(?!#{b})#{self}"; end
>       def &(b); Regexp.new "(?=#{b})#{self}"; end
>       def |(b); Regexp.new "(?:#{self}|#{b})"; end
>       def *(n); Regexp.new "(?:#{self}){#{n},#{n}}"; end
> end

> Which allows some infix arithmetic on regexps, respectively: sequence,
> negative lookahead, positive lookahead, alternative, fixed repetition.

That's very interesting. I had been thinking about
some ways to enhance regexes, but this is a new
technique to me.

Do you have examples of using these?

Hal



Sat, 24 Apr 2004 04:46:15 GMT  

Quote:
> That's very interesting. I had been thinking about
> some ways to enhance regexes, but this is a new
> technique to me.

An alternative approach is distributed as eregexp with Ruby.

Dave



Sat, 24 Apr 2004 04:51:36 GMT  

Quote:
----- Original Message -----


Sent: Monday, November 05, 2001 2:49 PM
Subject: [ruby-talk:24423] Re: Joys of eval


> > That's very interesting. I had been thinking about
> > some ways to enhance regexes, but this is a new
> > technique to me.

> An alternative approach is distributed as eregexp with Ruby.

It's called eregex.rb (no p) on my system.

Never noticed that one before.

matju's version seems a little more sophisticated to me.

I don't normally use giant regexes anyway... though if they
were easier to manage, I might use bigger ones.

Has anyone ever tried to produce static software metrics
for regular expressions? It might be amusing to quantify
their hairiness.

"I have complete confidence in this mission, Dave."

Hal



Sat, 24 Apr 2004 05:12:53 GMT  

Quote:

> > Which allows some infix arithmetic on regexps, respectively: sequence,
> > negative lookahead, positive lookahead, alternative, fixed repetition.
> That's very interesting. I had been thinking about
> some ways to enhance regexes, but this is a new
> technique to me.
> Do you have examples of using these?

No, I deleted all relevant code because it didn't work properly. To make
any use of the Regexp class worthwhile to me, Regexps should support
embedded Ruby expressions, access to info like which branch of an
alternative was chosen, failure handling, and continuations based on the
availability of data.

So I decided to go the hand-crafted parser way because I didn't want to
use a parser generator for that. (For anything bigger I would've used a
parser generator, or I'd have used regexps but with the restriction that
the whole file must be loaded as a string beforehand).

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Sat, 24 Apr 2004 08:19:29 GMT  
Hello --

Quote:


> > > Which allows some infix arithmetic on regexps, respectively: sequence,
> > > negative lookahead, positive lookahead, alternative, fixed repetition.
> > That's very interesting. I had been thinking about
> > some ways to enhance regexes, but this is a new
> > technique to me.
> > Do you have examples of using these?

> No, I deleted all relevant code because it didn't work properly. To make
> any use of the Regexp class worthwhile to me, Regexps should support
> embedded Ruby expressions, access to info like which branch of an
> alternative was chosen, failure handling, and continuations based on the
> availability of data.

Well, one out of four ain't bad :-)

  irb 10> a = "abc"
   ==>"abc"
  irb 11> /#{"abcd".chop}/.match(a)[0]
   ==>"abc"

David

--
David Alan Black


Web:  http://pirate.shu.edu/~blackdav



Sat, 24 Apr 2004 10:16:47 GMT  

Quote:


> > No, I deleted all relevant code because it didn't work properly. To make
> > any use of the Regexp class worthwhile to me, Regexps should support
> > embedded Ruby expressions, access to info like which branch of an
> > alternative was chosen, failure handling, and continuations based on the
> > availability of data.
> Well, one out of four ain't bad :-)
>   irb 10> a = "abc"
>    ==>"abc"
>   irb 11> /#{"abcd".chop}/.match(a)[0]
>    ==>"abc"

That's zero of four. You see, in another post today I've demonstrated
heavy use of regexp-interpolation, but that's still not what I mean by
embedded Ruby expressions. I mean executing code each time a match is
found; in particular, binding a block to each pair of (collecting)
parentheses. That way I wouldn't get only the last match; I could get all
the matches instead, and perform some action as they are found.

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Sat, 24 Apr 2004 11:16:13 GMT  
Hello --

Quote:



> > > No, I deleted all relevant code because it didn't work properly. To make
> > > any use of the Regexp class worthwhile to me, Regexps should support
> > > embedded Ruby expressions, access to info like which branch of an
> > > alternative was chosen, failure handling, and continuations based on the
> > > availability of data.
> > Well, one out of four ain't bad :-)
> >   irb 10> a = "abc"
> >    ==>"abc"
> >   irb 11> /#{"abcd".chop}/.match(a)[0]
> >    ==>"abc"

> That's zero of four. You see, in another post today I've demonstrated
> heavy use of regexp-interpolation, but that's still not what I mean by
> embedded Ruby expressions. I mean executing code each time a match is
> found; in particular, binding a block to each pair of (collecting)
> parentheses. That way I wouldn't get only the last match; I could get all
> the matches instead, and perform some action as they are found.

I imagine this isn't what you meant either, but just to clarify for
me... what about the block form of String#scan?
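For comparison, a sketch of what the block form of String#scan does offer in current Ruby: the block runs once per match, and $~ inside it is that match's MatchData, so named groups show which alternative matched. It does not give every repetition of a group within a single match, which is part of what Mathieu asks for. TOKEN and classify are hypothetical names.

```ruby
# Hypothetical token grammar for illustration: numbers, words, operators.
TOKEN = /(?<num>\d+)|(?<word>[a-z]+)|(?<op>[-+*\/])/

def classify(str)
  found = []
  str.scan(TOKEN) do
    m = $~                                          # MatchData for this match
    kind = %i[num word op].find { |name| m[name] }  # which branch matched
    found << [kind, m[0]]
  end
  found
end

p classify("x + 42")   # prints [[:word, "x"], [:op, "+"], [:num, "42"]]
```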

David

--
David Alan Black


Web:  http://pirate.shu.edu/~blackdav



Sat, 24 Apr 2004 19:47:04 GMT  

D> I imagine this isn't what you meant either, but just to clarify for
D> me... what about the block form of String#scan?

Probably wrong, but I think that he wants something like Perl's (?{ ... })

Guy Decoux



Sat, 24 Apr 2004 19:57:40 GMT  
Hi Rik,


Quote:

> The delimiter set is variable with each invocation, but never
> '"', '(' or '\' (which are handled specially by the tokenizer
> anyway)

> Yes, you can have \" within ""

> I'm actually down to only three calls in the whole parser, having
> replaced everything else with regexps while porting to Ruby.




> Note: last two params mean 'skip comments' ( anything in () )
>       and 'quoted tokens' (anything in "" should be treated
>       as a token, keep the surrounding quotes)

> e.g.

> Delimiter:  .
> Input:      The.(quick)."brown.fox".jumps.("over").the."lazy.\"dog\""
> Output:     <The> <"brown.fox"> <jumps> <the> <"lazy."dog"">

Here's an approach that constructs a tokenizer object given the
delimiter string and skipComments/quotedTokens flags, which can
then be used repeatedly to tokenize strings.

Note, I haven't tried any timing tests with it.  I'd be curious
how much time the x.gsub! part adds to the process, for instance...

Also, the last two unit tests document its current behavior
in a couple of situations where I wasn't sure what the desired
behaviour necessarily was.

Regards,

Bill

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
require 'minitest/autorun'

class Regexp
    def to_s; inspect[1..-2]; end  # allow easy compositing of regexps
end

class Tokenizer
    ZapBackslashes = /\\(.)/

    # The bodies of initialize and tokenize were lost from the archived
    # copy; these two are reconstructed from construct_regexp and the tests.
    def initialize(delimStr, skipComments, quotedTokens)
        @regexp = construct_regexp(delimStr, skipComments, quotedTokens)
    end

    def tokenize(str)
        # one capture per match; skipped comments capture nil
        tokens = str.scan(@regexp).flatten.compact
        tokens.each {|x| x.gsub!(ZapBackslashes, '\1')}  # drop escapes
    end

    def construct_regexp(delimStr, skipComments, quotedTokens)
        tokenCh = /(?:\\.|[^#{delimStr}])/
        quotedTokenCh = /(?:\\.|[^"])/
        commentTokenCh = /(?:\\.|[^)])/
        comment = /\(#{commentTokenCh}*\)/
        if quotedTokens
            token = /"#{quotedTokenCh}*"|#{tokenCh}+/
        else
            token = /#{tokenCh}+/
        end
        if skipComments
            /(?:#{comment}|(#{token}))/
        else
            /(#{comment}|#{token})/
        end
    end
end

class TestTokenizer < Minitest::Test
    def test_tokenizer

        # test stripping comments
        t = Tokenizer.new(".", true, true)
        a = t.tokenize 'The.(quick)."brown.fox".jumps.("over").the."lazy.\"dog\""'
        assert_equal ['The', '"brown.fox"', 'jumps', 'the', '"lazy."dog""'], a

        # test kept comments
        t = Tokenizer.new(".", false, true)
        a = t.tokenize 'The.(quick)."brown.fox".jumps.("over").the."lazy.\"dog\""'
        assert_equal ['The', '(quick)', '"brown.fox"', 'jumps', '("over")', 'the', '"lazy."dog""'], a

        # test escaping
        a = t.tokenize 'ignore.\"escaped.quotes\".here'
        assert_equal ['ignore', '"escaped', 'quotes"', 'here'], a
        a = t.tokenize 'ignore.\(escaped.comment\).here'
        assert_equal ['ignore', '(escaped', 'comment)', 'here'], a
        a = t.tokenize 'before."embedded \"escaped quote".after'
        assert_equal ['before', '"embedded "escaped quote"', 'after'], a
        a = t.tokenize 'before.(embedded \)escaped close-comment).after'
        assert_equal ['before', '(embedded )escaped close-comment)', 'after'], a

        # test backslash de-escaping
        a = t.tokenize %Q{hello\\"there\\\\this\\\\\\\\morning}
        assert_equal [%Q{hello"there\\this\\\\morning}], a

        # test multiple delimiter chars
        t = Tokenizer.new(" \\r\\n", false, true)
        a = t.tokenize %Q{   this  \n(is\ra \n) \r "\n test "\r\n}
        assert_equal ['this', "(is\ra \n)", %Q{"\n test "}], a

        # does quote-handling need to happen within a comment?
        # this test assumes not:
        t = Tokenizer.new(".", false, true)
        a = t.tokenize 'ruby.("my)"dear)"'
        assert_equal ['ruby', '("my)', '"dear)"'], a

        # what's the expected behavior when quotedTokens==false?
        # here's the current behavior:
        t = Tokenizer.new(".", false, false)
        a = t.tokenize 'quotes."not.special".here'
        assert_equal ['quotes', '"not', 'special"', 'here'], a
    end
end
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
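The open question about how much the gsub! pass costs could be measured with Ruby's Benchmark module. A rough sketch on synthetic input: TOKEN_RE is a hypothetical stand-in for the regexp construct_regexp builds with "." as the delimiter, and no particular timings are claimed here.

```ruby
require 'benchmark'

# Hypothetical stand-ins for the tokenizer's pieces: "."-delimited tokens
# that may contain backslash escapes, plus the same de-escaping regexp.
TOKEN_RE        = /((?:\\.|[^.])+)/
ZAP_BACKSLASHES = /\\(.)/

input = (['plain', 'esc\.aped', 'quo\"ted'] * 20_000).join('.')

scan_only = Benchmark.realtime do
  input.scan(TOKEN_RE).flatten.compact
end

scan_and_gsub = Benchmark.realtime do
  input.scan(TOKEN_RE).flatten.compact.each { |x| x.gsub!(ZAP_BACKSLASHES, '\1') }
end

printf("scan only:   %.3fs\n", scan_only)
printf("scan+gsub!:  %.3fs\n", scan_and_gsub)
```

Running both blocks on the same input isolates the extra de-escaping pass; real numbers will depend on how many tokens actually contain backslashes.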



Sun, 25 Apr 2004 01:55:04 GMT  
#if Bill Kelly

Quote:
> Here's an approach that constructs a tokenizer object given the
> delimiter string and skipComments/quotedTokens flags, which can
> then be used repeatedly to tokenize strings.
> [...]

Great, thanks. I'll try it out and see what happens.

Rik



Sun, 25 Apr 2004 02:43:09 GMT  

Quote:

> Hi, I'm somewhat of a regexp-enjoying fiend myself; but after
> Randal Schwartz posted this horrendous thing last March:
> http://www.ruby-talk.com/cgi-bin/scat.rb/ruby/ruby-talk/12815
> I've been somewhat wary whenever someone mentions RFC822.  :-)

Instead of the obfuscated code layout of
   http://www.ruby-talk.com/cgi-bin/scat.rb/ruby/ruby-talk/12815
it's definitely better for maintenance, debugging, updating,
refactoring, to write complex regexen in a kinda BNF style:
   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65125

Tobi

P.S. (they discussed that there in the first URL, but the output still
is a mess.)
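The same BNF-style layout carries over to Ruby through regexp interpolation: name each rule as a Regexp and interpolate the smaller rules into the larger ones. A sketch with a made-up, deliberately simplified host-name grammar (not RFC 822 or RFC 1035); the constant names are hypothetical.

```ruby
# Hypothetical, deliberately simplified grammar:
#   letter    ::= [A-Za-z]
#   label     ::= letter (letter | digit | "-")*
#   domain    ::= label ("." label)*
#   host_line ::= "Host:" space* domain space*
LETTER    = /[A-Za-z]/
LABEL     = /#{LETTER}[A-Za-z0-9-]*/
DOMAIN    = /#{LABEL}(?:\.#{LABEL})*/
HOST_LINE = /\AHost:\s*(#{DOMAIN})\s*\z/   # each rule built from the ones above

m = HOST_LINE.match("Host: www.ruby-lang.org")
p m[1]                                      # prints "www.ruby-lang.org"
p HOST_LINE.match("Host: -bad.example")     # prints nil
```

Each rule stays short enough to read and test on its own, which is the maintenance benefit the Python recipe describes.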

--
Tobias Reif
http://www.pinkjuice.com/myDigitalProfile.xhtml

go_to('www.ruby-lang.org').get(ruby).play.create.have_fun
http://www.pinkjuice.com/ruby/



Fri, 23 Apr 2004 18:15:29 GMT  
 