: Small improvement(?) of re.py 

Hello all,

I have made a small improvement to re.py. I find re.py very usable
compared with regex(p).py, but one thing has bothered me from day one.

I love using if ... elif ... constructs, which was possible with
the old 'regex' module. With the new 're' module this is no longer
elegant (in my opinion, of course).

I have attached three small scripts. They do nothing useful and
serve only as examples.

The first two examples show how I have to do the task with an
unmodified re.py. The third script shows how it could be done if my
patch were applied. IMHO the third example is the most elegant
of the three.

I think my patch (fourth attachment) should not have any drawbacks.
Please let me know what you think about it.

Thanks in advance.

Bye,
Cle.

--
| Clemens Hintze * ACB/EO  ____  OMC-R Software Developement
| Phone: +49 30 7002-3241  \  /  ALCATEL Mobile Communication Division ITD
| Fax  : +49 30 7002-3851   \/   Colditzstr. 34-36, D-12099 Berlin, Germany

[ testreold.py < 1K ]
#!/bin/env python

import re, string

user = re.compile('^([^:]+):([^:]+)')
pseudo = re.compile('^([^:]+)')

fd = open('/etc/passwd', 'r')

for line in fd.readlines():
    mat = user.match(line)
    if mat:
        name, password = tuple(mat.group(1,2))
        print 'user='+name,'password (encoded)='+password
        continue
    mat = pseudo.match(line)
    if mat:
        name = string.strip(mat.group(1))
        print 'pseudouser='+name
        continue

[ testreold2.py < 1K ]
#!/bin/env python

import re, string

user = re.compile('^([^:]+):([^:]+)')
pseudo = re.compile('^([^:]+)')

fd = open('/etc/passwd', 'r')

for line in fd.readlines():
    mat1 = user.match(line)
    mat2 = pseudo.match(line)
    if mat1:
        name, password = tuple(mat1.group(1,2))
        print 'user='+name,'password (encoded)='+password
    elif mat2:
        name = string.strip(mat2.group(1))
        print 'pseudouser='+name

[ testre.py < 1K ]
#!/bin/env python

import re, string

user = re.compile('^([^:]+):([^:]+)')
pseudo = re.compile('^([^:]+)')

fd = open('/etc/passwd', 'r')

for line in fd.readlines():
    if user.match(line):
        mat = user.lastMatchObject
        name, password = tuple(mat.group(1,2))
        print 'user='+name,'password (encoded)='+password
    elif pseudo.match(line):
        mat = pseudo.lastMatchObject
        name = string.strip(mat.group(1))
        print 'pseudouser='+name

[ re.py.diff 1K ]
*** /home/omcr/public/lib/python1.5/re.py       Tue Jan 27 14:31:13 1998
--- ./re.py     Wed Feb 18 10:36:47 1998
***************
*** 87,92 ****
--- 87,93 ----
        self.flags = flags
        self.pattern = pattern
        self.groupindex = groupindex
+       self.lastMatchObject = None

      def search(self, string, pos=0, endpos=None):
        """Scan through string looking for a match to the pattern, returning
***************
*** 99,109 ****
        if regs is None:
            return None
        self._num_regs=len(regs)
!      
!       return MatchObject(self,
!                          string,
!                          pos, endpos,
!                          regs)

      def match(self, string, pos=0, endpos=None):
        """Try to apply the pattern at the start of the string, returning
--- 100,110 ----
        if regs is None:
            return None
        self._num_regs=len(regs)
!       self.lastMatchObject = MatchObject(self,
!                                          string,
!                                          pos, endpos,
!                                          regs)
!       return self.lastMatchObject

      def match(self, string, pos=0, endpos=None):
        """Try to apply the pattern at the start of the string, returning
***************
*** 116,125 ****
        if regs is None:
            return None
        self._num_regs=len(regs)
!       return MatchObject(self,
!                          string,
!                          pos, endpos,
!                          regs)

      def sub(self, repl, string, count=0):
        """Return the string obtained by replacing the leftmost
--- 117,127 ----
        if regs is None:
            return None
        self._num_regs=len(regs)
!       self.lastMatchObject = MatchObject(self,
!                                          string,
!                                          pos, endpos,
!                                          regs)
!       return self.lastMatchObject

      def sub(self, repl, string, count=0):
        """Return the string obtained by replacing the leftmost



Sun, 06 Aug 2000 03:00:00 GMT  

Hi,

I like this behaviour too, so I would vote for including it in
re.py. In my eyes it would make programming a lot easier.

Best regards, Oliver Andrich

--
Oliver Andrich, Rhein-Zeitung/RZ-Online, Schlossstrasse 42, D-56068 Koblenz




Sun, 06 Aug 2000 03:00:00 GMT  

Quote:

> I love using if ... elif ... constructs, which was possible with
> the old 'regex' module. With the new 're' module this is no longer
> elegant (in my opinion, of course).
> I think my patch (fourth attachment) should not have any drawbacks.
> Please let me know what you think about it.

When writing my very first Python script (we're considering
making a transition from Perl to Python in our company),
this was exactly what I found a little disturbing (apart from
the many things I liked): being forced to use continue statements
or to pre-match all regular expressions. So yes, I'd rather
write it the way you show in your 3rd example.

Regards, Jan

--
===================================================================
Jan Decaluwe              ===              Easics               ===
Design Manager            ===  VHDL-based ASIC design services  ===
Tel: +32-16-395 600          ===================================
Fax: +32-16-395 619      Interleuvenlaan 86, B-3001 Leuven, BELGIUM



Sun, 06 Aug 2000 03:00:00 GMT  

Clemens,

This feature was removed from re for a very specific reason.  It is
not thread safe: if two threads share the same compiled RE object, the
lastMatchObject attribute may be overwritten by the second thread
before the first thread has had a chance to look at it.

(You may argue "then don't use the feature in a program that
uses threads" -- but I worry that people would get so used to it that
they would use it in modules that they share and that eventually end
up being used in threaded programs.  So it's better to avoid the
problem at the source.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
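
The race described above can be reproduced deterministically. The following is only a minimal sketch in modern Python 3, not the 1998 re.py: `RememberingPattern` is a hypothetical wrapper that mimics the patch by storing the last match on a shared object, and the two `threading.Event` objects are there purely to force the unlucky interleaving that would otherwise happen at random.

```python
import re
import threading


class RememberingPattern:
    """Hypothetical wrapper mimicking the patch: it remembers the last match."""

    def __init__(self, pattern):
        self._compiled = re.compile(pattern)
        self.last_match = None  # shared state, like lastMatchObject in the patch

    def match(self, text):
        self.last_match = self._compiled.match(text)
        return self.last_match


user = RememberingPattern(r'^([^:]+):([^:]+)')
results = {}
a_matched = threading.Event()
b_matched = threading.Event()


def thread_a():
    if user.match('alice:x1'):
        a_matched.set()    # allow thread B to run its own match now
        b_matched.wait()   # wait until B has overwritten the shared attribute
        results['a'] = user.last_match.group(1)


def thread_b():
    a_matched.wait()
    user.match('bob:x2')   # silently replaces the match thread A relies on
    b_matched.set()


threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Thread A matched the 'alice' line but ends up reading the 'bob' match.
print(results['a'])
```

Keeping the value returned by match() in a local variable, as the unpatched scripts do, avoids the problem because no state is shared between threads.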



Sun, 06 Aug 2000 03:00:00 GMT  

Guido,

You are right, unfortunately! But then, what do you think about adopting
a proposal someone else made? He proposed introducing 'if-from' and
'while-from' statements in Python, which would make it possible to write:

...
    if mat from user.match(line):
        [name, password] = mat.group(1,2)
...

I think such constructs would make the language more elegant (IMHO, of
course). At our company, many beginners have been bothered by the lack of
such if/while-from constructs.

Bye,
Cle.

--
| Clemens Hintze * ACB/EO  ____  OMC-R Software Developement
| Phone: +49 30 7002-3241  \  /  ALCATEL Mobile Communication Division ITD
| Fax  : +49 30 7002-3851   \/   Colditzstr. 34-36, D-12099 Berlin, Germany
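
For readers on Python 3.8 or later: the 'if-from' syntax above was never added, but assignment expressions (the `:=` operator, PEP 572) allow a very similar if ... elif ... chain without any shared state. The sketch below is an illustration only; the `classify` function and its return values are hypothetical, while the two patterns are taken from the attached scripts.

```python
import re

user = re.compile(r'^([^:]+):([^:]+)')
pseudo = re.compile(r'^([^:]+)')


def classify(line):
    """Return a tuple describing the line, or None if nothing matches."""
    # Each branch binds its own match to a local name, so nothing is
    # stored on the compiled pattern object.
    if (mat := user.match(line)):
        name, password = mat.group(1, 2)
        return ('user', name, password)
    elif (mat := pseudo.match(line)):
        return ('pseudouser', mat.group(1).strip())
    return None


print(classify('root:x:0:0'))
print(classify('+nisuser\n'))
```

Because `mat` is a local variable, this form does not run into the thread-safety objection raised against lastMatchObject.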



Sun, 06 Aug 2000 03:00:00 GMT  

Quote:
> You are right, unfortunately! But then, what do you think about adopting
> a proposal someone else made? He proposed introducing 'if-from' and
> 'while-from' statements in Python, which would make it possible to write:

> ...
>     if mat from user.match(line):
>         [name, password] = mat.group(1,2)
> ...

> I think such constructs would make the language more elegant (IMHO, of
> course). At our company, many beginners have been bothered by the lack of
> such if/while-from constructs.

Clemens,

While I understand the desire of beginners to have all the features of
their previous favorite language in Python, that's not the way to make
them happy.  Besides, one person's favorite feature is another
person's worst nightmare.  Plus, I'd really rather keep the language
stable, at least at the syntactic level.

--Guido van Rossum (home page: http://www.python.org/~guido/)



Sun, 06 Aug 2000 03:00:00 GMT  

The change doesn't really bother me (not that it matters), save for two things:

- It is not thread-safe.
- It may do bad things with reference counting (i.e. match objects may wind up
  hanging around and holding references to big strings that are no longer needed).

If those aren't factors in your designs, I suppose that's not a problem for you.
I would think including the functionality would require that we flag it as
non-thread-safe in the manuals.

Possibly a non-thread-safe subclass optimised specifically for
scripting-language-style processing would be a better approach?
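
The reference-counting concern can be illustrated with a small sketch in modern Python 3. It assumes a hypothetical `RememberingPattern` wrapper that stores the last match the way the patch does; the only library fact relied on is that a match object keeps the searched string reachable through its `string` attribute.

```python
import re


class RememberingPattern:
    """Hypothetical wrapper mimicking the patch: it remembers the last match."""

    def __init__(self, pattern):
        self._compiled = re.compile(pattern)
        self.last_match = None

    def match(self, text):
        self.last_match = self._compiled.match(text)
        return self.last_match


header = RememberingPattern(r'^(\w+):')


def first_field(text):
    """Return the first colon-terminated field, or None."""
    mat = header.match(text)
    return mat.group(1) if mat else None


big = 'root:' + 'x' * 1_000_000
field = first_field(big)
del big  # the caller no longer holds the large string ...

# ... but it is still alive, reachable through the remembered match object.
retained = len(header.last_match.string)
print(field, retained)
```

With the unpatched behaviour, the match object would be released as soon as first_field() returned, and the large string with it.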

Enjoy yourselves,
Mike


GMT:
...
(Adding lastMatchObject attribute to re.RegexObject's)
...
________________________________
 M i k e   C .  F l e t c h e r


http://www.*-*-*.com/ ~mcfletch/
________________________________
 Design, Consultation, Training


