RE 
 RE

Hello, I'm new to Python and having some regular expression problems.
I want to find a pattern in an HTML page. I have the HTML page in a variable
named data.
I would like to search for a pattern that starts with:
-----------------------------------------------
<a href="http://news.walla.co.il/ts.cgi?tsscript=item&path=&id=163799"
class="w2b" style="color: #660000;"><span dir=rtl>
------------------------------------------------------
and ends with:
</span></a></td>

In the middle there can be anything.

import re

result = re.search(r'style="color: #660000;"><span dir=rtl>.*</span></a></td>', data)
if result:
    print(result.group())
else:
    print('no match')

But it doesn't seem to work.

Does anyone have any ideas?

Thanks.
Bashan.



Mon, 21 Jun 2004 06:33:24 GMT  
 RE
Hello,

In regular expressions, the dot (.) doesn't match newlines.
Try replacing .* with (.|\n)*
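A minimal sketch of the difference, assuming a made-up data string whose middle section contains a line break (the real page is not shown in the thread):

```python
import re

# Hypothetical sample page: the text between the start and end markers
# spans line breaks, which is what defeats a plain ".*".
data = ('<a href="x" class="w2b" style="color: #660000;"><span dir=rtl>\n'
        'headline text\n'
        '</span></a></td>')

plain = re.search(r'<span dir=rtl>.*</span></a></td>', data)
alternation = re.search(r'<span dir=rtl>(.|\n)*</span></a></td>', data)

print(plain)                    # None: "." stops at the newline
print(alternation is not None)  # True: "\n" is matched by the second alternative
```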

Christophe.

--
Christophe Delord
http://christophe.delord.free.fr


Fri, 25 Jun 2004 05:26:29 GMT  
 RE

Quote:

> In regular expressions, the dot (.) doesn't match newlines.

It does if you pass the re.S (re.DOTALL) flag, e.g.:

    re.search('<tag>(.*)</tag>', data, re.S)
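Applied to the pattern from the original question, a sketch might look like the following. The two-row data string is hypothetical, and the non-greedy .*? is an addition not discussed in the thread: with re.S, a greedy .* runs from the first start marker all the way to the last end marker on the page.

```python
import re

# Hypothetical page with two headline links; the real page from the
# question is not reproduced here.
data = (
    '<a href="a" class="w2b" style="color: #660000;"><span dir=rtl>\n'
    'first headline</span></a></td>\n'
    '<a href="b" class="w2b" style="color: #660000;"><span dir=rtl>\n'
    'second headline</span></a></td>\n'
)

start = r'style="color: #660000;"><span dir=rtl>'
end = r'</span></a></td>'

# re.S (re.DOTALL) lets "." match newlines as well.
greedy = re.search(start + r'(.*)' + end, data, re.S)
# A non-greedy ".*?" stops at the first closing marker instead of the last.
lazy = re.findall(start + r'(.*?)' + end, data, re.S)

print(greedy.group(1).count('headline'))  # 2: one match swallowed both rows
print(lazy)  # ['\nfirst headline', '\nsecond headline']
```

re.findall returns the captured middle section of every match, which is usually what you want when a page lists several headlines.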

--
         Carey Evans  http://home.clear.net.nz/pages/c.evans/

                             Cavem canus.



Fri, 25 Jun 2004 15:37:48 GMT  
 