Regular Expression Problem 

Hi,

I'm having trouble with a script that auto-hyperlinks plain-text URLs. A
simple regular expression substitution does not seem to recognise URLs that
contain a question mark. I've isolated the problem in the following script:

#!/usr/local/bin/perl

$url = "http://www.example.com/script.pl?param=1";
$url =~ s/$url/&lt;<a href=\"$url\">$url<\/a>&gt;/;
print "$url";

It returns:

http://www.example.com/script.pl?param=1

when it should return:

&lt;<a href="http://www.example.com/script.pl?param=1">http://www.example.com/script.pl?param=1</a>&gt;

I'd appreciate any help on this.

By the way, does anyone know where I can find the most complete URL regex?

Thanks a lot.



Wed, 22 Sep 2004 12:44:34 GMT  
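The problem is the special regex characters in your URL, especially the "?",
which is a quantifier: it makes the preceding "l" optional, so the literal "?"
in the string is never matched. The "."s also match any character, which is
harmless here but imprecise. The "/"s only matter if you type the URL
literally into an s/// that uses "/" as its delimiter.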
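A small check of that explanation (the second URL is a made-up string chosen only to show the effect): used unescaped, the URL pattern fails to match itself but does match a string in which the "l" and "?" are simply missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = "http://www.example.com/script.pl?param=1";

# Interpolated without escaping, "l?" means "an optional l" and each "."
# means "any character", so the literal "?" in the URL is never matched.
my $matches_itself = ($url =~ /$url/) ? 1 : 0;
my $matches_other  = ("http://www.example.com/script.pparam=1" =~ /$url/) ? 1 : 0;

print "matches itself: $matches_itself\n";   # prints "matches itself: 0"
print "matches other:  $matches_other\n";    # prints "matches other:  1"
```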

You had:

Quote:
> $url = "http://www.example.com/script.pl?param=1";
> $url =~ s/$url/&lt;<a href=\"$url\">$url<\/a>&gt;/;
> print "$url";

> Which returns:

> http://www.example.com/script.pl?param=1

One trick is to avoid "/" as the delimiter. You might manage without
this, but the pattern is certainly easier to read with another delimiter
character, such as "#".

I rewrote it like this so that I could see whether the match had
failed:

$url = "http://www.example.com/script.pl?param=1";
if ($url =~ s/$url/&lt;<a href=\"$url\">$url<\/a>&gt;/) {
  print "$url";
} else {
  print "\$url is not matching $url\n";
}

With "#" as the delimiter and the "?" escaped, the match succeeds:

$url = "http://www.example.com/script.pl?param=1";
if ($url =~ s#http://www.example.com/script.pl\?param=1#&lt;<a href=\"$url\">$url<\/a>&gt;#) {
  print "$url";
} else {
  print "\$url is not matching $url\n";
}
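Another option, not raised in the thread, keeps the URL in a variable: wrap it in \Q...\E (the same escaping that quotemeta performs) so every metacharacter in it is matched literally. A minimal sketch based on the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = "http://www.example.com/script.pl?param=1";

# \Q...\E escapes ".", "?", "/" and the other metacharacters in $url,
# so the pattern matches the URL text literally.
if ($url =~ s/\Q$url\E/&lt;<a href="$url">$url<\/a>&gt;/) {
    print "$url\n";
} else {
    print "\$url is not matching $url\n";
}
```

The replacement still interpolates the original value of $url, because the right-hand side is built before the substitution is stored back into the variable.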

Try HTML::LinkExtor. It seems to be part of the
standard ActivePerl distribution. Its documentation
describes it as follows:

"HTML::LinkExtor is an HTML parser that extracts
links from an HTML document. The HTML::LinkExtor
is a subclass of HTML::Parser. This means that the
document should be given to the parser by calling the
$p->parse() or $p->parse_file() methods...."
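A minimal sketch of HTML::LinkExtor in use, assuming the HTML-Parser distribution that provides it is installed; the HTML snippet is made up. The callback receives the tag name followed by attribute/URL pairs for link attributes only, so it reports links that are already coded in tags rather than bare URLs sitting in text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample document.
my $html = <<'HTML';
<p>See <a href="http://www.example.com/script.pl?param=1">the script</a>
and <img src="/images/logo.gif">.</p>
HTML

my @hrefs;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    # Keep only the href of <a> tags; <img src=...> is reported too but skipped.
    push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
});
$parser->parse($html);
$parser->eof;

print "$_\n" for @hrefs;
```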

Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds




Wed, 22 Sep 2004 17:00:31 GMT  
I replied originally because I am interested in a similar problem.
My issue is to "URLify" ordinary text that contains references to
other sites.

Background:  I was given a large web site by a friend
who has spent years gathering info for student pilots and
flight instructors but who is not well versed in web design.
His web site is a treasure trove of information, but it frequently
lacks some user-interface niceties, such as having the URL references
actually CODED as links instead of plain text.

This is not consistent, however: some are coded while
others are not.

I very much want to process these files to wrap link code
around the uncoded URLs so they become hyperlinks.
Finding URLs is a non-trivial task; finding only those URLs
that appear as plain text, rather than inside link markup, is even harder.

Since most of these are on a line by themselves or have
minimal markup around them, I have tried the following (naive) match:

     print if (m#^(\s*http://)([^\s<>]+)(<.+>)*$#i);

It finds things like:
   http://www.monmouth.com/~jsd/how</FONT><BR>
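A minimal sketch of how that match could feed a substitution that actually wraps the URL, assuming (as above) that the plain-text URLs start their line. The sample lines and the follow-up s### are hypothetical illustrations, not code run against the real site. A line that already begins with an <a> tag fails the match and is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the site's HTML.
my @lines = (
    "   http://www.monmouth.com/~jsd/how</FONT><BR>\n",
    "<a href=\"http://www.example.com/\">http://www.example.com/</a>\n",
    "Some ordinary text\n",
);

for (@lines) {    # $_ is aliased to each element, so s### edits @lines
    # The naive match: optional leading space, "http://", a run of
    # non-space, non-angle-bracket characters, optional trailing tags.
    if (m#^(\s*http://)([^\s<>]+)(<.+>)*$#i) {
        # Wrap the bare URL, keeping leading space and trailing tags.
        s#^(\s*)(http://[^\s<>]+)#$1<a href="$2">$2</a>#i;
    }
    print;
}
```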

This may be one of those problems where, if I can meet 90% of the
need with twenty minutes of coding while the other 10% will never
be perfect, I should just take the easy way out.

Concrete suggestions will be greatly appreciated....

Herb Martin



Wed, 22 Sep 2004 17:43:26 GMT  

Quote:
> I replied originally due to an interest in a similar problem.
> My issue is to "URLify" ordinary text that contains references to
> other sites.
>[...]
> Concrete suggestions will be greatly appreciated....

You might want to take a look at

        <http://www.foad.org/~abigail/Perl/url2.html>

--
felix



Wed, 22 Sep 2004 18:08:56 GMT  

Quote:
> You might want to take a look at

>     <http://www.foad.org/~abigail/Perl/url2.html>

Thanks. I did, and I do appreciate the reference.
I haven't had a chance to play with the results yet, but
it may well help solve the (full) problem.

It is interesting to note a couple of things:

1) Tom Christiansen (whom most would recognize as a well-known
Perl wizard) has commented on and offered bug fixes to
the author of this Perl utility. This doesn't guarantee
quality, but it does suggest that this is a well-thought-out
idea.

2) This is a utility that creates regexes: the Perl code
generates Perl code. This is, I believe, a direct reflection
of the difficulty of writing a generic URL recognizer.
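To illustrate the idea of Perl code assembling a regex from smaller pieces, here is a minimal sketch using qr// objects. It is not the code from the page referenced above; the grammar is a deliberately simplified, hypothetical subset (http:// only, dotted host names, optional port, path and query) that will both miss and over-match real URLs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, heavily simplified building blocks for an http:// URL.
my $alnum = qr/[A-Za-z0-9]/;
my $label = qr/$alnum(?:[-A-Za-z0-9]*$alnum)?/;     # one host-name label
my $host  = qr/$label(?:\.$label)+/;                # at least one dot
my $port  = qr/:[0-9]+/;
my $pchar = qr/[-A-Za-z0-9._~%!\$&'()*+,;=:\@]/;    # path/query character
my $path  = qr{(?:/$pchar*)*};
my $query = qr{\?(?:$pchar|[/?])*};

# Each qr// object is interpolated into the larger pattern, so the final
# regex is assembled by Perl code rather than written out by hand.
my $http_url = qr{http://$host(?:$port)?$path(?:$query)?};

my $text  = "See http://www.example.com/script.pl?param=1 and http://www.monmouth.com/~jsd/how";
my @found = $text =~ /($http_url)/g;
print "$_\n" for @found;
```

Printing $http_url itself shows the single large pattern that the pieces expand into.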

It's a non-trivial problem.  <grin>

Now I will have to see whether I can stop it from recognizing already
coded URLs and have it match only those that are "in text". Perhaps
I will use HTML::Parser to first split the HTML document into its
parts and avoid running this recognizer against anything
except text.
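A minimal sketch of that plan, assuming HTML::Parser with its version 3 API is installed; the sample HTML, the $in_link counter and the bare-URL pattern are hypothetical choices, not taken from the thread. Tags are copied through unchanged, and only text outside an <a> element goes through the URL substitution, so already-coded links are left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample document: one bare URL, one already-coded link.
my $html = <<'HTML';
<p>Plain: http://www.example.com/script.pl?param=1</p>
<p>Coded: <a href="http://www.example.com/">http://www.example.com/</a></p>
HTML

my $out     = '';
my $in_link = 0;    # how many <a> elements are currently open

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,    # deliver each text run in one piece
    start_h => [ sub {
        my ($tag, $text) = @_;
        $in_link++ if $tag eq 'a';
        $out .= $text;     # copy the original tag text unchanged
    }, 'tagname, text' ],
    end_h => [ sub {
        my ($tag, $text) = @_;
        $in_link-- if $tag eq 'a' && $in_link > 0;
        $out .= $text;
    }, 'tagname, text' ],
    text_h => [ sub {
        my ($text) = @_;
        # Only linkify text that is not already inside an <a> element.
        $text =~ s#(http://[^\s<>"]+)#<a href="$1">$1</a>#g unless $in_link;
        $out .= $text;
    }, 'text' ],
    default_h => [ sub { $out .= shift }, 'text' ],    # comments, declarations
);
$p->parse($html);
$p->eof;
print $out;
```

The simple [^\s<>"]+ pattern could be swapped for a fuller URL regex; as written it will, for example, swallow punctuation that directly follows a URL.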

Thanks,

Herb Martin



Wed, 22 Sep 2004 18:31:27 GMT  
 