Replace with regular expressions "except if..." 

Hi,
I am having difficulty figuring out how to write a regular
expression and hope that someone in the Tcl world can help me out here.

The goal is to find any URLs in a text and expand each one into an HTML
anchor, e.g. expand http://fee.foo.fum into <a
href="http://fee.foo.fum">http://fee.foo.fum</a>. The problem is that I
don't want to expand lines if they are already written as HTML. I

works for most general cases. What I can't figure out is how to tell
it not to expand a URL that is inside a tag. It would suffice if the URL
inside the <a href...> tag is not expanded, but if it is possible to
also leave alone what is between the tags I would be even happier,
i.e. if someone writes <a href="...">http://fee.foo.fum</a>.

I would be eternally grateful if someone could help me out here.

Best regards, Thomas Nielsen

Sent via Deja.com http://www.deja.com/
Before you buy.



Mon, 12 May 2003 03:00:00 GMT  
Try this:

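# Candidate pattern: group 1 is the leading whitespace, group 2 the whole URL-like run
set RE {(\s+)(([^:"=]+):(//)?([^:/\s]+)(:[0-9]+)?/?([^\s<]*))}
# Sample input mixing tags, an existing anchor, and plain text
set someText {
<random tag> not.an.url.com
<a href="http://myhost.com/helloworld.html">www.hello.com</a>


other stuff

}

# Wrap each match in an anchor, keeping the leading whitespace
regsub -all $RE $someText {\1<a href="\2">\2</a>} output

puts $output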




Mon, 12 May 2003 03:00:00 GMT  
Thanks Neil,
I tried this but it matches both too much and too little. I tried putting a
URL after the last string ("other stuff") and it chose to match \2 from
the "t" in "stuff" onward :-).

The expression I showed initially matches any string beginning
with ...:// and returns it in \1 so that I can use it in {<a
href="\1">\1</a>}. I just don't want it to match addresses that are
already inside <a href...> tags. I have tried figuring out how to
use lookahead and negate it (De Morgan-wise), but it is really not
very easy for me to construct. I don't know any formalized methods of
designing these regular expressions and my brain is simply too small to
keep it in the air without one. I truly admire those who do.
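
For illustration only, here is a minimal sketch of the kind of substitution described above: match anything of the form scheme://... and wrap \1 in an anchor. The pattern in urlRE is an assumption (the expression from the original post is not preserved in this thread), and the sample strings are made up. It shows why text that is already an anchor gets expanded a second time.

```tcl
# Assumed pattern, not the poster's: a scheme, "://", then everything up to
# whitespace, a double quote or an angle bracket.
set urlRE {([a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>"]+)}

# Made-up sample inputs: one bare URL, one URL already written as an anchor.
set plain  {See http://fee.foo.fum for details.}
set linked {See <a href="http://fee.foo.fum">http://fee.foo.fum</a> for details.}

# The naive substitution wraps every match, wherever it occurs.
regsub -all $urlRE $plain  {<a href="\1">\1</a>} out1
regsub -all $urlRE $linked {<a href="\1">\1</a>} out2

puts $out1   ;# the bare URL becomes an anchor, as intended
puts $out2   ;# the URL in href="..." and the link text are both wrapped again
```

The pattern has no notion of where in the markup a match sits, which is the "except if..." part of the question.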

Best regards, Thomas Nielsen




Tue, 13 May 2003 03:00:00 GMT  
Yes, I see it doesn't work right. Unfortunately, that's about where my
knowledge of regexps comes to an end. :-( I am sure you should be able
to do this with regexps, but I think that one perhaps wouldn't be
enough. The problem is when you have nested tags, e.g.:

<a href="http://something"><b>http://something</b></a>
- this doesn't need to be expanded

<a href="http://something">something</a><b>http://hello</b>
                                           ^^^^^^^^^^^^
- this does need to be expanded

For these cases, you need to build up a knowledge of the structure of
the HTML before and after the URL you match (which would become hugely
complicated (impossible?) with one regexp). I would suggest finding an
HTML parser in Tcl. The Tcl-XML parser by Steve Ball (<URL:
http://sourceforge.net/projects/tclxml>) could also provide a starting
point. You would have to parse looking for URLs as well as tags, and
then decide if the URL came after an <a href...> but before a </a>. Good
luck!
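
As a rough sketch of that idea without a full parser: scan the text for tags and URLs together, remember whether an <a ...> is currently open, and only wrap URLs found outside tags and outside anchors. The proc name linkify, the token pattern and the sample strings are all hypothetical, and this is not a real HTML parser (comments, scripts, or a ">" inside an attribute value will confuse it).

```tcl
# Hypothetical helper: wrap bare URLs in anchors, leaving alone anything
# inside a tag or between <a ...> and </a>.
proc linkify {text} {
    # One token per match: group 1 is a whole tag, group 2 a scheme://... URL.
    # A URL inside a tag is never seen on its own, because the tag
    # alternative matches first (further left) and consumes it.
    set tokenRE {(<[^>]*>)|([a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>"]+)}
    set out ""
    set pos 0
    set inAnchor 0
    while {[regexp -indices -start $pos $tokenRE $text all tag url]} {
        foreach {first last} $all break
        # Copy the text between the previous token and this one unchanged.
        append out [string range $text $pos [expr {$first - 1}]]
        set token [string range $text $first $last]
        if {[lindex $tag 0] >= 0} {
            # A tag: track whether we are between <a ...> and </a>.
            if {[regexp -nocase {^<a[\s>]} $token]} {
                set inAnchor 1
            } elseif {[regexp -nocase {^</a[\s>]} $token]} {
                set inAnchor 0
            }
            append out $token
        } elseif {$inAnchor} {
            # A URL that is already link text: leave it alone.
            append out $token
        } else {
            # A bare URL: expand it.
            append out "<a href=\"$token\">$token</a>"
        }
        set pos [expr {$last + 1}]
    }
    # Copy whatever follows the last token.
    append out [string range $text $pos end]
    return $out
}

puts [linkify {<a href="http://something"><b>http://something</b></a>}]
puts [linkify {<a href="http://something">something</a><b>http://hello</b>}]
```

Keeping the anchor state in a variable outside the pattern is what lets the nested <b> case work; the regexp itself only has to recognise one token at a time.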




Tue, 13 May 2003 03:00:00 GMT  
Thank you Neil,
I will take a look at the parser you mention.

It is definitely one of the trickiest disciplines I have come across :-).

Best regards, Thomas Nielsen




Tue, 13 May 2003 03:00:00 GMT  