Regular Expression to match HTML elements 

I am trying to write a regular expression that matches an img tag and
adds a prefix to relative URLs.  My problem is that I need to match
acceptable HTML that may not be well formed.

The regex below works fairly well; however, it is not perfect because it
will not match img tags that have no quotes around the src value.  I
tried to correct this by placing a ? after the [\"'] to match zero or one
" or ' characters.  However, this causes the expression to match any img
tag that contains a src attribute, regardless of whether the URL contains
http://.

This pattern works:

$pattern = "/(<img[^>]*src\s*\=\s*[\"'])([^http:\/\/][^>]*\b)([^>]*>)/i";

but this does not:

$pattern = "/(<img[^>]*src\s*\=\s*[\"']?)([^http:\/\/][^>]*\b)([^>]*>)/i";

Any help would be greatly appreciated.
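
The behaviour described above can be reproduced with a small sketch. It assumes Python's re module rather than the poster's own environment, and the sample tags and the example.com URL are made up for illustration. The key point is that [^http:\/\/] is a negated character class, so it tests only one character: "not h, t, p, : or /". Once the quote becomes optional, the engine can skip the quote and let the class consume it instead.

```python
import re

# Python translations of the two patterns from the post (an assumption:
# the post does not say which regex engine is used). [^http:/] is the
# same character set as the original [^http:\/\/].
QUOTED = re.compile(
    r"""(<img[^>]*src\s*=\s*["'])([^http:/][^>]*\b)([^>]*>)""", re.I)
OPTIONAL = re.compile(
    r"""(<img[^>]*src\s*=\s*["']?)([^http:/][^>]*\b)([^>]*>)""", re.I)

# Hypothetical sample tags.
absolute = '<img src="http://example.com/a.png">'
relative = '<img src="thumbs/a.png">'

# With a mandatory quote, the class sees "h" and the match fails.
quoted_abs = QUOTED.search(absolute)

# With an optional quote, the engine backtracks: the quote is left
# unmatched by ["']? and is then consumed by the negated class.
optional_abs = OPTIONAL.search(absolute)

# The class also rejects relative URLs that merely start with h, t or p.
quoted_rel = QUOTED.search(relative)

print(quoted_abs, optional_abs.group(2), quoted_rel)
```

In engines that support it, a negative lookahead such as (?!http:\/\/) is the usual way to say "not followed by this string"; whether it is available depends on the tool in use.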



Sun, 08 Jan 2006 05:59:46 GMT  

Quote:

> I am trying to write a regular expression that matches an img tag and
> adds a prefix to relative URLs.  My problem is that I need to match
> acceptable HTML that may not be well formed.

> The regex below works fairly well; however, it is not perfect because
> it will not match img tags that have no quotes around the src value.
> I tried to correct this by placing a ? after the [\"'] to match zero
> or one " or ' characters.  However, this causes the expression to
> match any img tag that contains a src attribute, regardless of whether
> the URL contains http://.

> This pattern works:

> $pattern =
> "/(<img[^>]*src\s*\=\s*[\"'])([^http:\/\/][^>]*\b)([^>]*>)/i";

> but this does not:

> $pattern =
> "/(<img[^>]*src\s*\=\s*[\"']?)([^http:\/\/][^>]*\b)([^>]*>)/i";

> Any help would be greatly appreciated.

Nope, neither of these works in awk.  Try another language!

--
Peter S Tillier
"Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?"



Sun, 08 Jan 2006 11:50:32 GMT  

Quote:

> I am trying to write a regular expression that matches an img tag and
> adds a prefix to relative URLs.  My problem is that I need to match
> acceptable HTML that may not be well formed.

 Parsing something that 'looks like HTML' to find tags is
 almost impossible using AWK or any other language. You need
 some kind of intelligent algorithm, which regexps
 certainly aren't.
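
One way to read "some kind of intelligent algorithm" is a tolerant HTML tokenizer instead of a single pattern. The following is a minimal sketch, assuming Python's standard html.parser module; the class name ImgSrcCollector, the helper relative_img_sources and the sample markup are hypothetical. It only finds the relative src values and does not rewrite the document.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImgSrcCollector(HTMLParser):
    """Collects the src value of every img tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; quotes are stripped.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def relative_img_sources(html):
    """Return the img src values that have no scheme and no leading //."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [s for s in collector.sources
            if not urlsplit(s).scheme and not s.startswith("//")]


# Hypothetical, deliberately sloppy markup: upper case, unquoted value,
# single quotes, an unclosed <p>.
sample = ("<p><IMG SRC=pics/a.png>"
          "<img src='http://example.com/b.png'>"
          "<img alt=\"x\" src=\"c.gif\" />")
result = relative_img_sources(sample)
print(result)
```

The parser takes care of quoting and case, so the test for "is this URL relative" becomes an ordinary string check rather than part of the pattern.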

--

 gpg_key http://www.wellu.org/key.pgp
No tears please, it's a waste of good suffering.



Sun, 08 Jan 2006 14:58:28 GMT  


% I am trying to write a regular expression that matches an img tag and
% adds a prefix to relative URLs.  My problem is that I need to match
% acceptable HTML that may not be well formed.

You need to do this with more than one test. Match the img tag, find the
src attribute, then test the value to see if it needs to be changed.
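
The three separate tests can be sketched as follows. This is an illustration in Python rather than awk, and the names IMG_TAG, SRC_ATTR, ABSOLUTE and add_prefix, the /static/ prefix and the sample markup are all assumptions. It also assumes that URLs with a scheme or a leading slash should be left alone, and it will still be fooled by a > inside an attribute value.

```python
import re

# Test 1: find each img tag (breaks if an attribute value contains ">").
IMG_TAG = re.compile(r"<img\b[^>]*>", re.I)

# Test 2: find the src attribute, whether double-quoted, single-quoted
# or bare. The lookbehind avoids matching names such as data-src.
SRC_ATTR = re.compile(
    r"""((?<![\w-])src\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""", re.I)

# Test 3: a value is left alone if it has a scheme or starts with "/".
ABSOLUTE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|/)", re.I)


def add_prefix(html, prefix):
    def fix_src(m):
        dq, sq, bare = m.group(2), m.group(3), m.group(4)
        if dq is not None:
            quote, value = '"', dq
        elif sq is not None:
            quote, value = "'", sq
        else:
            quote, value = "", bare
        if not value or ABSOLUTE.match(value):
            return m.group(0)  # nothing to change
        # Keep the original quoting style around the new value.
        return f"{m.group(1)}{quote}{prefix}{value}{quote}"

    def fix_tag(m):
        # Only the first src attribute in the tag is considered.
        return SRC_ATTR.sub(fix_src, m.group(0), count=1)

    return IMG_TAG.sub(fix_tag, html)


sample = ('<p><IMG SRC=pics/a.png> <img src="http://example.com/b.png"> '
          "<img alt=x src='c.gif'></p>")
fixed = add_prefix(sample, "/static/")
print(fixed)
```

Splitting the work this way keeps each pattern simple, and the "is it relative" decision no longer has to be squeezed into the same expression that locates the tag.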

--

Patrick TJ McPhee
East York  Canada



Mon, 09 Jan 2006 00:03:31 GMT  
 