Regular expressions question 

I'm trying to match the <script> ... </script> in the following HTML text
using a regular expression:
<script language="vbscr>ipt" for="idDoit" event="onclick">
Dim re, Match, Matches, s, m

  Set re = new RegExp

  re.Pattern = idPattern.value
  re.Global = true
  re.IgnoreCase = true

  s = idInput.value
  m = ""

  Set Matches = re.Execute(s)
  for each Match in Matches
   m = m & Match.Value & vbCrLf
  next

  idOutput.innerText = m
</script>

and using the pattern:

<.?script.*>[.\s]*

As far as my knowledge of regexps goes, that pattern should match everything
after the first line right up to and including the last line.

However, the pattern actually only matches as follows (starting position is
indicated):
_________________________________________________
0: <script language="vbscr>ipt" for="idDoit" event="onclick">

357: </script>
_________________________________________________

My question is: how can I match opening and closing HTML tags and their
content when spread across lines in a string?

TIA

John O'Connell



Mon, 16 Jun 2003 23:43:20 GMT  

Perl has a flag that tells the regex engine that the dot (.) should also match newlines and carriage returns. VBScript doesn't have that, but there is a simple workaround: [^\v] matches any character except a vertical tab, a control character that almost never appears in real text, so in practice it matches everything, line breaks included.
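To see why the pattern from the question stops at line ends, here is a minimal sketch. It uses Python's re module purely as a stand-in for the VBScript RegExp object, on the assumption that both engines treat the constructs shown here the same way. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text standing in for the HTML in the question.
text = '<script language="vbscript">\nDim re\nre.Global = True\n</script>\n'

# The pattern from the question. "." does not match a newline, and inside
# square brackets "." is a literal period, so [.\s]* only consumes
# periods and whitespace.
original = re.compile(r'<.?script.*>[.\s]*', re.IGNORECASE)
matches = [m.group(0) for m in original.finditer(text)]

# Only the two tag lines are found; the script body is skipped.
print(matches)

# A class that excludes just the vertical tab accepts a newline,
# while a bare dot does not.
any_char = re.compile(r'[^\v]')
print(any_char.match('\n') is not None)
print(re.match(r'.', '\n') is not None)
```

This reproduces the result reported in the question: one match for the opening tag line and one for the closing tag, with nothing in between.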

So, my solution would be

re.Pattern = "<script[^\v>]*>[^\v]*?</script>"

Note that the non-greedy quantifier is important (the ? after * makes it non-greedy). Without it, the regex will match from the first <script to the very last </script>, which is not what you want if there are multiple blocks.
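As a rough check of the pattern above, here is a sketch in Python (again only a stand-in for VBScript 5.5, assuming the two engines agree on [^\v] and *?). The input string is hypothetical.

```python
import re

# Hypothetical input: one script block spread over several lines.
html = '<p>intro</p>\n<script language="vbscript">\nDim s\ns = "x"\n</script>\n<p>end</p>'

# The pattern suggested above, unchanged apart from Python's raw-string syntax.
pattern = re.compile(r'<script[^\v>]*>[^\v]*?</script>', re.IGNORECASE)

found = pattern.search(html)
block = found.group(0)

# The match starts at the opening tag and ends at the closing tag,
# crossing the line breaks in between.
print(found.start())
print(block)
```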
Cheers,
Mark Bosley

Check out Chris Stefano's RegExTools at http://www.vbxml.com/regextools/



Tue, 17 Jun 2003 06:30:06 GMT  

Thanks for that. I tried your exact solution but that caused an "Unexpected quantifier" error when the regexp was executed. I removed the optional match (?) and it worked okay, but the greediness of the * becomes a problem. I'm using VBScript 5.1, so perhaps I should try your solution with 5.5 or 5.6.

I'm fairly new to RegExps (I'm currently working through Mastering Regular Expressions) and don't see why the following doesn't work...

<script[.\s>]*>[.\s]*</script>

...given that . (dot) matches any character and \s matches any whitespace char, including CR, LF, etc.

Thanks again

John O'Connell



Wed, 18 Jun 2003 02:52:37 GMT  

Whoops, right after I turned off my computer I realized I should have mentioned that this needs VBScript 5.5 (available at http://www.microsoft.com/msdownload/vbscript/scripting.asp).
This brings you much of what Perl offers in terms of patterns (although regexes are integrated into Perl to a remarkable degree).

Why do you need the non-greedy *? quantifier?
Consider the following:
<html>
<body>
<script language="JavaScript">
function foo() {
 return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
 return false;
}
</script>
</body>
</html>

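One note on [.\s] first: inside square brackets the dot is a literal period, so [.\s] matches only periods and whitespace, not "any character". That is why I use [^\v] here instead.

With <script[^\v>]*>[^\v]*</script>, the first part, <script[^\v>]*>, matches <script language="JavaScript">

Then [^\v]* matches everything to the end of the text, and the engine begins to back up, looking for a place where </script> matches. The first place it finds is the last </script>.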

This gives one match, running from the first <script to the last </script>:
<script language="JavaScript">
function foo() {
 return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
 return false;
}
</script>

By adding ? to *, the quantifier becomes non-greedy: the engine extends [^\v]*? one character at a time and checks for </script> after each step, so it stops at the first closing tag it reaches.

Thus with <script[^\v>]*>[^\v]*?</script>, we get two matches:
<script language="JavaScript">
function foo() {
 return true;
}
</script>

and

<script language="JavaScript">
function faa() {
 return false;
}
</script>
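The difference can be checked with a short sketch. It is written in Python as a stand-in for VBScript 5.5 (an assumption, not a claim that the two engines are identical) and uses the [^\v] form of the pattern from earlier in the thread; the variable names are made up.

```python
import re

# The sample page from this post, as a Python string.
page = """<html>
<body>
<script language="JavaScript">
function foo() {
 return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
 return false;
}
</script>
</body>
</html>"""

greedy = re.compile(r'<script[^\v>]*>[^\v]*</script>', re.IGNORECASE)
lazy = re.compile(r'<script[^\v>]*>[^\v]*?</script>', re.IGNORECASE)

greedy_hits = greedy.findall(page)
lazy_hits = lazy.findall(page)

print(len(greedy_hits))  # greedy: a single match spanning both blocks
print(len(lazy_hits))    # lazy: one match per script block
```

The greedy match swallows the <h1> line between the two blocks, while each lazy match ends at its own closing tag.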

* matches zero or more times, preferring as many as possible.
*? matches zero or more times, preferring as few as possible.
+ matches one or more times, preferring as many as possible.
+? matches one or more times, preferring as few as possible.
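The four quantifiers above can be compared side by side on a tiny subject string. This is a sketch in Python rather than VBScript, on the assumption that the greedy and non-greedy quantifiers behave the same way in both; the subject string is arbitrary.

```python
import re

subject = 'aaa'

# Each quantifier applied at the start of the same subject string.
results = {
    'a*': re.match(r'a*', subject).group(0),    # greedy, may match nothing
    'a*?': re.match(r'a*?', subject).group(0),  # non-greedy, may match nothing
    'a+': re.match(r'a+', subject).group(0),    # greedy, at least one
    'a+?': re.match(r'a+?', subject).group(0),  # non-greedy, at least one
}

for pattern, matched in results.items():
    print(f'{pattern!r} matched {matched!r}')
```

With nothing after the quantifier to force it further, the non-greedy forms settle for the minimum: the empty string for a*? and a single character for a+?.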

If you have plenty of free time, you can check out my page at
http://www.vbxml.com/people/bosley/rgxpage.asp
This is not a substitute for Mastering Regular Expressions

Cheers,
Mark




Wed, 18 Jun 2003 06:06:22 GMT  

Thanks for that info, Mark. Yep, I sussed that I'd need 5.5.

Thanks again

John




Fri, 20 Jun 2003 04:24:24 GMT  
 