
Regular expressions question
Whoops, right after I turned off my computer I realized I should have mentioned that this needs VBScript 5.5 (available at http://www.microsoft.com/msdownload/vbscript/scripting.asp).
Version 5.5 gives you much of what Perl offers in terms of patterns, although regexes are integrated into Perl itself to a remarkable degree.
Why do you need the non-greedy *? quantifier?
Consider the following:
<html>
<body>
<script language="JavaScript">
function foo() {
return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
return false;
}
</script>
</body>
</html>
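With <script[.\s>]*>[.\s]*</script>, the regex engine first matches the opening <script language="JavaScript"> tag.
The greedy [.\s]* then matches everything to the end of the text, after which the engine backs up one character at a time until it can match </script>.
This gives one match, running from the first <script language="JavaScript"> to the last </script>: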
<html>
<body>
<script language="JavaScript">
function foo() {
return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
return false;
}
</script>
</body>
</html>
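Adding ? to * makes the quantifier non-greedy: after each character matched by [.\s], the engine checks whether </script> can match next, and it stops at the first one it finds.
Thus with <script[.\s>]*>[.\s]*?</script>, we get two matches, one for each <script>...</script> block: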
<html>
<body>
<script language="JavaScript">
function foo() {
return true;
}
</script>
<h1>Text</h1>
<script language="JavaScript">
function faa() {
return false;
}
</script>
</body>
</html>
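For anyone who wants to see the one-match versus two-match difference for themselves, here is a minimal sketch in JavaScript, whose regex syntax I am assuming is close enough to VBScript 5.5's RegExp for this purpose. Note that it swaps in [^>]* and [\s\S]* for the [.\s>]* and [.\s]* classes used above; that substitution is my own assumption, not the pattern from the post. The variable names are just for illustration.

```javascript
// Sample page from the post, with two separate <script> blocks.
const html = [
  '<html>',
  '<body>',
  '<script language="JavaScript">',
  'function foo() {',
  'return true;',
  '}',
  '</script>',
  '<h1>Text</h1>',
  '<script language="JavaScript">',
  'function faa() {',
  'return false;',
  '}',
  '</script>',
  '</body>',
  '</html>',
].join('\n');

// Assumption: [^>]* and [\s\S]* stand in for the post's [.\s>]* and [.\s]*.
const greedy = /<script[^>]*>[\s\S]*<\/script>/g; // * prefers the longest run
const lazy = /<script[^>]*>[\s\S]*?<\/script>/g;  // *? prefers the shortest run

const greedyMatches = html.match(greedy) || [];
const lazyMatches = html.match(lazy) || [];

console.log('greedy matches:', greedyMatches.length);
console.log('lazy matches:', lazyMatches.length);
```

The greedy version should swallow the <h1> line between the two blocks, while the lazy version stops at the first </script> it reaches and then starts over on the second block.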
* matches zero or more times, preferring as many as possible.
*? matches zero or more times, preferring as few as possible.
+ matches one or more times, preferring as many as possible.
+? matches one or more times, preferring as few as possible (that is, one).
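A quick way to check those four preferences is to run each quantifier against the same short string. This is only a sketch in JavaScript (again assuming its quantifiers behave like the VBScript 5.5 ones); the string 'aaa' and the variable names are made up for the example.

```javascript
const text = 'aaa';

// With nothing after the quantifier, each one settles on its preferred count.
const star = text.match(/a*/)[0];      // greedy, zero or more
const starLazy = text.match(/a*?/)[0]; // non-greedy, zero or more
const plus = text.match(/a+/)[0];      // greedy, one or more
const plusLazy = text.match(/a+?/)[0]; // non-greedy, one or more

console.log(JSON.stringify({ star, starLazy, plus, plusLazy }));
// prints {"star":"aaa","starLazy":"","plus":"aaa","plusLazy":"a"}
```

The non-greedy forms only take more than their minimum when whatever follows them in the pattern forces them to.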
If you have plenty of free time, you can check out my page at
http://www.vbxml.com/people/bosley/rgxpage.asp
That page is not a substitute for Mastering Regular Expressions, though.
Cheers,
Mark
Thanks for that. I tried your exact solution, but it caused an "Unexpected quantifier" error when the regexp was executed. I removed the ? after the * and it ran okay, but then the greediness of the * becomes a problem. I'm using VBScript 5.1, so perhaps I should try your solution with 5.5 or 5.6.
I'm fairly new to RegExps (I'm currently working through Mastering Regular Expressions) and don't see why the following doesn't work...
<script[.\s>]*>[.\s]*</script>
...given that . (dot) matches any character and \s matches any whitespace character, including CR, LF, etc.
Thanks again
John O'Connell
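One thing that may be worth checking for the question above: in most regex flavours a dot inside square brackets is a literal full stop, not "any character". The sketch below shows this in JavaScript; whether VBScript's RegExp treats it the same way is an assumption on my part, and the sample strings and names are invented for the example. [\s\S] and [^>] are common substitutes, not something taken from the posts.

```javascript
const sample = 'ab.c \n';

// Outside a class, dot matches any character except a line break.
const dotOutside = sample.match(/./g);
// Inside a class, dot is a literal '.', so only '.', ' ' and '\n' match here.
const dotInClass = sample.match(/[.\s]/g);
// [\s\S] (whitespace or non-whitespace) really does match every character.
const anyChar = sample.match(/[\s\S]/g);

console.log(dotOutside.length, dotInClass.length, anyChar.length); // prints 5 3 6

// Consequence for the opening tag: the letters in language="JavaScript"
// are not in [.\s>], so the bracketed version cannot get past them.
const tag = '<script language="JavaScript">';
const classVersion = /<script[.\s>]*>/.test(tag);
const negatedVersion = /<script[^>]*>/.test(tag);

console.log(classVersion, negatedVersion); // prints false true
```

If that holds for VBScript too, it would explain why the bracketed pattern fails regardless of greediness.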