complex regular expression question 

I'm playing with something that compares two long documents together. It
basically looks for new words that have been added, and new instances of
words. Unfortunately the code is a bit unwieldy, and not that robust. I'm
thinking of moving it all to using regular expressions. Does anyone have any
ideas of the sorts of match patterns and logic that might work for this?

Cheers,

Tim.



Wed, 01 Jun 2005 00:43:32 GMT  

Quote:

> I'm playing with something that compares two long documents together.
> It basically looks for new words that have been added, and new
> instances of words. Unfortunately the code is a bit unwieldy, and not
> that robust. I'm thinking of moving it all to using regular
> expressions. Does anyone have any ideas of the sorts of match
> patterns and logic that might work for this?

I'm not sure what your exact definition of a document is but if it is (or
can be represented as) plain text...

The simple example below uses two locally defined string variables.  These
each represent the total text of two separate text document sources.  For
example, they could be the contents of two text files read using FSO
methods.
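As a rough sketch of that FSO route, the two strings could be loaded along the lines below. The file names "old.txt" and "new.txt" and the helper ReadAllText are made up for illustration; substitute your own paths.

```vbscript
' Sketch only: "old.txt" and "new.txt" are hypothetical file names.
Const ForReading = 1

Set fso = CreateObject("Scripting.FileSystemObject")

strThis = ReadAllText(fso, "old.txt")
strThat = ReadAllText(fso, "new.txt")

' Hypothetical helper: returns the whole file as one string.
Function ReadAllText(fso, path)
  Dim ts
  Set ts = fso.OpenTextFile(path, ForReading)
  If ts.AtEndOfStream Then
    ' ReadAll raises an error on an empty file, so return "" instead
    ReadAllText = ""
  Else
    ReadAllText = ts.ReadAll
  End If
  ts.Close
End Function
```

These two assignments would then take the place of the literal strThis and strThat strings used in the example.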

A RegExp is used to parse the text into words.  Two dictionary objects are
then used to keep a count of the occurrences of each unique word in each
text source.  The dictionaries use text rather than binary compare mode, so
words are tracked case-insensitively.

It then uses the two dictionaries to determine the words that are common to
each text source and also the words that are unique to each.

strThis = "one two, three. four"
strThat = "seven one, nine. four"

Set sdThisWords = CreateObject("Scripting.Dictionary")
sdThisWords.comparemode = vbTextCompare

Set sdThatWords = CreateObject("Scripting.Dictionary")
sdThatWords.comparemode = vbTextCompare

set reWordParse = new regexp
reWordParse.pattern = "\w+"
reWordParse.global = true

for each match in reWordParse.execute(strThis)
  'counts occurrences of each unique word (case insensitive)...
  word = cstr(match.value)
  sdThisWords(word) = sdThisWords(word) + 1
  'alternative: just track which words occur, without counting
  'sdThisWords(match.value) = true 'could be any value...
next

for each match in reWordParse.execute(strThat)
  'counts occurrences of each unique word (case insensitive)...
  word = cstr(match.value)
  sdThatWords(word) = sdThatWords(word) + 1
  'alternative: just track which words occur, without counting
  'sdThatWords(match.value) = true 'could be any value...
next

wscript.echo "strThis=",strThis
wscript.echo "strThat=",strThat
wscript.echo
wscript.echo "words in strThis that are also in strThat"

for each word in sdThisWords.keys
  if sdThatWords.Exists(word) then
    wscript.echo word
  end if
next

wscript.echo
wscript.echo "words in strThis that are NOT in strThat"

for each word in sdThisWords.keys
  if Not sdThatWords.Exists(word) then
    wscript.echo word
  end if
next

wscript.echo
wscript.echo "words in strThat that are NOT in strThis"

for each word in sdThatWords.keys
  if Not sdThisWords.Exists(word) then
    wscript.echo word
  end if
next
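
The loops above only test whether a word is present.  Since the dictionaries also hold counts, the "new instances of words" part of the question could be handled by comparing the counts.  The following is a minimal, standalone sketch of that idea rather than a tested solution; the names strOld, strNew, sdOld, sdNew and the helper CountWords are made up for illustration, and it assumes the first string is the older document.

```vbscript
' Sketch only: all names here are hypothetical, not from the example above.
strOld = "one two, three. four"
strNew = "seven one, nine. four one"

Set sdOld = CountWords(strOld)
Set sdNew = CountWords(strNew)

WScript.Echo "words with more occurrences in strNew than in strOld"

For Each word In sdNew.Keys
  ' a word missing from the old text counts as zero occurrences
  If sdOld.Exists(word) Then
    oldCount = sdOld(word)
  Else
    oldCount = 0
  End If
  If sdNew(word) > oldCount Then
    WScript.Echo word & ": " & oldCount & " -> " & sdNew(word)
  End If
Next

' Builds a case-insensitive word-count dictionary for one string.
Function CountWords(strText)
  Dim sd, re, m, w
  Set sd = CreateObject("Scripting.Dictionary")
  sd.CompareMode = vbTextCompare
  Set re = New RegExp
  re.Pattern = "\w+"
  re.Global = True
  For Each m In re.Execute(strText)
    w = CStr(m.Value)
    sd(w) = sd(w) + 1
  Next
  Set CountWords = sd
End Function
```

A word that is entirely new shows up with an old count of 0, so this one loop covers both new words and extra instances of existing words.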

--
Michael Harris
Microsoft.MVP.Scripting
Seattle WA US



Thu, 02 Jun 2005 07:02:39 GMT  
 