how to gather html from several documents and place into one big document 
 how to gather html from several documents and place into one big document

Hello,

I have about 100 .htm documents in eight different folders (If
necessary, I could put them all into one folder for this macro).

What I need to do is take the source code from each of the 100
documents, and place the code into one big word97 .doc.

Then, I would know how to write a word 97 macro to extract and analyze
certain numbers that appear only in the code, not on the web page as
opened in a browser, because I put those numbers in the code as
comments.

I know I can open each document in word 97, view the code and copy it
manually.

I would like to be able to automate gathering the code from these 100
documents and placing it in one word97 document, because I'll not
only have to do it this one time with these 100 .htm documents, but
over and over again later on, with an ever-increasing number of
documents.

Help much appreciated, as always!



Sun, 26 Oct 2003 14:02:38 GMT  
 how to gather html from several documents and place into one big document
First, I assume that the html files you want are all under the
same folder, and that this tree contains only these html files and no
others, or that the files you want have something in common that allows
you to search for them uniquely. (If that is not the case, you will
have to make a list and use that list as the basis for the macro.)

You probably want to use the FileSearch object.
Look it up in the Help and continue from the examples given there.
You can then either put all the found files in one big document or
handle each file separately to find the data you need.

Hope this helps.

Bart






Sun, 26 Oct 2003 18:41:40 GMT  
 how to gather html from several documents and place into one big document
Word is a great word processor but VBA isn't a great text processing
language, and if this is going to be a regular kind of job for you
there are better tools than Word for doing it with. I'd use Perl,
which is specifically designed for extracting data from text files and
nowadays comes with a readymade module (HTML::TokeParser) for
extracting the meat from HTML tags.

But one can do a great deal with Word and VBA. Can you show us just
how the numbers you are interested in appear in the comments in the
HTML?




--
With best wishes
John

Please reply to the newsgroup and not by e-mail.



Mon, 27 Oct 2003 12:44:07 GMT  
 how to gather html from several documents and place into one big document
Thanks for the replies.

Each number is always between the letters jjj and yyy, as follows:

<!--qqq8/4/85jjj198yyy-->

Each document has 1-50 such entries (along with a lot of other text).
This "jjj" followed by the number followed by "yyy" arrangement is, by
design, unique to the numbers I am looking for. The numbers range from
1-3060.
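
A minimal sketch of how the qqq / jjj / yyy layout above could be matched with a regular expression. It is written in Python purely for illustration (the thread itself uses Perl and VBA), and the function name `extract_entries` and the second sample comment are made up for this example.

```python
import re

# Pattern built from the example comment: <!--qqq8/4/85jjj198yyy-->
# Group 1 is the text between qqq and jjj, group 2 is the number.
ENTRY = re.compile(r"<!--qqq(.*?)jjj(\d+)yyy-->")


def extract_entries(html):
    """Return a (date, number) pair for every matching comment in html."""
    return [(date, int(number)) for date, number in ENTRY.findall(html)]


# Hypothetical sample text; only the first comment comes from the post above.
sample = "<p>text</p><!--qqq8/4/85jjj198yyy--> more <!--qqq9/1/85jjj3060yyy-->"
print(extract_entries(sample))
```

Because `findall` scans the whole string, several comments on one line are all picked up.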

I'll await your response!

Jerry

On Thu, 10 May 2001 05:44:07 +0100, John Nurick

Quote:

>Word is a great word processor but VBA isn't a great text processing
>language, and if this is going to be a regular kind of job for you
>there are better tools than Word for doing it with. I'd use Perl,
>which is specifically designed for extracting data from text files and
>nowadays comes with a readymade module (HTML::TokeParser) for
>extracting the meat from HTML tags.

>But one can do a great deal with Word and VBA. Can you show us just
>how the numbers you are  interested in appear in the comments in the
>HTML?






Tue, 28 Oct 2003 13:16:28 GMT  
 how to gather html from several documents and place into one big document
If all your tags look like that example it's easy. In Perl you can do
it with code like that below, which extracts the date and number from
all the qqq - jjj - yyy comments in every .htm file in the current
folder and puts them into one text file. With a few more lines of code
one can get it to process files in the subdirectories.

I wouldn't code a job like this in VBA unless I really had to. One way
would be to open the html source code in Word and then use a wildcard
search to find each comment before parsing it. A nicer way (IMO) would
be to use the VBScript FileSystem object and Regular Expression object
to do pretty much what the Perl code does - although in that case
there'd be no point using Word as it can be done just as well with
VBScript alone.

Here goes:

# getcodes: extracts numbers from HTML comments in the form
#     <!--qqq8/4/85jjj198yyy--> and returns them in a text file like
#     8/4/85 [tab] 198
#
# Processes all .htm files in current directory
#
# NB: If there is more than one of these comments in a line of the
# HTML code the second won't be found.

use strict;
use warnings;

# The output file name and the file loop were lost from the archived
# post; "codes.txt" and the glob below are reconstructions.
my $outfile = "codes.txt";
open OUTFILE, ">$outfile" or die "Can't open $outfile!\n";

foreach my $infile (glob "*.htm") {
        open INFILE, "<$infile" or die "Can't open $infile\n";
        while (<INFILE>) {
                if (m/<!--qqq(.*?)jjj(\d+)yyy-->/) {
                        print OUTFILE "$1\t$2\n";
                }
        }
        close INFILE;
}

close OUTFILE;
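
For comparison, here is a rough Python sketch of the same job that also descends into subdirectories and finds every comment on a line, the two limitations mentioned above. The function name `collect_codes` and the latin-1 input encoding are assumptions for illustration, not part of the original post.

```python
import re
from pathlib import Path

# Same pattern as the Perl code: date between qqq and jjj, number before yyy.
ENTRY = re.compile(r"<!--qqq(.*?)jjj(\d+)yyy-->")


def collect_codes(root, outfile):
    """Write 'date<TAB>number' lines for every entry in every .htm file
    under root (including subfolders) and return how many were written."""
    count = 0
    with open(outfile, "w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*.htm")):
            # latin-1 is an assumed encoding; it decodes any byte sequence.
            text = path.read_text(encoding="latin-1")
            for date, number in ENTRY.findall(text):
                out.write(f"{date}\t{number}\n")
                count += 1
    return count
```

A call such as `collect_codes(".", "codes.txt")` would process the current folder and everything below it.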




--
With best wishes
John

Please reply to the newsgroup and not by e-mail.



Wed, 29 Oct 2003 05:17:04 GMT  
 how to gather html from several documents and place into one big document
Thanks for your reply, John, and the perl code.

I'm a little confused. I don't even know what Perl is. Do I have to
buy it? Where do I put it and how do I run it? That code looks quite
intimidating to me, seemingly quite different from VBA, which took me
long enough to have some familiarity with!

I also have no idea about VBscript coding.

Since I don't have the familiarity with those other two coding
systems, perhaps it's better if I do this in vba, which I at least
have some familiarity with.

In VBA, I thought I'd just have to loop through every file in the
folder, open the source window, search for each instance, and after
each instance is found, copy and paste it into another document. When no
more are found in that file, close it, and do the same with the next
file.

I especially don't know how to loop thru files in a folder and make
sure each is in view source mode.

So if I could get vba code for this, I'd be really appreciative!

Jerry

On Fri, 11 May 2001 22:17:04 +0100, John Nurick

Quote:

>If all your tags look like that example it's easy. In Perl you can do
>it with code like that below, which extracts the date and number from
>all the qqq - jjj - yyy comments in every .htm file in the current
>folder and puts them into one text file. With a few more lines of code
>one can get it to process files in the subdirectories.




Wed, 29 Oct 2003 13:14:30 GMT  
 how to gather html from several documents and place into one big document
I can't help, then; I've never nutted out the VBA necessary to work on
all documents in a folder.




--
With best wishes
John

Please reply to the newsgroup and not by e-mail.



Thu, 30 Oct 2003 13:28:58 GMT  
 how to gather html from several documents and place into one big document
Hi JMB,

It's not so hard to do in VBA. You need the FileSearch object.

Something like this will be what you want:

Sub ProcessHtmFiles()   ' procedure name is illustrative
    Dim i As Long
    Dim CurrentDoc As Document
    With Application.FileSearch
        .NewSearch
        .LookIn = "C:\My Documents\My special folder"
        .SearchSubFolders = True
        ' The files in question are .htm, so match that extension
        .FileName = "*.htm"
        .FileType = msoFileTypeAllFiles
        If .Execute() > 0 Then
            For i = 1 To .FoundFiles.Count
                ' Open as plain text so the HTML source is what you see
                Set CurrentDoc = Documents.Open(FileName:=.FoundFiles(i), _
                        Format:=wdOpenFormatText)

                ' put your processing code here

                CurrentDoc.Close SaveChanges:=wdDoNotSaveChanges
            Next i
        Else
            MsgBox "There were no files found."
        End If
    End With
End Sub
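
For readers working outside Word, the same loop (find every file under a folder, read it as plain text, hand it to a processing step) could be sketched in Python as below. Here the processing step simply appends each file's raw source to one combined text file, which is what the original question asked for. The function name `gather_sources`, the separator line and the latin-1 input encoding are assumptions made for this sketch.

```python
from pathlib import Path


def gather_sources(root, combined_path):
    """Copy the raw source of every .htm file under root into one text file
    and return the number of files gathered."""
    # Build the file list first, before the combined file is created.
    files = sorted(Path(root).rglob("*.htm"))
    with open(combined_path, "w", encoding="utf-8") as out:
        for path in files:
            # Hypothetical separator so each file's source can be told apart.
            out.write(f"===== {path.name} =====\n")
            # latin-1 is an assumed encoding; it decodes any byte sequence.
            out.write(path.read_text(encoding="latin-1"))
            out.write("\n")
    return len(files)
```

Reading the files as text rather than rendering them plays the same role as `Format:=wdOpenFormatText` in the macro: the comments stay visible.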

--
Regards
Jonathan West - Word MVP
MultiLinker - Automated generation of hyperlinks in Word
Conversion to PDF & HTML
http://www.multilinker.com
Word FAQs at http://www.multilinker.com/wordfaq
Please post any follow-up in the newsgroup. I do not reply to Word questions
by email





Sat, 01 Nov 2003 03:18:40 GMT  
 how to gather html from several documents and place into one big document
Thanks, Jonathan, for your reply.  Seems to be just what I need.  I
think I'll be able to fill in the processing part because of all the
past help I've received here!






Sat, 01 Nov 2003 06:51:59 GMT  
 how to gather html from several documents and place into one big document
Well, I got it to work.  Thanks very much!

I realize I need one more macro that I can apply to the list after
it's gathered.

I'd like to eliminate all duplicate lines.

Each line is a separate paragraph and contains a number from 1-3060.

So, for example, if the list were:

1
47
3003
68
1
777
3003

I want the macro to eliminate the duplicate lines so that the list
will then be:

1
47
3003
68
777.

I know there's a better way to do this than just to run an endless
series of searches and deletes.

Your help, as always, is much appreciated!

Jerry






Sat, 01 Nov 2003 08:36:37 GMT  
 how to gather html from several documents and place into one big document
Hi Jerry,

Does it matter whether the remaining lines are in their original order? If
it does, then you are stuck with your "endless series of searches and
deletes". If not, then you can proceed as follows.

1. Mark your list either with the Selection or with a Range variable

2. Use the Sort method to sort the paragraphs in the list

3. Go through the list once from top to bottom, deleting *adjacent*
duplicate paragraphs

This shouldn't be too hard to code, but if you get stuck, come back and I'll
see if I can put something together
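
The sort-then-delete-adjacent-duplicates idea from the steps above can be sketched outside Word as follows. This is a Python illustration only, with a made-up function name `dedupe_sorted`; it works on a list of numbers rather than on Word paragraphs.

```python
def dedupe_sorted(numbers):
    """Sort the numbers, then drop every entry equal to the one before it."""
    ordered = sorted(numbers)
    result = []
    for n in ordered:
        # After sorting, duplicates sit next to each other, so comparing
        # with the last kept entry is enough.
        if not result or result[-1] != n:
            result.append(n)
    return result


print(dedupe_sorted([1, 47, 3003, 68, 1, 777, 3003]))
```

Note that the result comes out in sorted order, not in the original order of the list.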

--
Regards
Jonathan West - Word MVP





Sat, 01 Nov 2003 16:58:44 GMT  
 how to gather html from several documents and place into one big document
Hi Jerry

Do a sort and then use the steps described here:

Delete any paragraph that is an exact duplicate of the preceding paragraph,
using a Range object
http://www.mvps.org/word/FAQs/MacrosVBA/DeleteParaRnge.htm

Regards

Dave





Sat, 01 Nov 2003 20:42:56 GMT  
 how to gather html from several documents and place into one big document
On Tue, 15 May 2001 10:58:44 +0200, "Jonathan West"

Quote:

> Does it matter whether the remaining lines are in their original order? If
> it does, then you are stuck with your "endless series of searches and
> deletes". ...

Nope. He'll not be stuck with it. He would need to put 'original
sequence numbers' before or after the numbers in his paragraphs,
separated by tabs. These two-field records can then be easily
sorted by either field.
Sort by the data field, remove dupes as you described, then sort back
by sequence number and remove the sequence numbers.
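
A small sketch of this sequence-number trick, again in Python for illustration rather than as a Word macro. The function name `dedupe_keep_order` is made up; tuples of (value, sequence number) stand in for the tab-separated two-field paragraphs.

```python
def dedupe_keep_order(values):
    """Remove duplicates while keeping first occurrences in original order."""
    # Step 1: tag each value with its original sequence number.
    tagged = [(value, seq) for seq, value in enumerate(values)]
    # Step 2: sort by value (ties broken by sequence number), so that
    # duplicates become adjacent with the earliest occurrence first.
    tagged.sort()
    # Step 3: keep only the first record of each run of equal values.
    kept = []
    for value, seq in tagged:
        if not kept or kept[-1][0] != value:
            kept.append((value, seq))
    # Step 4: sort back by sequence number and strip the numbers off.
    kept.sort(key=lambda pair: pair[1])
    return [value for value, _ in kept]


print(dedupe_keep_order([1, 47, 3003, 68, 1, 777, 3003]))
```

With the example list from earlier in the thread, this keeps the first 1 and the first 3003 and leaves the remaining lines where they were.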

Quote:
> ... If not, then you can proceed as follows.

> 1. Mark you list either with the Selection or with a Range variable

> 2. Use the Sort method to sort the paragraphs in the list

> 3. Go through the list once from top to bottom, deleting *adjacent*
> duplicate paragraphs

> This shouldn't be too hard to code, but if you get stuck, come back and I'll
> see if I can put something together

> --
> Regards
> Jonathan West - Word MVP
> MultiLinker - Automated generation of hyperlinks in Word
> Conversion to PDF & HTML
> http://www.multilinker.com
> Word FAQs at http://www.multilinker.com/wordfaq
> Please post any follow-up in the newsgroup. I do not reply to Word questions
> by email
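The three steps quoted above could be sketched roughly as follows. This is only an illustration, not code posted in the thread: it assumes the list is the entire active document with one number per paragraph, and the sub name is made up. The original order is lost, as noted.

```vba
Sub SortAndRemoveDuplicates()
    Dim i As Long

    With ActiveDocument
        ' Steps 1 and 2: take the whole document as the list and sort its
        ' paragraphs numerically, so that equal numbers end up adjacent.
        .Content.Sort ExcludeHeader:=False, _
                SortFieldType:=wdSortFieldNumeric, _
                SortOrder:=wdSortOrderAscending

        ' Step 3: walk from the bottom up and compare each paragraph with
        ' the one before it. The earlier copy is deleted, so the final
        ' paragraph mark of the document is never the one being removed.
        For i = .Paragraphs.Count To 2 Step -1
            If .Paragraphs(i).Range.Text = .Paragraphs(i - 1).Range.Text Then
                .Paragraphs(i - 1).Range.Delete
            End If
        Next i
    End With
End Sub
```

Working backwards means the paragraph indexes still to be visited stay valid after each deletion.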



> > Well, I got it to work.  Thanks very much!

> > I realize I need one more macro that I can apply to the list after
> > it's gathered.

> > I'd like to eliminate all duplicate lines.

> > Each line is a separate paragraph and contains a number from 1-3060.

> > So, for example, if the list were:

> > 1
> > 47
> > 3003
> > 68
> > 1
> > 777
> > 3003

> > I want the macro to eliminate the duplicate lines so that the list
> > will then be:

> > 1
> > 47
> > 3003
> > 68
> > 777.

> > I know there's  a better way to do this than just to run an endless
> > series of searches and deletes.

> > Your help, as always, is much appreciated!

> > Jerry




--
Greetings from
 _____
 /_|__| Auke Reitsma, Delft, The Netherlands.
/  | \  -------------------------------------
        Remove SPAMBLOCK from my address ...


Mon, 03 Nov 2003 05:10:44 GMT  
 how to gather html from several documents and place into one big document
Thanks for the additional idea.

By 'original sequence numbers' I assume you don't mean the numbering
from the toolbar, because those will switch when the fields are
sorted.

I guess a macro that went to the beginning of each line and pasted in
"n"-tab, where n increased by 1 after each insertion, would work.
That I know how to do. If there's an easier way, let me know please.

Jerry
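A minimal sketch of the numbering macro described above, plus one to strip the numbers again after sorting back. It assumes the list is the whole active document and that the numbers themselves contain no tabs; both sub names are invented for the example.

```vba
Sub AddSequenceNumbers()
    Dim i As Long

    With ActiveDocument
        For i = 1 To .Paragraphs.Count
            ' Skip empty paragraphs (their text is just the paragraph mark).
            If Len(.Paragraphs(i).Range.Text) > 1 Then
                ' Prefix "n" and a tab, giving a two-field record per line.
                .Paragraphs(i).Range.InsertBefore CStr(i) & vbTab
            End If
        Next i
    End With
End Sub

Sub RemoveSequenceNumbers()
    Dim para As Paragraph
    Dim tabPos As Long
    Dim startPos As Long

    For Each para In ActiveDocument.Paragraphs
        ' Position of the first tab, counted from the start of the paragraph.
        tabPos = InStr(para.Range.Text, vbTab)
        If tabPos > 0 Then
            ' Delete everything up to and including that tab.
            startPos = para.Range.Start
            ActiveDocument.Range(startPos, startPos + tabPos).Delete
        End If
    Next para
End Sub
```

Run AddSequenceNumbers before sorting by the data field, and RemoveSequenceNumbers once the list has been sorted back by sequence number.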




Mon, 03 Nov 2003 11:52:31 GMT  
 how to gather html from several documents and place into one big document
Since you are using unformatted text obtained from an ASCII file,
you might want to consider a macro like this:

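(This assumes the numbers have already been extracted and sit in a
separate document by themselves, and that it is the active document
when the code starts. Only the first occurrence of each number is
kept, so the original order is preserved and no sorting is needed.)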

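Dim myString As String
Dim myPara As Paragraph

' Seed the result with the first paragraph. The leading vbCr lets every
' entry be matched as a whole line (vbCr & text, where text ends in vbCr).
myString = vbCr & ActiveDocument.Paragraphs(1).Range.Text
For Each myPara In ActiveDocument.Paragraphs
    ' Append the paragraph only if it is not already in the result.
    If InStr(1, myString, vbCr & myPara.Range.Text, vbTextCompare) = 0 Then
        myString = myString & myPara.Range.Text
    End If
Next myPara
' Replace the document with the de-duplicated list, then delete the
' empty first paragraph created by the leading vbCr.
ActiveDocument.Content.Text = myString
ActiveDocument.Paragraphs(1).Range.Text = ""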



Tue, 04 Nov 2003 01:21:40 GMT  
 

 

 