Save HTML page as TXT 

Hello everyone,

I have spent over 6 hours trying to find an answer to my problem, without success.
Please help me if you know the answer. Here is what I'm trying to do.

I want to save an HTML page as a text file using either the WebBrowser control or
the Internet control. I know that there is some way to strip all the tags and
leave only the text.

Once I have saved it as a text file, I want to find the most frequently repeated
keywords. Do you know how this could be done as well? I don't think that parsing
the text file character by character is going to be very efficient.

Basically, I want to strip the HTML of all its tags so that I can then parse it
and find the 20-50 most frequently repeated words.

Any help would be greatly appreciated.

Martin



Wed, 18 Jun 1902 08:00:00 GMT  
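The first part (getting the page text without the tags), once the page has
finished loading in the WebBrowser control:
text = WebBrowser1.Document.documentElement.OuterText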
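
To get from that string to a text file on disk, something like the following might do. It is only a sketch: SaveTextToFile is a made-up helper name, C:\page.txt is a placeholder path, and it assumes a form that holds a WebBrowser control named WebBrowser1.

```vb
' Hypothetical helper: writes a string to a text file,
' replacing the file if it already exists.
Public Sub SaveTextToFile(ByVal sText As String, ByVal sPath As String)
    Dim iFile As Integer
    iFile = FreeFile                  ' next unused file number
    Open sPath For Output As #iFile   ' create or overwrite the file
    Print #iFile, sText
    Close #iFile
End Sub

' Assumes the form contains a WebBrowser control named WebBrowser1.
Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' The event fires once per frame; only act on the top-level document.
    If pDisp Is WebBrowser1.Object Then
        ' "C:\page.txt" is a placeholder path, change it to suit.
        SaveTextToFile WebBrowser1.Document.documentElement.outerText, "C:\page.txt"
    End If
End Sub
```

Waiting for DocumentComplete matters because Document is not usable until the page has actually loaded.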

The last part (counting the words):
Create two arrays, one for the words and one for the counts (or a single array
of user-defined types). Look up each word in the word array; if it exists,
increment its count, otherwise add it to the list. For extra speed, you could
use a hash table instead of searching the array for every word.
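
As a rough illustration of the hash table route, the sketch below uses the Scripting.Dictionary object from the Microsoft Scripting Runtime as the hash table. CountWords and PrintTopWords are names invented for this example, and the input is assumed to be an array of words that has already been split out of the page text.

```vb
Option Explicit

' Counts how often each word occurs. Keys are lower-cased words,
' items are the occurrence counts.
Public Function CountWords(vWords As Variant) As Object
    Dim dict As Object
    Dim i As Long
    Dim sWord As String

    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(vWords) To UBound(vWords)
        sWord = LCase$(vWords(i))
        If Len(sWord) > 0 Then
            If dict.Exists(sWord) Then
                dict.Item(sWord) = dict.Item(sWord) + 1
            Else
                dict.Add sWord, 1
            End If
        End If
    Next i
    Set CountWords = dict
End Function

' Prints the nTop most frequent words and their counts
' to the Immediate window.
Public Sub PrintTopWords(dict As Object, ByVal nTop As Long)
    Dim vKeys As Variant
    Dim vTmp As Variant
    Dim i As Long, j As Long, iBest As Long

    If dict.Count = 0 Then Exit Sub
    If nTop > dict.Count Then nTop = dict.Count
    vKeys = dict.Keys   ' zero-based array of all the words

    ' Partial selection sort: only the first nTop slots are put in order.
    For i = 0 To nTop - 1
        iBest = i
        For j = i + 1 To UBound(vKeys)
            If dict.Item(vKeys(j)) > dict.Item(vKeys(iBest)) Then iBest = j
        Next j
        vTmp = vKeys(i): vKeys(i) = vKeys(iBest): vKeys(iBest) = vTmp
        Debug.Print vKeys(i), dict.Item(vKeys(i))
    Next i
End Sub
```

For example, PrintTopWords CountWords(Split("the cat and the hat", " ")), 2 would list "the" first. The partial sort keeps things simple because only 20-50 entries are wanted; with a very large vocabulary a full sort routine may be the better choice.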

Now you need someone to help you with the middle part (parsing the text to
extract individual words)! If I had to do it, I would dig around some of the
VB sites for an enhanced Split function that accepts multiple delimiters
(space, full stop, comma, etc.). I'm fairly sure I've seen one at
www.vb2themax.com.
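
In case it helps in the meantime, here is one possible way to get the effect of a multi-delimiter Split. This is not the vb2themax routine, just a sketch: SplitWords is a made-up name, and it assumes that a "word" is any run of unaccented letters, digits and apostrophes.

```vb
Option Explicit

' Hypothetical helper: splits text into words by treating every character
' that is not an unaccented letter, a digit or an apostrophe as a delimiter.
Public Function SplitWords(ByVal sText As String) As String()
    Dim i As Long
    Dim sCh As String

    ' Overwrite each delimiter character with a space, in place.
    For i = 1 To Len(sText)
        sCh = Mid$(sText, i, 1)
        If Not (sCh Like "[A-Za-z0-9']") Then Mid$(sText, i, 1) = " "
    Next i

    ' Collapse runs of spaces so that Split does not return empty entries.
    Do While InStr(sText, "  ") > 0
        sText = Replace(sText, "  ", " ")
    Loop

    SplitWords = Split(Trim$(sText), " ")
End Function
```

This does walk the text one character at a time, but the Mid$ statement changes the string in place rather than building a new one, so a single pass over one page of text should not be a real bottleneck.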

--
RobSmith





Wed, 18 Jun 1902 08:00:00 GMT  
 