Need help with extracting and comparing HTML text. 

Hi Guys,

I am new to Perl and would really appreciate it if someone could
help shed some light on this particular issue.

Objective:

I need to compare two files (i.e. File_A.hhc and File_B.hhc), looking only at
differences in the title headings, to find which headings are present in
File_B but missing from File_A. (Assume that File_B has more contents than
File_A.)

For example:

Contents of File_A.hhc:

<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Modifying Datasource Groups">
    <param name="Local" value="html\ModifyDatasourceGroup.htm">
    </OBJECT>
<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Removing Datasource Groups">
    <param name="Local" value="html\RemoveDatasourceGroup.htm">
    </OBJECT>

What I need to extract from the HTML text is:

a) Modifying Datasource Groups
b) Removing Datasource Groups

Contents of File_B.hhc:

<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Modifying Datasource Groups">
    <param name="Local" value="html\ModifyDatasourceGroup.htm">
    </OBJECT>
<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Removing Datasource Groups">
    <param name="Local" value="html\RemoveDatasourceGroup.htm">
    </OBJECT>
<LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Changing Datasource Groups">
    <param name="Local" value="html\ChangeGroups.htm">
    </OBJECT>

What I need to extract from the HTML text is:

a) Modifying Datasource Groups
b) Removing Datasource Groups
c) Changing Datasource Groups

As you can see, File_B.hhc has an additional title heading called
"Changing Datasource Groups". I would like to print out all the title
headings in File_B.hhc that File_A.hhc does not have.
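A minimal core-Perl sketch of the comparison described above, assuming every heading sits in a `<param name="Name" value="...">` tag with double-quoted attributes, exactly as in the samples. It uses a regular expression rather than a real HTML parser, so it would miss headings written differently (single quotes, attributes in another order). The script name `hhc_diff.pl` and the helpers `heading_names`, `read_file` and `missing_headings` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every "Name" param value found in $html.
# Assumes <param name="Name" value="..."> with double-quoted attributes.
sub heading_names {
    my ($html) = @_;
    my @names = $html =~ /<param\s+name="Name"\s+value="([^"]*)"/gi;
    return @names;
}

# Hypothetical helper: slurp a whole file into one string.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;    # undefine the record separator to read the whole file
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

# Headings listed in @$b_list but absent from @$a_list, in File_B order.
sub missing_headings {
    my ($a_list, $b_list) = @_;
    my %in_a = map { $_ => 1 } @$a_list;
    return grep { !$in_a{$_} } @$b_list;
}

# Only run when both file names are given on the command line.
if (@ARGV == 2) {
    my ($file_a, $file_b) = @ARGV;
    my @a_names = heading_names(read_file($file_a));
    my @b_names = heading_names(read_file($file_b));
    print "$_\n" for missing_headings(\@a_names, \@b_names);
}
```

Run as `perl hhc_diff.pl File_A.hhc File_B.hhc`; with the sample contents above it should print `Changing Datasource Groups`. A hash lookup is used so each File_B heading is checked against File_A in constant time instead of scanning the whole list.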

Please let me know if you can spare a moment to help solve this puzzle.

Thanks in advance.

Lim-



Wed, 18 Jun 1902 08:00:00 GMT  

It would be trivial to get the attributes of the param tags with the module
HTML::Parser, push them onto an array for each file, and then use the
example in the FAQ to find the difference of the two arrays:

perlfaq4:

   How do I compute the difference of two arrays?  
   How do I compute the intersection of two arrays?
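A sketch of that approach, assuming HTML::Parser 3.x is installed from CPAN (it is not part of core Perl). The helpers `param_names` and `compare_lists` and the script name `compare_hhc.pl` are hypothetical; the counting loop is adapted from the perlfaq4 answer, with `sort` added so the output order is stable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3;    # CPAN module, not part of core Perl

# Hypothetical helper: collect the value of every <param name="Name" ...>
# tag in an .hhc file, in document order.
sub param_names {
    my ($path) = @_;
    my @names;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                # Tag and attribute names are reported in lowercase.
                return unless $tag eq 'param';
                return unless lc($attr->{name} // '') eq 'name';
                push @names, $attr->{value} if defined $attr->{value};
            },
            'tagname, attr',
        ],
    );
    $parser->parse_file($path) or die "Cannot parse $path: $!\n";
    return @names;
}

# The perlfaq4 counting idiom. It assumes no heading repeats within one
# file; @difference is the symmetric difference (in one list, not both).
sub compare_lists {
    my ($list1, $list2) = @_;
    my (@union, @intersection, @difference);
    my %count;
    $count{$_}++ for @$list1, @$list2;
    for my $element (sort keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
    return (\@union, \@intersection, \@difference);
}

# Only run when both file names are given on the command line.
if (@ARGV == 2) {
    my @a_names = param_names($ARGV[0]);
    my @b_names = param_names($ARGV[1]);
    my ($union, $intersection, $difference) = compare_lists(\@a_names, \@b_names);
    print "Only in one file: $_\n" for @$difference;
}
```

With the sample files, `perl compare_hhc.pl File_A.hhc File_B.hhc` should print `Only in one file: Changing Datasource Groups`. Because File_B is assumed to contain everything in File_A, the FAQ's symmetric difference here equals the headings missing from File_A; if that assumption does not hold, it will also list headings found only in File_A.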

/J\
--
Jonathan Stowe
http://www.gellyfish.com
http://www.tackleway.co.uk


