Parsing Word Docs 
 Parsing Word Docs

I have been able to extract the text from Word docs with the following
bit of code:
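#!/usr/bin/perl
# Parse a Word doc named on the command line and write out its text.
use strict;
use warnings;

my $file = shift @ARGV;
die "usage: $0 file.doc\n" unless defined $file;

my $startFlag = 0;
my $h = '';
open(my $in, '<', $file) or die "Cannot open $file: $!\n";
while (<$in>) {
    chomp;
    # Start collecting text at the first line containing "cgrid {".
    if (s/^.*cgrid \{//) {
        $startFlag = 1;
        s/\\[a-z]+\d\d//g;      # remove control words such as \fs24
    }
    if ($startFlag) {
        s/\\par //g;            # paragraph marks
        s/[}{][}{]$//;          # trailing pair of braces
        s/\\[rl]quote /'/g;     # curly single quotes -> plain '
        $h .= "$_\n";
    }
}
close($in);

open(my $out, '>', 'c:\temp\word_conversion.txt') or die $!;
print $out $h;
close($out);
exit;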

This only works on Word 6/95. Has anyone been able to parse a Word 97
doc?

Paul Coogan



Wed, 19 Jul 2000 03:00:00 GMT  
 Parsing Word Docs

Quote:

> This only works on Word 6/95. Has anyone been able to parse a Word 97
> doc?

Use Automation, and the SaveAs functionality to save the document
as text!
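A minimal sketch of the Automation approach in Perl, assuming a Windows machine with Word installed and the Win32::OLE module available. The helper name `save_as_text` is hypothetical; `2` is Word's `wdFormatText` constant for plain-text output. Win32::OLE is loaded with `require` inside the helper so the script still compiles on systems without it.

```perl
#!/usr/bin/perl
# Sketch: convert a Word document to plain text by driving Word itself
# through OLE Automation. Assumes Windows, Word, and Win32::OLE.
use strict;
use warnings;
use File::Spec;

# Word's wdFormatText constant: save as plain text.
use constant WD_FORMAT_TEXT => 2;

# Hypothetical helper: open $in in Word and save it as text to $out.
sub save_as_text {
    my ($in, $out) = @_;
    require Win32::OLE;   # loaded at run time, so this file compiles anywhere

    # Start Word; 'Quit' is the method Win32::OLE calls on Word when
    # $word is destroyed, so Word exits even if we die below.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: ", Win32::OLE->LastError(), "\n";
    $word->{Visible} = 0;

    # Word does not share this script's working directory,
    # so pass absolute paths.
    my $doc = $word->Documents->Open(File::Spec->rel2abs($in))
        or die "Cannot open $in: ", Win32::OLE->LastError(), "\n";
    $doc->SaveAs(File::Spec->rel2abs($out), WD_FORMAT_TEXT);
    $doc->Close(0);       # 0 = wdDoNotSaveChanges
    return $out;
}

# Command-line use: perl thisscript.pl input.doc output.txt
save_as_text(@ARGV) if !caller() && @ARGV == 2;
```

Passing `'Quit'` as the second argument to `Win32::OLE->new` means Word is shut down when the Perl object goes away, rather than being left running in the background after an error.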

Scott
--
Look at Softbase Systems' client/server tools, www.softbase.com
Check out the Essential 97 package for Windows 95 www.skwc.com/essent
All my other cool web pages are available from that site too!
My demo tape, artwork, poetry, The Windows 95 Book FAQ, and more.



Wed, 19 Jul 2000 03:00:00 GMT  
 Parsing Word Docs

Quote:

> I have been able to extract the text from Word Docs with the following
> bit of code: ...
> This only works on Word 6/95. Has anyone been able to parse a Word 97 doc?

Unfortunately, the Word document format is so complicated that you cannot
even be sure your first version works. For example, it will fail for
fast-saved documents, for documents with larger amounts of text, and for
documents that are internally split into many segments for whatever reason.
The same goes for Word 97. ... :-(

For these reasons you might have fun using OLE::Storage (or Laola)
instead, available from your CPAN site or at:

        http://wwwwbs.cs.tu-berlin.de/~schwartz/pmh/
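Before choosing a parser it can help to check what a `.doc` file actually contains. The substitutions in the original script (`\par`, `\rquote`) are RTF control words, which suggests the files it handled were RTF saved under a `.doc` name, whereas Word 97 binary documents live inside an OLE2 compound file, the container format OLE::Storage reads. The sketch below, with hypothetical helpers `sniff_format` and `sniff_file`, only inspects the leading bytes: the 8-byte OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`, or RTF's opening `{\rtf`. It does not parse either format.

```perl
#!/usr/bin/perl
# Sketch: guess whether a ".doc" file is RTF or an OLE2 compound file
# by looking at its first few bytes. This does not parse either format.
use strict;
use warnings;

# Signature found at offset 0 of every OLE2 compound file.
use constant OLE2_MAGIC => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Hypothetical helper: classify a string holding the start of a file.
sub sniff_format {
    my ($bytes) = @_;
    return 'ole2' if substr($bytes, 0, 8) eq OLE2_MAGIC;
    return 'rtf'  if $bytes =~ /\A\{\\rtf/;
    return 'unknown';
}

# Hypothetical helper: read the first 8 bytes in binary mode and classify them.
sub sniff_file {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my $head = '';
    defined read($fh, $head, 8) or die "Cannot read $path: $!\n";
    close($fh);
    return sniff_format($head);
}

# Print "<format>\t<file>" for every file named on the command line.
print sniff_file($_), "\t$_\n" for @ARGV;
```

A file reported as `rtf` is closer to what the script at the top of the thread expects; an `ole2` file needs a compound-document reader such as OLE::Storage.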

Regards,

Martin

--
// Writing degree zero? Zero problems!



Wed, 19 Jul 2000 03:00:00 GMT  
 Parsing Word Docs

Quote:


> > This only works on Word 6/95. Has anyone been able to parse a Word 97
> > doc?

> Use Automation, and the SaveAs functionality to save the document
> as text!

That's very nice if you are receiving the Word file on a PC with
Word installed, or if you can get the sender to do the SaveAs.
The real problem is when a PC user forwards a Word file to a non-PC
system.

--
Charles G. Margolin                   DSSD Internal Information Services




Fri, 21 Jul 2000 03:00:00 GMT  
 Parsing Word Docs



Quote:
> That's very nice if you are receiving the Word file on a PC with
> Word installed, or if you can get the sender to do the SaveAs.
> The real problem is when a PC user forwards a Word file to a non-PC
> system.

My solution was to use network protocols:  I mailed the Word file back to
the sender together with a note reading "I SAID PLAIN TEXT, DAMMIT!".

--
Rhodri James  *-*  Wildebeeste herder to the masses
If you don't know who I work for, you can't misattribute my words to them

... but that's a herring of a different colour



Sat, 22 Jul 2000 03:00:00 GMT  
 

 Relevant Pages 

1. Updating WORD docs via the web

2. Module to read Word Docs

3. Sending Word-Docs directly to the Browser via CGI

4. perl viewer for MS word docs

5. Parsing XML docs using a schema

6. parsing of here-docs

7. words words words

8. Parsing Lines for Words?

9. Parse a word into three strings

10. Parsing Word to ASCII

11. parsing a template and replacing certain words (from a form)

12. Parsing line of text into words

 

 