Getting Tab separated data from Word to Access, Excel or simply process it from VB 
Author Message
 Getting Tab separated data from Word to Access, Excel or simply process it from VB

Hello,

I need to get a tab separated data in rtf or doc file to Access or Excel or
just be able to process it in vb code.

The problem worsens as some of the data is in Russian and some rows contain
totals and I will compute my own totals so these totals should be ignored.

Ideally it should be done using vb code in vb application. User opens the
file and application evaluates and transfers data. But I will be thankful
for any method to get the data to Access or Excel, including using macros or
manually resaving the file in some compatible format.

Thanks in advance.

 Denis.



Tue, 06 Jul 2004 01:24:37 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB

Quote:
>-----Original Message-----
>Hello,

>I need to get a tab separated data in rtf or doc file to
Access or Excel or
>just be able to process it in vb code.

>The problem worsens as some of the data is in Russian and
some rows contain
>totals and I will compute my own totals so these totals
should be ignored.

>Ideally it should be done using vb code in vb

application. User opens the
Quote:
>file and application evaluates and transfers data. But I
will be thankful
>for any method to get the data to Access or Excel,

including using macros or

Quote:
>manually resaving the file in some compatible format.

>Thanks in advance.

> Denis.

Take a look at the FileSystemObject.


Tue, 06 Jul 2004 04:24:57 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis,
In code, open the .rtf/.doc file with Word. Do a saveas .txt
Close word (or the file) and open the txt file with Excel.
Can be done from Word or Excel VBA or VB6.

John


Quote:
> Hello,

> I need to get a tab separated data in rtf or doc file to Access or Excel
or
> just be able to process it in vb code.

> The problem worsens as some of the data is in Russian and some rows
contain
> totals and I will compute my own totals so these totals should be ignored.

> Ideally it should be done using vb code in vb application. User opens the
> file and application evaluates and transfers data. But I will be thankful
> for any method to get the data to Access or Excel, including using macros
or
> manually resaving the file in some compatible format.

> Thanks in advance.

>  Denis.



Tue, 06 Jul 2004 07:00:14 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
HI Denis,

This can be done (either manually or from VB using Automation to
control the MS Office applications) but it might be hard work. Here's
some food for thought which should help you decide which way you want
to go. When you're ready, please post back to the one or two most
relevant groups: the scattergun approach across many groups is widely
disapproved of<g>.

1) Basically either Access or Excel will happily import tab-separated
data from a plain text file though not from a formatted .rtf or .doc
file. So using Word to convert the file to plain text *may* be a good
start, but see below.

2) The built-in Access import procedures won't work on files that have
inconsistent rows (except for a single header row containing field
names). So if the "totals" rows in your documents are structured
differently from the ordinary data rows, or if there are multiple
header rows, you'll need to either: use Excel; write VBA code to
custom-import them in to Access; or process the files to strip these
rows before importing them.

3) When you say "some of the data is in Russian", do you mean
(a) some files are in Russian and others in English?
(b) some fields are in Russian and others in English?
(c) Russian and English are indiscriminately mixed in the fields?

4) In the rtf/doc files, is the Russian text
(a) in a non-Unicode cyrillic font (the only way you could do it in
versions before Word 2000)?
(b) in the cyrillic "page" of a Unicode font (if this is the case, in
Word 2000 and later, it should be possible to format the entire
document in one Unicode font containing both roman and cyrillic
characters without screwing up the English and Russian)?

5) Ordinary Access text fields cannot contain mixed formatting. The
font etc. you specify for display of the field is applied to the
entire field. So if (3c) you need to mix cyrillic and roman within a
single field (either in one record or cyrillic in some records and
roman in others) the only practicable way is to use a version of
Access that knows about Unicode text (i.e. 2000 or XP).

(In principle you can store formatted text in a text field as an RTF
string, but this would make querying and updating hellishly
difficult.)

If (3a) but you need to combine cyrillic and roman documents into a
single table, the situation is the same as (3c). Otherwise - some
tables cyrillic, some roman, or (3b) some fields one, some the other -
no problem: just apply the right fonts to the fields.

Quote:

>Hello,

>I need to get a tab separated data in rtf or doc file to Access or Excel or
>just be able to process it in vb code.

>The problem worsens as some of the data is in Russian and some rows contain
>totals and I will compute my own totals so these totals should be ignored.

>Ideally it should be done using vb code in vb application. User opens the
>file and application evaluates and transfers data. But I will be thankful
>for any method to get the data to Access or Excel, including using macros or
>manually resaving the file in some compatible format.

>Thanks in advance.

> Denis.

John Nurick [Access MVP]

Please do not respond by email, but to the newgroup.



Tue, 06 Jul 2004 15:10:44 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Denis,

Most of this dual-alphabet stuff is beyond my expertise. Let's hope
someone in one of the Russian groups comes to the rescue. More
comments inline.

Quote:

>Unfortunately converting to plain text won't work (Russian text gets screwed
>up).

>About the files, some come only in Russian, some come mixed, but all of them
>are in Unicode.

When I tried saving your rtf sample as text with Word XP, it offered a
couple of 1-byte encodings that seemed to preserve the cyrillic
characters, though as I don't have a 1-byte cyrillic font I couldn't
make sure. If this is the case, you should be able to import to
Access, formatting the fields with a 1-byte cyrillic font (they
contain the roman characters too, don't they?) with the same encoding.

Also, I feel there ought to be some way of getting Word to save the
document as a Unicode (2-byte) text file, but don't know whether or
what it might be.

Quote:
>Tried selecting everything, copying to clipboard and posting it to Excel,
>that almost worked.
>2 problems:
>1. Excel converted some data to Date format. Example: I had 4/9 when pasted
>to Excel became: 5-Apr
>How do I stop it?

AFAIK you have to specify that the column is text when you import it,
or maybe you format the cells in the column in question as Text before
you import the stuff. Experiment and you'll find out. Another approach
- for Excel - is to put an apostrophe at the beginning of the field to
force it to be interpreted as text and not as a date.

Quote:
>2. Not able to compute totals and use formulas on most fields, even those
>that are currencies. Went to formatting and tried to convert to number or
>currency, didn't seem to work.

The RTF sample uses commas for decimal points. If your Windows
regional settings are for a zone that uses periods, Excel will
interpret the numbers in the sample as text. (e.g. 97,30 will go into
the cell as the text "97,30", not as the number 97.30000000).
Subsequent changes in the cell format won't affect this; you have to
re-enter the number with the correct decimal symbol. This problem
should disappear if you change regional settings to Russia or
somewhere else that uses a comma for the decimal point.

Quote:

>Kind of solved the second problem by after pasting to Excel copying again
>and pasting it in a table in Access, by default it converted almost all
>fields to text, so changed the appropriate fields to number and currency
>types, saved, then exported the table in excel format, now all the
>computations work, but this is such a pain in the {*filter*}procedure and there
>is still the first problem (date formatting).

>If you guys know how to solve these problems, please let me know.
>I am attaching a sample rtf file.

The sample appears to have been produced with no concern for anything
but the visible layout. The tab stops and numbers of tabs are
inconsistent from line to line, which means that corresponding items
in successive lines may actually be imported into different fields:
another lot of problems in addition to the dual fonts.

BTW, is this a really representative sample? I observe that the only
cyrillic characters in it are in
  (a) header and footer
  (b) the first field.
The header and footer can be handled separately - must be, if you're
using Access as a data store. And surely the first field could be
translated or transliterated with a simple search-and-replace in Word.
IOW, is it really necessary to have both roman and cyrillic in your
tables?

HTH

John Nurick [Access MVP]

Please do not respond by email, but to the newgroup.



Wed, 07 Jul 2004 15:05:56 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis,

Quote:
> I need to get a tab separated data in rtf or doc file to Access or Excel or
> just be able to process it in vb code.

Have you tried extracting it using the TextStream object of the
FileSystemObject? Then ADO or DAO to put it into Access or Excel?

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister
http://www.mvps.org/word
http://go.compuserve.com/MSOfficeForum

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail :-)



Fri, 09 Jul 2004 18:52:52 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
On Mon, 21 Jan 2002 11:52:52 +0100, Cindy Meister  -WordMVP-

Quote:

>> I need to get a tab separated data in rtf or doc file to Access or Excel or
>> just be able to process it in vb code.

>Have you tried extracting it using the TextStream object of the
>FileSystemObject? Then ADO or DAO to put it into Access or Excel?

Hi Cindy,

Am I missing something? I didn't think TextStream could parse .doc or
.rtf files.

John Nurick [Access MVP]

Please do not respond by email, but to the newgroup.



Sat, 10 Jul 2004 05:44:23 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hello Cindy,

I took a look at the  TextStream object, not sure how will it help me.
Doesn't it just allow to transfer data between files? Could you be more
specific on its usage? If you could paste a sample code that would be great.
Sorry for the dumb questions, I am new at this.

Thanks.
 Denis.



Quote:
> Hi Denis,

> > I need to get a tab separated data in rtf or doc file to Access or Excel
or
> > just be able to process it in vb code.

> Have you tried extracting it using the TextStream object of the
> FileSystemObject? Then ADO or DAO to put it into Access or Excel?

> Cindy Meister
> INTER-Solutions, Switzerland
> http://homepage.swissonline.ch/cindymeister
> http://www.mvps.org/word
> http://go.compuserve.com/MSOfficeForum

> This reply is posted in the Newsgroup; please post any follow question or
> reply in the newsgroup and not by e-mail :-)



Sun, 11 Jul 2004 01:22:45 GMT  
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Since my last post I've asked a question and experimented a bit.
Results seem to be:

1) Properly installed, Word XP can export the text of at least some
documents as plain Unicode text.  (File|Save As; Plain Text; Other
encoding; Unicode). Here's the command as captured by the macro
recorder:

  ActiveDocument.SaveAs FileName:="car1.17priceD2.txt", FileFormat:= _
    wdFormatText, LockComments:=False, Password:="",
AddToRecentFiles:=True, _
    WritePassword:="", ReadOnlyRecommended:=False,
EmbedTrueTypeFonts:=False, _
     SaveNativePictureFormat:=False, SaveFormsData:=False,
SaveAsAOCELetter:= _
    False, Encoding:=1200, InsertLineBreaks:=False,
AllowSubstitutions:=False _
    , LineEnding:=wdCRLF

2) Access XP can read and import data from Unicode textfiles into text
fields while retaining the Unicode. (I just used Get External Data to
import your sample file into a table with just one 255-character wide
text field.)

So in principle what you want is possible. All that remains is the
really {*filter*} stuff of extracting the data from an irregularly
formatted word processor file and parsing it into the right fields.

Possibly the way to go would be to use the VBScript TextStream and
RegExp objects (usable in VBA code if you set references to the
appropriate libraries); I think these will give you more flexibility
than trying to do it all in VBA with InStr() and Mid().

Hope this helps!



Quote:
>Denis,

>Most of this dual-alphabet stuff is beyond my expertise. Let's hope
>someone in one of the Russian groups comes to the rescue. More
>comments inline.


>>Unfortunately converting to plain text won't work (Russian text gets screwed
>>up).

>>About the files, some come only in Russian, some come mixed, but all of them
>>are in Unicode.

>When I tried saving your rtf sample as text with Word XP, it offered a
>couple of 1-byte encodings that seemed to preserve the cyrillic
>characters, though as I don't have a 1-byte cyrillic font I couldn't
>make sure. If this is the case, you should be able to import to
>Access, formatting the fields with a 1-byte cyrillic font (they
>contain the roman characters too, don't they?) with the same encoding.

>Also, I feel there ought to be some way of getting Word to save the
>document as a Unicode (2-byte) text file, but don't know whether or
>what it might be.

>>Tried selecting everything, copying to clipboard and posting it to Excel,
>>that almost worked.
>>2 problems:
>>1. Excel converted some data to Date format. Example: I had 4/9 when pasted
>>to Excel became: 5-Apr
>>How do I stop it?

>AFAIK you have to specify that the column is text when you import it,
>or maybe you format the cells in the column in question as Text before
>you import the stuff. Experiment and you'll find out. Another approach
>- for Excel - is to put an apostrophe at the beginning of the field to
>force it to be interpreted as text and not as a date.

>>2. Not able to compute totals and use formulas on most fields, even those
>>that are currencies. Went to formatting and tried to convert to number or
>>currency, didn't seem to work.

>The RTF sample uses commas for decimal points. If your Windows
>regional settings are for a zone that uses periods, Excel will
>interpret the numbers in the sample as text. (e.g. 97,30 will go into
>the cell as the text "97,30", not as the number 97.30000000).
>Subsequent changes in the cell format won't affect this; you have to
>re-enter the number with the correct decimal symbol. This problem
>should disappear if you change regional settings to Russia or
>somewhere else that uses a comma for the decimal point.

>>Kind of solved the second problem by after pasting to Excel copying again
>>and pasting it in a table in Access, by default it converted almost all
>>fields to text, so changed the appropriate fields to number and currency
>>types, saved, then exported the table in excel format, now all the
>>computations work, but this is such a pain in the {*filter*}procedure and there
>>is still the first problem (date formatting).

>>If you guys know how to solve these problems, please let me know.
>>I am attaching a sample rtf file.

>The sample appears to have been produced with no concern for anything
>but the visible layout. The tab stops and numbers of tabs are
>inconsistent from line to line, which means that corresponding items
>in successive lines may actually be imported into different fields:
>another lot of problems in addition to the dual fonts.

>BTW, is this a really representative sample? I observe that the only
>cyrillic characters in it are in
>  (a) header and footer
>  (b) the first field.
>The header and footer can be handled separately - must be, if you're
>using Access as a data store. And surely the first field could be
>translated or transliterated with a simple search-and-replace in Word.
>IOW, is it really necessary to have both roman and cyrillic in your
>tables?

>HTH

>John Nurick [Access MVP]

>Please do not respond by email, but to the newgroup.

John Nurick [Access MVP]

Please do not respond by email, but to the newgroup.



Tue, 13 Jul 2004 06:06:53 GMT  
 
 [ 9 post ] 

 Relevant Pages 

1. Getting Tab separated data from Word to Access, Excel or simply process it from VB

2. Getting Tab separated data from Word to Access, Excel or process it from VB

3. Help - Simply Getting Data from Extrenal Data Files (not from Access Data Base)

4. processing excel rows - separate workbooks/worksheets

5. Using Word to pre-process data for Access table

6. Getting Word Data to Access

7. Getting data from security-enabled Access 2 to Excel 7 using DAO

8. Getting access data into a combobox on a word form

9. Separating Data from Program - Access 2.0

10. Getting data from an out-of-process exe

11. API code for Ending a process in NT TaskMgr/Processes Tab

12. Getting values of Excel cells from word VBA

 

 
Powered by phpBB® Forum Software