Getting Tab separated data from Word to Access, Excel or simply process it from VB
Author |
Message |
Deni #1 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hello, I need to get a tab separated data in rtf or doc file to Access or Excel or just be able to process it in vb code. The problem worsens as some of the data is in Russian and some rows contain totals and I will compute my own totals so these totals should be ignored. Ideally it should be done using vb code in vb application. User opens the file and application evaluates and transfers data. But I will be thankful for any method to get the data to Access or Excel, including using macros or manually resaving the file in some compatible format. Thanks in advance. Denis.
|
Tue, 06 Jul 2004 01:24:37 GMT |
|
 |
jaf #2 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis, In code, open the .rtf/.doc file with Word. Do a saveas .txt Close word (or the file) and open the txt file with Excel. Can be done from Word or Excel VBA or VB6. John
Quote: > Hello, > I need to get a tab separated data in rtf or doc file to Access or Excel or > just be able to process it in vb code. > The problem worsens as some of the data is in Russian and some rows contain > totals and I will compute my own totals so these totals should be ignored. > Ideally it should be done using vb code in vb application. User opens the > file and application evaluates and transfers data. But I will be thankful > for any method to get the data to Access or Excel, including using macros or > manually resaving the file in some compatible format. > Thanks in advance. > Denis.
|
Tue, 06 Jul 2004 07:00:14 GMT |
|
 |
John Nuric #3 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
HI Denis, This can be done (either manually or from VB using Automation to control the MS Office applications) but it might be hard work. Here's some food for thought which should help you decide which way you want to go. When you're ready, please post back to the one or two most relevant groups: the scattergun approach across many groups is widely disapproved of<g>. 1) Basically either Access or Excel will happily import tab-separated data from a plain text file though not from a formatted .rtf or .doc file. So using Word to convert the file to plain text *may* be a good start, but see below. 2) The built-in Access import procedures won't work on files that have inconsistent rows (except for a single header row containing field names). So if the "totals" rows in your documents are structured differently from the ordinary data rows, or if there are multiple header rows, you'll need to either: use Excel; write VBA code to custom-import them in to Access; or process the files to strip these rows before importing them. 3) When you say "some of the data is in Russian", do you mean (a) some files are in Russian and others in English? (b) some fields are in Russian and others in English? (c) Russian and English are indiscriminately mixed in the fields? 4) In the rtf/doc files, is the Russian text (a) in a non-Unicode cyrillic font (the only way you could do it in versions before Word 2000)? (b) in the cyrillic "page" of a Unicode font (if this is the case, in Word 2000 and later, it should be possible to format the entire document in one Unicode font containing both roman and cyrillic characters without screwing up the English and Russian)? 5) Ordinary Access text fields cannot contain mixed formatting. The font etc. you specify for display of the field is applied to the entire field. So if (3c) you need to mix cyrillic and roman within a single field (either in one record or cyrillic in some records and roman in others) the only practicable way is to use a version of Access that knows about Unicode text (i.e. 2000 or XP). (In principle you can store formatted text in a text field as an RTF string, but this would make querying and updating hellishly difficult.) If (3a) but you need to combine cyrillic and roman documents into a single table, the situation is the same as (3c). Otherwise - some tables cyrillic, some roman, or (3b) some fields one, some the other - no problem: just apply the right fonts to the fields. Quote:
>Hello, >I need to get a tab separated data in rtf or doc file to Access or Excel or >just be able to process it in vb code. >The problem worsens as some of the data is in Russian and some rows contain >totals and I will compute my own totals so these totals should be ignored. >Ideally it should be done using vb code in vb application. User opens the >file and application evaluates and transfers data. But I will be thankful >for any method to get the data to Access or Excel, including using macros or >manually resaving the file in some compatible format. >Thanks in advance. > Denis.
John Nurick [Access MVP] Please do not respond by email, but to the newgroup.
|
Tue, 06 Jul 2004 15:10:44 GMT |
|
 |
John Nuric #4 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Denis, Most of this dual-alphabet stuff is beyond my expertise. Let's hope someone in one of the Russian groups comes to the rescue. More comments inline. Quote:
>Unfortunately converting to plain text won't work (Russian text gets screwed >up). >About the files, some come only in Russian, some come mixed, but all of them >are in Unicode.
When I tried saving your rtf sample as text with Word XP, it offered a couple of 1-byte encodings that seemed to preserve the cyrillic characters, though as I don't have a 1-byte cyrillic font I couldn't make sure. If this is the case, you should be able to import to Access, formatting the fields with a 1-byte cyrillic font (they contain the roman characters too, don't they?) with the same encoding. Also, I feel there ought to be some way of getting Word to save the document as a Unicode (2-byte) text file, but don't know whether or what it might be. Quote: >Tried selecting everything, copying to clipboard and posting it to Excel, >that almost worked. >2 problems: >1. Excel converted some data to Date format. Example: I had 4/9 when pasted >to Excel became: 5-Apr >How do I stop it?
AFAIK you have to specify that the column is text when you import it, or maybe you format the cells in the column in question as Text before you import the stuff. Experiment and you'll find out. Another approach - for Excel - is to put an apostrophe at the beginning of the field to force it to be interpreted as text and not as a date. Quote: >2. Not able to compute totals and use formulas on most fields, even those >that are currencies. Went to formatting and tried to convert to number or >currency, didn't seem to work.
The RTF sample uses commas for decimal points. If your Windows regional settings are for a zone that uses periods, Excel will interpret the numbers in the sample as text. (e.g. 97,30 will go into the cell as the text "97,30", not as the number 97.30000000). Subsequent changes in the cell format won't affect this; you have to re-enter the number with the correct decimal symbol. This problem should disappear if you change regional settings to Russia or somewhere else that uses a comma for the decimal point. Quote: >Kind of solved the second problem by after pasting to Excel copying again >and pasting it in a table in Access, by default it converted almost all >fields to text, so changed the appropriate fields to number and currency >types, saved, then exported the table in excel format, now all the >computations work, but this is such a pain in the {*filter*}procedure and there >is still the first problem (date formatting). >If you guys know how to solve these problems, please let me know. >I am attaching a sample rtf file.
The sample appears to have been produced with no concern for anything but the visible layout. The tab stops and numbers of tabs are inconsistent from line to line, which means that corresponding items in successive lines may actually be imported into different fields: another lot of problems in addition to the dual fonts. BTW, is this a really representative sample? I observe that the only cyrillic characters in it are in (a) header and footer (b) the first field. The header and footer can be handled separately - must be, if you're using Access as a data store. And surely the first field could be translated or transliterated with a simple search-and-replace in Word. IOW, is it really necessary to have both roman and cyrillic in your tables? HTH John Nurick [Access MVP] Please do not respond by email, but to the newgroup.
|
Wed, 07 Jul 2004 15:05:56 GMT |
|
 |
Cindy Meister -WordMVP #5 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis, Quote: > I need to get a tab separated data in rtf or doc file to Access or Excel or > just be able to process it in vb code.
Have you tried extracting it using the TextStream object of the FileSystemObject? Then ADO or DAO to put it into Access or Excel? Cindy Meister INTER-Solutions, Switzerland http://homepage.swissonline.ch/cindymeister http://www.mvps.org/word http://go.compuserve.com/MSOfficeForum This reply is posted in the Newsgroup; please post any follow question or reply in the newsgroup and not by e-mail :-)
|
Fri, 09 Jul 2004 18:52:52 GMT |
|
 |
John Nuric #6 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
On Mon, 21 Jan 2002 11:52:52 +0100, Cindy Meister -WordMVP- Quote:
>> I need to get a tab separated data in rtf or doc file to Access or Excel or >> just be able to process it in vb code. >Have you tried extracting it using the TextStream object of the >FileSystemObject? Then ADO or DAO to put it into Access or Excel?
Hi Cindy, Am I missing something? I didn't think TextStream could parse .doc or .rtf files. John Nurick [Access MVP] Please do not respond by email, but to the newgroup.
|
Sat, 10 Jul 2004 05:44:23 GMT |
|
 |
Deni #7 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hello Cindy, I took a look at the TextStream object, not sure how will it help me. Doesn't it just allow to transfer data between files? Could you be more specific on its usage? If you could paste a sample code that would be great. Sorry for the dumb questions, I am new at this. Thanks. Denis.
Quote: > Hi Denis, > > I need to get a tab separated data in rtf or doc file to Access or Excel or > > just be able to process it in vb code. > Have you tried extracting it using the TextStream object of the > FileSystemObject? Then ADO or DAO to put it into Access or Excel? > Cindy Meister > INTER-Solutions, Switzerland > http://homepage.swissonline.ch/cindymeister > http://www.mvps.org/word > http://go.compuserve.com/MSOfficeForum > This reply is posted in the Newsgroup; please post any follow question or > reply in the newsgroup and not by e-mail :-)
|
Sun, 11 Jul 2004 01:22:45 GMT |
|
 |
Cindy Meister -WordMVP #8 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis, Quote: > I took a look at the TextStream object, not sure how will it help me. > Doesn't it just allow to transfer data between files? Could you be more > specific on its usage? If you could paste a sample code that would be great. > Sorry for the dumb questions, I am new at this.
Ah. Since you post in Developer newsgroups, I assumed you had some experience. For Word-related questions, there is a group where we figure one has less knowledge, and so answer at a different level: word.vba.beginners :-) Assume you have a text file like this: Name;Country "Meister";"Switzerland" "Bush";"USA" The following extracts the lines of data, one at a time. You can transfer these directly into another program using ADO or DAO; you can put them into an array; or use any other technique to process them. You can separate such a string of data into separate elements (fields) using the SPLIT function (this assumes you're using VB6). Sub ExtractTextRecs() Dim fso As FileSystemObject Dim ts As TextStream Dim sz As String Set fso = New FileSystemObject Set ts = fso.OpenTextFile("C:\tsb\DelimText.txt", ForReading, False) ts.SkipLine 'first line is field names Do sz = ts.ReadLine Debug.Print sz Loop Until ts.AtEndOfStream End Sub Cindy Meister INTER-Solutions, Switzerland http://homepage.swissonline.ch/cindymeister http://www.mvps.org/word http://go.compuserve.com/MSOfficeForum This reply is posted in the Newsgroup; please post any follow question or reply in the newsgroup and not by e-mail :-)
|
Sun, 11 Jul 2004 18:37:05 GMT |
|
 |
Deni #9 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Cindy, Thanks for the reply, Another dumb question. The code works fine for a text file. What I have is an rtf file in Unicode. It is not possible to convert it to text because some of the words are in Russian, when converted to text it gets screwed up. I tried opening this rtf file using your code in both ASCII and Unicode (using TristateTrue constant), when used ASCII I get screwed up text, when used Unicode all I get is a bunch of question marks. What should I do to make it work? Denis.
Quote: > Hi Denis, > > I took a look at the TextStream object, not sure how will it help me. > > Doesn't it just allow to transfer data between files? Could you be more > > specific on its usage? If you could paste a sample code that would be great. > > Sorry for the dumb questions, I am new at this. > Ah. Since you post in Developer newsgroups, I assumed you had some
experience. For Word-related questions, Quote: > there is a group where we figure one has less knowledge, and so answer at
a different level: word.vba.beginners Quote: > :-) > Assume you have a text file like this: > Name;Country > "Meister";"Switzerland" > "Bush";"USA" > The following extracts the lines of data, one at a time. You can transfer
these directly into another program Quote: > using ADO or DAO; you can put them into an array; or use any other
technique to process them. You can separate Quote: > such a string of data into separate elements (fields) using the SPLIT
function (this assumes you're using VB6). Quote: > Sub ExtractTextRecs() > Dim fso As FileSystemObject > Dim ts As TextStream > Dim sz As String > Set fso = New FileSystemObject > Set ts = fso.OpenTextFile("C:\tsb\DelimText.txt", ForReading, False) > ts.SkipLine 'first line is field names > Do > sz = ts.ReadLine > Debug.Print sz > Loop Until ts.AtEndOfStream > End Sub > Cindy Meister > INTER-Solutions, Switzerland > http://homepage.swissonline.ch/cindymeister > http://www.mvps.org/word > http://go.compuserve.com/MSOfficeForum > This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by Quote: > e-mail :-)
|
Mon, 12 Jul 2004 02:09:53 GMT |
|
 |
Cindy Meister -WordMVP #10 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Hi Denis, Sorry, I can't help you there. I don't know how to convert RTF to Unicode text, nor read it with VB. You might try in the word.international.features group if someone can tell you how to extract the Unicode specifically. FWIW, compared to ASCII text, unicode symbols require two ASCII characters (looked at in the old Windows WRITE RTF editor). I expect that's where the question marks come from - you have to find a way to get the value for the combination of the two and "translate" it, somehow... Quote: > The code works fine for a text file. > What I have is an rtf file in Unicode. It is not possible to convert it to > text because some of the words are in Russian, when converted to text it > gets screwed up. > I tried opening this rtf file using your code in both ASCII and Unicode > (using TristateTrue constant), when used ASCII I get screwed up text, when > used Unicode all I get is a bunch of question marks. > What should I do to make it work?
Cindy Meister INTER-Solutions, Switzerland http://homepage.swissonline.ch/cindymeister http://www.mvps.org/word http://go.compuserve.com/MSOfficeForum This reply is posted in the Newsgroup; please post any follow question or reply in the newsgroup and not by e-mail :-)
|
Mon, 12 Jul 2004 18:44:59 GMT |
|
 |
John Nuric #11 / 11
|
 Getting Tab separated data from Word to Access, Excel or simply process it from VB
Since my last post I've asked a question and experimented a bit. Results seem to be: 1) Properly installed, Word XP can export the text of at least some documents as plain Unicode text. (File|Save As; Plain Text; Other encoding; Unicode). Here's the command as captured by the macro recorder: ActiveDocument.SaveAs FileName:="car1.17priceD2.txt", FileFormat:= _ wdFormatText, LockComments:=False, Password:="", AddToRecentFiles:=True, _ WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _ SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _ False, Encoding:=1200, InsertLineBreaks:=False, AllowSubstitutions:=False _ , LineEnding:=wdCRLF 2) Access XP can read and import data from Unicode textfiles into text fields while retaining the Unicode. (I just used Get External Data to import your sample file into a table with just one 255-character wide text field.) So in principle what you want is possible. All that remains is the really {*filter*} stuff of extracting the data from an irregularly formatted word processor file and parsing it into the right fields. Possibly the way to go would be to use the VBScript TextStream and RegExp objects (usable in VBA code if you set references to the appropriate libraries); I think these will give you more flexibility than trying to do it all in VBA with InStr() and Mid(). Hope this helps!
Quote: >Denis, >Most of this dual-alphabet stuff is beyond my expertise. Let's hope >someone in one of the Russian groups comes to the rescue. More >comments inline.
>>Unfortunately converting to plain text won't work (Russian text gets screwed >>up). >>About the files, some come only in Russian, some come mixed, but all of them >>are in Unicode. >When I tried saving your rtf sample as text with Word XP, it offered a >couple of 1-byte encodings that seemed to preserve the cyrillic >characters, though as I don't have a 1-byte cyrillic font I couldn't >make sure. If this is the case, you should be able to import to >Access, formatting the fields with a 1-byte cyrillic font (they >contain the roman characters too, don't they?) with the same encoding. >Also, I feel there ought to be some way of getting Word to save the >document as a Unicode (2-byte) text file, but don't know whether or >what it might be. >>Tried selecting everything, copying to clipboard and posting it to Excel, >>that almost worked. >>2 problems: >>1. Excel converted some data to Date format. Example: I had 4/9 when pasted >>to Excel became: 5-Apr >>How do I stop it? >AFAIK you have to specify that the column is text when you import it, >or maybe you format the cells in the column in question as Text before >you import the stuff. Experiment and you'll find out. Another approach >- for Excel - is to put an apostrophe at the beginning of the field to >force it to be interpreted as text and not as a date. >>2. Not able to compute totals and use formulas on most fields, even those >>that are currencies. Went to formatting and tried to convert to number or >>currency, didn't seem to work. >The RTF sample uses commas for decimal points. If your Windows >regional settings are for a zone that uses periods, Excel will >interpret the numbers in the sample as text. (e.g. 97,30 will go into >the cell as the text "97,30", not as the number 97.30000000). >Subsequent changes in the cell format won't affect this; you have to >re-enter the number with the correct decimal symbol. This problem >should disappear if you change regional settings to Russia or >somewhere else that uses a comma for the decimal point. >>Kind of solved the second problem by after pasting to Excel copying again >>and pasting it in a table in Access, by default it converted almost all >>fields to text, so changed the appropriate fields to number and currency >>types, saved, then exported the table in excel format, now all the >>computations work, but this is such a pain in the {*filter*}procedure and there >>is still the first problem (date formatting). >>If you guys know how to solve these problems, please let me know. >>I am attaching a sample rtf file. >The sample appears to have been produced with no concern for anything >but the visible layout. The tab stops and numbers of tabs are >inconsistent from line to line, which means that corresponding items >in successive lines may actually be imported into different fields: >another lot of problems in addition to the dual fonts. >BTW, is this a really representative sample? I observe that the only >cyrillic characters in it are in > (a) header and footer > (b) the first field. >The header and footer can be handled separately - must be, if you're >using Access as a data store. And surely the first field could be >translated or transliterated with a simple search-and-replace in Word. >IOW, is it really necessary to have both roman and cyrillic in your >tables? >HTH >John Nurick [Access MVP] >Please do not respond by email, but to the newgroup.
John Nurick [Access MVP] Please do not respond by email, but to the newgroup.
|
Tue, 13 Jul 2004 06:06:53 GMT |
|
|
|