
Getting Tab separated data from Word to Access, Excel or simply process it from VB
Denis,
Most of this dual-alphabet stuff is beyond my expertise. Let's hope
someone in one of the Russian groups comes to the rescue. More
comments inline.
Quote:
>Unfortunately converting to plain text won't work (Russian text gets screwed
>up).
>About the files, some come only in Russian, some come mixed, but all of them
>are in Unicode.
When I tried saving your rtf sample as text with Word XP, it offered a
couple of 1-byte encodings that seemed to preserve the cyrillic
characters, though as I don't have a 1-byte cyrillic font I couldn't
make sure. If this is the case, you should be able to import to
Access, formatting the fields with a 1-byte cyrillic font (they
contain the roman characters too, don't they?) with the same encoding.
Also, I feel there ought to be some way of getting Word to save the
document as a Unicode (2-byte) text file, but don't know whether or
what it might be.
Quote:
>Tried selecting everything, copying to clipboard and posting it to Excel,
>that almost worked.
>2 problems:
>1. Excel converted some data to Date format. Example: I had 4/9 when pasted
>to Excel became: 5-Apr
>How do I stop it?
AFAIK you have to specify that the column is text when you import it,
or maybe you format the cells in the column in question as Text before
you import the stuff. Experiment and you'll find out. Another approach
- for Excel - is to put an apostrophe at the beginning of the field to
force it to be interpreted as text and not as a date.
Quote:
>2. Not able to compute totals and use formulas on most fields, even those
>that are currencies. Went to formatting and tried to convert to number or
>currency, didn't seem to work.
The RTF sample uses commas for decimal points. If your Windows
regional settings are for a zone that uses periods, Excel will
interpret the numbers in the sample as text. (e.g. 97,30 will go into
the cell as the text "97,30", not as the number 97.30000000).
Subsequent changes in the cell format won't affect this; you have to
re-enter the number with the correct decimal symbol. This problem
should disappear if you change regional settings to Russia or
somewhere else that uses a comma for the decimal point.
Quote:
>Kind of solved the second problem by after pasting to Excel copying again
>and pasting it in a table in Access, by default it converted almost all
>fields to text, so changed the appropriate fields to number and currency
>types, saved, then exported the table in excel format, now all the
>computations work, but this is such a pain in the {*filter*}procedure and there
>is still the first problem (date formatting).
>If you guys know how to solve these problems, please let me know.
>I am attaching a sample rtf file.
The sample appears to have been produced with no concern for anything
but the visible layout. The tab stops and numbers of tabs are
inconsistent from line to line, which means that corresponding items
in successive lines may actually be imported into different fields:
another lot of problems in addition to the dual fonts.
BTW, is this a really representative sample? I observe that the only
cyrillic characters in it are in
(a) header and footer
(b) the first field.
The header and footer can be handled separately - must be, if you're
using Access as a data store. And surely the first field could be
translated or transliterated with a simple search-and-replace in Word.
IOW, is it really necessary to have both roman and cyrillic in your
tables?
HTH
John Nurick [Access MVP]
Please do not respond by email, but to the newgroup.