Reading Text files with extended ASCII characters 

Hello,

I have reviewed the mailing list ruby-talk and I found some interesting
stuff, but I haven't seen the answer to reading (and creating) text
files with extended ASCII characters.

For example, when I try to read a text file containing the word:
Dön

I get:
D\366n

If I save this and read it in again I get:
D\\366n

Oddly, many normal punctuation characters are also escaped; these
include (at least):
" #

I assume I am reading the text file incorrectly, or not converting it to
something, because when I read a line of a file I also get:
"what-ever-it-says\n"

I am reading and writing the file with the following commands:

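# Remove output left over from a previous run.
File.delete("fileout.html") if FileTest.exist?("fileout.html")
File.open("filein.html", "r") do |f_in|
  f_in.each_line do |line|
    # dump returns a quoted, escaped copy (non-ASCII bytes become e.g. \366)
    reformated = line.dump
    # strip the surrounding quotes and the escaped trailing \n
    reformated = reformated.sub(/"(.*)\\n"/, '\1')
    # undo the escaping of # and " that dump adds
    reformated = reformated.gsub(/\\([#"])/, '\1')
    File.open("fileout.html", "a") { |f_out| f_out.puts reformated }
  end
end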

Is there a convenient way to keep the extended ASCII characters (or convert
them to &ouml;), and is there a convenient way to take &ouml; and
convert it into an "ö" to save in a text file?

I look forward to guidance.

Thanks,

Bill
--
William H. Tihen, Technology Director
The American School in Switzerland (TASIS)

http://www.*-*-*.com/



Sun, 07 Aug 2005 01:09:52 GMT  

Quote:

> Hello,

> I have reviewed the mailing list ruby-talk and I found some interesting
> stuff, but I haven't seen the answer to reading (and creating) text
> files with extended ASCII characters.

> For example when I try to read a text file with the word:
> Dön

<snip>

seems to work:

/tmp > ruby -e "f = File.open 'foo', 'w'; f.puts 'Dön'"

/tmp > cat foo
Dön

/tmp > ruby -e "puts (IO.readlines 'foo')"
Dön

/tmp > ruby -e "lines = IO.readlines 'foo'; f = File.open 'foo', 'w'; f.puts lines"

/tmp > cat foo
Dön

are you in dos?  perhaps calling binmode on the file (IO#binmode) will help then...
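A minimal sketch of what the binmode suggestion amounts to, assuming Ruby 1.9 or later (where `File.binread` and `File.binwrite` exist). The helper names `copy_bytes` and `copy_lines` are hypothetical; the point is that opening both files in binary mode ("rb"/"wb") copies every byte, 0xF6 included, with no newline translation and no escaping.

```ruby
# Hypothetical helpers: copy a file without altering any bytes.
# Binary mode ("rb"/"wb") disables DOS/Windows newline translation
# and any character-encoding conversion.

# Whole-file copy.
def copy_bytes(src, dst)
  File.binwrite(dst, File.binread(src))
end

# Line-by-line copy, for when each line needs processing first.
# There is no String#dump here, so no escapes are introduced.
def copy_lines(src, dst)
  File.open(src, "rb") do |f_in|
    File.open(dst, "wb") do |f_out|
      f_in.each_line { |line| f_out.write(line) } # line keeps its "\n"
    end
  end
end
```

`File.binread` returns the raw bytes as an ASCII-8BIT string, so nothing is escaped unless `dump` or `inspect` is called on it.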

when you say

Quote:
> I get:
> D\366n

> If I save this and read it in again I get:
> D\\366n

i assume you mean in irb?  if so, this is simply because irb displays strings
with their inspect method, which shows ö (and other extended chars) as octal
escape equivalents - the byte value in the file is still the same, which you
can see using cat or by opening the file in a text editor.
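A small sketch of the distinction described above, assuming Ruby 1.9 or later (which writes `\xF6` where 1.8 wrote `\366`): `inspect` (what irb shows) and `dump` produce escaped text, while the bytes inside the string stay the same. Writing the escaped text back to a file is what turns `D\366n` into `D\\366n` on the next round.

```ruby
# The three bytes D, 0xF6, n: "Dön" in Latin-1.
word = "D\xF6n".b

# irb displays strings with #inspect, which escapes non-ASCII bytes
# (Ruby 1.8 showed them in octal, 1.9+ in hex).
puts word.inspect

# The bytes are untouched: the middle one is still 246.
p word.bytes                  # => [68, 246, 110]

# String#dump returns escaped *text*: the quotes and the backslash
# are now real characters in the string.
once = word.dump
p once.bytesize               # => 8, i.e. D\xF6n plus two quotes

# Dumping that text again escapes the backslash itself, which is how
# the doubled backslash appears after a second save/read cycle.
twice = once.dump
puts twice
```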

-a

--

 ====================================
 | Ara Howard
 | NOAA Forecast Systems Laboratory
 | Information and Technology Services
 | Data Systems Group
 | R/FST 325 Broadway
 | Boulder, CO 80305-3328

 | Phone:  303-497-7238
 | Fax:    303-497-7259
 ====================================



Sun, 07 Aug 2005 02:02:42 GMT  
Thanks for the help Ara, I guess I just didn't understand the file
operations.  Now I will look into saving &ouml;, &#264;, or 0xF6 as an
actual "ö" in a text file.

I can't seem to get unpack to convert "=264" into "ö".

Any ideas?




Sun, 07 Aug 2005 07:42:31 GMT  

Quote:

> Thanks for the help Ara, I guess I just didn't understand the file
> operations.  Now I will look to solve saving &ouml; &#264; or 0xF6 as an
> actual "ö" into a text file.

> I can't seem to get unpack to convert "=264" into "ö".

> Any ideas

how about

/tmp > ruby -e '255.times {|n| printf "%c%s", n, (n == 39 ? "\n" : " ")}'
[output omitted: the characters for codes 0 to 254 printed in sequence; the control codes among them garble the terminal]

(this may hose your terminal but you get the idea)
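Instead of printing all the codes, a single code can be turned into a character directly. This sketch assumes Ruby 1.9 or later, where strings carry an encoding: `[246].pack("C")` yields the single byte 0xF6 (ö in ISO-8859-1/Latin-1), while `[246].pack("U")` yields U+00F6 encoded as UTF-8 (two bytes). Which one belongs in the file depends on the encoding the rest of the file uses, which the thread does not state. In Ruby 1.9+ `printf "%c"` may likewise emit UTF-8 characters for codes above 127 rather than single bytes.

```ruby
code = 246

# One raw byte, 0xF6: "ö" if the file is read as ISO-8859-1 (Latin-1).
latin1 = [code].pack("C")
p latin1.bytes                # => [246]

# The same code point as a Unicode character, encoded as UTF-8.
utf8 = [code].pack("U")
p utf8.bytes                  # => [195, 182]

# Re-label the raw byte as Latin-1, then convert it to UTF-8.
p latin1.force_encoding("ISO-8859-1").encode("UTF-8") == utf8  # => true
```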

-a

ps.  running

  /tmp > ruby -e '255.times {|n| printf "%d : %c\n", n, n}' | less

and searching for 'ö' you'll see that it's

  246 : ö

not 264.
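The mix-up between 246 and 264 is easier to see when the same value is written in each notation the thread mentions. The sketch below (Ruby 2.0+ assumed, for the UTF-8 default source encoding) converts 246 into hex and octal, then decodes HTML character references into real characters. `decode_entities` and its `NAMED` table are hypothetical and cover only a handful of named entities; note that `&#264;` decodes to "Ĉ" (U+0108), not "ö".

```ruby
# One value, three notations.
p 246.to_s(16)   # => "f6"   (0xF6, as in &#xF6;)
p 246.to_s(8)    # => "366"  (\366, Ruby 1.8's escape)
p "ö".ord        # => 246    (Unicode code point U+00F6)

# Hypothetical table of named entities; illustrative, not complete.
NAMED = { "ouml" => "ö", "Ouml" => "Ö", "auml" => "ä",
          "uuml" => "ü", "amp" => "&" }.freeze

# Hypothetical helper: replace &#246;, &#xF6; and known names like
# &ouml; with the characters they stand for (as UTF-8).
def decode_entities(text)
  text.gsub(/&(#\d+|#x\h+|[A-Za-z]+);/) do |match|
    ref = $1
    if ref.start_with?("#x")
      ref[2..-1].to_i(16).chr(Encoding::UTF_8)
    elsif ref.start_with?("#")
      ref[1..-1].to_i.chr(Encoding::UTF_8)
    else
      NAMED.fetch(ref, match) # leave unknown names untouched
    end
  end
end

p decode_entities("D&ouml;n and D&#246;n") # => "Dön and Dön"
```

To store Latin-1 bytes rather than UTF-8, the result can be encoded before writing, e.g. `decode_entities(html).encode("ISO-8859-1")`.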

hope this helps.



Sun, 07 Aug 2005 12:00:43 GMT  

Quote:
> Thanks for the help Ara, I guess I just didn't understand the file
> operations.  Now I will look to solve saving &ouml; &#264; or 0xF6 as an
> actual "ö" into a text file.

> I can't seem to get unpack to convert "=264" into "ö".

Is that using the quoted-printable format? If so, quoted-printable uses
hex values, so "=264" is "&4", because 0x26 is "&".
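Ruby's own quoted-printable support is `String#unpack` with the `"M"` directive (and `Array#pack("M")` for the reverse); `unpack1` assumes Ruby 2.4 or later. A short sketch confirming this reading of `"=264"` and showing that ö (246 = 0xF6) is written `"=F6"`:

```ruby
# "=XX" in quoted-printable is always two HEX digits, so "=264" is
# "=26" (0x26, "&") followed by a literal "4".
p "=264".unpack1("M")              # => "&4"

# 246 decimal is 0xF6, so a Latin-1 "ö" is written "=F6".
byte = "=F6".unpack1("M")
p byte.bytes                       # => [246]

# Label the decoded byte as Latin-1 and convert it for display.
p byte.force_encoding("ISO-8859-1").encode("UTF-8")  # => "ö"

# Encoding goes the other way; the result ends in a soft line break.
p ["D\xF6n".b].pack("M")
```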

Sam



Sun, 07 Aug 2005 13:53:42 GMT  
Thanks again for the tip; everything works well now. I might have
to clean up the code, but it all works.

Bill
--
William H. Tihen, Technology Director
The American School in Switzerland (TASIS)

http://www.tasis.ch/



Mon, 08 Aug 2005 19:58:02 GMT  
 

 Relevant Pages 

1. Reading fortran text files / Parsing ascii files/ Help!!

2. Reading backslashed characters (in ASCII) from a file

3. Reading an ascii text file

4. how to find extended ascii characters?

5. Extended ASCII Characters in CW

6. Help printing extended ASCII character set

7. How can I process Extended ASCII Characters?

8. REQUEST: Printing Extended ASCII characters

9. [file] fails on some files with non-ASCII characters in their names

10. Reading ASCII Characters

11. reading ascii file and creating topspeed file in application designer - gasman.zip (0/2)

12. Exporting Clarion DAT file to ASCII(Text)File

 

 