
File Type (Binary or ASCII)
Quote:
> Never thought about reading all the file, then deciding if it is human
> readable by the percentage that was human readable. Not a bad idea.
> As another thought, you could search the file for common words ->
> A, An, the, Me, I, etc.....
This gets further into a definition problem: what is meant by ASCII, and/or
why does your program want to know? Not all files that contain only
ASCII-defined characters are human readable.
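For what it's worth, the percentage idea from the quote could be sketched roughly like this in Python. The function names, the choice of which bytes count as "readable", and the 95% threshold are all my own assumptions for illustration, not anything agreed on in this thread.

```python
import string

# Bytes treated as "readable": printable ASCII plus tab, newline, CR, etc.
# Which bytes belong in this set is an assumption made for this sketch.
PRINTABLE = frozenset(string.printable.encode("ascii"))

def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in data that are printable ASCII."""
    if not data:
        return 1.0  # treat an empty file as trivially readable
    return sum(b in PRINTABLE for b in data) / len(data)

def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Guess 'text' when at least `threshold` of the bytes are printable."""
    return printable_ratio(data) >= threshold
```

This only answers "is it mostly printable ASCII", which is not the same question as "can a person read it".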
Several compression and encoding techniques (a compressed file run through
uuencode or base64, for instance) create datastreams that contain only ASCII
characters, but they aren't human readable. And does it matter whether the
uncompressed file would be ASCII or binary?
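A quick way to see an ASCII-only stream that nobody can read: base64 over zlib output is one such case. This is just a small sketch; the sample text and the is_ascii_only helper are made up for the demonstration.

```python
import base64
import zlib

raw = b"The quick brown fox jumps over the lazy dog. " * 20
compressed = zlib.compress(raw)         # binary: contains bytes above 127
encoded = base64.b64encode(compressed)  # ASCII-only, still not readable

def is_ascii_only(data: bytes) -> bool:
    """True when every byte is in the 7-bit ASCII range."""
    return all(b < 128 for b in data)

print(is_ascii_only(compressed))  # False
print(is_ascii_only(encoded))     # True
print(encoded[:40])               # looks like noise to a human
```

So a pure "all bytes below 128" test would call the encoded stream ASCII, even though the content underneath is compressed binary.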
A .csv file is human readable (if you know the application), but may not
contain any words.
I get spam every day that's human readable, but not to _this_ human - I only
recognize English and a little French. And speaking of localization, many
languages use characters above 127.
Want to handle DBCS? Unicode?
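If you do care about text beyond 7-bit ASCII, one possible approach is to try strict decoding against a short list of encodings. The sketch below is only that: the function name and the candidate list are assumptions of mine, and it can't tell single-byte code pages apart (latin-1, for example, will happily decode any byte sequence, so it proves nothing).

```python
def detect_encoding(data: bytes, candidates=("ascii", "utf-8")):
    """Return the first candidate encoding that decodes data cleanly, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None
```

A None result means "none of the encodings I tried", not "binary". A DBCS file would need its own code page added to the candidates.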
So, it depends on why the program needs to know. What works for one purpose
may not work for others. Checking the file extension isn't at all
foolproof, but it may be a better indication of the purpose of the file
unless you do a lot of checking on the whole file. It's also much less
overhead, since you don't have to open the file.
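The extension check might look something like this. The extension list is hypothetical and you'd pick your own for your purpose; the point is only that nothing gets opened or read.

```python
from pathlib import PurePath

# Hypothetical list for this sketch; a real program would choose its own.
TEXT_EXTENSIONS = {".txt", ".csv", ".log", ".ini", ".htm", ".html"}

def guess_text_by_extension(filename: str) -> bool:
    """Guess 'text' from the file name alone, without touching the file."""
    return PurePath(filename).suffix.lower() in TEXT_EXTENSIONS
```

A file with no extension, or a misleading one, gets the wrong answer, which is the "not foolproof" part.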
Just my 2[cents - no ASCII cent sign!]
-jcf