Detecting blank lines in text file read 

Hello!

I'm planning to write a routine to read data (numbers) from an ordinary
ASCII file, but I would like to allow for the presence of blank lines and
comment lines.

My original idea was to read each line to an internal file, check if the
line is a comment or blank, and then read the variables from the internal
file.

   CHARACTER*256      C_LINE

    READ(INUNIT, '(A)') C_LINE

It's easy to search for a predefined comment character (# or !, for
example), and clean up the line using

      IDXCOMM = INDEX(C_LINE,'#')
and
      C_LINE = C_LINE(1:IDXCOMM-1)

but the above does not work if the line is blank, because any line
without the comment character will result in IDXCOMM = 0, and I wouldn't
know if the line has only data, or nothing at all.

Does anybody know any trick?  I was trying to avoid looping through each
character of the input line and checking what it is. Some of my files are
really large. There must be a smarter way.

Any hint is greatly appreciated.

Thanks,

Dalmo Vieira

Center for Computational Hydroscience and Engineering
The University of Mississippi



Wed, 18 Jun 1902 08:00:00 GMT  


Quote:
> I'm planning to write a routine to read data (numbers) from an ordinary
> ASCII file, but I would like to allow for the presence of blank lines
> and comment lines.

> My original idea was to read each line to an internal file, check if the
> line is a comment or blank, and then read the variables from the
> internal file.

>    CHARACTER*256      C_LINE
>    READ(INUNIT, '(A)') C_LINE

> It's easy to search for a predefined comment character (# or !, for
> example), and clean-up the line using

>       IDXCOMM = INDEX(C_LINE,'#')
> and
>       C_LINE = C_LINE(1:IDXCOMM-1)

> but the above does not work if the line is blank, because any line
> without the comment character will result in IDXCOMM = 0, and I wouldn't
> know if the line has only data, or nothing at all.

        I don't recall what the Standard says, but if you want to be
    sure you don't have "left over" characters from a previous read in
    C_LINE, initialize it to all blanks prior to each read.  _That_
    done, the test for all blanks is trivial.

        C_LINE = ' '
        READ (INUNIT, '(A)') C_LINE

        IF (C_LINE .EQ. ' ') THEN          ! Test for all blanks
           <skip the blank line>
        ELSE
           IDX1 = INDEX (C_LINE, "!")
           IDX2 = INDEX (C_LINE, "#")
           IDXCOMM = MAX (IDX1, IDX2)
           IF (IDXCOMM .GT. 0) THEN
              IF (IDXCOMM .GT. 1) THEN
                 C_LINE (IDXCOMM:) = ' '    ! Just blank the trailing comment
              ELSE
                 <skip the line>
              ENDIF
           ENDIF
        ENDIF

Quote:
> Does anybody know any trick.  I was trying to avoid looping thru each
> character of the input line and checking what they are.  Some of my
> files are really large.  There must be a smarter way.

        See if the above suits you...

            -Ken
--

 SLAC, P.O.Box 4349, MS 46  |  DECnet:   45537::FAIRFIELD (45537=SLACVX)
 Stanford, CA   94309       |  Voice:    650-926-2924    FAX: 650-926-3515
 -------------------------------------------------------------------------
 These opinions are mine, not SLAC's, Stanford's, nor the DOE's...



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:



>         I don't recall what the  Standard  says,  but  if you want to be
>     sure  you don't have "left over" characters from a previous read  in
>     C_LINE, initialize it to all blanks  prior  to  each  read.   _That_
>     done, the test for all blanks is trivial.

Doesn't the following read pad the variable to the defined length (256) with
blanks?  If so, the initialization before the read is unnecessary.  Also, some
compilers support a "pad=" specifier which may achieve the behavior implied
below.


Quote:

>         C_LINE = ' '
>         READ (INUNIT, '(A)') C_LINE

>         IF (C_LINE .EQ. ' ') THEN          ! Test for all blanks
>            <skip the blank line>
>         ELSE
>            IDX1 = INDEX (C_LINE, "!")
>            IDX2 = INDEX (C_LINE, "#")
>            IDXCOMM = MAX (IDX1, IDX2)
>            IF (IDXCOMM .GT. 0) THEN
>               IF (IDXCOMM .GT. 1) THEN
>                  C_LINE (IDXCOMM:) = ' '    ! Just blank the trailing comment
>               ELSE
>                  <skip the line>
>               ENDIF
>            ENDIF
>         ENDIF


--
Gary Scott


http://www.fortranlib.com



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:



>         I don't recall what the  Standard  says,  but  if you want to be
>     sure  you don't have "left over" characters from a previous read  in
>     C_LINE, initialize it to all blanks  prior  to  each  read.   _That_
>     done, the test for all blanks is trivial.

>         C_LINE = ' '
>         READ (INUNIT, '(A)') C_LINE

>         IF (C_LINE .EQ. ' ') THEN          ! Test for all blanks
>            <skip the blank line>
>         ELSE
>            IDX1 = INDEX (C_LINE, "!")
>            IDX2 = INDEX (C_LINE, "#")

If you want both ! and # to begin a comment, then you will need to use
something other than MAX to determine the position of the beginning of
the comment, e.g.

123.456   ! begin one comment # begin the second comment

will result in IDXCOMM here --^


Quote:
>            IDXCOMM = MAX (IDX1, IDX2)
>            IF (IDXCOMM .GT. 0) THEN
>               IF (IDXCOMM .GT. 1) THEN
>                  C_LINE (IDXCOMM:) = ' '    ! Just blank the trailing comment
>               ELSE
>                  <skip the line>
>               ENDIF
>            ENDIF
>         ENDIF


Jerry . . .

--

Custom Solutions              http://www.cs-software.com/
209 Bayberry Run
Summerville, SC  29485-8778    Your source for discounted
Voice:  (843) 871 9081           Fortran compilers and
Fax:    (843) 873 8626             related software



Wed, 18 Jun 1902 08:00:00 GMT  


Quote:

[...]
>>         I don't recall what the  Standard  says,  but  if you want to be
>>     sure  you don't have "left over" characters from a previous read  in
>>     C_LINE, initialize it to all blanks  prior  to  each  read.   _That_
>>     done, the test for all blanks is trivial.

>>         C_LINE = ' '
>>         READ (INUNIT, '(A)') C_LINE

>>         IF (C_LINE .EQ. ' ') THEN          ! Test for all blanks
>>            <skip the blank line>
>>         ELSE
>>            IDX1 = INDEX (C_LINE, "!")
>>            IDX2 = INDEX (C_LINE, "#")

> If you want both ! and # to begin a comment, then you will need to use
> something other than MAX to determine the position of the beginning of
> the comment, e.g.

> 123.456   ! begin one comment # begin the second comment

> will result in IDXCOMM here --^

        Yep, that's what I get for coding on the fly, with someone
    else's requirements to boot!  :-}  It's a shame; it means the tests
    have to be more explicit, e.g., instead of:

Quote:
>>            IDXCOMM = MAX (IDX1, IDX2)

    use something like this:

              IF (IDX1.GT.0 .AND. IDX2.GT.0) THEN
                 IDXCOMM = MIN (IDX1, IDX2)          
              ELSE                                    
                 IDXCOMM = MAX (IDX1, IDX2)          
              ENDIF                                  

Quote:
>>            IF (IDXCOMM .GT. 0) THEN
>>               IF (IDXCOMM .GT. 1) THEN
>>                  C_LINE (IDXCOMM:) = ' '    ! Just blank the trailing comment
>>               ELSE
>>                  <skip the line>
>>               ENDIF
>>            ENDIF
>>         ENDIF
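
    A possible shortcut, assuming an F90 or later compiler: the SCAN
    intrinsic looks for any character of a set in a single call, so the
    MIN/MAX juggling goes away. The sketch below is only an illustration
    (the program name and the sample line are made up for the example).
    It blanks the comment first and tests for an all-blank line afterwards,
    so a line with only blanks in front of the comment character is also
    skipped.

```fortran
program strip_comment_demo
  implicit none
  character(len=256) :: c_line
  integer :: idxcomm

  ! Hypothetical sample record with both comment characters on one line.
  c_line = '123.456   ! begin one comment # begin the second comment'

  ! SCAN returns the position of the leftmost character of c_line that
  ! belongs to the set '!#', or 0 if neither character is present.
  idxcomm = scan(c_line, '!#')
  if (idxcomm > 0) c_line(idxcomm:) = ' '   ! blank out the comment

  ! After stripping, one test covers blank and comment-only lines alike.
  if (len_trim(c_line) == 0) then
     print *, 'skip: blank or comment-only line'
  else
     print *, 'data: ', trim(c_line)
  end if
end program strip_comment_demo
```

    SCAN inspects characters internally, but that work happens inside the
    intrinsic rather than in a hand-written loop, and it is no more
    expensive than the two INDEX calls it replaces.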

        -Ken


Wed, 18 Jun 1902 08:00:00 GMT  


Quote:

[...]
> Doesn't the following read pad the variable to the defined length (256) with
> blanks?  If so, the initialization before the read is unnecessary.  Also, some
> compilers support a "pad=" specifier which may achieve the behavior implied
> below.

>>         C_LINE = ' '
>>         READ (INUNIT, '(A)') C_LINE

        Well, I'll defer to the Standard's experts on that one.  It's
    precisely because I _don't_ know under exactly which conditions the
    line will, or will not, be padded (and whether we're talking about
    F77 or F90/95) that I added the initialization of C_LINE in my
    response.  Call it defensive programming.  :-)

        -Ken



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:




> > Doesn't the following read pad the variable to the defined length
> > (256) with blanks?  If so, the initialization before the read is
> > unnecessary.
> >>         C_LINE = ' '
> >>         READ (INUNIT, '(A)') C_LINE

Yes, C_LINE is padded with blanks in at least f90 and f95.  In f77,
there is at least some question.  (Though the underscore character
isn't legal f77 anyway, per a separate recent thread).

Quote:
> > Also, some compilers support a "pad=" specifier which may achieve
> > the behavior implied below.

There are some "issues" here.  They depend on the version of the
standard.  But I'll note that the issue is not whether or not C_LINE
is blank-padded.  That's not really what PAD= does.  The issue is
whether the READ is legal at all.  In *ALL* versions of the standard,
if the READ is legal, then C_LINE is completely defined by the READ.
This definition may include trailing blank padding in some cases.
It will never depend on what value C_LINE had before the READ.

However, there are some situations in which the READ could be
interpreted as being nonstandard.  In those cases, the compiler
could "do anything", quite plausibly including keeping parts of
C_LINE from its previous value.

F77 just says that the record read has to be big enough for what you
are reading from it (I won't bother to copy the exact words).  If the
record isn't big enough, then it's an error (and the compiler can do
whatever it wants).  Some early f77 compilers considered this an error.
This can make things like interactive input pretty much completely
useless (unless you think that telling the user to explicitly type
enough trailing blanks to fill every response out to at least some
specified length is useful).  Pretty much all recent f77 compilers
implicitly pad input records with blanks in order to make things
useful.  This can be justified in either of two ways:
  1. The code is nonstandard, so the compiler can do whatever it
     wants, and it chooses to do something useful.
or
  2. The standard does not specify the physical representation of
     a file, so the processor is free to claim, for example, that
     the 2 bytes "y" and "<return>" are a compressed physical
     representation of an arbitrarily long record consisting of a
     "y" and a bunch of blanks.
Although either of these arguments allows a compiler to do "the
right thing", neither of them mandates it.

F90 mandates the "right" behavior and provides the PAD= specifier
in case you really wanted it to work "wrong".  (Well, ok, the
standard doesn't use my prejudicial terminology).  The default
is PAD="YES", which means that the record is implicitly padded
out with blanks as needed.  Note that PAD="NO" does *NOT* mean
to partially define C_LINE.  The question is not whether C_LINE
is padded, but whether the record is implicitly padded.
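
A small experiment along these lines, offered as a sketch rather than
as anything the standard text spells out: fill the variable with junk,
read a one-character record, and see whether any junk survives. The
unit number, the scratch file and the 16-character length are
assumptions made for the example. PAD='YES' is written out only to
document the intent, since it is the default for OPEN anyway.

```fortran
program read_defines_all
  implicit none
  integer, parameter :: inunit = 10      ! arbitrary unit number for the demo
  character(len=16) :: c_line
  integer :: ios

  ! A scratch file stands in for the real data file.
  open (unit=inunit, status='scratch', form='formatted', &
        action='readwrite', pad='yes', iostat=ios)
  if (ios /= 0) stop 'could not open scratch file'

  write (inunit, '(A)') 'y'              ! a record much shorter than c_line
  rewind (inunit)

  c_line = repeat('X', len(c_line))      ! deliberately leave junk behind
  read (inunit, '(A)', iostat=ios) c_line
  if (ios /= 0) stop 'read failed'

  ! The short record is treated as if it were padded with blanks, so the
  ! READ defines all 16 characters and none of the X characters remains.
  print *, '[', c_line, ']'
  print *, 'leftover X found: ', index(c_line, 'X') > 0

  close (inunit)
end program read_defines_all
```

On a conforming F90 or later processor the last line should report F.
Changing the OPEN to PAD='NO' makes the READ nonconforming for this
record, and what happens then is up to the compiler.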

If the record is too short and you have specified PAD="NO", then the
READ is just illegal and we are back to the f77 situation where the
behavior depends on the compiler.  I'd hope that the compiler would
consider it an error and abort or take IOSTAT=/ERR= action as
applicable, but the standard doesn't quite demand that (much as I wish
it did) because it is left up to the compiler what things count as
errors and trigger iostat=/err= versus what things are just illegal
and might cause other behavior.

But hopefully, you are not so silly as to explicitly specify PAD="NO"
unless you really want it (and have some idea of how you are going
to deal with its system dependencies).  And you don't get PAD="NO"
unless you go out of your way to ask for it........except...

There is one "feature" of the f90 standard in this area.  I think
of it as simply a bug, but I've had others argue that it was an
intentional feature.  (Or perhaps I'm confusing this with a
similar situation for the BN edit descriptor - I also think of
that one as having had a bug in the f77 standard - perhaps that's
the one where someone was arguing otherwise).  In any case...

The words saying that PAD="YES" is the default are in the section
on the OPEN statement.  If you use an OPEN statement, it is
unambiguous that the default is "YES".  Alas (in my opinion)
this does not make it perfectly clear what happens in cases
where no OPEN statement was involved.  The cases where no OPEN
statement is involved are

1. Internal reads.
2. Reads from a pre-connected unit.  (You just start reading from
   unit n without ever having opened it).  (And for current purposes,
   I'll ignore the fine distinctions about whether typical implementations
   strictly meet the standard's definition of preconnection).
3. Reads from * (which is technically preconnected, but I'll mention
   it separately anyway, as it's a very special case).

So these cases are still questionable.

F95 fixes the internal read case by explicitly specifying that the
default is also PAD="YES" there.  I don't know whether any f90
compilers are so....um...helpful...as to do internal files "wrong".
I'd hope not, but I'm afraid that a strict reading of the standard
doesn't specify it.  I'd have argued that it was the "obvious" intent,
but I've seen too many cases where what seemed obvious to me didn't
seem nearly as much so to someone else (including cases where I at
least understood the opposing reasoning once it was explained to me).

Alas, although f95 "fixes" the internal file case, it still overlooks
the preconnected case.  Sighhhhhh!  This wasn't noticed until after
f95 was "out."  (Or anyway, I didn't notice it at all until it was
called to my attention, which was after f95 was out).

I was about to say that the preconnected case was specified in the
current f2k draft, but since I can't find it in a quick scan (and I've
spent enough time on this right now), I better not say that.  Not sure
whether I missed it or it's not there.

Quote:
>         Well, I'll defer to the  Standard's  experts  on that one.  It's
>     precisely  because I _don't_ know under exactly which conditions the
>     line will, or will not be padded (and whether  we're  talking  about
>     F77  or  F90/95)  that  I  added  the initialization of C_LINE in my
>     response.  Call it defensive programming.  :-)

I can't argue with that.  But I will note that if the READ doesn't
work "as desired" then the initialization before the READ may not
necessarily help, because the READ might just mess it up anyway.

--
Richard Maine



Wed, 18 Jun 1902 08:00:00 GMT  
I do it in the following way:

   CHARACTER*256      C_LINE

   READ(INUNIT, '(A)') C_LINE

   if (len_trim(C_LINE) == 0) then   ! LEN_TRIM already ignores trailing blanks
      <a blank line>
   else
      <nonblank line>
   end if
   end if

It works fine in my code.
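
For completeness, here is a minimal sketch of the whole loop the
original question describes: read a line, strip the comment, skip the
line if nothing is left, otherwise read the numbers from the line as an
internal file. It assumes an F90 or later compiler (list-directed
internal reads and SCAN are not F77), two REAL values per data line,
and a scratch file in place of the real input; the names x, y and
read_data_demo are made up for the example.

```fortran
program read_data_demo
  implicit none
  integer, parameter :: inunit = 10      ! arbitrary unit number for the demo
  character(len=256) :: c_line
  integer :: ios, idxcomm
  real :: x, y

  ! Build a small test file: comments, blank lines and data mixed together.
  open (unit=inunit, status='scratch', form='formatted', action='readwrite')
  write (inunit, '(A)') '# header comment'
  write (inunit, '(A)') ''
  write (inunit, '(A)') '1.5  2.5   ! trailing comment'
  write (inunit, '(A)') '   '
  write (inunit, '(A)') '3.0  4.0'
  rewind (inunit)

  do
     read (inunit, '(A)', iostat=ios) c_line
     if (ios /= 0) exit                       ! end of file or read error

     idxcomm = scan(c_line, '!#')             ! first comment character, if any
     if (idxcomm > 0) c_line(idxcomm:) = ' '  ! blank out the comment
     if (len_trim(c_line) == 0) cycle         ! blank or comment-only line

     read (c_line, *, iostat=ios) x, y        ! internal read of the data
     if (ios /= 0) then
        print *, 'bad data line: ', trim(c_line)
        cycle
     end if
     print *, x, y
  end do

  close (inunit)
end program read_data_demo
```

Only the two data lines reach the internal read; the other three
records are skipped by the LEN_TRIM test.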

Si Yuan



Wed, 18 Jun 1902 08:00:00 GMT  
 