Problem with strtok() and new line 

I'm using strtok() to break up a text file, and pass each word to a
linked list. I find, however, that when I reach a new line, the function
starts sending junk to the list. I think the list code is OK, because if
I pass strings to it manually, it works perfectly, and in any case, it's
just standard LL code. The problem only occurs when I reach a new line,
and it looks like the strings aren't getting terminated correctly after
this. However, attempts to manually add \0 to the end (and remove any \n
characters) don't help.

Perhaps using fgets() is the problem, but I can't see what's going
wrong. Any advice would be appreciated. (I know some manuals warn
against strtok(), but I think it should work for this sort of job.)

#define SEPARATORS " .,?\"\n"

while ((fgets(line, BUFFER, pFile) != NULL) && (!error))
{
        //Move to next line if current line empty
        if (strlen(line)<= 1) continue;

        //Remove newline from end of string
        if (*(line + (strlen(line) - 1)) == '\n')
                *(line + (strlen(line) - 1)) = '\0';

        word = strtok(line, SEPARATORS);

        //If page delimiter reached, increment page counter, move to next line
        //Assumes page delimiter is always on new line
        if (!strcmp(word, "<END>"))
        {
                PageNumber++;
                printf("Page count: %d\n", PageNumber);
                continue;       //Go to next line
        }

        //Pass head, first word, page number to list
        IndexAppend(&index, word, PageNumber);

        //Loop through rest of line until end
        while(word != NULL)
        {
                word = strtok(NULL, SEPARATORS);

                if (word == NULL) break;        //Avoid empty lines

                //Pass head, word, page number to list
                IndexAppend(&index, word, PageNumber);
        }
}

--
Rohan Parkes
Melbourne
Australia


Sat, 09 Oct 2004 22:37:14 GMT  

Quote:

> I'm using strtok() to break up a text file, and pass each word to a
> linked list. I find, however, that when I reach a new line, the function
> starts sending junk to the list. I think the list code is OK, because if
> I pass strings to it manually, it works perfectly, and in any case, it's
> just standard LL code. The problem only occurs when I reach a new line,
> and it looks like the strings aren't getting terminated correctly after
> this. However, attempts to manually add \0 to the end (and remove any \n
> characters) don't help.

I'll bet you a sausage that you're not copying the token into freshly
allocated space. You're just pointing into the line buffer at the token
start point instead. Then, when you go round the loop, that data is
trashed.

Solution: allocate some fresh space, and copy the token into it, then
point at that.
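The fix described above can be sketched as follows. The original list code was never posted, so the node layout and the body of IndexAppend here are assumptions made for illustration; only the call shape IndexAppend(&index, word, PageNumber) comes from the thread. The point is that the node owns its own copy of the characters rather than a pointer into the line buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: the poster's real list code was not shown. */
struct IndexNode {
    char *word;               /* privately owned copy of the token */
    int page;
    struct IndexNode *next;
};

/* Append a private copy of `word` to the end of the list.
   Returns 0 on success, -1 if memory could not be allocated. */
static int IndexAppend(struct IndexNode **head, const char *word, int page)
{
    struct IndexNode *node;
    struct IndexNode **tail = head;

    assert(head != NULL && word != NULL);

    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;

    node->word = malloc(strlen(word) + 1);   /* +1 for the '\0' */
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, word);   /* copy the characters, not the pointer */
    node->page = page;
    node->next = NULL;

    while (*tail != NULL)       /* walk to the end of the list */
        tail = &(*tail)->next;
    *tail = node;
    return 0;
}
```

With this shape, the caller can reuse the line buffer for the next fgets() call without disturbing words already stored. Each node's word must eventually be released with free().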

Quote:

> Perhaps using fgets() is the problem, but I can't see what's going
> wrong. Any advice would be appreciated. (I know some manuals warn
> against strtok(), but I think it should work for this sort of job.)

strtok is fine if used properly, but this is a classic way to get
bitten.

<snip>

--

"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton



Sat, 09 Oct 2004 23:23:32 GMT  

Quote:


> > I'm using strtok() to break up a text file, and pass each word
> > to a linked list. I find, however, that when I reach a new line,
> > the function starts sending junk to the list. I think the list
> > code is OK, because if I pass strings to it manually, it works
> > perfectly, and in any case, it's just standard LL code. The
> > problem only occurs when I reach a new line, and it looks like
> > the strings aren't getting terminated correctly after this.
> > However, attempts to manually add \0 to the end (and remove
> > any \n characters) don't help.

> I'll bet you a sausage that you're not copying the token into
> freshly allocated space. You're just pointing into the line
> buffer at the token start point instead. Then, when you go
> round the loop, that data is trashed.

> Solution: allocate some fresh space, and copy the token into it,
> then point at that.

> > Perhaps using fgets() is the problem, but I can't see what's
> > going wrong. Any advice would be appreciated. (I know some
> > manuals warn against strtok(), but I think it should work
> > for this sort of job.)

> strtok is fine if used properly, but this is a classic way to
> get bitten.

I think the problem is more fundamental.  In general, a line longer
than the buffer leaves a 'widow' portion at the end of the parsed
buffer, which has to be joined onto the data from the next fgets
call.  The signal that a widow may exist is the lack of a \n at the
end of the current buffer; in that case the widow needs to be moved
to the start of the buffer and the fgets parameters altered to fill
in after it.  Only if there *IS* a \n at eol can you treat it as
just another separator.
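The check described above, whether the buffer ends in \n, can be sketched as a small helper. The name strip_newline is hypothetical; the only library fact relied on is that fgets() stops after a newline, so a buffer it filled holds at most one, as its last character.

```c
#include <assert.h>
#include <string.h>

/* Remove the trailing newline from a buffer filled by fgets().
   Returns 1 if a newline was found (a complete line was read),
   0 if not (the line was longer than the buffer, or the file's
   last line has no newline). */
static int strip_newline(char *line)
{
    char *nl;

    assert(line != NULL);

    nl = strchr(line, '\n');
    if (nl == NULL)
        return 0;       /* possible 'widow' at the end of the buffer */
    *nl = '\0';
    return 1;
}
```

When this returns 0 and the file is not at its end, the last token in the buffer may be only the first part of a word, so it should not be stored until the next read has been examined.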

The OP would be better off using the input as a stream and picking
off one word at a time.  The buffer then need be no larger than
the largest word, and there is no worry about long lines.  The
routines I used in one application follow:

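/* ================================== */
/* Routines for text input and output */
/* ================================== */

#include <ctype.h>   /* isgraph */
#include <stdio.h>   /* FILE, getc, ungetc, fprintf */

/* error() below is a routine from the application itself, not shown here */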

static void skipblanks(FILE *f)
{
   int ch;

   while ( (' ' == (ch = getc(f))) || ('\t' == ch) ||
           ('\v' == ch) || ('\f' == ch) || ('\a' == ch) )
      continue;
   ungetc(ch, f);
} /* skipblanks */

/* 1------------------1 */

/* The file is assumed to hold no control chars */
/* other than \n \t \v \a and \f.  A blank line */
/* marks a paragraph ending word                */
static int nextword(FILE *f, char *buffer, int max)
{
   int i, ch;

   skipblanks(f);
   if (EOF == (ch = getc(f))) return 0;

   /* Detect paragraph endings as \n\n */
   if ('\n' == ch) {
      skipblanks(f); ch = getc(f);
      if ('\n' == ch) {            /* paragraph ending */
         buffer[0] = buffer[1] = ch;    /* wd = "\n\n" */
         buffer[2] = '\0';
         /* now we have to absorb any more blank lines */
         do {
            skipblanks(f); ch = getc(f);
         } while ('\n' == ch);
         ungetc(ch, f);
         return 1;
      }
   }
   /* now ch holds the first non-blank.  Use all printable */
   if (EOF == ch) return 0;
   if (!isgraph(ch)) {
      fprintf(stderr, "'%c', 0x%x\n", ch, ch);
      error("Invalid character");
   }

   i = 0;
   do {
      buffer[i++] = ch;
      ch = getc(f);
      if (i >= max) {   /* truncate over long words */
         i--;
         break;         /* leaving ch for next word */
      }
   } while (isgraph(ch));

   ungetc(ch, f);       /* save for next word, may be \n */
   buffer[i] = '\0';    /* terminate string */
   return 1;
} /* nextword */

--

   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!


Sun, 10 Oct 2004 03:29:37 GMT  

Thanks for the suggestions.

--
Rohan Parkes
Melbourne
Australia



Sun, 10 Oct 2004 09:58:11 GMT  

Quote:

> Perhaps using fgets() is the problem, but I can't see what's going
> wrong. Any advice would be appreciated. (I know some manuals warn
> against strtok(), but I think it should work for this sort of job.)

It should, though it can be a {*filter*}.

Quote:
> #define SEPARATORS " .,?\"\n"

> while ((fgets(line, BUFFER, pFile) != NULL) && (!error))

I assume error is some kind of global variable?

Quote:
> {
>    //Move to next line if current line empty
>    if (strlen(line)<= 1) continue;

>    //Remove newline from end of string
>    if (*(line + (strlen(line) - 1)) == '\n')
>            *(line + (strlen(line) - 1)) = '\0';

You can simply ask for line[strlen(line)-1], no matter whether line is
an array or a pointer.

Quote:
>    word = strtok(line, SEPARATORS);

>    //If page delimiter reached, increment page counter, move to next
> line

And thus are illustrated the dangers of // comments... <g>.

Quote:
>    //Pass head, first word, page number to list
>    IndexAppend(&index, word, PageNumber)

>    //Loop through rest of line until end
>    while(word != NULL)
>    {
>            word = strtok(NULL, SEPARATORS);

>            if (word == NULL) break;        //Avoid empty lines

>            //Pass head, word, page number to list
>            IndexAppend(&index, word, PageNumber)
>    }

You could perhaps code the above more simply as

        do {
          IndexAppend(&index, word, PageNumber);
          word=strtok(NULL, SEPARATORS);
        } while (word!=NULL);

or even

        do {
          IndexAppend(&index, word, PageNumber);
        } while((word=strtok(NULL, SEPARATORS))!=NULL);

but I don't think this should solve your problem, since it ought to be
quite equivalent to what you've got.
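One detail the do/while forms share with the original: they use the first token before testing it, so they assume the first strtok() call returned non-NULL. A line made up only of separators (for example "...\n") gives NULL on that first call. A for loop that tests before every use avoids the assumption. The sketch below only counts tokens; count_words is a hypothetical name, and in the real program the loop body would do the page-delimiter check and the IndexAppend call.

```c
#include <assert.h>
#include <string.h>

#define SEPARATORS " .,?\"\n"

/* Count the tokens in `line`. strtok() writes '\0' bytes into
   `line`, so the caller's buffer is modified. */
static int count_words(char *line)
{
    int n = 0;
    char *word;

    assert(line != NULL);

    /* The test runs before each use, including the first token. */
    for (word = strtok(line, SEPARATORS);
         word != NULL;
         word = strtok(NULL, SEPARATORS))
        n++;

    return n;
}
```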

Quote:
> }

In fact, I can't find any problem with the code you posted. Are you
_quite_ sure your list code is OK? In particular, are you sure
IndexAppend does what it ought to do?

Richard



Sun, 10 Oct 2004 16:28:40 GMT  

uitgeverij.nl says...

Quote:

> In fact, I can't find any problem with the code you posted. Are you
> _quite_ sure your list code is OK? In particular, are you sure
> IndexAppend does what it ought to do?

> Richard

Thanks for those suggestions - the loop code could, as you say, be
improved.

Regrettably, I have to concede that the fault was, in fact, in my LL
code - as an earlier poster suggested, I wasn't creating storage inside
the node for the passed word. I was misled by the fact that it seemed
OK until it got to a newline.
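Why it seemed OK until a newline can be shown with a small sketch. Within one line, each token pointer refers to a different part of the same buffer, so the stored pointers all look right; reading the next line overwrites that buffer underneath them. The function name is hypothetical, and strcpy() stands in for the next fgets() call.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a token pointer saved without copying still reads
   "first" after the line buffer has been refilled, 0 otherwise. */
static int saved_pointer_survives(void)
{
    char line[32];
    const char *saved;

    strcpy(line, "first line\n");
    saved = strtok(line, " \n");     /* points into line[], nothing copied */
    assert(saved != NULL && strcmp(saved, "first") == 0);

    /* Taking further tokens from the same line leaves `saved` alone... */
    (void)strtok(NULL, " \n");
    assert(strcmp(saved, "first") == 0);

    /* ...but refilling the buffer, as the next fgets() would, does not. */
    strcpy(line, "second line\n");
    return strcmp(saved, "first") == 0;
}
```

The function returns 0: after the refill, the saved pointer reads text from the new line, which is the "junk" that showed up in the list.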
--
Rohan Parkes
Melbourne
Australia



Mon, 11 Oct 2004 12:25:05 GMT  

Quote:

<snip>

> Regrettably, I have to concede that the fault was, in fact, in my LL
> code - as an earlier poster suggested, I wasn't creating storage inside
> the node for the passed word.

You owe me a sausage. :-)

(There's roughly 20 years of history behind the idea of "sausage bets",
but only Mick Jennings would understand (if he remembers). Basically,
they're sure-fire no-risk bets.)

<snip>

--

"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton



Mon, 11 Oct 2004 16:50:57 GMT  
 