getline function that allocates memory on-the-fly 

Hi,

I finally took some time to write a function that I have wanted to write
for some time, namely a function that will read a line of any length
into a string. Every time it runs out of space in the string it doubles
the space, so that we save expensive realloc() calls. Observe that we
never waste more than half the memory.

It compiled cleanly at first go (gcc -Wall -W -ansi -pedantic
getline.c), which only strengthens my belief that it must have some
grand flaw ;)

It'd be nice to have some comments, good or bad :)

Also, what is a good initial value for the length? I've heard that
malloc() and friends are among the slowest functions you can call, so
you don't want to call them too often. However, there is a tradeoff between
choosing too large a default value and wasting memory, and calling realloc
many times...

Or should I take a reasonably large initial value, and then call
realloc to free up part of the string before returning?

Stig
--
brautaset.org

PS: lclint complains about the return-values of fgets and fputs being
    ignored, but frankly, I've never seen any code that checks this...
    Not even in the clc-faq.

#include <stdio.h>
#include <stdlib.h>

char* getline(FILE *fp)
{
        size_t slen, len = 2;   /* ridiculously small value for testing */
        char *buff, *tmp;

        if ((buff = malloc(len)) == NULL) {
                fputs("not enough memory.\n", stderr);
                exit(EXIT_FAILURE);
        }

        fgets(buff, len, fp);
        slen = strlen(buff);

        while (slen == len-1) {
                if ((tmp = realloc(buff, len * 2)) == NULL) {
                        fputs("not enough memory.\n", stderr);
                        exit(EXIT_FAILURE);
                }
                buff = tmp;

                fgets(buff+len-1, len+1, fp);
                slen += strlen(buff+len-1);

                len *= 2;
        }
        return buff;
}

int main(void)
{
        char *s;

        s = getline(stdin);
        printf("%s\n", s);
        free(s);

        return 0;
}



Thu, 02 Sep 2004 10:10:04 GMT  
 getline function that allocates memory on-the-fly


[snip]

Quote:
> Also, what is a good initial value for the length? I've heard that
> malloc() and friends are among the slowest functions you can call, so
> you don't want to call them too often. However, there is a tradeoff between
> choosing too large a default value and wasting memory, and calling realloc
> many times...

> Or should I take a reasonably large initial value, and then call
> realloc to free up part of the string before returning?

Perhaps you could mimic the behavior of GNU getline, and require the caller to
provide a 'mallocated' buffer of a specific size which can then be realloc'ed
larger as needed. As it is, you're calling malloc once per line read, which is
not terribly efficient.

Some functionality to return the actual number of characters read in would
also be nice.

Also, since your implementation of getline uses fgets, you inherit all the
issues surrounding fgets. What happens if nul characters are encountered? What
happens if the last line does not end in a newline?

-Daniel



Thu, 02 Sep 2004 10:58:37 GMT  
 getline function that allocates memory on-the-fly

Quote:

> It'd be nice to have some comments, good or bad :)

I have some comments on how I think it could be better.

First off, a design point.

This function actually does two things. It reads a line,
and it grows a buffer. Instead of a simple little function
like this, riddled with the dangers I am about to document,
I would build a simple string library, with a growable
buffer at its heart.

Quote:
> Also, what is a good initial value for the length?

I'd let the caller specify.

Quote:
> I've heard that
> malloc() and friends are among the slowest functions you can call, so
> you don't want to call them too often. However, there is a tradeoff between
> choosing too large a default value and wasting memory, and calling realloc
> many times...

Depends on the application. If you are using this for stdin
input, you probably don't care about performance anyway, so
malloc to your heart's content. On the other hand, if you
are doing high performance processing, say, file reading in
a tight loop, you probably don't want to be using malloc
at all, and should require the caller to pass in a buffer
to receive their data. (If line length exceeds buffer size,
there should be a return value indicating the line is unfinished,
rather than reallocing the buffer.)
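
A minimal sketch of that no-malloc variant, assuming the caller owns a fixed buffer of at least two bytes. The names read_chunk and chunk_status are hypothetical, made up for illustration; the sketch inherits the fgets() caveats about embedded nul characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes; not part of any standard API. */
enum chunk_status {
    CHUNK_LINE,     /* a complete line, ending in '\n', is in buf */
    CHUNK_PARTIAL,  /* buf is full but the line may continue */
    CHUNK_LAST,     /* final data of the stream, no trailing '\n' */
    CHUNK_EOF,      /* nothing read: end of file */
    CHUNK_ERROR     /* nothing read: read error */
};

/* Read at most size-1 characters of one line into the caller's buffer.
   Never allocates; the caller decides what to do with a partial line.
   size is assumed to be at least 2. */
enum chunk_status read_chunk(char *buf, size_t size, FILE *fp)
{
    size_t n;

    if (fgets(buf, (int)size, fp) == NULL)
        return ferror(fp) ? CHUNK_ERROR : CHUNK_EOF;

    n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n')
        return CHUNK_LINE;
    if (n == size - 1)
        return CHUNK_PARTIAL;
    return CHUNK_LAST;
}
```

On CHUNK_PARTIAL the caller can simply call read_chunk() again to fetch the rest of the line, or give up on lines that are too long.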

Quote:
>    fgets(buff, len, fp);
>    slen = strlen(buff);

>    while (slen == len-1) {
>            if ((tmp = realloc(buff, len * 2)) == NULL) {
>                    fputs("not enough memory.\n", stderr);
>                    exit(EXIT_FAILURE);
>            }
>            buff = tmp;

>            fgets(buff+len-1, len+1, fp);
>            slen += strlen(buff+len-1);

>            len *= 2;
>    }

The things I would watch out for here are:

A) you ignore the return value of fgets. Just
   because many people are lazy and don't do so,
   doesn't mean you should do the same.

B) On error, you exit the program. Assuming this
   would be a library function call in a larger application,
   you would certainly want to return an error
   value rather than exit, if for no other reason
   than to log the location of the out-of-memory
   condition appropriately before shutting down.

C) I don't like the double occurrence of fgets,
   or the use of strlen, which traverses the
   buffer all over again.

   How about the following as a minor modification
   for clarity, and a tiny bit of efficiency.

   * passes back the length;
   * avoids strlen;
   * avoids multiple locations of buffer assignment;

   I don't personally know whether fgets is more
   efficient than multiple fgetc calls: I would hope
   the compiler is inlining whatever fgetc does.

#include <stdio.h>
#include <stdlib.h>

char* getline(FILE *fp, int* str_len)
{
    int len = 2; /* ridiculously small value for testing */
    int pos = 0;
    char *buff;
    int c; /* int, not char, so that EOF can be told apart from data */

    buff = malloc(len);
    if (!buff) return (NULL);

    while ((c = fgetc(fp)) != EOF) {
        buff[pos++] = c;
        if (pos == len) { /* grow before leaving, so the '\0' below fits */
            len *= 2;
            buff = realloc(buff, len);
            if (!buff) return (NULL);
        }
        if (c == '\n') break;
    }
    buff[pos] = '\0';
    *str_len = pos; /* includes '\n' if any. */

    return buff;
}

int main(void)
{
    char *s;
    int length;

    s = getline(stdin, &length);
    if (s)
        printf("%s (size: %d)\n", s, length);
    else printf("Action failed.\n");

    free(s);

    return 0;
}



Thu, 02 Sep 2004 13:35:33 GMT  
 getline function that allocates memory on-the-fly
Quote:


(snip)
>             buff = realloc(buff, len);

(snip)

Not commenting on the whole - but the above line causes memory
leakage unless the program terminates immediately. If the
realloc() fails, the previous buffer pointer is lost, and cannot
be freed.
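
For illustration, the usual way around this is to assign the result of realloc() to a temporary first. A minimal sketch, where grow_buffer is a hypothetical helper name and new_size is assumed to be nonzero:

```c
#include <stdlib.h>

/* Hypothetical helper: grow *buf to new_size bytes.
   Returns 1 on success. On failure returns 0 and leaves *buf
   untouched, so the caller can still use or free it. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);

    if (tmp == NULL)
        return 0;       /* *buf is still valid here */
    *buf = tmp;
    return 1;
}
```

The caller then decides whether to free the old buffer and bail out, or to carry on with what it already has.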

--
/Svante

http://axcrypt.sourceforge.net
Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP



Thu, 02 Sep 2004 16:01:12 GMT  
 getline function that allocates memory on-the-fly

Quote:



>(snip)
>>             buff = realloc(buff, len);
>(snip)

>Not commenting on the whole - but the above line causes memory
>leakage unless the program terminates immediately. If the
>realloc() fails, the previous buffer pointer is lost, and cannot
>be freed.

and this is right?

char* getline(FILE* file, char* s, int step, int* len, int* eof)
{int c, i;
 char *k;
 if(step<=5) return NULL;
 *eof=0;
 for(i=0;;++i)
      {if(i%step==0 && (k=realloc(s, i+step+1))==NULL)
                                  {free(s); return NULL;}
       s=k;
       if((c=getc(file))==EOF) {*eof=1; break;}
       s[i]=(char) c;
       if(c=='\n') break;
      }
 s[i]= '\0';
 *len=i;
 return s;
}



Thu, 02 Sep 2004 18:20:05 GMT  
 getline function that allocates memory on-the-fly


Quote:
> I finally took some time to write a function that I have wanted to write
> for some time, namely a function that will read a line of any length
> into a string. Every time it runs out of space in the string it doubles
> the space, so that we save expensive realloc() calls. Observe that we
> never waste more than half the memory.

Good.

Quote:
> It compiled cleanly at first go (gcc -Wall -W -ansi -pedantic
> getline.c), which only strengthens my belief that it must have some
> grand flaw ;)

- Missing prototype for strlen(). (as defined in <string.h>)
- You should handle the EOF case.
- Also, when entering an "empty" string (by hitting the <enter> key
only), the behaviour is strange... The problem is probably due to your
design with two fgets() calls. One is certainly enough. You should
work harder on the algorithm.

(As a state-of-the-art rule, please don't patch the existing one with some
ugly flags or goto)

Quote:
> Also, what is a good initial value for the length? I've heard that

I would put a minimum value of 4, which is good for fgets(). In fact, for
human input, I would use 32 or 64. For computer input, 128 or 256.
It's not a big issue.

It's a pity that you put your code after the .sig, because in the reply,
my newsreader has simply eliminated it. I suspect the use of an evil
'begin 666' (attached file). Don't do that, but copy and paste the code
into the message body.

--
-ed- emdel at noos.fr
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
C-library: http://www.dinkumware.com/htm_cl/index.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/



Thu, 02 Sep 2004 20:01:09 GMT  
 getline function that allocates memory on-the-fly

Quote:

>> Also, what is a good initial value for the length? I've heard that
>> malloc() and friends are among the slowest functions you can call, so
>> you don't want to call them too often. However, there is a tradeoff between
>> choosing too large a default value and wasting memory, and calling
>> realloc many times...

>> Or should I take a reasonably large initial value, and then call
>> realloc to free up part of the string before returning?

> Perhaps you could mimic the behavior of GNU getline, and require the caller to
> provide a 'mallocated' buffer of a specific size which can then be realloc'ed
> larger as needed. As it is, you're calling malloc once per line read, which is
> not terribly efficient.

I don't see how calling malloc before or after I enter the function will
change the fact that I have to call malloc at least once per line read,
although passing in the size to use as the initial length is a good
idea.

Quote:
> Some functionality to return the actual number of characters read in would
> also be nice.

Agreed.

Quote:
> Also, since your implementation of getline uses fgets, you inherit all the
> issues surrounding fgets. What happens if nul characters are encountered? What
> happens if the last line does not end in a newline?

ehh.. I see your point, that has to be fixed :)

The easiest thing would be to just use fgetc and read a character at a
time, but I was of the impression that calling fgetc many times would be
far less efficient than calling fgets once.

As someone said, if I only cared about keyboard input, efficiency
would not matter, but I was planning to use it for other things as well.

Stig
--
brautaset.org



Thu, 02 Sep 2004 19:01:04 GMT  
 getline function that allocates memory on-the-fly

Quote:



>> I finally took some time to write a function that I have wanted to write
>> for some time, namely a function that will read a line of any length
>> into a string. Every time it runs out of space in the string it doubles
>> the space, so that we save expensive realloc() calls. Observe that we
>> never waste more than half the memory.

> Good.

>> It compiled cleanly at first go (gcc -Wall -W -ansi -pedantic
>> getline.c), which only strengthens my belief that it must have some
>> grand flaw ;)

> - Missing prototype for strlen(). (as defined in <string.h>)

thanks.. (how come gcc didn't spot that one?)

Quote:
> - You should handle the EOF case.

yep, someone else pointed that out to me.

Quote:
> It's a pity that you put your code after the .sig, because in the reply,
> my newsreader has simply eliminated it.

Aww... I didn't think of that..

Quote:
> I suspect the use of an evil 'begin 666' (attached file).

Nope, it was an "G:r getline.c" in vim. Wouldn't dream of sending
attachments to clc :)

Stig
--
brautaset.org



Thu, 02 Sep 2004 20:12:51 GMT  
 getline function that allocates memory on-the-fly

Quote:




> >(snip)
> >>             buff = realloc(buff, len);
> >(snip)

> >Not commenting on the whole - but the above line causes memory
> >leakage unless the program terminates immediately. If the
> >realloc() fails, the previous buffer pointer is lost, and cannot
> >be freed.

> and this is right?

Well.. It doesn't obviously leak memory anyways, except in one
possible situation I can find, although it probably wastes it.

Quote:

> char* getline(FILE* file, char* s, int step, int* len, int* eof)
> {int c, i;
>  char *k;
>  if(step<=5) return NULL;

Why? Let the caller decide what's efficient. It's also usual
to implement a default, and communicate the wish for this by
using an unused/illegal value such as <= 0. I.e.
if (step <= 0) step = 5;

Also, assuming the caller does str = getline(fp, str, n, ...) and
n is <= 5, the caller loses his buffer pointer again. As the
semantics of the function elsewhere indicate that a NULL return
implies free()'d memory, this is inconsistent and may cause leakage.

Quote:
>  *eof=0;

Why would you keep track of this? feof() does it for you.

Quote:
>  for(i=0;;++i)

Style note 1: Unusual indentation rule with braces. They are
usually either attached to the for-line, or aligned with the
'f' of 'for' on the line below.

Style note 2: This is an unusual incomplete for. In this case
it might be clearer to rewrite it as for (;;) and move the
index increment. Alternatively, use a while.

Quote:
>       {if(i%step==0 && (k=realloc(s, i+step+1))==NULL)

It might be easier to save the old in the call by doing
s = realloc(k = s, ....) .... free(k);

You will also be doing an extra realloc, if for example,
the line consists of "abcdef", step is 6 and EOF is encountered.

Style note 3: Left braces usually end a line.
Style note 4: What's wrong with spaces? (Applies to the whole snippet).
Style note 5: (Weak) I usually do not compare against 0 or NULL. I use the
value directly or with the ! operator.

Quote:
>                                   {free(s); return NULL;}

Style note 6: Avoid several statements on one line.

Quote:
>        s=k;
>        if((c=getc(file))==EOF) {*eof=1; break;}

Depending on the goal, one might consider orthogonalizing here
and either insert a '\n' on EOF or...

Quote:
>        s[i]=(char) c;

Style note 7: Unnecessary and usually unused cast. Due to the way C
treats chars, one does not usually cast ints assigned to chars,
even though it may indeed result in loss of precision, technically.

Quote:
>        if(c=='\n') break;

...not do it here. I.e. making the function always or never
include a line termination char. I would vote for 'never'...

Quote:
>       }
>  s[i]= '\0';
>  *len=i;

A common way to generalize a function taking 'by reference'
arguments in C is to allow passing a NULL pointer, and thus
not requiring the caller to provide a variable that may not
be of interest to the caller. I.e. if (len) *len = i;

Quote:
>  return s;
> }

All of the above results in this attempt (compiled but not
tested; as a side note, I still feel one should use fgets()
for efficiency instead of getc(), unless it's a known
interactive application):

/* Always strips eventual '\n', for default behavior
   call with getline(fp, 0, 0, 0) */
char *getline(FILE *fp, char *s, int step, int *len) {
    int c, i = 0;
    char *k;

    if (step <= 0) step = 5; /* Use default step */
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (!(i % step) && !(s = realloc(k = s, i + step + 1))) {
            free(k);
            return NULL;
        }
        s[i++] = c;
    }
    s[i] = '\0';
    if (len) *len = i;
    return s;
}

--
/Svante

http://axcrypt.sourceforge.net
Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP



Thu, 02 Sep 2004 21:38:29 GMT  
 getline function that allocates memory on-the-fly
Oops and darn. Correction to previous post follows:

(snip)

Quote:
> You will also be doing an extra realloc, if for example,
> the line consists of "abcdef", step is 6 and EOF is encountered.

My statement above is false. The poster's code is ok in this regard.
My previously proposed code, on the other hand, will crash when getting
zero-length strings if the caller has not supplied a buffer... Not
really an improvement... 8-(.

On the other hand, the realloc should then alloc i + step, not
i + step + 1. Also corrected below.

(snip)

Revised edition of the updated function follows:

/* Always strips eventual '\n', for default behavior
   call with getline(fp, 0, 0, 0) */
char *getline(FILE *fp, char *s, int step, int *len) {
    int c, i = 0;
    char *k;

    if (step <= 0) step = 5; /* Use default step */
    for (;;) {
        if (!(i % step) && !(s = realloc(k = s, i + step))) {
            free(k);
            return NULL;
        }
        if ((c = getc(fp)) == EOF || c == '\n') break;
        s[i++] = c;
    }
    s[i] = '\0';
    if (len) *len = i;
    return s;
}

--
/Svante

http://axcrypt.sourceforge.net
Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP



Thu, 02 Sep 2004 22:13:39 GMT  
 getline function that allocates memory on-the-fly


Quote:

> > Perhaps you could mimic the behavior of GNU getline, and require the caller to
> > provide a 'mallocated' buffer of a specific size which can then be realloc'ed
> > larger as needed. As it is, you're calling malloc once per line read, which is
> > not terribly efficient.

> I don't see how calling malloc before or after I enter the function will
> change the fact that I have to call malloc at least once per line read,
> although passing in the size to use as the initial length is a good
> idea.

size_t buf_len = MY_BUF_LEN;
char * buf;

if( (buf = malloc( buf_len )) == NULL ) {
  /* do whatever */
}

while( somecondition ) {
    getline( &buf, &buf_len, ... );
}

free( buf );

One malloc(), any number of getline() calls. Only as many realloc() calls as
needed.
getline() can also alter the buffer pointer and the buffer length as needed.
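
To make the outline above concrete, here is a rough sketch of what such a function could look like. The name read_line, its return codes and the starting size of 16 are my own assumptions (the name is chosen so it does not clash with any getline() your library may already declare), and overflow of the doubled capacity is not checked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical interface: *buf must be NULL (with *cap == 0) or a
   malloc()'d block of *cap bytes.
   Returns the number of characters stored (excluding the '\0'),
   -1 at end of file with nothing read, or -2 if out of memory.
   On -2 the old buffer is still valid and owned by the caller. */
long read_line(char **buf, size_t *cap, FILE *fp)
{
    size_t pos = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (pos + 2 > *cap) {            /* room for c and the '\0' */
            size_t ncap = *cap ? *cap * 2 : 16;
            char *tmp = realloc(*buf, ncap);
            if (tmp == NULL)
                return -2;
            *buf = tmp;
            *cap = ncap;
        }
        (*buf)[pos++] = (char)c;
        if (c == '\n')
            break;
    }
    if (pos == 0)
        return -1;                       /* EOF (or error) before any data */
    (*buf)[pos] = '\0';
    return (long)pos;
}

/* Usage: one buffer serves every line; free it once at the end. */
long longest_line(FILE *fp)
{
    char *buf = NULL;
    size_t cap = 0;
    long n, best = 0;

    while ((n = read_line(&buf, &cap, fp)) >= 0)
        if (n > best)
            best = n;
    free(buf);
    return best;
}
```

After the first few lines the buffer is usually large enough, so most calls do not touch the allocator at all.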

Quote:
> > Some functionality to return the actual number of characters read in would
> > also be nice.

> Agreed.

> > Also, since your implementation of getline uses fgets, you inherit all the
> > issues surrounding fgets. What happens if nul characters are encountered? What
> > happens if the last line does not end in a newline?

> ehh.. I see your point, that has to be fixed :)

> The easiest thing would be to just use fgetc and read a character at a
> time, but I was of the impression that calling fgetc many times would be
> far less efficient than calling fgets once.

Generally that is the most robust way. fgets() can work for you as long as you
are aware of the issues.
You can often deal with the last line having no newline by checking feof()
after a successful fgets() call, and you can check for the presence of nul
characters by checking strlen() vs. strchr() for the newline character, among
other methods. However you'll find that you can't check both ... if the last
line has any nul characters in it and is not terminated by a newline, there
will be no way to detect them.

Note that it may be pointed out that a file containing nul characters isn't
technically a text file... but it is nice to be robust if you can.

-Daniel



Fri, 03 Sep 2004 00:53:33 GMT  
 getline function that allocates memory on-the-fly

Quote:

>> into a string. Every time it runs out of space in the string it doubles
>> the space, so that we save expensive realloc() calls. Observe that we
>> never waste more than half the memory.

> Good.

[snip]

Quote:
> - Missing prototype for strlen(). (as defined in <string.h>)
> - You should handle the EOF case.
> - Also, when entering an "empty" string (by hitting the <enter> key
> only), the behaviour is strange... The problem is probably due to your
> design with two fgets() calls. One is certainly enough. You should
> work harder on the algorithm.

> (As a state-of-the-art rule, please don't patch the existing one with some
> ugly flags or goto)

>> Also, what is a good initial value for the length? I've heard that

> I would put a minimum value of 4, which is good for fgets(). In fact, for
> human input, I would use 32 or 64. For computer input, 128 or 256.
> It's not a big issue.

OK, I have *completely* redone my code, and many of the points above
have been adhered to. The code does not double the amount of memory
anymore, but rather increases it by 50%. Also, it takes a few more
arguments etc.

Send in the Wolves :)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define GL_BUFSIZE      128
#define GL_MINBUFSIZE   16
#define GL_GROWFACTOR   1.5     /* every realloc grow the buffer this much */

/*
 * This function reads a string of any length from any FILE*-type stream (like
 * stdin) and returns a pointer to the string through its first argument. It
 * will keep using realloc if need be to make sure that we can fit the whole
 * string.
 *
 * REQUIREMENTS:
 *   1) *line _must_ be a pointer to char, and either:
 *      - initialised to NULL. An initial GL_BUFSIZE or *size bytes is allocated
 *        for it, depending on which is greater: *size or GL_MINBUFSIZE.
 *      - initialised by a previous call to malloc. *size is respected, if
 *        larger than GL_MINBUFSIZE, realloc'ed to GL_BUFSIZE if it is not.
 *  
 *   2) It is the caller's responsibility to free the memory returned from
 *      this function.
 *
 * RETURN VALUES:
 *      This function returns -2 upon error allocating memory. Upon error
 *      reading, -1 is returned. If end of file is found, but no '\n' is seen, 1
 *      is returned. Zero is returned in the normal case (i.e. a '\n'-terminated
 *      line was read).
 */
int getline(char **line, size_t *size, FILE *fp)
{
        size_t pos = 0;
        char *tmp;

        assert(fp    != NULL);  
        assert(line  != NULL);  
        assert(size  != NULL);  

        /*
         * Allocate a reasonably-sized default chunk if we are passed a NULL
         * pointer, or the size is very small.
         */
        if (!*line || *size < GL_MINBUFSIZE) {
                *size = *size > GL_MINBUFSIZE ? *size : GL_BUFSIZE;
                tmp = realloc(*line, *size);
                if (!tmp) {     /* test the new pointer, not the old one */
                        perror("malloc, getline()");
                        return -2;
                }
                *line = tmp;
        }

        for (;;) {
                /*
                 * Read in a chunk of the specified size.
                 */
                tmp = fgets(*line+pos, *size - pos, fp);
                if (!tmp || ferror(fp)) {
                        /* Error occurred. Scream.  */
                        return -1;
                } else if (feof(fp)) {
                        /* End of file reached, but no \n seen. */
                        return 1;
                } else if (pos+strlen(*line+pos) != *size-1) {
                        /*
                         * String is shorter than the full buffer, and no error
                         * detected above. Return to the caller, signaling all
                         * is ok.
                         *
                         * I cannot think of a way to do this check that doesn't
                         * require strlen/index etc, apart from using fgetc and
                         * reading a character at a time. That approach proved to
                         * be quite a bit slower, so I scrapped it.
                         */
                        return 0;
                }
                pos = *size - 1;
                *size *= GL_GROWFACTOR;

                /*
                 * Reallocate a factor of GL_GROWFACTOR more memory.
                 */
                tmp = realloc(*line, *size);
                if (!tmp) {
                        perror("realloc, getline()");
                        return -2;
                }
#ifdef DEBUG
                fprintf(stderr, "realloc: %lu -> %lu bytes. location: %p -> %p\n",
                                (unsigned long)(pos+1), (unsigned long)*size,
                                (void *)*line, (void *)tmp);
#endif
                *line = tmp;
        }
}

int main(void)
{
        char *s = NULL;
        size_t len = 128;

        s = malloc(len);
        for (;;) {
                if (0 != getline(&s, &len, stdin))
                        break;  
                printf("%s", s);
        }
        fflush(stdout);
        free(s);        /* the caller owns the buffer */

        return 0;
}

Stig
--
brautaset.org


Fri, 03 Sep 2004 05:50:46 GMT  
 getline function that allocates memory on-the-fly

(snip, text and code about getting a line w/o buffer overflow and limits)

Quote:
> #define GL_GROWFACTOR 1.5 /* every realloc grow the buffer this much */

This is not so good. This will include floating point in a program
otherwise free from it. Stick to integer types for this kind of code.
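
As an illustration of keeping the growth factor in integer arithmetic, a minimal sketch follows. The helper name next_size is hypothetical, and the "+ 1" is my own addition so that very small sizes still grow.

```c
#include <stddef.h>

/* Hypothetical helper: next buffer size, about 1.5 times the old one,
   using integer arithmetic only. Returns 0 if the result would not
   fit in a size_t. */
size_t next_size(size_t old)
{
    size_t inc = old / 2 + 1;   /* +1 so that sizes 0 and 1 still grow */

    if (old > (size_t)-1 - inc)
        return 0;               /* would overflow */
    return old + inc;
}
```

A caller would treat a return value of 0 the same way as a failed realloc().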

Quote:

> /*
>  * This function reads a string of any length from any FILE*-type stream (like
>  * stdin) and returns a pointer to the string through its first argument. It
>  * will keep using realloc if need be to make sure that we can fit the whole
>  * string.
>  *

(snip - Stigs code)

Here is a hacked version using fgets() of the getc() code previously
posted. I believe in returning single status error codes if at all
possible, keeping parameters to a minimum, and defaulting as much
as possible. #includes skipped for brevity.

/*
    Always strips eventual '\n'.
    Call with a malloc()'d buffer or NULL, and
    optionally a pointer to the buffer's length
    (holding its actual length, or zero).
    Returns a new buffer pointer, or NULL on
    allocation error. If a buffer length pointer
    is given, the new buffer's length is returned
    there. For other errors, always check feof()
    and ferror().
*/
char *getline(FILE *fp, char *s, size_t *pilen) {
    size_t i = 0, olen = 0, xlen = 80; /* Chg min buffer here */
    char *t;

    if (pilen && *pilen >= xlen) olen = *pilen;
    for (;;) {
        if (i + 1 >= olen) {
            if (!(s = realloc(t = s, olen = i + xlen))) {
                free(t);
                return NULL;
            }
        }
        if (!fgets(t = &s[i], olen - i, fp)) break;
        t = strchr(t, '\0');
        if (feof(fp) || *--t == '\n') break;

        i = olen - 1;
        xlen = olen / 2;
    }
    *t = '\0';
    if (pilen) *pilen = olen;
    return s;
}

--
/Svante

http://axcrypt.sourceforge.net
Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP



Fri, 03 Sep 2004 07:46:36 GMT  
 getline function that allocates memory on-the-fly

Quote:

> The easiest thing would be to just use fgetc and read a character at a
> time, but I was of the impression that calling fgetc many times would be
> far less efficient than calling fgets once.

No, you wouldn't use fgetc().  Instead, you would use fread() and
manage a larger buffer inside a "context" object of some sort.  This
approach [very common, and leads to natural C++/object expressions]
also "fixes" the matter of the client allocating the initial buffer,
and even cleans up the interface a bit.

Here is my recommendation:  instead of a complicated one-entry-point
getline() method:

   char *getline(char **x, int *n, ...)

One would create 3 simpler methods:

   getl getline_open(FILE *fp, int buffer_size_hint)
   const char *getline(getl, size_t *line_len)
   void getline_close(getl)

The 'getl' would map [via some icky cast] to the context object that
has a large buffer and various pointers inside it.  The actual getline
method would walk a pointer inside this buffer until it hit the end of
a line and then stuff a NUL there and return the start point [hence
the prudence of returning a 'const char *':  the client can't mess
around with your buffer, at least as long as they play along with
const etc].  If one hits the end of the buffer without encountering
an end-of-line, you shift down from where you initially started the
scan and attempt to fread() another piece.  If the initial scan
started at the beginning of the buffer and you still hit the end of
the buffer, you are now left with extending the size of the buffer by
some factor [2x is usually too much;  for growth things like this, I
like 1.5x (easy to code), though there is some basis for using the
golden ratio too] and again fread()'ing into it.

A strategy for shrinking the buffer may also be appropriate as Very
Long Lines are usually rare in most input documents.

Note that the logic for deciding "end of line" doesn't have to be a
simple "is this a \n character?"  It can be a regexp, or even a
user-supplied recognizer of some form.  [A common one will be to
recognize EOF - in effect loading the entire document into memory.]
Naturally, things get slower as you get more general.

Also note that using fread() incurs the cost of two buffers instead of
just one [fread() also maintains a buffer].  You can remove this [and
its attendant memcpy()] by going to lower level entities in your
operating system:  file "handles", "descriptors", or such.  Remember
that fread() etc exists to smear out the cost of an OS system call:  it
is more a system call buffer than a data buffer.  The above "getline"
will do as good a job (if not a better one) of performing this function.
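
A rough sketch of the context-object design described above, under several assumptions of mine: the names linereader, lr_open, lr_next and lr_close are hypothetical stand-ins for the three entry points, the end-of-line test is a plain '\n', the buffer grows by 1.5x and never shrinks, and a read error is treated like end of file (check ferror() afterwards).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    FILE  *fp;
    char  *buf;     /* buffer of cap bytes */
    size_t cap;
    size_t start;   /* where the next line begins */
    size_t end;     /* one past the last valid byte; always < cap */
} linereader;

linereader *lr_open(FILE *fp, size_t size_hint)
{
    linereader *lr = malloc(sizeof *lr);

    if (lr == NULL)
        return NULL;
    lr->cap = size_hint < 16 ? 16 : size_hint;
    lr->buf = malloc(lr->cap);
    if (lr->buf == NULL) {
        free(lr);
        return NULL;
    }
    lr->fp = fp;
    lr->start = lr->end = 0;
    return lr;
}

/* Returns the next line without its '\n', or NULL at end of input or
   on allocation failure. The pointer is only valid until the next call. */
const char *lr_next(linereader *lr, size_t *len)
{
    size_t scanned = 0;     /* bytes of this line already searched */

    for (;;) {
        char *line = lr->buf + lr->start;
        size_t avail = lr->end - lr->start;
        char *nl = memchr(line + scanned, '\n', avail - scanned);
        size_t got;

        if (nl != NULL) {
            *nl = '\0';
            if (len) *len = (size_t)(nl - line);
            lr->start = (size_t)(nl - lr->buf) + 1;
            return line;
        }
        scanned = avail;
        if (lr->start > 0) {                    /* shift partial line down */
            memmove(lr->buf, line, avail);
            lr->start = 0;
            lr->end = avail;
        } else if (lr->end + 1 >= lr->cap) {    /* buffer full: grow 1.5x */
            size_t ncap = lr->cap + lr->cap / 2;
            char *tmp = realloc(lr->buf, ncap);
            if (tmp == NULL)
                return NULL;
            lr->buf = tmp;
            lr->cap = ncap;
        }
        got = fread(lr->buf + lr->end, 1, lr->cap - lr->end - 1, lr->fp);
        lr->end += got;
        if (got == 0) {                         /* end of file or error */
            if (avail == 0)
                return NULL;
            lr->buf[lr->end] = '\0';            /* last line had no '\n' */
            if (len) *len = avail;
            lr->start = lr->end;
            return lr->buf;
        }
    }
}

void lr_close(linereader *lr)
{
    if (lr != NULL) {
        free(lr->buf);
        free(lr);       /* the FILE itself stays open */
    }
}
```

Because the line length is reported separately, embedded nul characters do not truncate a line here. One caveat: fread() tries to fill the whole request, so this sketch suits files better than interactive input.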



Fri, 03 Sep 2004 10:35:58 GMT  
 getline function that allocates memory on-the-fly
[I've read the other replies, and have tried not to duplicate comments
already made, e.g. omission of <string.h> ]

Quote:

<snip>

> PS: lclint complains about the return-values of fgets and fputs being
>     ignored, but frankly, I've never seen any code that checks this...
>     Not even in the clc-faq.

Always check the return value of fgets. Otherwise, you have no way of
knowing whether you've reached end of file.
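
For what it's worth, a small sketch of a loop driven by the return value of fgets(). The function count_lines and its 128-byte chunk size are only illustrative choices of mine.

```c
#include <stdio.h>
#include <string.h>

/* Count lines on fp; a final line without '\n' still counts.
   Returns -1 on a read error. */
long count_lines(FILE *fp)
{
    char buf[128];
    long lines = 0;
    int in_line = 0;    /* nonzero while inside an unfinished line */

    /* fgets() returns NULL at end of file or on error, ending the loop. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strlen(buf);

        if (n > 0 && buf[n - 1] == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;    /* long line or missing final newline */
        }
    }
    if (ferror(fp))
        return -1;
    return lines + in_line;
}
```

Without the NULL check, the loop would keep processing whatever stale data was left in buf after the last successful read.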

Quote:

> #include <stdio.h>
> #include <stdlib.h>

> char* getline(FILE *fp)
> {
>         size_t slen, len = 2;   /* ridiculously small value for testing */

Consider making len equal to BUFSIZ (which is defined in stdio.h).

Quote:
>         char *buff, *tmp;

>         if ((buff = malloc(len)) == NULL) {
>                 fputs("not enough memory.\n", stderr);
>                 exit(EXIT_FAILURE);
>         }

Usability problem - this means you have to create a new string every
time, and free it when you're done. If you sent a pointer to the string,
and a pointer to the current maximum size, you could re-use the string
(e.g. in a loop reading through each line of a file), and just free it
once at the end. This would leave the return value available for some
kind of error or EOF indicator (or the number of characters read, or
whatever).

Quote:

>         fgets(buff, len, fp);

Check fgets return value.

Quote:
>         slen = strlen(buff);

>         while (slen == len-1) {
>                 if ((tmp = realloc(buff, len * 2)) == NULL) {

Consider (len * 3) / 2 instead.

Quote:
>                         fputs("not enough memory.\n", stderr);
>                         exit(EXIT_FAILURE);

Better: return an error condition of some kind, so that the calling code
can decide whether to abort.

Quote:
>                 }
>                 buff = tmp;

>                 fgets(buff+len-1, len+1, fp);

Check fgets return value.

Quote:
>                 slen += strlen(buff+len-1);

>                 len *= 2;

Consider (len * 3) / 2 instead.

<snip>

--

"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton



Fri, 03 Sep 2004 16:35:29 GMT  
 