Reading text file data into an array 

Hi all!
I have recently been trying to read a large text file of data quickly.
I read the whole file into memory and then, in a loop, use
sscanf( buffer, "%f", &var ) to read the data. It works but is extremely
slow. For comparison, I opened the same file with Matlab: what took
20 min with my program took only a second there. What is the reason? Is sscanf()
that slow (I am using the VC++ compiler with the recommended service packs
installed)? What are the alternatives?
I will appreciate any hint. Thanks in advance.
Pawel Kopyt


Mon, 28 Nov 2005 21:50:53 GMT  

Quote:

> Hi all!
> I have recently been trying to read a large text file of data quickly.
> I read the whole file into memory

In which way do you do this?
You are aware that this pre-reading introduces some overhead?

Quote:
> and then, in a loop, use
> sscanf( buffer, "%f", &var ) to read the data. It works but is extremely
> slow. For comparison, I opened the same file with Matlab: what took
> 20 min with my program took only a second there. What is the reason? Is sscanf()
> that slow (I am using the VC++ compiler with the recommended service packs
> installed)? What are the alternatives?

Try reading straight from the file and processing what you
have read as you go.

Quote:
> I will appreciate any hint. Thanks in advance.

If you need more help, post code.

--
Karl Heinz Buchegger



Mon, 28 Nov 2005 22:03:00 GMT  
Hi!
Thanks for the reply. You asked for the source, so here it is. Frankly, I
don't know what slows things down here.

// Read the dimensions
sscanf( buffer, "%d %d %d\n", &Nx, &Ny, &Nz );

 Temperature = (float*) malloc( Nx*Ny*Nz * sizeof( float ) );
 if( Temperature == NULL ) return -1;

  float g = 0.0;
  for( z = 0; z < Nz; z++ ) {
   for( y = 0; y < Ny; y++ ) {
    for( x = 0; x < Nx; x++ ) {
     sscanf( buffer, "%f", &g );

     Temperature[ Nx*Ny*z + Nx*y + x ] = g
    }
   }
  }
 }

And this is to read the following file:
39 44 48
2.00000000e+001 2.00000000e+001 2.00000000e+001 2.00000000e+001
2.00000000e+001
2.00000000e+001 2.00000000e+001 2.00000000e+001 2.00000000e+001
2.00000000e+001
2.00000000e+001 2.00000000e+001 2.00000000e+001 2.00000000e+001
2.00000000e+001
...

The buffer is filled with the fread() function:
fread( buffer, len, sizeof( char ), file );

I will be grateful for any hints.
Regards
Pawel






Mon, 28 Nov 2005 22:34:54 GMT  

Quote:

> // Read the dimensions
> sscanf( buffer, "%d %d %d\n", &Nx, &Ny, &Nz );

>  Temperature = (float*) malloc( Nx*Ny*Nz * sizeof( float ) );

I don't recommend casting the return value of malloc():

        * The cast is not required in ANSI C.

        * Casting its return value can mask a failure to #include
          <stdlib.h>, which leads to undefined behavior.

        * If you cast to the wrong type by accident, odd failures can
          result.

When calling malloc(), I recommend using the sizeof operator on
the object you are allocating, not on the type.  For instance,
*don't* write this:

        int *x = malloc (sizeof (int) * 128); /* Don't do this! */

Instead, write it this way:

        int *x = malloc (sizeof *x * 128);

There are a few reasons to do it this way:

        * If you ever change the type that `x' points to, it's not
          necessary to change the malloc() call as well.

          This is more of a problem in a large program, but it's still
          convenient in a small one.

        * Taking the size of an object makes writing the statement
          less error-prone.  You can verify that the sizeof syntax is
          correct without having to look at the declaration.
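
As a rough sketch of the advice above applied to the Temperature array: the helper name alloc_grid and the overflow checks are my own additions for illustration, not something from the posted code. It assumes the file is compiled as C (a C++ compiler, which VC++ may be using here, would require the cast).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an nx*ny*nz grid of floats.
   Returns NULL on non-positive dimensions, size overflow,
   or allocation failure. The caller frees the result. */
float *alloc_grid(int nx, int ny, int nz)
{
    float *grid;
    size_t count;

    if (nx <= 0 || ny <= 0 || nz <= 0)
        return NULL;

    /* Multiply step by step so an overflow is caught, not wrapped. */
    count = (size_t)nx;
    if (count > SIZE_MAX / (size_t)ny)
        return NULL;
    count *= (size_t)ny;
    if (count > SIZE_MAX / (size_t)nz)
        return NULL;
    count *= (size_t)nz;
    if (count > SIZE_MAX / sizeof *grid)
        return NULL;

    /* No cast, and sizeof is applied to the object, not the type. */
    grid = malloc(count * sizeof *grid);
    return grid;
}
```

A call such as Temperature = alloc_grid(Nx, Ny, Nz); would then replace the malloc() line, with the same NULL check afterwards.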

Quote:
>  if( Temperature == NULL ) return -1;

>   float g = 0.0;
>   for( z = 0; z < Nz; z++ ) {
>    for( y = 0; y < Ny; y++ ) {
>     for( x = 0; x < Nx; x++ ) {
>      sscanf( buffer, "%f", &g );

That's going to read the same data from `buffer' every time
through the loop.  You should read something new into the buffer
in each iteration through the loop, or read from a different
buffer.
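
One possible way to "read from a different buffer" each time is to keep a pointer into the buffer and move it forward by the number of characters sscanf() consumed, which the standard %n directive reports. This is only a sketch; the function name parse_floats_scanf is made up for the example, and it assumes the buffer is NUL-terminated.

```c
#include <stdio.h>

/* Hypothetical helper: parse up to `count` floats from the
   NUL-terminated string `text` into `out`.
   Returns how many values were actually stored. */
size_t parse_floats_scanf(const char *text, float *out, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        int consumed = 0;

        /* %n stores the number of characters read so far;
           it does not count towards sscanf's return value. */
        if (sscanf(text, "%f%n", &out[i], &consumed) != 1)
            break;

        text += consumed;   /* continue after the value just read */
    }
    return i;
}
```

With this, the three nested loops can collapse into one call that fills Temperature in file order, since the index Nx*Ny*z + Nx*y + x simply counts upwards.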

Quote:
>      Temperature[ Nx*Ny*z + Nx*y + x ] = g
>     }
>    }
>   }
>  }
> The buffer is filled with fread() function:
> fread( buffer, len, sizeof( char ), file );

sizeof(char) is always 1.
--
"Welcome to the wonderful world of undefined behavior, where the demons
 are nasal and the DeathStation users are nervous." --Daniel Fox


Tue, 29 Nov 2005 00:59:19 GMT  

Quote:

> // Read the dimensions
> sscanf( buffer, "%d %d %d\n", &Nx, &Ny, &Nz );

OK, first thing: are you sure these dimensions are right? Garbage values
could cause an extremely long loop.
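
A minimal sketch of such a check, assuming the header line has the "Nx Ny Nz" layout shown in the sample file. The function name read_dims and its return convention are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read three positive dimensions from the start
   of `text`. On success returns 0 and, if `consumed` is not NULL,
   stores the number of characters used so the caller can skip past
   the header. Returns -1 if the header is malformed. */
int read_dims(const char *text, int *nx, int *ny, int *nz, int *consumed)
{
    int n = 0;

    /* sscanf returns the number of successful conversions. */
    if (sscanf(text, "%d %d %d%n", nx, ny, nz, &n) != 3)
        return -1;

    /* Reject zero or negative sizes before they reach the loops. */
    if (*nx <= 0 || *ny <= 0 || *nz <= 0)
        return -1;

    if (consumed != NULL)
        *consumed = n;
    return 0;
}
```

Printing Nx, Ny and Nz once after this call is a cheap way to confirm the loop bounds are what you expect.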
Quote:

>  Temperature = (float*) malloc( Nx*Ny*Nz * sizeof( float ) );
>  if( Temperature == NULL ) return -1;

>   float g = 0.0;
>   for( z = 0; z < Nz; z++ ) {
>    for( y = 0; y < Ny; y++ ) {
>     for( x = 0; x < Nx; x++ ) {
>      sscanf( buffer, "%f", &g );

You've got to here. Is the code still slow? I.e. is sscanf() the bottleneck?

Quote:

>      Temperature[ Nx*Ny*z + Nx*y + x ] = g
>     }
>    }
>   }
>  }

> The buffer is filled with fread() function:
> fread( buffer, len, sizeof( char ), file );

Put this back in your loop and comment out the sscanf(). Is the code still
slow? If so, fread() is your bottleneck.
Also, make sure that you are actually reading the right number of
characters. If your input is a text file you would normally use fgets(). How
are you calculating len?
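
For what it is worth, here is one way to load a whole file without having to work out len beforehand. It is a sketch under my own assumptions (the name read_all and the 4096-byte starting size are arbitrary). Note that the element size is 1 and the count comes second, so fread() returns the number of bytes actually read, and the buffer is NUL-terminated so the string functions can be used on it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `fp` into a freshly
   allocated, NUL-terminated buffer. Stores the byte count in
   *out_len if out_len is not NULL. Returns NULL on error;
   the caller frees the result. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap + 1);   /* +1 leaves room for '\0' */
    char *tmp;

    if (buf == NULL)
        return NULL;

    /* Size 1, count cap - len: the return value is a byte count. */
    while ((got = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += got;
        if (len == cap) {          /* buffer full: double it */
            tmp = realloc(buf, cap * 2 + 1);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```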


Tue, 29 Nov 2005 01:48:19 GMT  

Quote:

> Hi!
> Thanks for the reply. You asked for the source, so here it is. Frankly, I
> don't know what slows things down here.

> // Read the dimensions
> sscanf( buffer, "%d %d %d\n", &Nx, &Ny, &Nz );

>  Temperature = (float*) malloc( Nx*Ny*Nz * sizeof( float ) );
>  if( Temperature == NULL ) return -1;

>   float g = 0.0;
>   for( z = 0; z < Nz; z++ ) {
>    for( y = 0; y < Ny; y++ ) {
>     for( x = 0; x < Nx; x++ ) {
>      sscanf( buffer, "%f", &g );

>      Temperature[ Nx*Ny*z + Nx*y + x ] = g
>     }
>    }
>   }
>  }

Replace the Temperature[/*...*/] with a pointer.
Multiplications are generally slower than an
increment.

float * p_temperature;
p_temperature = Temperature;
for (z = 0; /*... */)
{
   for (y = 0; /*...*/)
   {
     for (x = 0; /*...*/)
     {
       sscanf(buffer, "%f", &g);
       *p_temperature++ = g;
     }
   }
}

One issue still to resolve is how to increment
"buffer" after each successful sscanf.

--
Thomas Matthews

C++ newsgroup welcome message:
          http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq:   http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
          http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
     http://www.josuttis.com  -- C++ STL Library book



Tue, 29 Nov 2005 04:27:27 GMT  

Quote:


> >      Temperature[ Nx*Ny*z + Nx*y + x ] = g

> Replace the Temperature[/*...*/] with a pointer.
> Multiplications are generally slower than an
> increment.

On modern hardware with reasonable compilers, it's not going to
be enough slower, if it is slower at all, to account for a file
read going from 1 second to 20 minutes.
--
"I hope, some day, to learn to read.
 It seems to be even harder than writing."
--Richard Heathfield


Tue, 29 Nov 2005 04:39:30 GMT  
On Thu, 12 Jun 2003 20:27:27 GMT, Thomas Matthews

Quote:

>One issue still to resolve is how to increment
>"buffer" after each successful sscanf.

By using the "%n" conversion specification. Actually, given the input, I'd
probably use strtod(). I expect it would be faster, as well.
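
A sketch of the strtod() variant, for comparison. strtod() hands back a pointer to the first character it did not consume, so advancing through the buffer needs no extra bookkeeping. The function name parse_floats_strtod is invented here. As a side note, some C library implementations are reported to measure the length of the whole string on every sscanf() call, which would make repeated calls on a large buffer very slow; whether that applies to the VC++ runtime in question is an assumption, not something established in this thread.

```c
#include <stdlib.h>

/* Hypothetical helper: parse up to `count` values from the
   NUL-terminated string `text` into `out` using strtod().
   Returns how many values were actually stored. */
size_t parse_floats_strtod(const char *text, float *out, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        char *end;
        double v = strtod(text, &end);

        /* If nothing was converted, end points back at text. */
        if (end == text)
            break;

        out[i] = (float)v;
        text = end;         /* resume right after this number */
    }
    return i;
}
```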

--
Al Balmer
Balmer Consulting



Tue, 29 Nov 2005 04:48:44 GMT  

Quote:

> Hi all!
> I have recently been trying to read a large text file of data quickly.
> I read the whole file into memory and then, in a loop, use
> sscanf( buffer, "%f", &var ) to read the data. It works but is extremely
> slow. For comparison, I opened the same file with Matlab: what took
> 20 min with my program took only a second there. What is the reason? Is sscanf()
> that slow (I am using the VC++ compiler with the recommended service packs
> installed)? What are the alternatives?
> I will appreciate any hint. Thanks in advance.
> Pawel Kopyt

Try using fgets and strtod instead.
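
Roughly, that could look like the following. This is a sketch only: the name read_floats and the 4096-byte line buffer are my assumptions, and it assumes every line of the file fits in that buffer so that no number is split across two fgets() calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to `count` whitespace-separated values
   from `fp` into `out`, one line at a time.
   Assumes each line is shorter than the line buffer.
   Returns how many values were actually stored. */
size_t read_floats(FILE *fp, float *out, size_t count)
{
    char line[4096];
    size_t n = 0;

    while (n < count && fgets(line, sizeof line, fp) != NULL) {
        char *p = line;
        char *end;

        /* Pull every number off the current line. */
        while (n < count) {
            double v = strtod(p, &end);

            if (end == p)   /* no more numbers on this line */
                break;
            out[n++] = (float)v;
            p = end;
        }
    }
    return n;
}
```

After the dimension line has been read, something like read_floats(file, Temperature, Nx*Ny*Nz) would fill the array in the order the values appear in the file, and comparing the return value with the expected count shows whether the file was complete.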


Tue, 29 Nov 2005 06:10:30 GMT  

Quote:


>> Hi!
>> Thanks for the reply. You asked for the source, so here it is. Frankly, I
>> don't know what slows things down here.

>> // Read the dimensions
>> sscanf( buffer, "%d %d %d\n", &Nx, &Ny, &Nz );

>>  Temperature = (float*) malloc( Nx*Ny*Nz * sizeof( float ) );
>>  if( Temperature == NULL ) return -1;

>>   float g = 0.0;
>>   for( z = 0; z < Nz; z++ ) {
>>    for( y = 0; y < Ny; y++ ) {
>>     for( x = 0; x < Nx; x++ ) {
>>      sscanf( buffer, "%f", &g );

>>      Temperature[ Nx*Ny*z + Nx*y + x ] = g
>>     }
>>    }
>>   }
>>  }

> Replace the Temperature[/*...*/] with a pointer.
> Multiplications are generally slower than an
> increment.

That is a rather general statement about a thing the C standard doesn't
specify at all. We can all think of computers that would be faster at
multiplying than at incrementing. You probably want to stick your nose
into the thread about what I'll call premature optimisation (the
thread's actual subject is "[OT] beginners' interests in efficiency"). What
you suggest is, IMHO, just playing around.

Without identifying the actual bottleneck there is no point in
"optimizing" this or that.


Quote:

> float * p_temperature;
> p_temperature = Temperature;
> for (z = 0; /*... */)
> {
>   for (y = 0; /*...*/)
>   {
>     for (x = 0; /*...*/)
>     {
>       sscanf(buffer, "%f", &g);
>       *p_temperature++ = g;
>     }
>   }
> }

> One issue still to resolve is how to increment
> "buffer" after each successful sscanf.

--

"LISP  is worth learning for  the profound enlightenment  experience
you will have when you finally get it; that experience will make you
a better programmer for the rest of your days."   -- Eric S. Raymond


Tue, 29 Nov 2005 16:09:07 GMT  
 