How do I process a data file that includes other files 

I have some data files that use a standard syntax such as #include
'otherfile' to pull in other files at that spot. Of course I can process
the whole dataset in two passes: first merge the included parts, and then
process the merged file. But the problem is the size of the data files, and
the flexibility I'd like my awk script to have (one may skip some parts).
Doing it in two passes seems a waste of time and resources. Does somebody
have a better idea to tackle this?
Thanks in advance.




Sun, 29 Apr 2001 03:00:00 GMT  
You could try an approach such as:

# If the line contains the string "#include", strip off the #include and
# any surrounding spaces.
# Also remove any trailing spaces from the include file name.

$0 ~ /#include/ {
 includeFile = $0
 gsub(/^ +#include +/,"",includeFile)
 gsub(/ +$/,"",includeFile)
 while ((getline includeFileLine < includeFile) > 0) {
  # Do your processing of all lines retrieved from the include file here,
  # using includeFileLine as the variable containing the contents of a line
  # of text from the include file for each iteration of this loop.
 }
 close(includeFile)
}

{
 # Any other stuff you need to do goes here
}

Cesar





Sun, 29 Apr 2001 03:00:00 GMT  
# If the line contains the string "#include", strip off the #include and
# any surrounding spaces.
# Also remove any trailing spaces from the include file name.

$0 ~ /#include/ {
 includeFile = $0
 gsub(/^ *#include */,"",includeFile)
 gsub(/ *$/,"",includeFile)
 while ((getline includeFileLine < includeFile) > 0) {
  # Do your processing of all lines retrieved from the include file here,
  # using includeFileLine as the variable containing the contents of a line
  # of text from the include file for each iteration of this loop.
 }
 close(includeFile)
}

{
 # Any other stuff you need to do goes here
}

In the previous submission, I used a + in the regexps of the gsub commands,
which meant that at least one space had to be present, so an #include
starting in the first column was never stripped. I have replaced those
with a *.
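Two details are easy to miss with this approach: the original question
writes the file name in single quotes, and the catch-all rule also fires
on the #include line itself. The following is only a sketch of one way to
handle both, assuming the quotes are plain single quotes and file names
contain no spaces. The files main.dat and part.dat and the "main:" and
"included:" prefixes are made up for the demonstration.

```sh
#!/bin/sh
# Demo: expand "#include 'file'" lines in place, stripping the quotes.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'alpha\n' > part.dat                           # hypothetical included file
printf "one\n#include 'part.dat'\ntwo\n" > main.dat   # hypothetical main file

# q holds a single quote so the awk program itself can stay in single quotes
out=$(awk -v q="'" '
/^ *#include / {
    includeFile = $0
    sub(/^ *#include +/, "", includeFile)      # drop the directive
    sub(/ +$/, "", includeFile)                # drop trailing spaces
    gsub("^" q "|" q "$", "", includeFile)     # drop surrounding quotes
    while ((getline includeFileLine < includeFile) > 0)
        print "included: " includeFileLine
    close(includeFile)
    next      # keep the directive line away from the catch-all rule below
}
{ print "main: " $0 }
' main.dat)
printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

Without the next, the #include line itself would also be handed to the
second rule as if it were data.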

Cesar



Sun, 29 Apr 2001 03:00:00 GMT  

Are you familiar with igawk? Look at the section titled "Note on Including
Libraries" on Ralph Becket's "The Awk Scripting Language" page:

     http://www.cam.sri.com/people/becket/awk/

The official documentation is at this URL:

http://www.tec.ualberta.ca/Documentation/Info/by-chapter/gawk-3.0.3/gawk_17.html#SEC173

--
Jim Monty

http://www.primenet.com/~monty/
Tempe, Arizona USA



Sun, 29 Apr 2001 03:00:00 GMT  

% I have some data files that contain a standard syntax such as #include
% 'otherfile' to
% include those parts on the spots. Of course I can do two passes to
% process the whole

The typical approach to doing this with Unix tools is to have not two
passes, but two programs in a pipeline. Program 1 is something like this:

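 BEGIN {
   stage2 = "awk -f script2"
   for (i = 1; i < ARGC; i++) {
      process_file(ARGV[i])
   }

   close(stage2)
 }

 function process_file(fname)
 {
   # compare against 0 so that a missing file (getline returns -1)
   # ends the loop instead of spinning forever
   while ((getline < fname) > 0) {
     if ($1 == "#include") {
       process_file($2)
     }
     else {
       print | stage2
     }
   }
   close(fname)
 }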

Where script2 has the commands to do whatever you like, e.g.

  /noodle/ { print }
  /dumpling/ { print toupper($0) }

The nice thing about this is that it makes script2 simple, and it can also
give you substantial performance gains if you have a multi-processor
machine. The problem is that you don't know what the input file is anywhere
in script2, so you end up peppering the data stream with filenames at odd
moments, making script2 complicated again.
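One way the filename problem might be handled is to have the first stage
tag every line with the file it came from, rather than emitting file names
only at odd moments. The sketch below is an illustration, not part of the
scripts above: stage1.awk, main.dat and part.dat are hypothetical names,
the shell pipe takes the place of print | stage2, and the data is assumed
to contain no tab characters.

```sh
#!/bin/sh
# Demo: stage 1 expands includes and tags lines, stage 2 does the real work.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'dumpling soup\n' > part.dat
printf 'noodle bowl\n#include part.dat\nplain rice\n' > main.dat

cat > stage1.awk <<'EOF'
# Print every data line as "<file name><TAB><line>".
function process_file(fname) {
    while ((getline < fname) > 0) {
        if ($1 == "#include")
            process_file($2)
        else
            print fname "\t" $0
    }
    close(fname)
}
BEGIN {
    for (i = 1; i < ARGC; i++)
        process_file(ARGV[i])
}
EOF

# Stage 2 sees the file name in $1 and the original line in $2.
out=$(awk -f stage1.awk main.dat | awk -F'\t' '
$2 ~ /noodle/   { print $1 ": " $2 }
$2 ~ /dumpling/ { print $1 ": " toupper($2) }
')
printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

The cost is that every rule in the second stage has to test $2 instead of
$0, which is part of the complication mentioned above.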

Another approach is to do the same thing, but have process_file do
some actual processing, rather than just printing to a command:

 BEGIN {
   for (i = 1; i < ARGC; i++) {
      process_file(ARGV[i])
   }
 }

 function process_file(fname)
 {
   filename = fname          # put the file name in a global
   while ((getline < fname) > 0) {
     if ($1 == "#include") {
       process_file($2)
       filename = fname      # back in this file after the include
     }
     else {
       really_process_file()
     }
   }
   close(fname)
 }

 function really_process_file()
 {
    # each of these ifs is exactly the same as a pattern/action
    # in a normal script
    if (/noodle/) {
       print
    }
    if (/dumpling/) {
       print toupper($0)
    }
 }
--

Patrick TJ McPhee
East York  Canada



Mon, 30 Apr 2001 03:00:00 GMT  
Thank you all for your replies to my initial question.

I've tested the 2nd approach suggested by Patrick, and it works well. The
suggestion from Cesar may not work in my case, because you don't know which
part of the code should go inside the while loop that reads the included
file and which part should stay outside it, as you don't know where the
#include will appear in the data file.
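For what it's worth, the pattern/action style might still be usable if all
the per-line work is moved into one function that both the include loop and
the ordinary rule call. This is only a sketch of that idea and was not
tested against the real data: handle_line, main.dat and part.dat are
hypothetical names, and it follows only one level of #include, unlike
Patrick's recursive version.

```sh
#!/bin/sh
# Demo: one shared function, called for main-file and included lines alike.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'dumpling soup\nplain rice\n' > part.dat
printf 'noodle bowl\n#include part.dat\ndumpling end\n' > main.dat

out=$(awk '
# handle_line is a hypothetical helper holding all per-line processing
function handle_line(text) {
    if (text ~ /noodle/)   print text
    if (text ~ /dumpling/) print toupper(text)
}
$1 == "#include" {
    file = $2
    while ((getline line < file) > 0)
        handle_line(line)      # included lines get the same treatment
    close(file)
    next
}
{ handle_line($0) }
' main.dat)
printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

Because the function takes the line as an argument, it does not matter
where in the data file the #include appears.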

Best Regards,

Yuanchang Qi





Mon, 07 May 2001 03:00:00 GMT  
 