newbie would like to break input file and output to separate files 

Hi,

I've been struggling with gawk for a couple of weeks now and wouldn't mind some
help from those with more experience than me (which is just about everyone on
this list, I'd imagine).

I've got an input file in the following format:

Query= xxxxx

Query:      abcdefg
Sbjct:      gfedcba

Query:      hijklmn
Sbjct:      nmlkjih

Query= yyyyy

Query:      defghij
Sbjct:      jihgfed

etc, etc

I'd like to break this file (which has over 450 'Query=' blocks) into
individual files, so that each file runs from a 'Query=' line up to the blank
line before the next 'Query='. I came across an example that breaks large files
into smaller files, but that example sends a fixed number of lines from the
input file to each output file. This wouldn't work with my input file as the
number of 'Query:/Sbjct:' pairs of lines is not equal in each Query= block (i.e.
the number of pairs ranges from 3 to 5). I'd also like to name each output file
with the second field of the 'Query=' record, but that's not a major issue as I
can do it with other software.

If someone has a solution to my problem I would be very grateful.

Cheers,

Giovanni




Fri, 16 Apr 2004 07:48:04 GMT  

Quote:

> I've been struggling with gawk for a couple of weeks now and
> wouldn't mind some help from those with more experience than me
> (which is just about everyone on this list, I'd imagine).

> I've got an input file in the following format:

> Query= xxxxx

> Query:      abcdefg
> Sbjct:      gfedcba

> Query:      hijklmn
> Sbjct:      nmlkjih

> Query= yyyyy

> Query:      defghij
> Sbjct:      jihgfed

> etc, etc

> I'd like to break this file (which has over 450 'Query=' blocks)
> into individual files, so that each file starts from 'Query='
> until the blank line before the next 'Query='. I came across
> an example that breaks large files into smaller files, but that
> example sends specific numbers of lines from the input file to
> the output files. This wouldn't work with my input file as the
> number of 'Query:/Sbjct:' pairs of lines is not equal in each
> Query= block (ie the number of pairs ranges from 3 to 5). I'd
> also like to name each output file with the second field of the
> 'Query=' record, but that's not a major issue as I can do it with
> other software.

Think of the problem this way: Every time you encounter an input
line of the form 'Query= foo', you want to alter where the output
goes. In other words, you want to toggle the output file name to
'foo' at each occurrence of a 'Query= foo' line. So start with this
rudimentary script

    { print >file }

then just add the bit that toggles the value of the variable named
'file' at the appropriate places, setting it to the appropriate
file name.

    /^Query=/ { file = $2 ".sql" }
              { print >file      }

Presumably, the very first line of input is a 'Query=' line, so
the variable 'file' will be set prior to the first execution of
the print statement. But to be safe, initialize the variable to
some default value in a BEGIN rule.

    BEGIN     { file = "default.sql" }
    /^Query=/ { file = $2 ".sql"     }
              { print >file          }
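
To see the three-rule script in action, here is a minimal sketch that
runs it against the sample data from the original post. The input file
name input.txt is made up for the demo, and the sketch assumes you run
it in an empty scratch directory with any POSIX awk (gawk included).

```sh
# Sample input in the format from the original post.
# (input.txt is a hypothetical name chosen for this demo.)
cat > input.txt <<'EOF'
Query= xxxxx

Query:      abcdefg
Sbjct:      gfedcba

Query:      hijklmn
Sbjct:      nmlkjih

Query= yyyyy

Query:      defghij
Sbjct:      jihgfed
EOF

# Each 'Query=' line switches the output file; every line is then
# printed to whichever file is current.
awk '
    BEGIN     { file = "default.sql" }
    /^Query=/ { file = $2 ".sql"     }
              { print > file         }
' input.txt

# List what was produced.
ls
```

With this input you should end up with xxxxx.sql and yyyyy.sql next to
input.txt. No default.sql appears, because awk only opens an output
file when something is actually printed to it, and here the very first
line is already a 'Query=' line.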

Any solution to this simple problem that uses more than one print
statement is lame. Don't use it.

--
Jim Monty

Tempe, Arizona USA



Fri, 16 Apr 2004 12:39:01 GMT  

Quote:


> > I'd like to break this file (which has over 450 'Query=' blocks)
> > into individual files, so that each file starts from 'Query='
> > until the blank line before the next 'Query='.

> [S]tart with this rudimentary script

>     { print >file }

> then just add the bit that toggles the value of the variable named
> 'file' at the appropriate places, setting it to the appropriate
> file name.

>     /^Query=/ { file = $2 ".sql" }
>               { print >file      }

Oh, and since you're going to be opening over 450 output files,
you'll likely have to close them, too.

    /^Query=/ { close(file); file = $2 ".sql" }
              { print >file                   }

The decision whether to overwrite

    { print >file }

or append to

    { print >>file }

the output file depends on your input and your requirements.
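
The difference only shows up once close() is involved and the same
name turns up on more than one 'Query=' line: reopening a closed file
with '>' truncates it, while '>>' adds to what is already there. A
minimal sketch, assuming a made-up input file repeat.txt and the
made-up extensions .over and .app so the two runs don't collide:

```sh
# Hypothetical input in which the name 'aaa' occurs twice.
cat > repeat.txt <<'EOF'
Query= aaa
first block
Query= bbb
other block
Query= aaa
second block
EOF

# Start clean: '>>' would otherwise add to leftovers from earlier runs.
rm -f aaa.over bbb.over aaa.app bbb.app

# '>' after close(): the second 'aaa' block replaces the first one.
awk '/^Query=/ { close(file); file = $2 ".over" } { print > file }' repeat.txt

# '>>' after close(): the second 'aaa' block is added after the first.
awk '/^Query=/ { close(file); file = $2 ".app" } { print >> file }' repeat.txt

# Compare the line counts of the two 'aaa' files.
wc -l aaa.over aaa.app
```

Here aaa.over ends up with 2 lines and aaa.app with 4. If every
'Query=' name in your input is unique, the two forms behave the same
within a single run, but '>>' will keep growing the files if you run
the script again without removing them first.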

--
Jim Monty

Tempe, Arizona USA



Fri, 16 Apr 2004 12:47:40 GMT  

Quote:

> I've got an input file in the following format:
> .....................................
> etc, etc

> I'd like to break this file (which has over 450 'Query=' blocks) into
> individual files, so that each file starts from 'Query=' until the blank line
> before the next 'Query='.
> I'd also like to name each output file
> with the second field of the 'Query=' record

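$1 == "Query=" {
        # close the previous output file before switching to a new one
        if ( not_1st ) close( outfile )
        outfile = $2
        not_1st = 1
}

# print nothing until the first 'Query=' line has been seen
not_1st { print > outfile }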
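
One thing the not_1st guard does is drop anything that comes before
the first 'Query=' line, instead of sending it to a default file. A
minimal sketch of that behaviour, assuming a made-up input file
header.txt whose first line is an invented preamble:

```sh
# Hypothetical input with one line of preamble before the first block.
cat > header.txt <<'EOF'
some preamble line
Query= xxxxx
Query:      abcdefg
Sbjct:      gfedcba
EOF

awk '
$1 == "Query=" {
        # close the previous output file before switching to a new one
        if ( not_1st ) close( outfile )
        outfile = $2
        not_1st = 1
}
# nothing is printed until the first Query= line has been seen
not_1st { print > outfile }
' header.txt

# The output file is named after the second field, with no extension.
cat xxxxx
```

The file xxxxx should hold the three lines of the block and nothing
from the preamble.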

Jurgen



Fri, 16 Apr 2004 16:04:12 GMT  
Thanks to both of you who answered my plea for help. After struggling for days,
I've finally done this 'simple' manoeuvre!  ... and can finally get on
with getting the rest of my job done. Once again, I am very grateful
for the help.

Cheers,
Giovanni




Sat, 17 Apr 2004 07:08:08 GMT  
 