newbe: copying file to part_of_patternfilename 

I have several thousand web-generated files containing articles;
they have meaningless names like LIAJHGOQENVLQN.
Each file contains, in several places, a string like:
<EM>Written in NameofPaper Edition day month year <\EM>
which is different for every file.

How do I get awk to read all files, one at a time, and copy each to
a new file called 'Edition day month year'.html ??

I'm just taking my first steps in gawk and so far have only worked out
how to substitute strings one at a time.

Thanks for tips !

--
Tae Kyon
key ID 5F8EC5D2 on PGP public key server



Wed, 19 Feb 2003 13:26:23 GMT  


If I were you I would reverse the parts of the new file name to put the
most significant part first, year month day edition.html. This means
that the files will list in chronological order if you do an ls command.
I would also separate the parts of the file name with "_" instead of " "
because I dislike spaces in file names.

I set up 3 files of test data :-

sh-2.03$ cat news1.dat
blah blah
<EM>Written in Wichita Whisper 1 5 07 1997 <\EM>
... another tornado ...
blah blah
<EM>Written in Wichita Whisper 1 5 07 1997 <\EM>
... another tornado ...

sh-2.03$ cat news2.dat
blah blah
<EM>Written in Cricklewood Chronicle 2 29 7 1998 <\EM>
... shopping trolley horror ...

sh-2.03$ cat news3.dat
blah blah
<EM>Written in Turin Weekly News 1 14 08 2000 <\EM>
... shroud for sale on ebay ...

sh-2.03$

I wrote this awk program (in file news.awk) :-

#!gawk -f
#rename files based on content

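gotem && FNR!=1 {next} # this file is already handled: skip its remaining lines

{gotem=0} # first line of a new file, or no match found yet

/<EM>Written in/ {
  # scan backwards for the field holding the closing <\EM> tag
  i=NF
  while ($i!~/<\\EM>/  && i>1) i--
  if (i<=4) next # <\EM> not found, or fewer than 4 fields before it
  # the four fields before <\EM> are: edition day month year
  newname=sprintf("%d_%02d_%02d_%s.html", $(i-1),$(i-2),$(i-3),$(i-4) )
  print "cp " FILENAME " " newname
  gotem=1
}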

I ran the awk program and got this output:-

sh-2.03$ ./news.awk news*.dat
cp news1.dat 1997_07_05_1.html
cp news2.dat 1998_07_29_2.html
cp news3.dat 2000_08_14_1.html
sh-2.03$

If you redirect this output to a file you can use that file as a shell
script. I would suggest you check it before running it. If you find it
works well you could change the cp to mv.
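
As a rough sketch of that generate, check, run workflow: the block below works in a scratch directory with one made-up sample file, and the script name rename.sh is only an assumption. The awk program is a condensed inline variant of news.awk (it uses a seen[] array instead of the gotem flag) so that the example can be run on its own.

```sh
# Sketch only: scratch directory and sample data are invented for illustration.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'blah blah' \
  '<EM>Written in Wichita Whisper 1 5 07 1997 <\EM>' > news1.dat

# 1. Write the generated cp commands to a file instead of the terminal.
awk '/<EM>Written in/ {
  i = NF
  while ($i !~ /<\\EM>/ && i > 1) i--          # find the closing tag
  if (i <= 4 || seen[FILENAME]++) next         # bad line, or file already done
  printf "cp %s %d_%02d_%02d_%s.html\n", FILENAME, $(i-1), $(i-2), $(i-3), $(i-4)
}' news*.dat > rename.sh

cat rename.sh    # 2. Check the commands by eye.
sh rename.sh     # 3. Run them once they look right.
```

Note that this only stays safe while the original file names contain no spaces or shell metacharacters; otherwise the generated commands would need quoting.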

You might want to change the program to convert 2-digit years to 4-digit
years by adding either 1900 or 2000.
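
One possible way to do that conversion is sketched below. The function name full_year, the shell wrapper expand_years, and the pivot of 70 (so 70-99 become 19xx and 00-69 become 20xx) are all assumptions chosen for illustration; pick whatever cut-off suits your articles. In news.awk you would then pass full_year($(i-1)) to sprintf instead of $(i-1).

```sh
# expand_years is a hypothetical wrapper so the awk function can be tried alone.
expand_years() {
  awk '
  # full_year() is an illustrative helper, not part of the original news.awk
  function full_year(y) {
    y += 0                                   # force a number, so "03" becomes 3
    if (y >= 100) return y                   # already a 4-digit year
    return (y >= 70) ? 1900 + y : 2000 + y   # assumed pivot: 70-99 -> 19xx
  }
  { print full_year($1) }
  '
}

# Try it on a few sample years, one per line.
printf '%s\n' 97 03 1998 | expand_years
```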

Hope this helps
--
Alan Linton



Thu, 20 Feb 2003 00:48:42 GMT  

With minor adjustments your program is now doing exactly what I wanted.
More importantly, your example and bmarcum's (which also works, although it
produces a cp line for every matching string in each file) have helped
me understand many points of the awk language better than hours of poring
over the 300+ page manual did,
and have given me a big incentive to learn.

Thanks again !

--
Tae Kyon



Sat, 22 Feb 2003 21:05:15 GMT  
 