retaining a accessing file name 

As only a sporadic 'awk/sed' user, I'm floundering around a bit with
what is probably an easy problem for seasoned users.  Probably making
this more complicated than necessary.

I want to put only the headers from several directories full of mail
messages onto stdin.  Then scan those headers for a certain string.

Then still be able to access the file numbers that contain that string
and carry out actions on them, like 'cp', 'rm' or whatever.

An example of what I'm trying to describe:
Conditions:
1) My mail app (Gnus) changes the unix "From " line to "X-From-Line: "

2) Mail directory is in this format: Mail/sub1 sub2 etc.
   Subdirectories contain numbered files that are single messages.  So,
   sub1/1 2 ...N.

3) I'm using "find" to feed the file numbers to 'sed' but have a hunch I
   could use 'awk' to do both of the remaining jobs.

Problem:

sed -n '/^X-From-Line: /,/^$/p' `find Mail -type f -name '*[0-9]' -print`\

(That makes only the headers available to grep. Using grep to find the
message headers that have the string I want)

|grep 'by fetchmail'

(So far I've found the messages that contain the string and am sure
it's not in the body)

From here I'm lost as to how to get back the file names (path and
number) to be able to carry out further actions.

I'd like to copy the full messages containing that string in the headers to a
separate directory for further processing.

I can think of several ways to do this, all of which are time
consuming and awkward.  Hoping someone can point out how to use awk's
FILENAME variable in this situation.  Or other 'awk' techniques.



Wed, 24 Oct 2001 03:00:00 GMT  

Quote:

>As only a sporadic 'awk/sed' user, I'm floundering around a bit with
>what is probably an easy problem for seasoned users.  Probably making
>this more complicated than necessary.
>I want to put only the headers from several directories full of mail
>messages onto stndin.  Then scan those headers for a certain string.

Something like:

FILENAME!=name{FOUND=0;name=FILENAME} #clear print message flag on new file
/^X-From-Line: /{HEAD=1}              #Start of message, set header flag
HEAD&&/^$/{if(FOUND){for(x=0;x in header;x++)print header[x]};HEAD=0;delete header}
                         #if at header end, and string found, print header
HEAD&&/stringI'm lookingfor/{FOUND=1}  #set flag that we want this message
HEAD{header[headline++]=$0;next}       #store header lines until we know
FOUND{print}                           #print the rest of a wanted message

Run as awk -f filename `find command`

This isn't too slow. I've got something like this that I pull news
batches through to check for binary messages (read the message into
a big array while checking, and throw it out if more than 70% of the
lines have the same length).
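
The length check described above might look roughly like this. It is only a sketch, assuming one message per file and taking "the same length" to mean the most common body-line length. The 70% threshold comes from the description; the function name, variable names and sample files are made up for illustration.

```shell
# Sketch: call a message "binary" when more than 70% of its body lines
# share one length (encoded blocks tend to). All names are illustrative.
tmp=$(mktemp -d)
# sample encoded-looking message: header, blank line, 4 equal-length lines, 1 short
printf 'Subject: test\n\nAAAAAAAA\nBBBBBBBB\nCCCCCCCC\nDDDDDDDD\nend\n' > "$tmp/1"
# sample text message with varying line lengths
printf 'Subject: hi\n\nshort\na longer line\nmid one\n' > "$tmp/2"

result=$(cd "$tmp" && awk '
function report(   l, max) {             # called once per finished file
    max = 0
    for (l in count) if (count[l] > max) max = count[l]
    if (lines > 0)
        print name, (max > 0.7 * lines ? "binary" : "text")
    split("", count)                     # portable way to empty the array
    lines = 0; body = 0
}
FNR == 1 && NR > 1 { report() }           # new file: report the previous one
FNR == 1           { name = FILENAME }
body               { count[length($0)]++; lines++ }
!body && /^$/      { body = 1 }           # first blank line ends the header
END                { if (NR > 0) report() }
' 1 2)
echo "$result"
rm -rf "$tmp"
```

Counting line lengths in an array keyed by length avoids holding the whole message in memory; the original approach of buffering the message would be needed only if the message had to be printed afterwards.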



Thu, 25 Oct 2001 03:00:00 GMT  


% As only a sporadic 'awk/sed' user, I'm floundering around a bit with
% what is probably an easy problem for seasoned users.  Probably making
% this more complicated than necessary.

Probably. Using too many programs, anyway.

[we're looking for a line in the header of an e-mail message that starts
with X-From-Line: and has the text `by fetchmail' in it, then we
want to process the file.]

% Problem:
%
% sed -n '/^X-From-Line: /,/^$/p' `find Mail -type f -name '*[0-9]' -print`\

I don't understand why you don't deal with `by fetchmail' here, too.

% (That makes only the headers available to grep. Using grep to find the
% message headers that have the string I want)
%
% |grep 'by fetchmail'
%
% ( So far I've found the messages that contain the string and am sure
% its not in the body)
%
% From here I'm lost as to how to get back the file names (path and
% number) to be able to carry out further actions.

And you can't, because it's gone. If you only have one level of sub-directory
under Mail, you don't need to use anything but awk. Your command-line will
be:
 awk -f findheaders Mail/*/*[0-9]

Now you just need to find lines in the header which start with `X-From-Line:'
and contain the text `by fetchmail', ie that match the pattern
 /^X-From-Line: .*by fetchmail/
and deal with them. You can run an external program using the system()
function.

E-mail headers include all the text up to the first blank line,
so we can use a range pattern to restrict the search for the pattern.
Since you have one message per file, the start of the range will be
FNR==1, and the end pattern will be NF==0 (meaning there's nothing but
whitespace on the line):
 # only do the test for the e-mail header
 FNR==1,NF==0 {
   # test for the line of interest and process the file
   if (/^X-From-Line: .*by fetchmail/)
      system("cp " FILENAME " /tmp/findheaders.tmp;"\
             " processsomehow /tmp/findheaders.tmp;"\
             " cp /tmp/findheaders.tmp " FILENAME)
 }
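
To see the range pattern keep the file name attached, here is a small sketch that prefixes every header line with FILENAME, so a later grep still shows which file each hit came from. The sample messages and addresses are invented for the demonstration; only the `FNR==1,NF==0` range and the `Mail/*/*[0-9]` glob come from the text above.

```shell
# Sketch: print only header lines, each tagged with its file name.
tmp=$(mktemp -d)
mkdir -p "$tmp/Mail/sub1"
printf 'X-From-Line: a@example.com\nReceived: from x by fetchmail-5.9\nSubject: one\n\nbody mentions by fetchmail too\n' > "$tmp/Mail/sub1/1"
printf 'X-From-Line: b@example.com\nSubject: two\n\nplain body\n' > "$tmp/Mail/sub1/2"

# The range starts on the first line of each file and ends at the first
# blank line, so body lines never reach grep.
hits=$(cd "$tmp" && awk 'FNR==1,NF==0 { print FILENAME ": " $0 }' Mail/*/*[0-9] |
       grep 'by fetchmail')
echo "$hits"
rm -rf "$tmp"
```

The body of the first sample message also contains the string, but it is outside the range and is not reported.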

Hope that helps.
--

Patrick TJ McPhee
East York  Canada



Thu, 25 Oct 2001 03:00:00 GMT  

Quote:


> >As only a sporadic 'awk/sed' user, I'm floundering around a bit with
> >what is probably an easy problem for seasoned users.  Probably making
> >this more complicated than necessary.

> >I want to put only the headers from several directories full of mail
> >messages onto stndin.  Then scan those headers for a certain string.

> Something like:

> FILENAME!=name{FOUND=0;name=FILENAME} #clear print message flag on new file
> /^X-From-Line: /{HEAD=1}              #Start of message, set header flag
> HEAD&&/^$/{if(FOUND){for(x=0;x in header;x++)print header[x]};HEAD=0;delete header}
>                          #if at header end, and string found, print header
> HEAD&&/stringI'm lookingfor/{FOUND=1}  #set flag that we want this message
> HEAD{header[headline++]=$0;next}
> FOUND{print}

***8< snip

Thanks for your input.  Looks like I need to do some serious study
before really getting much out of your formula.  When I look at this I
kind of just see ancient Hebrew.



Thu, 25 Oct 2001 03:00:00 GMT  

Quote:


> % As only a sporadic 'awk/sed' user, I'm floundering around a bit with
> % what is probably an easy problem for seasoned users.  Probably making
> % this more complicated than necessary.
> Probably. Using too many programs, anyway.
> [we're looking for a line in the header of an e-mail message that starts
> with X-From-Line: and has the text `by fetchmail' in it, then we
> want to process the file.]

No.  The headers themselves start with "X-From-Line" but the string
"by fetchmail" is found somewhere in the middle of the headers.  It's
actually in a line starting with 'Received: ', but not always in the
same "Received" line.

Quote:

> % Problem:
> %
> % sed -n '/^X-From-Line: /,/^$/p' `find Mail -type f -name '*[0-9]' -print`\

> I don't understand why you don't deal with `by fetchmail' here, too.

Probably barking up the wrong tree here, but since 'by fetchmail' isn't
in that line I'd hoped to put the whole set of headers on stdin and
then pick out the header sets that contain the string "by fetchmail",
to avoid searching the whole message.  In this case the bodies are not
a factor.

I'll be needing the whole set of headers to pick other info out, but
just the ones that have the string 'by fetchmail' .  So I guess you
could say I want to split out all the messages (headers only) that
have the string 'by fetchmail' anywhere in them, for further
processing and still retain the original file number for later
reference.

Quote:
> % (That makes only the headers available to grep. Using grep to find the
> % message headers that have the string I want)
> %
> % |grep 'by fetchmail'
> %
> % ( So far I've found the messages that contain the string and am sure
> % its not in the body)
> %
> % From here I'm lost as to how to get back the file names (path and
> % number) to be able to carry out further actions.

> And you can't, because it's gone. If you only have one level of sub-directory
> under Mail, you don't need to use anything but awk. Your command-line will
> be:

Ah, I was afraid of that.  But the all-awk approach sounds promising.
And yes, it is only one layer of subdirectories.

Quote:
>  awk -f findheaders Mail/*/*[0-9]

> Now you just need to find lines in the header which start with `X-From-Line:'
> and contain the text `by fetchmail', ie that match the pattern
>  /^X-From-Line: .*by fetchmail/

As mentioned earlier, the two strings are not in the same line.  In
fact the 'by fetchmail' part turns up in different positions, but
always in a line beginning with 'Received: '.

A further short explanation:
My general aim in all this was to compile some data about how many
posters use fetchmail and out of that how many get an
"X-Authentication warning" line in their headers (fetchmail does that
but not always).  Then the final processing would be seeing if the
"X-Authentication" lines are coming from only a few posters by picking
out the "From: " lines and sorting them.

I had hoped that once the full headers are available all the rest
could be picked out one way or another, while still keeping a reference
to the original file numbers (without changing the original files).
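
The tally described here could be sketched in a single awk pass over the headers, without touching the original files. This is only an illustration under assumptions: one message per file, the warning header spelled `X-Authentication-Warning` (the pattern also accepts a space), and invented sample messages. The function and variable names are hypothetical.

```shell
# Sketch: count messages, fetchmail messages, and fetchmail messages that
# also carry an authentication warning; list file and sender for the latter.
tmp=$(mktemp -d)
mkdir -p "$tmp/Mail/sub1"
printf 'X-From-Line: a\nFrom: alice@example.com\nReceived: from x by fetchmail-5.9\nX-Authentication-Warning: host: a set sender\n\nbody\n' > "$tmp/Mail/sub1/1"
printf 'X-From-Line: b\nFrom: bob@example.com\nReceived: from y by fetchmail-5.9\n\nbody\n' > "$tmp/Mail/sub1/2"
printf 'X-From-Line: c\nFrom: carol@example.com\nReceived: from z by smtp\n\nby fetchmail in body only\n' > "$tmp/Mail/sub1/3"

report=$(cd "$tmp" && awk '
function finish() {                      # tally the message just read
    if (!seen) return
    total++
    if (fetch) nfetch++
    if (fetch && warn) { nboth++; print "warned: " file " " from }
    fetch = warn = 0; from = ""
}
FNR == 1 { finish(); seen = 1; file = FILENAME; inhead = 1 }
inhead && NF == 0                            { inhead = 0 }  # end of header
inhead && /^Received: .*by fetchmail/        { fetch = 1 }
inhead && /^X-Authentication[- ][Ww]arning/  { warn = 1 }
inhead && /^From: /                          { from = substr($0, 7) }
END {
    finish()
    print "messages: " (total + 0)
    print "via fetchmail: " (nfetch + 0)
    print "fetchmail and warning: " (nboth + 0)
}' Mail/*/*[0-9])
echo "$report"
rm -rf "$tmp"
```

The `warned:` lines keep the path and number of each message, so they can be fed to sort and uniq -c to see whether the warnings come from only a few posters.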

Quote:

> E-mail headers include all the text up to the first blank line,
> so we can use a range pattern to restrict the search for the pattern.
> Since you have one message per file, the start of the range will be
> FNR==1, and the end pattern will be NF==0 (meaning there's nothing but
> whitespace on the line):
>  # only do the test for the e-mail header
>  FNR==1,NF==0 {

Already looking much tidier than what I was attempting.  I think I
see how to change your setup below to get what I'm after.  One thing
that would help immensely would be if you could pepper the brief
program with some comments that try to explain what is happening at
various points (I still get pretty lost looking at the hieroglyphics).

Quote:
>    # test for the line of interest and process the file
>    if (/^X-From-Line: .*by fetchmail/)
>       system("cp " FILENAME " /tmp/findheaders.tmp;"\
>         " processsomehow /tmp/findheaders.tmp;"\
>         " cp /tmp/findheaders.tmp " FILENAME)
>  }

It seems your FNR==1 has relieved the need for finding "^X-From-Line":
  # test for the line of interest and process the file
  if (/by fetchmail/)
     system("cp " FILENAME " /tmp/findheaders.tmp;"\
           " processsomehow /tmp/findheaders.tmp;"\
           " cp /tmp/findheaders.tmp " FILENAME)
}

(What is happening in the system() call?)

The "processsomehow" would be to compare the number that have "by
fetchmail" to the full number of posts, then to the number that have
both "by fetchmail" and "^X-Authentication warning".  Then see if the ones
with both are evenly spread or being put out by a few machines.

Quote:
> Hope that helps.

Very much so.  You can't imagine how much a bit of explaining can help
a neophyte.

I hope you'll feel inclined to look at a coming post on related
matters: more e-mail processing.



Thu, 25 Oct 2001 03:00:00 GMT  

Quote:


>>As only a sporadic 'awk/sed' user, I'm floundering around a bit with
>>what is probably an easy problem for seasoned users.  Probably making
>>this more complicated than necessary.
>>I want to put only the headers from several directories full of mail
>>messages onto stndin.  Then scan those headers for a certain string.

ARGH, misread the spec.
The script below correctly reads spools containing multiple messages and
prints to stdout the messages that contain a string in a header.

Unfortunately, you only want the filenames.

This is rather easier:

/^$/{nextfile}                #if we hit an empty line, go to next file
/stringIwant/{print FILENAME;nextfile} #If stringIwant occurs, print the
                       #name of the file, and skip to next file

And that's it.
It could be used on the commandline as
awk '/^$/{nextfile}/stringIwant/{print FILENAME;nextfile}' `findcommand`
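
Once the names are on stdout, copying the matching messages to a separate directory is a short pipeline. The sketch below is one way to do it, under assumptions: it uses a flag instead of nextfile (older awks lack nextfile), the target directory `picked` is hypothetical, and the path is flattened into the new name because different subdirectories reuse the same numbers. The sample messages are invented.

```shell
# Sketch: copy every message whose header contains the string to picked/.
tmp=$(mktemp -d)
mkdir -p "$tmp/Mail/sub1" "$tmp/Mail/sub2" "$tmp/picked"
printf 'X-From-Line: a\nReceived: from x by fetchmail\n\nbody\n' > "$tmp/Mail/sub1/1"
printf 'X-From-Line: b\nSubject: no match\n\nby fetchmail only in body\n' > "$tmp/Mail/sub1/2"
printf 'X-From-Line: c\nReceived: from y by fetchmail\n\nbody\n' > "$tmp/Mail/sub2/1"

(
  cd "$tmp" || exit 1
  # done is set at the end of the header or after the first hit, so the
  # rest of each file is skipped and each name is printed at most once
  awk 'FNR == 1        { done = 0 }
       done            { next }
       /^$/            { done = 1; next }
       /by fetchmail/  { print FILENAME; done = 1 }' \
      $(find Mail -type f -name '*[0-9]' -print) |
  while IFS= read -r f; do
      # Mail/sub1/1 becomes picked/Mail_sub1_1
      cp "$f" "picked/$(printf '%s' "$f" | tr / _)"
  done
)
copied=$(ls "$tmp/picked")
echo "$copied"
rm -rf "$tmp"
```

The unquoted command substitution relies on the file names having no spaces, which holds for numbered message files.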

Original message, with more explanation below.

Quote:
>Something like:

Concept:

Read a header into an array, while checking for the string.
If the string is found, print the header array, then set a flag
to print the body

Ok, more comments

Quote:
>FILENAME!=name{FOUND=0;name=FILENAME} #clear print message flag on new file

If the awk variable FILENAME representing the current filename
does not match name, (we have started a new file) clear the FOUND flag,
so we won't print out the next message.

Quote:
>/^X-From-Line: /{HEAD=1;headline=0}              #Start of message, set header flag

(bugfix from previous version)

If the current line is the beginning of a message, set the flag HEAD, to
indicate we are reading the header

Quote:
>HEAD&&/^$/{if(FOUND){for(x=0;x in header;x++)print header[x]};HEAD=0;delete header}

If the flag HEAD is set, and the current line is empty, then check the
variable FOUND; if it's set, print out the header. Then reset HEAD to indicate
we are not in the header any more, and delete the stored header.

Quote:
>                         #if at header end, and string found, print header
>HEAD&&/stringI'm lookingfor/{FOUND=1}  #set flag that we want this message

If HEAD flag is set, and the current line contains the string it's looking
for, then FOUND is set to 1, indicating that the header should be printed,
when it gets to the end, and that the body should be printed

Quote:
>HEAD{header[headline++]=$0;next}

If in the header, then add the current line ($0) to the array header, at
the index headline, and skip this line.

Quote:
>FOUND{print}

If FOUND is set, then print the line. This rule is never reached
in a header, because the previous rule ends with next.
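
Put together, the rules walked through above could be sketched as one script file. This is a variant, not the exact script from the post: it resets FOUND at every X-From-Line instead of watching FILENAME, takes the search string from a `want` variable, and overwrites the header array instead of deleting it. The file name `printmatch.awk` and the sample messages are made up.

```shell
# Sketch: print whole messages whose header matches the pattern in want.
tmp=$(mktemp -d)
cat > "$tmp/printmatch.awk" <<'EOF'
/^X-From-Line: / { HEAD = 1; FOUND = 0; headline = 0 }  # start of a message
HEAD && /^$/ {                                          # blank line ends header
    if (FOUND)
        for (x = 0; x < headline; x++) print header[x]
    HEAD = 0
}
HEAD && $0 ~ want { FOUND = 1 }                         # string seen in header
HEAD  { header[headline++] = $0; next }                 # buffer header lines
FOUND { print }                                         # blank line and body
EOF
printf 'X-From-Line: a\nReceived: by fetchmail\n\nbody one\n' > "$tmp/1"
printf 'X-From-Line: b\nReceived: by smtp\n\nby fetchmail in body\n' > "$tmp/2"

out=$(awk -v want='by fetchmail' -f "$tmp/printmatch.awk" "$tmp/1" "$tmp/2")
echo "$out"
rm -rf "$tmp"
```

Only the first sample message is printed; the second mentions the string in its body, which is never tested because HEAD is already cleared by then.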

Quote:
>Run as awk -f filename `find command`

Put the above into a file called filename, and run it as quoted above.
Quote:
>This isn't too slow, I've got something like this, that I pull news
>batches through, to check for binary messages (read the message into
>a big array, while checking, and throw it out if more than 70% of the
>lines have the same length).



Fri, 26 Oct 2001 03:00:00 GMT  
 