e-mail processing awk/sed 

Summary:
Strip the set of headers that a mailing-list archive machine puts on
top of the original headers of each e-mail, putting the original
message back into its original unix format.

The messages have this form and are one message to a file:

New headers begin with a line:
X-From-Line:X-From-Line:
and continue to a blank line
[blank line .... then]
File: <path>/<file number>
BEGIN------------cut here-------------

The original message then begins as shown below .. no blank line after
"BEGIN"
[followed by another blank line at the end of these original headers]

So,
X-From-Line:X-From-Line: hhhhhhhhhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhhhhhhhhh
more headers
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
etc

File: <path>/<file number>
BEGIN------------cut here-------------
hhhhhhhhhhhhhhhhh
more headers
hhhhhhhhhhhhhh
etc

body text
ttttttttttttttt
ttttttttttttttt

END--------------cut here-------------
[additional blank line]

This awk command does most of it:
awk '/^>From /,/^\d/'  ~/<path>/<file-number>  

But it leaves the (>) and the closing:
END--------------cut here-------------

in place.  So attempting to clean up with sed:

sed  '/^END.*cut here.*$/d'|awk '/^>From /,/^\d/' /path/<file-number>

leaves the 'End----cut here ...' in place (deletes nothing)

Adding a second (empty) address causes sed to make the deletion, but...

sed  '/^END.*cut here.*$/,//d' /<path>/<file-num>

When combined as below, I'm back to square one.  Apparently sed has
deleted my (^\d) marker, so the file is printed to stdout with no
changes

sed '/^END.*cut here.*$/,//d'|awk '/^X-From-Line/,/^\d/' /<path>/<file-num>

Hope I haven't made this too confusing. What is the 'awk' way to do
this job?



Thu, 25 Oct 2001 03:00:00 GMT  
If I understood correctly, what you want is a script that will pick
out from a file the lines between the delimiters:

BEGIN------------cut here-------------

and

END--------------cut here-------------

(not including the delimiters themselves). I guess you need something
similar to this:

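/^BEGIN------------cut here-------------$/ {
  # Print every line up to, but not including, the END delimiter.
  # Testing getline's return value also stops the loop at end of file,
  # so a message with no END line cannot make it spin forever.
  while ( (getline) > 0 && ! /^END--------------cut here-------------$/ ) {
    print;
  }
}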

It should work with multiple messages in a file, but won't do embedded
messages.
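
The getline loop above can be tried on a small sample. The sketch below wraps it in a shell function (extract_getline is a made-up name), shortens the marker patterns to -+, and checks getline's return value so the loop also stops at end of input. The two sample messages and their paths are invented.

```shell
# extract_getline is a hypothetical helper name; it reads stdin or file arguments.
extract_getline() {
  awk '
    /^BEGIN-+cut here-+$/ {
      # Consume lines up to the END marker; stop at end of input as well.
      while ((getline) > 0 && !/^END-+cut here-+$/)
        print
    }
  ' "$@"
}

# Two archived messages in one stream (made-up sample data).
printf '%s\n' \
  'X-From-Line: archive header' \
  '' \
  'File: /hypothetical/path/1' \
  'BEGIN------------cut here-------------' \
  'From: a@example.com' \
  '' \
  'first body' \
  'END--------------cut here-------------' \
  '' \
  'X-From-Line: archive header' \
  '' \
  'File: /hypothetical/path/2' \
  'BEGIN------------cut here-------------' \
  'From: b@example.com' \
  '' \
  'second body' \
  'END--------------cut here-------------' |
  extract_getline
```

Only the lines between each BEGIN/END pair come out, so both messages are printed back to back without the archive headers or the marker lines.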

Stasinos

Quote:

> Summary:
> Strip a set of headers off e-mail that is put on over the original
> headers by a mailing-list  archive machine.  Putting the original
> message back to its original unix format.



Fri, 26 Oct 2001 03:00:00 GMT  

Quote:
>Strip a set of headers off e-mail that is put on over the original
>headers by a mailing-list  archive machine.  Putting the original
>message back to its original unix format.

Assuming your "end" flag record can't occur in the text, this will do
it:

  $0 ~ /^END-+cut here-+$/ { do_copy = 0 }
  do_copy { print }
  $0 ~ /^BEGIN-+cut here-+$/ { do_copy = 1 }

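The sequence of the statements looks counter-intuitive, but it really
has to be that way to avoid copying the flag records.  awk tests the
rules in order for every record, so the "end" test clears do_copy
before the print rule sees the END line, and the "begin" test sets it
only after the print rule has already skipped the BEGIN line.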
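
To see the effect of the rule order, here is a small sketch that runs the three rules as posted next to a variant with the print rule moved last. The function names and the sample message are made up for illustration.

```shell
# Rule order from the post: END check, then print, then BEGIN check.
extract_flag() {
  awk '
    /^END-+cut here-+$/   { do_copy = 0 }   # stop before the END marker is printed
    do_copy               { print }
    /^BEGIN-+cut here-+$/ { do_copy = 1 }   # start only after the BEGIN marker has passed
  ' "$@"
}

# Same rules with the print rule moved last, for comparison only.
extract_flag_print_last() {
  awk '
    /^END-+cut here-+$/   { do_copy = 0 }
    /^BEGIN-+cut here-+$/ { do_copy = 1 }
    do_copy               { print }
  ' "$@"
}

# Invented sample message in the archive format described in the question.
sample() {
  printf '%s\n' \
    'X-From-Line: archive header' \
    '' \
    'BEGIN------------cut here-------------' \
    'From: a@example.com' \
    '' \
    'body text' \
    'END--------------cut here-------------'
}

sample | extract_flag             # original headers and body only
sample | extract_flag_print_last  # the BEGIN marker line leaks into the output
```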

Essentially, the above code implements a state machine with only two
possible states: "copy" and "don't".  If it's possible for an "end"
record to appear in the message body, you can add some code to its
action that buffers the input and peeks ahead with getline to see
whether the "end" record is real.
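
One way to handle an "end" record inside the body is sketched below. It is not the getline peek described above but a simpler buffering variant, and it rests on an assumption taken from the original question: each file holds one message, so the real END marker is the last END line in the file. Lines from a candidate END onward are held back and only released if a later END line shows that they were body text. The function name extract_last_end is made up.

```shell
# Assumes one message per file, so the last END line is the real one.
extract_last_end() {
  awk '
    /^BEGIN-+cut here-+$/ && !do_copy { do_copy = 1; next }  # first BEGIN starts the copy
    !do_copy { next }                                        # skip the archive headers
    /^END-+cut here-+$/ {
      printf "%s", held     # an earlier candidate was body text after all
      held = $0 "\n"        # hold this candidate until we know more
      holding = 1
      next
    }
    holding { held = held $0 "\n"; next }  # keep buffering behind the candidate
    { print }
    # At end of input the held lines are dropped: they are the real END
    # marker plus anything that followed it.
  ' "$@"
}

# Invented sample: the body quotes an END line before the real one.
printf '%s\n' \
  'X-From-Line: archive header' \
  '' \
  'BEGIN------------cut here-------------' \
  'From: a@example.com' \
  '' \
  'quoted marker follows' \
  'END--------------cut here-------------' \
  'more body' \
  'END--------------cut here-------------' \
  '' |
  extract_last_end
```

The cost is that everything after the most recent candidate is kept in memory until the next candidate or end of file, which should be fine for single mail messages.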

Ran



Fri, 26 Oct 2001 03:00:00 GMT  