Trial and error/mainly error [was]Handy way.. 

This is a continuation of the thread under "Subject: Handy way ...".

Sorry that my output was so sloppy and fraught with typos, careless
pastes, near-duplicate posts and so on.

I've gotten some sleep and had a new look at the problem.  I hope to
have some criticism here if some of you will be patient enough to read
through and comment.

My aim was to find, by RE, certain strings contained in mail messages
stored in `MH' style (one file per message).  At first I thought I
needed to trigger printing by recognizing when the end of a file had
occurred, but I'm not sure now whether that is required.

(Thanks, Jim M., for your examples; they will be handy.  And Henry C. for
the example function call.)

Finding strings that are expected to be in mail headers always
presents a problem when a poster, using sloppy or no quoting
technique, sends a message that reprints header lines in the body of
the message, thus giving false hits.

In groups where mail headers may be a topic of discussion, that
could occur fairly often.

The blank line between the headers and the body is usually a good
place to force the hits to be printed, guaranteeing the hits are only
in the headers.
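A minimal sketch of that blank-line idea, for the simple case where only header patterns matter.  The sample lines and the flag name `inbody' are made up for illustration; it is not the script discussed in this thread.

```shell
#!/bin/sh
# Sketch only: sample input and the names inbody/from are illustrative.
out=$(printf '%s\n' 'From: Harry Putnam' 'Subject: test' '' \
                    'From: Harry Putnam (quoted in body)' |
awk '
FNR == 1 { inbody = 0; from = "" }   # new message: back to header mode
inbody   { next }                    # skip everything after the blank line
/^From:.*Putnam/ { from = $0 }       # remember the header hit
/^$/ { if (from != "") print from; inbody = 1 }   # blank line: print, then stop matching
')
echo "$out"
```

Only the header line is reported; the look-alike line in the body is never tested against the pattern.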

In this case some of the REs are expected to match in the body,
making it harder to determine when to print.  Or so it seems to a
neophyte.

The technique I've hit on is using the `increment' operator to
guarantee that what gets printed is the first instance seen of the
body RE.  Otherwise I get printing at every line containing the body RE.

`/X-From-Line: /' matches the first line of every message:

/X-From-Line: / {a = fr = xr = mg =0}
/^From:.*Putnam/ {from = $0 ; fr = 1}
/^Xref:.*ding2/ {xref = $0 ; xr = 1}
/^Message-ID: / {msg = $0 ; mg = 1}
/\.fmt/ {fm = $0 ; a++}
/\.fmt/ && 4==(a+fr+xr+mg){print FILENAME"\n" from"\n"xref"\n"msg"\n"fm"\n--"}

This *does not* work.  Even in a small trial sample of messages,
several are reported as matching `/^From:.*Putnam/' that in fact do not
contain that string at all, indicating the technique is badly flawed,
although it does find some accurate information.

Even if it had worked there would be a couple of major drawbacks.  The
final RE, which is the one most likely to be in the body, may occur
several times in a single message, and I'd like to print each
occurrence along with the other output.

The script also does nothing to protect against header-like lines in a
message body that satisfy one of the REs.

I'd prefer that all header-type REs get printed only if found in the
headers, and all body-type REs get printed as long as the header REs
are satisfied.

This probably calls for arrays that hold the occurrences, printed
from an end-of-file `idiom' like the one Jim offered, but that is a
little beyond my trial and error technique.
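One way the array-plus-end-of-file approach might look, as a rough sketch rather than a tested solution.  The two sample messages are invented, and the names `report', `inbody', `hits' and `n' are assumptions of this sketch; the patterns are the ones from the script above.

```shell
#!/bin/sh
# Sketch only: sample messages and helper names are made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'X-From-Line: someone' 'From: Harry Putnam' 'Xref: host ding2:1' \
  'Message-ID: <real@id>' '' 'see foo.fmt' 'Message-ID: <fake@id>' 'and bar.fmt' > "$tmp/1"
printf '%s\n' 'X-From-Line: other' 'From: Somebody Else' 'Xref: host ding2:2' \
  'Message-ID: <other@id>' '' 'From: Harry Putnam wrote about x.fmt' > "$tmp/2"

out=$(awk '
function report(   i) {
    # Print a message only if all three header REs matched and the body had hits.
    if (file != "" && fr && xr && mg && n > 0) {
        print file; print from; print xref; print msg
        for (i = 1; i <= n; i++) print hits[i]
        print "--"
    }
}
FNR == 1 { report(); file = FILENAME; inbody = fr = xr = mg = n = 0 }
!inbody && /^$/ { inbody = 1; next }                # first blank line ends the headers
!inbody && /^From:.*Putnam/ { from = $0; fr = 1 }   # header REs: headers only
!inbody && /^Xref:.*ding2/  { xref = $0; xr = 1 }
!inbody && /^Message-ID: /  { msg = $0;  mg = 1 }
inbody  && /\.fmt/          { hits[++n] = $0 }      # body RE: keep every occurrence
END { report() }                                    # flush the last message
' "$tmp/1" "$tmp/2")
echo "$out"
rm -rf "$tmp"
```

The first message is reported with both .fmt lines and its real Message-ID; the second is skipped because its only Putnam line sits in the body.  Resetting the flags at `FNR == 1' relies on the one-file-per-message layout.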



Tue, 07 Jan 2003 03:00:00 GMT  


Quote:

>/X-From-Line: / {a = fr = xr = mg =0}
>/^From:.*Putnam/ {from = $0 ; fr = 1}
>/^Xref:.*ding2/ {xref = $0 ; xr = 1}
>/^Message-ID: / {msg = $0 ; mg = 1}
>/\.fmt/ {fm = $0 ; a++}
>/\.fmt/ && 4==(a+fr+xr+mg){print FILENAME"\n" from"\n"xref"\n"msg"\n"fm"\n--"}

Much of your problem probably stems from (a+fr+xr+mg) reaching 4 once
a becomes larger than 1, without every other term being 1.
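To see the failure concretely, here is a small sketch with made-up flag values: a message whose From: line never matched but which contains two .fmt lines.  Since the original script never clears `from', such a message would also print the From: line left over from an earlier file, which may account for the bogus Putnam reports.

```shell
#!/bin/sh
# Sketch only: the flag values below are a hypothetical state, not real data.
out=$(awk 'BEGIN {
    fr = 0; xr = 1; mg = 1; a = 2      # From: missing, .fmt seen twice
    print "sum test: " ((a + fr + xr + mg) == 4 ? "prints" : "silent")
    print "separate test: " (((fr + xr + mg) == 3 && a > 0) ? "prints" : "silent")
}')
echo "$out"
```

The summed test fires even though one header is missing; testing the header flags separately from the counter does not.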

Try this instead:

/X-From-Line: / {a = fr = xr = mg =0}
/^From:.*Putnam/ {from = $0 ; fr = 1}
/^Xref:.*ding2/ {xref = $0 ; xr = 1}
/^Message-ID: / {msg = $0 ; mg = 1}
/\.fmt/ {fm = $0 ; a++}
/\.fmt/ && (fr+xr+mg)==3 && a==1{print FILENAME;
                                 print from; print xref; print msg}
/\.fmt/ && (fr+xr+mg)==3 && a>0 {print fm}

This checks that all three header lines are there and a fm
line is found before printing the header.

Then as long as each header set has been found, each fm line
is printed.

Chuck Demas
Needham, Ma.




Tue, 07 Jan 2003 03:00:00 GMT  

[...]

Quote:
> >/X-From-Line: / {a = fr = xr = mg =0}
> >/^From:.*Putnam/ {from = $0 ; fr = 1}
> >/^Xref:.*ding2/ {xref = $0 ; xr = 1}
> >/^Message-ID: / {msg = $0 ; mg = 1}
> >/\.fmt/ {fm = $0 ; a++}
> >/\.fmt/ && 4==(a+fr+xr+mg){print FILENAME"\n" from"\n"xref"\n"msg"\n"fm"\n--"}

[...]

Quote:
> Try this instead:

> /X-From-Line: / {a = fr = xr = mg =0}
> /^From:.*Putnam/ {from = $0 ; fr = 1}
> /^Xref:.*ding2/ {xref = $0 ; xr = 1}
> /^Message-ID: / {msg = $0 ; mg = 1}
> /\.fmt/ {fm = $0 ; a++}
> /\.fmt/ && (fr+xr+mg)==3 && a==1{print FILENAME;
>                             print from; print xref; print msg}
> /\.fmt/ && (fr+xr+mg)==3 && a>0 {print fm}

> This checks that all three header lines are there and a fm
> line is found before printing the header.

> Then as long as each header set has been found, each fm line
> is printed.

Yes... nice!  That straightened things out quite a bit.  In fact I
think it leaves only the problem of guaranteeing the header REs aren't
being hit in the body.

As a test I cooked one of the input files and gave it a second
Message-ID in the body at the beginning of a line.  The script as it
stands prints the second one found.

So in an attempt to fix that I'm trying to doctor up the script at the
message id part to see what works and make a few changes for
legibility in the output:

/X-From-Line: / {a = b = fr = xr = mg =0}
/^From:.*Putnam/ {from =$0 ; fr = 1}
/^Xref:.*ding2/ {xref = $0 ; xr = 1}
/^Message-ID: / {b++ ;if (b==1)msg=$0 ;mg=1}
/\.fmt/ {fm = $0 ; a++}
/\.fmt/ && (fr+xr+mg)==3 && a==1{print "--\n"FILENAME;
print gsub(/^\/.*\//,"",FILENAME) FILENAME,from; print FILENAME,xref;
 print FILENAME, msg}
/\.fmt/ && (fr+xr+mg)==3 && a>0 {print FILENAME, fm}

Putting the `if' clause at the Message-ID part seems to work for the
problem mentioned.  Gsubbing `FILENAME' in all but the first print helps
the look too.  Do you see anything here that is likely to bite me?  If
not, I'll use the same technique on the other regexps.

There is a small mishap in the output of the script above
(picking a message with only one instance of /\.fmt/ for brevity):

--
/home/reader/tmp2/2806

2806 Xref: reader.ptw.com ding2:2806

2806 2) Using the example *.fmt file
--

Notice an extraneous `1' in the file name at `From'.  At first I
thought maybe since the variables `fr' and `from' share the same first
2 letters, it might be causing the `1' but changing the `fr' to `fo'
didn't stop it.

The first, third, fourth and fifth lines are reported correctly.

Pulling out the gsub:
 gsub(/^\/.*\//,"",FILENAME)

stops it, but I can't see why.
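A likely explanation, sketched below: gsub() edits its target in place and returns the number of substitutions it made, so `print gsub(...) FILENAME' prints that count (1) glued to the front of the shortened name.  The variable `path' and the placeholder text are made up for illustration, and the call is moved onto its own line here so the result does not depend on the order in which a given awk evaluates the pieces of a concatenation.

```shell
#!/bin/sh
# Sketch only: path stands in for FILENAME; "<from line>" is a placeholder.
out=$(awk 'BEGIN {
    path = "/home/reader/tmp2/2806"
    n = gsub(/^\/.*\//, "", path)   # strips the directory part, returns the count
    print n path, "<from line>"     # count glued to the name, as in the script
    print path, "<from line>"       # the name alone once gsub is out of the print
}')
echo "$out"
```

Calling gsub as a statement of its own, before the print, keeps the shortened name without printing the return value.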



Tue, 07 Jan 2003 03:00:00 GMT  
 