newscan 1.66 - a Perl Network News Article scanner (requires NNTP) 

Archive-Name: newscan
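
You unpack the shell archive below by running it with sh. Each member is written out by `sed "s/^X//"` from a quoted here-document. Its size is then checked with `wc -c` against the size recorded when the archive was made. Here is a minimal sketch of that round trip for one hypothetical member, `demo.txt`, assuming a POSIX sh with `mktemp`:

```sh
# Work in a scratch directory so nothing existing is clobbered.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# Unpack one member.  The packer put an X in front of every line and sed
# strips it.  Quoting 'END_OF_FILE' stops the shell expanding $ and `.
sed "s/^X//" >'demo.txt' <<'END_OF_FILE'
XHello, newscan
X#!/usr/local/bin/perl
END_OF_FILE

# Compare the unpacked size with the size recorded by the packer.
expected=37
actual=$(wc -c <'demo.txt' | tr -d ' ')
if test "$expected" -ne "$actual"; then
    echo "shar: demo.txt unpacked with wrong size!"
else
    echo "shar: demo.txt OK ($actual characters)"
fi
```

Because every packed line starts with X, a line inside a member that happens to read END_OF_FILE cannot end the here-document early. If a member is damaged in transit, the size check only prints a warning, and the file is still written.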

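The template that `newscan -e` writes (in the newscan script below) lists the configuration keywords. Below is a minimal filled-in configuration file, sketched in POSIX sh. It assumes a hypothetical NNTP server `news.example.com`, and the newsgroups and patterns are purely illustrative. The file goes to a scratch location and is selected through the NEWSCAN environment variable, so an existing `~/.newscanrc` is not overwritten:

```sh
# Scratch configuration file; newscan would otherwise default to ~/.newscanrc.
rcfile=$(mktemp) || exit 1

# Comment lines must start with # in column 1.  Host, groups and patterns
# below are placeholders, not recommendations.
cat >"$rcfile" <<'EOF'
# NNTP server and port (119 is the default if the port is omitted)
NNTP news.example.com 119
# mailbox-format file that receives the articles found
MBOX ~/newscanBox
# space-delimited list of newsgroups to scan
SELECT comp.lang.perl comp.lang.c++
# retrieve an article if it matches this pattern (i = ignore case)
WHERE /news\s*reader/i
# ...but not if it also matches this one
UNLESS /for sale/i
EOF

# newscan reads $NEWSCAN when no -c option is given.
NEWSCAN=$rcfile
export NEWSCAN
echo "configuration written to $rcfile"
```

You could then start the search in the background with `newscan &`, or with `newscan -c "$rcfile" &`. The revision log (revisions 1.28 and 1.29) suggests that WHERE, REQUIRE and UNLESS patterns apply to the groups named on the preceding SELECT line. Treat that as an assumption.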
-----------------------------CUT HERE-----------------------------------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.  If this archive is complete, you
# will see the following message at the end:
#               "End of shell archive."
# Contents:  README INSTALL newscan cdrom.cfg pex.cfg extest.cfg
#   distrib.txt

PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'README' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(2058 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
X               newscan - a Perl Network News scanner
X                         by John F. McGowan, Ph.D.

X
X***********************************************************************
X
XCOPYRIGHT NOTICE:
X
X       This note is Copyright (C) 1993, 1994 by John F. McGowan.
XPermission to reproduce and distribute is granted however.  The README
Xnote may be added to at the end after END OF ORIGINAL README.
X
X
XDescription:
X
X       newscan is an attempt to solve the information overload problem
Xin the Network News groups by scanning newsgroups for articles that contain
Xmatches to regular expressions.  newscan can also veto articles that contain
Xmatches to regular expressions.
X
X       newscan is written in Larry Wall's Perl (Practical Extraction and
XReport Language).  Perl is described in Programming Perl by Larry Wall and
XRandal L. Schwartz.  It is available for virtually all Unix systems and is
Xfreely redistributable under the GNU General Public License or the Artistic
XLicense.
X
X       newscan contains Perl comments, a short help message generated by
Xentering % newscan -h, and an embedded manpage following the convention
Xdescribed in Programming Perl.
X
XAUTHOR:
X-------
X
X       newscan's author is John McGowan who can be reached at either

Xbug reports to the author.
X
X
XSYSTEMS:
X---------
XShould work on Unix systems.
X
X
XDEPENDENCIES:
X--------------
XNeeds Perl to run, and access to an NNTP news server.
X
X
XPACKING LIST:
X---------------
XREADME                   - this README file
XINSTALL                  - Installation instructions for newscan
Xnewscan                  - Perl shell script
Xcdrom.cfg                - a sample configuration file
Xpex.cfg                  - a sample configuration file
Xextest.cfg               - a sample configuration file with wildcard * expansion
Xdistrib.txt              - how and where to post newscan to the Net
X
X
XSTANDARD DISCLAIMER:
X
X       newscan is distributed as is.  There is no warranty express or
Ximplied that it will work correctly, do what you want, or anything else.
XUse at your own risk.
X
X
X------------------------>END OF ORIGINAL README<-----------------------
END_OF_FILE
if test 2058 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'INSTALL' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'INSTALL'\"
else
echo shar: Extracting \"'INSTALL'\" \(1444 characters\)
sed "s/^X//" >'INSTALL' <<'END_OF_FILE'
X                Installation Instructions for NEWSCAN
X                   A Network News Article Scanner
X                     by John F. McGowan, Ph.D.
X
X----------------------------------------------------------------------------
X
X1.  You will need Larry Wall's Perl on your system.
X
X2.  By default, the first line of the newscan script:
X
X       #!/usr/local/bin/perl
X
X       assumes that perl is located in the /usr/local/bin directory.
X
X       This is true on the author's system, but may not be true on your
Xsystem.  You may need to edit the path to perl.  For example,
X
X       #!/usr/bin/perl
X
X       is a common path for perl.
X
X       To find perl on your system, use the Unix which command:
X
X       % which perl
X       /usr/local/bin/perl
X
X       The which command prints the path to perl.
X
X
X3.     Man Page  (Unix On Line Help)
X
X       The newscan Perl script doubles as a manpage.  The script is
X       written in such a way that the Perl source acts as an nroff
X       comment.  Likewise, perl ignores the nroff manpage embedded in
X       the script.  To install the manpage:
X
X       % cp newscan newscan.1
X       % mv newscan.1 /usr/man/man1    # for example
X
X
X4.     Machine or Unix Flavor Dependencies
X
X       newscan needs the sys/socket.ph Perl header file in the sys
X       subdirectory of the Perl library directory.  This file hides the
X       differences in socket calls between BSD and SVR4 flavors of Unix;
X       it is normally generated from the C header sys/socket.h by Perl's
X       h2ph utility.  If Perl was installed correctly at your site, this
X       is not a problem.  newscan will complain if it cannot find
X       sys/socket.ph; if it does, contact whoever maintains Perl at
X       your site.
X
X---------------------END OF FILE------------------------------------------
END_OF_FILE
if test 1444 -ne `wc -c <'INSTALL'`; then
    echo shar: \"'INSTALL'\" unpacked with wrong size!
fi
# end of 'INSTALL'
fi
if test -f 'newscan' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'newscan'\"
else
echo shar: Extracting \"'newscan'\" \(65210 characters\)
sed "s/^X//" >'newscan' <<'END_OF_FILE'
X#!/usr/local/bin/perl
X'di';
X'ig00';
X#       $Header: /u/ey/jfm/hunter/NEWSCAN/RCS/newscan,v 1.66 1994/11/21 23:40:33 jfm Exp jfm $
X#
X#      Name: newscan
X#      Date: $Date: 1994/11/21 23:40:33 $
X#      Version: $Revision: 1.66 $
X#      Author: John F. McGowan, Ph.D. ($Author: jfm $)

X#
X#      Description:
X#
X#              newscan is a utility to scan netnews for articles that
X#       contain matches to regular expressions.  
X#      newscan can also exclude articles that contain
X#       matches to selected regular expressions.  newscan is written
X#       in Perl (Practical Extraction and Report Language).  Perl is
X#      available for essentially all Unix systems, as well as some
X#       non-Unix systems.  A good source of information on Perl
X#       is "Programming Perl" by Larry Wall and Randal L. Schwartz.
X#      
X#      Articles that contain matches are stored in a file in mailbox format.
X#      This file may be read using mail readers such as mail, elm, etc.
X#
X#              Search is controlled by a resource file, by default .newscanrc
X#       in the user's home directory.  The default file may be overridden
X#      through the environment variable NEWSCAN, e.g. setenv NEWSCAN /me/.myrc,
X#      or with the -c option.
X#              newscan is intended to be run as a background job, e.g. newscan &
X#      since it takes a while to scan selected newsgroups for articles of
X#      interest.
X#
X#      Revision Record:
X#      $Log: newscan,v $
X# Revision 1.66  1994/11/21  23:40:33  jfm
X# 1. Add pico to editor list
X#
X# Revision 1.65  1994/11/17  00:55:55  jfm
X# 1. Add code to convert "From " to ">From " in body of message
X# so mail readers will handle messages correctly.
X#
X# Revision 1.64  1994/11/16  21:38:33  jfm
X# 1. Tweak to manpage
X# 2. # of articles found is printed in termination message.
X# 3. prints F when it finds an article with a match, to make the
X# in-progress messages more intuitive and less boring.  -s will
X# turn off the F (for found)
X#
X# Revision 1.63  1994/11/14  03:02:00  jfm
X# 1. Add discussion of Perl regular expressions and examples to embedded
X# newscan manpage (on-line help)
X#
X# Revision 1.62  1994/11/11  21:11:34  jfm
X# 1. More additions to handling failed connection to NNTP server
X#
X# Revision 1.61  1994/11/11  20:55:08  jfm
X# 1. newscan now dies with error message if NNTP server refused connection
X# (fails to return 200 or 201 status codes )
X#
X# Revision 1.60  1994/11/11  20:17:21  jfm
X# 1. Fix code to correctly use local machine as default NNTP Server.
X#
X# Revision 1.59  1994/11/09  06:51:13  jfm
X# 1.  Add code to handle error code 420 from NNTP NEXT command.  Treat this
X# as a non-fatal error.
X#
X# Revision 1.58  1994/11/03  22:19:44  jfm
X# 1. Add some more undefs in an effort to fix (move?) coredump problem on
X# AIX (does not seem to happen on SunOS)
X# 2. There is a known, apparently unsolved, coredump bug in perl 4.035 (maybe
X# 4.036 too).
X#
X# Revision 1.57  1994/11/03  00:25:04  jfm
X# 1. require 'sys/socket.ph' to layer out machine dependencies in
X# socket calls.  Replace hard-wiring $AF_INET and $SOCK_STREAM to
X# BSD values.
X#
X# Revision 1.56  1994/11/03  00:17:21  jfm
X# 1. Attempted fix of AIX 3.2 memory faults.
X# 2. newscan works on SunOS 4.1.3, but having problems on AIX
X#
X# Revision 1.55  1994/11/02  18:45:15  jfm
X# 1. Add to embedded manpage.
X#
X# Revision 1.54  1994/11/01  23:42:18  jfm
X# 1. Changed to handle ranges with asterisk newsgroups ok (I think)
X#
X# Revision 1.53  1994/11/01  23:08:16  jfm
X# 1. Added code to expand newsgroup names containing asterisk
X# to match valid newsgroups.
X# 2. Does not exclude ranges correctly.
X#
X# Revision 1.52  1994/11/01  01:41:31  jfm
X# 1. Fix bug where newscan aborts on an empty newsgroup (0 articles)
X#
X# Revision 1.51  1994/10/29  17:53:34  jfm
X# 1. Comment NNTPSERVER addition better.
X#
X# Revision 1.50  1994/10/29  17:50:06  jfm
X# 1. Add NNTPSERVER environment variable to specify nntp server
X# 2. Switch to Search Completed! exit message
X#
X# Revision 1.49  1994/10/28  17:12:51  jfm
X# 1. Allow hyphen in NNTP server address
X#
X# Revision 1.48  1994/10/27  20:22:47  jfm
X# 1. Fix so that -s option suppresses all in progress messages.
X#
X# Revision 1.47  1994/10/27  19:25:24  jfm
X# 1. Separate exit message from file output reporting
X# 2. Try to make in progress messages a little clearer
X#
X# Revision 1.46  1994/10/27  19:17:56  jfm
X# 1. Now reports name of file that statistics are saved in if statistics
X# are collected in termination message.
X# 2. Modified file spec regular expression in COLLECT STATISTICS line parsing
X# to support statistics file names with embedded hyphens and + characters.
X#
X# Revision 1.45  1994/10/26  23:19:30  jfm
X# 1. Add author e-mail address to header
X# 2. Add name of file with found articles to exit message
X#
X# Revision 1.44  1994/10/26  23:09:42  jfm
X# 1.  Added in progress messages to make program more user-friendly when
X# run as foreground job at command line.  Prints G when starting to search
X# a new newsgroup and period . for each article searched.  Prints a quit
X# message as well indicating that search is finished.
X#
X# Revision 1.43  1994/10/26  21:32:37  jfm
X# 1. Add initial printout message identifying program
X#
X# Revision 1.42  1994/10/26  21:22:05  jfm
X# Add revision number to help message.
X#
X# Revision 1.41  1993/10/26  04:33:43  jfm
X# 1.  Added \+ to regular expression for newsgroup in code to parse
X# excluded range line in configuration file.  Again to fix bug found
X# by Wolfgang Glunz.
X#
X# Revision 1.40  1993/10/21  22:01:05  jfm
X# 1.  Fixed bug in updating excluded range of articles.  I used
X# s/$group.../$group:$range{$group}/ but Perl interprets the first
X# block that is replaced as a regular expression so the group
X# comp.lang.c++ is a regular expression and ++ is a recursive use of
X# + which is not a valid regular expression.  So now I backslash
X# non-word characters in $group to quote . and +.  The bug was
X# reported by Wolfgang Glunz who tried to use newscan on
X# comp.lang.c++.
X#
X# Revision 1.39  1993/10/20  21:08:01  jfm
X# 1.  Use last; in GetDate subroutine to quit looping over lines when
X# it finds the first Date: line in the header.  This is an attempt to
X# solve bug reported by J.M. (Mike) Lake.
X#
X# Revision 1.38  1993/10/20  20:39:15  jfm
X# 1.  Add citation to "Programming Perl" by Larry Wall and Randal L. Schwartz.
X#
X# Revision 1.37  1993/10/20  20:32:31  jfm
X# 1.  Added \+ to allowed characters in newsgroup regular expression, for
X# newsgroups such as comp.lang.c++.  this bug was found by Wolfgang Glunz.
X# 2.  Added \+ \- to allowed characters for the mailbox format folder file.
X# 3.  newscan now checks if the mailbox format file specified in
X# newscan -r <folder-file-spec> exists before invoking the mail reader.
X#
X# Revision 1.36  1993/10/20  02:04:29  jfm
X# 1.  Modify code to handle newsgroups with hyphen.  Perl \w is
X# [0-9a-z_A-Z] ... \w --> [\w\-] to solve.
X#
X# Revision 1.35  1993/10/19  04:47:20  jfm
X# Delete SEE ALSO from manpage for submission to alt.sources
X#
X# Revision 1.34  1993/09/05  21:19:55  jfm
X# 1. Slightly improve the embedded manpage to include all options in
X# the synopsis and a section on options.  Needs work on formatting.
X#
X# Revision 1.33  1993/07/20  00:53:53  jfm
X# 1.  Improve template configuration file.
X#
X# Revision 1.32  1993/07/19  23:54:34  jfm
X# 1.  small improvements in handling of leading ~ in file-specifications.  Now only leading ~ is translated to home directory.  Other ~ are left alone.
X#
X# Revision 1.31  1993/07/18  21:34:07  jfm
X# 1.  Removed -d debug option
X# 2.  Support for -e edit configuration file flag
X# 3.  Support for -r <file-specification> read mailbox format file using
X#     a standard mail reader.  Will try to use elm first.
X# 4.  Has support for different search criteria for different newsgroups.
X#
X# Revision 1.30  1993/07/17  20:46:02  jfm
X# 1. Add support for newscan -e to edit configuration file before execute search.
X# 2.  Add support for newscan -r to invoke mail reader to read the file containing the found articles (probably not done yet).
X#
X# Revision 1.29  1993/07/17  19:14:07  jfm
X# 1.  Revised comments.  Now support different search criterion for different groups.
X# 2.  Now supports REQUIRE /regexp/i which finds only articles containing
X# a match to /regexp/i.
X#
X# Revision 1.28  1993/07/16  19:49:35  jfm
X# 1.  Attempt to add support for different search criterion for different groups or collections of groups.  Appears to only search group(s?) specified by last SELECT line however.  Needs bug fix.
X#
X# Revision 1.27  1993/07/16  00:38:24  jfm
X# Switch from Greenwich Meridian Time to Local Time in statistics.
X#
X# Revision 1.26  1993/07/15  04:53:39  jfm
X# 1.  Add better formatting for the statistics output.  Make more readable.
X#
X# Revision 1.25  1993/07/14  00:32:47  jfm
X# 1.  added COLLECT STATISTICS ON (VETO|UNLESS) PATTERNS
X# 2.  COLLECT STATISTICS ON (SEARCH|WHERE) PATTERNS now
X#
X# Revision 1.24  1993/07/14  00:15:17  jfm
X# 1.  Add option to COLLECT STATISTICS ON SEARCH PATTERNS
X#
X# Revision 1.23  1993/07/13  02:40:06  jfm
X# 1.  Spaces between two regular expressions in pairs statistics.
X#
X# Revision 1.22  1993/07/13  02:25:05  jfm
X# 1.  Added correlation coefficients for pairs of regular expression.
X#
X# Revision 1.21  1993/07/12  23:05:47  jfm
X# 1.  Now COLLECT STATISTICS ON PAIRS to count pairs of regular expressions
X#
X# Revision 1.20  1993/07/12  22:29:07  jfm
X# 1.  Now appends to statistics file rather than overwrites.
X# 2.  Does date properly in statistics.
X#
X# Revision 1.19  1993/07/12  20:23:44  jfm
X# 1.  Added COLLECT STATISTICS IN <file-specification>
X# 2.  Added COLLECT STATISTICS ON <regular-expression>
X#
X# Revision 1.18  1993/07/06  15:32:08  jfm
X# Appears to work ok
X#
X# Revision 1.17  1993/07/06  04:21:31  jfm
X# 1.  Added routines to sort and merge elements in range of excluded articles information.
X#
X# Revision 1.16  1993/07/05  00:34:23  jfm
X# 1.  ~ in file specification for home directory (for MBOX)
X# 2.  [119] port number optional
X# 3.  period . allowed in file specification
X# 4.  automatically generates group:excluded range line if not supplied by user and appends to end of file.
X# 5.  Recognizes additional date formats
X# 6.  .newscanrc is in user's home directory as specified by $HOME environment variable.
X# 7.  Improve manpage.
X#
X# Revision 1.15  1993/07/04  00:29:27  jfm
X# 1.  Add support for dotted decimal internet addresses.
X#
X# Revision 1.14  1993/07/03  23:57:26  jfm
X# 1.  Fixed bug so that nntp port is optional.
X# 2.  Added support for dates of format 1 Jul 1993 05:54 CST and 1 Jul 93 05:54 CST.
X# 3.  Changed sample resource file in embedded manpage to NNTP nntphost, omitted optional port number since 119 is the default port specified in NNTP rfc.
X#
X# Revision 1.13  1993/07/03  00:25:34  jfm
X# 1. Added -h command line argument for brief help.  Use embedded manpage for detailed help.
X# 2.  Need to fix multiple copy of same article bug.
X#
X# Revision 1.12  1993/07/03  00:04:23  jfm
X# 1.  Added command line argument to specify a configuration file.
X# 2.  Some changes to embedded manpage.
X#
X# Revision 1.11  1993/07/02  22:21:39  jfm
X# 1.  Added more to manpage.
X#
X# Revision 1.10  1993/07/02  22:09:11  jfm
X# Add some more to the built-in manpage.
X#
X# Revision 1.9  1993/07/02  21:58:57  jfm
X# *** empty log message ***
X#
X# Revision 1.8  1993/07/02  21:56:21  jfm
X# 1.  Added default mailbox file newscanBox
X# 2.  Added sample configuration file.
X#
X# Revision 1.7  1993/07/02  21:48:53  jfm
X# 1. removed debug -d
X# 2. Added more comments
X# 3. Removed socket, bind, and connect ok messages.
X# 4. Expanded built in manpage
X#
X# Revision 1.6  1993/06/30  19:53:23  jfm
X# 1.  Add RCS $Header: /u/ey/jfm/hunter/NEWSCAN/RCS/newscan,v 1.66 1994/11/21 23:40:33 jfm Exp jfm $ to comments
X#
X# Revision 1.5  1993/06/26  21:05:45  jfm
X# 1. Fixed bug in GetDate which did not handle the date format Sat, 26 Jun 1993 17:38:51 GMT correctly.
X#
X# Revision 1.4  1993/06/12  22:48:54  jfm
X# 1. Change to using local($_) in subroutines instead of $save = $_.  This should be more efficient and easier to maintain.
X#
X# Revision 1.3  1993/06/10  03:12:39  jfm
X# 1.  Fixed bug to allow search patterns of the form /.../
X#
X# Revision 1.2  1993/06/10  02:41:49  jfm
X# 1. Changed syntax to use UNLESS to veto articles and WHERE to specify matches to find.
X#
X# Revision 1.1  1993/06/09  00:50:06  jfm
X# Initial revision
X#
X# Revision 1.17  1993/06/07  00:00:39  jfm
X# 1. Treats blank lines in configuration file as comment lines
X# 2. Support for supergroups of groups to do search on using lines of form SUPERGROUP: blank delimited list of newsgroups to search.
X#
X# Revision 1.16  1993/06/06  20:02:53  jfm
X# 1. Changed data structure for search patterns to associative array.  Key is the group name, value is packed list of search patterns using // as the field separator.
X# 2. minor cleanup of code
X#
X# Revision 1.15  1993/06/05  23:48:40  jfm
X# 1. Improved update range subroutine
X# 2. Added subroutine GetXRef to extract cross references from article header.  The Xref: line specifies the article number in other groups if the article has been posted to multiple groups.  scanner should now retrieve only one copy of an article even if posted to multiple groups (not fully tested yet!).
X#
X# Revision 1.14  1993/06/03  21:57:33  jfm
X# 1. Add copyright notice to manpage
X#
X# Revision 1.13  1993/06/03  21:40:54  jfm
X# 1.  Add more comments
X# 2.  Add veto subroutine to veto articles.  More efficient veto logic.
X# 3.  Add built in manpage
X#
X# Revision 1.12  1993/06/02  22:04:54  jfm
X# 1.  Solved date format problem.  Was with dates of form 1 Jun etc.  Old format expected two digit date but 1-9 are allowed in dates.
X# 2.  Now updates range by overwriting the configuration file.
X#
X# Revision 1.11  1993/06/02  20:57:28  jfm
X# 1. Added subroutine to return the day of the week (Mon, Tue, ...) for a date.  Use this subroutine to generate the mailbox header line.
X#
X# Revision 1.10  1993/06/01  00:56:53  jfm
X# 1. Subroutine UpDateRange added to update excluded range for a group.  Still need to use this to update configuration file.
X# 2. Configuration file now specified by NEWSCAN environment variable.  Defaults to .newscanrc if NEWSCAN environment variable not defined.
X#
X# Revision 1.9  1993/05/31  19:24:45  jfm
X# 1. Now supports more date formats for conversion to mailbox format
X# 2. Loops over groups correctly now (used perl keys function)
X# 3. Handles unknown date formats better so still get article
X#
X# Revision 1.8  1993/05/29  21:32:35  jfm
X# 1. Now generates output in mailbox format that can be used with elm mail reader.
X#
X# Revision 1.7  1993/05/29  01:14:14  jfm
X# 1. Now searches body of text of articles according to search patterns stored in .scannerrc file
X# 2. Stores articles to file scanned.
X# 3. scanned is NOT in mailbox format.  Appear to need first line of each article to be of form From (path) (date).  See valid mailbox files.
X# 4. System seems to prefer linefeed ^J as end of line rather than the ^M ^J
X# CR-LF end of line used by NNTP protocol (but I am not certain of this)
X#
X# Revision 1.6  1993/05/28  20:08:37  jfm
X# 1. Now loops over groups and loops over articles in group
X# 2. Some bugs (not a debugged version)
X# 3. No pattern searching yet - add pattern searching next
X#
X# Revision 1.5  1993/05/25  19:46:13  jfm
X# 1.  Added very simple query and response communication with the NNTP server.  Appears to work so far.
X#
X# Revision 1.4  1993/05/25  18:56:07  jfm
X# 1. Added support for a line in configuration file giving internet address and port number of the NNTP server.
X#
X# Revision 1.3  1993/05/25  18:43:41  jfm
X# 1. Added debug flag to run in perl debugger
X# 2. Removed most debug print statements
X# 3. Believe it now parses the config file
X# 4. Need to add network connection to NNTP server
X#
X# Revision 1.2  1993/05/22  21:40:03  jfm
X# fix bug - add ; at end of second line of code
X#
X# Revision 1.1  1993/05/22  21:24:25  jfm
X# Initial revision
X#
X# arrays for weekday and date conversion
X
Xrequire 'sys/socket.ph';       # handle machine dependencies for sockets
Xrequire "getopts.pl";
X
X&Getopts('c:r:hes');  # -c <file> selects the configuration file
X                      # -e edits the configuration file before the search
X                      # -r <file> invokes a mail reader on a folder file
X                      # -h prints help
X                      # -s is silent mode
X
X
Xif( ! $opt_s)                  # if not silent give status message
X{
X    print "Running newscan newsreader version $Revision: 1.66 $ by John F. McGowan, Ph.D.. \n"; # initial message
X}


X
Xif($opt_r)
X{
X       if($ENV{'READER'})
X       {
X               $myReader = $ENV{'READER'};
X               $foundReader = 1;  # indicate that reader has been found
X       }
X       else
X       {
X# select a reasonable default mail reader - prefers elm if it exists
X               $foundReader = 0;

X               {
X                       if(!$foundReader)
X                       {

X                       {
X                               if(-e "$path/$command" && -x "$path/$command")
X                               {
X                                       $myReader = $path . "/" . $command;
X                                       $foundReader = 1;
X                               }
X                       } # close loop over paths
X                       } # close if not foundReader
X               }
X#              $myReader = 'elm';  # use elm mail reader for now
X       }
X
X       if(!$foundReader)
X       {
X               print "newscan problem!  Could not find a mail reader from

X               die;
X       }
X
X#      print "Checking if file $opt_r exists \n";
X
X       if(-e $opt_r)           # file exists
X       {
X# some newsgroups may contain binary executables so don't do this for now
X#          if(-T $opt_r)       # file is a text file
X#          {
X#                print "Please wait!  newscan using $myReader mail reader to read $opt_r file containing results of a search. \n";   # let the user know this may take a while
X               exec("$myReader -f $opt_r");
X#          }
X#          else
X#          {
X#              die "newscan error: Folder file $opt_r is not a text file!";
X#          }
X       }
X       else                    # let user know folder doesn't exist
X       {
X           die "newscan error: Folder of articles $opt_r does not exist!  Cannot read!";
X       }
X
X}
X
Xif($opt_h)
X{
X       print <<"EndOfHelp";
X
Xnewscan -- a network news scanner  ( Version $Revision: 1.66 $ )
X
X       newscan searches selected Internet network news groups
X       for articles that match perl regular expressions.  newscan
X       implements a boolean query key-pattern full-text information
X       retrieval system.  In plain English, this means newscan scans
X       the complete text of each article for matches to various
X       combinations of perl regular expressions.
X
X       Command Line Options:
X
X               -c <file-specification>  
X
X                       This flag selects the configuration file that
X               tells newscan which newsgroups to search and what
X               patterns to search for.  If this option flag is not
X               specified, newscan uses the file named by the environment
X               variable NEWSCAN or, if NEWSCAN is not set, the default
X               configuration file .newscanrc in the user's home directory.
X
X               -e
X
X                       This flag indicates that the configuration file
X               should be edited before doing the search.  newscan will
X               pop the user into the editor specified by the EDITOR
X               environment variable.  If no configuration file exists,
X               newscan provides a template configuration file that the
X               user should edit.
X
X               -r <file-specification>
X
X                       This flag invokes a mail reader on the mailbox
X               format file specified by the file-specification.  This should
X               be the file containing the articles found by newscan in
X               a search.  The user should set the environment variable
X               READER to his or her favorite mail reader.  Otherwise,
X               newscan will select a mail reader, trying to use elm
X               first if it exists.
X
X                -s
X
X                        Silent mode.  Suppress progress messages issued by
X                newscan during search.
X
X               -h
X
X                       Output this help message.
X
XEndOfHelp
X       exit;
X}

X
X%DaysInMonth = ( 'Jan', '31',
X                'Feb', '28',  # 29 in leap years such as 1992
X                'Mar', '31',
X                'Apr', '30',
X                'May', '31',
X                'Jun', '30',
X                'Jul', '31',
X                'Aug', '31',
X                'Sep', '30',
X                'Oct', '31',
X                'Nov', '30',
X                'Dec', '31',
X);

X
X%NewsArticle = ();
X
X
X# read the configuration file (default to .newscanrc)
Xif($opt_c)
X{
X# configuration file is specified at command line.
X       $configFile = $opt_c;
X       $configFile =~ s/^\~/$ENV{'HOME'}/;
X}
Xelse
X{
X       if($ENV{'NEWSCAN'})
X       {
X               $configFile = $ENV{'NEWSCAN'};
X       }
X       else # default configuration file in user's home directory
X       {
X               $configFile = $ENV{'HOME'} . '/.newscanrc';
X       }
X}
X# edit the configuration file before start if -e

X
Xif($ENV{'EDITOR'})
X{
X       $myeditor = $ENV{'EDITOR'};
X}
Xelse
X{
X       $foundEditor = 0;

X       {

X               {
X                       if(-e "$path/$command" && -x "$path/$command")
X                       {
X                               $myeditor = $path . "/" . $command;
X                               $foundEditor = 1;
X                       }
X               }
X               last if $foundEditor;
X       }
X       die "Could not find an editor!\n" unless $foundEditor;
X#      $myeditor = 'emacs';  # I am an emacs snob
X}
X
Xif($opt_e)
X{
X       if(-e $configFile)
X       {
X# do nothing if configuration file already exists
X       }
X       else # file does not exist - provide a form
X       {
X               open(CONFIG,">$configFile") || die "newscan error: cannot create $configFile: $!\n";
X               print CONFIG <<"EndOfConfig";
X# Template Configuration file for newscan Internet news scanner
X#
X# Note:  Lines beginning with # are comments.  Remove leading # and
X#        edit line if you wish to activate line.
X#
X# Line following: specify the Internet address of the NNTP
X# server for the system.
X# The NNTP server is the machine where the Internet news is stored.
X# OR this line may be omitted if you use the NNTPSERVER environment variable
X# This line takes precedence over NNTPSERVER variable.
XNNTP <nntphost>
X# specify the mailbox format file for the found articles
XMBOX <file-for-found-articles>
X# specify the Internet newsgroups to be searched (space delimited list)
XSELECT <list of newsgroups>
X# retrieve a news article if it contains a match to regular expression
X#WHERE /perl regular expression/[i]
X# do not retrieve a news article if it contains a match to UNLESS regular expression
X#UNLESS /perl regular expression/[i]
X# a news article must contain a match to a REQUIRE regular expression to
X#  be retrieved.
X#REQUIRE /perl regular expression/[i]
X# collect statistics on search in a file -- to aid in refining search criteria
X#COLLECT STATISTICS IN <file-specification>
X# collect statistics on frequency of articles that match a pattern
X#COLLECT STATISTICS ON /perl regular expression/i
X# collect statistics on search patterns specified by WHERE and REQUIRE lines
X#COLLECT STATISTICS ON SEARCH PATTERNS
X# collect statistics on unless patterns specified by UNLESS
X#COLLECT STATISTICS ON UNLESS PATTERNS
X# collect statistics on frequency of articles that match pairs of patterns
X#COLLECT STATISTICS ON PAIRS
X#<newsgroup>:<excluded range>
X#
XEndOfConfig
X               close(CONFIG);
X       }
X       system("$myeditor $configFile");  # edit the configuration file
X       print "Do you wish to do this search (y/n):";
X       $ans = <STDIN>;
X       exit 0 if $ans =~ /[Nn]/;
X}
X
X

Xclose(CONFIG);
X#
X$newgroup = 'NULL';

X
X# parse the configuration file
X$mbox = "newscanBox";  # default mailbox file for found articles
X$port = 119;           # default to 119 as NNTP server port
Xfor $i (0 .. $#config)
X{
X       $_ = $config[$i];
X       chop;  # removing the trailing newline ^J
X       if(/^\s*NNTP\s+([\w\.\-\+]+)(\s+\d+)?\s*$/)
X       {
X# line specifying internet address and port number of NNTP server
X               $label = NNTP; $them = $1;  
X               if($2)
X               {
X                       $two = $2;
X# note that the parentheses around $port are necessary: they force list
X# context, so the match returns the captured port number instead of 1
X                       ($port) = $two =~ /\s+(\d+)\b/;
X               }
X       }
X       elsif(/^\s*MBOX\s+([\w\/\.\~\-\+]*)\s*$/) # name of file to store found articles
X       {
X               $mbox = $1;
X               $mbox =~ s/^\~/$ENV{'HOME'}/;
X       }
X       elsif(/^#.*$/)  # a comment line in configuration file begins with #
X       {
X       }
X       elsif(/^\s*SELECT(\s+([\w\-\+\*]+\.)+[\w\-\+\*]+)+\s*$/)
X       {
X# in Perl \w is [0-9a-z_A-Z] ... does not include the hyphen
X# expect space delimited list of newsgroups
X               /^\s*SELECT\s*(.*)$/;

X# create entry in range associative array for each group selected

X               {
X                       $range{$group} = '' unless $range{$group}; # blank range
X               }
X       }
X       elsif(/^\s*([\w\-\+]+\.)+[\w\-\+]+:.*$/)
X       {
X# line specifying the range of newsgroup articles excluded from search
X# this line is optional.  newscan will create this line after search
X# completed if it does not exist OR update the range if the line does exist
X#
X               ($newgroup,$newrange) = split(/:/,$_,2);
X               $range{$newgroup} = $newrange;
X
X       }
X       elsif(/^\s*WHERE\s*(\/.*[^\\]\/\w?)\s*$/)
X       {
X# line specifying the search pattern for a newsgroup

X               {
X                       if($pattern{$Group})
X                       {
X#                   \034 (octal) is the ASCII FS (file separator) character, ^\ ; it separates multiple patterns
X                               $pattern{$Group} = join("\034",$pattern{$Group},$1);
X                       }
X                       else
X                       {
X                               $pattern{$Group} = $1;
X                       }
X               }

X       }
X       elsif(/^\s*REQUIRE\s+(\/.*[^\\]\/\w?)\s*$/)
X       {

X               {
X                       if($required{$Group})
X                       {
X                               $required{$Group} = join("\034",$required{$Group},$1);
X                       }
X                       else
X                       {
X                               $required{$Group} = $1;
X                       }
X               }

X       }
X       elsif(/^\s*UNLESS\s+(\/.*[^\\]\/\w?)\s*$/) # veto on this pattern
X       {

X               {
X                       if($veto{$Group})
X                       {
X                               $veto{$Group} = join("\034",$veto{$Group},$1);
X                       }
X                       else
X                       {
X                               $veto{$Group} = $1;
X                       }
X               }

X       }
X       elsif(/^\s*COLLECT\s+STATISTICS\s+IN\s+([\w\/\.\~\-\+]+)\s*$/)
X       {
X               $doCollect = 1;
X               $statFile =  $1;
X               $statFile =~ s/^\~/$ENV{'HOME'}/;
X       }
X       elsif(/^\s*COLLECT\s+STATISTICS\s+ON\s+(\/.*[^\\]\/\w?)\s*$/)
X       {
X               $doCollect = 1;
X               $statistics{$1} = 0;  # initialize pattern histogram
X                                       # counts number of articles
X                                       # with a match
X       }
X       elsif(/^\s*COLLECT\s+STATISTICS\s+ON\s+PAIRS\s*$/)
X       {
X               $doCollect = 1;
X               $doPairs = 1;
X               %pairs = ();
X       }
X       elsif(/^\s*COLLECT\s+STATISTICS\s+ON\s+(SEARCH|WHERE)\s+PATTERNS\s*$/)
X       {
X               $doCollect = 1;
X               $doSearchPatterns = 1;
X       }
X       elsif(/^\s*COLLECT\s+STATISTICS\s+ON\s+(VETO|UNLESS)\s+PATTERNS\s*$/)
X       {
X               $doCollect = 1;
X               $doVetoPatterns = 1;
X       }
X       elsif(/^\s*$/)  # skip blank line
X       {
X       }
X       else
X       {
X# line did not match any allowed pattern
X               die "Abort! Syntax error in configuration file!\n Line: $i $_ \n";
X       }
X}  # close loop over lines in configuration file
X
X&FixRange(*range);
X
X
Xif($doCollect)
X{
X       $statistics{'ALL'} = 0;  # count of number of articles scanned
X       $statistics{'FOUND'} = 0; # count of number of articles found
X}
X
X# if requested, also collect statistics on the search (WHERE) patterns
X#
Xif($doSearchPatterns)
X{
X# append search patterns to statistics

X{
X       $statistics{$pattern} = 0;
X}
X}
X
Xif($doVetoPatterns)
X{
X# append veto (UNLESS) patterns to statistics

X{
X       $statistics{$pattern} = 0;
X}
X}
X
X# open mailbox file to store found newsarticles
Xopen(MBOX,">>$mbox") || die "newscan: Abort! Cannot open mailbox file $mbox: $!\n";
X#
X# connect to the NNTP server
X$port = 119 unless $port;
X$them = $ENV{'NNTPSERVER'} unless $them; # use NNTPSERVER environment variable
X
X
X$AF_INET = &AF_INET;               # &AF_INET defined in sys/socket.ph
X$SOCK_STREAM = &SOCK_STREAM;               # = 2 for Irix 5 ( 1 for Irix 4 )
X
X$sockaddr = 'S n a4 x8';
X
Xchop($hostname = `hostname`);  # get name of local host
X$them = $hostname unless $them; # try local machine as NNTP Server
X
X# translate protocol name to associated number
X($name,$aliases,$proto) = getprotobyname('tcp');
X# translates service (port) name to corresponding number
X($name,$aliases,$port) = getservbyname($port,'tcp')
X       unless $port =~ /^\d+$/;
X# look up the network address of the local host
X($name,$aliases,$type,$len,$thisaddr) =
X       gethostbyname($hostname);
X# look up the network address of the NNTP server (dotted quad or host name)
Xif($them =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
X{
X       $thataddr = pack('C4',$1,$2,$3,$4);
X}
Xelsif($them =~ /(\w+)(\.\w+)*/)
X{
X       ($name,$aliases,$type,$len,$thataddr) = gethostbyname($them);
X}
Xelse
X{
X       die "Fatal error: NNTP host not specified in proper format!\n";
X}
X$this = pack($sockaddr, $AF_INET, 0, $thisaddr);
X$that = pack($sockaddr, $AF_INET, $port, $thataddr);
X
X# Make the socket a filehandle
X
Xif(socket(S, $AF_INET, $SOCK_STREAM, $proto))
X{
X#      print "socket ok\n";
X}
Xelse
X{
X       die "Fatal Error: socket failed: $!\n";
X}
X
X# give the socket an address
Xif(bind(S, $this))
X{
X#      print "bind ok\n";
X}
Xelse
X{
X       die "Fatal Error: bind failed: $!\n";
X}
X
X# Call up the server
X
Xif(connect(S,$that))
X{
X#      print "Connect to $them ok\n";
X}
Xelse
X{
X       die "Fatal Error: cannot connect to $them TCP/IP port $port: $!\n";
X}
X
X#  Make the socket unbuffered (flush after every print) so each command is sent at once
Xselect(S); $| = 1; select(STDOUT);
X$_ = <S>;  # read confirmation of connection message from NNTP server
X
X($stat, $rest) = split(/ /,$_,2);      # split connection message
X
Xif( $stat == 200 || $stat == 201 )
X{
X    # on initial connection, NNTP server will return
X    # 200 server ready - posting allowed
X    # 201 server ready - no posting allowed
X    # otherwise there has been a problem
X}
Xelse
X{
X    die "newscan: Abort! NNTP Server $them refused connection with message: $stat $rest \n";
X}
X
X
X# expand group specifications with asterisk to allowed groups
X
X$k = 0;

X{
X    if($Group =~ /[^\*]*\*.*/)
X    {
X       $expandGroup{$Group} = $k; # offset into selectedGroups
X    }
X    $k++;
X}
X
Xif(keys %expandGroup)
X{
X    print S "LIST\n";                # retrieve list of valid groups from NNTP Server
X    $flush_save = $|;
X    $| = 1; print "L"; $| = $flush_save;


X
X    $_ = <S>;                    # read first line back
X    while(! /^\.[^\.].*$/)
X    {
X       $flush_save = $|;
X       $| = 1;
X       print ".";
X       $| = $flush_save;
X       chop; chop;             # remove trailing \r\n
X       $line = $_ . "\n";    # add trailing newline

X       $_ = <S>;         # read next line
X    }
X}
X

X{
X    ($vGroup, $remainder) = split(/ /,$vGroup);
X#    $flush_save = $|;
X#    $| = 1; print "."; $| = $flush_save;
X}
X
Xforeach $Group (keys %expandGroup)
X{


X    ($GroupMatch = $Group) =~ s/\*/.*/g; # replace asterisk with .*
X    $GroupMatch =~ s/\+/\\\+/g;
X    $GroupMatch =~ s/\.([^\*])/\\\.$1/g;
X

X    {
X       if ( $vGroup =~ /^$GroupMatch$/ ) # valid group name must match the whole pattern
X       {
X           push(@subList, $vGroup);
X       }
X    }

X    &RangeSplice(*range, $Group, *subList); # won't work????
X    &AssocSplice(*pattern, $Group, *subList);
X    &AssocSplice(*required, $Group, *subList);
X    &AssocSplice(*veto, $Group, *subList);
X}                              # end loop over groups to expand
X
X
X
X
X# loop over groups that are keys of range array
X# this allows different searches for different groups
X# %range associative array includes other groups such as cross posting groups
X
Xforeach $group (sort keys %range)
X{
X# send group command to NNTP server
X       print S "GROUP $group \n";
X       $_ = <S>; # read status reply from NNTP server
X#        if ($!)
X#          {
X#              print "Socket Read Returned Error!\n";
X#          }
X       ($status,$gn,$gfirst,$glast,$gname) = split(/ /);
X       if($status == 411)
X       {
X               warn "Warning! Group $group does not exist! Skipping to next group!\n";
X       }
X       elsif($status == 211) # loop over articles in the group
X       {                      
X           if( $gn > 0 )    # if group is not empty
X           {
X
X           if( ! $opt_s )
X           {
X               $flush_save = $|; # save flush control
X               $| = 1;         # flush on every print or write
X               print "G( $group )";
X               $| = $flush_save; # return to standard method
X           } # indicate doing a newsgroup
X               $i = $gfirst;
X               $prevArticle = $gfirst;
X               $do = 1;
X               while($do)
X               {
X# check that the article is not in the excluded range (previously scanned articles)
X                       if(!&InRange($range{$group},$i))
X                       {
X# check if article exists
X                               print S "STAT $i\n";
X                               $_ = <S>;  # read reply
X                               ($status,$article,$id,$rest) = split(/ /);
X                               if($status == 223)
X                               {
X# article exists; request its full text with the ARTICLE command
X                                   if ( ! $opt_s )
X                                   {
X                                       $flush_save = $|;
X                                       $| = 1; # flush on every print or write
X                                       print "\."; # period for each article scanned
X                                       $| = $flush_save;
X                                            
X                                   }
X                                       print S "article\n";
X                                       $_ = <S>;
X                                       ($status,$article,$id,$rest) = split(/ /);
X                                       if($status == 220)
X                                       {


X                                               $_ = <S>; # read first line of text
X
X                                               while(!/^\.[^\.].*$/) # loop until the lone period line (.^M^J) that ends text in NNTP
X                                               {
X                                                       chop; chop; # remove trailing CR LF (^M ^J)
X                                                       s/^\.\././; # undo NNTP dot-stuffing: a leading .. stands for a single .
X                                                       $line = $_ . "\n"; # add line feed LF ^J to end of line

X                                                       $_ = <S>; # read another line
X                                               }
X
X                                              
X                                               &Collect(*text, *statistics, *pairs, *doPairs) if %statistics;
X
X                                               $match = 0; # start with no match
X                                               if(!&Veto(*text,*veto,*group))
X                                               {
X# look for match
X# deal with required patterns first
X                                                       $Required = 1; # start by assuming it matches all required patterns
X                                                       foreach $search (split("\034",$required{$group}))
X                                                       {
X                                                               $blatz = 0;

X                                                               if(!$blatz)
X                                                               {
X                                                                       $Required = 0; # does not match pattern $search which is required.
X                                                               }
X                                                              
X                                                       }
X
X                                                       if($Required)
X                                                       {
X                                                       foreach $search (split("\034",$pattern{$group}))
X                                                       {
X                                                               $blatz = 0;

X                                                               if($blatz)
X                                                               {
X                                                                       $match = 1;
X                                                               }
X                                                       } # close loop over search patterns
X                                                       } # close if Required
X                                               }
X# clear the NewsArticle array
X                                               %NewsArticle = ();
X# find any cross references
X                                               if(!&GetXRef(*text,*NewsArticle))
X                                               {
X                                                       $NewsArticle{$group} = $article;
X                                               }
X# NewsArticle is an associative array mapping each group to the article's
X# number in that group (an article may be cross-posted to multiple groups)
X                                       # if match store article
X                                               if($match)
X                                               {
X                                                   if(! $opt_s) # not quiet
X                                                   {
X                                                       $flush_save = $|; # save i/o buffering state
X                                                       $| = 1; # immediate output
X                                                       print 'F'; # indicate found an article
X                                                       $| = $flush_save;
X                                                   }
X                                                   $statistics{'FOUND'}++;
X                                                   &ToMailBox(*text);

X                                               }

X                                               foreach $newsgroup (keys %range)
X                                               {
X                                                       $ArticleInGroup = $NewsArticle{$newsgroup};

X                                              
X                                                       &UpDateRange(*theRange,*ArticleInGroup);

X                                               }
X
X                                       }
X                                       else
X                                       {
X                                               warn "Warning! ARTICLE command returned unexpected status response: $status \n";
X                                       }
X                               }
X                               elsif($status == 423)
X                               {
X# 423 no such article number in this group
X                                       warn "Warning! Article $i in Group $group does not exist!\n";
X                               }
X                               elsif($status == 430)
X                               {
X# 430 no such article found
X                                       warn "Warning! Article $i in Group $group not found!\n";
X                               }
X                               else
X                               {
X                                       die "Aborting! STAT $i in Group $group returned unexpected status response: $status.\n";
X                               } # end if for result of STAT command
X
X                       }  # end if !&InRange  
X                       &NNTPNext;  # go to next article in group
X               } # end while($do)
X       }                       # end if $gn > 0 (not an empty group )
X       }
X       else                    # doesn't recognize NNTP response to GROUP
X       {
X               die "Abort! NNTP GROUP $group command returned unexpected response: $status. \n";
X       }
X} # close loop over groups to search
Xprint S "quit\n"; # tell the NNTP server we are done
Xclose(S);         # close the connection to the server
Xclose MBOX;       # close the file of found news articles
X
X# update the configuration file
X
X&FixRange(*range);  # clean up the range
X
Xforeach $group (keys %range)
X{
X       $notPresent{$group} = 'T';
X}
X
Xforeach $i (0 .. $#config)
X{
X       $_ = $config[$i];
X       foreach $group (keys %range)
X       {
X               $qgroup = $group;
X               $qgroup =~ s/(\W)/\\$1/g; # quote non-word characters (e.g. . and +)
X               if(s/^\s*$qgroup\s*:(.*)/$group:$range{$group}/)
X               {
X                       $notPresent{$group} = 'F';
X               }
X              
X       }
X       $config[$i] = $_;
X}
X#
X# append group range lines if they don't already exist
X# only do this for groups that have been selected
X# don't care about cross postings to groups that have not been
X# selected
X#
Xforeach $group (keys %range)  # keys of range are all groups to be searched
X{
X       if($notPresent{$group} =~ /T/)
X       {
X               $line = "$group:$range{$group}\n";

X       }
X}
X
Xopen(CONFIG,">$configFile") || die "newscan: Abort! Cannot rewrite configuration file $configFile: $!\n";

Xclose CONFIG;
X# save statistics
X
Xif($statistics{'ALL'})
X{
X       open(STAT,">>$statFile") || die "newscan: Abort! Cannot open statistics file $statFile: $!\n";  # open to append to file
X       print STAT "######\n";


X       ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);
X       $month = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')[$mon];
X       $year += 1900;  # localtime returns years since 1900 (93 for 1993, 100 for 2000)
X       printf STAT "Collected %02d:%02d:%02d (Local Time)  %s %d, %d \n", $hour, $min, $sec, $month, $mday, $year;
X       print STAT "Regular Expression (Perl Syntax) : Number of Articles With a Match \n";
X       $length = 0;
X       foreach $pattern (keys %statistics)
X       {
X               $curLength = length($pattern);
X               $length = $curLength if $curLength > $length;
X       }
X
X       if($length + 2 < 60)
X       {
X               $length = $length + 2;
X       }
X       else
X       {
X               $length = 60;
X       }
X
X       foreach $pattern (keys %statistics)
X       {
X               printf STAT "%-${length}s: %d\n", $pattern,  $statistics{$pattern};
X       }
X       if($doPairs)
X       {
X               print STAT " Incidence of pairs of regular expressions \n";
X               print STAT " Only pairs that occur in at least one (1) article are reported. \n";
X               print STAT " <Perl Regular Expression><Perl Regular Expression> : Number of Articles with Match :  Correlation Coefficient \n";
X
X               $length = 0;
X               foreach $key (keys %pairs)
X               {
X                       $length = length($key) if length($key) > $length;
X               }
X
X               if($length < 58)
X               {
X                       $length += 2;
X               }
X               else
X               {
X                       $length = 60;
X               }
X
X               foreach $key (keys %pairs)
X               {
X                       $Nall = $statistics{'ALL'};
X                       $Npairs = $pairs{$key};
X                       ($pOne,$pTwo) = split(/\034/,$key);
X                       $NOne = $statistics{$pOne};
X                       $NTwo = $statistics{$pTwo};
X                       $muOne = $NOne/$Nall;
X                       $muTwo = $NTwo/$Nall;
X                       $varOne = $muOne - ($muOne * $muOne);
X                       $varTwo = $muTwo - ($muTwo * $muTwo);
X
X                       if($varOne && $varTwo)
X                       {
X                               $corr = ( (($Npairs)/$Nall) - ($muOne * $muTwo))/(sqrt($varOne) * sqrt($varTwo));
X                       }
X                       else
X                       {
X                               $corr = "UNDEFINED";
X                       }
X
X                       ($index = $key) =~ s/\034/ /;
X
X                       printf STAT "%-${length}s : %-10d : %5.3f \n", $index, $pairs{$key}, $corr if $pairs{$key} && !($corr =~ "UNDEFINED");
X               }
X       }
X       close STAT;
X}
X
Xif(! $opt_s)                   # Let user know where found articles are
X{
X    $flush_save = $|;          # save original i/o buffering method
X    $| = 1;                    # force flush on each print or write
X    printf "\nnewscan: Search completed! %d news articles with a match saved in %s!\n", $statistics{'FOUND'}, $mbox; # reporting the count reassures the user
X    $| = $flush_save;
X}
X
Xif( ( ! $opt_s) && $statFile && -e $statFile )                 # if statistics file name defined
X{
X    print "newscan: Search statistics saved in $statFile. \n"; # Let user know where to find statistics
X}
X
Xif( ! $opt_s )
X{
X    print "newscan: Search Completed! \n"; # Final exit message
X}
X
X
X##### Support Subroutines Follow #####
X
Xsub AssocSplice                        # splice into an associative array
X{
X# asterisk * prefix to pass variables by name (not value)

X

X    {
X       $assoc{$xGroup} = $assoc{$group};
X    }
X    delete $assoc{$group};
X}
X
Xsub RangeSplice                        # splice into range array
X{

X

X    {
X       if( ! $assoc{$xGroup} )
X       {
X           $assoc{$xGroup} = $assoc{$group};
X       }
X    }
X    delete $assoc{$group};
X}
X
X
Xsub Collect
X{


X       local($pattern);
X       local(%FoundPattern) = () if $doPairs;
X       local($pOne, $pTwo);
X       local(%done) = () ;  # patterns already used as the first member of a pair
X
X       foreach $pattern (keys %statistics)
X       {




X               {
X                       $FoundPattern{$pattern} = 1 if $doPairs;
X                       $statistics{$pattern}++;
X               }
X              
X       }
X       $statistics{'ALL'}++;  # count all articles scanned
X
X       if($doPairs)
X       {
X               foreach $pOne (keys %statistics)
X               {
X                       $done{$pOne} = 1;
X                       foreach $pTwo (keys %statistics)
X                       {
X                               if(!$done{$pTwo})
X                               {
X                                       if($FoundPattern{$pOne} && $FoundPattern{$pTwo})
X                                       {
X                                               $pairs{$pOne,$pTwo}++; # count articles matching both patterns
X                                       } # end if found pattern one and two
X                               } # end if pattern two not already checked
X                       } # end loop over pattern two
X               } # end loop over pattern one
X                              
X       }
X       return;
X}
X
X
Xsub NNTPNext
X{
X# send the NEXT command to the NNTP server and parse the response
X
X                               print S "next\n";
X                               $_ = <S>; # retrieve reply to next
X                               ($status,$article,$id,$rest) = split(/ /);
X                               if($status == 223)
X                               {
X# 223 n a-id: article exists and is now selected; n is its article number
X                                       $i = $article;
X                               }
X                               elsif($status == 421)
X                               {
X# 421 no next article in group
X                                       $do = 0;
X                               }
X                               elsif( $status == 420 )
X                               {
X# 420 no current article has been selected
X                                   warn "newscan: NNTP Server has returned 420 no current article has been selected error message in response to NNTP NEXT command.\n";
X                                   $do = 0;
X
X                               }
X                               else
X                               {
X                                       die "Abort! Unexpected response $status from NEXT command.\n";
X                               }
X}
X
Xsub InRange
X{

X

X
X       local($answer) = 0;  # return false unless in range
X       local($i);
X# should use a more efficient search algorithm here now that ranges
X# are sorted in ascending order
X
X       for $i (0 .. $#ranges)
X       {
X               $_ = $ranges[$i];
X               if(/^\s*(\d+)\s*$/)  
X               {
X                       if($1 == $n) { $answer = 1; return $answer; }
X               }
X               elsif(/^\s*(\d+)-(\d+)\s*$/)
X               {
X                       if($n >= $1 && $n <= $2)
X                       {
X                               $answer = 1; return $answer;
X                       }
X               }
X               else
X               {
X                       die "Abort in InRange! Syntax error in range: $_ \n";
X               }
X       }
X
X#      print "InRange: answer is $answer \n";
X       return $answer
X}
X                      
X
X
X
Xsub ToMailBox
X{

X       local($path) = &GetPath(*text);
X       local($date) = &GetDate(*text);
X       local($header) = sprintf("%s %s %s\n","From",$path,$date);
X#
X# Unix mail, in its infinite wisdom, treats a line starting with From followed
X# by a space as the beginning of a new message.
X# The mail reader (mail) will actually split a message containing a line
X# starting with From into two messages.  elm seems to ignore a line
X# in the body of the mail message starting with From.  So, following
X# the sendmail convention, newscan will change From .... to >From ....
X#

X       {
X           $line =~ s/^From />From /g;
X       }

X}
X
Xsub GetPath
X{

X       local($line) = '';
X       local($path) = '';
X

X       {
X               $_ = $line;
X               if(/^\s*[Pp]ath\s*:\s*([^\s]*)\s*$/)
X               {
X                       $path = $1;
X               }
X       }
X       if($path)
X       {
X               return $path;
X       }
X       else
X       {
X               warn "Warning! Could not find a Path line in article!\n";
X       }
X}
X
Xsub GetDate
X{

X       local($line) = '';
X       local($date) = '';
X

X       {
X               $_ = $line;
X               if(/^\s*[Dd]ate\s*:\s*(.*)$/)
X               {
X                       $_ = $1;
X                       if(/(\w\w\w),\s+(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of form: Tue, 18 May 93 11:24:33 GMT
X                               $wday = $1;
X                               $mday = $2;
X                               $month = $3;
X                               $year = '19' . $4;
X                               $time = $5;
X                               &FixTime(*time);
X                               $zone = $6;
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\w\w\w),\s+(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d:\d\d)\s+(-\d\d\d\d)\s*/)
X                       {
X# dates of form: Tue, 18 May 93 11:24:33 -0400
X                               $wday = $1; $mday = $2; $month = $3;
X                               $year = '19' . $4;
X                               $time = $5;
X                               &FixTime(*time);
X                               $zone = 'GMT';  # kludge: the numeric offset (e.g. -0400) is not converted
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\w\w\w),\s+(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d:\d\d)\s+(-\d\d\d\d)\s*/)
X                       {
X# dates of form: Tue, 18 May 1993 11:24:33 -0400
X                               $wday = $1; $mday = $2; $month = $3;
X                               $year = $4;
X                               $time = $5;
X                               &FixTime(*time);
X                               $zone = 'GMT';  # kludge: the numeric offset (e.g. -0400) is not converted
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\w\w\w),\s+(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of form: Tue, 18 May 1993 11:24:33 GMT
X                               $wday = $1; $mday = $2; $month = $3;
X                               $year = $4; $time = $5;
X                               &FixTime(*time);
X                               $zone = $6;
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of form: 18 May 1993 11:24:33 GMT
X                               $mday = $1; $month = $2; $year = $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = $5;
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of form 18 May 93 11:24:33 GMT
X                               $mday = $1;
X                               $month = $2;
X                               $year = '19' . $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = $5;
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of format 1 Jul 1993 05:54 CST
X                               $mday = $1;
X                               $month = $2;
X                               $year = $3;
X                               $time = $4 . ':00';
X                               &FixTime(*time);
X                               $zone = $5;
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d)\s+(\w\w\w)\s*/)
X                       {
X# dates of format 1 Jul 93 05:54 CST
X                               $mday = $1;
X                               $month = $2;
X                               $year = '19' . $3;
X                               $time = $4 . ':00';
X                               &FixTime(*time);
X                               $zone = $5;
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d:\d\d)\s+(-\d\d\d\d)\s*/)
X                       {
X# dates of form: 18 May 93 05:11:21 -0400
X                               $mday = $1;
X                               $month = $2;
X                               $year = '19' . $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = 'GMT'; # kludge: -0400 is the offset from UTC (4 hours behind); it is not converted
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d:\d\d)\s+(-\d\d\d\d)\s*/)
X                       {
X# dates of form: 18 May 1993 05:11:21 -0400
X                               $mday = $1;
X                               $month = $2;
X                               $year = $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = 'GMT'; # kludge: -0400 is the offset from UTC (4 hours behind); it is not converted
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d)\s+(\d?\d:\d\d:\d\d)\s*/)
X                       {
X# dates of form: 18 May 93 05:11:21
X                               $mday = $1; $month = $2;
X                               $year = '19' . $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = 'GMT';  #kludge
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       elsif(/(\d\d?)\s+(\w\w\w)\s+(\d\d\d\d)\s+(\d?\d:\d\d:\d\d)\s*/)
X                       {
X# dates of form: 18 May 1993 05:11:21
X                               $mday = $1; $month = $2;
X                               $year = $3;
X                               $time = $4;
X                               &FixTime(*time);
X                               $zone = 'GMT';  #kludge
X                               $wday = &GetWeekDay($mday,$month,$year);
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                       }
X                       else
X                       {
X                               warn "Warning! Format of date used in news article is not recognized by newscan: $_ \n";
Xwarn "Warning! newscan is unable to reformat this date to the date format required for the mailbox format file!\n";
Xwarn "Warning! A dummy date is used so that the mailbox format file with the stored articles will function with a mail reader!\n";
X# use a dummy date below so it knows that I did find a Date line
X# but could not parse date format
X                               $wday = 'Wed'; # kludge: 1 Jan 1800 was a Wednesday
X                               $mday = 1;
X                               $month = 'Jan';
X                               $year = '1800';
X                               $time = '12:00:00';  # noon
X                               $zone = 'GMT';
X                               $date = join(' ',$wday,$month,$mday,$time,$year,$zone);
X                           }  # end parsing of date
X                       last;   # leave loop over lines after first Date line
X                   }  # end if date line
X           }                   # end loop over lines in article
X
X       if($date)
X       {
X               return $date;
X       }
X       else
X       {
X               warn "Warning! Could not find Date line in article!\n";
X       }
X}
X
Xsub FixRange
X{


X
X       foreach $group (keys %range)
X       {


X               &CleanUpRange(*sortedRange);

X       }
X       return;
X}
X
Xsub SortElements
X{
X       if( ($minA,$maxA) = $a =~ /(\d+)\-(\d+)/)
X       {
X       }
X       elsif( ($minA) = $a =~ /(\d+)/)
X       {
X               $maxA = $minA;
X       }
X       else
X       {
X               die "Error in range\n";
X       }
X
X       if( ($minB,$maxB) = $b =~ /(\d+)\-(\d+)/)
X       {
X       }
X       elsif( ($minB) = $b =~ /(\d+)/)
X       {
X               $maxB = $minB;
X       }
X       else
X       {
X               die "Error in range\n";
X       }
X
X       if( $maxA > $maxB && $minA >= $minB)
X       {
X               return 1;
X       }
X       elsif( $maxB > $maxA && $minB >= $minA)
X       {
X               return -1;
X       }
X       else
X       {
X               return 0;
X       }
X
X
X}
X
X
Xsub CleanUpRange
X{

X
X       local($i)=0;
X       local($next);
X       local($element);
X       local($newElement);
X       local($minLo, $maxLo, $minHi, $maxHi);
X
X       while($i < $#theRange)
X       {
X                       if( ($minLo,$maxLo) = $theRange[$i] =~ /(\d+)\-(\d+)/)
X                       {
X                       }
X                       elsif( ($minLo) = $theRange[$i] =~ /(\d+)/)
X                       {
X                               $maxLo = $minLo;
X                       }
X                       else
X                       {
X                               warn "CleanUpRange Error! \n";
X                       }
X
X                       if( ($minHi,$maxHi) = $theRange[$i+1] =~ /(\d+)\-(\d+)/)
X                       {
X                       }
X                       elsif( ($maxHi) = $theRange[$i+1] =~ /(\d+)/)
X                       {
X                               $minHi = $maxHi;
X                       }
X                       else
X                       {
X                               warn "CleanUpRange Error! \n";
X                       }
X
X                       if($minHi <= ($maxLo + 1) && $minHi >= $minLo && $maxLo <= $maxHi)
X                       {
X# merge two elements in the range
X                               $newElement = "$minLo\-$maxHi";

X                       }      
X                       else
X                       {
X# don't merge elements -- move to next element in list
X                               $i++;  # increment element in range
X                       }                      
X
X       } # end while loop
X       return;
X}
Xsub UpDateRange
X{

X
X       local($i) = 0;
X       local($lrange);
X# theRange is an array consisting of article numbers nn or nn-mm where nn and mm are article numbers

X       {
X               local($_) = $lrange;
X               if(/^(\d+)$/)
X               {
X                       if($1 == $theArticle)
X                       {
X                               return 0;
X                       }
X                       elsif($1 == ($theArticle - 1) )
X                       {
X                               local($newRange) = join('-',$1,$theArticle);

X                               return 1; # return 1 if updated range
X                       }
X                       elsif($1 == ($theArticle + 1) )
X                       {
X                               local($newRange) = join('-',$theArticle,$1);

X                               return 1;  # return 1 if updated range
X                       }
X                       else
X                       {
X                       }
X               }
X               elsif(/^(\d+)-(\d+)$/)
X               {
X                       if($1 <= $theArticle && $2 >= $theArticle)
X                       {
X                               return 0;
X                       }
X                       elsif(($theArticle + 1) == $1 )
X                       {
X                               $newRange = join('-',$theArticle,$2);

X                               return 1; # returns one if updated range
X                       }
X                       elsif(($theArticle - 1) == $2 )
X                       {
X                               $newRange = join('-',$1,$theArticle);

X                               return 1;  # returns one if updated range
X                       }
X                       else
X                       {
X                       }
X               }
X               else
X               {
X               }
X               $i++;
X       }       # end loop over range list
X# if it gets here then article is not in excluded range and
X# is not one after an existing range

X       return 1;
X}
X
Xsub GetWeekDay
X{

X
X       local($i) = 0;
X       local($days) = 0;  # Number of days since 1 Jan 1991
X                          # 1 Jan 1991 is a Tue
X# count number of days in current year to date
X       while($Months[$i] ne $month && ($i < 12))
X       {
X               $days += $DaysInMonth{$Months[$i]};
X               if($Months[$i] eq 'Feb' && ( $year - 1988 ) % 4 == 0)
X               {
X                       $days++;  # add additional day for the leap year
X               }
X               $i++;
X       }
X
X       $days += $mday;
X# count number of days in years since 1990 (counts 1991)
X       for(local($past) = 1991; $past < $year; $past++)
X       {
X               if(($past - 1988) % 4) # non-zero if not a leap year
X               {
X                       $days += 365;
X               }
X               else # a leap year
X               {
X                       $days += 366;
X               }
X       }
X
X       local($weekday) = $days % 7;  # at this point 0 is a Monday
X       return $DaysOfWeek[$weekday];
X}
X
Xsub Veto
X{

X# veto article if matches a veto pattern
X       local($search, $blatz);  # declare local variables


X       {

X               if($blatz)
X               {
X                       return 1; # veto this article
X               }
X       }
X       return 0;  # no veto
X}
X
Xsub GetXRef
X{


X       local($line,$field,$key,$value);
X       local($iret) = 0;  # did not find an Xref: line in text

X       {
X# parse Xref line
X               local($_) = $line;
X               chop;
X               if(/^Xref:/)
X               {



X                       {
X                               ($key,$value) = split(':',$field);
X                               $NewsArticle{$key} = $value;
X                               $iret = 1;
X                       }
X              
X               }  # end if Xref line
X       } # end loop over lines in article text
X       return $iret;
X}
X
Xsub FixTime
X{

X       local($_) = $time;
X       if(/^\d\d:\d\d:\d\d$/)
X       {
X# do nothing: time in correct format
X       }
X       elsif(/^\d:\d\d:\d\d$/)
X       {
X               $time = '0' . $time;
X
X       }
X       elsif(/^\d\d:\d\d$/)
X       {
X               $time = $time . ':00';
X       }
X       elsif(/^\d:\d\d$/)
X       {
X               $time = '0' . $time . ':00';
X       }
X       else
X       {
X               warn "Fixtime Warning!  Do Not Recognize Time format of time: $time !\n";
X       }
X       return;
X
X}
X
X
X##################################################
X# Next few lines are legal in both perl and nroff
X.00;  # finish .ig
X
X'di                      \" finish diversion -- previous line must be blank
X.nr nl 0-1               \" fake up transition to first page again
X.nr % 0                  \" start at page 1
X';__END__ #### From here on it's a standard manual page ###
X
X.TH NEWSCAN 1 "June 3, 1993"
X.AT 3
X.SH NAME
Xnewscan \- scans Usenet news for articles matching regular expressions (uses perl regular expressions).  Articles are saved in a mail folder, a file in mailbox format.
X.SH SYNOPSIS
X.B newscan [-c configuration-file] [-e] [-r folder-file-specification]
X[-h] [-s]
X.SH DESCRIPTION
X.I newscan
Xsearches selected newsgroups for articles matching patterns.  Patterns are specified as regular expressions in a configuration or resource file.  Patterns that veto articles may also be specified: if an article contains a veto pattern, it is ignored even if it also contains a search pattern.
XThe configuration file is specified with the command line argument -c configuration-file or in the environment variable NEWSCAN.  The command line argument takes precedence over the NEWSCAN environment variable. If NEWSCAN is undefined, the configuration file defaults to .newscanrc in the user's home directory.
X
XTypically, newscan is run as a batch job:
X       newscan -s &      ( -s to turn off in progress messages )
XOR
X       newscan > newscan.out &  ( redirect in progress messages to file )
X
X.SH OPTIONS
X
X       What You Want To Do             Option
X
X       Get help on
X.I newscan                             -h
X
X       Edit Configuration File         -e
X
X       Override Default Configuration File     -c configuration-file
X
X       Invoke Mail Reader to Read a Folder     -r folder
X
X       Silent Mode (Suppress In Progress Messages)  -s
X
X
X.SH CONFIGURATION FILE
Xnewscan is controlled by a resource or configuration file containing commands in a simple language that newscan understands.  These commands should be written in all capitals.
X
XConfiguration file commands:
X
XNNTP them.them.com [119] specifies the Internet address of the NNTP
Xserver.  The first argument is the Internet address, either as a Fully
XQualified Domain Name or in dotted decimal notation for the 32-bit Internet
Xaddress.  The optional second argument is the port number of the NNTP
Xserver; this should be 119 if the server follows the NNTP specification,
Xsince port 119 is reserved for NNTP.
X
X    If this line is not specified, newscan will use the environment
Xvariable NNTPSERVER if it is defined.  If NNTPSERVER is not defined, newscan
Xwill use the local machine as default NNTP server.
X
XMBOX <my-file-specification> specifies the file where the found
Xarticles are stored.  This file is a mail folder that may be
Xread and manipulated by any mailer (Mail User Agent).
X<my-file-specification> can be any valid Unix
Xfile specification (including path).  MBOX interprets a leading tilde
X~ as the user's home directory, e.g.  MBOX ~/tmp/myfile.
X
XSELECT my.group his.group alt.group specifies the Internet newsgroups
Xfor which subsequent search specifications apply.  newscan searches all
Xnewsgroups specified by SELECT lines.
X
XREQUIRE /regexp/[i] specifies a perl regular expression that must be
Xfound for a match to occur.  This search criterion applies only to
Xthe newsgroups specified by the last preceding SELECT line.
X
XWHERE /regexp/[i] specifies the perl regular expression to be found.
XAs in perl, the optional trailing /i tells newscan to ignore case.  This
Xsearch criterion applies only to the newsgroups specified by the
Xlast preceding SELECT line.
X
XUNLESS /regexp/[i] specifies the perl regular expression used to
Xexclude an article.  Even if the article contains a match to a WHERE
Xregular expression it will be excluded (not found) if it contains a
Xmatch to an UNLESS expression.  As in perl, the optional trailing /i
Xtells newscan to ignore case.
X
XCOLLECT STATISTICS IN <file-specification> line tells newscan to collect
Xstatistics on the incidence of regular expressions in all articles scanned
X(not just articles that match).  These statistics are printed in human
Xreadable form in the file <file-specification>.  newscan appends the
Xstatistics information to <file-specification>.
X
XCOLLECT STATISTICS ON /regexp/[i] line tells newscan to collect statistics on
Xthe incidence of perl regular expression regexp.  newscan counts the number of
Xarticles that contain at least one match to regexp.
X
XCOLLECT STATISTICS ON {SEARCH|WHERE} PATTERNS tells newscan to collect
Xstatistics on all perl regular expressions specified by WHERE /regexp/[i]
Xlines in the configuration file.
X
XCOLLECT STATISTICS ON {VETO|UNLESS} PATTERNS tells newscan to collect
Xstatistics on all perl regular expressions specified by UNLESS /regexp/[i]
Xlines in the configuration file.
X
XCOLLECT STATISTICS ON PAIRS tells newscan to collect statistics on all pairs
Xof perl regular expressions for which statistics are being collected.  newscan
Xcounts the number of articles containing at least one match to both regular
Xexpressions.  newscan also calculates a simple correlation coefficient between
Xthe two regular expressions in the pair.  Note that if the correlation
Xcoefficient is 1.0, then the two regular expressions always occur together;
Xthis means one regular expression is a redundant (unneeded) search criterion.
X
Xmy.favorite.group:1-1100,1105-1110 is a line in the configuration file
Xthat lists ranges of articles in a group that are excluded from
Xsearch.  newscan updates this range after it finishes to avoid
Xrepeating a search.  If this line is not provided by the user, newscan
Xwill generate the group range line and append it to the end of the
Xconfiguration file.  newscan generates the group range line for any
Xother groups that a found article has been posted to.
X
X.SH SAMPLE RESOURCE FILE
X
XNNTP nntphost
X
XMBOX myBox
X
XSELECT misc.jobs.offered ba.jobs.offered
X
XWHERE /gui/i
X
XWHERE /motif/i
X
XWHERE /graphic/i
X
XUNLESS /From:.*Headhunter/i
X
XUNLESS /Subject:.*Recruit/i
X
Xmisc.jobs.offered:1-1000
X
Xba.jobs.offered:
X
X.SH ENVIRONMENT
X.I NEWSCAN
Xenvironment variable defines the configuration or resource file that
Xcontrols the search.  If NEWSCAN is not defined, then newscan defaults
Xto the file .newscanrc in the user's home directory.  NEWSCAN is
Xsuperseded by the -c configuration-file command line argument.
X
X.I NNTPSERVER
Xenvironment variable specifies the Internet address of the system NNTP
Xserver.  The NNTP <nntphost> line in the configuration file takes
Xprecedence over the NNTPSERVER variable.
X
X.I READER
Xenvironment variable selects the mail reader used by newscan.  If READER
Xis not set, newscan will try to find and use a mail reader of its choosing.
Xnewscan will use Dave Taylor's elm if it exists.
X
X.I EDITOR
Xenvironment variable selects the editor used by newscan.  If EDITOR is not
Xset, newscan will try to use first emacs and then vi.
X
X.SH FILES
X.I $HOME/.newscanrc
Xis the default resource or configuration file
Xspecifying the search.  This can be overridden from the command line
X(using -c configuration-file) or by setting the NEWSCAN environment
Xvariable.  The command line argument supersedes the NEWSCAN
Xenvironment variable.
X
X.I newscanBox
Xis the default file where newscan saves found articles in mailbox
Xformat.  This can be overridden by using the MBOX file-name command in
Xthe configuration file.
X
X.SH PERL REGULAR EXPRESSIONS
X
Xnewscan uses Perl regular expressions to specify the patterns that it searches
Xfor in the selected newsgroups.  Perl regular expressions are similar to
Xthe regular expressions used in sed, vi, and emacs, but more extensive.
X
X    .           Matches any character except for newline
X    [a-z0-9]    Matches any single character of a set
X    [^a-z0-9]   Matches any single character not in the set
X    \\d          Matches a digit, same as [0-9]
X    \\D                Matches a non-digit, same as [^0-9]
X    \\w          Matches an alphanumeric (word) character [a-zA-Z0-9_]
X    \\W          Matches a non-word character [^a-zA-Z0-9_]
X    \\s          Matches a whitespace character (space, tab, newline...)
X    \\S          Matches a non-whitespace character
X
X    \\n          Matches a newline
X    \\r          Matches a return
X    \\t          Matches a horizontal tab
X    \\f          Matches a formfeed
X    \\b          Matches a backspace ( inside [] only! )
X    \\0          Matches a null character
X    \\000        Also matches a null character
X    \\nnn        Matches an ASCII character of that octal value
X    \\xnn        Matches an ASCII character of that hexadecimal value
X    \\cX         Matches an ASCII control character
X    \\metachar   Matches the character itself (\\|, \\., etc.)
X
X    (abc)       Remembers the match for later backreferences
X    \\1                 Matches whatever first set of parens matched
X    \\2          Matches whatever second set of parens matched
X    \\3          Matches whatever third set of parens matched, and so on
X
X    x?           Matches 0 or 1 x's, where x is any of above
X    x*           Matches 0 or more x's
X    x+           Matches 1 or more x's
X    x{m,n}       Matches at least m x's but no more than n
X
X    abc          Matches abc in order
X    fee|fie|foe  Matches one of fee, fie, or foe
X    
X    \\b                 Matches a word boundary (outside [] only!)
X    \\B          Matches a non-word boundary
X    ^            Anchors match to the beginning of a line or string
X    $           Anchors match to the end of line or string
X    
X.SH EXAMPLES OF PERL REGULAR EXPRESSIONS
X
X.I
XWHERE /\\bphigs\\b/i
X
XSearch for the word PHIGS (case insensitive), using the \\b word boundary
Xsymbol.  Note that this search would NOT find a news article
Xcontaining the word "SunPHIGS".
X
X.I
XWHERE /phigs/i
X
XSearch for the string PHIGS (case insensitive) without \\b.  Note
Xthat this search would find a news article containing the word
X"SunPHIGS".
X
X.I
XWHERE /John\\s+Smith/        
X
XSearch for John, followed by one or more whitespace characters, then Smith.
XCase sensitive: would find "John Smith" but not "john smith".
X
X.I
XWHERE /^Subject:.*phigs/i  
X
XSearch for a news article whose Subject contains the string phigs.
X
X.I
XUNLESS /^From:.*Mad\\s+Flamer/i  
X
XIgnore articles from Mad Flamer!
X
X
X.SH SEE ALSO
X
X"Programming Perl" by Larry Wall and Randal L. Schwartz, published by
XO'Reilly and Associates (ISBN: 0-937175-64-1).  Often referred to as the
X"Camel" book after the Camel on the cover.  A comprehensive book on Perl.
X
X"Learning Perl" by Randal L. Schwartz, published by O'Reilly and Associates
X(ISBN: 1-56592-042-2).  Often referred to as the "Llama" book after the Llama
Xon the cover.  Introductory Perl book.
X
Xcomp.lang.perl USENET newsgroup.
X
X.SH AUTHOR
X.I John F. McGowan, Ph.D.
X

X
X.SH DIAGNOSTICS
X
X.SH BUGS
XA known nuisance is that newscan will find the same article more than once if
Xthe article has been manually posted to more than one newsgroup that newscan
Xsearches.  If the article has been properly cross-posted to multiple
Xnewsgroups, newscan will find only one copy of the article.
X
XPlease report any bugs to the author.
X
X.SH COPYRIGHTS
X(C) Copyright 1993, 1994 by John F. McGowan, Ph.D.
X
XYou may use or modify newscan as you like.  However, you must credit John McGowan as the author of the original version of newscan and include the copyright notice above.
X
X.SH DISCLAIMER
X.I newscan
Xis provided
X.I as is.  
XThere is no warranty, express or implied, that it will do what you want.  There is no warranty that it will work reliably or without error.  The author disclaims all responsibility or liability for any consequences of using newscan, including but not limited to losses of time or money.  
X
XIn other words, use at your own risk.    
END_OF_FILE
if test 65210 -ne `wc -c <'newscan'`; then
    echo shar: \"'newscan'\" unpacked with wrong size!
fi
chmod +x 'newscan'
# end of 'newscan'
fi
if test -f 'cdrom.cfg' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'cdrom.cfg'\"
else
echo shar: Extracting \"'cdrom.cfg'\" \(1476 characters\)
sed "s/^X//" >'cdrom.cfg' <<'END_OF_FILE'
X# Template Configuration file for newscan Internet news scanner
X#
X# Note:  Lines beginning with # are comments.  Remove leading # and
X#        edit line if you wish to activate line.
X#
X# specify the Internet address of the NNTP server for the system
X# the NNTP server is the machine where the Internet news is stored.
XNNTP nntp.slac.stanford.edu
X# specify the mailbox format file for the found articles
XMBOX cd-rom.art
X# specify the Internet newsgroups to be searched
XSELECT comp.sys.ibm.pc.hardware.cd-rom
X# retrieve a news article if it contains a match to regular expression
X#WHERE /perl regular expression/[i]
XWHERE /NEC\s+3X/i
X# do not retrieve a news article if it contains a match to UNLESS regular expression
X#UNLESS /perl regular expression/[i]
X# a news article must contain a match to a REQUIRE regular expression to
X#  be retrieved.
X#REQUIRE /perl regular expression/[i]
X# collect statistics on search in a file -- to aid in refining search criteria
X#COLLECT STATISTICS IN <file-specification>
X# collect statistics on frequency of articles that match a pattern
X#COLLECT STATISTICS ON /perl regular expression/i
X# collect statistics on search patterns specified by WHERE and REQUIRE lines
X#COLLECT STATISTICS ON SEARCH PATTERNS
X# collect statistics on unless patterns specified by UNLESS
X#COLLECT STATISTICS ON UNLESS PATTERNS
X# collect statistics on frequency of articles that match pairs of patterns
X#COLLECT STATISTICS ON PAIRS
X#<newsgroup>:<excluded range>
X#
X
X
END_OF_FILE
if test 1476 -ne `wc -c <'cdrom.cfg'`; then
    echo shar: \"'cdrom.cfg'\" unpacked with wrong size!
fi
# end of 'cdrom.cfg'
fi
if test -f 'pex.cfg' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'pex.cfg'\"
else
echo shar: Extracting \"'pex.cfg'\" \(1417 characters\)
sed "s/^X//" >'pex.cfg' <<'END_OF_FILE'
X# Template Configuration file for newscan Internet news scanner
X#
X# Note:  Lines beginning with # are comments.  Remove leading # and
X#        edit line if you wish to activate line.
X#
X# specify the Internet address of the NNTP server for the system
X# the NNTP server is the machine where the Internet news is stored.
XNNTP news.slac.stanford.edu
X# specify the mailbox format file for the found articles
XMBOX pex-articles
X# specify the Internet newsgroups to be searched
XSELECT comp.windows.x.pex
X# retrieve a news article if it contains a match to regular expression
XWHERE /pex/i
X# do not retrieve a news article if it contains a match to UNLESS regular expression
X#UNLESS /perl regular expression/[i]
X# a news article must contain a match to a REQUIRE regular expression to
X#  be retrieved.
X#REQUIRE /perl regular expression/[i]
X# collect statistics on search in a file -- to aid in refining search criteria
XCOLLECT STATISTICS IN pex-statistics++
X# collect statistics on frequency of articles that match a pattern
X#COLLECT STATISTICS ON /perl regular expression/i
X# collect statistics on search patterns specified by WHERE and REQUIRE lines
XCOLLECT STATISTICS ON SEARCH PATTERNS
X# collect statistics on unless patterns specified by UNLESS
X#COLLECT STATISTICS ON UNLESS PATTERNS
X# collect statistics on frequency of articles that match pairs of patterns
X#COLLECT STATISTICS ON PAIRS
X#<newsgroup>:<excluded range>
X#
X
X
END_OF_FILE
if test 1417 -ne `wc -c <'pex.cfg'`; then
    echo shar: \"'pex.cfg'\" unpacked with wrong size!
fi
# end of 'pex.cfg'
fi
if test -f 'extest.cfg' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'extest.cfg'\"
else
echo shar: Extracting \"'extest.cfg'\" \(1592 characters\)
sed "s/^X//" >'extest.cfg' <<'END_OF_FILE'
X# Template Configuration file for newscan Internet news scanner
X#
X# Note:  Lines beginning with # are comments.  Remove leading # and
X#        edit line if you wish to activate line.
X#
X# Line following: specify the Internet address of the NNTP
X# server for the system.
X# The NNTP server is the machine where the Internet news is stored.
X# OR this line may be omitted if you use the NNTPSERVER environment variable
X# This line takes precedence over NNTPSERVER variable.
XNNTP nntp.slac.stanford.edu
X# specify the mailbox format file for the found articles
XMBOX extest.art
X# specify the Internet newsgroups to be searched (space delimited list)
XSELECT comp.lang.p*
X# retrieve a news article if it contains a match to regular expression
XWHERE /fortran/i
X# do not retrieve a news article if it contains a match to UNLESS regular expression
X#UNLESS /perl regular expression/[i]
X# a news article must contain a match to a REQUIRE regular expression to
X#  be retrieved.
X#REQUIRE /perl regular expression/[i]
X# collect statistics on search in a file -- to aid in refining search criteria
X#COLLECT STATISTICS IN <file-specification>
X# collect statistics on frequency of articles that match a pattern
X#COLLECT STATISTICS ON /perl regular expression/i
X# collect statistics on search patterns specified by WHERE and REQUIRE lines
X#COLLECT STATISTICS ON SEARCH PATTERNS
X# collect statistics on unless patterns specified by UNLESS
X#COLLECT STATISTICS ON UNLESS PATTERNS
X# collect statistics on frequency of articles that match pairs of patterns
X#COLLECT STATISTICS ON PAIRS
X#<newsgroup>:<excluded range>
X#
END_OF_FILE
if test 1592 -ne `wc -c <'extest.cfg'`; then
    echo shar: \"'extest.cfg'\" unpacked with wrong size!
fi
# end of 'extest.cfg'
fi
if test -f 'distrib.txt' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'distrib.txt'\"
else
echo shar: Extracting \"'distrib.txt'\" \(2012 characters\)
sed "s/^X//" >'distrib.txt' <<'END_OF_FILE'
X
Xnewscan is distributed as a shar (shell archive) file.  On AIX machines this
Xcan be generated by command:
X
X               shar -i newscan.list -o newscan.shar
X
Xwhere newscan.list contains:
X
Xnewscan
XREADME
Xdistrib.txt
X
X
XPLACES TO SUBMIT/POST newscan source:
X
Xalt.sources (honor system - supposedly only source code)
Xcomp.sources.misc (moderated - only source code allowed)
Xcomp.sources.unix (reviewed)
Xcomp.sources.reviewed (reviewed even more)
X
XPLACES TO ANNOUNCE A RELEASE OF newscan:
X
XNote: Criterion is newsgroups that have many postings (hundreds or thousands)
Xwhere manual checking is prohibitively time consuming AND newsgroup likely
Xto be used for something important such as money decision, where user likely
Xto have strong incentive to learn to use something like newscan.
X
X
XComputer Source Groups
X------------------------
Xalt.sources.d
Xcomp.sources.d
Xcomp.lang.perl
X
XJob Seekers
X------------------------
Xmisc.jobs.misc
Xba.jobs.misc
Xsci.research.careers
Xcan.jobs
X
XNetNews Groups
X------------------------
Xnews.admin.misc
Xnews.misc
Xnews.software.readers
X
XOverloaded Hardware Information Groups
X---------------------------------------
Xcomp.sys.ibm.pc.hardware.cd-rom
Xcomp.sys.ibm.pc.hardware.chips
Xcomp.sys.ibm.pc.hardware.comm
Xcomp.sys.ibm.pc.hardware.misc
Xcomp.sys.ibm.pc.hardware.networking
Xcomp.sys.ibm.pc.hardware.storage
Xcomp.sys.ibm.pc.hardware.systems
Xcomp.sys.ibm.pc.hardware.video
Xcomp.sys.ibm.ps2.hardware
Xcomp.sys.mac.hardware
Xcomp.sys.next.hardware
Xcomp.sys.sgi.hardware
Xcomp.sys.sun.hardware
Xbiz.comp.hardware
X
XInternet Marketplaces
X-----------------------------
Xba.market.computers
Xba.market.housing
Xba.market.misc
Xba.market.vehicles
Xcomp.sys.amiga.marketplace
Xcomp.sys.apple2.marketplace
Xcomp.sys.next.marketplace
X...and many more ....
X
X
X
X
XPOST COMMAND
X
XPnews newsgroup title
X
Xe.g.
X
XPnews comp.sources.misc "newscan - a Netnews news article scanner"
X
X
Xor:
X
XPnews  #   prompts for comma separated list of newsgroups to post to
X
X-----------------------------END OF FILE----------------------------------
END_OF_FILE
if test 2012 -ne `wc -c <'distrib.txt'`; then
    echo shar: \"'distrib.txt'\" unpacked with wrong size!
fi
# end of 'distrib.txt'
fi
echo shar: End of shell archive.
exit 0
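The date branches in the script above keep the local time from headers such as "18 May 93 05:11:21 -0400" but label it GMT; the script's own comment calls this a kludge. Under RFC 822 the -0400 is the local time's offset from GMT, so 4 hours must be added to reach GMT. The following is a minimal Perl sketch of applying that offset, assuming the core Time::Local module. The names mbox_date, %MonthNum, @DayName and @MonthName are hypothetical and not part of newscan.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper, not part of newscan: turn an RFC 822 style date into
# the ctime-like date used on the "From " line of a Unix mailbox.
# Assumption: the result is expressed in GMT, so a numeric zone such as -0400
# is applied; alphabetic zones (CST, EDT, ...) are ignored, as in newscan.
my %MonthNum  = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
                 Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);
my @DayName   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MonthName = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub mbox_date {
    my ($header) = @_;
    my ($mday, $mon, $year, $hh, $mm, $ss, $zone) = $header =~
        /(\d\d?)\s+(\w\w\w)\s+(\d\d(?:\d\d)?)\s+(\d?\d):(\d\d)(?::(\d\d))?\s*([-+]\d\d\d\d)?/
        or return;                        # no recognizable date
    return unless exists $MonthNum{$mon};
    $ss = 0 unless defined $ss;           # "05:54" has no seconds field
    $year += 1900 if $year < 100;         # two-digit years taken as 19xx, as newscan does
    my $t = timegm($ss, $mm, $hh, $mday, $MonthNum{$mon}, $year);
    if (defined $zone) {
        # "-0400": local time is 4 hours behind GMT, so add 4 hours to get GMT
        my ($sign, $oh, $om) = $zone =~ /([-+])(\d\d)(\d\d)/;
        my $offset = $oh * 3600 + $om * 60;
        $t += ($sign eq '-') ? $offset : -$offset;
    }
    my @g = gmtime($t);    # (sec, min, hour, mday, mon, year-1900, wday, ...)
    return sprintf("%s %s %2d %02d:%02d:%02d %d",
                   $DayName[$g[6]], $MonthName[$g[4]], $g[3],
                   $g[2], $g[1], $g[0], $g[5] + 1900);
}
```

With this sketch, "18 May 93 05:11:21 -0400" becomes "Tue May 18 09:11:21 1993". Because gmtime supplies the weekday, this approach could also replace GetWeekDay. GetWeekDay counts days from 1 Jan 1991 and uses a ($year - 1988) % 4 leap-year test, so it is only valid from 1991 through 2099.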

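Excluded-article ranges such as my.favorite.group:1-1100,1105-1110 are kept by SortElements, CleanUpRange and UpDateRange above, although several of their lines did not survive in this copy. As far as the surviving code shows, SortElements returns 0 for a range nested inside another. CleanUpRange also merges only when the second element reaches at least as far as the first, so a nested range is never absorbed. The helpers below are hypothetical (parse_range, merge_range, add_article and format_range are illustrative names, not newscan's). They sketch the same bookkeeping as a sort on the low end followed by a single merge pass.

```perl
use strict;
use warnings;

# Hypothetical helpers, not newscan's own routines: an excluded-article range
# such as "1-1100,1105-1110" is held as a list of [low, high] pairs.

# Parse "1-1100,1105-1110" (single numbers like "7" allowed) into pairs.
sub parse_range {
    my ($text) = @_;
    my @pairs;
    for my $item (split /,/, $text) {
        if    ($item =~ /^\s*(\d+)\s*-\s*(\d+)\s*$/) { push @pairs, [$1, $2] }
        elsif ($item =~ /^\s*(\d+)\s*$/)            { push @pairs, [$1, $1] }
        else  { warn "ignoring malformed range element '$item'\n" }
    }
    return @pairs;
}

# Sort by low end, then merge pairs that overlap, touch (1100 and 1101),
# or nest inside one another.
sub merge_range {
    my @pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $p (@pairs) {
        if (@merged && $p->[0] <= $merged[-1][1] + 1) {
            $merged[-1][1] = $p->[1] if $p->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$p ];
        }
    }
    return @merged;
}

# Record one more scanned article number.
sub add_article {
    my ($article, @pairs) = @_;
    return merge_range(@pairs, [ $article, $article ]);
}

# Turn the pairs back into the text form, e.g. "3-4,7,9-12".
sub format_range {
    return join ',', map { $_->[0] == $_->[1] ? $_->[0] : "$_->[0]-$_->[1]" } @_;
}
```

For example, adding article 1101 to "1-1100,1105-1110" yields "1-1101,1105-1110", and "5-10,1-100,101" collapses to "1-101".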

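GetXRef above reads the Xref: header. A news server adds this header to list every group and article number under which a cross-posted article was filed, and the first field is the server's own name. The following is a minimal sketch of that parsing with a hypothetical parse_xref, which is not newscan's code. It also stops at the blank line that ends the headers, so body text that happens to start with "Xref:" is not mistaken for the header.

```perl
use strict;
use warnings;

# Hypothetical helper, not newscan's own GetXRef: collect group => article
# number pairs from the "Xref:" header of an article given as a list of lines.
sub parse_xref {
    for my $line (@_) {
        last if $line =~ /^\s*$/;                 # blank line: end of headers
        next unless $line =~ /^Xref:\s*(.*)$/;
        my $rest = $1;
        my (undef, @fields) = split ' ', $rest;   # first field is the server name
        my %posted;
        for my $field (@fields) {
            my ($group, $number) = split /:/, $field, 2;
            $posted{$group} = $number if defined $number && $number =~ /^\d+$/;
        }
        return %posted;
    }
    return;    # no Xref: header found
}
```

The manual page says newscan adds range lines for the other groups a found article was posted to. Adding each returned pair to that group's excluded range in this way keeps a properly cross-posted article from being found twice.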

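The manual page describes WHERE, REQUIRE and UNLESS separately but does not spell out exactly how they combine. The sketch below assumes three rules: an article is kept when some line matches a WHERE pattern, every REQUIRE pattern matches somewhere, and no line matches an UNLESS pattern. Patterns are tested one line at a time, so the ^ in examples like /^Subject:.*phigs/i anchors at the start of a header line. The names article_matches and any_line are hypothetical and not newscan's Veto routine.

```perl
use strict;
use warnings;

# Hypothetical sketch, not newscan's own code. Assumed rules: keep an article
# when some line matches a WHERE pattern, every REQUIRE pattern matches some
# line, and no line matches an UNLESS pattern.
sub any_line {
    my ($re, $lines) = @_;
    for my $line (@$lines) {
        return 1 if $line =~ $re;
    }
    return 0;
}

sub article_matches {
    my ($lines, $rules) = @_;
    for my $re (@{ $rules->{unless} || [] }) {
        return 0 if any_line($re, $lines);        # a veto always wins
    }
    for my $re (@{ $rules->{require} || [] }) {
        return 0 unless any_line($re, $lines);    # every REQUIRE must be present
    }
    for my $re (@{ $rules->{where} || [] }) {
        return 1 if any_line($re, $lines);        # one WHERE match is enough
    }
    return 0;
}

# The manual page examples as compiled patterns (qr/.../i is /.../i).
my %rules = (
    where  => [ qr/\bphigs\b/i ],
    unless => [ qr/^From:.*Mad\s+Flamer/i ],
);
```

Under these rules, an article mentioning "PHIGS" is kept. One that only mentions "SunPHIGS" is skipped because of \b, and anything from Mad Flamer is vetoed.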
Sat, 10 May 1997 09:02:09 GMT  
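COLLECT STATISTICS ON PAIRS reports "a simple correlation coefficient" but the manual page does not give the formula. The sketch below uses the phi coefficient for two yes/no variables as an assumed stand-in. It is computed from the counts that newscan is described as collecting, and it reaches 1.0 when the two patterns always occur together. The function name phi is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: phi coefficient of patterns A and B over the scanned
# articles. newscan's own formula is not documented; this is an assumption.
#   $n   articles scanned       $na  articles matching A
#   $nb  articles matching B    $nab articles matching both
sub phi {
    my ($n, $na, $nb, $nab) = @_;
    my $denom = $na * ($n - $na) * $nb * ($n - $nb);
    return if $denom == 0;    # undefined if a pattern matches all or no articles
    return ($n * $nab - $na * $nb) / sqrt($denom);
}
```

Suppose WHERE /gui/i and WHERE /motif/i each match the same 20 of 100 articles. Then phi(100, 20, 20, 20) is 1.0, which flags one of the two patterns as redundant. Independent patterns give values near 0, and mutually exclusive ones give values near -1.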
 