Matching two patterns in a DB. (general expressions...) 

Good day everybody!

I have a database file, and I would like to match two different
patterns.  The DB file is separated like this:

^Belement one^Aelement two^Aelement 3^Aelement 4^Aelement 5^A and etc.

I want to match the first element and the fifth one.

So far I have written the following code:

open(LOG,"$log");
$line=(<LOG>);

while($line ne ""){

if($line=~/^[^B]\b$element1\b[^A]\S[^A]\S[^A]\S[^A]$element2[^A]/){
            $matched_line=$line;
        }
        $line=(<LOG>);

}

close(LOG);

It does not work for me.  However, when I take out everything after
the first [^A], the first part works.  Is using \S good here?
Should I have a \ in front of the ^?

I would greatly appreciate any help you can send my way..

Thanks a million,

Dave Leger



Sat, 08 May 1999 03:00:00 GMT  

Comments. 1. [^A] is a character class. It matches any *single* character,
             which isn't "A".
          2. \S matches a *single* character which isn't a "white-space"
             character.

In order to match multiple non-space characters, you would use \S+ (if there
must be at least one), or \S* (if it's permissible to have none). These
patterns will find the longest string for which this regex is true; to find
the shortest one use \S+? etc.
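
A quick way to see the difference between these quantifiers is to run them against a short string. The following is only a sketch: the sample string is made up, and it assumes the delimiter is the two literal characters ^A rather than a control character.

```perl
use strict;
use warnings;

# Hypothetical sample: three fields separated by the literal characters "^A".
my $s = '^Aone^Atwo^Athree^A';

# [^A] is a character class: exactly one character that is not "A".
my ($single) = $s =~ /([^A])/;

# Greedy: \S+ takes as much as it can and still lets the final \^A match.
my ($greedy) = $s =~ /\^A(\S+)\^A/;

# Minimal: \S+? stops at the first "^A" that lets the match succeed.
my ($minimal) = $s =~ /\^A(\S+?)\^A/;

print "single:  $single\n";    # ^
print "greedy:  $greedy\n";    # one^Atwo^Athree
print "minimal: $minimal\n";   # one
```

Note that \S never matches a space, so on a field such as "element one" from the question, \S+ would stop at the space. A pattern such as .+? is one way around that.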

In brief, to make your code work, you should remove the [] characters (I think
you want to match ^A literally) and put +? after each \S. \S+? matches
minimally, so ^A\S+?^A would match between a pair of adjacent ^A's; if you
leave out the ?, it will match longer strings. Inside a pattern the literal ^
must be escaped, so this is written \^A\S+?\^A.
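
Applied to the test in the question, that advice might look like the sketch below. It is an illustration under assumptions, not a drop-in fix: the search terms are hypothetical, ^A and ^B are taken as literal two-character sequences, and [^^]*? is used in place of \S+? so that fields may contain spaces (it assumes fields never contain a caret).

```perl
use strict;
use warnings;

# Hypothetical search terms and the sample record from the question.
my ($element1, $element2) = ('element one', 'element 5');
my $line = '^Belement one^Aelement two^Aelement 3^Aelement 4^Aelement 5^A and etc.';

# \Q...\E quotes any regex metacharacters in the search terms.
# (?:[^^]*?\^A){3} skips fields two to four, one "field plus ^A" at a time.
my $matched = ($line =~ /^\^B\Q$element1\E\^A(?:[^^]*?\^A){3}\Q$element2\E\^A/) ? 1 : 0;

print $matched ? "matched\n" : "no match\n";
```

Because every skipped field is forbidden from containing a caret, the pattern cannot slide past a separator and find $element2 in the wrong field.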

I'd write (as a first pass)

open (LOG, "<$log") or die "Can't open LOGFILE, $log: $!";
while (<LOG>) {
   # Capture the first and fifth fields; skip lines that do not match.
   my ($first, $second) = (/^\^B(.+?)(?:\^A.*?){3}\^A(.*?)\^A/) or next;
   print "First is $first\n";
   print "Second is $second\n";
}
close (LOG);

I haven't tested this, however.

(/^\^B(.+?)(?:\^A.*?){3}\^A(.*?)\^A/)

I'll explain this.

The first ^ matches the start of the line.
\^B matches the literal "^B".
(.+?) matches as few characters as possible (but at least one) before the rest
of the pattern can match, so it captures the first field, up to the first "^A".
(?: ... ) groups without capturing (nothing is saved for later use as a
back-reference). \^A.*? inside this construct matches "^A" and the text up to
the next "^A". The {3} after the closing bracket repeats the group three times,
so we match and ignore fields two to four.
The next \^A matches the fourth "^A" (i.e. the start of the fifth field). We do
want this field, so we surround its .*? with a pair of brackets. The final \^A
is the start of the sixth field.

Our pattern match returns two strings, the first and the fifth fields, which
we can assign to $first and $second.
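
Since the pattern above was posted untested, here is a small self-contained check against the sample record from the question. It assumes that ^B and ^A are literal two-character sequences; the variable names $record and $fifth are only for illustration.

```perl
use strict;
use warnings;

# The sample record from the question, with "^B" and "^A" taken literally.
my $record = '^Belement one^Aelement two^Aelement 3^Aelement 4^Aelement 5^A and etc.';

# In list context the match returns the two captured groups,
# or an empty list if the record does not match.
my ($first, $fifth) = $record =~ /^\^B(.+?)(?:\^A.*?){3}\^A(.*?)\^A/;

if (defined $first) {
    print "First is $first\n";   # element one
    print "Fifth is $fifth\n";   # element 5
} else {
    print "No match\n";
}
```

If the file actually uses the control characters Ctrl-B and Ctrl-A, the same pattern should work with \cB and \cA in place of \^B and \^A.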

Bob


--
All is flux, nothing is still; nothing endures but change
- Heraclitus



Sun, 09 May 1999 03:00:00 GMT  

Depending on the size of the database, I recommend you do a "screening"
search, then check whether the fields correspond only if the line matches
the screening search.

You have some regex problems as well. You really don't need the regex --
split() will work better for your format.

Try something like this:

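while(<LOG>) {
        # Here is the screening search.  If you have more
        # patterns than this, I recommend using the match_any
        # routine from the FAQ.
        next unless /$element1/o;
        next unless /$element2/o;

        $line = $_;
        s/^\cB//;  # Strip your leading CTRL-B

        # The split and the action taken on a match are missing from
        # the post.  This reconstruction assumes CTRL-A separators and
        # borrows $matched_line from the question.
        @fields = split /\cA/;
        $matched_line = $line
                if $fields[0] eq $element1 and
                   $fields[4] eq $element2;
}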


the search. If you are sure that your elements will always be in
sequence, you could replace the first two lines with /$element1.*$element2/o.
(The /o is needed to prevent recompiling the regex on every loop
iteration.)
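
For reference, the split-based comparison can be tried on its own with the sketch below. It is not the code from the post: the record_matches helper, the search terms and the sample record are hypothetical, and it assumes Ctrl-B starts a record and Ctrl-A separates the fields.

```perl
use strict;
use warnings;

# Hypothetical search terms and a sample record using real control characters.
my ($element1, $element2) = ('element one', 'element 5');
my $record = "\cBelement one\cAelement two\cAelement 3\cAelement 4\cAelement 5\cA and etc.\n";

# Return 1 if the first and fifth fields equal the wanted strings, else 0.
sub record_matches {
    my ($line, $want_first, $want_fifth) = @_;
    chomp $line;
    $line =~ s/^\cB//;                  # strip the leading Ctrl-B
    my @fields = split /\cA/, $line;    # one list element per field
    return 0 unless @fields >= 5;       # too few fields to compare
    return ($fields[0] eq $want_first && $fields[4] eq $want_fifth) ? 1 : 0;
}

print record_matches($record, $element1, $element2) ? "match\n" : "no match\n";
```

Comparing with eq means the search terms need no escaping. If they are also interpolated into a screening regex, wrapping them as \Q$element1\E keeps any metacharacters in them from being treated as pattern syntax.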

--
Regards,
Mike Heins     [mailed and posted]  http://www.iac.net/~mikeh
Internet Robotics
Oxford, OH  45056
513.523.7621 FAX 7501



Sun, 09 May 1999 03:00:00 GMT  
 