Multi-line pattern matching in 5.001n 

: as an example, take this address....

: FooFoo Widgets, Inc. (AD213412)
:       66 FooFoo Avenue
:       West Podunk, MA 02155
:       USA

: All this information will appear in every record except for the country,
: which might or might not appear

: I'd like to split this into

: Name: FooFoo Widgets, Inc
: Account: AD213412
: Street: 66 FooFoo Avenue
: Town: West Podunk
: State: MA
: Zip: 02155
: Country: USA

I'm using 5.001m, but this works for me:

-----------------------------------------
#!/usr/bin/perl -w
#               ^^    _always_ enable warnings from perl

$_ =<<ENDADR;
FooFoo Widgets, Inc. (AD213412)
        66 FooFoo Avenue
        West Podunk, MA 02155
        USA
ENDADR
&formatAddr();

$_ =<<ENDADR;
FooFoo Widgets, Inc. (AD213412)
        66 FooFoo Avenue
        West Podunk, MA 02155
ENDADR
&formatAddr();

$_ = "wattabummer";
&formatAddr();

###########################
sub formatAddr {
   if (/([^(]+)\(([^)]+)\)\s*(.*)\s*([^,]+),\s*([A-Z]{2})\s*(.*)\s*(.*)/) {
      print <<ENDREC;
Name: $1
Account: $2
Street: $3
Town: $4
State: $5
Zip: $6
Country: $7

ENDREC
   }
   else {
      warn "formatAddr did not match address\n";
   }
}

-----------------------------------------

--
  Tad McClellan,      Logistics Specialist (IETMs and SGML guy)

  Interesting trivia: If you took all the sand in North Africa and spread
     it out... it would cover the Sahara desert.



Tue, 14 Jul 1998 03:00:00 GMT  
I've been having a horrible time trying to get some multi-line matching
regexps working in perl 5.001n.  I'm at a loss as to why it's not
working...

as an example, take this address....

FooFoo Widgets, Inc. (AD213412)
        66 FooFoo Avenue
        West Podunk, MA 02155
        USA

All this information will appear in every record except for the country,
which might or might not appear.

I'd like to split this into

Name: FooFoo Widgets, Inc
Account: AD213412
Street: 66 FooFoo Avenue
Town: West Podunk
State: MA
Zip: 02155
Country: USA

I've tried every conceivable method, but can't get the regexp to match
the whole record.  For various reasons, I can't do it line by line; I
need to do a pattern match on the whole record (there's more to each
record, but the address is a good start).

Any ideas?  \n and \r in the regexps didn't seem to work, nor did their
hex values, nor did \s...

  Matthew E Cable  /  Senior Systems Administrator

   (617) 864-7800  /  Cambridge, MA



Tue, 14 Jul 1998 03:00:00 GMT  

Quote:
>as an example, take this address....
> FooFoo Widgets, Inc. (AD213412)
> 66 FooFoo Avenue
> West Podunk, MA 02155
>            USA

and then do this to it....

Quote:
>sub formatAddr {
>   if (/([^(]+)\(([^)]+)\)\s*(.*)\s*([^,]+),\s*([A-Z]{2})\s*(.*)\s*(.*)/) {
>     print <<ENDREC;
>     Name: $1
>     Account: $2
>     Street: $3
>     Town: $4
>     State: $5
>     Zip: $6
>     Country: $7
>     ENDREC
>        }

As much as I admire that regexp, I wonder how useful it is in
the real world.  Maintaining it over time might be difficult for,
say, a perl novice.

I have been shredding vendor-supplied documents into flat files for a
while, and my strategy has been to define a beginning and an end for each
record, plus a test to check that the next beginning really is a new
record and not some sort of garbage.  All it takes is an on/off flag
variable.
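
A minimal sketch of that on/off approach, in the thread's own Perl. The start test (an unindented line ending in a parenthesised account code), the end test (a blank line), the junk lines in the sample input, and the name split_records are all assumptions made for this illustration; real vendor files will need their own tests. It also assumes a perl 5 recent enough for "foreach my".

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: group lines into records using an on/off flag.
# Assumed for illustration: a record starts on an unindented line that
# ends in "(ACCOUNT)" and ends at the next blank line.
sub split_records {
    my @lines = @_;
    my (@records, $current);
    my $in_record = 0;                       # the on/off variable

    foreach my $line (@lines) {
        if (!$in_record) {
            if ($line =~ /^\S.*\([A-Z0-9]+\)\s*$/) {
                $in_record = 1;              # a real beginning: switch on
                $current   = $line;
            }
            # anything else seen while "off" is garbage and is skipped
        }
        elsif ($line =~ /^\s*$/) {
            push @records, $current;         # end of record: switch off
            $in_record = 0;
        }
        else {
            $current .= $line;               # still inside the record
        }
    }
    push @records, $current if $in_record;   # input ended mid-record
    return @records;
}

my @input = (
    "vendor header junk\n",
    "FooFoo Widgets, Inc. (AD213412)\n",
    "        66 FooFoo Avenue\n",
    "        West Podunk, MA 02155\n",
    "        USA\n",
    "\n",
    "-- page break --\n",
    "FooFoo Widgets, Inc. (AD213412)\n",
    "        66 FooFoo Avenue\n",
    "        West Podunk, MA 02155\n",
);

my @records = split_records(@input);
print scalar(@records), " records\n";        # prints "2 records"
```

Each element of @records is then one whole multi-line record in a single scalar, which is the form a whole-record pattern match needs.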

Also, some switch or other rings a bell regarding the newline
aspect; perl /would have to|does/ require a whole different way of
looking at things, considering how it gobbles up lines compared with the
original spec, I think (awk, sed?).
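
The switch half-remembered here may be one of the regexp modifiers /s and /m; that identification is an assumption, not something stated in the thread. A small sketch of what those two modifiers change, using the address from the original question (the variable names are illustrative only):

```perl
#!/usr/bin/perl -w
use strict;

my $record = "FooFoo Widgets, Inc. (AD213412)\n"
           . "        66 FooFoo Avenue\n"
           . "        West Podunk, MA 02155\n";

# Without /s, "." stops at a newline, so ".*" cannot reach the next line.
my $dot_plain = ($record =~ /Inc\..*Avenue/)  ? 1 : 0;

# With /s, "." matches newlines as well.
my $dot_s     = ($record =~ /Inc\..*Avenue/s) ? 1 : 0;

# "\s" matches a newline with or without /s.
my $ws        = ($record =~ /\)\s+66/)        ? 1 : 0;

# Without /m, "^" anchors only at the start of the string; with /m it
# also anchors right after every embedded newline.
my $caret_plain = ($record =~ /^\s+West/)  ? 1 : 0;
my $caret_m     = ($record =~ /^\s+West/m) ? 1 : 0;

# prints "dot: 0, dot/s: 1, \s: 1, ^: 0, ^/m: 1"
print "dot: $dot_plain, dot/s: $dot_s, \\s: $ws, ",
      "^: $caret_plain, ^/m: $caret_m\n";
```

None of this helps unless the whole record is in the string being matched; a pattern applied to one input line at a time never sees the following lines.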

poohbear



Mon, 20 Jul 1998 03:00:00 GMT  

Quote:

>>as an example, take this address....

>> FooFoo Widgets, Inc. (AD213412)
>> 66 FooFoo Avenue
>> West Podunk, MA 02155
>>            USA

>and then do this to it....

>>sub formatAddr {
>>   if (/([^(]+)\(([^)]+)\)\s*(.*)\s*([^,]+),\s*([A-Z]{2})\s*(.*)\s*(.*)/) {
>>         print <<ENDREC;
>>         Name: $1
>>         Account: $2
>>         Street: $3
>>         Town: $4
>>         State: $5
>>         Zip: $6
>>         Country: $7
>>         ENDREC
>>            }

>As much as I admire that regexp, I wonder how useful it is in
>the real world.  Maintaining it over time might be difficult for,
>say, a perl novice.

Then use an extended regexp and comment it:

($name, $account, $number, $street, $town, $state, $zip, $country) =
/
    (.*?) \(    (?# everything up to but not including the first paren)
    (.*?) \)    (?# account number terminated by, but not including, a paren)
    \s*         (?# skip any whitespace, newlines included)
    (.*?) \s+   (?# stuff up to but not including whitespace, then skipped)
    (.*?) \n    (?# the rest of the line)
    \s*         (?# skip the next line's leading whitespace)
    (.*?) ,     (?# stuff up to but not including the next comma)
    \s*         (?# skip whitespace)
    ([A-Z]{2})  (?# two upper case letters)
    \s*         (?# skip whitespace)
    (.*?) \n    (?# rest of the line)
    (?: \s* (.*?) \n )?  (?# optional country line, minus leading whitespace)
    $           (?# must be no more)
/xs;

This particular regexp still isn't flexible enough, though.
In particular, that zip code format stuff is restricted to only the
US and perhaps a few other countries (Israel?). In other words,
the only reason I'm following up is to remind people that you
*can* make regexps maintainable with perl5's /.../x regexps.
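
For anyone who wants to try the idea end to end, here is a self-contained sketch along the same lines. It is not the regexp posted above: it uses end-of-line # comments (also allowed under /x) rather than (?#...), keeps the house number and street together, and treats the country line as optional, as the original question asked. The name parse_address and the exact field rules are assumptions for illustration, it still assumes US-style "Town, ST zip" lines, and \z needs perl 5.005 or later.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper, not from the thread: returns the seven fields,
# or an empty list when the record does not match. The country field
# is undef when the record has no country line.
sub parse_address {
    my ($record) = @_;
    my @fields = $record =~ /
        ^ \s* ([^(]+?) \.? \s*       # name, without the trailing dot or spaces
        \( ([^)]+) \) \s*            # account code inside the parens
        ([^\n]+?) \s* \n \s*         # street line
        ([^,\n]+) , \s*              # town, up to the comma
        ([A-Z]{2}) \s+               # two-letter state
        (\S+) [^\S\n]* \n?           # zip, then the end of that line
        (?: \s* (\S[^\n]*?) \s* )?   # country line, if there is one
        \z                           # nothing else may follow
    /x;
    return @fields;
}

my $base = "FooFoo Widgets, Inc. (AD213412)\n"
         . "        66 FooFoo Avenue\n"
         . "        West Podunk, MA 02155\n";

foreach my $record ($base . "        USA\n", $base) {
    my ($name, $account, $street, $town, $state, $zip, $country)
        = parse_address($record);
    $country = '(none)' unless defined $country;
    print "$name | $account | $street | $town | $state | $zip | $country\n";
}
```

One thing to watch with # comments under /x: the comment text must not contain the regexp delimiter, and a $ or @ inside a comment is still interpolated, so the comments above avoid all three.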

--Malcolm

--

Oxford University Computing Services
"Widget. It's got a widget. A lovely widget. A widget it has got." --Jack Dee



Tue, 21 Jul 1998 03:00:00 GMT  
 