pattern matching; see content file as one string 

Hi,

I've been trying and trying, but I don't seem able to get this
working...

I've got a source file, which looks like this:

gk18neel; ;ZFR000003004;ZFR-T  <QL><QL></KOPZFR/3/17>
'Betizing om in Gryk' is in <QL>fleurich  stik dat gâns oandacht
ferget<QL><QL><QL></TXTZFR/1/9><QL><QL><QL></TXTZFR/1/9><QL><QL>YCHTENBREGE
- It heart net sa dat de ôfrin fan in toanielstik forteld wurdt, mar
it moat diz'kear dochs mar: Neeltsje en Klaes krije inoar! Dat is de
tige forsjenbere en winske ôfrin fan 'Betizing om in Gryk'. It is ek
sahwat it iennichste dat wis is yn it stik. Fierders is it echt in
greate tizeboel.<QL><QL>Durk fan Houten (Jan de Jong) is fabrikant
fan boerejonges. Dêrfoar hat hy rezinen nedich e

Well... after the </KOPZFR/3/17> there is a line break, but the rest is
one line.

I want the script to pick out the stuff between </KOPZFR/3/17> and the
first </TXTZFR/1/9>, but those numbers in these examples can change
with other (source)files, so I thought:

open (SOURCE, "$dir/$file")
or die "Couldn't open $file: $!\n";
$_ = <SOURCE> ;
close (SOURCE);

while ( m|</kopzfr/.+?>(.+?)</txtzfr/.+?>|is ) {
print $1;
}

But it doesn't work. I played with it a bit (well, not a bit...) and I
think it just doesn't see whatever is in the file as one string. When
I simplified it to

while ( m|</k(.+?)fr/|is ) {
print $1;
}

it was printing " opfz " alright, but over and over again. But at
least now it found something.

Can somebody help me out?

Thanks,

Lex
Lex Thoonen
http://www.*-*-*.com/



Wed, 28 Jul 2004 16:17:54 GMT  

Quote:

>I've been trying and trying, but I don't seem able to get this
>working...

You are fixated on the pattern and the pattern is not where
your problem is.

There are *two* things to consider when a pattern match is not
working as you expect it to: the pattern, and the string that
the pattern is to be matched against.

You have a problem with the string that is to be matched against.

Quote:
>I've got a source file, which looks like this:

>gk18neel; ;ZFR000003004;ZFR-T  <QL><QL></KOPZFR/3/17>
>'Betizing om in Gryk' is in <QL>fleurich  stik dat gâns oandacht
>ferget<QL><QL><QL></TXTZFR/1/9><QL><QL><QL></TXTZFR/1/9><QL><QL>YCHTENBREGE
>well... After the </KOPZFR/3/17> there is a line break, but the rest is

                                                                      ^^

Quote:
>one line.

No it isn't. The rest is on several lines. Did your posting software
break it for you? How helpful (not!).

Quote:
>I want the script to pick out the stuff between </KOPZFR/3/17> and the
>first </TXTZFR/1/9>,
>$_ = <SOURCE> ;

That reads a single line into $_.

Since one thing you want to match is on one line, and the other
thing is on another line, it will never match if you have only
a single line string to match against.
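
To make that concrete, here is a small sketch, not the original poster's script. The sample string, the in-memory filehandle, and the use of [^>]+ in place of .+? inside the tags are all assumptions chosen so the example runs on its own. It tries the same kind of pattern against the first line only and against the full contents:

```perl
use strict;
use warnings;

# Hypothetical two-line sample modelled on the posted data.
my $data = "header <QL></KOPZFR/3/17>\nbody text<QL></TXTZFR/1/9><QL>rest\n";

# An in-memory filehandle stands in for the real source file.
open my $fh, '<', \$data or die "Couldn't open in-memory file: $!\n";
my $first_line = <$fh>;    # scalar context: reads up to the first newline only
close $fh;

# Same idea as the posted pattern; [^>]+ keeps each tag match inside its own tag.
my $pattern = qr{</kopzfr/[^>]+>(.+?)</txtzfr/[^>]+>}is;

my $line_matches  = $first_line =~ $pattern ? 1 : 0;
my $whole_matches = $data       =~ $pattern ? 1 : 0;

print "first line only: ", ( $line_matches  ? "match" : "no match" ), "\n";
print "whole contents:  ", ( $whole_matches ? "match" : "no match" ), "\n";
```

The first line contains the opening tag but not the closing one, so only the match against the full contents can succeed.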

Quote:
>while ( m|</kopzfr/.+?>(.+?)</txtzfr/.+?>|is ) {
>print $1;
>}

>But it doesn't work.

        ^^^^^^^^^^^^^

What does "doesn't work" mean?

That is a worthless description of your observations of the
code's behavior.

What did you observe that leads you to believe that it isn't "working"?

Did it match when you were expecting it to not match?

Did it not match when you were expecting it to match?

Does it match things that you don't want to match?

Does it not match things that you do want to match?

Does it generate messages?

Does it dump core?

??

Quote:
>I played with it a bit (well, not a bit...) and I
>think it just doesn't see whatever is in the file as one string.

Pattern matches do not match against file contents, they match
against a string in memory. It is up to you to load the memory
from the file correctly.

Quote:
>Can somebody help me out?

The docs installed on your very own hard disk can help you out:

   perldoc -q matching

      "I'm having trouble matching over more than one line.  What's wrong?"

--
    Tad McClellan                          SGML consulting

    Fort Worth, Texas



Wed, 28 Jul 2004 17:07:44 GMT  

Quote:

> Hi,

> I've been trying and trying, but I don't seem able to get this
> working...

> I've got a source file, which looks like this:

> gk18neel; ;ZFR000003004;ZFR-T  <QL><QL></KOPZFR/3/17> 'Betizing om in
> Gryk' is in <QL>fleurich  stik dat gâns oandacht
> ferget<QL><QL><QL></TXTZFR/1/9><QL><QL><QL></TXTZFR/1/9><QL><QL>YCHTENBREGE
> - It heart net sa dat de ôfrin fan in toanielstik forteld wurdt, mar it
> moat diz'kear dochs mar: Neeltsje en Klaes krije inoar! Dat is de tige
> forsjenbere en winske ôfrin fan 'Betizing om in Gryk'. It is ek sahwat
> it iennichste dat wis is yn it stik. Fierders is it echt in greate
> tizeboel.<QL><QL>Durk fan Houten (Jan de Jong) is fabrikant fan
> boerejonges. Dêrfoar hat hy rezinen nedich e

> well... After the </KOPZFR/3/17> there is a line break, but the rest is
> one line.

> I want the script to pick out the stuff between </KOPZFR/3/17> and the
> first </TXTZFR/1/9>, but those numbers in these examples can change with
> other (source)files, so I thought:

possibly this could help:

#!/usr/bin/perl -w
use strict;
use diagnostics;

open (INFILE, $ARGV[0]) or die "could not open '$ARGV[0]' $!";
while (<INFILE>)
{      
    # [^>]* stops at each tag's own '>'; a greedy .* would delete up to the last '>' on the line
    s#(</KOPZFR/[^>]*>|</TXTZFR/[^>]*>)##g;
    print;
}

close (INFILE);

--
Hou in het hele land rekening met aanvriezende mist in de ardennen.



Wed, 28 Jul 2004 18:42:08 GMT  

Quote:

> Hi,

> I've been trying and trying, but I don't seem able to get this
> working...

> I've got a source file, which looks like this:

> gk18neel; ;ZFR000003004;ZFR-T  <QL><QL></KOPZFR/3/17>
> 'Betizing om in Gryk' is in <QL>fleurich  stik dat gâns oandacht
> ferget<QL><QL><QL></TXTZFR/1/9><QL><QL><QL></TXTZFR/1/9><QL><QL>YCHTENBREGE
> - It heart net sa dat de ôfrin fan in toanielstik forteld wurdt, mar
> it moat diz'kear dochs mar: Neeltsje en Klaes krije inoar! Dat is de
> tige forsjenbere en winske ôfrin fan 'Betizing om in Gryk'. It is ek
> sahwat it iennichste dat wis is yn it stik. Fierders is it echt in
> greate tizeboel.<QL><QL>Durk fan Houten (Jan de Jong) is fabrikant
> fan boerejonges. Dêrfoar hat hy rezinen nedich e

> well... After the </KOPZFR/3/17> there is a line break, but the rest is
> one line.

> I want the script to pick out the stuff between </KOPZFR/3/17> and the
> first </TXTZFR/1/9>, but those numbers in these examples can change
> with other (source)files, so I thought:

> open (SOURCE, "$dir/$file")
> or die "Couldn't open $file: $!\n";

In this case $_ contains only the first line of the file.
Set $/ to undef before the read so that the whole file is read into $_.

local ($/) = undef;
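
A minimal sketch of that slurp, assuming a lexical filehandle and an in-memory sample instead of the real "$dir/$file" (both are stand-ins so the snippet runs on its own). Wrapping the read in a do block keeps the change to $/ from leaking into the rest of the script:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real source file.
my $data = "line one </KOPZFR/3/17>\nline two </TXTZFR/1/9>\n";

open my $fh, '<', \$data or die "Couldn't open in-memory file: $!\n";
my $content = do {
    local $/;    # undef inside this block only: no record separator
    <$fh>;       # a single read now returns everything up to end of file
};
close $fh;

# Count the newlines that made it into the string.
my $newlines = () = $content =~ /\n/g;
print "read ", length($content), " characters containing $newlines newlines\n";
```

Outside the do block $/ is back to its previous value, so any later line-by-line reads behave as usual.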

Quote:
> $_ = <SOURCE> ;
> close (SOURCE);

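The g modifier should be added (...|isg) so that the loop prints
every match. Without it, each test in the while condition starts again
from the beginning of the string, so the same match is found over and
over.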
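
A short sketch of the loop with the g modifier, using a made-up string with two header/body pairs (the string, the @bodies array, and [^>]+ in place of .+? inside the tags are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical input with two header/body pairs; the tag numbers differ on purpose.
my $text = "</KOPZFR/3/17>\nfirst body</TXTZFR/1/9>"
         . "</KOPZFR/4/2>second body</TXTZFR/2/5>";

my @bodies;
# With /g in scalar context each iteration resumes where the previous match
# ended, so the loop stops once no further match is found.
while ( $text =~ m|</kopzfr/[^>]+>(.+?)</txtzfr/[^>]+>|isg ) {
    push @bodies, $1;
}

print scalar(@bodies), " matches\n";
print "[$_]\n" for @bodies;
```

The s modifier lets the dot match the newline after the first header tag, so that newline is part of the first captured body.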

Quote:
> while ( m|</kopzfr/.+?>(.+?)</txtzfr/.+?>|is ) {
> print $1;
> }

> But it doesn't work. I played with it a bit (well, not a bit...) and I
> think it just doesn't see whatever is in the file as one string. When
> I simplified it to

> while ( m|</k(.+?)fr/|is ) {
> print $1;
> }

> it was printing " opfz " alright, but over and over again. But at
> least now it found something.

> Can somebody help me out?

> Thanks,

> Lex
> Lex Thoonen
> http://www.peng.nl

Vladimir Sarkisov


Wed, 28 Jul 2004 19:38:24 GMT  
 