recursive regexp 

I have a file on a single line:

xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.

I need to extract string1, string2 ...string20 and put them on a new file.
How to do this?

Thanks

Luciano



Sun, 16 May 2004 18:53:13 GMT  
 recursive regexp

Quote:
> I have a file on a single line:

What, pray tell, is a "file" on a single line?

Quote:
> xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
> text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.

> I need to extract string1, string2 ...string20 and put them on a new
> file. How to do this?

Suppose you had that line stored in "my $stringy", then you might try this:

while ($stringy =~ /<prefix>(.+?)<suffix>/mgi) {
    print "Hi, I am $1\n";
}

Omit "i" (case-insensitive) and "m" at your discretion. Note that "m" only
changes where ^ and $ match; to let "." match across newline chars you would
use "s" instead.
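
A small check of what those two modifiers do, as a sketch only. The value in $text is made up for the example: a wanted string with a newline inside it. With "m" alone the "." still stops at the newline; with "s" it does not.

```perl
use strict;
use warnings;

# Hypothetical sample: the wanted string contains a newline.
my $text = "<prefix>two\nlines<suffix>";

# With /m alone, "." still refuses to match "\n", so this match fails.
my $matched_with_m = ($text =~ /<prefix>(.+?)<suffix>/m) ? 1 : 0;

# With /s, "." matches any character including "\n", so this one succeeds.
my ($captured_with_s) = $text =~ /<prefix>(.+?)<suffix>/s;

print "m alone matched: $matched_with_m\n";
print "s captured: $captured_with_s\n";
```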

- Mark



Sun, 16 May 2004 20:29:29 GMT  
 recursive regexp

Quote:

> Subject: recursive regexp

Nothing to do with recursion here.

Quote:
> I have a file on a single line:

> xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
> text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.

> I need to extract string1, string2 ...string20 and put them on a new file.
> How to do this?

perl -ne 'print /<prefix>(.*?)<suffix>/g' <a_file >a_new_file
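
As written, the one-liner prints the captured strings back to back with nothing between them. Here is a minimal sketch of the same list-context match that puts each string on its own line. The value in $line is made up and stands in for the contents of the input file.

```perl
use strict;
use warnings;

# Hypothetical sample line standing in for the contents of the input file.
my $line = 'xxx(a)xxx<prefix>string1<suffix>xxx(b)xxx<prefix>string2<suffix>xxx';

# In list context, a /g match returns every captured group at once.
my @found = $line =~ /<prefix>(.*?)<suffix>/g;

# Append a newline to each capture so every string gets its own line.
my $output = join '', map { "$_\n" } @found;
print $output;
```

The same idea should fit on the command line as:
perl -ne 'print "$_\n" for /<prefix>(.*?)<suffix>/g' <a_file >a_new_file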

--
     \\   ( )
  .  _\\__[oo

 .  l___\\
  # ll  l\\
 ###LL  LL\\



Sun, 16 May 2004 20:26:10 GMT  
 recursive regexp

Quote:

> I have a file on a single line:

> xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
> text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.

> I need to extract string1, string2 ...string20 and put them on a new
> file. How to do this?

Assuming <prefix> can be taken to have the unique characteristic of
starting some stringN, I think you could:
open(IN, "fn");
$input = <IN>;
close IN;

open(OUT, ">strings.out");

        print OUT "$_\n" if defined $_;

}

close OUT;

--
RobC
Please note I know only enough perl to be dangerous, certainly not enough
to take what I write as gospel.  My suggestions are just that,
suggestions, not answers.  Don't ever simply cut and paste any code I've
written and run it.  Nonetheless, I hope my contibution was helpful.  :)



Sun, 16 May 2004 21:31:37 GMT  
 recursive regexp


  >> I have a file on a single line:
  >>
  >> xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
  >> text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.
  >>
  >> I need to extract string1, string2 ...string20 and put them on a new
  >> file. How to do this?

  s> open(IN, "fn");

always check open for failure, even in examples. you never know who may
read this.
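
A minimal sketch of the kind of check being asked for, using the three-argument open and a lexical filehandle. The file name is hypothetical and is chosen so that the open fails. In a real script the usual form is open(...) or die "...: $!"; here the message is stored in $status so the example keeps running.

```perl
use strict;
use warnings;

# Hypothetical file name, chosen so that the open is expected to fail.
my $file = 'no_such_file_for_this_example.txt';

# open returns false on failure and leaves the reason in $!.
my $status;
if (open(my $in, '<', $file)) {
    my $input = <$in>;    # read the first (and only) line
    close $in;
    $status = 'opened';
}
else {
    $status = "can't open $file: $!";
}
print "$status\n";
```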



why the assignment to $_? map will just return the same value and that

and what's with the null grab? i think you mean something more like
this:


let the normal booleans work for you and test the regex for success.

  s> open(OUT, ">strings.out");

  s>         print $_\n" if defined $_;

the if is not needed as the code above will not have undef values in the
map results.
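
The replacement code from this reply did not survive in this copy of the thread. What follows is a sketch of the idea described, not the original code: test the regex inside map and return either the capture or the empty list, so no undef values reach the output loop. The sample value in $input and the array name are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample input in the shape the original poster described.
my $input = 'xxx<prefix>string1<suffix>xxx<prefix>string2<suffix>xxx';

# Split on the prefix, then keep the text before the suffix from each piece.
# Returning the empty list () for a failed match drops that piece entirely,
# so no undef values end up in @strings.
my @strings = map { /^(.*?)<suffix>/ ? $1 : () } split /<prefix>/, $input;

print "$_\n" for @strings;
```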

  s> Please note I know only enough perl to be dangerous, certainly not
  s> enough to take what I write as gospel.  My suggestions are just
  s> that, suggestions, not answers.  Don't ever simply cut and paste
  s> any code I've written and run it.  Nonetheless, I hope my
  s> contibution was helpful.  :)

since that is a signature and will be posted often, please spell check
it.

uri

--

-- Stem is an Open Source Network Development Toolkit and Application Suite -
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org



Sun, 16 May 2004 21:45:51 GMT  
 recursive regexp

Quote:

>i think you mean something more like
>this:



You don't need the ?:. Just


will do, as /.../ returns an empty list if the match failed. See:



        ($\, $,) = ("\n", "+");

-->
        1+2+3
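
Parts of the demonstration above are missing from this copy of the thread. A small sketch in the same spirit, with made-up inputs: a match in list context returns its captures on success and the empty list on failure, so map needs no ?: to drop the failures.

```perl
use strict;
use warnings;

# Hypothetical inputs: 'b' has no digit, so its match fails.
my @items = ('a1', 'b', 'c2', 'd3');

# A match in list context returns its captures, or the empty list on failure,
# so the failed element simply vanishes from map's result.
my @digits = map { /(\d)/ } @items;

{
    # Output record separator and output field separator, as in the post.
    local ($\, $,) = ("\n", "+");
    print @digits;    # prints 1+2+3
}
```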

--
        Bart.



Sun, 16 May 2004 22:39:54 GMT  
 recursive regexp


  >> i think you mean something more like
  >> this:
  >>

  BL> You don't need the ?:. Just


  BL> will do, as /.../ returns an empty list if the match failed. See:



  BL>        ($\, $,) = ("\n", "+");

  -->
  BL>        1+2+3

good point. but even better were the earlier regexes that just extracted
the strings. doing a split and then a match on each element is long
winded.

uri




Sun, 16 May 2004 23:35:42 GMT  
 recursive regexp
Quote:

>I have a file on a single line:

>xxxxx(variable text)xxx<prefix>string1<suffix>xxxxx(variable
>text)xxx<prefix>string2<suffix>xxx ... and so on for 20 times.

>I need to extract string1, string2 ...string20 and put them on a new file.
>How to do this?

Sounds like it may be a good job for split. Is that a very close sample of
data?

I'm going to assume that xxxxx(variable text)xxxxxxxx really means just a
bunch of insignificant stuff, not necessarily with xxx's or ( ) in it?

And I'm going to assume that <prefix> and <suffix> really do exist. Is it
the same prefix and suffix for both parts?

Finally, I'm going to assume that the line is "well-formed" in the sense
that every <prefix> really is followed by a <suffix>.

One problem you'll have with a pure regex solution is the greediness of a .*
part you might put in there. For example, with this data:

<prefix>string1<suffix>junk<prefix>string2<suffix>

if you apply a pattern

/<prefix>.*<suffix>/ then it will match everything from the first prefix
to the 2nd suffix. That's probably not what you want ;-)
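
To see the difference, here is a sketch run on that same data. The capture groups are added for the example. The greedy .* swallows the junk in the middle, while the non-greedy .*? used in the earlier replies stops at the first suffix it meets.

```perl
use strict;
use warnings;

my $data = '<prefix>string1<suffix>junk<prefix>string2<suffix>';

# Greedy: .* runs on to the last <suffix> it can reach.
my ($greedy) = $data =~ /<prefix>(.*)<suffix>/;

# Non-greedy: .*? stops at the first <suffix> after each <prefix>.
my @lazy = $data =~ /<prefix>(.*?)<suffix>/g;

print "greedy: $greedy\n";
print "lazy:   @lazy\n";
```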

If you use split you can completely bypass the problem with having to
specify "the bits in between". Split will use your simpler pattern to find
boundaries between bits:
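
The split call that belongs here is missing from this copy of the thread. A minimal sketch using the /<prefix>|<suffix>/ pattern named further down; the sample line and the array name @bits are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample line with filler text around the wanted strings.
my $line = 'aaa<prefix>string1<suffix>bbb<prefix>string2<suffix>ccc';

# Either marker counts as a boundary; split returns what lies between them.
my @bits = split /<prefix>|<suffix>/, $line;

# @bits now holds: aaa, string1, bbb, string2, ccc
print join(', ', @bits), "\n";
```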

If there really are < and > characters around the prefixes and suffixes (are
you processing HTML?) then you can combine the pattern:

    /<(prefix|suffix)>/

However, that's a bit defective, because the ( ) used to group the two parts
will seriously affect the results coming out of split. In particular, the
<prefix> and <suffix> will be returned as part of the array, and that's just
noise if you don't want to extract those bits. This happens because ( ) is
used to capture parts of a pattern match, and split takes care to return
them if you act like you want to capture them. The pattern can be "fixed"
with

    /<(?:prefix|suffix)>/

because (?: something ) is special regex syntax saying "group this stuff
into a unit but don't remember it".
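
A short sketch of that difference, with a made-up sample line: a capturing group in the split pattern puts the captured text into the result list, and the (?: ) form does not.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = 'aaa<prefix>string1<suffix>bbb';

# Capturing group: the captured delimiter text is returned as extra elements.
my @with_capture = split /<(prefix|suffix)>/, $line;
# @with_capture holds: aaa, prefix, string1, suffix, bbb

# Non-capturing group: only the pieces between the delimiters come back.
my @without_capture = split /<(?:prefix|suffix)>/, $line;
# @without_capture holds: aaa, string1, bbb

print scalar(@with_capture), " vs ", scalar(@without_capture), " elements\n";
```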

If you agree that this final pattern is uglier and barely worth the effort
(after all, it now doesn't save you any typing) then I'd stick with the
original /<prefix>|<suffix>/

Of course, if you really are matching some simple HTML, then usually the
suffix is just /prefix so you can use this pattern:

    /<\/?cmd>/

Anyway, I digress slightly.

Split will return all the bits on either side of the pattern. So, for your
data you'll get a list returned where every 2nd element is interesting and
the others are not interesting. So:



hmmmmmm - I'm looking for a one-line way to extract the bits. But I can't
think of something sweet and simple and effective. Possibly a really crazy
one-liner could be worked out but that's not pragmatic programming no matter
how much fun it is. So I'll use a simple loop:




    }
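
Most of the loop above is missing from this copy of the thread. A sketch of a simple loop along the lines described, which steps through the split result and keeps every 2nd element; the sample line and variable names are assumptions, not the original code.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = 'aaa<prefix>string1<suffix>bbb<prefix>string2<suffix>ccc';

my @bits = split /<prefix>|<suffix>/, $line;

# The interesting pieces sit at the odd indexes: 1, 3, 5, ...
my @wanted;
for (my $i = 1; $i < @bits; $i += 2) {
    push @wanted, $bits[$i];
}

print "$_\n" for @wanted;
```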


Actually, I'm starting to think of something that's almost a one-liner, and
it involves the grep operator, but it will only be a re-hash of the for loop
so I don't see the point. No doubt this is but a small part of your total
problem so it's time to move on and do the real work, right?
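
For the curious, one guess at what a grep version could look like: an array slice whose index list is filtered down to the odd positions. This is a sketch only and may not be what the author had in mind; the sample line and names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = 'aaa<prefix>string1<suffix>bbb<prefix>string2<suffix>ccc';

my @bits = split /<prefix>|<suffix>/, $line;

# grep keeps the odd indexes; the slice pulls those elements out of @bits.
my @wanted = @bits[ grep { $_ % 2 } 0 .. $#bits ];

print "$_\n" for @wanted;
```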
--
Space Corps Directive #723
Terraformers are expressly forbidden from recreating Swindon.
    -- Red Dwarf



Mon, 17 May 2004 02:42:41 GMT  
 