regexp: matching at least n chars out of a string of length m 
Author Message
 regexp: matching at least n chars out of a string of length m

 > $window = 30;
 > $want = 20;
 >

 >   my $i;
 >   my $result = "";
 >   for (i=0;i<length($string)-$window; $i++) {
 >     my $substring = substr($string,$i,$window);
*>     if ($substring =~ /some_magical_regexp/) {
|> #            stuff the results into $result;
|>             substr($result,$i,1) = "1";
|>     }
|>   }
|> }
|>
|> but I can't seem to figure out the right regexp.  Am I barking up the
|> wrong tree?
|
|  How about:
|--> if ($substring =~ /:{$want,}/) {

This should find at least $want occurrences of the previous character...
: in this case.

--
"Security through obscurity is no security at all."
                -comp.lang.perl.misc newsgroup posting

------------------------------------------------------------------------


* Schlumberger Geco-Prakla   (281) 596-1434 (Office Number)            *
* 1325 S. Dairy Ashford      (281) 596-1807 (Fax)                      *
* Houston, TX 77077                                                    *
------------------------------------------------------------------------



Mon, 27 Mar 2000 03:00:00 GMT  
 regexp: matching at least n chars out of a string of length m


Quote:
>Elijah
>------
>prechecking for zero colons in the whole thing might be useful

i knew i was missing something - my script should have checked the
portion of the string left after moving the window to see if there
were enough colons for a possible match.  this could save loads of
time in long strings but maybe increase the time for shorter string.

i had to read this after i sent mine :(

--

NY.pm - New York Perl M((o|u)ngers|aniacs)*  <URL:http://ny.pm.org/>
CGI Meta FAQ <URL:http://computerdog.com/CGI_MetaFAQ.html>
--still wondering about alt.fan.e-t-b



Mon, 27 Mar 2000 03:00:00 GMT  
 regexp: matching at least n chars out of a string of length m

Quote:

>I have  a collection of strings which contain many occurances of the
>character ':'.  I'd like to know, for each string, all the substrings of
>length $window than contain $want colons.

i think i understand this problem, so let me explain it back to you and
you tell me if i got it wrong:  for a substring of length N, find all
substrings that have *at least* M :'s, but perhaps more.  the : can be
anywhere in the substring.  the substring has to be of length N - it
can't be shorter or longer.

(i hope that is right)

given that, and knowing that using matching operators will be horribly
inefficient, i propose a solution which i think is close to linear in
length of the substring (my suspicion is that it is a little less than
linear for most cases).

the philosophy of the algorithm is to start at the beginning of the
string and check the substring of length N for the right number of
occurances of :.  if the right number are there, remember the offset
for that substring, then move the window along and check again, and
so on.

i haven't tested this extensively, so there might be quirks, but i
need to get back to the work i'm getting paid to do :)

#!/usr/bin/perl

#forgive me, i'm not a computer scientist...

my $string = 'asdF:asdf:safg:sg:saf:fsad::sfg:sdgfsdfg:::sdfg:sdfg:sfd';

my $window_size   = 15;
my $number_needed = 5;
my $offset        = 0;
my $length        = length $string;

while( ($length - $offset) >= $window_size )
   {  
   $current_window = substr($string, $offset, $window_size);

   $number_found = ($current_window =~ tr/:/:/);

   #if we found less than we needed, we can move the window
   #ahead by the number of extra finds needed since the intervening
   #substrings won't match.  e.g. if we found M-2, moving
   #the window one place will match *at most* M-1, so go ahead
   #and skip that one too.
   if( ($difference = ($number_needed - $number_found) ) > 0 )
      {
      $offset += $difference;
      }
   #if we found the exact number that is needed, remember this
   #starting position and advance the window by one place
   elsif( $difference == 0 )
      {

      }
   #if we found more than that needed, skip ahead to the
   #next place where there might be a possible match
   # e.g. if we found M+2, then we can move the window
   #two places ahead since moving
   #the window by one place will find *at least* M+1, and
   #so on.  we just need to record the offsets
   elsif( $difference < 0 )
      {
      while( $difference < 0 )
         {

         $difference++;
         }
      }
   }




   {
   print substr($string, $_, $window_size), "\n";
   }

__END__

There were 5 matches:
g:sg:saf:fsad::
:sg:saf:fsad::s
:saf:fsad::sfg:
fg:::sdfg:sdfg:
g:::sdfg:sdfg:s
:::sdfg:sdfg:sf

--

NY.pm - New York Perl M((o|u)ngers|aniacs)*  <URL:http://ny.pm.org/>
CGI Meta FAQ <URL:http://computerdog.com/CGI_MetaFAQ.html>



Mon, 27 Mar 2000 03:00:00 GMT  
 regexp: matching at least n chars out of a string of length m

Quote:

>I have      a collection of strings which contain many occurances of the
>character ':'.      I'd like to know, for each string, all the substrings of
>length $window than contain $want colons.

>I've been thinking along the lines of

>$window = 30;
>$want = 20;


>  my $i;
>  my $result = "";
>  for (i=0;i<length($string)-$window; $i++) {
>    my $substring = substr($string,$i,$window);
>    if ($substring =~ /some_magical_regexp/) {
>#            stuff the results into $result;
>            substr($result,$i,1) = "1";
>    }
>  }
>}

>but I can't seem to figure out the right regexp.  Am I barking up the
>wrong tree?

Yes. You really should use tr/:// to count the colons.

        if (($substring =~ tr/://)>=$want) { ... }

You can make it match EXACTLY $want colons, by replacing ">=" by "==".

PS. You could use { $substring =~ /(:)/g } too, as this returns an array
of all the matches. Just count them (hint: "scalar").

HTH,
Bart.



Tue, 28 Mar 2000 03:00:00 GMT  
 
 [ 6 post ] 

 Relevant Pages 

1. Newbie: sprintf ('%-20s', 48 char string) returns 48 not 20 length string

2. HELP: deactivating regexp-active chars from string?

3. regexp for strings of chars

4. How can I pattern match a dos formated string (ctrl-m chars)

5. Match any char EXCEPT [char]

6. Regexp multiple string matches

7. Regexp to match a C-style string

8. Tricky: Generate matching string from regexp

9. Regexp fails matches @ string start

10. REGEXP: Problem matching string containing parens or brackets

11. regexp to return list of matches from string

12. reading char by char in a string

 

 
Powered by phpBB® Forum Software