
fast ways to count substrings
Previously, File::Tail would only handle single line entries delmited
with a hardcoded "\n". The program counted lines using tr/\n//. Since
i wanted to handle multicharacter delimiters, i needed to count using
a different technique. File::Tail is often used on rapidly growing
logs, so the developer will not include my patch unless I can verify
that it is close to the speed of tr/\n//. I attempted a variety of
techniques with pure perl and then resorted to Inline.pm in order to
make use of theorectically faster C. The benchmarks from my 200mhz
Pentium are enclosed as well as the benchmarking code including the C
routines. My question to the group - "did I miss any faaast techniques
which can be used to count either single character or multi character
delimiters in a string of data?? the delimiter needs to be variable at
runtime. Actually File::Tail is object oriented, so the delimiter is
encapsulated in the File::Tail object".
Benchmark: timing 300 iterations of c_strncmp, c_strncmp_M, c_strstr, c_strstr_M, m_array,
m_array_M, m_for, m_for_M, m_while, m_while_M, tr...
c_strncmp: 7 wallclock secs ( 7.44 usr + 0.00 sys = 7.44 CPU)
c_strncmp_M: 8 wallclock secs ( 7.31 usr + 0.00 sys = 7.31 CPU)
c_strstr: 1 wallclock secs ( 1.36 usr + 0.00 sys = 1.36 CPU)
c_strstr_M: 1 wallclock secs ( 1.18 usr + 0.00 sys = 1.18 CPU)
m_array: 14 wallclock secs (12.25 usr + 0.00 sys = 12.25 CPU)
m_array_M: 5 wallclock secs ( 4.91 usr + 0.00 sys = 4.91 CPU)
m_for: 12 wallclock secs (11.60 usr + 0.00 sys = 11.60 CPU)
m_for_M: 5 wallclock secs ( 4.91 usr + 0.00 sys = 4.91 CPU)
m_while: 8 wallclock secs ( 8.41 usr + 0.00 sys = 8.41 CPU)
m_while_M: 4 wallclock secs ( 3.94 usr + 0.00 sys = 3.94 CPU)
tr: 2 wallclock secs ( 1.31 usr + 0.00 sys = 1.31 CPU)
=== check counts ===
_m_while_M 2000
_c_strncmp 6000
_m_for 6000
_m_array 6000
_c_strstr 6000
_m_while 6000
_c_strncmp_M 2000
_tr 6000
_m_for_M 2000
_m_array_M 2000
_c_strstr_M 2000
----- snip -----
#!/usr/bin/perl -w
use Benchmark('timethese');
use Inline C;
my $tststr="abcdefghijklmnopqrstuvwxyz01234567890\n";
my $dta=($tststr x 2000).( ($tststr."\n") x 1900).("\n" x 200);
my %cnt=();
my $_tr = sub {
my $cnt=$dta=~tr/\n//;
$cnt{_tr}=$cnt;
Quote:
};
my $_m_while = sub {
my $cnt=0;
my $rcdsep="\n";
while ($dta=~m/$rcdsep/g) {$cnt++};
$cnt{_m_while}=$cnt;
Quote:
};
my $_m_for = sub {
my $cnt=0;
my $rcdsep="\n";
$cnt++ for($dta=~m/$rcdsep/g);
$cnt{_m_for}=$cnt;
Quote:
};
my $_m_array = sub {
my $cnt=0;
my $rcdsep="\n";
$cnt{_m_array}=$cnt;
Quote:
};
my $_m_while_M = sub {
my $cnt=0;
my $rcdsep="\n\n";
while ($dta=~m/$rcdsep/g) {$cnt++};
$cnt{_m_while_M}=$cnt;
Quote:
};
my $_m_for_M = sub {
my $cnt=0;
my $rcdsep="\n\n";
$cnt++ for($dta=~m/$rcdsep/g);
$cnt{_m_for_M}=$cnt;
Quote:
};
my $_m_array_M = sub {
my $cnt=0;
my $rcdsep="\n\n";
$cnt{_m_array_M}=$cnt;
Quote:
};
my $_c_strncmp = sub {
my $rcdsep="\n";
my $cnt=c_strncmp($dta,$rcdsep);
$cnt{_c_strncmp}=$cnt;
Quote:
};
my $_c_strncmp_M = sub {
my $rcdsep="\n\n";
my $cnt=c_strncmp($dta,$rcdsep);
$cnt{_c_strncmp_M}=$cnt;
Quote:
};
my $_c_strstr = sub {
my $rcdsep="\n";
my $cnt=c_strstr($dta,$rcdsep);
$cnt{_c_strstr}=$cnt;
Quote:
};
my $_c_strstr_M = sub {
my $rcdsep="\n\n";
my $cnt=c_strstr($dta,$rcdsep);
$cnt{_c_strstr_M}=$cnt;
Quote:
};
$_tr->(); # seed it??
timethese(300,
{ tr => $_tr,
m_while => $_m_while, m_for => $_m_for, m_array => $_m_array,
m_while_M => $_m_while_M, m_for_M => $_m_for_M, m_array_M => $_m_array_M,
c_strncmp => $_c_strncmp, c_strncmp_M => $_c_strncmp_M,
c_strstr => $_c_strstr, c_strstr_M => $_c_strstr_M
}
);
print "\n=== check counts ===\n";
printf("%18s %5d\n",$_,$cnt{$_}) for (keys %cnt);
__END__
__C__
#include <string.h>
int c_strncmp(char * buf, char * sep){
int cnt = 0;
int seplen = strlen(sep);
for(; *buf ; buf++)
if(!strncmp(buf,sep,seplen)){
cnt++;
buf+=seplen-1;
}
return cnt;
Quote:
}
int c_strstr(char * buf, char * sep){
int cnt = 0;
int seplen = strlen(sep);
for(; *buf; buf+=seplen, cnt++)
buf=strstr(buf,sep);
return cnt;
Quote:
}
---- snip ----
thanks in advance!!
___cliff rayman___cliff_AT_rayman.com___