Tied hash not scaling - advice?
I have a series of scripts, designed to help our operations center deal
with spam complaints. As part of this, they create an audit trail. The
entries are arranged like: <auditdir>/6.29.99/209.162.144.3.blocked.
I've got a CGI tool that can search for a particular IP in the audit trail.
As time went by, using File::Find for this got really slow, unsurprisingly.
So, I learned about tied hashes, and made an index file. All was well,
and it was blazingly fast.
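For context, the lookup side boils down to something like the sketch below. It is only an illustration, not the real CGI tool: the lookup_ip name is made up, and it assumes the index maps each IP to a space-separated list of audit file paths, as the indexing code further down builds it.

```perl
#!/usr/local/bin/perl -w
use strict;
use Fcntl;      # O_RDONLY and friends
use DB_File;

# Hypothetical helper: return the audit files recorded for one IP.
# Assumes the index values are space-separated paths.
sub lookup_ip {
    my ($index_file, $ip) = @_;
    my %files;
    tie %files, "DB_File", $index_file, O_RDONLY, 0666, $DB_HASH
        or die "Cannot open $index_file: $!";
    my $list = $files{$ip};
    untie %files;
    return () unless defined $list;
    # split ' ' ignores the leading space each stored value starts with
    return split ' ', $list;
}
```

Called as lookup_ip("$auditDir/audit.index", "209.162.144.3"), it returns the list of matching files, or an empty list if the IP was never indexed.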
Much more time has passed, and now it's completely broken. And when I
look at the index file:
-rw-r--r-- 1 kirbyk user 594935808 Jun 28 13:55 audit.index
Yikes! That's one big file!
I've deleted it and regenerated it, and it's still doing this.
So, clearly, I need to change something. I'm not sure if there's an
easy solution, or if I'll have to dig in and run something like a
MySQL database for the backend. It's really useful to be able to pull
up the reason we're blocking someone when an angry ISP is on the other
line. :-)
Here's the (pared down slightly) code. It worked, once upon a time:
#! /usr/local/bin/perl -w
use File::Find;
use POSIX;
use DB_File;
$auditDir = "/home/netbuild/postmaster/spam/audit";
$indexFile = "$auditDir/audit.index";
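if (-e $indexFile) {
    # if indexFile exists, don't start from scratch; only scan today's directory
    @date = localtime;   # assumed: this assignment was cut when paring the code down
    $date[4]++;          # localtime months are 0-based
    $auditDir .= "/$date[4].$date[3].$date[5]";
}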
tie %files, "DB_File", $indexFile, O_CREAT|O_RDWR, 0666, $DB_HASH;
find (\&add_file, $auditDir);
untie %files;
exit 0;
sub add_file {
    $file = $File::Find::name;
    return if -d $file;
    return if $file eq $indexFile;
    # last path component, e.g. 209.162.144.3.blocked
    ($site) = ($file =~ /.*\/.*\/(.*)$/);
    ($ip) = ($site =~ /(\d*\.\d*\.\d*\.\d*).*/);
    return if !$ip;
    $fileList = $main::files{$ip};
    # \Q...\E so the dots and slashes in the path match literally
    return if $fileList && $fileList =~ /\Q$file\E/;
    $fileList .= " $file";
    $main::files{$ip} = $fileList;
}
--
<*> Lips that taste of tears, they say, are the best for kissing - D. Parker