extracting e-mail addresses from text file database 

Hello everybody,

I want to dynamically populate a simple text file database from entries in a
form using Perl. This file will contain e-mail addresses, names and other
types of data. The information will be kept as lines separated by tabs or
pipes. My question is: Is there a simple way of extracting e-mail addresses
without extracting all the other data? Also, let's say that the e-mail
addresses are placed at the beginning of each line. Is it possible to
extract the information that comes with a particular address? If so, how can
I get my program to do this?



Mon, 18 Aug 2003 20:59:59 GMT  
 extracting e-mail addresses from text file database

Quote:

>I want to dynamically populate a simple text file database from entries in a
>form using Perl. This file will contain e-mail addresses, names and other
>types of data. The information will be kept as lines separated by tabs or
>pipes. My question is: Is there a simple way of extracting e-mail addresses
>without extracting all the other data?

I think the fact that they are email addresses is not needed to
solve your problem.

I think you are asking:

   Is there a way of extracting a particular field from a
   tab-separated record?

Is that it?

If so, then yes, there is a simple way using a "list slice":

   my $email_addr = (split /\t/, $line)[3];  # get the 4th field

See the "Slices" section in perldata.pod for more info.
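
As a minimal sketch of the same list slice applied to every line of a file: the sample records below, and the choice of the second column for the address, are assumptions made for illustration and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample records: name, e-mail, phone, zip (tab-separated).
my @lines = (
    "Abelard\tabelard\@example.com\t555-0100\t95060",
    "Heloise\theloise\@example.com\t555-0101\t95062",
);

# Pull only the second field (index 1) out of each record.
my @emails = map { (split /\t/, $_)[1] } @lines;

print "$_\n" for @emails;
```

Only the slice index changes if the address lives in a different column.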

Quote:
>Also, lets say that the e-mail
>addresses are placed at the beginning of each line. Is it possible to
>extract the information that comes with a particular address,

Yes.

Quote:
>if so how can
>I get my program to do this?

   perldoc -f split
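
A rough sketch of how split could answer that second question, assuming pipe-separated records with the address in the first field. The sample data and the variable names ($wanted, @rest) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records with the address first: e-mail|name|phone.
my @lines = (
    "abelard\@example.com|Abelard|555-0100",
    "heloise\@example.com|Heloise|555-0101",
);

my $wanted = 'heloise@example.com';
my @rest;

for my $line (@lines) {
    # The pipe is a regex metacharacter, so it has to be escaped.
    my ($addr, @fields) = split /\|/, $line;
    if ($addr eq $wanted) {
        @rest = @fields;    # everything that came with the address
        last;
    }
}

print join(', ', @rest), "\n";
```

Comparing with eq rather than a pattern match avoids having to escape the dots in the address.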

--
    Tad McClellan                          SGML consulting

    Fort Worth, Texas



Tue, 19 Aug 2003 07:18:52 GMT  
 extracting e-mail addresses from text file database

Quote:
>Hello everybody,

>I want to dynamically populate a simple text file database from entries in a
>form using Perl. This file will contain e-mail addresses, names and other
>types of data. The information will be kept as lines separated by tabs or
>pipes. My question is: Is there a simple way of extracting e-mail addresses
>without extracting all the other data? Also, lets say that the e-mail
>addresses are placed at the beginning of each line. Is it possible to
>extract the information that comes with a particular address, if so how can
>I get my program to do this?

Are you looking for a match expression that allows extracting all
email addresses arbitrarily scattered through your text file? If so,
there are Perl FAQs on this. To be careful and thorough you should
*not* just use a quick one-line pattern match.

Note that some of us don't like to give much help on email address
extraction, because most of those who do it are engaged in Internet
abuse.

But I think you're after something different. If you plan to make
your text file a structured document, then here's an approach that
assumes that you write your data to file with a consistent format for
each record (line), say:

name[tab]email[tab]phone[tab]zip[eol]  # eol = end of line

No problem if a "field" is blank, just keep the total number of tabs
per line consistent.
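
One thing worth knowing when fields may be blank: by default split discards trailing empty fields, so a record whose last columns are empty comes back shorter than expected. A small sketch, using a made-up record, of how a negative limit keeps them:

```perl
use strict;
use warnings;

# Hypothetical record whose phone and zip fields are both blank.
my $record = "Heloise\theloise\@example.com\t\t";

my @default = split /\t/, $record;        # trailing empty fields dropped
my @all     = split /\t/, $record, -1;    # negative limit keeps them

printf "default: %d fields, with -1 limit: %d fields\n",
    scalar @default, scalar @all;
```

Empty fields in the middle of a line are kept either way; only the trailing ones are affected.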

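#!/usr/bin/perl -w

use strict;

## Let's say you got your input this way:
use CGI;
my $q = CGI->new;

# name the fields (columns) you want
# (form parameter names assumed to match the record layout above)
my @fields = qw(name email phone zip);

# make a string of the data separated by tabs
# (a missing parameter becomes an empty field, so the tab count stays fixed)
my $dataline = join "\t",
    map { my $v = $q->param($_); defined $v ? $v : '' } @fields;

## Then you append this line to your file:
open OUT, ">>myfile" or die "Can't open datafile: $!\n";
print OUT $dataline, "\n";
close OUT;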

## Then, skipping a few steps here, you read that line back out of
## the file when you want to do something with it, so you now have that
## same line in a scalar variable called, say, $db_line:

chomp $db_line;
my @data = split "\t" => $db_line;

## You now have each datum as a member of an ordered list (array),
## which you can access like this:

print <<NOTE;
Hello $data[0]:

Thanks for inquiring about our secret Perl recipes.

Since you live in such a wealthy zip code area, $data[3], we'd love
to share our stuff with you.

Which is better for you, an email message sent to $data[1], or a
phone call at $data[2]?

NOTE

## Or, you could populate a hash from that same line:

my %info;
@info{qw(name email phone zip)} = split "\t" => $db_line;

## then each field's value is accessed via hash lookup, a la:

print "Hello, $info{name}:\n";

#etc.

In this approach, "extracting" the email address only requires that
you know what the field separator is (tab, here) and the order of the
fields in the line (which in this case is set by the order of the
field names used when the line was written).

Back where I said "skipping a few steps", the main question is how
you fetched the specific line you wanted. The best way to do this
depends on the size of the file. If it's a small file (< 50k), read
it into an array, and use grep to find the line. Knowing what column
you're looking in lets you use the field separator as part of the
grep pattern:


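open DB, "myfile" or die "Can't open datafile: $!\n";
my @lines = <DB>;    # array name assumed; one record per element
close DB;

# find Heloise in first field
my $searchname = "Heloise";
my ($the_line) = grep /^\Q$searchname\E\t/, @lines;

# get her email from second field
my $heloise_email = (split "\t" => $the_line)[1];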

Note that if there is more than one Heloise in your list, the above
will give you the first one it finds; grep returns a list, which is
why $the_line is in parentheses. Also, as written, the grep pattern
requires an exact match by case and allows nothing else in the first
field, not even spaces.
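
If the file is too large to slurp, or the match should be more forgiving, one possible variation is to read line by line and compare the first field after trimming and lowercasing it. This is only a sketch: the in-memory filehandle and the sample contents are assumptions that let it run without a real data file.

```perl
use strict;
use warnings;

# Hypothetical file contents, read through an in-memory filehandle.
my $contents = "Abelard\tabelard\@example.com\n"
             . " heloise \theloise\@example.com\n"
             . "Heloise\tother\@example.com\n";
open my $db, '<', \$contents or die "Can't open data: $!\n";

my $searchname = "Heloise";
my @emails;

while (my $line = <$db>) {
    chomp $line;
    my ($name, $email) = split /\t/, $line;
    $name =~ s/^\s+|\s+$//g;    # ignore stray spaces around the name
    push @emails, $email if lc($name) eq lc($searchname);
}
close $db;

print "$_\n" for @emails;
```

Collecting into an array returns every Heloise rather than just the first, and only one line is held in memory at a time.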

Also note that my use of =>, as in
   split "\t" => $db_line;
is nothing esoteric. => is just a fancy comma, but I find it a visual aid.

There are many ways to do all of the above operations, and plenty of
other ways to organize data. Structure and access methods should be
shaped by the nature of your data, and what you want to do with it.

HTH

1;

--

   - Bruce

__bruce_van_allen__santa_cruz_ca__



Tue, 19 Aug 2003 10:58:05 GMT  
 extracting e-mail addresses from text file database

Steeve> My question is: Is there a simple way of extracting e-mail addresses
Steeve> without extracting all the other data? Also, lets say that the e-mail
Steeve> addresses are placed at the beginning of each line. Is it possible to
Steeve> extract the information that comes with a particular address, if so how can
Steeve> I get my program to do this?

My question is: since an email address can contain insignificant
whitespace, how will you know when the address has ended?  You'll have
to look for a subset of legal email addresses in order for it to work.

If you wanna grab up to the first whitespace, a simple split on /\s+/ will
do it.
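
A small sketch of that whitespace split, assuming the address is the first thing on the line and contains no whitespace itself. The sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical line with the address first, followed by free-form text.
my $line = "heloise\@example.com   Heloise d'Argenteuil, Paris";

# A limit of 2 returns the first chunk and leaves the rest untouched.
my ($addr, $rest) = split /\s+/, $line, 2;

print "address: $addr\n";
print "rest:    $rest\n";
```

If a line might begin with whitespace, split ' ' skips the leading blanks, whereas /\s+/ would return an empty first field.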

print "Just another Perl hacker,"

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Tue, 19 Aug 2003 11:41:35 GMT  
 extracting e-mail addresses from text file database
Thanks, guys, for your comments; they were really useful and educational. I will
try to apply them and hopefully it's gonna work.

Merci beaucoup

Steeve



Tue, 19 Aug 2003 22:41:06 GMT  
 