Still stumped on match regexps 

Thankfully, Tad helped fix my initial matching problems but I've now
spent hours trying to craft 2 final desired additions to work.

According to a Perl Guide, inside the [] brackets, only the ^ - and ]
characters need to be escaped due to special meaning.  I've been sure to
escape the ^ and ]  characters but don't want the - involved.  Things
still don't work.

Reference the raw script (at bottom), I've commented out (M)atch 10 and
(M)atch 11 which are the problems.


the e-mail address:  


So, I've tried a bunch of combinations and am stuck now at (unworking):



e-mail address (and BEFORE the .TLD) but
also to include __ (underscore), and am similarly stuck at:



"HELP!" ... Thank you.

Eric
----

#!/usr/local/bin/perl5 -- -*-perl-*-

use warnings;
use strict;

#my $email = $FORM{'email'};              # use this line for live CGI Form processing

comment out
email_check() if length $email >40    # most likely bogus or some malicious code!

entries
              or $email =~ m/\s/      # delete some valid email addresses  :-)
              or $email =~ m/^[^A-Za-z0-9]/         # checks front part of e-mail


              or $email =~ m/.*[^A-Za-z0-9]\./      # checks part right before .TLD
              or $email !~ m/.*\.[A-Za-z]{2,4}$/    # checks for 2-4 character TLD



weirdos before .TLD

(myowndomain)

sub email_check {

print "$email appears invalid\n";                         # temporary testing line - HTML to follow
}



Fri, 17 Dec 2004 02:58:17 GMT  
Quote:

> Thankfully, Tad helped fix my initial matching problems but I've now
> spent hours trying to craft 2 final desired additions to work.

[blah blah blah, trying to make a regex for matching valid emails].

What's wrong with using Email::Valid->rfc822($email) ?
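For readers following along, here is a minimal sketch of what the Email::Valid route might look like. It assumes the CPAN module Email::Valid is installed (it is not part of core Perl); the helper name check_address and the sample addresses are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Email::Valid comes from CPAN, so load it defensively at run time.
my $have_email_valid = eval { require Email::Valid; 1 };

# check_address() is a hypothetical helper name used only in this sketch.
# Returns 1 if accepted, 0 if rejected, undef if the module is unavailable.
sub check_address {
    my ($address) = @_;
    return unless $have_email_valid;
    return Email::Valid->address($address) ? 1 : 0;
}

for my $candidate ('user@example.com', 'no at sign here') {
    my $result = check_address($candidate);
    if (!defined $result) {
        print "Email::Valid not installed; cannot check '$candidate'\n";
    }
    elsif ($result) {
        print "'$candidate' looks valid\n";
    }
    else {
        # details() names the check that failed, e.g. rfc822
        print "'$candidate' rejected (", Email::Valid->details, ")\n";
    }
}
```

The address() method runs the module's default checks; rfc822() alone, as suggested above, only tests RFC 822 syntax.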

--

pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';



Fri, 17 Dec 2004 04:32:56 GMT  

Quote:

> What's wrong with using Email::Valid->rfc822($email) ?

Hello, Benjamin. I took a peek but this way will be easier to modify as
desired and only a few lines.  Plus...there *was* a problem pointed out
as mentioned in an earlier post.

BTW, do you ever sleep?  :-)

Regards,

Eric



Fri, 17 Dec 2004 03:52:45 GMT  


Quote:
> Thankfully, Tad helped fix my initial matching problems but I've now
> spent hours trying to craft 2 final desired additions to work.

> According to a Perl Guide, inside the [] brackets, only the ^ - and ]
> characters need to be escaped due to special meaning.  I've been sure
> to escape the ^ and ]  characters but don't want the - involved.
> Things still don't work.

Whichever "Perl Guide" you're using is incorrect.  The backslash
character also needs to be escaped, as does whichever character you are
using as a match delimiter.
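A small sketch of that rule, using a made-up character class rather than the original poster's regex (which is not preserved here). The sub name has_special and the sample strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Inside [...] these need care: the backslash, the closing bracket, a caret
# in first position, a hyphen between two characters, and whatever character
# delimits the match operator (here the forward slash).
# This class matches any one of:  \  ]  ^  -  /  |
my $special = qr/[\\\]\^\-\/|]/;

sub has_special {
    my ($text) = @_;
    return $text =~ $special ? 1 : 0;
}

for my $sample ('plain', 'back\\slash', 'pipe|here', 'a/b', 'x-y', 'a]b', 'a^b') {
    printf "%-12s %s\n", $sample,
        has_special($sample) ? 'contains a listed character' : 'clean';
}
```

Note that the pipe needs no escape inside the brackets, while the backslash and the delimiter do.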

Quote:

> the e-mail address:  



Do you *really* get so much mail from spammers so incompetent that they
can't cobble together a realistic-looking email address?  Just curious...

Quote:
> So, I've tried a bunch of combinations and am stuck now at
> (unworking):



\|  ?? You said | didn't need to be escaped, and you were correct.  I
expect that you're trying to match \ or | with this pair of characters.  
Backslashes need to be escaped, for rather obvious reasons.  \\|

Also, look at the final character before your closing bracket ].  A
forward slash!  Perl sees that and says "Aha! End of pattern", because
you started the pattern with a forward slash.  And then it complains that
it can't find a closing bracket to match the opening bracket that it did
find.  Escape the forward slash: \/
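To illustrate the delimiter point, here is a sketch with a made-up class (backslash, pipe, forward slash) written two ways: once with / delimiters, where the slash must be escaped, and once with brace delimiters, where it need not be. The sub names are hypothetical.

```perl
use strict;
use warnings;

# With m/.../ an unescaped / would end the pattern early, so it is written \/
sub match_with_slashes { my ($text) = @_; return $text =~ m/[\\|\/]/ ? 1 : 0 }

# With m{...} the forward slash is an ordinary character again
sub match_with_braces  { my ($text) = @_; return $text =~ m{[\\|/]}  ? 1 : 0 }

for my $sample ('a/b', 'a|b', 'a\\b', 'plain') {
    printf "%-6s slashes=%d braces=%d\n",
        $sample, match_with_slashes($sample), match_with_braces($sample);
}
```

Picking a delimiter that does not occur in the pattern is often easier to read than escaping every slash.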

--
Eric
print scalar reverse sort qw p ekca lre reh
ts uJ p, $/.r, map $_.$", qw e p h tona e;




Fri, 17 Dec 2004 03:53:22 GMT  

Quote:


> > What's wrong with using Email::Valid->rfc822($email) ?

> Hello, Benjamin. I took a peek but this way will be easier to modify
> as desired and only a few lines.  Plus...there *was* a problem pointed
> out as mentioned in an earlier post.

You mean winzip getting confused?

Save the file, rename it from blahblah.tar to blahblah.tgz.

Then open the .tgz file with winzip.

I thought I already told you that.

Quote:
> BTW, do you ever sleep?  :-)

Yes.  A better question might be, "Are the time and timezone on my
computer set correctly?"  And to that, I'm afraid I have to answer "no"
(well actually, I just fixed it).

Hmm, 11pm now, g'night all :)

--

pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';



Fri, 17 Dec 2004 04:01:49 GMT  

Quote:

> Reference the raw script (at bottom), I've commented out (M)atch 10 and
> (M)atch 11 which are the problems.

> the e-mail address:

> So, I've tried a bunch of combinations and am stuck now at (unworking):


> e-mail address (and BEFORE the .TLD) but
> also to include __ (underscore), and am similarly stuck at:


You don't want to kick out periods within the name part.


This is both a valid and common email address format.

You forgot to include a space for rejected characters.

Your methodology is overkill. There is no need for a
complex lengthy regex. A simple transliteration will
catch any character you do not want to pass.


email address, you want to kick it out. If not, your



Perl's substring function or Perl's split function
could be used for this. Use of split would preserve
your original input data.

You could increase efficiency by checking your entire
email address for invalid characters, then checking

My presumption is you are aware parsing for valid
email addresses is exceptionally difficult. You
may expect a lot of errors. Trick is to minimize
your error percentage rate.

A good programming practice, for this case study,
is to save copies of email addresses rejected for
human eye examination. Doing so, will reduce your
percentage error rate to zero.

Godzilla!
--
"...the integers are a subset of the rational numbers,
 so indeed, a whole number is a decimal number."

  - Perl FAQ Maintainer

TEST SCRIPT
___________

#!perl

## substring method
## make a working copy to protect input



 { print "Boss, This One Ain't Right #1"; }
else
 {

  if ($email_1 =~ tr/~!#%^&*()[]{}|,;"'<>? // > 0)
   { print "Boss, This One Ain't Right #2"; }

  if ($email =~ tr/[~!#%^&*()_[]{}|,;:"'<>? \/ // > 0)
   { print "Boss, This One Ain't Right #3"; }
 }

print "\n\nAll Done Boss.";

## split method:



 { print "Boss, This One Ain't Right #1"; }
else
 {

  if ($email_1 =~ tr/~!#%^&*()[]{}|,;"'<>? // > 0)
   { print "Boss, This One Ain't Right #2"; }

  if ($email_2 =~ tr/[~!#%^&*()_[]{}|,;:"'<>? \/ // > 0)
   { print "Boss, This One Ain't Right #3"; }
 }

print "\n\nAll Done Boss.";

## slight efficiency improvement
## check entire string
## then check for _ character



 { print "Boss, This One Ain't Right #1"; }
else
 {
  if ($email =~ tr/~!#%^&*()[]{}|,;"'<>? // > 0)
   { print "Boss, This One Ain't Right #2"; }


  if ($email_2 =~ tr/_// > 0)
   { print "Boss, This One Ain't Right #3"; }
 }

print "\n\nAll Done Boss.";
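

Since several lines of the test script above did not survive archiving, here is a self-contained sketch of the split-and-transliterate idea it describes. The character lists follow the tr lines above; the first check (exactly one @ with non-empty parts) is an assumption standing in for the lost lines, and the sub name reject_reason and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# reject_reason() is a hypothetical helper: it returns a short reason for a
# rejected address, or an empty string when every check passes.
sub reject_reason {
    my ($email) = @_;

    # split leaves $email itself untouched; -1 keeps trailing empty fields
    my @parts = split /\@/, $email, -1;
    return 'need exactly one @' unless @parts == 2;

    my ($local, $domain) = @parts;
    return 'empty local or domain part'
        unless length $local && length $domain;

    # tr with an empty replacement list and no /d only counts matches
    return 'bad character in local part'
        if $local =~ tr/~!#%^&*()[]{}|,;"'<>? //;
    return 'bad character in domain part'
        if $domain =~ tr/~!#%^&*()_[]{}|,;:"'<>? \///;

    return '';
}

for my $candidate ('user@example.com', 'first.last@example.com',
                   'bad name@example.com', 'user@my_host.example', 'nobody') {
    my $reason = reject_reason($candidate);
    print "$candidate: ", ($reason eq '' ? 'passes' : "rejected, $reason"), "\n";
}
```

Periods in the local part pass, as recommended in the post, while a space or an underscore in the domain part is rejected.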



Fri, 17 Dec 2004 04:10:51 GMT  

                 ^                            ^
                 ^                            ^ end of m// operator here...

--
    Tad McClellan                          SGML consulting

    Fort Worth, Texas



Fri, 17 Dec 2004 04:52:04 GMT  

Quote:

> Whichever "Perl Guide" you're using is incorrect.  The backslash
> character also needs to be escaped, as does whichever character you are
> using as a match delimiter.

ARRRGGGHHH!!!  I was wondering about this ... it was a "Guide" I found
on the net since my own books (Especially Jeffrey's) are still buried in
storage somewhere.  By "match delimiter", you mean m/ /  ... the two
forward slashes?

Quote:

> Do you *really* get so much mail from spammers so incompetent that they
> can't cobble together a realistic-looking email address?  Just curious...

No...it's not from spammers.  This is to weed out the turkey entries on
a web mailing list form and the jerkoffs just twiddling.

FYI (and you've probably seen this) too many spammers are now forging
the intended Recipient's e-mail address and so now I'm getting some

Chinese are the worst offenders, and have had communications going with
the American Embassy in Beijing trying to get something done about the
problem.  Most likely, I've wasted my time.

Quote:

> \|  ?? You said | didn't need to be escaped, and you were correct.  I
> expect that you're trying to match \ or | with this pair of characters.
> Backslashes need to be escaped, for rather obvious reasons.  \\|

Ooops ;-(  Thanks. The "Guide" was terribly deficient.

Quote:

> Also, look at the final character before your closing bracket ].  A
> forward slash!  Perl sees that and says "Aha! End of pattern", because
> you started the pattern with a forward slash.  And then it complains that
> it can't find a closing bracket to match the opening bracket that it did
> find.  Escape the forward slash: \/

THANK YOU... that need "escaped" me.

Regards,

"Eric"



Fri, 17 Dec 2004 04:55:40 GMT  

Quote:


> You don't want to kick out periods within the name part.


> This is both a valid and common email address format.

Hmmm...I'm a bit confused about this.  I thought the "." (period)
right after m/ would match ANY character, and the "\." at the end
of the regexp is supposed to be for the DOT as in ".TLD" ... my
initial concern in starting this was not to axe some.machine.name, etc.
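
The difference between the two dots can be checked directly. This sketch applies the "part right before .TLD" pattern from the script at the top of the thread to a few made-up strings; the sub name flagged is hypothetical.

```perl
use strict;
use warnings;

# Pattern copied from the script at the top of the thread:
#   .*             any run of characters (an unescaped dot is a wildcard)
#   [^A-Za-z0-9]   one character that is NOT a letter or digit
#   \.             a literal period
my $non_alnum_before_dot = qr/.*[^A-Za-z0-9]\./;

sub flagged {
    my ($text) = @_;
    return $text =~ $non_alnum_before_dot ? 1 : 0;
}

for my $sample ('some.machine.name', 'user@example.com', 'user@host-.com', 'a..b') {
    printf "%-20s %s\n", $sample, flagged($sample) ? 'flagged' : 'not flagged';
}

# The wildcard dot matches any character; the escaped dot does not.
print "axb matches /a.b/\n"          if 'axb' =~ /a.b/;
print "axb does not match /a\\.b/\n" if 'axb' !~ /a\.b/;
```

A name such as some.machine.name is left alone because every period in it follows a letter; only a period that directly follows a non-alphanumeric character triggers the match.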

Quote:

> You forgot to include a space for rejected characters.

You mean I can leave a plain old "space" between any of the characters
listed within the [] ??? I'm not quite sure what you mean by
"rejected" characters, however.
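
On the question of a space inside the brackets: in a pattern written without the /x family of modifiers, a space typed inside [...] is simply one more member of the class. A sketch with a made-up list of rejected characters (the sub name is hypothetical):

```perl
use strict;
use warnings;

# The class below lists five characters to reject:
# space, comma, semicolon, and the two angle brackets.
sub has_rejected_char {
    my ($text) = @_;
    return $text =~ m/[ ,;<>]/ ? 1 : 0;
}

for my $sample ('oneword', 'two words', 'a,b', '<tag>') {
    printf "%-10s %s\n", $sample,
        has_rejected_char($sample) ? 'rejected' : 'ok';
}
```

Here "rejected characters" just means the characters listed in the class, any one of which causes the address to be refused.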

Quote:

> Your methodology is overkill. There is no need for a
> complex lengthy regex. A simple transliteration will
> catch any character you do not want to pass.


> email address, you want to kick it out. If not, your




I hadn't considered that approach, but it sure makes sense!

Quote:

> Perl's substring function or Perl's split function
> could be used for this. Use of split would preserve
> your original input data.

> You could increase efficiency by checking your entire
> email address for invalid characters, then checking


AHHHhhhh SOOOooo.

Quote:

> A good programming practice, for this case study,
> is to save copies of email addresses rejected for
> human eye examination. Doing so, will reduce your
> percentage error rate to zero.

I analyzed the initial webform entries which is what
necessitated doing something to deal with the garbage
"screwing around" entries...especially from China ;-(

Thanks for the advice & assistance, Godzilla.

Regards,

Eric



Fri, 17 Dec 2004 05:04:03 GMT  

Quote:


>                  ^                            ^
>                  ^                            ^ end of m// operator here...

Tnx, Tad.  I also had to \\ the \ as the other Eric pointed out...and
Godzilla! suggested putting in a blank space for rejected characters
(which I still don't understand but gave it a try).

***  BTW, did you get my e-mail???

Anyway, I retweaked the two regexps to:



(and)


before .TLD

... but still got several unwanted results after testing the whole
enchilada:
















Godzilla! suggested using Perl's split function and some efficiency
alternatives.

This thing seems like it's soooooo close, and from what I can see, it
*should* work, but doesn't.  There still must be a Magic Bean that needs
plugging in???

Regards,

Eric



Fri, 17 Dec 2004 06:50:34 GMT  

[snip]

Quote:
> Tnx, Tad.

[snip]

Quote:
> ***  BTW, did you get my e-mail???

Yes. I chose not to respond.

-----

When you switch from an alias to your real name, you should mention
that you are the alias.

It took me a couple moments to figure out that the email was from

-----

You duplicated the entire message in HTML.

HTML is for web servers. I am not a web server, I am a person.

My mail filter scores down emails that are duplicated in HTML. I
found your email when I checked emails that were in my "spam"
mailbox.

Using (more than) twice the bandwidth for zero gain is wasting resources.

-----

I seldom want to send email rather than post, but if I do want to
send email, and find that I cannot, I make a negative scorefile
entry for that poster.


for how to de-munge it. So I've added the corresponding scorefile
entry (on top of the general-purpose score rule that your posts
already triggered).

This is unfortunate, but a well-maintained scorefile is necessary
for my mental well-being.  :-(

-----
-----

While any of the above might have been enough for ignoration,
especially if I was "busy" as I was that week, the real reason
I did not respond to your email was:

I was too busy clearing the decks to take time off for the
YAPC 2002 conference, and:

   I do not generally answer Perl questions in email.

   I answer Perl questions on the newsgroup, so that the answer
   will be available to everyone.

   Hoarding answers in private emails does not serve the community.

   I volunteer my time for the Perl community.

   Individual help is billed at my usual rate.  :-)

-----

Quote:
> Godzilla! suggested

Please don't feed the troll.

(this was what I had hoped to email you about, as it has no Perl content)

--
    Tad McClellan                          SGML consulting

    Fort Worth, Texas



Fri, 17 Dec 2004 15:20:22 GMT  
[snip]

Ehh?  I thought that underscores were valid in domains.  My mistake.

Quote:










> Godzilla! suggested using Perl's split function and some efficiency
> alternatives.

> This thing seems like it's soooooo close, and from what I can see, it
> *should* work, but doesn't.  There still must be a Magic Bean that
> needs plugging in???

Magic Bean:  Email::Valid.

--

pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';


