Looking for AWK (or whatever) code to figure out genders of names 
 Looking for AWK (or whatever) code to figure out genders of names

Once upon a time, billions of years ago, somebody posted something to a
newsgroup (and I think it was this one) that figured out (with some degree
of accuracy) the gender of a name (given the name as input).  It was pretty
involved and did a fair amount of analysis to render the verdict.  I thought
I had saved it somewhere, but I cannot find it now.  It would have been
called something like "gender.awk".  I've looked in deja with no luck so far.

Anyone heard of this?  Any ideas where I might find it?

Just in case the above isn't clear, here's what a sample session might look
like:

% cat names
Bill
Joe
Marcia
Betty
Jules
% gawk -f gender.awk names
male
male
female
female
male
%



Fri, 24 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

...

Quote:
>make a data file:
>bob m
>sue f

>load it into an array and look it up:

Please send me your data file.  I will test it thoroughly and bill you in
the amount of 1 dollar per name not found.

In case this wasn't clear, I am interested in an algorithmic solution, not
a table lookup.  As I said, I did see one, once upon a time.



Fri, 24 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:

> newsgroup (and I think it was this one) that figured out (with some degree
> of accuracy) the gender of a name (given the name as input).  It was pretty
> involved and did a fair amount of analysis to render the verdict.  I thought
> I had saved it somewhere, but I cannot find it now.  It would have been
> called something like "gender.awk".
> Anyone heard of this?  Any ideas where I might find it?


##
## Newsgroups: comp.lang.perl.misc,comp.lang.awk
## Subject: Re: Wanted: RegEx's for Guessing Sex from Name
## Date: Tue, 17 Dec 1996 02:57:53 GMT
##
## in response to:
##
## I once saw a very cute ~20 line public-domain AWK program that was very
## accurate at guessing the sexes (M/F) of a list of names.  It was just a
## bunch of obscure-looking regular expression pattern matches, but it
## worked amazingly well.

# gender.awk - guess the gender of a christian name
# by Scott Pakin 8/91 in CLM 12/91

{ debug = 1 }

# assume male
/./ {
        sex = "m"
        if (debug > 0) print "assume male" }

# most names ending in a/e/i/y are female
/^.*[aeiy]$/ {
        sex = "f"
        if (debug > 0) print "most names ending in a/e/i/y are female" }

# Allison and variations
/^All?[iy]((ss?)|z)on$/ {
        sex = "f"
        if (debug > 0) print "Allison and variations" }

# Cathleen, Eileen, Maureen
/^.*een$/ {
        sex = "f"
        if (debug > 0) print "Cathleen, Eileen, Maureen" }

# Barry, Larry, Perry
/^[^S].*r[rv]e?y?$/ {
        sex = "m"
        if (debug > 0) print "Barry, Larry, Perry" }

# Clive, Dave, Steve
/^[^G].*v[ei]$/ {
        sex = "m"
        if (debug > 0) print "Clive, Dave, Steve" }

# Carolyn, Gwendolyn, Vivian
/^[^BD].*(b[iy]|y|via)nn?$/ {
        sex = "f"
        if (debug > 0) print "Carolyn, Gwendolyn, Vivian" }

# Dewey, Stanley, Wesley
/^[^AJKLMNP][^o][^eit]*([glrsw]ey|lie)$/ {
        sex = "m"
        if (debug > 0) print "Dewey, Stanley, Wesley" }

# Heather, Ruth, Velvet
/^[^GKSW].*(th|lv)(e[rt])?$/ {
        sex = "f"
        if (debug > 0) print "Heather, Ruth, Velvet" }

# Gregory, Jeremy, Zachary
/^[CGJWZ][^o][^dnt]*y$/ {
        sex = "m"
        if (debug > 0) print "Gregory, Jeremy, Zachary" }

# Leroy, Murray, Roy
/^.*[Rlr][abo]y$/ {
        sex = "m"
        if (debug > 0) print "Leroy, Murray, Roy" }

# Abigail, Jill, Lillian
/^[AEHJL].*il.*$/ {
        sex = "f"
        if (debug > 0) print "Abigail, Jill, Lillian" }

# Janet, Jennifer, Joan
/^.*[Jj](o|o?[ae]a?n.*)$/ {
        sex = "f"
        if (debug > 0) print "Janet, Jennifer, Joan" }

# Duane, Eugene, Rene
/^.*[GRguw][ae]y?ne$/ {
        sex = "m"
        if (debug > 0) print "Duane, Eugene, Rene" }

# Fleur, Lauren, Muriel
/^[FLM].*ur(.*[^eotuy])?$/ {
        sex = "f"
        if (debug > 0) print "Fleur, Lauren, Muriel" }

# Lance, Quincy, Vince
/^[CLMQTV].*[^dl][in]c.*[ey]$/ {
        sex = "m"
        if (debug > 0) print "Lance, Quincy, Vince" }

# Margaret, Marylou, Miriam
/^M[aei]r[^tv].*([^cklnos]|([^o]n))$/ {
        sex = "f"
        if (debug > 0) print "Margaret, Marylou, Miriam" }

# Clyde, Kyle, Pascale
/^.*[ay][dl]e$/ {
        sex = "m"
        if (debug > 0) print "Clyde, Kyle, Pascale" }

# Blake, Luke, Mike
/^[^o]*ke$/ {
        sex = "m"
        if (debug > 0) print "Blake, Luke, Mike" }

# Carol, Karen, Sharon
/^[CKS]h?(ar[^lst]|ry).+$/ {
        sex = "f"
        if (debug > 0) print "Carol, Karen, Sharon" }

# Pam, Pearl, Rachel
/^[PR]e?a([^dfju]|qu)*[lm]$/ {
        sex = "f"
        if (debug > 0) print "Pam, Pearl, Rachel" }

# Annacarol, Leann, Ruthann
/^.*[Aa]nn.*$/ {
        sex = "f"
        if (debug > 0) print "Annacarol, Leann, Ruthann" }

# Deborah, Leah, Sarah
/^.*[^cio]ag?h$/ {
        sex = "f"
        if (debug > 0) print "Deborah, Leah, Sarah" }

# Frances, Megan, Susan
/^[^EK].*[grsz]h?an(ces)?$/ {
        sex = "f"
        if (debug > 0) print "Frances, Megan, Susan" }

# Ethel, Helen, Gretchen
/^[^P]*([Hh]e|[Ee][lt])[^s]*[ey].*[^t]$/ {
        sex = "f"
        if (debug > 0) print "Ethel, Helen, Gretchen" }

# George, Joshua, Theodore
/^[^EL].*o(rg?|sh?)?(e|ua)$/ {
        sex = "m"
        if (debug > 0) print "George, Joshua, Theodore" }

# Delores, Doris, Precious
/^[DP][eo]?[lr].*s$/ {
        sex = "f"
        if (debug > 0) print "Delores, Doris, Precious" }

# Anthony, Henry, Rodney
/^[^JPSWZ].*[denor]n.*y$/ {
        sex = "m"
        if (debug > 0) print "Anthony, Henry, Rodney" }

# Karin, Kim, Kristin
/^K[^v]*i.*[mns]$/ {
        sex = "f"
        if (debug > 0) print "Karin, Kim, Kristin" }

# Bradley, Brady, Bruce
/^Br[aou][cd].*[ey]$/ {
        sex = "m"
        if (debug > 0) print "Bradley, Brady, Bruce" }

# Agnes, Alexis, Glynis
/^[ACGK].*[deinx][^aor]s$/ {
        sex = "f"
        if (debug > 0) print "Agnes, Alexis, Glynis" }

# Ignace, Lee, Wallace
/^[ILW][aeg][^ir]*e$/ {
        sex = "m"
        if (debug > 0) print "Ignace, Lee, Wallace" }

# Juliet, Mildred, Millicent
/^[^AGW][iu][gl].*[drt]$/ {
        sex = "f"
        if (debug > 0) print "Juliet, Mildred, Millicent" }

# Ari, Bela, Ira
/^[ABEIUY][euz]?[blr][aeiy]$/ {
        sex = "m"
        if (debug > 0) print "Ari, Bela, Ira" }

# Iris, Lois, Phyllis
/^[EGILP][^eu]*i[ds]$/ {
        sex = "f"
        if (debug > 0) print "Iris, Lois, Phyllis" }

# Randy, Timothy, Tony
/^[ART][^r]*[dhn]e?y$/ {
        sex = "m"
        if (debug > 0) print "Randy, Timothy, Tony" }

# Beatriz, Bridget, Harriet
/^[BHL].*i.*[rtxz]$/ {
        sex = "f"
        if (debug > 0) print "Beatriz, Bridget, Harriet" }

# Antoine, Jerome, Tyrone
/^.*oi?[mn]e$/ {
        sex = "m"
        if (debug > 0) print "Antoine, Jerome, Tyrone" }

# Danny, Demetri, Dondi
/^D.*[mnw].*[iy]$/ {
        sex = "m"
        if (debug > 0) print "Danny, Demetri, Dondi" }

# Pete, Serge, Shane
/^[^BG](e[rst]|ha)[^il]*e$/ {
        sex = "m"
        if (debug > 0) print "Pete, Serge, Shane" }

# Angel, Gail, Isabel
/^[ADFGIM][^r]*([bg]e[lr]|il|wn)$/ {
        sex = "f"
        if (debug > 0) print "Angel, Gail, Isabel" }

# print the guess
{ print $0, sex }
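
As posted, the script prints each name followed by m or f (plus the debug
lines while debug is 1).  A minimal sketch of how the final rule could be
changed to print male/female, as in the sample session from the original
question.  The file name mini.awk is made up, and only three of the rules
are copied in to keep it short:

```shell
# mini.awk is a hypothetical file name; the three rules are taken from gender.awk.
cat > mini.awk <<'EOF'
# assume male
/./ { sex = "m" }
# most names ending in a/e/i/y are female
/^.*[aeiy]$/ { sex = "f" }
# Blake, Luke, Mike
/^[^o]*ke$/ { sex = "m" }
# print only the verdict, spelled out
{ print (sex == "f" ? "female" : "male") }
EOF

printf 'Bill\nMarcia\nMike\n' | awk -f mini.awk
```

Later rules override earlier ones, so Mike is first marked female by the
a/e/i/y rule and then set back to male by the "ke" rule.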



Fri, 24 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names


Quote:

>> newsgroup (and I think it was this one) that figured out (with some degree
>> of accuracy) the gender of a name (given the name as input).  It was pretty
>> involved and did a fair amount of analysis to render the verdict.  I thought
>> I had saved it somewhere, but I cannot find it now.  It would have been
>> called something like "gender.awk".
>> Anyone heard of this?  Any ideas where I might find it?



Thank you, thank you!  That was exactly what I was looking for.

And I guess 1996 is not exactly billions of years ago, now is it?



Fri, 24 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:

> Once upon a time, billions of years ago, somebody posted something to a
> newsgroup (and I think it was this one) that figured out (with some degree
> of accuracy) the gender of a name (given the name as input).  It was pretty
> involved and did a fair amount of analysis to render the verdict.  I thought
> I had saved it somewhere, but I cannot find it now.  It would have been
> called something like "gender.awk".  I've looked in deja with no luck so far.

Check the source for the Perl Module Text::GenderFromName. It might
contain something you can translate to awk.

http://amaunet.informatik.uni-dortmund.de/cgi-bin/CPAN/authors/id/JONO/Text-GenderFromName-0.102.tar.gz

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-

Sent via Deja.com http://www.deja.com/
Before you buy.



Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:

>Once upon a time, billions of years ago, somebody posted something to a
>newsgroup (and I think it was this one) that figured out (with some degree
>of accuracy) the gender of a name (given the name as input).  It was pretty
>involved and did a fair amount of analysis to render the verdict.  I thought
>I had saved it somewhere, but I cannot find it now.  It would have been
>called something like "gender.awk".  I've looked in deja with no luck so far.

>Anyone heard of this?  Any ideas where I might find it?

>(Just in case the above isn't clear, here's what a sample session might look
>like:

>% cat names
>Bill
>Joe
>Marcia
>Betty
>Jules
>% gawk -f gender.awk names
>male
>male
>female
>female
>male
>%

make a data file:
bob m
sue f

load it into an array and look it up:
awk ' NR == FNR       { array[$1] = $2 ; next }   # first file: load the table
      array[$1] ~ /m/ { print "male" ; next }
      array[$1] ~ /f/ { print "female" ; next }
                      { print $1 " is not in data file" }' datafile names
This is untested and may not work.  And watch out for gender-neutral
names.

marc



Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names


Quote:


> > Once upon a time, billions of years ago, somebody posted something to a
> > newsgroup (and I think it was this one) that figured out (with some degree
> > of accuracy) the gender of a name (given the name as input).  It was pretty
> > involved and did a fair amount of analysis to render the verdict.  I thought
> > I had saved it somewhere, but I cannot find it now.  It would have been
> > called something like "gender.awk".  I've looked in deja with no luck so far.

> Check the source for the Perl Module Text::GenderFromName. It might
> contain something you can translate to awk.

Now that I have had a glance at the source myself I see that this is in
fact a Perl implementation of an awk solution, probably the code you
were looking for. I quote:

"This is an adaptation of an 8/91 awk script by Scott Pakin in the
December 91 issue of Computer Language Monthly."

FWIW,
/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:




> >> newsgroup (and I think it was this one) that figured out (with some degree
> >> of accuracy) the gender of a name (given the name as input).  It was pretty
> >> involved and did a fair amount of analysis to render the verdict.  I thought
> >> I had saved it somewhere, but I cannot find it now.  It would have been
> >> called something like "gender.awk".
> >> Anyone heard of this?  Any ideas where I might find it?


> Thank you, thank you!  That was exactly what I was looking for.

You might still benefit from looking at the Perl module I mentioned. It
completes the algorithm with some necessary exceptions and also deals
better with the situation where it can't guess the gender.
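
For the "can't guess" case, here is a rough sketch of one way to do it in
awk.  It is not taken from the Perl module: instead of assuming male, start
each name with no verdict and report "unknown" if no rule fires.  The file
name unknown.awk is made up, and only two rules from gender.awk are shown:

```shell
# unknown.awk is a hypothetical file name; this is a sketch, not the module's logic.
cat > unknown.awk <<'EOF'
# start every record with no verdict instead of assuming male
{ sex = "?" }
# most names ending in a/e/i/y are female (from gender.awk)
/^.*[aeiy]$/ { sex = "f" }
# Barry, Larry, Perry (from gender.awk)
/^[^S].*r[rv]e?y?$/ { sex = "m" }
# report "unknown" when no rule matched
{ print $0, (sex == "?" ? "unknown" : sex) }
EOF

printf 'Marcia\nLarry\nBill\n' | awk -f unknown.awk
```

The catch is that gender.awk relies on the male default for names like
Bill, so with this change many more names would come out as unknown unless
more male rules are added.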

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:


>Check the source for the Perl Module Text::GenderFromName. It might
>contain something you can translate to awk.

Blech!  Couldn't you just point me to some old 360 assembler code you have
lying around - probably more readable.

Anyway, the right, AWK, solution was posted here last night.  Thanks again
to Darrell!



Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:


> > newsgroup (and I think it was this one) that figured out (with some degree
> > of accuracy) the gender of a name (given the name as input).  It was pretty
> > involved and did a fair amount of analysis to render the verdict.  I thought
> > I had saved it somewhere, but I cannot find it now.  It would have been
> > called something like "gender.awk".
> > Anyone heard of this?  Any ideas where I might find it?


> ##
> ## Newsgroups: comp.lang.perl.misc,comp.lang.awk
> ## Subject: Re: Wanted: RegEx's for Guessing Sex from Name
> ## Date: Tue, 17 Dec 1996 02:57:53 GMT
> ##
> ## in response to:
> ##
> ## I once saw a very cute ~20 line public-domain AWK program that was very
> ## accurate at guessing the sexes (M/F) of a list of names.  It was just a
> ## bunch of obscure-looking regular expression pattern matches, but it
> ## worked amazingly well.

> # gender.awk - guess the gender of a christian name
> # by Scott Pakin 8/91 in CLM 12/91

[ SNIP OF CODE ]

Cool, but it needs another exception case for names ending in "ie".
These are wrong.
Eddie female
Arnie female

Of course, when you start giving it non-US names, all rules go out the
window.
These are all wrong.

Ramzi female
Hiroshi female
Enrique female
Dilli female
Hari female
Chunhui female
Sri female
Britt female

--
/<eystroke



Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names


...

Quote:
>These are wrong.
>Eddie female
>Arnie female

It doesn't handle nicknames/diminutives.  This should be listed in the
known limitations file (I'll contact the author on this; you need not
worry about it).  The assumption is made that (almost) any name that
ends in a pronounced vowel sound is female.  Obviously, this is easily
broken if you are allowed to send it Billy, Kenny, Johnny, etc.
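
A minimal sketch of how such an exception could be bolted on, in the same
style as the existing rules: an explicit rule placed after the general
vowel-ending rule.  The file name nick.awk and the name list (just the
diminutives mentioned in this thread) are illustrative assumptions, not
part of the original script:

```shell
# nick.awk is a hypothetical file name; the exception list is illustrative only.
cat > nick.awk <<'EOF'
# assume male, as gender.awk does
/./ { sex = "m" }
# most names ending in a/e/i/y are female (from gender.awk)
/^.*[aeiy]$/ { sex = "f" }
# hypothetical exception: male diminutives named in this thread
/^(Eddie|Arnie|Billy|Kenny|Johnny)$/ { sex = "m" }
# print the guess
{ print $0, sex }
EOF

printf 'Eddie\nArnie\nBilly\nMarcia\nBetty\n' | awk -f nick.awk
```

An explicit list like this is really a small table lookup, so it only
patches the names it spells out.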

Quote:
>Of course when you start giving it non-us names all rules go out the
>window.

Of course.  The specs say it works for *Christian* names.  Or didn't you
read that part?

Quote:
>These are all wrong.

Yup.


Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

Quote:



>...
>>make a data file:
>>bob m
>>sue f

>>load it into an array and look it up:

>Please send me your data file.  I will test it thoroughly and bill you in
>the amount of 1 dollar per name not found.

No.  But nice try for some (OK, lots of) easy money.

Quote:

>In case, this wasn't clear, I am interested in an algorithmic solution, not
>a table lookup.  As I said, I did see one, once upon a time.

How would it handle gender-neutral names like Renea (sp?)?

marc



Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names

..
Quote:
> . . . The assumption is made that (almost) any name that
>ends in a pronounced vowel sound is female. Obviously,
>this is easily broken if you are allowed to send it Billy,
>Kenny, Johnny, etc.
..
>Of course.  The specs say it works for *Christian* names.
>Or didn't you read that part?

..

*Christian* names? Do you mean English given names?

FWIW, many old testament male names end in pronounced
vowels, e.g., Joshua, Isaiah. It's also hard to distinguish
Muriel (f) from Ezekiel (m). Then there are names from
languages other than English. Maybe no big deal outside the
US, but there are still quite a few old-world given names
in use in the US.




Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names
What is the Perl Module? My newsreader missed something.
Sam
Quote:






> > >> newsgroup (and I think it was this one) that figured out (with some degree
> > >> of accuracy) the gender of a name (given the name as input).  It was pretty
> > >> involved and did a fair amount of analysis to render the verdict.  I thought
> > >> I had saved it somewhere, but I cannot find it now.  It would have been
> > >> called something like "gender.awk".
> > >> Anyone heard of this?  Any ideas where I might find it?


> > Thank you, thank you!  That was exactly what I was looking for.

> You might still benefit from looking at the Perl module I mentioned. It
> completes the algorithm with some necessary exceptions and also deals
> better with the situation where it can't guess the gender.

> /Peter
> --
> -= Spam safe(?) e-mail address: pez68 at netscape.net =-




Sat, 25 May 2002 03:00:00 GMT  
 Looking for AWK (or whatever) code to figure out genders of names


Quote:
>..
>>Of course.  The specs say it works for *Christian* names.
>>Or didn't you read that part?
>..

>*Christian* names? Do you mean English given names?

It does say Christian names.  I'm not making this up.
Go back and read the source posted by Darrell.

Now, admittedly, I am making a little joke here.  I'm perfectly aware that
"Christian name" is an old-fashioned term for "first name".

But, yes, the point is that the algorithm was designed to work on standard,
American, white bread, first names.  If someone from some other part of the
world wants to put one together for their nation/culture, they are more than
welcome to do so.

Quote:
>FWIW, many old testament male names end in pronounced
>vowels, e.g., Joshua, Isaiah. It's also hard to distinguish
>Muriel (f)  from Ezekiel (m). Then there's names from
>languages other than English. Maybe no big deal outside the
>US, but there are still quite a few old-world given names
>in use in the US.

Well, if they are Old Testament names, then they aren't Christian
names, are they? (1)

Bet you never thought we'd get into talking religion (2) in
comp.lang.awk, did you?

(1) Interestingly enough, I just tested Joshua, and it got it right.
(2) Real religion, that is.  Not AWK vs. Perl or AWK vs. sed...



Sat, 25 May 2002 03:00:00 GMT  
 
 [ 28 post ]  Go to page: [1] [2]
