probability distribution function in python
Author Message
probability distribution function in python

Hello,

I'm doing something concerning probability distribution function. Does anyone can resolve the factorial function  ==> n! (the denominator of Poisson DF)since it will overflow the integer limit.

Or, where can I find it in any python modules or library? Thanks.

tawee

Sat, 27 Sep 2003 14:50:25 GMT
probability distribution function in python

"""
I'm doing something concerning probability distribution function. Does
anyone can resolve the factorial function  ==> n! (the denominator of
Poisson DF)since it will overflow the integer limit.
"""

You can just use Python 'longs' (unlimited-length integers).  Try:

D:\ian>python
Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32

Quote:
>>> def fac(n):

...     if n<=0: return 1L
...     return n*fac(n-1)
...
Quote:
>>> fac(100)

9332621544394415268169923885626670049071596826438162146859296389521759999322
9915
6089414639761565182862536979208272237582511852109168640000000000000000000000
00L

The 'L' at the end of the first line of fac() is crucial -- it
ensures all results will be long (unlimited-precision) ints.

"""
Or, where can I find it in any python modules or library? Thanks.
"""

http://gmpy.sourceforge.net (currently stalled, but I'll get
back to polishing it up for release eventually -- it does appear
to be very solid) wraps the GMP library, which includes a very
speedy implementation of factorials (and quite a few other
number-theoretical functions).

Alex

Sat, 27 Sep 2003 19:12:49 GMT
probability distribution function in python
It is numerically a bad idea to compute the factorial directly. Most often
it is encountered in expressions, where two similarly huge numbers are
divided, resulting in a quite normal (not so huge) number.

x^n/n! should be computed as exp(n*Ln(x) - Ln(n!)), where Ln is a natural
logarithm, and Ln(n!) can be computed using Stirling expansion for a
factorial (see Abramowitz, Stegun, Handbook of mathematical functions - if I
remember correctly the reference)

Best regards,

Tomasz Lisowski

Hello,

I'm doing something concerning probability distribution function. Does
anyone can resolve the factorial function  ==> n! (the denominator of
Poisson DF)since it will overflow the integer limit.

Or, where can I find it in any python modules or library? Thanks.

tawee

Sat, 27 Sep 2003 20:37:35 GMT
probability distribution function in python
Dear Tomasz,

You have to calculate n! , any way. This will cause integer overflow. But
any way if you cheat the def factorial to be fp as follows;

def fac(n):
if n == 1. then
return 1.
return n*fac(n-1.)

It works, but I don't like this way. Smalltalk can perform this calculation
very fast without losing its significance.

tawee

Sat, 27 Sep 2003 23:30:48 GMT
probability distribution function in python

Quote:
> It is numerically a bad idea to compute the factorial directly. Most often
> it is encountered in expressions, where two similarly huge numbers are
> divided, resulting in a quite normal (not so huge) number.

For the specific (and frequent) case of binomial coefficients, aka
combinations ((a+b)!/(a!b!)), GMP directly offers a shortcut that
can be rather speedy (and gmpy, my Python wrapper for GMP, of
course exposes it -- it may be the single gmpy function I call
most often, as I often compute cards-related probabilities:-).

Alex

Sun, 28 Sep 2003 03:59:05 GMT
probability distribution function in python

Quote:

> Here's a better one, but you have to learn about lambda and
> reduce:

> >>> def fact(n):
> ...  return reduce(lambda x,y: long(x*y), range(1,n+1))
> ...

This looks nice as an isolated excercise, but fails if you need to use
these factorials somewhere, like in calculating Poisson probabilities in
the original question. The problem is that you need the factorial to
divide a real number (float), and for that promote your factorial to a
float and that overflows. A better solution was already posted here:
calculate log-Poisson with Stirling approximation for factorials. Here
is the first approximation:

Quote:
>>> from math import *
>>> def stirling(n):

...     log(2)/2+log(n)/2+log(pi)/2+n*log(n)-n

We may compare this to your fact() in two cases:

Quote:
>>> log(fact(100))
363.739375556
>>> stirling(100)
363.738542225
>>> log(fact(200))
inf
>>> stirling(200)

863.231570526

So fact(200) gives a result, but fails when this result is used in
floating point calculations while Stirling still works. So the idea of
using log-Poisson is to avoid overflows in interim results: the final
result is in range 0...1 and it can be calculated for higher n as well
as long as you avoid large floats.

Now it depends on the needs of accuracy whether the simple Stirling
suffices. A common practice is to use straight factorial for smaller n
and switch to Stirling in larger n. There are very simple correction
terms for larger n (and for smaller n you can calculate or cut-and-paste
the corrections explicitly) depending on the magnitude of n. These you
need in serious applications, and in that case I wouldn't write them
myself but I would get a good numeric library function.

In case you don't want the Poisson probability but a Poisson
log-likelihood, you might consider skipping the factorial completely.

cheers, jo
--
Jari Oksanen -- Oulu, Finland

Sun, 28 Sep 2003 15:55:24 GMT
probability distribution function in python

Howdy-

Here is some "pure Python" (rather than an extension) implementing
some functions from the *Numerical Recipes* books related to
factorials.  Use at your own risk!  (i.e., not yet thoroughly tested).

-Tom Loredo

from math import *
import Numeric
N = Numeric

#============= Exceptions ===============

max_iters = 'Too many iterations: '

#============= Global constants ===============

rt2 = sqrt(2.)
gammln_cof = N.array([76.18009173, -86.50532033, 24.01409822,
-1.231739516e0, 0.120858003e-2, -0.536382e-5])
gammln_stp = 2.50662827465

#============= Gamma, Incomplete Gamma ===========

def gammln(xx):
"""Logarithm of the gamma function."""
global gammln_cof, gammln_stp
x = xx - 1.
tmp = x + 5.5
tmp = (x + 0.5)*log(tmp) - tmp
ser = 1.
for j in range(6):
x = x + 1.
ser = ser + gammln_cof[j]/x
return tmp + log(gammln_stp*ser)

def gser(a, x, itmax=700, eps=3.e-7):
"""Series approx'n to the incomplete gamma function."""
gln = gammln(a)
if (x < 0.):
if (x == 0.):
return(0.)
ap = a
sum = 1. / a
delta = sum
n = 1
while n <= itmax:
ap = ap + 1.
delta = delta * x / ap
sum = sum + delta
if (abs(delta) < abs(sum)*eps):
return (sum * exp(-x + a*log(x) - gln), gln)
n = n + 1
raise max_iters, str((abs(delta), abs(sum)*eps))

def gcf(a, x, itmax=200, eps=3.e-7):
"""Continued fraction approx'n of the incomplete gamma function."""
gln = gammln(a)
gold = 0.
a0 = 1.
a1 = x
b0 = 0.
b1 = 1.
fac = 1.
n = 1
while n <= itmax:
an = n
ana = an - a
a0 = (a1 + a0*ana)*fac
b0 = (b1 + b0*ana)*fac
anf = an*fac
a1 = x*a0 + anf*a1
b1 = x*b0 + anf*b1
if (a1 != 0.):
fac = 1. / a1
g = b1*fac
if (abs((g-gold)/g) < eps):
return (g*exp(-x+a*log(x)-gln), gln)
gold = g
n = n + 1
raise max_iters, str(abs((g-gold)/g))

def gammp(a, x):
"""Incomplete gamma function."""
if (x < 0. or a <= 0.):
raise ValueError, (a, x)
if (x < a+1.):
return gser(a,x)[0]
else:
return 1.-gcf(a,x)[0]

def gammq(a, x):
"""Incomplete gamma function."""
if (x < 0. or a <= 0.):
raise ValueError, repr((a, x))
if (x < a+1.):
return 1.-gser(a,x)[0]
else:
return gcf(a,x)[0]

def factrl(n, ntop=0, prev=N.ones((33),N.Float)):
"""Factorial of n.
The first 33 values are stored as they are calculated to
speed up subsequent calculations."""
if n < 0:
raise ValueError, 'Negative argument!'
elif n <= ntop:
return prev[n]
elif n <= 32:
for j in range(ntop+1, n+1):
prev[j] = j * prev[j-1]
ntop = n
return prev[n]
else:
return exp(gammln(n+1.))

def factln(n, prev=N.array(101*(-1.,))):
"""Log factorial of n.
Values for n=0 to 100 are stored as they are calculated to
speed up subsequent calls."""
if n < 0:
raise ValueError, 'Negative argument!'
elif n <= 100:
if prev[n] < 0:
prev[n] = gammln(n+1.)
return prev[n]
else:
return gammln(n+1.)

def combln(Ntot, n):
"""Log of number of combinations of n items from Ntot total."""
return factln(Ntot) - factln(n) - factln(Ntot-n)

Mon, 29 Sep 2003 02:14:37 GMT

 Page 1 of 1 [ 7 post ]

Relevant Pages