
Tutorial

A tutorial of the Distfun toolbox.

Purpose

The goal of this document is to illustrate practical uses of the distfun toolbox.

Normal distribution

Reference: 68-95-99.7 rule. (2012, July 20). In Wikipedia, The Free Encyclopedia. Retrieved 09:11, August 8, 2012, from http://en.wikipedia.org/wiki/68-95-99.7_rule

Assume that X is a normally distributed random variable, where μ is the mean of the distribution and σ is its standard deviation. Then,

\begin{eqnarray}
        Pr(\mu-\sigma \le X \le \mu+\sigma)   &\approx& 0.6827 \\
        Pr(\mu-2\sigma \le X \le \mu+2\sigma) &\approx& 0.9545 \\
        Pr(\mu-3\sigma \le X \le \mu+3\sigma) &\approx& 0.9973
\end{eqnarray}

To check this, we can use the distfun_normcdf function.

distfun_normcdf(1,0,1)-distfun_normcdf(-1,0,1)
distfun_normcdf(2,0,1)-distfun_normcdf(-2,0,1)
distfun_normcdf(3,0,1)-distfun_normcdf(-3,0,1)

The previous script produces the following output.

-->distfun_normcdf(1,0,1)-distfun_normcdf(-1,0,1)
 ans  =
    0.6826895  
-->distfun_normcdf(2,0,1)-distfun_normcdf(-2,0,1)
 ans  =
    0.9544997  
-->distfun_normcdf(3,0,1)-distfun_normcdf(-3,0,1)
 ans  =
    0.9973002  
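The same three probabilities can be cross-checked outside Scilab. The following Python sketch (the helper `norm_cdf` is ours, not part of distfun) uses the closed form of the normal CDF, Φ(x) = (1 + erf(x/√2))/2:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    # Normal CDF via the error function: no special libraries needed
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

for c in [1, 2, 3]:
    p = norm_cdf(c) - norm_cdf(-c)
    print("c=%d: %.7f" % (c, p))   # 0.6826895, 0.9544997, 0.9973002
```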
    

Binomial distribution

Reference: Section 4.3 - Binomial Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer

Problem : Suppose that items coming off a production line are classified as defective (D) or non-defective (N), independently of each other. The probability that an item is non-defective is pr=0.8. At one point in the production, we create a sample by randomly picking three items. Compute the probability that the sample contains 0, 1, 2 or 3 non-defective items.

Let X be the number of non-defective items in the sample. Then X has a binomial distribution with parameters n=3 and pr=0.8.

To calculate the required probabilities, we can use the distfun_binopdf function as follows.

s0 = distfun_binopdf(0,3,0.8)
s1 = distfun_binopdf(1,3,0.8)
s2 = distfun_binopdf(2,3,0.8)
s3 = distfun_binopdf(3,3,0.8)
s0+s1+s2+s3

The previous script produces the following output.

-->s0 = distfun_binopdf(0,3,0.8)
 s0  = 
    0.008  
-->s1 = distfun_binopdf(1,3,0.8)
 s1  = 
    0.096  
-->s2 = distfun_binopdf(2,3,0.8)
 s2  = 
    0.384  
-->s3 = distfun_binopdf(3,3,0.8)
 s3  = 
    0.512  
-->s0+s1+s2+s3
 ans  =
    1.  
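As an independent cross-check, the same four probabilities follow directly from the binomial formula. A minimal Python sketch (the helper `binopdf` is ours, not distfun's):

```python
from math import comb

def binopdf(x, n, pr):
    # Binomial PMF: C(n, x) * pr^x * (1-pr)^(n-x)
    return comb(n, x) * pr**x * (1.0 - pr)**(n - x)

probs = [binopdf(x, 3, 0.8) for x in range(4)]
print(probs)        # approximately [0.008, 0.096, 0.384, 0.512]
print(sum(probs))   # the four cases are exhaustive, so the sum is 1
```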
    

Link between Binomial and Normal distribution

Reference: http://en.wikipedia.org/wiki/Binomial_distribution, section "Normal_approximation"

When n increases, the binomial distribution with parameters n and pr approaches the normal distribution with mean n*pr and standard deviation sqrt(n*pr*(1-pr)).

n=[1 2 4 8 16 32];
pr=0.5;
ny=3;
nx=2;
scf();
for i=1:nx
    for j=1:ny
        ij=(i-1)*ny + j;
        subplot(ny,nx,ij)
        mu=n(ij)*pr;
        sigma=sqrt(n(ij)*pr*(1-pr));
        xmin=max(mu-3*sigma,0);
        xmax=min(mu+3*sigma,n(ij));
        x=linspace(xmin,xmax,100);
        xbino=unique(floor(x));
        y=distfun_binopdf(xbino,n(ij),pr);
        plot(xbino,y,"ro");
        y=distfun_normpdf(x,mu,sigma);
        plot(x,y,"b-");
        xtitle("n="+string(n(ij)),"X","Density");
        legend(["Binomial","Normal"]);
    end
end
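The convergence shown in the figure can also be quantified numerically. The following Python sketch (helper names are ours) compares the binomial PMF with the normal density at the mode x = n*pr, where the two curves are easiest to compare:

```python
from math import comb, exp, pi, sqrt

def binopdf(x, n, pr):
    return comb(n, x) * pr**x * (1 - pr)**(n - x)

def normpdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))

# Distance between the binomial PMF and the normal density at the mode;
# it shrinks as n grows, illustrating the approximation.
pr = 0.5
diffs = []
for n in [8, 32, 128]:
    mu, sigma = n * pr, sqrt(n * pr * (1 - pr))
    diffs.append(abs(binopdf(n // 2, n, pr) - normpdf(mu, mu, sigma)))
    print("n=%3d: |pmf - pdf| = %.5f" % (n, diffs[-1]))
```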

The previous script produces the following figure.

Link between Poisson and Normal distribution

Reference: http://en.wikipedia.org/wiki/Poisson_distribution, section "Related distributions"

When lambda increases, the Poisson distribution with parameter lambda approximates the normal distribution with mean lambda and standard deviation sqrt(lambda).

lambda=[4. 16. 32. 10000.];
ny=2;
nx=2;
scf();
for i=1:nx
    for j=1:ny
        ij=(i-1)*ny + j;
        subplot(ny,nx,ij)
        mu=lambda(ij);
        sigma=sqrt(lambda(ij));
        xmin=max(mu-3*sigma,0);
        xmax=mu+3*sigma;
        x=linspace(xmin,xmax,100);
        xpoi=unique(floor(x));
        y=distfun_poisspdf(xpoi,lambda(ij));
        plot(xpoi,y,"ro");
        y=distfun_normpdf(x,mu,sigma);
        plot(x,y,"b-");
        xtitle("lambda="+string(lambda(ij)));
        legend(["Poisson","Normal"]);
    end
end
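Here too the approximation can be checked numerically rather than visually. The sketch below (helper names are ours) compares the Poisson PMF with the normal density at x = lambda; the gap shrinks as lambda grows:

```python
from math import exp, lgamma, log, pi, sqrt

def poisspdf(x, lam):
    # Poisson PMF computed with logarithms, so that lambda=10000 does not overflow
    return exp(x * log(lam) - lam - lgamma(x + 1))

def normpdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))

diffs = []
for lam in [4, 16, 32, 10000]:
    diffs.append(abs(poisspdf(lam, lam) - normpdf(lam, lam, sqrt(lam))))
    print("lambda=%5d: |pmf - pdf| = %.2e" % (lam, diffs[-1]))
```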

The previous script produces the following figure.

Geometric distribution

Reference: Section 8.4 - Geometric Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer

Problem : If the probability of a certain test yielding a "positive" reaction equals 0.4, what is the probability that fewer than 5 "negative" reactions occur before the first positive reaction occurs?

Let X be the number of negative reactions before the first positive reaction occurs. Then X has a geometric distribution with parameter pr=0.4. We have to compute P(X<=4).

To calculate the required probability, we can use the distfun_geocdf function as follows.

p = distfun_geocdf(4,0.4)

The previous script produces the following output.

-->p = distfun_geocdf(4,0.4)
 p  =
    0.92224 
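This value also follows from the closed form of the geometric CDF: the event X > 4 happens exactly when the first five reactions are all negative, so P(X<=4) = 1 - 0.6^5. A minimal Python sketch (the helper `geocdf` is ours):

```python
def geocdf(x, pr):
    # P(X <= x) where X counts the failures before the first success:
    # X > x only if the first x+1 trials are all failures
    return 1.0 - (1.0 - pr)**(x + 1)

print(geocdf(4, 0.4))   # approximately 0.92224
```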
    

Hypergeometric distribution

Reference: Section 8.7 - Hypergeometric Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer

Problem : Small electric motors are shipped in lots of 50. Before such a shipment is accepted, an inspector chooses 5 of these motors and inspects them. If none of these motors are defective, the lot is accepted. If one or more are found to be defective, the entire shipment is inspected. Suppose that there are, in fact, three defective motors in the lot. What is the probability that 100 percent inspection is required?

Let the number of defective motors found be X. Then X has a hypergeometric distribution, with parameters: the total number of motors M=50, the number of defective motors k=3 and the number of motors inspected N=5.

100 percent inspection will be required if and only if X>=1. Hence, we have to compute P(X>=1)=P(X>0).

To calculate the required probability, we can use the distfun_hygecdf function as follows.

p = distfun_hygecdf(0,50,3,5,%f)

The previous script produces the following output.

-->p = distfun_hygecdf(0,50,3,5,%f)
 p  =
    0.2760204 
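The same value can be recovered from the hypergeometric counting formula, P(X=0) = C(47,5)/C(50,5). A short Python cross-check (the helper `hygepdf` is ours, not distfun's):

```python
from math import comb

def hygepdf(x, M, k, N):
    # Hypergeometric PMF: x defectives in a sample of N drawn from
    # M items containing k defectives
    return comb(k, x) * comb(M - k, N - x) / comb(M, N)

p = 1.0 - hygepdf(0, 50, 3, 5)   # P(X >= 1) = 1 - P(X = 0)
print(p)                          # approximately 0.2760204
```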
    

Hypergeometric distribution: recurrence

The Hypergeometric distribution function f with parameters M, k and N satisfies the following recurrence:

f(x+1)=\frac{(N-x)(k-x)}{(x+1)(M-k-N+x+1)}f(x)

for x=max(0,N-M+k),...,min(N,k).

The following script checks this property for M=10, k=5 and N=7. For X=0 and X=1, the probability is zero, which is why we start from X=2.

M=10;
k=5;
N=7;
p=distfun_hygepdf(2,M,k,N);
mprintf("pdf(%s)=%s\n",string(2),string(p))
for x=2:4
    p=(N-x)*(k-x)*p/(x+1)/(M-N-k+x+1);
    px=distfun_hygepdf(x+1,M,k,N);
    mprintf("P(X=%s): recurrence=%s, PDF=%s\n",..
    string(x+1),string(p),string(px))
end

The previous script produces the following output.

P(X=3): recurrence=0.4166667, PDF=0.4166667
P(X=4): recurrence=0.4166667, PDF=0.4166667
P(X=5): recurrence=0.0833333, PDF=0.0833333
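The same recurrence can be verified outside Scilab against a direct evaluation of the PMF. A Python sketch (helper names are ours):

```python
from math import comb

def hygepdf(x, M, k, N):
    return comb(k, x) * comb(M - k, N - x) / comb(M, N)

M, k, N = 10, 5, 7
p = hygepdf(2, M, k, N)
checks = []
for x in range(2, 5):
    # recurrence: f(x+1) = (N-x)(k-x) / ((x+1)(M-k-N+x+1)) * f(x)
    p = (N - x) * (k - x) * p / ((x + 1) * (M - k - N + x + 1))
    checks.append((p, hygepdf(x + 1, M, k, N)))
    print("P(X=%d): recurrence=%.7f, PDF=%.7f" % (x + 1, p, checks[-1][1]))
```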
    

Poisson distribution

Reference: Section 8.2 - The Poisson Distribution as an Approximation to the Binomial Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer

Problem : At a busy traffic intersection the probability p of an individual car having an accident is very small, say p=0.002. However, during a certain part of the day, say between 4 p.m. and 6 p.m., a large number of cars pass through the intersection, say 1000. Under these conditions, what is the probability of two or more accidents occurring during that period?

Let X be the number of accidents among the 1000 cars. Then X has a binomial distribution with parameters n=1000 and pr=0.002. Hence, we have to compute P(X>=2)=P(X>1).

To calculate the binomial probability, we can use the distfun_binocdf function as follows.

p = distfun_binocdf(1,1000,0.002,%f)

The previous script produces the following output.

-->p = distfun_binocdf(1,1000,0.002,%f)
 p  =
    0.5942651
    

If n is large and pr is small, we can approximate the binomial distribution by the Poisson distribution with parameter lambda = n*pr = 2.

To calculate the Poisson probability, we can use the distfun_poisscdf function as follows.

p = distfun_poisscdf(1,2,%f)

The previous script produces the following output.

-->p = distfun_poisscdf(1,2,%f)
 p  = 
    0.5939942 
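Both upper-tail probabilities can be cross-checked from the elementary formulas, since P(X>=2) = 1 - P(X=0) - P(X=1) under either model. A Python sketch (helper names are ours):

```python
from math import comb, exp, factorial

def binopdf(x, n, pr):
    return comb(n, x) * pr**x * (1 - pr)**(n - x)

def poisspdf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# P(X >= 2) = 1 - P(X=0) - P(X=1) under both models
p_bino = 1 - binopdf(0, 1000, 0.002) - binopdf(1, 1000, 0.002)
p_poiss = 1 - poisspdf(0, 2) - poisspdf(1, 2)
print(p_bino)    # approximately 0.5942651
print(p_poiss)   # approximately 0.5939942
```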
    

Chi-square distribution

Reference: Section 5.8.1 - The Chisquare Distribution, and p.613 Table A2, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004)"

Compute a table of complementary quantiles of the Chi-Square distribution, using given values of alpha and given degrees of freedom k. In other words, compute x so that P(X>x)=alpha, for given values of alpha.

alpha = [
0.995
0.99
0.975
0.95
0.05 
0.025 
0.01 
0.005
];
A = []; 
for k=1:30
    A(k,:) = distfun_chi2inv(alpha',k,%f);
end
disp([(1:30)' A])

The previous script produces the following output.

k alpha=0.995 alpha=0.99 alpha=0.975 alpha=0.95 alpha=0.05 alpha=0.025 alpha=0.01 alpha=0.005
k=1 0.00004 0.00016 0.00098 0.00393 3.84146 5.02389 6.6349 7.87944
k=2 0.01003 0.02010 0.05064 0.10259 5.99146 7.37776 9.21034 10.5966
k=3 0.07172 0.11483 0.21580 0.35185 7.81473 9.3484 11.3449 12.8382
k=4 0.20699 0.29711 0.48442 0.71072 9.48773 11.1433 13.2767 14.8603
k=5 0.41174 0.55430 0.83121 1.14548 11.0705 12.8325 15.0863 16.7496
k=6 0.67573 0.87209 1.23734 1.63538 12.5916 14.4494 16.8119 18.5476
k=7 0.98926 1.23904 1.68987 2.16735 14.0671 16.0128 18.4753 20.2777
k=8 1.34441 1.6465 2.17973 2.73264 15.5073 17.5345 20.0902 21.955
k=9 1.73493 2.0879 2.70039 3.32511 16.919 19.0228 21.666 23.5894
k=10 2.15586 2.55821 3.24697 3.9403 18.307 20.4832 23.2093 25.1882
k=11 2.60322 3.05348 3.81575 4.57481 19.6751 21.92 24.725 26.7568
k=12 3.07382 3.57057 4.40379 5.22603 21.0261 23.3367 26.217 28.2995
k=13 3.56503 4.10692 5.00875 5.89186 22.362 24.7356 27.6882 29.8195
k=14 4.07467 4.66043 5.62873 6.57063 23.6848 26.1189 29.1412 31.3193
k=15 4.60092 5.22935 6.26214 7.26094 24.9958 27.4884 30.5779 32.8013
k=16 5.14221 5.81221 6.90766 7.96165 26.2962 28.8454 31.9999 34.2672
k=17 5.69722 6.40776 7.56419 8.67176 27.5871 30.191 33.4087 35.7185
k=18 6.2648 7.01491 8.23075 9.39046 28.8693 31.5264 34.8053 37.1565
k=19 6.84397 7.63273 8.90652 10.117 30.1435 32.8523 36.1909 38.5823
k=20 7.43384 8.2604 9.59078 10.8508 31.4104 34.1696 37.5662 39.9968
k=21 8.03365 8.8972 10.2829 11.5913 32.6706 35.4789 38.9322 41.4011
k=22 8.64272 9.54249 10.9823 12.338 33.9244 36.7807 40.2894 42.7957
k=23 9.26042 10.1957 11.6886 13.0905 35.1725 38.0756 41.6384 44.1813
k=24 9.88623 10.8564 12.4012 13.8484 36.415 39.3641 42.9798 45.5585
k=25 10.5197 11.524 13.1197 14.6114 37.6525 40.6465 44.3141 46.9279
k=26 11.1602 12.1981 13.8439 15.3792 38.8851 41.9232 45.6417 48.2899
k=27 11.8076 12.8785 14.5734 16.1514 40.1133 43.1945 46.9629 49.6449
k=28 12.4613 13.5647 15.3079 16.9279 41.3371 44.4608 48.2782 50.9934
k=29 13.1211 14.2565 16.0471 17.7084 42.557 45.7223 49.5879 52.3356
k=30 13.7867 14.9535 16.7908 18.4927 43.773 46.9792 50.8922 53.672
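One row of the table can be verified by hand: for k=2 degrees of freedom the Chi-Square survival function reduces to P(X>x) = exp(-x/2), so the upper quantile has the closed form x = -2*ln(alpha). The following Python sketch checks the k=2 row against this formula (the table values are copied from above):

```python
from math import log

# For k = 2 degrees of freedom the survival function is P(X > x) = exp(-x/2),
# so the upper quantile has the closed form x = -2*ln(alpha).
row_k2 = {0.995: 0.01003, 0.99: 0.02010, 0.975: 0.05064, 0.95: 0.10259,
          0.05: 5.99146, 0.025: 7.37776, 0.01: 9.21034, 0.005: 10.5966}
for alpha, x_table in row_k2.items():
    print("alpha=%.3f: -2*ln(alpha)=%.5f, table=%.5f" % (alpha, -2 * log(alpha), x_table))
```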

Exponential distribution

Reference: Section 5.6 - Exponential Random Variables, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004)"

Problem: Suppose the number of miles a car can run before its battery discharges is exponentially distributed with an average value of 10 000 miles. If a 5 000-mile trip is to be made, what is the probability that the trip can be completed without replacing the battery?

The exponential distribution has the memoryless property. Let X be the remaining lifetime of the battery. Therefore, X has an exponential distribution with parameter lambda = 1/10000. Hence the desired probability is

P(X \geq 5000) = e^{-5000\lambda} = e^{-\frac{1}{2}} \approx 0.607

with 3 significant digits.

To compute the probability, we can use the distfun_expcdf function as follows.

p = distfun_expcdf(5000,10000,%f)

The previous script produces the following output.

-->p = distfun_expcdf(5000,10000,%f)
 p  =
    0.6065307
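The survival function of the exponential distribution has the simple closed form P(X>x) = exp(-x/mu), so the result is easy to cross-check. A minimal Python sketch (the helper `expcdf_upper` is ours):

```python
from math import exp

def expcdf_upper(x, mu):
    # Survival function P(X > x) of an exponential with mean mu
    return exp(-x / mu)

print(expcdf_upper(5000, 10000))   # exp(-1/2), approximately 0.6065307
```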
    

Uniform distribution

Reference: Section 5.4 - The Uniform Random Variable, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004) "

Problem: Buses arrive at an interval of 15 minutes starting at 7 a.m. That is, they arrive at 7:00, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop between 7:00 and 7:30, calculate the probability that he waits (a) less than 5 minutes for a bus; (b) at least 12 minutes for a bus.

Let X be the arrival time of the passenger, in minutes after 7:00. Then X is a uniform random variable on the interval [0,30]. The passenger waits less than 5 minutes if he arrives between 7:10 and 7:15 or between 7:25 and 7:30, so the desired probability for (a) is

P(10 \leq X \leq 15) + P(25 \leq X \leq 30) = \frac{5}{30} + \frac{5}{30} = \frac{1}{3}

a = 0
b = 30
p = (distfun_unifcdf(15,a,b) - distfun_unifcdf(10,a,b))+..
(distfun_unifcdf(30,a,b) - distfun_unifcdf(25,a,b))

The previous script produces the following output.

-->p = (distfun_unifcdf(15,a,b) - distfun_unifcdf(10,a,b))+..
-->(distfun_unifcdf(30,a,b) - distfun_unifcdf(25,a,b))
 p  =
    0.3333333    
	

Similarly, he would have to wait at least 12 minutes if he arrives between 7:00 and 7:03 or between 7:15 and 7:18. The desired probability for (b) is

P(0 \leq X \leq 3) + P(15 \leq X \leq 18) = \frac{3}{30} + \frac{3}{30} = \frac{1}{5}

a = 0
b = 30
p = (distfun_unifcdf(3,a,b) - distfun_unifcdf(0,a,b))+..
(distfun_unifcdf(18,a,b) - distfun_unifcdf(15,a,b))

The previous script produces the following output .

-->p = (distfun_unifcdf(3,a,b) - distfun_unifcdf(0,a,b))+..
-->(distfun_unifcdf(18,a,b) - distfun_unifcdf(15,a,b))
 p  =
    0.2     
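Both answers can be cross-checked with the elementary uniform CDF, F(x) = (x-a)/(b-a) on [a,b]. A Python sketch (the helper `unifcdf` is ours, not distfun's):

```python
def unifcdf(x, a, b):
    # CDF of the uniform distribution on [a, b]
    return min(max((x - a) / (b - a), 0.0), 1.0)

a, b = 0, 30
p_a = (unifcdf(15, a, b) - unifcdf(10, a, b)) + (unifcdf(30, a, b) - unifcdf(25, a, b))
p_b = (unifcdf(3, a, b) - unifcdf(0, a, b)) + (unifcdf(18, a, b) - unifcdf(15, a, b))
print(p_a)   # approximately 1/3
print(p_b)   # approximately 0.2
```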
	

Hypergeometric and Binomial distributions

Assume that X has Hypergeometric distribution with parameters M, k and N. Assume that Y has Binomial distribution with parameters N and pr=k/M.

The following figure presents the Hypergeometric urn.

The following figure presents the Binomial urn.

If M and k are large compared to N, and if pr is not close to 0 or 1, then X and Y have approximately the same distribution.

In the following script, we use constant values of N and pr, and let M and k increase. We can see that the Hypergeometric distribution becomes closer and closer to the Binomial distribution.

M=[80 800 8000 80000];
k=[50 500 5000 50000];
N=30;
scf();
for i=1:2
    for j=1:2
        ij=(i-1)*2 + j;
        subplot(2,2,ij)
        x=(0:N)';
        pr=k(ij)/M(ij);
        pHy=distfun_hygepdf(x,M(ij),k(ij),N);
        pBi=distfun_binopdf(x,N,pr);
        plot(x,pHy,"ro-");
        plot(x,pBi,"bo-");
        strtitle=msprintf("M=%s, k=%s, N=%s, pr=%s", ..
        string(M(ij)),string(k(ij)),string(N),string(pr));
        xtitle(strtitle)
        legend(["Hypergeometric" "Binomial"],"in_upper_left");
    end
end
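The visual convergence can also be measured: the maximum pointwise distance between the two PMFs shrinks as M and k grow. A Python sketch (helper names are ours) using the same parameter values as the script above:

```python
from math import comb

def hygepdf(x, M, k, N):
    return comb(k, x) * comb(M - k, N - x) / comb(M, N)

def binopdf(x, n, pr):
    return comb(n, x) * pr**x * (1 - pr)**(n - x)

N = 30
diffs = []
for M, k in [(80, 50), (800, 500), (8000, 5000), (80000, 50000)]:
    pr = k / M
    d = max(abs(hygepdf(x, M, k, N) - binopdf(x, N, pr)) for x in range(N + 1))
    diffs.append(d)
    print("M=%5d: max |Hyge - Bino| = %.2e" % (M, d))
```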

The previous script produces the following figure.

Gumbel distribution

Let X be the maximum of n random variables with a normal distribution. When n increases, the distribution of X becomes closer and closer to the Gumbel distribution.

Reference: http://www.panix.com/~kts/Thesis/extreme/extreme2.html

stacksize("max");
N=2000;
mu=0;
sigma=1;
x=linspace(0,6,100);
scf();
xtitle("Max. of n Normal variables","X","Frequency")
//
n=10;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=2);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=100;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=3);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=1000;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=4);
legend(["n=10","n=100","n=1000"]);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");

The previous script produces the following figure.

Let X be the maximum of n random variables with an exponential distribution. When n increases, the distribution of X becomes closer and closer to the Gumbel distribution. Reference: http://www.panix.com/~kts/Thesis/extreme/extreme2.html

stacksize("max");
N=2000;
mu=20;
x=linspace(0,300,100);
scf();
xtitle("Max. of n Exp variables","X","Frequency")
//
n=10;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=2);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=100;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=3);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=1000;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=4);
legend(["n=10","n=100","n=1000"]);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");

The previous script produces the following figure.

Multinomial distribution

Source : Wikipedia, Multinomial distribution, http://en.wikipedia.org/wiki/Multinomial_distribution

In a recent three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?

n=6
P=[0.2,0.3,0.5]
x=[1,2,3]
p=distfun_mnpdf(x,n,P) // 0.135
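The result can be cross-checked with the multinomial formula, 6!/(1! 2! 3!) × 0.2^1 × 0.3^2 × 0.5^3 = 60 × 0.00225 = 0.135. A minimal Python sketch (the helper `mnpdf` is ours, not distfun's):

```python
from math import factorial

def mnpdf(x, n, P):
    # Multinomial PMF: n! / (x1!...xm!) * p1^x1 * ... * pm^xm
    coef = factorial(n)
    p = 1.0
    for xi, pi in zip(x, P):
        coef //= factorial(xi)
        p *= pi**xi
    return coef * p

print(mnpdf([1, 2, 3], 6, [0.2, 0.3, 0.5]))   # approximately 0.135
```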
