A tutorial of the Distfun toolbox.
The goal of this document is to illustrate practical uses of the distfun toolbox.
Reference: 68-95-99.7 rule. (2012, July 20). In Wikipedia, The Free Encyclopedia. Retrieved 09:11, August 8, 2012, from http://en.wikipedia.org/wiki/68-95-99.7_rule
Assume that X is a normally distributed random variable, where μ is the mean of the distribution and σ is its standard deviation. Then:
P(μ-σ <= X <= μ+σ) ≈ 0.6827
P(μ-2σ <= X <= μ+2σ) ≈ 0.9545
P(μ-3σ <= X <= μ+3σ) ≈ 0.9973
To check this, we can use the distfun_normcdf function.
distfun_normcdf(1,0,1)-distfun_normcdf(-1,0,1)
distfun_normcdf(2,0,1)-distfun_normcdf(-2,0,1)
distfun_normcdf(3,0,1)-distfun_normcdf(-3,0,1)
The previous script produces the following output.
-->distfun_normcdf(1,0,1)-distfun_normcdf(-1,0,1)
 ans  =
    0.6826895
-->distfun_normcdf(2,0,1)-distfun_normcdf(-2,0,1)
 ans  =
    0.9544997
-->distfun_normcdf(3,0,1)-distfun_normcdf(-3,0,1)
 ans  =
    0.9973002
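The same three probabilities can be cross-checked without the toolbox. The following is a minimal Python sketch (Python and the helper name prob_within are not part of the original tutorial); it relies on the identity P(|Z| <= k) = erf(k/sqrt(2)) for a standard normal variable Z.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal variable X.

    After standardising, this is P(|Z| <= k) = erf(k / sqrt(2)),
    so the result does not depend on mu or sigma.
    """
    return math.erf(k / math.sqrt(2.0))

# Print the probabilities for 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 7))
```

The printed values should agree with the Scilab output to the displayed precision.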
Reference: Section 4.3 - Binomial Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer
Problem: Suppose that items coming off a production line are classified as defective (D) or non-defective (N), independently of each other. The probability that an item is non-defective is pr=0.8. At one point in the production, we create a sample by randomly picking three items. Compute the probability that the sample contains 0, 1, 2 or 3 non-defective items.
Let X be the number of non-defective items in the sample. Then X has a binomial distribution with parameters n=3 and pr=0.8.
To calculate the required probabilities, we can use the distfun_binopdf function as follows.
s0 = distfun_binopdf(0,3,0.8)
s1 = distfun_binopdf(1,3,0.8)
s2 = distfun_binopdf(2,3,0.8)
s3 = distfun_binopdf(3,3,0.8)
s0+s1+s2+s3
The previous script produces the following output.
-->s0 = distfun_binopdf(0,3,0.8)
 s0  =
    0.008
-->s1 = distfun_binopdf(1,3,0.8)
 s1  =
    0.096
-->s2 = distfun_binopdf(2,3,0.8)
 s2  =
    0.384
-->s3 = distfun_binopdf(3,3,0.8)
 s3  =
    0.512
-->s0+s1+s2+s3
 ans  =
    1.
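These values follow directly from the binomial formula P(X=x) = C(n,x) * pr^x * (1-pr)^(n-x). As an illustration outside Scilab, a minimal Python sketch (the name binom_pmf is an assumption of this sketch, not a toolbox function) is:

```python
from math import comb

def binom_pmf(x, n, pr):
    """P(X = x) for a binomial variable with n trials and success probability pr."""
    return comb(n, x) * pr**x * (1.0 - pr)**(n - x)

# Probabilities of 0, 1, 2 and 3 non-defective items in a sample of three.
probs = [binom_pmf(x, 3, 0.8) for x in range(4)]
print(probs)
print(sum(probs))  # the four probabilities cover all cases
```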
Reference: http://en.wikipedia.org/wiki/Binomial_distribution, section "Normal_approximation"
When n increases, the binomial distribution with parameters n and pr is increasingly well approximated by the normal distribution with mean n*pr and standard deviation sqrt(n*pr*(1-pr)).
n=[1 2 4 8 16 32];
pr=0.5;
ny=3;
nx=2;
scf();
for i=1:nx
    for j=1:ny
        ij=(i-1)*ny + j;
        subplot(ny,nx,ij)
        mu=n(ij)*pr;
        sigma=sqrt(n(ij)*pr*(1-pr));
        xmin=max(mu-3*sigma,0);
        xmax=min(mu+3*sigma,n(ij));
        x=linspace(xmin,xmax,100);
        xbino=unique(floor(x));
        y=distfun_binopdf(xbino,n(ij),pr);
        plot(xbino,y,"ro");
        y=distfun_normpdf(x,mu,sigma);
        plot(x,y,"b-");
        xtitle("n="+string(n(ij)),"X","Density");
        legend(["Binomial","Normal"]);
    end
end
The previous script produces the following figure.
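The convergence visible in the figure can also be expressed as a number. The sketch below, written in Python as an illustration outside the toolbox, computes the largest absolute difference between the binomial probabilities and the normal density at the integers 0..n. The helper name max_gap and the choice of this particular distance are assumptions of the sketch.

```python
from math import comb, exp, pi, sqrt

def max_gap(n, pr=0.5):
    """Largest |binomial pmf - normal pdf| over x = 0, ..., n."""
    mu = n * pr
    sigma = sqrt(n * pr * (1.0 - pr))
    gap = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * pr**x * (1.0 - pr)**(n - x)
        pdf = exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2.0 * pi))
        gap = max(gap, abs(pmf - pdf))
    return gap

# Same values of n as in the Scilab script.
for n in (1, 2, 4, 8, 16, 32):
    print(n, max_gap(n))
```

The gap for n=32 should come out much smaller than the gap for n=1, in line with the figure.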
Reference: http://en.wikipedia.org/wiki/Poisson_distribution, section "Related distributions"
When lambda increases, the Poisson distribution with parameter lambda is increasingly well approximated by the normal distribution with mean lambda and standard deviation sqrt(lambda).
lambda=[4. 16. 32. 10000.];
ny=2;
nx=2;
scf();
for i=1:nx
    for j=1:ny
        ij=(i-1)*ny + j;
        subplot(ny,nx,ij)
        mu=lambda(ij);
        sigma=sqrt(lambda(ij));
        xmin=max(mu-3*sigma,0);
        xmax=mu+3*sigma;
        x=linspace(xmin,xmax,100);
        xpoi=unique(floor(x));
        y=distfun_poisspdf(xpoi,lambda(ij));
        plot(xpoi,y,"ro");
        y=distfun_normpdf(x,mu,sigma);
        plot(x,y,"b-");
        xtitle("lambda="+string(lambda(ij)));
        legend(["Poisson","Normal"]);
    end
end
The previous script produces the following figure.
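As a numerical counterpart to the figure, the following Python sketch (an illustration outside the toolbox; pois_pmf, norm_pdf and max_gap are hypothetical helper names) measures the largest absolute difference between the Poisson probabilities and the normal density over the integers within three standard deviations of the mean.

```python
from math import exp, lgamma, log, pi, sqrt

def pois_pmf(k, lam):
    """Poisson probability P(X = k), computed in log space to avoid overflow."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def norm_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

def max_gap(lam):
    """Largest |Poisson pmf - normal pdf| over integers within 3 sigma of the mean."""
    sigma = sqrt(lam)
    lo = max(int(lam - 3.0 * sigma), 0)
    hi = int(lam + 3.0 * sigma)
    return max(abs(pois_pmf(k, lam) - norm_pdf(k, lam, sigma))
               for k in range(lo, hi + 1))

# Same values of lambda as in the Scilab script.
for lam in (4.0, 16.0, 32.0, 10000.0):
    print(lam, max_gap(lam))
```

The gap should shrink as lambda grows.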
Reference: Section 8.4 - Geometric Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer
Problem: If the probability of a certain test yielding a "positive" reaction equals 0.4, what is the probability that fewer than 5 "negative" reactions occur before the first positive reaction occurs?
Let X be the number of negative reactions before the first positive reaction occurs. Then X has a geometric distribution with parameter pr=0.4. Fewer than 5 negative reactions means X<=4, so we have to compute P(X<=4).
To calculate the required probability, we can use the distfun_geocdf function as follows.
p = distfun_geocdf(4,0.4)
The previous script produces the following output.
-->p = distfun_geocdf(4,0.4)
 p  =
    0.92224
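The result can be checked by hand: X>4 means that the first five reactions are all negative, so P(X<=4) = 1 - 0.6^5. A minimal Python sketch of this closed form (geo_cdf is a hypothetical helper, not the toolbox function) is:

```python
def geo_cdf(x, pr):
    """P(X <= x), where X counts the failures before the first success.

    X > x happens exactly when the first x+1 trials all fail,
    which has probability (1 - pr)**(x + 1).
    """
    return 1.0 - (1.0 - pr)**(x + 1)

p = geo_cdf(4, 0.4)
print(p)
```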
Reference: Section 8.7 - Hypergeometric Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer
Problem: Small electric motors are shipped in lots of 50. Before such a shipment is accepted, an inspector chooses 5 of these motors and inspects them. If none of these motors are defective, the lot is accepted. If one or more are found to be defective, the entire shipment is inspected. Suppose that there are, in fact, three defective motors in the lot. What is the probability that 100 percent inspection is required?
Let the number of defective motors found be X. Then X has a hypergeometric distribution, with parameters: the total number of motors M=50, the number of defective motors k=3 and the number of motors inspected N=5.
100 percent inspection will be required if and only if X>=1. Hence, we have to compute P(X>=1)=P(X>0).
To calculate the required probability, we can use the distfun_hygecdf function as follows. The last argument, %f, selects the upper tail P(X>0).
p = distfun_hygecdf(0,50,3,5,%f)
The previous script produces the following output.
-->p = distfun_hygecdf(0,50,3,5,%f)
 p  =
    0.2760204
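This probability can also be obtained by counting: the lot is accepted only if all 5 inspected motors come from the 47 non-defective ones. The Python sketch below (an illustration outside the toolbox; the variable names are assumptions) computes the complement.

```python
from math import comb

# Number of ways to pick 5 motors among the 47 non-defective ones,
# divided by the number of ways to pick 5 motors among all 50.
p_accept = comb(47, 5) / comb(50, 5)

# 100 percent inspection is required when at least one defective motor is found.
p_inspect = 1.0 - p_accept
print(p_inspect)
```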
The Hypergeometric distribution function f with parameters M, k and N satisfies the following recurrence:
f(x+1) = f(x) * (N-x)*(k-x) / ((x+1)*(M-N-k+x+1))
for x=max(0,N-M+k),...,min(N,k).
The following script checks this property for M=10, k=5 and N=7. For X=0 and X=1, the probability is zero, and this is why we start from X=2.
M=10;
k=5;
N=7;
// Starting value: the probability at the smallest possible x, which is 2.
p=distfun_hygepdf(2,M,k,N);
mprintf("pdf(%s)=%s\n",string(2),string(p))
for x=2:4
    // Apply the recurrence to get P(X=x+1) from P(X=x).
    p=(N-x)*(k-x)*p/(x+1)/(M-N-k+x+1);
    px=distfun_hygepdf(x+1,M,k,N);
    mprintf("P(X=%s): recurrence=%s, PDF=%s\n",..
    string(x+1),string(p),string(px))
end
The previous script produces the following output.
P(X=3): recurrence=0.4166667, PDF=0.4166667
P(X=4): recurrence=0.4166667, PDF=0.4166667
P(X=5): recurrence=0.0833333, PDF=0.0833333
Reference: Section 8.2 - The Poisson Distribution as approximation to the Binomial Distribution, "Introductory Probability and Statistical Applications" by Paul L. Meyer
Problem: At a busy traffic intersection the probability p of an individual car having an accident is very small, say p=0.002. However, during a certain part of the day, say between 4 p.m. and 6 p.m., a large number of cars pass through the intersection, say 1000. Under these conditions, what is the probability of two or more accidents occurring during that period?
Let X be the number of accidents among the 1000 cars. Then X has a binomial distribution with parameters n=1000 and pr=0.002. Hence, we have to compute P(X>=2)=P(X>1).
To calculate the binomial probability, we can use the distfun_binocdf function as follows.
p = distfun_binocdf(1,1000,0.002,%f)
The previous script produces the following output.
-->p = distfun_binocdf(1,1000,0.002,%f)
 p  =
    0.5942651
If n is large and pr is small, we can approximate the binomial distribution by the Poisson distribution with parameter lambda=n*pr. Here, lambda=1000*0.002=2.
To calculate the Poisson probability, we can use the distfun_poisscdf function as follows.
p = distfun_poisscdf(1,2,%f)
The previous script produces the following output.
-->p = distfun_poisscdf(1,2,%f)
 p  =
    0.5939942
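Both tail probabilities have simple closed forms, because P(X>1) = 1 - P(X=0) - P(X=1). The following Python sketch (an illustration outside the toolbox; variable names are assumptions) evaluates them side by side to show how close the Poisson approximation is.

```python
from math import exp

n, pr = 1000, 0.002
lam = n * pr  # Poisson parameter matching the binomial mean

# Binomial: P(X=0) = (1-pr)^n and P(X=1) = n*pr*(1-pr)^(n-1).
p_bino = 1.0 - (1.0 - pr)**n - n * pr * (1.0 - pr)**(n - 1)

# Poisson: P(X=0) = exp(-lam) and P(X=1) = lam*exp(-lam).
p_pois = 1.0 - exp(-lam) * (1.0 + lam)

print(p_bino, p_pois, abs(p_bino - p_pois))
```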
Reference: Section 5.8.1 - The Chisquare Distribution, and p.613 Table A2, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004)"
Compute a table of complementary quantiles of the Chi-Square distribution, using given values of alpha and given degrees of freedom k. In other words, compute x so that P(X>x)=alpha, for given values of alpha.
alpha = [ 0.995 0.99 0.975 0.95 0.05 0.025 0.01 0.005 ];
A = [];
for k=1:30
    A(k,:) = distfun_chi2inv(alpha',k,%f);
end
disp([(1:30)' A])
The previous script produces the following output.
k | alpha=0.995 | alpha=0.99 | alpha=0.975 | alpha=0.95 | alpha=0.05 | alpha=0.025 | alpha=0.01 | alpha=0.005 |
k=1 | 0.00004 | 0.00016 | 0.00098 | 0.00393 | 3.84146 | 5.02389 | 6.6349 | 7.87944 |
k=2 | 0.01003 | 0.02010 | 0.05064 | 0.10259 | 5.99146 | 7.37776 | 9.21034 | 10.5966 |
k=3 | 0.07172 | 0.11483 | 0.21580 | 0.35185 | 7.81473 | 9.3484 | 11.3449 | 12.8382 |
k=4 | 0.20699 | 0.29711 | 0.48442 | 0.71072 | 9.48773 | 11.1433 | 13.2767 | 14.8603 |
k=5 | 0.41174 | 0.55430 | 0.83121 | 1.14548 | 11.0705 | 12.8325 | 15.0863 | 16.7496 |
k=6 | 0.67573 | 0.87209 | 1.23734 | 1.63538 | 12.5916 | 14.4494 | 16.8119 | 18.5476 |
k=7 | 0.98926 | 1.23904 | 1.68987 | 2.16735 | 14.0671 | 16.0128 | 18.4753 | 20.2777 |
k=8 | 1.34441 | 1.6465 | 2.17973 | 2.73264 | 15.5073 | 17.5345 | 20.0902 | 21.955 |
k=9 | 1.73493 | 2.0879 | 2.70039 | 3.32511 | 16.919 | 19.0228 | 21.666 | 23.5894 |
k=10 | 2.15586 | 2.55821 | 3.24697 | 3.9403 | 18.307 | 20.4832 | 23.2093 | 25.1882 |
k=11 | 2.60322 | 3.05348 | 3.81575 | 4.57481 | 19.6751 | 21.92 | 24.725 | 26.7568 |
k=12 | 3.07382 | 3.57057 | 4.40379 | 5.22603 | 21.0261 | 23.3367 | 26.217 | 28.2995 |
k=13 | 3.56503 | 4.10692 | 5.00875 | 5.89186 | 22.362 | 24.7356 | 27.6882 | 29.8195 |
k=14 | 4.07467 | 4.66043 | 5.62873 | 6.57063 | 23.6848 | 26.1189 | 29.1412 | 31.3193 |
k=15 | 4.60092 | 5.22935 | 6.26214 | 7.26094 | 24.9958 | 27.4884 | 30.5779 | 32.8013 |
k=16 | 5.14221 | 5.81221 | 6.90766 | 7.96165 | 26.2962 | 28.8454 | 31.9999 | 34.2672 |
k=17 | 5.69722 | 6.40776 | 7.56419 | 8.67176 | 27.5871 | 30.191 | 33.4087 | 35.7185 |
k=18 | 6.2648 | 7.01491 | 8.23075 | 9.39046 | 28.8693 | 31.5264 | 34.8053 | 37.1565 |
k=19 | 6.84397 | 7.63273 | 8.90652 | 10.117 | 30.1435 | 32.8523 | 36.1909 | 38.5823 |
k=20 | 7.43384 | 8.2604 | 9.59078 | 10.8508 | 31.4104 | 34.1696 | 37.5662 | 39.9968 |
k=21 | 8.03365 | 8.8972 | 10.2829 | 11.5913 | 32.6706 | 35.4789 | 38.9322 | 41.4011 |
k=22 | 8.64272 | 9.54249 | 10.9823 | 12.338 | 33.9244 | 36.7807 | 40.2894 | 42.7957 |
k=23 | 9.26042 | 10.1957 | 11.6886 | 13.0905 | 35.1725 | 38.0756 | 41.6384 | 44.1813 |
k=24 | 9.88623 | 10.8564 | 12.4012 | 13.8484 | 36.415 | 39.3641 | 42.9798 | 45.5585 |
k=25 | 10.5197 | 11.524 | 13.1197 | 14.6114 | 37.6525 | 40.6465 | 44.3141 | 46.9279 |
k=26 | 11.1602 | 12.1981 | 13.8439 | 15.3792 | 38.8851 | 41.9232 | 45.6417 | 48.2899 |
k=27 | 11.8076 | 12.8785 | 14.5734 | 16.1514 | 40.1133 | 43.1945 | 46.9629 | 49.6449 |
k=28 | 12.4613 | 13.5647 | 15.3079 | 16.9279 | 41.3371 | 44.4608 | 48.2782 | 50.9934 |
k=29 | 13.1211 | 14.2565 | 16.0471 | 17.7084 | 42.557 | 45.7223 | 49.5879 | 52.3356 |
k=30 | 13.7867 | 14.9535 | 16.7908 | 18.4927 | 43.773 | 46.9792 | 50.8922 | 53.672 |
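One row of the table can be verified in closed form. For k=2, the Chi-Square distribution is an exponential distribution with mean 2, so P(X>x) = exp(-x/2) and the complementary quantile is x = -2*log(alpha). The Python sketch below (an illustration outside the toolbox; the variable names are assumptions) recomputes the k=2 row this way.

```python
from math import log

alphas = [0.995, 0.99, 0.975, 0.95, 0.05, 0.025, 0.01, 0.005]

# For 2 degrees of freedom, P(X > x) = exp(-x/2), hence x = -2*log(alpha).
row_k2 = [-2.0 * log(a) for a in alphas]

for a, x in zip(alphas, row_k2):
    print(a, x)
```

The printed values should match the k=2 row above to the displayed precision.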
Reference: Section 5.6 - Exponential Random Variables, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004)"
Problem: Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with an average value of 10 000 miles. If a 5 000 mile trip is to be made, what is the probability that the trip can be completed without replacing the battery?
The exponential distribution has the memoryless property, so the miles already driven do not matter. Let X be the remaining lifetime of the battery, in miles. Then X has an exponential distribution with parameter lambda = 1/10000. Hence the desired probability is
P(X>5000) = exp(-5000/10000) = exp(-0.5) ≈ 0.607
with 3 significant digits.
To compute the probability, we can use the distfun_expcdf function as follows. Note that the second argument is the mean, 10000, and not the rate lambda.
p = distfun_expcdf(5000,10000,%f)
The previous script produces the following output.
-->p = distfun_expcdf(5000,10000,%f)
 p  =
    0.6065307
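The survival function of the exponential distribution is available in closed form, P(X>x) = exp(-x/mean). A minimal Python sketch of this check (an illustration outside the toolbox; the variable names are assumptions) is:

```python
from math import exp

mean_miles = 10000.0  # average battery lifetime, in miles
trip_miles = 5000.0   # length of the planned trip, in miles

# Probability that the battery lasts longer than the trip.
p = exp(-trip_miles / mean_miles)
print(p)
```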
Reference: Section 5.4 - The Uniform Random Variable, "Introduction to Probability and Statistics for Engineers and Scientists 3rd ed - S. Ross (Elsevier, 2004)"
Problem: Buses arrive at a stop at 15-minute intervals starting at 7 a.m. That is, they arrive at 7:00, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop at a time that is uniformly distributed between 7:00 and 7:30, calculate the probability that he waits (a) less than 5 minutes for a bus; (b) at least 12 minutes for a bus.
Let X be the arrival time of the passenger at the stop, in minutes past 7:00. Then X is a uniform random variable in the interval [0,30]. The passenger waits less than 5 minutes if he arrives between 7:10 and 7:15 or between 7:25 and 7:30. The desired probability for (a) is P(10<X<15)+P(25<X<30), which we compute with the distfun_unifcdf function.
a = 0
b = 30
p = (distfun_unifcdf(15,a,b) - distfun_unifcdf(10,a,b))+..
(distfun_unifcdf(30,a,b) - distfun_unifcdf(25,a,b))
The previous script produces the following output.
-->p = (distfun_unifcdf(15,a,b) - distfun_unifcdf(10,a,b))+..
-->(distfun_unifcdf(30,a,b) - distfun_unifcdf(25,a,b))
 p  =
    0.3333333
Similarly, he has to wait at least 12 minutes if he arrives between 7:00 and 7:03 or between 7:15 and 7:18. The desired probability for (b) is P(0<X<3)+P(15<X<18).
a = 0
b = 30
p = (distfun_unifcdf(3,a,b) - distfun_unifcdf(0,a,b))+..
(distfun_unifcdf(18,a,b) - distfun_unifcdf(15,a,b))
The previous script produces the following output.
-->p = (distfun_unifcdf(3,a,b) - distfun_unifcdf(0,a,b))+..
-->(distfun_unifcdf(18,a,b) - distfun_unifcdf(15,a,b))
 p  =
    0.2
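Both answers are ratios of interval lengths: 10 favourable minutes out of 30 for (a), and 6 out of 30 for (b). The Python sketch below (an illustration outside the toolbox; unif_cdf is a hypothetical helper) mirrors the two Scilab computations.

```python
def unif_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0.0, 30.0

# (a) waits less than 5 minutes: arrives in (10, 15) or (25, 30).
p_a = (unif_cdf(15, a, b) - unif_cdf(10, a, b)) + \
      (unif_cdf(30, a, b) - unif_cdf(25, a, b))

# (b) waits at least 12 minutes: arrives in (0, 3) or (15, 18).
p_b = (unif_cdf(3, a, b) - unif_cdf(0, a, b)) + \
      (unif_cdf(18, a, b) - unif_cdf(15, a, b))

print(p_a, p_b)
```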
Assume that X has a Hypergeometric distribution with parameters M, k and N. Assume that Y has a Binomial distribution with parameters N and pr=k/M.
The following figure presents the Hypergeometric urn.
The following figure presents the Binomial urn.
If M and k are large compared to N, and if pr is not close to 0 or 1, then X and Y have approximately the same distribution.
In the following script, we use constant values of N and pr, and let M and k increase. We can see that the Hypergeometric distribution becomes closer and closer to the Binomial distribution.
M=[80 800 8000 80000];
k=[50 500 5000 50000];
N=30;
scf();
for i=1:2
    for j=1:2
        ij=(i-1)*2 + j;
        subplot(2,2,ij)
        x=(0:N)';
        pr=k(ij)/M(ij);
        pHy=distfun_hygepdf(x,M(ij),k(ij),N);
        pBi=distfun_binopdf(x,N,pr);
        plot(x,pHy,"ro-");
        plot(x,pBi,"bo-");
        strtitle=msprintf("M=%s, k=%s, N=%s, pr=%s", ..
        string(M(ij)),string(k(ij)),string(N),string(pr));
        xtitle(strtitle)
        legend(["Hypergeometric" "Binomial"],"in_upper_left");
    end
end
The previous script produces the following figure.
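To put a number on what the figure shows, the following Python sketch (an illustration outside the toolbox; hyge_pmf, binom_pmf and max_gap are hypothetical helper names) computes the largest absolute difference between the two probability functions for the same values of M and k.

```python
from math import comb

def hyge_pmf(x, M, k, N):
    """P(X = x): x marked items in a sample of N drawn without replacement
    from M items, k of which are marked."""
    return comb(k, x) * comb(M - k, N - x) / comb(M, N)

def binom_pmf(x, n, pr):
    """P(Y = x) for a binomial variable with n trials and success probability pr."""
    return comb(n, x) * pr**x * (1.0 - pr)**(n - x)

def max_gap(M, k, N=30):
    """Largest |hypergeometric pmf - binomial pmf| over x = 0, ..., N."""
    pr = k / M
    return max(abs(hyge_pmf(x, M, k, N) - binom_pmf(x, N, pr))
               for x in range(N + 1))

# Same population sizes as in the Scilab script; pr = k/M stays at 0.625.
for M, k in ((80, 50), (800, 500), (8000, 5000), (80000, 50000)):
    print(M, k, max_gap(M, k))
```

The gap should decrease as M and k grow with pr held constant.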
Let X be the maximum of n independent random variables with normal distribution. When n increases, the distribution of X becomes closer and closer to the Gumbel distribution.
Reference: http://www.panix.com/~kts/Thesis/extreme/extreme2.html
stacksize("max");
N=2000;
mu=0;
sigma=1;
x=linspace(0,6,100);
scf();
xtitle("Max. of n Normal variables","X","Frequency")
//
n=10;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=2);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=100;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=3);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=1000;
R=distfun_normrnd(mu,sigma,n,N);
X=max(R,"r");
histplot(50,X,style=4);
legend(["n=10","n=100","n=1000"]);
b = distfun_norminv(1-1/n,mu,sigma);
a = distfun_norminv(1-1/(n*exp(1)),mu,sigma) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
The previous script produces the following figure.
Let X be the maximum of n independent random variables with exponential distribution. When n increases, the distribution of X becomes closer and closer to the Gumbel distribution.
Reference: http://www.panix.com/~kts/Thesis/extreme/extreme2.html
stacksize("max");
N=2000;
mu=20;
x=linspace(0,300,100);
scf();
xtitle("Max. of n Exp variables","X","Frequency")
//
n=10;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=2);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=100;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=3);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
//
n=1000;
R=distfun_exprnd(mu,n,N);
X=max(R,"r");
histplot(50,X,style=4);
legend(["n=10","n=100","n=1000"]);
b = distfun_expinv(1-1/n,mu);
a = distfun_expinv(1-1/(n*exp(1)),mu) - b;
y=distfun_evpdf(-x,-b,a);
plot(x,y,"k");
The previous script produces the following figure.
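For the exponential case the comparison can be made without random sampling, because the distribution function of the maximum is known exactly: P(X<=x) = (1-exp(-x/mu))^n. The Python sketch below is an illustration outside the toolbox. It assumes that the exponential distribution is parameterized by its mean mu, in which case the location and scale computed in the script reduce to b = mu*log(n) and a = mu; the helper names are hypothetical.

```python
from math import exp, log

def max_exp_cdf(x, mu, n):
    """Exact P(max of n independent exponentials with mean mu <= x)."""
    return (1.0 - exp(-x / mu))**n

def gumbel_cdf(x, b, a):
    """Gumbel (maximum) distribution function with location b and scale a."""
    return exp(-exp(-(x - b) / a))

def max_gap(n, mu=20.0):
    """Largest |exact CDF - Gumbel CDF| on the grid used in the Scilab script."""
    b = mu * log(n)  # closed form of the (1 - 1/n) quantile
    a = mu           # closed form of the (1 - 1/(n*e)) quantile minus b
    xs = [300.0 * i / 99 for i in range(100)]  # same points as linspace(0,300,100)
    return max(abs(max_exp_cdf(x, mu, n) - gumbel_cdf(x, b, a)) for x in xs)

# Same values of n as in the Scilab script.
for n in (10, 100, 1000):
    print(n, max_gap(n))
```

The gap should shrink as n grows, in line with the histograms approaching the black curves.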
Source : Wikipedia, Multinomial distribution, http://en.wikipedia.org/wiki/Multinomial_distribution
In a recent three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?
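The requested probability follows from the multinomial formula n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3. A minimal Python sketch of this formula (an illustration outside the toolbox; multinomial_pmf is a hypothetical helper, not a toolbox function) is:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts per category,
    when each draw falls in category i with probability probs[i]."""
    # Multinomial coefficient n! / (x1! * x2! * ...), computed with exact integers.
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q**c
    return p

# One supporter of A, two of B and three of C among six voters.
p = multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5])
print(p)
```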