Review Article Volume 4 Issue 6
Tests of hypotheses for the parameters of a bivariate geometric distribution
Fitrat Hossain,1 Munni Begum2
1Department of Mathematics, Marquette University, USA
2Department of Mathematical Sciences, Ball State University, USA
Correspondence: Munni Begum, Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA
Received: September 17, 2016 | Published: November 7, 2016
Citation: Hossain F, Begum M. Tests of hypotheses for the parameters of a bivariate geometric distribution. Biom Biostat Int J. 2016;4(6):244-249. DOI: 10.15406/bbij.2016.04.00112
Abstract
A bivariate geometric distribution is an extension of the univariate geometric distribution in which the occurrence of three different types of events is considered. Several forms of the bivariate geometric distribution have been proposed in the literature. In this paper, we consider the form given by Phatak & Sreehari [1]. We estimate the parameters of this distribution under three different models using maximum likelihood estimation (mle), derive deviances as goodness of fit statistics for testing the parameters, and derive the deviance difference for comparing two nested models. Using simulated data, we found that the deviance measure works well for testing a reduced model against a full model.
Keywords: bivariate geometric distribution, deviance, deviance difference
Introduction
Many situations in the real world cannot be described by a single variable. The simultaneous occurrence of multiple events warrants multivariate distributions. For instance, a univariate geometric distribution can represent the occurrence of failure of one component of a system. However, to study systems with several components that may have different types of failures, such as the twin engines of an airplane or a paired organ in the human body, bivariate geometric distributions are suitable. The bivariate geometric distribution plays an increasingly important role in various fields, including reliability and survival analysis. There are different forms of the bivariate geometric distribution. Phatak & Sreehari [1] provided the form considered here; their probability mass function takes three different types of events into consideration. Other forms can be found in Nair & Nair [2], Hawkes [3], Arnold et al. [4] and Sreehari & Vasudeva [5]. Basu & Dhar [6] proposed a bivariate geometric model which is analogous to the bivariate exponential model developed by Marshall & Olkin [7]. Characterization results were developed by Sun & Basu [8], Sreehari [9], and Sreehari & Vasudeva [5].
Omey & Minkova [10] considered the bivariate geometric distribution with negative correlation coefficient and analyzed some of its properties, including the probability generating function, probability mass function, moments and tail probabilities. Krishna & Pundir [11] studied the plausibility of a bivariate geometric distribution as a reliability model. They derived the maximum likelihood estimators and Bayes estimators of the parameters and of various reliability characteristics, and compared these estimators using Monte Carlo simulation.
In this paper, the parameters of a saturated model, a reduced model and a generalized linear model (glm) for a bivariate geometric distribution are estimated using the maximum likelihood method. We also derive deviances as goodness of fit statistics for testing the parameters corresponding to these models, and the deviance difference for comparing two related models in order to determine which model fits the data better. The rest of the paper is organized as follows: section 2 describes the univariate geometric distribution, section 3 presents the bivariate geometric distribution, section 4 presents hypothesis testing, section 5 discusses a numerical example with simulated data and section 6 gives the conclusion.
Univariate geometric distribution
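The probability mass function (pmf) of a random variable $Y$ which follows a geometric distribution with probability of success $p$ can be written as,
$$P(Y = y) = p\,(1-p)^{y}, \qquad y = 0, 1, 2, \ldots$$
The moment generating function is given by,
$$M_Y(t) = \frac{p}{1-(1-p)e^{t}}, \qquad t < -\log(1-p),$$
and the mean and the variance of this distribution are
$$E(Y) = \frac{1-p}{p}, \qquad \mathrm{Var}(Y) = \frac{1-p}{p^{2}}.$$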
An extension to the univariate geometric distribution is the bivariate geometric distribution which is discussed in the next section.
Bivariate geometric distribution
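The joint probability mass function of the bivariate geometric distribution introduced by Phatak & Sreehari [1] can be obtained as the product of a marginal and a conditional distribution. They considered a process from which the units could be classified as good, marginal and bad with probabilities $q_1$, $q_2$ and $1-q_1-q_2$ respectively, where $q_1 + q_2 < 1$. They proposed the probability mass function of observing the first bad unit after several good and marginal units have passed as follows:
$$P(Y_1 = y_1, Y_2 = y_2) = \binom{y_1+y_2}{y_1}\, q_1^{y_1} q_2^{y_2}\,(1-q_1-q_2), \qquad y_1, y_2 = 0, 1, 2, \ldots \tag{1}$$
Here $Y_1$ and $Y_2$ denote the number of good and marginal units respectively before the first bad unit is observed.
The marginal distribution of $Y_1$ is a geometric distribution with probability of success $(1-q_1-q_2)/(1-q_2)$, and can be written as follows,
$$P(Y_1 = y_1) = \left(1-\frac{q_1}{1-q_2}\right)\left(\frac{q_1}{1-q_2}\right)^{y_1}, \qquad y_1 = 0, 1, 2, \ldots \tag{2}$$
The conditional distribution of $Y_2$ given $Y_1 = y_1$ is
$$P(Y_2 = y_2 \mid Y_1 = y_1) = \binom{y_1+y_2}{y_2}\, q_2^{y_2}\,(1-q_2)^{y_1+1}, \qquad y_2 = 0, 1, 2, \ldots \tag{3}$$
The product of the marginal distribution of $Y_1$ in equation (2) and the conditional distribution of $Y_2$ given $Y_1 = y_1$ in equation (3) gives the mass function of the bivariate geometric distribution in equation (1).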
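The factorization can be checked numerically. The following is a minimal sketch, assuming the Phatak & Sreehari form of the joint pmf, $\binom{y_1+y_2}{y_1} q_1^{y_1} q_2^{y_2}(1-q_1-q_2)$, a geometric marginal for the number of good units with ratio $q_1/(1-q_2)$, and a negative binomial conditional for the number of marginal units. The function names and the values $q_1 = 0.3$, $q_2 = 0.4$ are illustrative choices, not taken from the paper.

```python
from math import comb

def joint_pmf(y1, y2, q1, q2):
    """Joint pmf (1): y1 good and y2 marginal units before the first bad unit."""
    return comb(y1 + y2, y1) * q1**y1 * q2**y2 * (1 - q1 - q2)

def marginal_pmf(y1, q1, q2):
    """Marginal pmf (2) of Y1: geometric with ratio q1 / (1 - q2)."""
    theta = q1 / (1 - q2)
    return (1 - theta) * theta**y1

def conditional_pmf(y2, y1, q2):
    """Conditional pmf (3) of Y2 given Y1 = y1: negative binomial."""
    return comb(y1 + y2, y2) * q2**y2 * (1 - q2) ** (y1 + 1)

q1, q2 = 0.3, 0.4  # illustrative values satisfying q1 + q2 < 1

# Largest discrepancy between the joint pmf and marginal * conditional on a grid.
max_gap = max(
    abs(joint_pmf(a, b, q1, q2) - marginal_pmf(a, q1, q2) * conditional_pmf(b, a, q2))
    for a in range(15)
    for b in range(15)
)

# The joint pmf should sum to (almost) one over a sufficiently large grid.
total = sum(joint_pmf(a, b, q1, q2) for a in range(80) for b in range(80))
print(max_gap, total)
```

The discrepancy should be at the level of floating point rounding, and the truncated sum should be very close to one.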
Maximum likelihood estimation
Estimation of parameters in the absence of regressors
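In order to find the maximum likelihood estimators (mles) under a saturated model (parameters are different for each pair of observations), it suffices to consider the likelihood functions based on the marginal and conditional mass functions. Let $(Y_{1i}, Y_{2i})$ be independent random vectors each having a bivariate geometric distribution with different pairs of parameters $(q_{1i}, q_{2i})$ for $i = 1, 2, \ldots, n$.
The log likelihood function based on the conditional distribution of $Y_{2i}$ given $Y_{1i} = y_{1i}$ can be written as follows using (3):
$$\ell_c = \sum_{i=1}^{n}\left[\log\binom{y_{1i}+y_{2i}}{y_{2i}} + y_{2i}\log q_{2i} + (y_{1i}+1)\log(1-q_{2i})\right] \tag{4}$$
Differentiating (4) with respect to $q_{2i}$ and setting it equal to zero, we get the mle of $q_{2i}$ as,
$$\hat{q}_{2i} = \frac{y_{2i}}{y_{1i}+y_{2i}+1} \tag{5}$$
The log likelihood function based on the marginal distribution of $Y_{1i}$ from (2) is,
$$\ell_m = \sum_{i=1}^{n}\left[\log(1-q_{1i}-q_{2i}) - (y_{1i}+1)\log(1-q_{2i}) + y_{1i}\log q_{1i}\right] \tag{6}$$
Differentiating (6) with respect to $q_{1i}$ and setting it equal to zero, the mle of $q_{1i}$ can be derived as,
$$\hat{q}_{1i} = \frac{y_{1i}\,(1-\hat{q}_{2i})}{1+y_{1i}} = \frac{y_{1i}}{y_{1i}+y_{2i}+1} \tag{7}$$
Here, $\hat{q}_{2i}$ and $\hat{q}_{1i}$ are the maximum likelihood estimators of $q_{2i}$ and $q_{1i}$, respectively, under the saturated model.
Similarly, the maximum likelihood estimators under a reduced model (parameters are the same for each pair of observations) can be obtained as:
$$\hat{q}_{2} = \frac{\bar{y}_{2}}{\bar{y}_{1}+\bar{y}_{2}+1} \tag{8}$$
$$\hat{q}_{1} = \frac{\bar{y}_{1}\,(1-\hat{q}_{2})}{1+\bar{y}_{1}} = \frac{\bar{y}_{1}}{\bar{y}_{1}+\bar{y}_{2}+1} \tag{9}$$
where $\bar{y}_{j} = n^{-1}\sum_{i=1}^{n} y_{ji}$ for $j = 1, 2$, and $\hat{q}_{2}$ and $\hat{q}_{1}$ are the maximum likelihood estimators of $q_{2}$ and $q_{1}$ respectively under the reduced model.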
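The closed-form estimators are simple to compute. The sketch below assumes the saturated estimators $y_{1i}/(y_{1i}+y_{2i}+1)$ and $y_{2i}/(y_{1i}+y_{2i}+1)$, which follow from maximizing the Phatak & Sreehari likelihood pair by pair, and the reduced estimators obtained by replacing each count with its sample total (equivalently its sample mean). The function names and the small data set are hypothetical.

```python
def saturated_mle(y1, y2):
    """Per-pair MLEs (q1i_hat, q2i_hat) under the saturated model."""
    return [(a / (a + b + 1), b / (a + b + 1)) for a, b in zip(y1, y2)]

def reduced_mle(y1, y2):
    """Common MLEs (q1_hat, q2_hat) under the reduced model.

    Dividing numerator and denominator by n gives the form
    ybar_j / (ybar_1 + ybar_2 + 1).
    """
    n, s1, s2 = len(y1), sum(y1), sum(y2)
    return s1 / (s1 + s2 + n), s2 / (s1 + s2 + n)

y1 = [0, 2, 1, 4, 0]  # hypothetical counts of good units
y2 = [1, 0, 3, 2, 0]  # hypothetical counts of marginal units

print(saturated_mle(y1, y2))
print(reduced_mle(y1, y2))
```

Both sets of estimates automatically satisfy the constraint that the two probabilities sum to less than one, because the denominator always exceeds the sum of the numerators.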
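Estimation of parameters in the presence of regressors
In the presence of regressors, one can employ a generalized linear model and hence estimate the parameters in terms of the estimated regression coefficients. The conditional distribution of $Y_{2i}$ given $Y_{1i} = y_{1i}$ in (3) can be written in exponential family form as follows,
$$P(Y_{2i} = y_{2i} \mid Y_{1i} = y_{1i}) = \exp\left[y_{2i}\log q_{2i} + (y_{1i}+1)\log(1-q_{2i}) + \log\binom{y_{1i}+y_{2i}}{y_{2i}}\right]$$
Here the natural parameter and the function of the natural parameter respectively are,
$$\theta_i = \log q_{2i}, \qquad b(\theta_i) = -(y_{1i}+1)\log\left(1-e^{\theta_i}\right)$$
Thus the mean of the conditional distribution of $Y_{2i}$ given $Y_{1i} = y_{1i}$ is
$$\mu_i = E(Y_{2i} \mid Y_{1i} = y_{1i}) = b'(\theta_i) = \frac{(y_{1i}+1)\,q_{2i}}{1-q_{2i}}$$
A generalized linear model based on the conditional distribution of $Y_{2i}$ given $Y_{1i} = y_{1i}$ can be written as,
$$g(\mu_i) = x_i^{T}\beta$$
Since $Y_{2i}$ represents the number of trials before a certain event occurs, it is considered a count response, and the linear predictor can be written as the logarithm of the mean $\mu_i$. Thus the conditional link function can be expressed as,
$$\log\mu_i = \log\left[\frac{(y_{1i}+1)\,q_{2i}}{1-q_{2i}}\right] = x_i^{T}\beta = \sum_{j} x_{ij}\beta_j \tag{10}$$
Here, $x_{ij}$ is an element of the design matrix $X$ corresponding to the $j$th covariate, and $\beta_j$ represents the effect of that covariate on the mean response through the link function $g$.
Solving (10) for $q_{2i}$ at the estimated coefficients, then differentiating (6) again with respect to $q_{1i}$, setting it to zero and using (10), we get,
$$\tilde{q}_{2i} = \frac{e^{x_i^{T}\hat{\beta}}}{1+y_{1i}+e^{x_i^{T}\hat{\beta}}}, \qquad \tilde{q}_{1i} = \frac{y_{1i}\,(1-\tilde{q}_{2i})}{1+y_{1i}} = \frac{y_{1i}}{1+y_{1i}+e^{x_i^{T}\hat{\beta}}} \tag{11}$$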
Hypothesis testing
In order to test the assumption that the parameters are identical across all pairs of observed data, we derive the deviance as a goodness of fit statistic. Additional deviance statistics are derived for the generalized linear model (glm) and for comparing two nested glms.
Deviance for reduced model with identical parameter assumption
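The log likelihood function for the saturated model can be written using (1) and the maximum likelihood estimates $\hat{q}_{2i}$ and $\hat{q}_{1i}$ from equations (5) and (7) respectively as follows,
$$\ell(\hat{q}_{1i}, \hat{q}_{2i}; y) = \sum_{i=1}^{n}\left[\log\binom{y_{1i}+y_{2i}}{y_{1i}} + y_{1i}\log\hat{q}_{1i} + y_{2i}\log\hat{q}_{2i} + \log(1-\hat{q}_{1i}-\hat{q}_{2i})\right] \tag{12}$$
Similarly, the log likelihood function of the reduced model can be written using (1) and the maximum likelihood estimates $\hat{q}_{2}$ and $\hat{q}_{1}$ from equations (8) and (9) respectively as follows,
$$\ell(\hat{q}_{1}, \hat{q}_{2}; y) = \sum_{i=1}^{n}\left[\log\binom{y_{1i}+y_{2i}}{y_{1i}} + y_{1i}\log\hat{q}_{1} + y_{2i}\log\hat{q}_{2} + \log(1-\hat{q}_{1}-\hat{q}_{2})\right] \tag{13}$$
Thus the deviance statistic for testing whether the parameters are identical for every observed pair of data can be expressed as follows,
$$D = 2\left[\ell(\hat{q}_{1i}, \hat{q}_{2i}; y) - \ell(\hat{q}_{1}, \hat{q}_{2}; y)\right] = 2\sum_{i=1}^{n}\left[y_{1i}\log\frac{\hat{q}_{1i}}{\hat{q}_{1}} + y_{2i}\log\frac{\hat{q}_{2i}}{\hat{q}_{2}} + \log\frac{1-\hat{q}_{1i}-\hat{q}_{2i}}{1-\hat{q}_{1}-\hat{q}_{2}}\right] \tag{14}$$
with the convention $0\log 0 = 0$. According to Dobson [12], $D$ follows a $\chi^2$ distribution with $2n-2$ degrees of freedom.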
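The deviance of the reduced model can be computed directly from the data. This is a minimal sketch, assuming the deviance is twice the difference between the saturated and reduced log likelihoods of the Phatak & Sreehari pmf, with closed-form estimators $y_{ji}/(y_{1i}+y_{2i}+1)$ (saturated) and $\sum_i y_{ji}/(\sum_i y_{1i}+\sum_i y_{2i}+n)$ (reduced). The binomial coefficient is the same in both log likelihoods and cancels, so it is omitted. The function names and data are hypothetical.

```python
from math import log

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)

def loglik_kernel(y1, y2, q1, q2):
    """Log likelihood of one pair, omitting the binomial coefficient."""
    return xlogy(y1, q1) + xlogy(y2, q2) + log(1 - q1 - q2)

def deviance_reduced(y1, y2):
    """Deviance of the reduced (common-parameter) model against the saturated model."""
    n, s1, s2 = len(y1), sum(y1), sum(y2)
    q1_hat, q2_hat = s1 / (s1 + s2 + n), s2 / (s1 + s2 + n)
    l_sat = sum(
        loglik_kernel(a, b, a / (a + b + 1), b / (a + b + 1)) for a, b in zip(y1, y2)
    )
    l_red = sum(loglik_kernel(a, b, q1_hat, q2_hat) for a, b in zip(y1, y2))
    return 2 * (l_sat - l_red)

y1 = [0, 2, 1, 4, 0]  # hypothetical counts of good units
y2 = [1, 0, 3, 2, 0]  # hypothetical counts of marginal units
print(deviance_reduced(y1, y2))
```

The statistic is nonnegative by construction, and it is zero when every pair is identical, since the saturated and reduced estimates then coincide.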
Deviance for a GLM
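The log likelihood function for the glm of interest can be written using (1) and the maximum likelihood estimates of $q_{2i}$ and $q_{1i}$ based on the glm from equations (10) and (11) respectively, denoted $\tilde{q}_{2i}$ and $\tilde{q}_{1i}$, as follows,
$$\ell(\tilde{q}_{1i}, \tilde{q}_{2i}; y) = \sum_{i=1}^{n}\left[\log\binom{y_{1i}+y_{2i}}{y_{1i}} + y_{1i}\log\tilde{q}_{1i} + y_{2i}\log\tilde{q}_{2i} + \log(1-\tilde{q}_{1i}-\tilde{q}_{2i})\right] \tag{15}$$
Thus the deviance can be expressed as follows,
$$D_{glm} = 2\sum_{i=1}^{n}\left[y_{1i}\log\frac{\hat{q}_{1i}}{\tilde{q}_{1i}} + y_{2i}\log\frac{\hat{q}_{2i}}{\tilde{q}_{2i}} + \log\frac{1-\hat{q}_{1i}-\hat{q}_{2i}}{1-\tilde{q}_{1i}-\tilde{q}_{2i}}\right] \tag{16}$$
where $\hat{q}_{1i}$ and $\hat{q}_{2i}$ are the estimates under the saturated model from (7) and (5). According to Dobson [12], $D_{glm}$ follows a $\chi^2$ distribution with degrees of freedom equal to the difference between the number of parameters in the saturated model and in the glm.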
Comparison between two GLMs
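In order to compare two nested generalized linear models, we consider the following hypotheses. The null hypothesis corresponding to a smaller model $M_0$ in terms of the number of regression parameters is
$$H_0: \beta = \beta_0 = (\beta_1, \ldots, \beta_q)^{T}$$
The alternative hypothesis corresponding to a bigger model ($M_1$ with $q < p < n$) within which the smaller model is nested can be written as,
$$H_1: \beta = \beta_1 = (\beta_1, \ldots, \beta_p)^{T}$$
We can test $H_0$ against $H_1$ using the difference of the deviance statistics. Here, $\ell(b_0; y)$ denotes the log likelihood function corresponding to the model $M_0$, $\ell(b_1; y)$ denotes the log likelihood function corresponding to the model $M_1$, and $\ell(b_{max}; y)$ denotes that of the saturated model. Hence the deviance difference can be written as,
$$\Delta D = D_0 - D_1 = 2\left[\ell(b_{max}; y) - \ell(b_0; y)\right] - 2\left[\ell(b_{max}; y) - \ell(b_1; y)\right] = 2\left[\ell(b_1; y) - \ell(b_0; y)\right]$$
According to Dobson [12], this $\Delta D$ follows a $\chi^2$ distribution with $p-q$ degrees of freedom.
If the value of $\Delta D$ is consistent with the $\chi^2(p-q)$ distribution, we would generally choose the model $M_0$ corresponding to $H_0$ because it is simpler. On the other hand, if the value of $\Delta D$ is in the critical region, i.e., greater than the upper tail $100\alpha\%$ point of the $\chi^2(p-q)$ distribution, then we would reject $H_0$ in favor of $H_1$ on the grounds that model $M_1$ provides a significantly better description of the data.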
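The decision rule can be sketched as a p-value computation. The example assumes the standard result that the deviance difference between two nested models with $q$ and $p$ parameters is referred to a chi-square distribution with $p - q$ degrees of freedom. The upper-tail probability is evaluated with a textbook series for the regularized incomplete gamma function so that only the standard library is needed. The function names `chi2_sf` and `compare_nested` and the two deviance values are hypothetical.

```python
from math import exp, lgamma, log

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the power series of the regularized lower incomplete gamma function;
    intended for moderate x, not for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = exp(a * log(z) - z - lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

def compare_nested(d0, d1, p, q, alpha=0.05):
    """Deviance difference test of M0 (q parameters) against M1 (p parameters)."""
    delta = d0 - d1
    p_value = chi2_sf(delta, p - q)
    return delta, p_value, p_value < alpha

# Hypothetical deviances for a smaller model (q = 1) and a bigger model (p = 3).
delta, p_value, reject = compare_nested(210.4, 201.1, p=3, q=1)
print(delta, p_value, reject)
```

When the returned flag is true, the deviance difference lies in the critical region and the smaller model would be rejected in favor of the bigger one.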
Data simulation and analysis
To determine the efficiency of our derived deviances we need data with known parameters. However, we cannot generate data directly from a bivariate geometric distribution using the available computer software packages. Krishna & Pundir [11] suggested an algorithm, based on a theorem given by Hogg et al. [13], to generate random numbers from a bivariate geometric distribution. According to this algorithm, paired values can be generated from a bivariate geometric distribution using the following steps,
- Step 1: Generate $k$ random numbers from a univariate geometric distribution with probability of success $(1-q_1-q_2)/(1-q_2)$.
- Step 2: Suppose that the generated random numbers from the geometric distribution are $y_{11}, y_{12}, \ldots, y_{1k}$.
- Step 3: Generate $k$ random numbers $y_{21}, y_{22}, \ldots, y_{2k}$, one for each $y_{1i}$, from a negative binomial distribution with parameters $y_{1i}+1$ and $1-q_2$.
- Step 4: The generated pairs $(y_{1i}, y_{2i})$ are from the bivariate geometric distribution with parameters $q_1$ and $q_2$.
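The steps above can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the authors' code: the geometric variate counts failures before the first success and is drawn by inversion, the first coordinate uses success probability $(1-q_1-q_2)/(1-q_2)$, and the negative binomial draw with parameters $y_{1i}+1$ and $1-q_2$ is formed as a sum of $y_{1i}+1$ independent geometric draws. The names `rgeom` and `rbivgeom` are hypothetical, and both $q_1$ and $q_2$ are assumed to lie strictly between 0 and 1 with $q_1 + q_2 < 1$.

```python
import math
import random

def rgeom(p, rng):
    """Number of failures before the first success (support 0, 1, 2, ...)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(1.0 - p))

def rbivgeom(k, q1, q2, seed=None):
    """Generate k pairs (y1, y2) from the bivariate geometric distribution."""
    rng = random.Random(seed)
    p1 = (1.0 - q1 - q2) / (1.0 - q2)  # success probability of the marginal of Y1
    pairs = []
    for _ in range(k):
        y1 = rgeom(p1, rng)
        # Negative binomial(y1 + 1, 1 - q2) as a sum of y1 + 1 geometric draws.
        y2 = sum(rgeom(1.0 - q2, rng) for _ in range(y1 + 1))
        pairs.append((y1, y2))
    return pairs

pairs = rbivgeom(20000, 0.3, 0.3, seed=1)
mean_y1 = sum(a for a, _ in pairs) / len(pairs)
mean_y2 = sum(b for _, b in pairs) / len(pairs)
print(mean_y1, mean_y2)
```

Under this pmf both sample means should be close to $q_j/(1-q_1-q_2)$, which is 0.75 for the values used here.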
Deviance checking for reduced model
In this subsection, we use the following steps to check our derived deviance for the reduced model with identical values of the parameters $(q_1, q_2)$ for each observed pair of data.
- Step 1: Assume some fixed values of $q_1$ and $q_2$.
- Step 2: Generate $k$ random numbers from a univariate geometric distribution with probability of success $(1-q_1-q_2)/(1-q_2)$ using the assumed values of $q_1$ and $q_2$ from Step 1.
- Step 3: Suppose that the generated random numbers from the geometric distribution are $y_{11}, y_{12}, \ldots, y_{1k}$.
- Step 4: Generate $k$ random numbers $y_{21}, y_{22}, \ldots, y_{2k}$, one for each $y_{1i}$, from the negative binomial distribution with parameters $y_{1i}+1$ and $1-q_2$.
- Step 5: The generated pairs $(y_{1i}, y_{2i})$ are from the bivariate geometric distribution with parameters $q_1$ and $q_2$.
- Step 6: Compute the deviance derived in (14).
We take the values of $q_1$ and $q_2$ ranging from 0.10 to 0.90 and satisfying the constraint $q_1 + q_2 < 1$. We considered several values for the pair $(q_1, q_2)$ and generated random pairs to observe the efficiency of our derived deviance under different parameter values. For each specified pair of parameters $(q_1, q_2)$, we ran this experiment twice to see whether our decision changes due to randomness. The values of the pair of parameters and the corresponding deviance values are tabulated as follows.
The deviance we derived to test the parameters of the reduced model works well: all but four of the deviance values are smaller than $\chi^2(0.95)$. Moreover, among these four values, three are greater than $\chi^2(0.95)$ but less than $\chi^2(0.99)$. So, it can be concluded that our derived deviance works well. On the other hand, if most of the deviance values had been larger than the desired $\chi^2$ critical value, we would have had to conclude that our derived deviance does not work for testing hypotheses regarding the parameters of the reduced model.
| Parameters | Deviance | $\chi^2$ (0.95) | $\chi^2$ (0.975) | $\chi^2$ (0.99) |
|---|---|---|---|---|
| q1=0.30, q2=0.30 | 177.4164 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.30 | 172.3071 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.40 | 185.3107 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.40 | 159.5293 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.50 | 193.8942 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.50 | 158.266 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.60 | 223.1697 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.60 | 193.667 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.40, q2=0.30 | 216.3456 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.40, q2=0.30 | 211.828 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.50, q2=0.30 | 148.1757 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.50, q2=0.30 | 254.3887 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.60, q2=0.30 | 239.3245 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.60, q2=0.30 | 215.4915 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.50 | 232.1984 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.50 | 191.7516 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.60 | 184.1803 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.60 | 236.0869 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.10 | 97.9206 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.10 | 85.10731 | 231.8292 | 238.8612 | 247.2118 |

Table 1 Estimation of deviance for different parameters under consideration
| Parameters | Deviance | $\chi^2$ (0.95) | $\chi^2$ (0.975) | $\chi^2$ (0.99) |
|---|---|---|---|---|
| q1=0.10, q2=0.20 | 100.8624 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.20 | 155.157 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.30 | 155.157 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.30 | 123.3245 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.20, q2=0.20 | 113.3245 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.20, q2=0.20 | 147.3637 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.20, q2=0.30 | 166.6306 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.20, q2=0.30 | 157.8232 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.10 | 133.2772 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.30, q2=0.10 | 131.2191 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.80 | 183.8584 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.80 | 218.8224 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.80, q2=0.10 | 203.6515 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.80, q2=0.10 | 177.6116 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.40 | 144.1728 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.10, q2=0.40 | 168.524 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.70, q2=0.10 | 169.3248 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.70, q2=0.10 | 177.8397 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.60, q2=0.10 | 177.1335 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.70, q2=0.10 | 197.0526 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.50, q2=0.10 | 159.3473 | 231.8292 | 238.8612 | 247.2118 |
| q1=0.50, q2=0.10 | 146.7018 | 231.8292 | 238.8612 | 247.2118 |

Table 2 Estimation of deviance for different parameters under consideration (contd)
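The three critical values repeated in the tables can be approximately reproduced without a statistics package. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile, which is a swapped-in approximation rather than the exact quantile function. The degrees of freedom, 198, are an assumption: the text does not state the number of simulated pairs, but the tabulated values appear consistent with 198 degrees of freedom, i.e. two fewer than twice 100 pairs. The function name `chi2_quantile_wh` is hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile of order prob."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c**0.5) ** 3

df = 198  # assumption: inferred from the tabulated critical values, not stated in the text
critical = {prob: chi2_quantile_wh(prob, df) for prob in (0.95, 0.975, 0.99)}
for prob, value in critical.items():
    print(prob, round(value, 2))
```

A deviance from the tables can then be compared with `critical[0.95]` to reproduce the decision for a given simulated data set.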
Conclusion
In this paper, we addressed an important problem of inference regarding the bivariate geometric distribution and developed a testing procedure for the parameters of this distribution with and without covariate information. Our method depends on deriving the deviance statistics using the maximum likelihood estimators (mles) of the parameters. The mles of the parameters of the bivariate geometric distribution are obtained using the conditional and the marginal distributions.
We conducted a numerical analysis based on simulated data for testing the identical parameter assumption for each pair of observed data. Our numerical example did not consider any covariate information. We found that, without covariate information, our derived deviance worked well in most cases.
Acknowledgments
Conflicts of interest
The authors declare that there are no conflicts of interest.
References
1. Phatak AG, Sreehari M. Some characterizations of a bivariate geometric distribution. Journal of Indian Statistical Association. 1981;19:141-146.
2. Nair KRM, Nair NU. On characterizing the bivariate exponential and geometric distributions. Ann Inst Statist Math. 1988;40(2):267-271.
3. Hawkes AG. On characterizing the bivariate exponential and geometric distributions. Journal of the Royal Statistical Society, Series. 1972;B34:1293.
4. Arnold BC, Castillo E, Sarabia JM. Conditionally specified distributions. Springer, New York, USA. 1992.
5. Sreehari M, Vasudeva R. Characterizations of multivariate geometric distributions in terms of conditional distributions. Metrika. 2012;75(2):271-286.
6. Basu AP, Dhar SK. Bivariate geometric distribution. Journal of Applied Statistical Sciences. 1995;2(1):12.
7. Marshall AW, Olkin I. A multivariate exponential distribution. Journal of the American Statistical Association. 1967;62(317):30-44.
8. Sun K, Basu AP. A characterization of a bivariate geometric distribution. Statistics and Probability Letters. 1995;23(4):307-311.
9. Sreehari G. Characterization via conditional distributions. Journal of Indian Statistical Association. 2005;43:77-93.
10. Omey E, Minkova DL. Bivariate geometric distributions. Hub Research Paper 2013=02 Economics & Science. 2013.
11. Krishna H, Pundir PS. A bivariate geometric distribution with applications to reliability. Communications in Statistics - Theory and Methods. 2009;38(7):1079-1093.
12. Dobson AJ. An introduction to generalized linear models. (2nd edn), Chapman & Hall, CRC press, UK. 2001.
13. Hogg RV, McKean JW, Craig AT. Introduction to mathematical statistics. (6th edn), New Delhi: Pearson Education, India. 2005.
©2016 Hossain, et al. This is an open access article distributed under the terms of the,
which
permits unrestricted use, distribution, and build upon your work non-commercially.