The univariate Poisson regression model is given by
$\mu_i = e^{a + b x_i}$, (1)
where $\mu_i = E\left(Y_i \mid X = x_i\right)$, for $i = 1, 2, \ldots, n$, and the outcome $Y_i$ has the probability distribution
$P\left(Y_i = y_i\right) = \dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots.$ (2)
Note that in equation (2), the mean $\mu_i$ is presumed to increase with X so that b is a priori greater than zero. For the model given in (1)-(2), the log-likelihood function of the random sample $y_1, y_2, \ldots, y_n$ can be written as
$\ln L = \sum_{i=1}^{n}\left\{ y_i z_i - e^{z_i} - \ln\left(y_i!\right)\right\}$, (3)
where $z_i = a + b x_i$. The likelihood equations for estimating a and b do not have explicit solutions because of the nonlinear function $e^{z_i}$, $i = 1, 2, \ldots, n$.
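For readers who wish to experiment numerically, the log-likelihood (3) can be evaluated as in the following sketch (an illustrative Python/numpy implementation with our own function name, not the author's FORTRAN program mentioned in the numerical example).

```python
import numpy as np
from scipy.special import gammaln  # gammaln(y + 1) = ln(y!)

def poisson_loglik(a, b, x, y):
    """Log-likelihood (3) of the univariate Poisson regression model (1)-(2)."""
    z = a + b * x                                   # z_i = a + b x_i
    return np.sum(y * z - np.exp(z) - gammaln(y + 1.0))
```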
To obtain the MMLEs, we first express the likelihood equations in terms of the ordered covariate values $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The likelihood equations can be re-written as
$\dfrac{\partial \ln L}{\partial a} = \sum_{i=1}^{n}\left( y_{[i]} - e^{z_{(i)}}\right) = 0$ (4)
and
$\dfrac{\partial \ln L}{\partial b} = \sum_{i=1}^{n} x_{(i)}\left( y_{[i]} - e^{z_{(i)}}\right) = 0,$ (5)
where $y_{[i]}$ is the concomitant of $x_{(i)}$ and $z_{(i)} = a + b x_{(i)}$. Linearizing the intractable function $e^{z_{(i)}}$ by using the first two terms of its Taylor series expansion around the population quantiles $t_{(i)}$ ($i = 1, 2, \ldots, n$), i.e. $e^{z_{(i)}} \cong e^{t_{(i)}} + e^{t_{(i)}}\left(z_{(i)} - t_{(i)}\right)$, we find
$e^{z_{(i)}} \cong \alpha_i + \beta_i z_{(i)}$, (6)
where $\alpha_i = e^{t_{(i)}}\left(1 - t_{(i)}\right)$ and $\beta_i = e^{t_{(i)}}$ for $i = 1, 2, \ldots, n$.
In order to calculate the $t_{(i)}$ values, we define a dummy random variable U with the probability density function
(7)
and re-write the model (1) as
(8)
Thus, the $t_{(i)}$ values can be obtained from the equation
$\int_{-\infty}^{t_{(i)}} f(u)\,du = \dfrac{i}{n+1}, \qquad i = 1, 2, \ldots, n;$ (9)
asymptotically, the ordered variates $z_{(i)}$ converge to the corresponding population quantiles $t_{(i)}$. Alternatively, when n is large, one can utilize the standard normal distribution and take $t_{(i)} \cong \Phi^{-1}\left(i/(n+1)\right)$, $i = 1, 2, \ldots, n$.
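As a small illustration of the large-sample alternative above, the initial quantiles can be taken as standard normal quantiles evaluated at $i/(n+1)$; the sketch below assumes scipy is available and uses our own function name.

```python
import numpy as np
from scipy.stats import norm

def initial_quantiles(n):
    """Initial t_(i) values from the standard normal distribution (Approach 2)."""
    i = np.arange(1, n + 1)
    return norm.ppf(i / (n + 1.0))      # t_(i) = Phi^{-1}( i / (n+1) )
```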
Incorporating (6) into (4)-(5) and solving the resulting modified likelihood equations yield the explicit MMLEs below:
$\hat{a} = \dfrac{1}{m}\sum_{i=1}^{n}\Delta_i - \hat{b}\,\bar{x}_{[\cdot]}$ and $\hat{b} = \dfrac{K}{L}$, (10)
where
$\Delta_i = y_{[i]} - \alpha_i, \qquad m = \sum_{i=1}^{n}\beta_i, \qquad \bar{x}_{[\cdot]} = \dfrac{1}{m}\sum_{i=1}^{n}\beta_i x_{(i)},$ (11)
and
$K = \sum_{i=1}^{n}\left(x_{(i)} - \bar{x}_{[\cdot]}\right)\Delta_i, \qquad L = \sum_{i=1}^{n}\beta_i\left(x_{(i)} - \bar{x}_{[\cdot]}\right)^{2}.$ (12)
The MMLEs derived above are asymptotically equivalent to their corresponding MLEs, giving them the same attractive asymptotic properties; however, one can refine the estimates by re-calculating the $\alpha_i$ and $\beta_i$ values, replacing the theoretical population quantiles with their estimated values $\hat{z}_{(i)} = \hat{a} + \hat{b}\,x_{(i)}$. This process may be repeated until a desired convergence criterion is met; stabilization is generally reached within a few iterations.
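The closed-form estimators (10)-(12), together with the revision scheme just described, translate directly into a few lines of code. The following is a minimal sketch with our own variable names; the initial quantile vector t may come from (9), from the standard normal approximation, or from the least squares fit discussed in the Remark below.

```python
import numpy as np

def mml_poisson(x, y, t, n_iter=20, tol=1e-8):
    """MMLEs of (a, b) from equations (10)-(12), revising the t_(i) at each step."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]          # x_(i) and its concomitant y_[i]
    a_hat = b_hat = 0.0
    for _ in range(n_iter):
        beta = np.exp(t)                 # beta_i = exp(t_(i))
        alpha = beta * (1.0 - t)         # alpha_i = exp(t_(i)) (1 - t_(i))
        delta = ys - alpha               # Delta_i = y_[i] - alpha_i
        m = beta.sum()
        xbar = (beta * xs).sum() / m     # weighted mean of the ordered x's
        K = ((xs - xbar) * delta).sum()
        L = (beta * (xs - xbar) ** 2).sum()
        b_new = K / L
        a_new = delta.sum() / m - b_new * xbar
        converged = abs(a_new - a_hat) < tol and abs(b_new - b_hat) < tol
        a_hat, b_hat = a_new, b_new
        if converged:
            break
        t = a_hat + b_hat * xs           # revise t_(i) with z-hat_(i) = a-hat + b-hat x_(i)
    return a_hat, b_hat, t
```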
Asymptotic variances and covariances
Vaughan and Tiku15 proved rigorously that the MMLEs are asymptotically unbiased and that their variances and covariances are exactly the same as those of the MLEs. In the present situation, therefore, the asymptotic variances and the covariance of $\hat{a}$ and $\hat{b}$ are given by $V = I^{-1}$, where I is the Fisher information matrix consisting of the elements $-E\left(\partial^{2}\ln L/\partial a^{2}\right)$, $-E\left(\partial^{2}\ln L/\partial a\,\partial b\right)$, and $-E\left(\partial^{2}\ln L/\partial b^{2}\right)$. From the modified likelihood equations
$\dfrac{\partial \ln L^{*}}{\partial a} = \sum_{i=1}^{n}\left(\Delta_i - \beta_i z_{(i)}\right) = 0, \qquad \dfrac{\partial \ln L^{*}}{\partial b} = \sum_{i=1}^{n} x_{(i)}\left(\Delta_i - \beta_i z_{(i)}\right) = 0,$
the Fisher information matrix can be easily obtained as
$I = \begin{bmatrix} \sum_{i=1}^{n}\beta_i & \sum_{i=1}^{n}\beta_i x_{(i)} \\ \sum_{i=1}^{n}\beta_i x_{(i)} & \sum_{i=1}^{n}\beta_i x_{(i)}^{2} \end{bmatrix}$, (13)
and $V = I^{-1}$. V is estimated by replacing $t_{(i)}$ with its estimate $\hat{z}_{(i)} = \hat{a} + \hat{b}\,x_{(i)}$ ($i = 1, 2, \ldots, n$). Since the $\hat{z}_{(i)}$ values converge to the $t_{(i)}$ values as n tends to infinity, the $\alpha_i$ and $\beta_i$ values are treated as constant coefficients for large n, see also.3 Hence, the asymptotic variances can be estimated by
$\widehat{V}\left(\hat{a}\right) \cong \dfrac{\sum_{i=1}^{n}\hat{\beta}_i x_{(i)}^{2}}{\sum_{i=1}^{n}\hat{\beta}_i \sum_{i=1}^{n}\hat{\beta}_i x_{(i)}^{2} - \left(\sum_{i=1}^{n}\hat{\beta}_i x_{(i)}\right)^{2}}$, (14)
$\widehat{V}\left(\hat{b}\right) \cong \dfrac{\sum_{i=1}^{n}\hat{\beta}_i}{\sum_{i=1}^{n}\hat{\beta}_i \sum_{i=1}^{n}\hat{\beta}_i x_{(i)}^{2} - \left(\sum_{i=1}^{n}\hat{\beta}_i x_{(i)}\right)^{2}}$. (15)
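As an illustration, the standard errors implied by (14)-(15) can be obtained by inverting the estimated information matrix (13) directly; this sketch continues the one above, with the same assumed names.

```python
import numpy as np

def mml_standard_errors(x, a_hat, b_hat):
    """Estimated standard errors of the MMLEs from (13)-(15)."""
    xs = np.sort(x)
    beta_hat = np.exp(a_hat + b_hat * xs)            # beta_i evaluated at z-hat_(i)
    info = np.array([[beta_hat.sum(),        (beta_hat * xs).sum()],
                     [(beta_hat * xs).sum(), (beta_hat * xs ** 2).sum()]])
    v = np.linalg.inv(info)                          # estimated V = I^{-1}
    return np.sqrt(v[0, 0]), np.sqrt(v[1, 1])        # SE(a-hat), SE(b-hat)
```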
Hypothesis testing
Testing the null hypothesis $H_0\!: b = 0$ is of great practical importance in Poisson regression modelling. The likelihood ratio statistic for testing $H_0$ is $LR = 2\left(\hat{\ell}_1 - \hat{\ell}_0\right)$, where $\hat{\ell}_0$ and $\hat{\ell}_1$ denote the maximized log-modified likelihood functions under the null and alternative hypotheses, respectively. The null distribution of LR is asymptotically chi-square with 1 degree of freedom. Large values of LR lead to the rejection of $H_0$. Alternatively, the Wald statistic $W = \hat{b}\big/\sqrt{\widehat{V}\left(\hat{b}\right)}$ (the ratio of $\hat{b}$ to its standard error) may be used. Since $\hat{b}$ is asymptotically equivalent to the MLE, the null distribution of W is asymptotically standard normal, N(0,1). Large values of W lead to the rejection of $H_0$.
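Both tests can be sketched as follows, re-using the functions defined above. For simplicity the ordinary log-likelihood (3) is evaluated at the MMLEs and at the null fit $\hat{a}_0 = \ln \bar{y}$ (the explicit MLE of a when b = 0), which is asymptotically equivalent to using the modified likelihood.

```python
import numpy as np
from scipy.stats import chi2, norm

def test_b_zero(x, y):
    """Wald and likelihood-ratio tests of H0: b = 0 (illustrative sketch)."""
    a_hat, b_hat, _ = mml_poisson(x, y, initial_quantiles(len(y)))
    se_a, se_b = mml_standard_errors(x, a_hat, b_hat)
    W = b_hat / se_b                                    # Wald statistic, N(0,1) under H0
    a0 = np.log(y.mean())                               # MLE of a under H0: b = 0
    LR = 2.0 * (poisson_loglik(a_hat, b_hat, x, y) -
                poisson_loglik(a0, 0.0, x, y))          # chi-square(1) under H0
    return W, 2 * (1 - norm.cdf(abs(W))), LR, 1 - chi2.cdf(LR, df=1)
```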
Numerical example
To compare the ML and MML estimates numerically, we analyzed the data given on page 82 of Agresti.19 The data come from a study of nesting horseshoe crabs, where the response Y is the number of satellites that each female crab has and the covariate X is the carapace width of each of the 173 crabs. The study investigates the relationship between Y and X. We calculated the MMLEs from equations (10)-(12) and their approximate standard errors from (14)-(15). The FORTRAN code written to carry out the calculations can be obtained from the author. Our results are completely consistent with those given in,19 which is expected; see Table 1.
| | Coefficient | Estimate | SE | W | LR |
| ML | a | -3.3048 | 0.5422 | | |
| | b | 0.164 | 0.02 | 8.2 | 64.9 |
| MML | a | -3.3047 | 0.5423 | | |
| | b | 0.164 | 0.0199 | 8.241 | 64.91 |
Table 1 MLEs and MMLEs along with their standard errors for horseshoe crab data
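For readers who wish to reproduce Table 1, the MLEs can be obtained with any standard GLM routine and compared with the MMLE sketches above. The file and column names in the snippet below are placeholders, since the crab data are tabulated in Agresti;19 statsmodels is assumed only for the ML side of the comparison.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names: the crab data themselves are given in Agresti (page 82).
crabs = pd.read_csv("crabs.csv")                  # columns assumed: "width", "satellites"
x, y = crabs["width"].to_numpy(), crabs["satellites"].to_numpy()

# MLEs via iteratively reweighted least squares, for comparison with the MMLEs
mle_fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(mle_fit.params, mle_fit.bse)

# MMLEs from the sketches above
a_hat, b_hat, t = mml_poisson(x, y, initial_quantiles(len(y)))
print(a_hat, b_hat, mml_standard_errors(x, a_hat, b_hat))
```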
Remark: In solving (10)-(12), Oral18 proposed to calculate the initial $t_{(i)}$ values from the least squares estimators (LSEs), which is a different approach than using equation (9) (Approach 1) or the standard normal approximation (Approach 2). Since $z_{(i)} = a + b x_{(i)}$, the $t_{(i)}$ values can be approximated by $t_{(i)} \cong \tilde{a} + \tilde{b}\,x_{(i)}$, where $\tilde{a}$ and $\tilde{b}$ are the LSEs; see also3 and.20 When using this approach (say, Approach 3), the $t_{(i)}$ values need to be revised after the first iteration with their estimated values $\hat{z}_{(i)} = \hat{a} + \hat{b}\,x_{(i)}$, as described above. Estimating the population quantiles from the LSEs changes neither the derivations/solutions (10)-(12) nor the results. However, the total number of revisions needed for stabilization is not the same under the different approaches; see also3 page 889. To compare the performance of these three approaches, we conducted a simulation study in which we calculated the bias values and variances of the resulting MMLEs as well as the coverage probabilities. We also provide the average number of revisions needed for stabilization under each approach. We set a to zero and considered various values of b and of the sample size n. Our results from 10,000 Monte Carlo runs are given in Table 2.
As can be seen from Table 2, all approaches provide the same biases and variances after stabilization, which is expected. The resulting coverage probabilities from all approaches are close to 0.95, and as the sample size increases, both the bias values and the variances decrease, also as expected. For a given b value, the fastest stabilization is achieved by Approach 2 (i.e., the standard normal approximation). Thus, although all three approaches yield the same results, we suggest calculating the initial $t_{(i)}$ values from the standard normal approximation, because its stabilization is the fastest.
| n | Approach | | b = 0.1 ($\hat{a}$) | b = 0.1 ($\hat{b}$) | b = 0.5 ($\hat{a}$) | b = 0.5 ($\hat{b}$) | b = 1.0 ($\hat{a}$) | b = 1.0 ($\hat{b}$) |
| 30 | Approach 1 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0278 | 0.0078 |
| | | Variance | 0.0373 | 0.0389 | 0.0403 | 0.0369 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9522 | 0.9541 | 0.9525 | 0.9488 |
| | | No. of revisions | 4.67 | | 3.45 | | 6.10 | |
| | Approach 2 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0279 | 0.0079 |
| | | Variance | 0.0373 | 0.0388 | 0.0403 | 0.0369 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9522 | 0.9541 | 0.9525 | 0.9488 |
| | | No. of revisions | 2.65 | | 2.00 | | 1.81 | |
| | Approach 3 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0278 | 0.0079 |
| | | Variance | 0.0372 | 0.0389 | 0.0402 | 0.0368 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9523 | 0.9540 | 0.9525 | 0.9488 |
| | | No. of revisions | 2.98 | | 3.04 | | 4.63 | |
| 100 | Approach 1 | Bias | 0.0098 | 0.0007 | 0.0079 | 0.0002 | 0.0081 | 0.0017 |
| | | Variance | 0.0104 | 0.0104 | 0.0113 | 0.0094 | 0.0126 | 0.0074 |
| | | Coverage prob. | 0.9504 | 0.9518 | 0.9470 | 0.9494 | 0.9538 | 0.9495 |
| | | No. of revisions | 4.78 | | 3.24 | | 6.59 | |
| | Approach 2 | Bias | 0.0098 | 0.0007 | 0.0079 | 0.0001 | 0.0081 | 0.0016 |
| | | Variance | 0.0103 | 0.0104 | 0.0114 | 0.0095 | 0.0126 | 0.0073 |
| | | Coverage prob. | 0.9504 | 0.9518 | 0.9470 | 0.9494 | 0.9538 | 0.9496 |
| | | No. of revisions | 2.91 | | 2.01 | | 1.32 | |
| | Approach 3 | Bias | 0.0098 | 0.0007 | 0.0078 | 0.0002 | 0.0082 | 0.0017 |
| | | Variance | 0.0104 | 0.0104 | 0.0113 | 0.0094 | 0.0126 | 0.0073 |
| | | Coverage prob. | 0.9505 | 0.9518 | 0.9470 | 0.9494 | 0.9539 | 0.9495 |
| | | No. of revisions | 2.99 | | 3.00 | | 4.69 | |
| 250 | Approach 1 | Bias | 0.0041 | 0.0002 | 0.0041 | 0.0000 | 0.0013 | 0.0007 |
| | | Variance | 0.0041 | 0.0040 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 4.78 | | 3.11 | | 6.87 | |
| | Approach 2 | Bias | 0.0041 | 0.0002 | 0.0040 | 0.0000 | 0.0012 | 0.0007 |
| | | Variance | 0.0040 | 0.0041 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 2.90 | | 2.00 | | 1.12 | |
| | Approach 3 | Bias | 0.0041 | 0.0002 | 0.0041 | 0.0001 | 0.0013 | 0.0007 |
| | | Variance | 0.0041 | 0.0041 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 2.99 | | 3.00 | | 4.76 | |
Table 2 Bias values, variances, coverage probabilities, and average numbers of revisions using three different approaches to calculate the $t_{(i)}$ values
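Approach 3 of the Remark can be sketched as follows; Approach 2 corresponds to the initial_quantiles sketch given earlier. The particular least squares fit used below (ordinary least squares of $\ln(y_i + 1)$ on $x_i$) is our own illustrative choice, not necessarily the one prescribed by Oral.18

```python
import numpy as np

def initial_quantiles_lse(x, y):
    """Approach 3: initial t_(i) = a~ + b~ x_(i) from a least squares fit.
    Illustrative assumption: a~, b~ from OLS of ln(y_i + 1) on x_i."""
    b_ls, a_ls = np.polyfit(x, np.log(y + 1.0), 1)   # np.polyfit returns (slope, intercept)
    return a_ls + b_ls * np.sort(x)
```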
Generalization to multivariable case
Now consider k covariates and assume, without loss of generality, that all of them take positive values. The Poisson regression model with k covariates can be written as
$\mu_i = e^{z_i}$, (16)
where $\mu_i = E\left(Y_i \mid x_{1i}, x_{2i}, \ldots, x_{ki}\right)$ and
$z_i = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_k x_{ki}$, (17)
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. Following the same lines as in,3 in order to rank the z-values we can assume that all covariates are equally effective in increasing the response Y, i.e. we initially take the $\theta_j$'s all equal, and order the z-values according to the ordered values of the row totals $s_i = \sum_{j=1}^{k} x_{ji}$. In other words, the ordered z-values become
$z_{(i)} = \mathbf{x}_{(i)}^{\prime}\,\boldsymbol{\theta}$, (18)
where the vector $\mathbf{x}_{(i)}^{\prime} = \left(1, x_{1[i]}, x_{2[i]}, \ldots, x_{k[i]}\right)$ is the ith row of the matrix
$\mathbf{X}_{(\cdot)} = \begin{bmatrix} 1 & x_{1[1]} & \cdots & x_{k[1]} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1[n]} & \cdots & x_{k[n]} \end{bmatrix}$, (19)
which is constructed by arranging the rows of the X matrix so as to correspond to the ordered values $s_{(1)} \le s_{(2)} \le \cdots \le s_{(n)}$. The MMLEs can be obtained along the same lines as in the univariate case:
$\hat{\boldsymbol{\theta}} = \left(\mathbf{X}_{(\cdot)}^{\prime}\, M\, \mathbf{X}_{(\cdot)}\right)^{-1} \mathbf{X}_{(\cdot)}^{\prime}\, \boldsymbol{\Delta},$ (20)
where $\boldsymbol{\theta} = \left(\theta_0, \theta_1, \ldots, \theta_k\right)^{\prime}$, $\boldsymbol{\Delta} = \left(\Delta_1, \Delta_2, \ldots, \Delta_n\right)^{\prime}$ with $\Delta_i$ given by (11), and M is the n×n diagonal matrix $M = \mathrm{diag}\left(\beta_1, \beta_2, \ldots, \beta_n\right)$.
The asymptotic variance-covariance matrix V of the estimators can be derived from the Fisher information matrix as given below:
$V = I^{-1} = \left(\mathbf{X}_{(\cdot)}^{\prime}\, M\, \mathbf{X}_{(\cdot)}\right)^{-1},$ (21)
where $\beta_i = e^{t_{(i)}}$ as before. V is estimated by replacing $t_{(i)}$ by $\hat{z}_{(i)} = \mathbf{x}_{(i)}^{\prime}\,\hat{\boldsymbol{\theta}}$, $i = 1, 2, \ldots, n$.
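In matrix form, (20)-(21) can be computed as in the following sketch (an illustrative numpy version with our own names; the initial ranking uses the covariate row totals as described above, and the quantiles are revised exactly as in the univariate case).

```python
import numpy as np

def mml_poisson_multi(X, y, t, n_iter=20):
    """MMLEs of theta = (theta_0, ..., theta_k) from (20); V estimated from (21).
    X is the n x k matrix of covariates (without the intercept column)."""
    order = np.argsort(X.sum(axis=1))                    # rank by row totals (theta_j's equal initially)
    Xo = np.column_stack([np.ones(len(y)), X[order]])    # X_() with an intercept column
    yo = y[order]                                        # concomitant responses
    theta = np.zeros(Xo.shape[1])
    for _ in range(n_iter):
        beta = np.exp(t)
        alpha = beta * (1.0 - t)
        delta = yo - alpha                               # Delta_i as in (11)
        M = np.diag(beta)                                # M = diag(beta_1, ..., beta_n)
        theta = np.linalg.solve(Xo.T @ M @ Xo, Xo.T @ delta)   # equation (20)
        t = Xo @ theta                                   # revise t_(i) with the current fit
    V = np.linalg.inv(Xo.T @ np.diag(np.exp(t)) @ Xo)    # equation (21)
    return theta, V
```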
Robustness
Measures of influence considered in linear regression models, such as high leverage values, have analogues in the GLM framework. Large leverage values typically indicate outliers in the covariates. When outliers are present in the data, inferences based on the MLEs become unreliable; in fact, it has been shown that MLEs are not robust in GLMs.21 In the Poisson regression setting, if there are outliers in a continuous covariate, the estimates can be influenced. Thus, we also studied the robustness properties of the derived MMLEs under several outlier models. We considered the univariate model given by (1) for simplicity. We set a to zero and performed a Monte Carlo study for different values of b, where (n-r) of the observations $x_i$ (we do not know which) come from the standard normal distribution with mean 0 and scale 1, and the remaining r observations come from a normal distribution with a larger scale c, where c is a positive constant. We calculated the value of r from the equation $r = [0.1n + 0.5]$ (Dixon's outlier model). The outlier models considered are:
(a) (n-r) come from N(0,1) and r come from N(0,1) (No outlier situation),
(b) (n-r) come from N(0,1) and r come from N(0,1.5),
(c) (n-r) come from N(0,1) and r come from N(0,2),
(d) (n-r) come from N(0,1) and r come from N(0,4).
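One replication of these configurations, including the standardization and the Poisson response generation described next, might be coded as in the following sketch. It is illustrative only: treating c as the scale of the contaminating normal and using the rule r = [0.1n + 0.5] are our reading of Dixon's model.

```python
import numpy as np

def generate_dixon_sample(n, a, b, c=1.0, rng=None):
    """One sample under Dixon's outlier model: (n - r) covariate values from N(0, 1)
    and r values from a normal distribution with scale c, r = [0.1 n + 0.5].
    c = 1 reproduces the no-outlier model (a); c = 1.5, 2, 4 give models (b)-(d)."""
    rng = np.random.default_rng() if rng is None else rng
    r = int(0.1 * n + 0.5)
    x = np.concatenate([rng.normal(0.0, 1.0, n - r), rng.normal(0.0, c, r)])
    x /= np.sqrt(((n - r) + r * c ** 2) / n)   # divide by the model standard deviation
    rng.shuffle(x)                             # which observations are outliers is unknown
    y = rng.poisson(np.exp(a + b * x))         # y_i from Poisson with mean exp(a + b x_i)
    return x, y
```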
Note that model (a) above does not involve outliers and is given for the sake of comparison. In order to be able to make direct comparisons, after generating the X values we divided them by the standard deviation of the corresponding distribution for each model. We then calculated $z_i = a + b x_i$ and $\mu_i = e^{z_i}$ for $i = 1, 2, \ldots, n$, and generated the $y_i$ values from the Poisson distribution with mean $\mu_i$ ($i = 1, 2, \ldots, n$). The values obtained from 5,000 runs are given in Table 3.
| b | n | Model (a): No outlier, Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) | Model (b): Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) |
| 0.1 | 30 | 0.0326 | 0.0075 | 0.0371 | 0.0385 | 0.0330 | 0.0021 | 0.0377 | 0.0388 |
| | 50 | 0.0195 | 0.0004 | 0.0214 | 0.0219 | 0.0211 | 0.0033 | 0.0214 | 0.0218 |
| | 100 | 0.0082 | 0.0039 | 0.0103 | 0.0102 | 0.0094 | 0.0008 | 0.0103 | 0.0104 |
| 0.2 | 30 | 0.0316 | 0.0020 | 0.0377 | 0.0388 | 0.0302 | 0.0034 | 0.0388 | 0.0386 |
| | 50 | 0.0202 | 0.0029 | 0.0217 | 0.0215 | 0.0200 | 0.0007 | 0.0219 | 0.0212 |
| | 100 | 0.0108 | 0.0000 | 0.0104 | 0.0102 | 0.0115 | 0.0002 | 0.0108 | 0.0103 |
| 0.4 | 30 | 0.0323 | 0.0033 | 0.0391 | 0.0376 | 0.0346 | 0.0012 | 0.0407 | 0.0385 |
| | 50 | 0.0207 | 0.0005 | 0.0224 | 0.0209 | 0.0165 | 0.0004 | 0.0230 | 0.0209 |
| | 100 | 0.0086 | 0.0004 | 0.0109 | 0.0097 | 0.0089 | 0.0009 | 0.0113 | 0.0099 |

| b | n | Model (c): Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) | Model (d): Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) |
| 0.1 | 30 | 0.02827 | 0.0003 | 0.0381 | 0.0413 | 0.0385 | 0.0026 | 0.0372 | 0.0477 |
| | 50 | 0.0192 | 0.0013 | 0.0220 | 0.0220 | 0.0211 | 0.0004 | 0.0213 | 0.0246 |
| | 100 | 0.0095 | 0.0001 | 0.0105 | 0.0104 | 0.0099 | 0.0002 | 0.0108 | 0.0107 |
| 0.2 | 30 | 0.0362 | 0.0036 | 0.0396 | 0.0419 | 0.0350 | 0.0080 | 0.0375 | 0.0467 |
| | 50 | 0.0163 | 0.0014 | 0.0208 | 0.0211 | 0.0189 | 0.0057 | 0.0221 | 0.0242 |
| | 100 | 0.0111 | 0.0010 | 0.0106 | 0.0105 | 0.0109 | 0.0013 | 0.0099 | 0.0103 |
| 0.4 | 30 | 0.0313 | 0.0015 | 0.0396 | 0.0370 | 0.0314 | 0.0017 | 0.0379 | 0.0461 |
| | 50 | 0.0178 | 0.0030 | 0.0226 | 0.0207 | 0.0166 | 0.0001 | 0.0218 | 0.0223 |
| | 100 | 0.0079 | 0.0003 | 0.0110 | 0.0093 | 0.0074 | 0.0039 | 0.0111 | 0.0088 |
Table 3 Empirical biases and variances from Dixon’s outlier models
As can be seen from Table 3, the biases in the estimates are negligible for all models. The variances (and hence the Wald statistics W) are almost the same for a given n across models (a), (b), (c) and (d), which means that the MMLEs are robust to outliers in the covariate. Note that the MML methodology achieves this robustness through the $\beta_i$ ($i = 1, 2, \ldots, n$) values.