The univariate Poisson regression model is given by

$\lambda_i = E(Y_i) = e^{a + b x_i}$,  (1)

where $\lambda_i$ is the mean of the outcome $Y_i$, for $i = 1, 2, \ldots, n$, and the outcome $Y_i$ has the probability distribution

$P(Y_i = y_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$  (2)

Note that in (1)-(2), $\lambda_i$ is presumed to increase with $X$, so that $b$ is a priori greater than zero; this ensures that ordering the $x_i$ also orders the $z_i = a + b x_i$. For the model given in (1)-(2), the log-likelihood function of the random sample $y_1, y_2, \ldots, y_n$ can be written as

$\ln L = \sum_{i=1}^{n} \left[ y_i z_i - e^{z_i} - \ln(y_i!) \right]$,  (3)
where $z_i = a + b x_i$. The likelihood equations for estimating $a$ and $b$ do not have explicit solutions because of the nonlinear function $e^{z_i}$, $i = 1, 2, \ldots, n$.
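As an aside (this snippet is ours, not part of the original derivation or the author's FORTRAN implementation), the log-likelihood (3) is straightforward to evaluate numerically; a minimal Python sketch:

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(a, b, x, y):
    """Log-likelihood (3) of the univariate Poisson regression model."""
    z = a + b * x                          # z_i = a + b x_i
    # ln L = sum_i [ y_i z_i - exp(z_i) - ln(y_i!) ], with ln(y!) = gammaln(y+1)
    return np.sum(y * z - np.exp(z) - gammaln(y + 1))
```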
To obtain the MMLEs, we first express the likelihood equations in terms of the ordered variates $z_{(i)} = a + b x_{(i)}$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the ordered covariate values. The likelihood equations can be re-written as

$\dfrac{\partial \ln L}{\partial a} = \sum_{i=1}^{n} \left( y_{[i]} - e^{z_{(i)}} \right) = 0$  (4)

and

$\dfrac{\partial \ln L}{\partial b} = \sum_{i=1}^{n} x_{(i)} \left( y_{[i]} - e^{z_{(i)}} \right) = 0$,  (5)

where $y_{[i]}$ is the concomitant of $x_{(i)}$ $(1 \le i \le n)$. Linearizing the intractable function $e^{z}$ by using the first two terms of its Taylor series expansion around the population quantiles $t_{(i)}$ $(1 \le i \le n)$, we find

$e^{z_{(i)}} \cong \alpha_i + \beta_i z_{(i)}$,  (6)

where $\alpha_i = e^{t_{(i)}} \left( 1 - t_{(i)} \right)$ and $\beta_i = e^{t_{(i)}}$ for $i = 1, 2, \ldots, n$.
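The coefficients $\alpha_i$ and $\beta_i$ follow directly from the first-order expansion of $e^{z}$ around $t_{(i)}$:

```latex
e^{z_{(i)}} \simeq e^{t_{(i)}} + e^{t_{(i)}}\bigl(z_{(i)} - t_{(i)}\bigr)
            = \underbrace{e^{t_{(i)}}\bigl(1 - t_{(i)}\bigr)}_{\alpha_i}
            + \underbrace{e^{t_{(i)}}}_{\beta_i}\, z_{(i)} .
```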
In order to calculate the $t_{(i)}$ values, we define a dummy random variable $U = (X - \mu)/\sigma$, the standardized covariate, with the probability density function

$f(u), \quad -\infty < u < \infty$,  (7)

and re-write the model (1) as

$z_{(i)} = \delta + \tau u_{(i)}$,  (8)

where $\delta = a + b\mu$, $\tau = b\sigma$. Thus, the $t_{(i)}$ values can be obtained from the equation

$\displaystyle\int_{-\infty}^{t_{(i)}} f(u)\, du = \frac{i}{n+1}$,  (9)

for $i = 1, 2, \ldots, n$; asymptotically, $u_{(i)}$ converges to $t_{(i)}$. Alternatively, when $n$ is large one can utilize the standard normal distribution:

$t_{(i)} = \Phi^{-1}\!\left( \frac{i}{n+1} \right)$.  (10)
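For illustration (the function name is ours), the quantiles of (10) and the linearization coefficients of (6) take only a few lines of Python:

```python
import numpy as np
from scipy.stats import norm

def quantiles_and_coefs(n):
    """t_(i) from the standard normal, eq. (10), and the
    linearization coefficients alpha_i, beta_i of eq. (6)."""
    i = np.arange(1, n + 1)
    t = norm.ppf(i / (n + 1))      # t_(i) = Phi^{-1}( i/(n+1) )
    beta = np.exp(t)               # beta_i = exp(t_(i))
    alpha = beta * (1.0 - t)       # alpha_i = exp(t_(i)) (1 - t_(i))
    return t, alpha, beta
```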
Incorporating (6) into (4)-(5) and solving the resulting modified likelihood equations yield the explicit MMLEs below:

$\hat{a} = \frac{1}{m} \sum_{i=1}^{n} \Delta_i - \hat{b}\, \bar{x}$  and  $\hat{b} = \frac{1}{C} \sum_{i=1}^{n} \Delta_i \left( x_{(i)} - \bar{x} \right)$,  (11)

where

$\Delta_i = y_{[i]} - \alpha_i$, $m = \sum_{i=1}^{n} \beta_i$, $\bar{x} = \frac{1}{m} \sum_{i=1}^{n} \beta_i x_{(i)}$,  (12)

and

$C = \sum_{i=1}^{n} \beta_i \left( x_{(i)} - \bar{x} \right)^2$.  (13)
The MMLEs derived above are asymptotically equivalent to their corresponding MLEs, giving them the same attractive asymptotic properties; however, one can refine the estimates by re-calculating the $\alpha_i$ and $\beta_i$ values, replacing the theoretical population quantiles $t_{(i)}$ with their estimated values $\hat{z}_{(i)} = \hat{a} + \hat{b} x_{(i)}$. This process may be repeated until the desired convergence is met; stabilization is generally reached within a few iterations.
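The whole procedure can be summarized in a short Python sketch; this is a minimal illustration with our own names and tolerance, not the author's FORTRAN implementation:

```python
import numpy as np
from scipy.stats import norm

def mmle_poisson(x, y, tol=1e-8, max_iter=50):
    """MMLEs of (a, b) from (11)-(13); x, y are 1-d numpy arrays.
    t_(i) initialized from (10) and revised with z^_(i) = a^ + b^ x_(i)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]       # x_(i) and concomitants y_[i]
    n = len(xs)
    t = norm.ppf(np.arange(1, n + 1) / (n + 1))    # initial quantiles
    a_hat = b_hat = 0.0
    for _ in range(max_iter):
        beta = np.exp(t)                           # beta_i
        delta = ys - beta * (1.0 - t)              # Delta_i = y_[i] - alpha_i
        m = beta.sum()
        xbar = (beta * xs).sum() / m               # eq. (12)
        C = (beta * (xs - xbar) ** 2).sum()        # eq. (13)
        b_new = (delta * (xs - xbar)).sum() / C    # eq. (11)
        a_new = delta.sum() / m - b_new * xbar
        if abs(a_new - a_hat) + abs(b_new - b_hat) < tol:
            a_hat, b_hat = a_new, b_new
            break
        a_hat, b_hat = a_new, b_new
        t = a_hat + b_hat * xs                     # revise t_(i) with z^_(i)
    return a_hat, b_hat
```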
Asymptotic variances and covariances
Vaughan and Tiku15 proved rigorously that the MMLEs are asymptotically unbiased and that their variances and covariances are exactly the same as those of the MLEs. In the present situation, therefore, the asymptotic variances and the covariance of $\hat{a}$ and $\hat{b}$ are given by $V = I^{-1}$, where $I$ is the Fisher information matrix consisting of the elements $-E\left( \partial^2 \ln L / \partial a^2 \right)$, $-E\left( \partial^2 \ln L / \partial a \, \partial b \right)$, and $-E\left( \partial^2 \ln L / \partial b^2 \right)$. From the modified likelihood equations

$\dfrac{\partial \ln L^*}{\partial a} = \sum_{i=1}^{n} \left( y_{[i]} - \alpha_i - \beta_i z_{(i)} \right) = 0$,

$\dfrac{\partial \ln L^*}{\partial b} = \sum_{i=1}^{n} x_{(i)} \left( y_{[i]} - \alpha_i - \beta_i z_{(i)} \right) = 0$,

the Fisher information matrix can easily be obtained as

$I = \begin{bmatrix} \sum_{i=1}^{n} \beta_i & \sum_{i=1}^{n} \beta_i x_{(i)} \\ \sum_{i=1}^{n} \beta_i x_{(i)} & \sum_{i=1}^{n} \beta_i x_{(i)}^2 \end{bmatrix}$,  (14)

where $\beta_i = e^{t_{(i)}}$. $V$ is estimated by replacing $t_{(i)}$ with its estimate $\hat{z}_{(i)} = \hat{a} + \hat{b} x_{(i)}$ $(1 \le i \le n)$.
Since the $\hat{z}_{(i)}$ values converge to the $z_{(i)}$ values as $n$ tends to infinity, the $\alpha_i$ and $\beta_i$ values are treated as constant coefficients for large $n$; see also3. Hence, the asymptotic variances can be estimated by

$\widehat{V}(\hat{a}) = \dfrac{\sum_{i=1}^{n} \hat{\beta}_i x_{(i)}^2}{\hat{m} \sum_{i=1}^{n} \hat{\beta}_i x_{(i)}^2 - \left( \sum_{i=1}^{n} \hat{\beta}_i x_{(i)} \right)^2}$,  (15)

$\widehat{V}(\hat{b}) = \dfrac{\hat{m}}{\hat{m} \sum_{i=1}^{n} \hat{\beta}_i x_{(i)}^2 - \left( \sum_{i=1}^{n} \hat{\beta}_i x_{(i)} \right)^2}$,  (16)

where $\hat{\beta}_i = e^{\hat{z}_{(i)}}$ and $\hat{m} = \sum_{i=1}^{n} \hat{\beta}_i$.
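In code (continuing the earlier sketch, with names ours), the standard errors amount to inverting the 2×2 matrix (14) evaluated at the $\hat{z}_{(i)}$:

```python
import numpy as np

def mmle_se(x, a_hat, b_hat):
    """Estimated standard errors of a^ and b^ from (15)-(16)."""
    xs = np.sort(x)
    beta = np.exp(a_hat + b_hat * xs)     # beta^_i = exp(z^_(i))
    m = beta.sum()
    s1 = (beta * xs).sum()                # sum beta^_i x_(i)
    s2 = (beta * xs ** 2).sum()           # sum beta^_i x_(i)^2
    det = m * s2 - s1 ** 2                # determinant of I in (14)
    return np.sqrt(s2 / det), np.sqrt(m / det)   # SE(a^), SE(b^)
```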
  Hypothesis testing
Testing the null hypothesis $H_0\!: b = 0$ is of great practical importance in Poisson regression modelling. The likelihood ratio statistic for testing $H_0$ is $LR = 2\left( \ln L_1^* - \ln L_0^* \right)$, where $\ln L_0^*$ and $\ln L_1^*$ denote the maximized log-modified likelihood functions under the null and alternative hypotheses, respectively. The null distribution of $LR$ is asymptotically chi-square with 1 degree of freedom; large values of $LR$ lead to the rejection of $H_0$. Alternatively, the Wald statistic $W$ (the ratio of $\hat{b}$ to its standard error) might be used. Since $\hat{b}$ is asymptotically equivalent to the MLE, the null distribution of $W$ is asymptotically standard normal, $N(0,1)$; large values of $|W|$ lead to the rejection of $H_0$.
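A sketch of both tests follows. For simplicity it evaluates the ordinary log-likelihood (3) at the estimates (a stand-in for the log-modified likelihood, justified by the asymptotic equivalence noted above) and uses the fact that under $H_0\!: b = 0$ the MLE of $a$ is $\ln \bar{y}$; the function names are ours:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

def loglik(a, b, x, y):
    """Ordinary Poisson log-likelihood (3)."""
    z = a + b * x
    return np.sum(y * z - np.exp(z) - gammaln(y + 1))

def test_b_zero(x, y, a_hat, b_hat, se_b):
    """Wald and LR statistics for H0: b = 0."""
    W = b_hat / se_b                       # asymptotically N(0,1) under H0
    a0 = np.log(y.mean())                  # null MLE of a, since lambda = ybar
    LR = 2.0 * (loglik(a_hat, b_hat, x, y) - loglik(a0, 0.0, x, y))
    return W, LR, chi2.sf(LR, df=1)        # LR is chi-square(1) under H0
```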
  Numerical example
To compare the ML and MML estimates numerically, we analyzed the data given on page 82 of Agresti.19 The data come from a study of nesting horseshoe crabs, in which the response $Y$ is the number of satellites each female crab has and the covariate $X$ is the carapace width of each of the 173 crabs; the study investigates the relationship between $Y$ and $X$. We calculated the MMLEs from equations (11)-(13) and their approximate standard errors from (15)-(16). The FORTRAN code written to carry out the calculations can be obtained from the author. Our results are completely consistent with those given in Agresti,19 which is expected; see Table 1.
| | Coefficient | Estimate | SE | W | LR |
| ML | a | -3.3048 | 0.5422 | | |
| | b | 0.164 | 0.02 | 8.2 | 64.9 |
| MML | a | -3.3047 | 0.5423 | | |
| | b | 0.164 | 0.0199 | 8.241 | 64.91 |

Table 1 MLEs and MMLEs along with their standard errors for the horseshoe crab data
Remark: In solving (11)-(13), Oral18 proposed calculating the initial $t_{(i)}$ values from the least squares estimators (LSEs), an approach different from using equation (9) (Approach 1) or equation (10) (Approach 2). Since $z_{(i)} = a + b x_{(i)}$, the $t_{(i)}$ values can be approximated by $\tilde{t}_{(i)} = \tilde{a} + \tilde{b} x_{(i)}$, where $\tilde{a}$ and $\tilde{b}$ are the LSEs; see also3,20. When using this approach (say, Approach 3), the $t_{(i)}$ values need to be revised after the first iteration with their estimated values $\hat{z}_{(i)}$, as described above. Estimating the population quantiles from the LSEs changes neither the derivations/solutions (11)-(13) nor the results. However, the number of revisions needed for stabilization differs across the approaches; see also3 (page 889). To compare the performance of the three approaches, we conducted a simulation study in which we calculated the bias values and variances of the resulting MMLEs as well as the coverage probabilities; we also recorded the average number of revisions needed for stabilization under each approach. We set $a$ to zero and considered various values of $b$ and of the sample size $n$. Our results from 10,000 Monte Carlo runs are given in Table 2.
As can be seen from Table 2, all approaches provide the same biases and variances after stabilization, which is expected. The resulting coverage probabilities from all approaches are close to 0.95, and as the sample size increases, both the bias values and the variances decrease, also as expected. For a given $b$ value, the fastest stabilization is achieved by Approach 2 (i.e., equation (10)). Thus, although all three approaches yield the same results, we suggest calculating the initial $t_{(i)}$ values from equation (10), since stabilization under this approach is the fastest.
    
| | | | $b$ = 0.1 | | $b$ = 0.5 | | $b$ = 1.0 | |
| $n$ | Approach | | $\hat{a}$ | $\hat{b}$ | $\hat{a}$ | $\hat{b}$ | $\hat{a}$ | $\hat{b}$ |
| 30 | Approach 1 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0278 | 0.0078 |
| | | Variance | 0.0373 | 0.0389 | 0.0403 | 0.0369 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9522 | 0.9541 | 0.9525 | 0.9488 |
| | | No. of revisions | 4.67 | | 3.45 | | 6.10 | |
| | Approach 2 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0279 | 0.0079 |
| | | Variance | 0.0373 | 0.0388 | 0.0403 | 0.0369 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9522 | 0.9541 | 0.9525 | 0.9488 |
| | | No. of revisions | 2.65 | | 2.00 | | 1.81 | |
| | Approach 3 | Bias | 0.0355 | 0.0006 | 0.0341 | 0.0067 | 0.0278 | 0.0079 |
| | | Variance | 0.0372 | 0.0389 | 0.0402 | 0.0368 | 0.0451 | 0.0328 |
| | | Coverage prob. | 0.9521 | 0.9537 | 0.9523 | 0.9540 | 0.9525 | 0.9488 |
| | | No. of revisions | 2.98 | | 3.04 | | 4.63 | |
| 100 | Approach 1 | Bias | 0.0098 | 0.0007 | 0.0079 | 0.0002 | 0.0081 | 0.0017 |
| | | Variance | 0.0104 | 0.0104 | 0.0113 | 0.0094 | 0.0126 | 0.0074 |
| | | Coverage prob. | 0.9504 | 0.9518 | 0.9470 | 0.9494 | 0.9538 | 0.9495 |
| | | No. of revisions | 4.78 | | 3.24 | | 6.59 | |
| | Approach 2 | Bias | 0.0098 | 0.0007 | 0.0079 | 0.0001 | 0.0081 | 0.0016 |
| | | Variance | 0.0103 | 0.0104 | 0.0114 | 0.0095 | 0.0126 | 0.0073 |
| | | Coverage prob. | 0.9504 | 0.9518 | 0.9470 | 0.9494 | 0.9538 | 0.9496 |
| | | No. of revisions | 2.91 | | 2.01 | | 1.32 | |
| | Approach 3 | Bias | 0.0098 | 0.0007 | 0.0078 | 0.0002 | 0.0082 | 0.0017 |
| | | Variance | 0.0104 | 0.0104 | 0.0113 | 0.0094 | 0.0126 | 0.0073 |
| | | Coverage prob. | 0.9505 | 0.9518 | 0.9470 | 0.9494 | 0.9539 | 0.9495 |
| | | No. of revisions | 2.99 | | 3.00 | | 4.69 | |
| 250 | Approach 1 | Bias | 0.0041 | 0.0002 | 0.0041 | 0.0000 | 0.0013 | 0.0007 |
| | | Variance | 0.0041 | 0.0040 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 4.78 | | 3.11 | | 6.87 | |
| | Approach 2 | Bias | 0.0041 | 0.0002 | 0.0040 | 0.0000 | 0.0012 | 0.0007 |
| | | Variance | 0.0040 | 0.0041 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 2.90 | | 2.00 | | 1.12 | |
| | Approach 3 | Bias | 0.0041 | 0.0002 | 0.0041 | 0.0001 | 0.0013 | 0.0007 |
| | | Variance | 0.0041 | 0.0041 | 0.0044 | 0.0036 | 0.0049 | 0.0026 |
| | | Coverage prob. | 0.9506 | 0.9452 | 0.9500 | 0.9479 | 0.9507 | 0.9513 |
| | | No. of revisions | 2.99 | | 3.00 | | 4.76 | |

Table 2 Bias values, variances, coverage probabilities and average numbers of revisions using three different approaches to calculate the initial $t_{(i)}$ values
 
 
 
  Generalization to multivariable case
Now consider $k$ $(k > 1)$ covariates, and assume without loss of generality that all of them take positive values. The Poisson regression model with $k$ covariates can be written as

$\lambda_i = E(Y_i) = e^{z_i}$,  (17)

where $z_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_k x_{ik}$, and

$P(Y_i = y_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$,  (18)

for $i = 1, 2, \ldots, n$, $y_i = 0, 1, 2, \ldots$. Following the same lines as,3 in order to rank the $z$ values we can assume that all covariates are equally effective in increasing the response $Y$; i.e., we initially take the $\theta_j$ all equal and order the $z$ values in accordance with the ordered row totals $x_{i \cdot} = \sum_{j=1}^{k} x_{ij}$. In other words, the ordered $z$ values become

$z_{(i)} = \mathbf{x}_{(i)}^{T} \boldsymbol{\theta}$,  (19)

where the vector $\mathbf{x}_{(i)}^{T}$ is the $i$th row of the matrix

$\mathbf{X}_o = \begin{bmatrix} 1 & x_{[1]1} & \cdots & x_{[1]k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{[n]1} & \cdots & x_{[n]k} \end{bmatrix}$,  (20)

which is constructed by arranging the rows of the design matrix

$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}$

so as to correspond to the ordered $x_{i \cdot}$ values, $x_{(1) \cdot} \le x_{(2) \cdot} \le \cdots \le x_{(n) \cdot}$.
The MMLEs can be obtained along the same lines as in the univariate case:

$\hat{\boldsymbol{\theta}} = \left( \mathbf{X}_o^{T} \mathbf{M} \mathbf{X}_o \right)^{-1} \mathbf{X}_o^{T} \boldsymbol{\Delta}$,  (21)

where $\boldsymbol{\Delta} = \left( \Delta_1, \Delta_2, \ldots, \Delta_n \right)^{T}$, $\Delta_i$ is given by (12), and $\mathbf{M}$ is the $n \times n$ diagonal matrix

$\mathbf{M} = \mathrm{diag}\left( \beta_1, \beta_2, \ldots, \beta_n \right)$.
The asymptotic variance-covariance matrix $V$ of the estimators can be derived from the Fisher information matrix $I$ as given below:

$V = I^{-1} = \left( \mathbf{X}_o^{T} \mathbf{M} \mathbf{X}_o \right)^{-1}$,  (22)

where $\beta_i = e^{t_{(i)}}$. $V$ is estimated by replacing $t_{(i)}$ by its estimate $\hat{z}_{(i)} = \mathbf{x}_{(i)}^{T} \hat{\boldsymbol{\theta}}$ $(1 \le i \le n)$.
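A compact sketch of (21)-(22), again with our own names and tolerance, carrying over the iterative revision of the $t_{(i)}$ from the univariate case:

```python
import numpy as np
from scipy.stats import norm

def mmle_poisson_multi(X, y, tol=1e-8, max_iter=50):
    """Multivariable MMLE (21) and its estimated covariance (22).
    X is an (n, k) array of covariates, y a length-n response array."""
    n, k = X.shape
    order = np.argsort(X.sum(axis=1))     # rank z by the covariate row sums
    Xo = np.column_stack([np.ones(n), X[order]])
    yo = np.asarray(y)[order]             # concomitant responses y_[i]
    t = norm.ppf(np.arange(1, n + 1) / (n + 1))   # initial t_(i), eq. (10)
    theta = np.zeros(k + 1)
    for _ in range(max_iter):
        beta = np.exp(t)                  # beta_i = exp(t_(i))
        delta = yo - beta * (1.0 - t)     # Delta_i = y_[i] - alpha_i
        XtMX = Xo.T @ (beta[:, None] * Xo)
        theta_new = np.linalg.solve(XtMX, Xo.T @ delta)   # eq. (21)
        if np.max(np.abs(theta_new - theta)) < tol:
            theta = theta_new
            break
        theta = theta_new
        t = Xo @ theta                    # revise t_(i) with z^_(i)
    z_hat = Xo @ theta
    V = np.linalg.inv(Xo.T @ (np.exp(z_hat)[:, None] * Xo))  # eq. (22)
    return theta, V
```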
  Robustness
Measures of influence considered in linear regression models, such as high leverage values, have analogues in the GLM framework. Large leverage values typically mean that there are outliers in the covariates. When outliers are present in the data, inferences based on the MLEs become unreliable; in fact, it has been shown that MLEs are not robust in GLMs.21 In the Poisson regression setting, outliers in a continuous covariate can influence the estimates. Thus, we also studied the robustness properties of the derived MMLEs under several outlier models. We considered the univariate model given by (1) for simplicity. We assumed that $a = 0$ and performed a Monte Carlo study for different values of $b$, where $(n-r)$ of the observations $x_i$ (we do not know which) come from the standard normal distribution and the remaining $r$ observations come from a normal distribution with scale $c$, where $c$ is a positive constant. We calculated the value of $r$ from the equation $r = [0.5 + 0.1n]$ (Dixon's outlier model). The outlier models considered are:

(a) $(n-r)$ come from N(0,1) and $r$ come from N(0,1) (no-outlier situation),
(b) $(n-r)$ come from N(0,1) and $r$ come from N(0,1.5),
(c) $(n-r)$ come from N(0,1) and $r$ come from N(0,2),
(d) $(n-r)$ come from N(0,1) and $r$ come from N(0,4).
Note that model (a) above does not involve outliers and is given for the sake of comparison. In order to make direct comparisons possible, after generating the $X$ values we divided them by the standard deviation of the corresponding mixture distribution for each model. We then calculated $z_i = a + b x_i$ and $\lambda_i = e^{z_i}$ for $i = 1, 2, \ldots, n$, and generated the $y_i$ values from the Poisson distribution with mean $\lambda_i$ $(i = 1, 2, \ldots, n)$. The values obtained from 5,000 runs are given in Table 3.
    
| | | Model (a): No outlier | | | | Model (b) | | | |
| $b$ | $n$ | Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) | Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) |
| 0.1 | 30 | 0.0326 | 0.0075 | 0.0371 | 0.0385 | 0.0330 | 0.0021 | 0.0377 | 0.0388 |
| | 50 | 0.0195 | 0.0004 | 0.0214 | 0.0219 | 0.0211 | 0.0033 | 0.0214 | 0.0218 |
| | 100 | 0.0082 | 0.0039 | 0.0103 | 0.0102 | 0.0094 | 0.0008 | 0.0103 | 0.0104 |
| 0.2 | 30 | 0.0316 | 0.0020 | 0.0377 | 0.0388 | 0.0302 | 0.0034 | 0.0388 | 0.0386 |
| | 50 | 0.0202 | 0.0029 | 0.0217 | 0.0215 | 0.0200 | 0.0007 | 0.0219 | 0.0212 |
| | 100 | 0.0108 | 0.0000 | 0.0104 | 0.0102 | 0.0115 | 0.0002 | 0.0108 | 0.0103 |
| 0.4 | 30 | 0.0323 | 0.0033 | 0.0391 | 0.0376 | 0.0346 | 0.0012 | 0.0407 | 0.0385 |
| | 50 | 0.0207 | 0.0005 | 0.0224 | 0.0209 | 0.0165 | 0.0004 | 0.0230 | 0.0209 |
| | 100 | 0.0086 | 0.0004 | 0.0109 | 0.0097 | 0.0089 | 0.0009 | 0.0113 | 0.0099 |
| | | Model (c) | | | | Model (d) | | | |
| $b$ | $n$ | Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) | Bias($\hat{a}$) | Bias($\hat{b}$) | Var($\hat{a}$) | Var($\hat{b}$) |
| 0.1 | 30 | 0.0283 | 0.0003 | 0.0381 | 0.0413 | 0.0385 | 0.0026 | 0.0372 | 0.0477 |
| | 50 | 0.0192 | 0.0013 | 0.0220 | 0.0220 | 0.0211 | 0.0004 | 0.0213 | 0.0246 |
| | 100 | 0.0095 | 0.0001 | 0.0105 | 0.0104 | 0.0099 | 0.0002 | 0.0108 | 0.0107 |
| 0.2 | 30 | 0.0362 | 0.0036 | 0.0396 | 0.0419 | 0.0350 | 0.0080 | 0.0375 | 0.0467 |
| | 50 | 0.0163 | 0.0014 | 0.0208 | 0.0211 | 0.0189 | 0.0057 | 0.0221 | 0.0242 |
| | 100 | 0.0111 | 0.0010 | 0.0106 | 0.0105 | 0.0109 | 0.0013 | 0.0099 | 0.0103 |
| 0.4 | 30 | 0.0313 | 0.0015 | 0.0396 | 0.0370 | 0.0314 | 0.0017 | 0.0379 | 0.0461 |
| | 50 | 0.0178 | 0.0030 | 0.0226 | 0.0207 | 0.0166 | 0.0001 | 0.0218 | 0.0223 |
| | 100 | 0.0079 | 0.0003 | 0.0110 | 0.0093 | 0.0074 | 0.0039 | 0.0111 | 0.0088 |

Table 3 Empirical biases and variances from Dixon's outlier models
 
 
 
As can be seen from the table, the biases in the estimates are negligible for all models. The variances of $\hat{a}$ and $\hat{b}$ (and hence the Wald statistics $W$) are almost the same for a given $n$ across models (a), (b), (c) and (d), which means that the MMLEs are robust to outliers in the covariate. Note that the MML methodology achieves robustness through the $\beta_i$ $(1 \le i \le n)$ values.