
 
 
Research Article Volume 3 Issue 4
     
 
On Poisson-Sujatha distribution and its applications to model count data from biological sciences
Rama Shanker, Hagos Fesshaye
  
Department of Statistics, Eritrea Institute of Technology, Eritrea
Correspondence: Rama Shanker, Department of Statistics, Eritrea Institute of Technology, Asmara, Eritrea
Received: January 29, 2016 | Published: March 10, 2016
Citation: Shanker R, Fesshaye H. On poisson-sujatha distribution and its applications to model count data from biological sciences. Biom Biostat Int J. 2016;3(4):100-106. DOI: 10.15406/bbij.2016.03.00069
        
       
 
  
Abstract
  
In this paper, a simple method for finding the moments of the Poisson-Sujatha distribution (PSD), introduced by Shanker [1], is suggested, and the first four moments about origin and the variance are given. The PSD has been fitted to the same data-sets relating to ecology and genetics to which Shanker & Hagos [2] earlier fitted the Poisson distribution (PD) and the Poisson-Lindley distribution (PLD) introduced by Sankaran [3]. The PSD gives a satisfactory fit in the majority of data-sets.
Keywords: Sujatha distribution, Poisson-Sujatha distribution, Lindley distribution, Poisson-Lindley distribution, moments, compounding, estimation of parameter, goodness of fit
 
  
Introduction
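The Poisson-Sujatha distribution (PSD), having probability mass function

$$P(X=x)=\frac{\theta^{3}}{\theta^{2}+\theta+2}\cdot\frac{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \qquad (1.1)$$

was introduced by Shanker [1] for modeling count data-sets. The PSD arises from the Poisson distribution when its parameter $\lambda$ follows the Sujatha distribution introduced by Shanker [4], having probability density function

$$f(\lambda;\theta)=\frac{\theta^{3}}{\theta^{2}+\theta+2}\,(1+\lambda+\lambda^{2})\,e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \qquad (1.2)$$

We have

$$P(X=x)=\int_{0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}\cdot\frac{\theta^{3}}{\theta^{2}+\theta+2}\,(1+\lambda+\lambda^{2})\,e^{-\theta\lambda}\,d\lambda \qquad (1.3)$$

$$=\frac{\theta^{3}}{(\theta^{2}+\theta+2)\,x!}\int_{0}^{\infty}\lambda^{x}\,(1+\lambda+\lambda^{2})\,e^{-(\theta+1)\lambda}\,d\lambda$$

$$=\frac{\theta^{3}}{\theta^{2}+\theta+2}\cdot\frac{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \qquad (1.4)$$

which is the Poisson-Sujatha distribution (PSD).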
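As a numerical check of the compounding step in (1.3) and (1.4), the following Python sketch evaluates the closed-form PSD probability mass function and compares it with a midpoint-rule approximation of the mixing integral. The function names, the value θ = 1.5, and the integration settings are illustrative assumptions and do not come from the original paper.

```python
import math


def psd_pmf(x: int, theta: float) -> float:
    """Closed-form PSD probability mass function, as in (1.1)/(1.4)."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    return norm * poly / (theta + 1) ** (x + 3)


def psd_pmf_by_mixing(x: int, theta: float,
                      upper: float = 60.0, steps: int = 50_000) -> float:
    """Midpoint-rule approximation of the mixing integral (1.3).

    `upper` truncates the infinite integration range; it is an assumed
    cut-off beyond which the integrand is negligible for moderate theta.
    """
    h = upper / steps
    norm = theta**3 / (theta**2 + theta + 2)
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        sujatha = norm * (1 + lam + lam**2) * math.exp(-theta * lam)
        total += poisson * sujatha
    return total * h


theta = 1.5  # illustrative parameter value
for x in range(5):
    print(x, psd_pmf(x, theta), psd_pmf_by_mixing(x, theta))

# The probabilities should add up to one (the tail beyond 200 is negligible here).
print(sum(psd_pmf(x, theta) for x in range(200)))
```

The two columns printed for each x should agree to several decimal places, which is what the closed form (1.4) predicts.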
Shanker [4] has shown that the Sujatha distribution (1.2) is a three-component mixture of an exponential ($\theta$) distribution, a gamma ($2,\theta$) distribution, and a gamma ($3,\theta$) distribution with mixing proportions $\frac{\theta^{2}}{\theta^{2}+\theta+2}$, $\frac{\theta}{\theta^{2}+\theta+2}$ and $\frac{2}{\theta^{2}+\theta+2}$ respectively. Shanker [4] has discussed its various mathematical and statistical properties, including its shape, moment generating function, moments, skewness, kurtosis, hazard rate function, mean residual life function, stochastic orderings, mean deviations, distribution of order statistics, Bonferroni and Lorenz curves, Rényi entropy measure, and stress-strength reliability, amongst others, along with the estimation of the parameter and applications for modeling lifetime data.
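The mixture representation can be verified term by term. In the sketch below, $D=\theta^{2}+\theta+2$ is a shorthand introduced here for readability; the three summands are the exponential ($\theta$), gamma ($2,\theta$) and gamma ($3,\theta$) densities weighted by their mixing proportions.

```latex
\begin{aligned}
f(\lambda;\theta)
 &= \frac{\theta^{2}}{D}\,\theta e^{-\theta\lambda}
  + \frac{\theta}{D}\,\theta^{2}\lambda e^{-\theta\lambda}
  + \frac{2}{D}\,\frac{\theta^{3}\lambda^{2}}{2}\,e^{-\theta\lambda} \\
 &= \frac{\theta^{3}}{D}\,\bigl(1+\lambda+\lambda^{2}\bigr)\,e^{-\theta\lambda},
 \qquad D=\theta^{2}+\theta+2 .
\end{aligned}
```

The three proportions add up to one, so the right-hand side is a proper density.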
Shanker [1] has studied in detail various mathematical and statistical properties of PSD, including its moment generating function, coefficient of variation, skewness, kurtosis, over-dispersion, hazard rate and unimodality, along with the estimation of the parameter and applications. Shanker & Hagos [5,6] have obtained the size-biased Poisson-Sujatha distribution (SBPSD) and the zero-truncated Poisson-Sujatha distribution (ZTPSD) and discussed their statistical properties, estimation of the parameter and applications. Further, Shanker & Hagos [7] have studied in detail the zero-truncation of the Poisson, Poisson-Lindley and Poisson-Sujatha distributions and their applications.
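The probability mass function of the Poisson-Lindley distribution (PLD), given by

$$P(X=x)=\frac{\theta^{2}\,(x+\theta+2)}{(\theta+1)^{x+3}};\quad x=0,1,2,\ldots,\ \theta>0 \qquad (1.5)$$

was introduced by Sankaran [3] to model count data. The distribution arises from the Poisson distribution when its parameter $\lambda$ follows the Lindley [8] distribution with probability density function

$$f(\lambda;\theta)=\frac{\theta^{2}}{\theta+1}\,(1+\lambda)\,e^{-\theta\lambda};\quad \lambda>0,\ \theta>0 \qquad (1.6)$$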
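For comparison with the PSD, a short Python sketch of the PLD probability mass function (1.5) is given here. It checks that the probabilities add up to one and that the mean agrees with $(\theta+2)/(\theta(\theta+1))$, which is the mean of the Lindley mixing density (1.6). The function name and the value θ = 2 are illustrative assumptions.

```python
def pld_pmf(x: int, theta: float) -> float:
    """Poisson-Lindley probability mass function, as in (1.5)."""
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)


theta = 2.0  # illustrative parameter value
terms = 200  # truncation point; the tail beyond it is negligible here

total = sum(pld_pmf(x, theta) for x in range(terms))
mean = sum(x * pld_pmf(x, theta) for x in range(terms))

print("sum of probabilities:", total)
print("mean by summation:   ", mean)
print("mean in closed form: ", (theta + 2) / (theta * (theta + 1)))
```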
In this paper, a simple method for finding the moments of the Poisson-Sujatha distribution (PSD), introduced by Shanker [1], is suggested, and the first four moments about origin and the variance are presented. It seems that not much work has been done on the applications of PSD so far. The PSD has been fitted to the same data-sets relating to ecology and genetics to which Shanker & Hagos [2] fitted the Poisson distribution (PD) and the Poisson-Lindley distribution (PLD) introduced by Sankaran [3]. The PSD gives a satisfactory fit in the majority of data-sets.
  
 
  
   
  
Moments of Poisson-Sujatha distribution
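Using (1.3), the $r$th moment about origin of PSD (1.1) can be obtained as

$$\mu_r'=E\left[E\left(X^{r}\mid\lambda\right)\right]=\frac{\theta^{3}}{\theta^{2}+\theta+2}\int_{0}^{\infty}\left[\sum_{x=0}^{\infty}x^{r}\,\frac{e^{-\lambda}\lambda^{x}}{x!}\right](1+\lambda+\lambda^{2})\,e^{-\theta\lambda}\,d\lambda \qquad (2.1)$$

Clearly, the expression under the bracket in (2.1) is the $r$th moment about origin of the Poisson distribution. Taking $r=1$ in (2.1) and using the first moment about origin of the Poisson distribution, the first moment about origin of the PSD (1.1) can be obtained as

$$\mu_1'=\frac{\theta^{3}}{\theta^{2}+\theta+2}\int_{0}^{\infty}\lambda\,(1+\lambda+\lambda^{2})\,e^{-\theta\lambda}\,d\lambda=\frac{\theta^{2}+2\theta+6}{\theta\,(\theta^{2}+\theta+2)} \qquad (2.2)$$

Again, taking $r=2$ in (2.1) and using the second moment about origin of the Poisson distribution, the second moment about origin of the PSD (1.1) is obtained as

$$\mu_2'=\frac{\theta^{3}}{\theta^{2}+\theta+2}\int_{0}^{\infty}(\lambda^{2}+\lambda)\,(1+\lambda+\lambda^{2})\,e^{-\theta\lambda}\,d\lambda=\frac{\theta^{3}+4\theta^{2}+12\theta+24}{\theta^{2}\,(\theta^{2}+\theta+2)} \qquad (2.3)$$

Similarly, taking $r=3$ and $r=4$ in (2.1) and using the third and the fourth moment about origin of the Poisson distribution, the third and the fourth moment about origin of the PSD (1.1) are obtained as

$$\mu_3'=\frac{\theta^{4}+8\theta^{3}+30\theta^{2}+96\theta+120}{\theta^{3}\,(\theta^{2}+\theta+2)} \qquad (2.4)$$

$$\mu_4'=\frac{\theta^{5}+16\theta^{4}+84\theta^{3}+336\theta^{2}+840\theta+720}{\theta^{4}\,(\theta^{2}+\theta+2)} \qquad (2.5)$$

Thus the variance of the PSD (1.1) can be obtained as

$$\mu_2=\mu_2'-\left(\mu_1'\right)^{2}=\frac{\theta^{5}+4\theta^{4}+14\theta^{3}+28\theta^{2}+24\theta+12}{\theta^{2}\,(\theta^{2}+\theta+2)^{2}} \qquad (2.6)$$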
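The closed forms (2.2) to (2.6) can be checked against direct summation over the probability mass function. The Python sketch below does this for one illustrative parameter value; the function names, the value θ = 1.5 and the truncation point of the sums are assumptions made for the example.

```python
def psd_pmf(x: int, theta: float) -> float:
    """PSD probability mass function (1.1)."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    return norm * poly / (theta + 1) ** (x + 3)


def psd_raw_moments(theta: float) -> tuple:
    """First four moments about origin, closed forms (2.2) to (2.5)."""
    d = theta**2 + theta + 2
    m1 = (theta**2 + 2 * theta + 6) / (theta * d)
    m2 = (theta**3 + 4 * theta**2 + 12 * theta + 24) / (theta**2 * d)
    m3 = (theta**4 + 8 * theta**3 + 30 * theta**2 + 96 * theta + 120) / (theta**3 * d)
    m4 = (theta**5 + 16 * theta**4 + 84 * theta**3 + 336 * theta**2
          + 840 * theta + 720) / (theta**4 * d)
    return m1, m2, m3, m4


def psd_variance(theta: float) -> float:
    """Variance, closed form (2.6)."""
    d = theta**2 + theta + 2
    num = theta**5 + 4 * theta**4 + 14 * theta**3 + 28 * theta**2 + 24 * theta + 12
    return num / (theta**2 * d**2)


def moments_by_summation(theta: float, terms: int = 400) -> tuple:
    """Moments about origin computed as truncated sums of x**r * P(X = x)."""
    return tuple(
        sum(x**r * psd_pmf(x, theta) for x in range(terms)) for r in (1, 2, 3, 4)
    )


theta = 1.5  # illustrative parameter value
closed = psd_raw_moments(theta)
summed = moments_by_summation(theta)
for r, (c, s) in enumerate(zip(closed, summed), start=1):
    print(f"r={r}: closed form {c:.10f}, summation {s:.10f}")
print("variance:", psd_variance(theta), summed[1] - summed[0] ** 2)
```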
Shanker [1] has shown that the PSD is always over-dispersed, has an increasing hazard rate, and is unimodal. Further, Shanker [1] has also shown that the coefficient of variation, skewness, and kurtosis of PSD increase as the value of the parameter increases.
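One way to see the over-dispersion is the standard mixed-Poisson identity $\operatorname{Var}(X)=E(\lambda)+\operatorname{Var}(\lambda)$. The decomposition below is a rearrangement of (2.2) and (2.6) worked out for illustration, not a formula quoted from Shanker [1]; the second term is the variance of the Sujatha mixing distribution.

```latex
\mu_2
 = \underbrace{\frac{\theta^{2}+2\theta+6}{\theta(\theta^{2}+\theta+2)}}_{\mu_1'}
 + \underbrace{\frac{\theta^{4}+4\theta^{3}+18\theta^{2}+12\theta+12}
                    {\theta^{2}(\theta^{2}+\theta+2)^{2}}}_{\operatorname{Var}(\lambda)\,>\,0}
 \;>\; \mu_1' \qquad \text{for all } \theta>0 .
```

Putting both terms over the common denominator $\theta^{2}(\theta^{2}+\theta+2)^{2}$ recovers the numerator $\theta^{5}+4\theta^{4}+14\theta^{3}+28\theta^{2}+24\theta+12$ of (2.6).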
 
  
Estimation of the parameter
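Maximum likelihood estimate (MLE) of the parameter: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ from the PSD (1.1) and let $f_x$ be the observed frequency in the sample corresponding to $X=x$ $(x=0,1,2,\ldots,k)$ such that $\sum_{x=0}^{k}f_x=n$, where $k$ is the largest observed value having non-zero frequency. The likelihood function $L$ of the PSD (1.1) is given by

$$L=\left(\frac{\theta^{3}}{\theta^{2}+\theta+2}\right)^{n}\frac{1}{(\theta+1)^{\sum_{x=0}^{k}(x+3)f_x}}\prod_{x=0}^{k}\left[x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)\right]^{f_x}$$

The log likelihood function is thus obtained as

$$\log L=n\log\left(\frac{\theta^{3}}{\theta^{2}+\theta+2}\right)-\sum_{x=0}^{k}(x+3)f_x\,\log(\theta+1)+\sum_{x=0}^{k}f_x\log\left[x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)\right]$$

The first derivative of the log likelihood function is given by

$$\frac{d\log L}{d\theta}=\frac{3n}{\theta}-\frac{n(2\theta+1)}{\theta^{2}+\theta+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{(x+2\theta+3)\,f_x}{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}$$

where $\bar{x}$ is the sample mean.

The maximum likelihood estimate (MLE), $\hat{\theta}$, of $\theta$ of PSD (1.1) is the solution of the equation $\frac{d\log L}{d\theta}=0$ and is given by the solution of the following non-linear equation

$$\frac{3n}{\theta}-\frac{n(2\theta+1)}{\theta^{2}+\theta+2}-\frac{n(\bar{x}+3)}{\theta+1}+\sum_{x=0}^{k}\frac{(x+2\theta+3)\,f_x}{x^{2}+(\theta+4)x+(\theta^{2}+3\theta+4)}=0$$

This non-linear equation can be solved by any numerical iteration method, such as the Newton-Raphson method, the bisection method or the Regula-Falsi method.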
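A minimal Python sketch of the bisection approach is given below, applied to the yeast cell counts of Table 1. The score function is the derivative of the PSD log likelihood with respect to θ, with the frequencies indexed from x = 0. The function names and the bracketing interval [0.01, 50] are assumptions chosen for this example, and the bisection routine is a generic one rather than the authors' own implementation.

```python
def psd_score(theta: float, freqs: dict) -> float:
    """Derivative of the PSD log likelihood with respect to theta.

    `freqs` maps each observed count x to its frequency f_x.
    """
    n = sum(freqs.values())
    total_x = sum(x * f for x, f in freqs.items())  # equals n * sample mean
    d = theta**2 + theta + 2
    score = 3 * n / theta - n * (2 * theta + 1) / d - (total_x + 3 * n) / (theta + 1)
    for x, f in freqs.items():
        poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
        score += f * (x + 2 * theta + 3) / poly
    return score


def psd_mle(freqs: dict, lo: float = 0.01, hi: float = 50.0,
            tol: float = 1e-10) -> float:
    """Bisection on the score function; assumes a sign change on [lo, hi]."""
    f_lo = psd_score(lo, freqs)
    if f_lo * psd_score(hi, freqs) > 0:
        raise ValueError("score does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = psd_score(mid, freqs)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Observed frequencies from Table 1 (yeast cells per square; last class is 5+).
yeast_counts = {0: 128, 1: 37, 2: 18, 3: 3, 4: 1, 5: 0}
theta_hat = psd_mle(yeast_counts)
print("MLE of theta:", theta_hat)  # Table 1 reports 3.186657 for PSD
```

Bisection is used here because it needs only a sign change of the score function; Newton-Raphson would converge faster but also requires the second derivative and a good starting value.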
  
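Method of moment estimate (MOME) of the parameter: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ from the PSD (1.1). Equating the population mean to the corresponding sample mean, the MOME $\tilde{\theta}$ of $\theta$ of PSD (1.1) is the solution of the following cubic equation

$$\bar{x}\,\theta^{3}+(\bar{x}-1)\,\theta^{2}+2(\bar{x}-1)\,\theta-6=0$$

where $\bar{x}$ is the sample mean.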
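The moment estimate can be sketched in Python as follows. Setting the PSD mean $(\theta^{2}+2\theta+6)/(\theta(\theta^{2}+\theta+2))$ equal to $\bar{x}$ and clearing the denominator gives the cubic $\bar{x}\theta^{3}+(\bar{x}-1)\theta^{2}+2(\bar{x}-1)\theta-6=0$, which has exactly one positive root by Descartes' rule of signs. The function names and the bracketing strategy are assumptions for this example; the sample mean 86/187 is computed from the observed frequencies of Table 1.

```python
def mome_cubic(theta: float, xbar: float) -> float:
    """Left-hand side of the moment equation for the PSD parameter."""
    return xbar * theta**3 + (xbar - 1) * theta**2 + 2 * (xbar - 1) * theta - 6


def psd_mean(theta: float) -> float:
    """Mean of the PSD (first moment about origin)."""
    return (theta**2 + 2 * theta + 6) / (theta * (theta**2 + theta + 2))


def solve_mome(xbar: float, tol: float = 1e-12) -> float:
    """Find the positive root of the cubic by bracketing and bisection."""
    if xbar <= 0:
        raise ValueError("the sample mean must be positive")
    lo, hi = 1e-9, 1.0
    # The cubic is close to -6 near zero; enlarge hi until it becomes non-negative.
    while mome_cubic(hi, xbar) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mome_cubic(mid, xbar) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xbar = 86 / 187  # sample mean of the Table 1 yeast cell counts
theta_tilde = solve_mome(xbar)
print("MOME of theta:", theta_tilde)
print("PSD mean at the estimate:", psd_mean(theta_tilde), "sample mean:", xbar)
```

For these data the root lies at roughly 3.19, close to the maximum likelihood estimate reported in Table 1.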
 
  
Applications of Poisson-Sujatha distribution
The Poisson distribution is a suitable statistical model for situations where events seem to occur at random, such as the number of customers arriving at a service point, the number of telephone calls arriving at an exchange, the number of fatal traffic accidents per week in a given state, the number of radioactive particle emissions per unit of time, the number of meteorites that collide with a test satellite during a single orbit, the number of organisms per unit volume of some fluid, the number of defects per unit of some materials, and the number of flaws per unit length of some wire, amongst others. The conditions for applying the Poisson distribution are the independence of events and the equality of mean and variance. These conditions are rarely satisfied completely in biological and medical science because the occurrences of successive events are dependent. Further, the negative binomial distribution (NBD) is a possible alternative to the Poisson distribution when successive events are possibly dependent (Johnson et al. [9]), but for fitting the NBD to count data, the mean should be less than the variance. In biological and medical sciences, these conditions are also not fully satisfied. Generally, count data in biological and medical science are either over-dispersed or under-dispersed. The main reason for selecting PLD and PSD to fit biological science data is that these two distributions are always over-dispersed and PSD has some flexibility over PLD.
Applications  in ecology
Ecology is the branch of biology dealing with the relations and interactions between organisms and their environment, including other organisms. Organisms and their environment are complex, dynamic, interdependent, mutually reactive and interrelated, and ecology deals with the principles that govern these relationships. Fisher et al. [10] were the first to discuss applications of the logarithmic series distribution (LSD) to model count data in ecology. Later, Kempton [11] fitted a generalized form of Fisher's LSD to insect data and concluded that it gives a superior fit compared with the ordinary LSD, as well as a better explanation for data having an exceptionally long tail. Tripathi & Gupta [12] proposed another generalization of the LSD, flexible enough to describe short-tailed as well as long-tailed data, fitted it to insect data, and found that it gives a better fit than the ordinary LSD. Mishra & Shanker [13] have discussed applications of generalized logarithmic series distributions (GLSD) to model data in ecology. Shanker & Hagos [2] fitted PLD to data relating to ecology and observed that PLD gives a satisfactory fit.
In this section, the Poisson distribution (PD), the Poisson-Lindley distribution (PLD) and the Poisson-Sujatha distribution (PSD) are fitted to several count data-sets from biological sciences using maximum likelihood estimates. The data are haemocytometer yeast cell counts per square, European red mites on apple leaves, and European corn borers per plant (Tables 1-3).
It is clear from Tables 1-3 that both PSD and PLD give a much closer fit than the Poisson distribution. Further, in some data-sets PSD gives a closer fit than PLD, while in others PLD gives a closer fit than PSD. Thus both PSD and PLD can be considered important tools for modeling data in ecology.
  
    	
Number of cells per square | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 128 | 118.1 | 127.4 | 127.5
1 | 37 | 54.3 | 41.1 | 40.9
2 | 18 |  |  |
3 | 3 |  |  |
4 | 1 |  |  |
5+ | 0 |  |  |
Total | 187 | 187 | 187 | 187
Estimate of parameter |  | $\hat{\theta}$ = 0.459893 | $\hat{\theta}$ = 2.751579 | $\hat{\theta}$ = 3.186657
$\chi^{2}$ |  | 9.9 | 1.43 | 0.99
d.f. |  | 1 | 1 | 1
p-value |  | 0.0016 | 0.2317 | 0.3197
Table 1 Observed and expected number of haemocytometer yeast cell counts per square observed by ‘Student’ [17]
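The PSD column of Table 1 can be approximately reproduced with the Python sketch below. It uses the reported estimate 3.186657, computes expected frequencies as n times the PSD probabilities, and pools the classes 2 and above into one cell. The pooling is an assumption inferred from the reported single degree of freedom (three classes, minus one, minus one estimated parameter). The p-value uses the identity P(χ² > c) = erfc(√(c/2)), which holds only for one degree of freedom. Variable and function names are illustrative.

```python
import math


def psd_pmf(x: int, theta: float) -> float:
    """PSD probability mass function (1.1)."""
    norm = theta**3 / (theta**2 + theta + 2)
    poly = x**2 + (theta + 4) * x + (theta**2 + 3 * theta + 4)
    return norm * poly / (theta + 1) ** (x + 3)


observed = [128, 37, 18, 3, 1, 0]  # classes 0, 1, 2, 3, 4, 5+ from Table 1
theta_hat = 3.186657               # PSD estimate reported in Table 1
n = sum(observed)

# Expected frequencies for classes 0 and 1; the remainder is the pooled class 2+.
e0 = n * psd_pmf(0, theta_hat)
e1 = n * psd_pmf(1, theta_hat)
e2_plus = n - e0 - e1

obs_pooled = [observed[0], observed[1], sum(observed[2:])]
exp_pooled = [e0, e1, e2_plus]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1 - 1  # classes, minus one, minus one estimated parameter

# Upper-tail probability of a chi-square variable; valid only when dof == 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

print("expected frequencies:", [round(e, 1) for e in exp_pooled])
print("chi-square:", round(chi2, 2), "d.f.:", dof, "p-value:", round(p_value, 4))
```

Small differences from the tabulated values are to be expected, since the table appears to be computed from expected frequencies rounded to one decimal place.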
 
 
 
  
    
Number of mites per leaf | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 38 | 25.3 | 35.8 | 35.3
1 | 17 | 29.1 | 20.7 | 20.9
2 | 10 | 16.7 | 11.4 | 11.6
3 | 9 |  | 6 | 6.1
4 | 3 |  |  |
5 | 2 |  |  |
6 | 1 |  |  |
7+ | 0 |  |  |
Total | 80 | 80 | 80 | 80
Estimate of parameter |  | $\hat{\theta}$ = 1.15 | $\hat{\theta}$ = 1.255891 | $\hat{\theta}$ = 1.64683
$\chi^{2}$ |  | 18.27 | 2.47 | 2.52
d.f. |  | 2 | 3 | 3
p-value |  | 0.0001 | 0.4807 | 0.4719
Table 2 Observed and expected number of red mites on apple leaves
 
 
 
    
Number of borers per plant | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 188 | 169.4 | 194.0 | 193.6
1 | 83 | 109.8 | 79.5 | 79.6
2 | 36 | 35.6 | 31.3 | 31.6
3 | 14 |  |  |
4 | 2 |  |  |
5 | 1 |  |  |
Total | 324 | 324.0 | 324.0 | 324.0
Estimate of parameter |  |  |  |
$\chi^{2}$ |  | 15.19 | 1.29 | 1.16
d.f. |  | 2 | 2 | 2
p-value |  | 0.0005 | 0.5247 | 0.5599
Table 3 Observed and expected number of European corn-borer of Mc Guire et al. [18]
The chi-square values and p-values in Tables 1-3 show that in Table 1 PSD gives a better fit than PD and PLD, in Table 2 PLD gives a better fit than PD and PSD, and in Table 3 PSD gives a better fit than PD and PLD.
 
 
 
  
Application  in genetics
Genetics is the branch of biological science which deals with heredity and variation. Heredity includes those traits or characteristics which are transmitted from generation to generation, and is therefore fixed for a particular individual. Variation, on the other hand, is mainly of two types, namely hereditary and environmental. Hereditary variation refers to differences in inherited traits, whereas environmental variations are those which are mainly due to environment. The segregation of chromosomes has been studied using statistical tools, mainly chi-square ($\chi^{2}$). In the analysis of data observed on chemically induced chromosome aberrations in cultures of human leukocytes, Loeschke & Kohler [14] suggested the negative binomial distribution, while Janardan & Schaeffer [15] suggested a modified Poisson distribution. Mishra & Shanker [13] have discussed applications of generalized logarithmic series distributions (GLSD) to model data in mortality, ecology and genetics. Shanker & Hagos [2] have studied in detail the applications of PLD to model data from genetics. Much quantitative work seems to have been done in genetics, but so far no work has been done on fitting PSD to data relating to genetics. In this section, an attempt has been made to fit PSD, PLD and PD to data relating to genetics using maximum likelihood estimates. PSD, PLD and PD have also been fitted to the data of Catcheside et al. [16] (Tables 4-7).
It is clear from the fitting of PSD, PLD and PD that both PSD and PLD give a much more satisfactory fit than PD, while PSD gives a closer fit than PLD in some data-sets and PLD gives a closer fit than PSD in others. Thus both PSD and PLD can be considered important tools for modeling data in genetics.
    
Number of aberrations | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 268 | 231.3 | 257 | 257.6
1 | 87 | 126.7 | 93.4 | 93
2 | 26 | 34.7 | 32.8 | 32.7
3 | 9 |  | 11.2 | 11.2
4 | 4 |  |  |
5 | 2 |  |  |
6 | 1 |  |  |
7+ | 3 |  |  |
Total | 400 | 400 | 400 | 400
Estimate of parameter |  | $\hat{\theta}$ = 0.5475 | $\hat{\theta}$ = 2.380442 | $\hat{\theta}$ = 2.829241
$\chi^{2}$ |  | 38.21 | 6.21 | 6.28
d.f. |  | 2 | 3 | 3
p-value |  | 0 | 0.1018 | 0.0987
Table 4 Distribution of number of chromatid aberrations (0.2 g chinon 1, 24 hours)
 
 
 
    
Class/Exposure | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 413 | 374 | 405.7 | 406.1
1 | 124 | 177.4 | 133.6 | 132.9
2 | 42 | 42.1 | 42.6 | 42.7
3 | 15 |  | 13.3 | 13.4
4 | 5 |  |  |
5 | 0 |  |  |
6 | 2 |  |  |
Total | 601 | 601 | 601 | 601
Estimate of parameter |  | $\hat{\theta}$ = 0.47421 | $\hat{\theta}$ = 2.685373 | $\hat{\theta}$ = 3.125788
$\chi^{2}$ |  | 48.17 | 1.34 | 1.1
d.f. |  | 2 | 3 | 3
p-value |  | 0 | 0.7196 | 0.7771
Table 5 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -60
      
 
 
 
    
Class/Exposure | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 200 | 172.5 | 191.8 | 192
1 | 57 | 95.4 | 70.3 | 70.1
2 | 30 | 26.4 | 24.9 | 24.9
3 | 7 |  |  |
4 | 4 |  |  |
5 | 0 |  |  |
6 | 2 |  |  |
Total | 300 | 300 | 300 | 300
Estimate of parameter |  | $\hat{\theta}$ = 0.55333 | $\hat{\theta}$ = 2.353339 | $\hat{\theta}$ = 2.795745
$\chi^{2}$ |  | 29.68 | 3.91 | 3.81
d.f. |  | 2 | 2 | 2
p-value |  | 0 | 0.1415 | 0.1488
Table 6 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -70
    
 
 
 
    
Class/Exposure | Observed frequency | Expected frequency (PD) | Expected frequency (PLD) | Expected frequency (PSD)
0 | 155 | 127.8 | 158.3 | 157.5
1 | 83 | 109 | 77.2 | 77.5
2 | 33 | 46.5 | 35.9 | 36.4
3 | 14 |  | 16.1 | 16.4
4 | 11 |  |  |
5 | 3 |  |  |
6 | 1 |  |  |
Total | 300 | 300 | 300 | 300
Estimate of parameter |  | $\hat{\theta}$ = 0.853333 | $\hat{\theta}$ = 1.617611 | $\hat{\theta}$ = 2.034077
$\chi^{2}$ |  | 24.97 | 1.51 | 1.74
d.f. |  | 2 | 3 | 3
p-value |  | 0 | 0.6799 | 0.6281
Table 7 Mammalian cytogenetic dosimetry lesions in rabbit lymphoblast induced by streptonigrin (NSC-45383), Exposure -90
    
 
 
 
 
Acknowledgments
 Conflicts of interest
The authors declare that there are no conflicts of interest.
 
 
References
  
1. Shanker R. The discrete Poisson–Sujatha distribution. International Journal of Probability and Statistics. 2016;5(1).
2. Shanker R, Hagos F. On Poisson–Lindley distribution and its applications to biological sciences. Biometrics and Biostatistics International Journal. 2015;2(4):1–5.
3. Sankaran M. The discrete Poisson–Lindley distribution. Biometrics. 1970;26(1):145–149.
4. Shanker R. Sujatha distribution and its applications. Statistics in Transition new Series. 2015.
5. Shanker R, Hagos F. Size–biased Poisson–Sujatha distribution with applications. Communicated. 2016.
6. Shanker R, Hagos F. Zero–truncated Poisson–Sujatha distribution with applications. Communicated. 2016.
7. Shanker R, Hagos F. On zero–truncation of Poisson, Poisson–Lindley, and Poisson–Sujatha distribution and their applications. Communicated. 2016.
8. Lindley DV. Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B. 1958;20(1):102–107.
9. Johnson NL, Kotz S, Kemp AW. Univariate Discrete Distributions. 2nd edition. John Wiley & Sons Inc, USA. 1992.
10. Fisher RA, Corbet AS, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology. 1943;12(1):42–58.
11. Kempton RA. A generalized form of Fisher’s logarithmic series. Biometrika. 1975;62(1):29–38.
12. Tripathi RC, Gupta RC. A generalization of the log–series distribution. Communications in Statistics - Theory and Methods. 1985;14(8):1779–1799.
13. Mishra A, Shanker R. Generalized logarithmic series distribution–Its nature and applications. Proceedings of the Vth International Symposium on Optimization and Statistics. 2002;28–30:155–168.
14. Loeschke V, Kohler W. Deterministic and stochastic models of the negative binomial distribution and the analysis of chromosomal aberrations in human leukocytes. Biometrische Zeitschrift. 1976;18(6):427–451.
15. Janardan KG, Schaeffer DJ. Models for the analysis of chromosomal aberrations in human leukocytes. Biometrical Journal. 1977;19(8):599–612.
16. Catcheside DG, Lea DE, Thoday JM. Types of chromosome structural change induced by the irradiation on Tradescantia microspores. J Genet. 1946;47:113–136.
17. Student. On the error of counting with a haemacytometer. Biometrika. 1907;5(3):351–360.
18. Mc Guire JU, Brindley TA, Bancroft TA. The distribution of European corn–borer larvae Pyrausta in field corn. Biometrics. 1957;13(1):65–78.
 
 
  
  ©2016 Shanker, et al. This is an open access article distributed under the terms of the, 
 which 
permits unrestricted use, distribution, and build upon your work non-commercially.