A discrete Pranav distribution and its applications

doi:10.15406/bbij.2019.08.00267

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 8 Issue 1

Berhane Abebe, Kamlesh Kumar Shukla

Department of Statistics, College of Science, Eritrea

Received: December 28, 2018 | Published: February 26, 2019

Citation: Abebe B, Shukla Kk. A discrete Pranav distribution and its applications. Biom Biostat Int J. 2019;8(1):33-37. DOI: 10.15406/bbij.2019.08.00267

Abstract

In recent decades, the discretization of continuous distributions has attracted the attention of researchers because it generates distributions that can be used for strictly discrete data. In this paper, a discrete Pranav distribution, which is a discrete analogue of the continuous Pranav distribution, is proposed. Its important properties, including the coefficient of variation, skewness, kurtosis and index of dispersion, have been obtained and discussed graphically. The method of maximum likelihood has been used for estimating its parameter. The goodness of fit of the proposed distribution has been illustrated using real count datasets, and it was found to give a better fit than other one-parameter discrete distributions.

Keywords: Pranav distribution, discretization, moment generating function, moments, estimation, goodness of fit

Introduction

In recent years, discrete analogues of continuous distributions have been used so that a continuous distribution need not be fitted to strictly discrete data.

In many cases, it is not easy to get samples from continuous distributions. In most cases the observed values are actually discrete, because they are measured to only a finite number of decimal places and cannot represent all points in a continuum. According to Lai,¹ discretization of a continuous lifetime model is an appealing approach to derive a discrete lifetime model corresponding to the continuous one. Therefore, it is reasonable and convenient to model the situation by an appropriate discrete distribution generated from the underlying continuous distribution, preserving one or more of its important characteristics, such as the probability density function (pdf) or the mean residual life function, and its important statistical properties.

In the statistics literature, researchers have used different methods of discretization to propose discrete analogues of continuous distributions. In this study, the infinite series method has been used to find the pmf of the discrete analogue of the continuous Pranav distribution introduced by Shukla;² the formal definition is given below. This method was first used by Good,³ who proposed the discrete Good distribution to model the frequencies of species. It is given as follows.

A random variable $Y$ is said to have a discrete Good distribution if its pmf can be expressed as

$P (Y = y) = \frac{α^{y} y^{β}}{\sum_{j = 0}^{\infty} α^{j} j^{β}}; y = 0, 1, 2, ...$ (1.1)

where $β \in R$ and $α \in (0, 1)$.

The infinite series method is formalised by the following definition.

Definition: Let $X$ be a continuous random variable having pdf $f_{X} (x)$ with support on $R$ . Then the corresponding discrete random variable $Y$ has pmf given by

$P (Y = y) = P (y; θ) = \frac{f_{X} (y; θ)}{\sum_{j = - \infty}^{\infty} f_{X} (j; θ)}; y \in Z$ (1.2)

where $θ$ may be the vector of parameters indexing the distribution of $X$ .

This method has been used by many researchers to discretize continuous distributions. Kulasekara & Tonkyn,⁴ Doray & Luong,⁵ Sato et al.⁶ and Nekoukhou et al.,⁷ among others, proposed a version of the method for the case where the continuous random variable of interest is defined on $R_{+}$. Thus, if the random variable $X$ is defined on $R_{+}$, the pmf of $Y$ can be defined as

$P (Y = y) = P (y; θ) = \frac{f_{X} (y; θ)}{\sum_{j = 0}^{\infty} f_{X} (j; θ)}; y \in Z_{+}$ (1.3)
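The definition in (1.3) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the infinite sum is truncated at a fixed number of terms on the assumption that the pdf decays fast enough for the remainder to be negligible. It discretizes the continuous Lindley pdf (1.7) and compares the result with the closed-form discrete Lindley pmf (1.6).

```python
import math
from typing import Callable


def discretize(pdf: Callable[[float], float], y: int, terms: int = 2000) -> float:
    """Infinite series pmf P(Y = y) = f(y) / sum_{j >= 0} f(j), as in (1.3).

    Assumption: the series is truncated after `terms` terms.
    """
    normaliser = math.fsum(pdf(j) for j in range(terms))
    return pdf(y) / normaliser


def lindley_pdf(x: float, theta: float) -> float:
    """Continuous Lindley pdf, equation (1.7)."""
    return theta**2 / (theta + 1) * (1 + x) * math.exp(-theta * x)


def dld_pmf(y: int, theta: float) -> float:
    """Closed-form discrete Lindley pmf, equation (1.6)."""
    e = math.exp(theta)
    return (e - 1) ** 2 / e**2 * (1 + y) * math.exp(-theta * y)


theta = 0.8  # arbitrary illustrative value
for y in range(4):
    numeric = discretize(lambda x: lindley_pdf(x, theta), y)
    print(y, numeric, dld_pmf(y, theta))
```

Any constant factor of the pdf that does not depend on $x$ cancels between numerator and denominator, which is why the discrete pmfs in this section no longer contain the normalising constants of their continuous counterparts.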

Josmar et al.,⁸ using the infinite series method, derived a discrete Shanker distribution (DSD) with parameter $θ > 0$ and pmf

$P_{1} (y; θ) = \frac{{(e^{θ} - 1)}^{2} (θ + y) e^{- θ (y + 1)}}{1 + (e^{θ} - 1) θ}; y = 0, 1, 2, ...$ (1.4)

They discussed its various statistical properties and its applications to modelling count datasets. The DSD is a discrete analogue of the continuous Shanker distribution introduced by Shanker,⁹ which has pdf

$f_{1} (x; θ) = \frac{θ^{2}}{θ^{2} + 1} (θ + x) e^{- θ x}; x > 0, θ > 0$ (1.5)

Using the same method of discretization, the pmf of the discrete Lindley distribution (DLD) proposed by Berhane & Shanker¹⁰ is given by

$P_{2} (y; θ) = \frac{{(e^{θ} - 1)}^{2}}{e^{2 θ}} (1 + y) e^{- θ y}; y = 0, 1, 2, ...$ (1.6)

where the parameter $θ > 0$ .

They discussed its important statistical properties, including estimation of the parameter of DLD, and applied it to count datasets from engineering and biology. They showed its superiority over other discrete one-parameter distributions such as the Poisson Lindley distribution (PLD) proposed by Sankaran,¹¹ the Poisson Akash distribution (PAD) proposed by Shanker,¹² and the DSD proposed by Josmar et al.⁸ As mentioned above, DLD is a discrete analogue of the continuous Lindley distribution introduced by Lindley,¹³ which has pdf

$f_{2} (x; θ) = \frac{θ^{2}}{θ + 1} (1 + x) e^{- θ x}; x > 0, θ > 0$ (1.7)

Recently, Berhane & Shanker¹⁴ proposed a discrete Akash distribution (DAD) using the infinite series method. Its pmf is given as

$P_{3} (y; θ) = \frac{{(e^{θ} - 1)}^{3}}{e^{θ} (e^{2 θ} - e^{θ} + 2)} (1 + y^{2}) e^{- θ y}; y = 0, 1, 2, ...$ (1.8)

They discussed its important statistical properties, including parameter estimation, applied it to count datasets, and showed its superiority over DSD, DLD, PLD and PAD. The DAD is the discrete analogue of the continuous Akash distribution introduced by Shanker,¹⁵ whose pdf is given as

$f_{3} (x; θ) = \frac{θ^{3}}{θ^{2} + 2} (1 + x^{2}) e^{- θ x}; x > 0, θ > 0$ (1.9)

Shanker¹² proposed PAD, a Poisson mixture of Akash distribution, having pmf

$P_{4} (x; θ) = \frac{θ^{3}}{θ^{2} + 2} . \frac{x^{2} + 3 x + (θ^{2} + 2 θ + 3)}{{(θ + 1)}^{x + 3}}; x = 0, 1, 2, ... θ > 0$ (1.10)

He discussed its important statistical properties, including estimation of the parameter, along with applications of PAD. PAD was applied to count datasets and showed its superiority over PLD and other one-parameter distributions.

The PLD is a Poisson mixture of the Lindley distribution, introduced by Sankaran,¹¹ and is defined by its pmf

$P_{5} (x, θ) = \frac{θ^{2} (x + θ + 2)}{{(θ + 1)}^{x + 3}}; x = 0, 1, 2, ..., θ > 0$ (1.11)

The main objective of this paper is to propose a discretization of the Pranav distribution, motivated by the observation that the Pranav distribution gives a better fit than the one-parameter continuous Akash, Shanker, Sujatha, Lindley and exponential distributions. With this in mind, it is hoped that its discrete version will fit better than the discrete Akash, discrete Shanker and discrete Lindley distributions and other one-parameter discrete distributions.

A discrete Pranav distribution

The pdf and the cdf of a random variable $X$ having the Pranav distribution proposed by Shukla² are given by

$f (x; θ) = \frac{θ^{4}}{θ^{4} + 6} (θ + x^{3}) e^{- θ x}; x > 0, θ > 0$ (2.1)

$F (x; θ) = 1 - [1 + \frac{θ x (θ^{2} x^{2} + 3 θ x + 6)}{θ^{4} + 6}] e^{- θ x}; x > 0, θ > 0$ (2.2)

Shukla² discussed its various statistical properties, including moment-based measures, the hazard rate function, Bonferroni and Lorenz curves and stress-strength reliability. The Pranav distribution was applied to lifetime data from biomedical sciences and engineering, and its superiority over the Akash, Shanker, Ishita, Sujatha and exponential distributions was demonstrated.

Using the definition in (1.3), the pmf of the discrete random variable $Y$ corresponding to a continuous random variable $X$ that follows the Pranav distribution (2.1) with parameter $θ > 0$ can be obtained as

$P (y; θ) = \frac{{(e^{θ} - 1)}^{4}}{e^{θ} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)} (θ + y^{3}) e^{- θ y}; y = 0, 1, 2, ...$ (2.3)
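The normalising constant in (2.3) can be traced in two steps. This is a short worked derivation using standard power series; the shorthand $q = e^{- θ}$ is introduced here only for convenience. For $0 < q < 1$,

$\sum_{j = 0}^{\infty} q^{j} = \frac{1}{1 - q}, \quad \sum_{j = 0}^{\infty} j^{3} q^{j} = \frac{q (1 + 4 q + q^{2})}{{(1 - q)}^{4}}$

so that

$\sum_{j = 0}^{\infty} (θ + j^{3}) e^{- θ j} = \frac{θ {(1 - q)}^{3} + q (1 + 4 q + q^{2})}{{(1 - q)}^{4}} = \frac{e^{θ} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}{{(e^{θ} - 1)}^{4}}$

The factor $\frac{θ^{4}}{θ^{4} + 6}$ of (2.1) cancels in the ratio (1.3), and dividing $(θ + y^{3}) e^{- θ y}$ by the sum above gives (2.3).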

We call this distribution the discrete Pranav distribution (DPD). The nature and behavior of the DPD for varying values of its parameter are shown graphically in Figure 1. From Figure 1 it was observed that the pmf of DPD increases as the value of $θ$ increases.

Figure 1 The pmf plot of DPD for varying values of the parameter $θ$ .
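The pmf (2.3) is simple to evaluate directly. The sketch below is illustrative only (the function name `dpd_pmf` is hypothetical, and Python is used here for convenience); it also checks numerically that the probabilities add up to one for an arbitrary parameter value.

```python
import math


def dpd_pmf(y: int, theta: float) -> float:
    """pmf of the discrete Pranav distribution, equation (2.3)."""
    e = math.exp(theta)
    denom = e * (theta * (e - 1) ** 3 + e**2 + 4 * e + 1)
    return (e - 1) ** 4 / denom * (theta + y**3) * math.exp(-theta * y)


theta = 1.0  # arbitrary illustrative value
total = math.fsum(dpd_pmf(y, theta) for y in range(500))  # truncated sum
print("sum of pmf over 0..499:", total)
for y in range(5):
    print(y, dpd_pmf(y, theta))
```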

The survival function $S (y; θ) = P (Y > y)$ and the cumulative distribution function (cdf) $F (y; θ) = 1 - S (y; θ)$ of DPD can be obtained as

$S (y; θ) = [1 + \frac{y^{3} {(e^{θ} - 1)}^{3} + 3 e^{θ} y^{2} {(e^{θ} - 1)}^{2} + 3 y e^{θ} (e^{2 θ} - 1) + e^{3 θ} + 3 e^{2 θ} - 3 e^{θ} - 1}{θ {(e^{θ} - 1)}^{3} + (e^{2 θ} + 4 e^{θ} + 1)}] e^{- θ (y + 1)}; y = 0, 1, 2, ..., θ > 0$ (2.5)

$F (y; θ) = 1 - [1 + \frac{y^{3} {(e^{θ} - 1)}^{3} + 3 e^{θ} y^{2} {(e^{θ} - 1)}^{2} + 3 y e^{θ} (e^{2 θ} - 1) + e^{3 θ} + 3 e^{2 θ} - 3 e^{θ} - 1}{θ {(e^{θ} - 1)}^{3} + (e^{2 θ} + 4 e^{θ} + 1)}] e^{- θ (y + 1)}; y = 0, 1, 2, ..., θ > 0$

cdf graphs of DPD are presented in Figure 2.

Figure 2 The cdf plot of DPD for varying values of the parameter $θ$ .
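For numerical work, the cdf and the survival function can also be evaluated by summing the pmf (2.3) directly, without relying on a closed form. The following is a minimal sketch under that approach; the function names are hypothetical and the survival function is taken as $P (Y > y)$.

```python
import math


def dpd_pmf(y: int, theta: float) -> float:
    """pmf of the discrete Pranav distribution, equation (2.3)."""
    e = math.exp(theta)
    denom = e * (theta * (e - 1) ** 3 + e**2 + 4 * e + 1)
    return (e - 1) ** 4 / denom * (theta + y**3) * math.exp(-theta * y)


def dpd_cdf(y: int, theta: float) -> float:
    """F(y) = P(Y <= y), obtained by summing the pmf over 0..y."""
    return math.fsum(dpd_pmf(j, theta) for j in range(y + 1))


def dpd_survival(y: int, theta: float) -> float:
    """S(y) = P(Y > y) = 1 - F(y)."""
    return 1.0 - dpd_cdf(y, theta)


theta = 1.0  # arbitrary illustrative value
for y in range(6):
    print(y, dpd_cdf(y, theta), dpd_survival(y, theta))
```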

Since $\frac{P (y + 1; θ)}{P (y; θ)} = [1 + \frac{3 y^{2} + 3 y + 1}{θ + y^{3}}] e^{- θ}$ is a decreasing function of $y$ for $y \geq 3$, $P (y; θ)$ is log-concave and therefore the DPD has an increasing hazard rate. Further, ${[P (y; θ)]}^{2} \geq P (y - 1; θ) \cdot P (y + 1; θ)$ for $y \geq 3$, which implies unimodality, by theorem 3 of Keilson & Gerber.¹⁶ Details of the interrelationship between log-concavity, unimodality and increasing hazard rate of discrete distributions can be found in Grandell.¹⁷

Mean, variance and statistical constants

The probability generating function (pgf) and the moment generating function (mgf) of DPD can be obtained as

$G (t) = \frac{{(e^{θ} - 1)}^{4}}{(θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)} [\frac{θ {(e^{θ} - t)}^{3} + t (e^{2 θ} + 4 t e^{θ} + t^{2})}{{(e^{θ} - t)}^{4}}]$, for $t \neq e^{θ}$, and

$M (t) = \frac{{(e^{θ} - 1)}^{4}}{θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1} [\frac{θ {(e^{θ} - e^{t})}^{3} + e^{t} (e^{2 θ} + 4 e^{t} e^{θ} + e^{2 t})}{{(e^{θ} - e^{t})}^{4}}]$, for $t \neq θ$.

The first four moments about the origin of DPD can thus be obtained as

$μ_{1}^{\prime} = \frac{θ {(e^{θ} - 1)}^{3} + e^{3 θ} + 11 e^{2 θ} + 11 e^{θ} + 1}{(e^{θ} - 1) (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}$

$μ_{2}^{\prime} = \frac{θ e^{4 θ} + e^{4 θ} - 2 θ e^{3 θ} + 26 e^{3 θ} + 66 e^{2 θ} + 2 θ e^{θ} + 26 e^{θ} - θ + 1}{{(e^{θ} - 1)}^{2} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}$

$μ_{3}^{\prime} = \frac{θ e^{5 θ} + e^{5 θ} + θ e^{4 θ} + 57 e^{4 θ} - 8 θ e^{3 θ} + 302 e^{3 θ} + 8 θ e^{2 θ} + 302 e^{2 θ} - θ e^{θ} + 57 e^{θ} - θ + 1}{{(e^{θ} - 1)}^{3} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}$

$μ_{4}^{\prime} = \frac{θ e^{6 θ} + e^{6 θ} + 8 θ e^{5 θ} + 120 e^{5 θ} - 19 θ e^{4 θ} + 1191 e^{4 θ} + 2416 e^{3 θ} + 19 θ e^{2 θ} + 1191 e^{2 θ} - 8 θ e^{θ} + 120 e^{θ} - θ + 1}{{(e^{θ} - 1)}^{4} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}$

Using the relationship

$μ_{r} = E {(Y - μ_{1}^{\prime})}^{r} = \sum_{k = 0}^{r} \binom{r}{k} μ_{k}^{\prime} {(- μ_{1}^{\prime})}^{r - k}$

between central moments $μ_{r}$ and moments about the origin $μ_{k}^{\prime}$, the central moments of DPD are derived as

$μ_{2} = \frac{e^{θ} (\begin{array}{l} (θ + 1) θ e^{6 θ} + (22 - 6 θ) θ e^{5 θ} + (15 θ^{2} - 23 θ + 8) e^{4 θ} - (20 θ^{2} + 64 θ - 28) e^{3 θ} \\ + (15 θ^{2} + 95 θ + 72) e^{2 θ} - (6 θ^{2} + 22 θ - 28) e^{θ} + θ^{2} - 9 θ + 8 \end{array})}{{(e^{θ} - 1)}^{2} {(θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}^{2}}$

$μ_{3} = \frac{e^{θ} (\begin{array}{l} θ^{2} (θ + 1) e^{10 θ} - (8 θ^{3} - 47 θ^{2} + θ) e^{9 θ} + (27 θ^{3} - 96 θ^{2} + 15 θ) e^{8 θ} \\ - (48 θ^{3} + 342 θ^{2} + 60 θ - 8) e^{7 θ} + (42 θ^{3} + 1320 θ^{2} + 300 θ + 32) e^{6 θ} \\ - (1536 θ^{2} + 390 θ - 288) e^{5 θ} - (42 θ^{3} - 438 θ^{2} + 78 θ - 536) e^{4 θ} \\ + (48 θ^{3} + 486 θ^{2} + 204 θ + 536) e^{3 θ} - (27 θ^{3} + 393 θ^{2} - 132 θ - 288) e^{2 θ} \\ + (8 θ^{3} + 65 θ^{2} - 150 θ + 32) e^{θ} - θ^{3} + 10 θ^{2} - 17 θ + 8 \end{array})}{{(θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}^{3} {(e^{θ} - 1)}^{3}}$

$μ_{4} = \frac{e^{θ} (\begin{array}{l} θ^{3} (θ + 1) e^{14 θ} - (5 θ^{4} - 106 θ^{3} + θ^{2}) e^{13 θ} - (17 θ^{4} + 41 θ^{3} - 111 θ^{2} + θ) e^{12 θ} \\ + (230 θ^{4} - 3184 θ^{3} + 606 θ^{2} + 92 θ) e^{11 θ} - (979 θ^{4} - 12857 θ^{3} + 578 θ^{2} + 1136 θ) e^{10 θ} \\ + (2453 θ^{4} - 19414 θ^{3} - 6363 θ^{2} + 5620 θ + 208) e^{9 θ} \\ - (4125 θ^{4} - 2391 θ^{3} - 6813 θ^{2} - 165 θ + 2232) e^{8 θ} \\ + (4884 θ^{4} + 34800 θ^{3} + 22116 θ^{2} - 13416 θ + 8400) e^{7 θ} \\ - (4125 θ^{4} + 55749 θ^{3} + 49788 θ^{2} + 17856 θ - 20544) e^{6 θ} \\ + (2453 θ^{4} + 41054 θ^{3} + 33561 θ^{2} + 26664 θ + 30528) e^{5 θ} \\ - (979 θ^{4} + 13987 θ^{3} + 3047 θ^{2} - 6387 θ - 20544) e^{4 θ} \\ + (230 θ^{4} + 128 θ^{3} - 3522 θ^{2} - 5236 θ + 8400) e^{3 θ} \\ - (17 θ^{4} - 1291 θ^{3} + 642 θ^{2} + 2864 θ - 2232) e^{2 θ} \\ - (5 θ^{4} + 242 θ^{3} - 707 θ^{2} + 668 θ - 208) e^{θ} + θ^{4} - 11 θ^{3} + 27 θ^{2} - 25 θ + 8 \end{array})}{{(θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}^{4} {(e^{θ} - 1)}^{4}}$
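For reference, the binomial relationship between central moments and moments about the origin expands as follows for $r = 2, 3, 4$. Here $μ_{k}^{\prime}$ denotes the $k$-th moment about the origin (with $μ_{0}^{\prime} = 1$), a notation adopted in this note to keep the two kinds of moments apart.

$μ_{2} = μ_{2}^{\prime} - {(μ_{1}^{\prime})}^{2}$

$μ_{3} = μ_{3}^{\prime} - 3 μ_{2}^{\prime} μ_{1}^{\prime} + 2 {(μ_{1}^{\prime})}^{3}$

$μ_{4} = μ_{4}^{\prime} - 4 μ_{3}^{\prime} μ_{1}^{\prime} + 6 μ_{2}^{\prime} {(μ_{1}^{\prime})}^{2} - 3 {(μ_{1}^{\prime})}^{4}$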

The coefficient of variation (C.V), coefficient of skewness $(\sqrt{β_{1}})$, coefficient of kurtosis $(β_{2})$ and index of dispersion $(γ)$ of DPD can be obtained using the relationships below, where $μ_{1}^{\prime}$ is the mean and $σ^{2} = μ_{2}$ is the variance:

$C . V = \frac{σ}{μ_{1}^{\prime}}, \quad \sqrt{β_{1}} = \frac{μ_{3}}{{(μ_{2})}^{3 / 2}}, \quad β_{2} = \frac{μ_{4}}{{(μ_{2})}^{2}}, \quad γ = \frac{σ^{2}}{μ_{1}^{\prime}}$

Table 1 exhibits the nature and behavior of coefficient of variation (C.V), coefficient of skewness, coefficient of kurtosis and index of dispersion (ID) for varying values of the parameter $θ$ (Figure 3).

Figure 3 The plot of measures of descriptive statistics of DPD for varying values of the parameter $θ$ .

$θ$	Values of descriptive statistics
$θ$	Mean	Variance	C.V	Skewness	Kurtosis	ID
0.5	7.915	16.3833	0.5114	0.95	4.4328	2.0699
1	3.2844	5.288	0.7001	0.6689	3.6405	1.61
1.5	1.1918	2.2352	1.2545	1.3138	4.5122	1.8755
2	0.4147	0.7018	2.0197	2.4253	9.6591	1.692
2.5	0.1712	0.2418	2.8725	3.4854	17.7216	1.4125
3	0.0825	0.1013	3.8558	4.479	27.2341	1.2272
3.5	0.0438	0.0492	5.0675	5.5959	39.358	1.1239
4	0.0245	0.0261	6.6069	7.0201	57.6163	1.0682

Table 1 Values of descriptive statistics of DPD for varying values of $θ$

It is clear from Table 1 that the mean and variance of DPD decrease as the parameter $θ$ increases, whereas the coefficient of variation increases. The coefficients of skewness and kurtosis increase with $θ$ for $θ \geq 1$. The index of dispersion is not monotone over the tabulated values but remains greater than 1 throughout, that is, $σ^{2} > μ$, which indicates that DPD can be a suitable model for over-dispersed data.
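The descriptive statistics can also be computed by direct summation over the pmf (2.3) instead of the closed-form moment expressions. The sketch below is a hedged illustration: the function names are hypothetical, the series is truncated at 2000 terms (assumed adequate for $θ \geq 0.5$), and kurtosis is taken as $μ_{4} / μ_{2}^{2}$.

```python
import math


def dpd_pmf(y: int, theta: float) -> float:
    """pmf of the discrete Pranav distribution, equation (2.3)."""
    e = math.exp(theta)
    denom = e * (theta * (e - 1) ** 3 + e**2 + 4 * e + 1)
    return (e - 1) ** 4 / denom * (theta + y**3) * math.exp(-theta * y)


def dpd_summary(theta: float, terms: int = 2000) -> dict:
    """Mean, variance, C.V, skewness, kurtosis and index of dispersion by summation."""
    probs = [dpd_pmf(y, theta) for y in range(terms)]  # truncated support
    mean = math.fsum(y * p for y, p in enumerate(probs))

    def central(r: int) -> float:
        # r-th central moment E(Y - mean)^r
        return math.fsum((y - mean) ** r * p for y, p in enumerate(probs))

    var, mu3, mu4 = central(2), central(3), central(4)
    return {
        "mean": mean,
        "variance": var,
        "cv": math.sqrt(var) / mean,
        "skewness": mu3 / var**1.5,
        "kurtosis": mu4 / var**2,
        "id": var / mean,
    }


for theta in (0.5, 1.0, 1.5, 2.0):
    print(theta, dpd_summary(theta))
```

For $θ = 1$ this gives a mean of about 3.284 and a variance of about 5.288, which can be compared with the corresponding row of Table 1.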

Maximum likelihood estimation

The likelihood function, $L$, of (2.3) is given by

$L = {(\frac{{(e^{θ} - 1)}^{4}}{e^{θ} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)})}^{n} \prod_{i = 1}^{n} (θ + y_{i}^{3}) e^{- θ y_{i}}$

and its log likelihood function is

$\ln L = n \ln (\frac{{(e^{θ} - 1)}^{4}}{e^{θ} (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1)}) + \sum_{i = 1}^{n} \ln (θ + y_{i}^{3}) - n θ \bar{y}$

$\ln L = 4 n \ln (e^{θ} - 1) - n θ - n \ln (θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1) + \sum_{i = 1}^{n} \ln (θ + y_{i}^{3}) - n θ \bar{y}$

Differentiating the above equation with respect to $θ$ and setting the derivative to zero, we have

$\frac{\partial \ln L}{\partial θ} = \frac{4 n e^{θ}}{e^{θ} - 1} - n - \frac{n [{(e^{θ} - 1)}^{3} + 3 θ e^{θ} {(e^{θ} - 1)}^{2} + 2 e^{2 θ} + 4 e^{θ}]}{θ {(e^{θ} - 1)}^{3} + e^{2 θ} + 4 e^{θ} + 1} + \sum_{i = 1}^{n} \frac{1}{θ + y_{i}^{3}} - n \bar{y} = 0$

The maximum likelihood estimate $\hat{θ}$ is the solution of the above equation, which has to be obtained numerically. In this paper, R software is used to estimate $θ$.
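The paper uses R for this step and does not list the code. As an illustration only, the sketch below solves the likelihood equation by bisection in Python. The function names and the search interval are assumptions, the terms coming from the normalising constant are multiplied by $n$ as the likelihood implies, and the observed frequencies of Table 2 are used with the last class treated as exactly 4 errors, which is also an assumption.

```python
import math
from typing import Dict


def dpd_score(theta: float, counts: Dict[int, int]) -> float:
    """Derivative of the DPD log likelihood for grouped counts {y: frequency}."""
    n = sum(counts.values())
    total = sum(y * f for y, f in counts.items())  # equals n * ybar
    e = math.exp(theta)
    d = theta * (e - 1) ** 3 + e**2 + 4 * e + 1
    d_prime = (e - 1) ** 3 + 3 * theta * e * (e - 1) ** 2 + 2 * e**2 + 4 * e
    norm_part = n * (4 * e / (e - 1) - 1 - d_prime / d)
    data_part = sum(f / (theta + y**3) for y, f in counts.items())
    return norm_part + data_part - total


def dpd_mle(counts: Dict[int, int], lo: float = 0.05, hi: float = 10.0,
            tol: float = 1e-10) -> float:
    """Root of the score by bisection; assumes a single sign change on [lo, hi]."""
    if dpd_score(lo, counts) <= 0 or dpd_score(hi, counts) >= 0:
        raise ValueError("score does not change sign on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dpd_score(mid, counts) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


table2_counts = {0: 35, 1: 11, 2: 8, 3: 4, 4: 2}  # observed frequencies, Table 2
print("theta_hat:", dpd_mle(table2_counts))
```

Bisection is chosen here because it needs only the sign of the score; a Newton-type iteration would converge faster but requires the second derivative.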

Application and goodness of fit

In this section, the goodness of fit of the DPD is discussed using two count datasets. The dataset in Table 2 is taken from Kemp & Kemp¹⁸ and the dataset in Table 3 from Beall;¹⁹ details of the datasets can be found in those papers. The proposed model is compared with DSD, DLD, PLD, PAD and DAD (Figures 4 and 5).²⁰

No. of error per group	Observed	Expected frequency
No. of error per group	Frequency	PLD	PAD	DLD	DSD	DAD	DPD
0	35	33.1	33.5	31	31.7	33.2	36
1	11	15.3	14.7	17.4	16.9	14.2	10.6
2	8	6.7	6.6	7.4	7.2	7.6	7.1
3	4	2.9	3	2.8	2.7	3.3	3.9
4	2	2	2.2	1.4	1.5	1.7	2.4
Total	60	60	60	60	60	60	60
	$\overset{⌢}{θ}$	1.7434	2.078	1.2678	1.2276	1.5404	1.689
	$χ^{2}$	1.8141	1.4185	3.3667	2.9963	1.0398	0.1712
	d.f.	1	2	1	1	2	2
	p-value	0.178	0.492	0.066	0.0837	0.595	0.919

Table 2 Distribution of mistakes in copying groups of random digits
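As a rough illustration of how the DPD column of Table 2 can be obtained, the sketch below evaluates the pmf (2.3) at the reported estimate 1.689 and computes a Pearson chi-square statistic. The helper names are hypothetical. Treating the last class as an open-ended tail (4 or more errors) and pooling the last two classes are assumptions made here because they are consistent with the tabulated expected frequencies and with d.f. = 2; the paper does not state these details.

```python
import math
from typing import List, Sequence


def dpd_pmf(y: int, theta: float) -> float:
    """pmf of the discrete Pranav distribution, equation (2.3)."""
    e = math.exp(theta)
    denom = e * (theta * (e - 1) ** 3 + e**2 + 4 * e + 1)
    return (e - 1) ** 4 / denom * (theta + y**3) * math.exp(-theta * y)


def expected_frequencies(theta: float, n: int, last_class: int) -> List[float]:
    """Expected counts for classes 0..last_class; the last class holds the whole upper tail."""
    probs = [dpd_pmf(y, theta) for y in range(last_class)]
    probs.append(1.0 - math.fsum(probs))
    return [n * p for p in probs]


def pool_tail(values: Sequence[float], keep: int) -> List[float]:
    """Merge every class from index `keep` onward into a single class."""
    return list(values[:keep]) + [math.fsum(values[keep:])]


def pearson_chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (O - E)^2 / E over the classes."""
    return math.fsum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [35, 11, 8, 4, 2]  # observed frequencies, Table 2
expected = expected_frequencies(1.689, n=60, last_class=4)
chi2 = pearson_chi_square(pool_tail(observed, 3), pool_tail(expected, 3))
print("expected:", [round(e, 1) for e in expected])
print("chi-square (unrounded expected frequencies):", chi2)
```

When the rounded expected frequencies printed in Table 2 are used instead (36, 10.6, 7.1 and the pooled 3.9 + 2.4 = 6.3), the statistic evaluates to about 0.171, in line with the tabulated value 0.1712.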

No. of insects	Observed	Expected frequency
No. of insects	Frequency	PLD	PAD	DLD	DSD	DAD	DPD
0	33	31.5	32	29.6	30.3	31.6	34.4
1	12	14.2	13.6	16.2	15.6	13.2	9.8
2	6	6.1	6	6.6	6.4	6.9	6.4
3	3	2.5	2.6	2.4	2.4	2.9	3.4
4	1	1	1.1	0.8	0.8	1	1.4
5	1	0.7	0.7	0.4	0.5	0.4	0.6
Total	56	56	56	56	56	56	56
	$\overset{⌢}{θ}$	1.8115	2.1446	1.2993	1.2535	1.5686	1.7122
	$χ^{2}$	0.4598	0.2541	1.5422	1.1516	0.1747	0.6055
	d.f.	1	1	1	1	1	2
	p-value	0.498	0.614	0.215	0.283	0.676	0.739

Table 3 Observed and expected frequencies for distribution of Pyrausta nubilalis in 1937

Figure 4 Fitted plot of distributions on first data set.

Figure 5 Fitted plot of distributions on second data set.

Conclusion

In this paper, a discrete Pranav distribution (DPD) has been proposed. Its moment generating function, moments and moment-based measures, including statistical constants, have been derived and their nature and behavior have been discussed numerically and graphically. The method of maximum likelihood has been discussed for estimating its parameter. The goodness of fit of DPD has been illustrated using two real count datasets. The DPD gives a better fit than PLD, PAD, DLD, DSD and DAD on the presented datasets.