Second order optimality of sequential designs with application in software reliability estimation

doi:10.15406/bbij.2015.02.00037

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 2 Issue 4


Kamel Rekab,


Xing Song

Department of Mathematics and Statistics, University of Missouri-Kansas City, USA

Correspondence: Kamel Rekab, Department of Mathematics and Statistics, University of Missouri-Kansas City, PO Box 32464 Kansas City, MO 64171, USA

Received: April 13, 2015 | Published: April 29, 2015

Citation: Rekab K, Song X. Second order optimality of sequential designs with application in software reliability estimation. Biom Biostat Int J. 2015;2(4):109-113. DOI: 10.15406/bbij.2015.02.00037


Abstract

We propose three efficient sequential designs for software reliability estimation: the fully sequential design, the multistage sequential design and the accelerated sequential design. These designs make allocation decisions dynamically throughout the testing process, refining the estimated reliabilities iteratively as we sample. Monte Carlo simulations seem to indicate that these sequential designs are second order optimal.

Keywords: software reliability, partition testing, fully sequential design, multistage sequential design, accelerated sequential design

Introduction

Reliability is an important aspect of any system design, since any user of the system expects some guarantee that the system will function to some level of confidence. Failing to meet such a guarantee may result in disastrous consequences. On the other hand, greatly exceeding the guaranteed level may incur additional and unnecessary expense for the developers. Moreover, for any non-trivial software system, exhaustive testing over the entire input domain can be very expensive. By adopting the partition testing strategy, we break up the input domain of possible test cases into partitions, which must be non-overlapping: if test case $i$ belongs to partition $j$, then no partition other than $j$ contains $i$. Sayre and Poore^1–11 have given several possible mechanisms to partition the domain $D$ into finitely many subdomains $D_{1}, D_{2}, \dots, D_{k}$, such that:

$D = \cup_{i = 1}^{k} D_{i}; \quad D_{i} \cap D_{j} = \emptyset, \; i \neq j$

which allows us to define the system reliability as a weighted sum of the reliabilities of these subdomains, i.e.

$R = \sum_{i = 1}^{k} p_{i} R_{i}$

where $R$ denotes the system reliability, $R_{i}$ is the reliability of subdomain $D_{i}$, and $p_{i}$, the parameters of the operational profile, are the probabilities that a test case belongs to partition $D_{i}$, which are assumed to be known.¹² As mentioned above, complete testing of any software system of non-trivial size is practically impossible, so the $R_{i}$ are usually unknown. To gain knowledge about the $R_{i}$, we must distribute $N$ test cases among the $k$ partitions and generate reasonable estimates for each. Specifically, we denote by $n_{1}, n_{2}, \dots, n_{k}$ the sizes of the samples taken from subdomains $D_{1}, D_{2}, \dots, D_{k}$, respectively, where $\sum_{i = 1}^{k} n_{i} = N$.

We model the outcome of the $j^{th}$ test taken from the $i^{th}$ partition as a Bernoulli random variable $X_{i j}$ such that:

$X_{i j} = \begin{cases} 1, & \text{if test } j \text{ taken from partition } i \text{ is processed correctly} \\ 0, & \text{otherwise} \end{cases}$

and each $X_{i j}$ follows a Bernoulli distribution with parameter $R_{i}$. The estimate of the overall system reliability $R$, denoted by $\hat{R}$, can thus be defined as:

$\hat{R} = \sum_{i = 1}^{k} p_{i} {\hat{R}}_{i}$

where ${\hat{R}}_{i}$ is the estimate of $R_{i}$ after $n_{i}$ test cases have been allocated to partition $i$, such that:

${\hat{R}}_{i} = \frac{\sum_{j = 1}^{n_{i}} X_{i j}}{n_{i}}$

and

$\mathrm{Var}(\hat{R}) = \sum_{i = 1}^{k} \frac{p_{i}^{2} R_{i} (1 - R_{i})}{n_{i}} \qquad (1.1)$
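The estimator and its variance can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and serve only to show how $\hat{R}$ and $\mathrm{Var}(\hat{R})$ are computed from the definitions above.

```python
def partition_estimates(outcomes):
    """Return R_hat_i = (number of tests processed correctly) / n_i per partition."""
    return [sum(x) / len(x) for x in outcomes]


def system_estimate(p, outcomes):
    """Return R_hat = sum_i p_i * R_hat_i."""
    return sum(p_i * r_i for p_i, r_i in zip(p, partition_estimates(outcomes)))


def variance_of_estimate(p, r, n):
    """Return Var(R_hat) = sum_i p_i^2 * R_i * (1 - R_i) / n_i for allocation n."""
    return sum(p_i ** 2 * r_i * (1 - r_i) / n_i for p_i, r_i, n_i in zip(p, r, n))


# Hypothetical example: two partitions with equal usage probabilities.
p = [0.5, 0.5]
outcomes = [[1, 1, 0, 1], [1, 1]]      # X_ij, where 1 means processed correctly
r_hat = system_estimate(p, outcomes)   # 0.5 * 0.75 + 0.5 * 1.0
var_r_hat = variance_of_estimate(p, r=[0.5, 0.9], n=[60, 40])
print(r_hat, var_r_hat)
```

The variance function takes the true reliabilities as input, so it is useful for comparing allocations in a simulation rather than for analysing real test data.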

Optimal sampling scheme

Ideally, we would like to execute all possible test paths through the software and determine the true overall reliability of the system. In practice, though, resources are limited, and sample test cases must be chosen and allocated strategically to attain the best reliability estimate possible under the given constraints. One criterion for distributing test cases among the partitions is to minimize the variance of $\hat{R}$, which proceeds from rewriting (1.1) as follows:

$\mathrm{Var}(\hat{R}) = \frac{{[\sum_{i = 1}^{k} p_{i} \sqrt{R_{i} (1 - R_{i})}]}^{2}}{N} + \frac{1}{N} \sum_{i = 1}^{k - 1} \sum_{j = i + 1}^{k} \frac{{[n_{i} p_{j} \sqrt{R_{j} (1 - R_{j})} - n_{j} p_{i} \sqrt{R_{i} (1 - R_{i})}]}^{2}}{n_{i} n_{j}}$

which is bounded below by the first term:

$\mathrm{Var}(\hat{R}) \geq \frac{{[\sum_{i = 1}^{k} p_{i} \sqrt{R_{i} (1 - R_{i})}]}^{2}}{N} \qquad (2.1)$

with equality in (2.1) holding if and only if:

$\frac{n_{i}}{n_{j}} = \frac{p_{i} \sqrt{R_{i} (1 - R_{i})}}{p_{j} \sqrt{R_{j} (1 - R_{j})}}$

for all $i \neq j$. Hence, the optimal allocation is determined by:

$\frac{n_{i}}{N} = \frac{p_{i} \sqrt{R_{i} (1 - R_{i})}}{\sum_{j = 1}^{k} p_{j} \sqrt{R_{j} (1 - R_{j})}}$

for $i = 1, 2, \dots, k - 1$ , and

$n_{k} = N - \sum_{i = 1}^{k - 1} n_{i}$

and the variance incurred by this optimal allocation is:

$\mathrm{Var}(O) = \frac{{[\sum_{j = 1}^{k} p_{j} \sqrt{R_{j} (1 - R_{j})}]}^{2}}{N}$

Note that the optimal allocation depends on the actual reliabilities $R_{1}, R_{2}, \dots, R_{k}$, which are unknown. Therefore, the optimal sampling scheme cannot be used in practice. This shortcoming motivates the dynamic allocation approaches discussed in the following three sections.
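As a numerical illustration of the optimal scheme, the following Python sketch computes the allocation and its variance when the true reliabilities are supplied, which is only possible in a simulation. The function names and example values are hypothetical, the sample sizes are left real-valued rather than rounded, and at least one $R_{i}$ is assumed to lie strictly between 0 and 1.

```python
import math


def optimal_allocation(p, r, N):
    """Return the real-valued optimal sample sizes n_i and the variance Var(O)."""
    a = [p_i * math.sqrt(r_i * (1 - r_i)) for p_i, r_i in zip(p, r)]
    total = sum(a)  # assumed positive, i.e. some R_i is strictly between 0 and 1
    n = [N * a_i / total for a_i in a]
    return n, total ** 2 / N


def variance_of_estimate(p, r, n):
    """Return Var(R_hat) = sum_i p_i^2 * R_i * (1 - R_i) / n_i for allocation n."""
    return sum(p_i ** 2 * r_i * (1 - r_i) / n_i for p_i, r_i, n_i in zip(p, r, n))


# Hypothetical example: equal usage probabilities, reliabilities 0.5 and 0.9.
p, r, N = [0.5, 0.5], [0.5, 0.9], 100
n_opt, var_opt = optimal_allocation(p, r, N)
var_balanced = variance_of_estimate(p, r, [N / 2, N / 2])
print(n_opt)                   # approximately [62.5, 37.5]
print(var_opt, var_balanced)   # approximately 0.0016 and 0.0017
```

In this example the balanced allocation has a slightly larger variance than the optimal one, in line with the lower bound.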

Fully sequential sampling scheme

By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wu⁵ proposed a fully sequential design shown to be first order optimal. The fully sequential design is defined as follows:

Stage 1:

We first test one case from each of the $k$ partitions and estimate the reliability of each partition.

Stage 2:

After $l$ cases have been allocated, $l \geq k$, the next test is taken from partition $i$ if, for all $j \neq i$,

$\frac{n_{l, i}}{n_{l, j}} < \frac{p_{i} \sqrt{{\hat{R}}_{l, i} (1 - {\hat{R}}_{l, i})}}{p_{j} \sqrt{{\hat{R}}_{l, j} (1 - {\hat{R}}_{l, j})}}$

where $n_{l, i}$ is the cumulative number of test cases allocated to partition $i$ after $l$ tests have been allocated, and the current estimated reliability for partition $i$ is determined by:

${\hat{R}}_{l, i} = \frac{\sum_{m = 1}^{n_{l, i}} X_{i m}}{n_{l, i}}$

Stage 3:

Repeat Stage 2 sequentially until all $N$ test cases are allocated. The final estimate of reliability for partition $i$ is:

${\hat{R}}_{i} = \frac{\sum_{m = 1}^{n_{N, i}} X_{i m}}{n_{N, i}}$

And thus, the estimate of the overall reliability of the system is:

$\hat{R} = \sum_{i = 1}^{k} p_{i} {\hat{R}}_{i}$
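The fully sequential rule can be sketched as a small simulation in Python. The allocation condition is equivalent to choosing the partition with the largest value of $p_{i} \sqrt{{\hat{R}}_{l, i} (1 - {\hat{R}}_{l, i})} / n_{l, i}$. Two choices below are assumptions of this sketch rather than part of the design as stated: test outcomes are simulated from hypothetical true reliabilities, and the allocation step uses a smoothed estimate (passed + 1) / (n + 2), the posterior mean under a uniform Beta(1, 1) prior, so that the square root is not zero when a partition has only passes or only failures.

```python
import math
import random


def fully_sequential(p, r_true, N, seed=0):
    """Simulate the fully sequential design; return (n, r_hat, R_hat)."""
    rng = random.Random(seed)
    k = len(p)
    n = [0] * k        # tests allocated to each partition so far
    passed = [0] * k   # tests processed correctly in each partition

    def run_test(i):
        passed[i] += int(rng.random() < r_true[i])  # simulated Bernoulli(R_i) outcome
        n[i] += 1

    def score(i):
        # ASSUMPTION: smoothed estimate, used only to choose the next partition.
        r_s = (passed[i] + 1) / (n[i] + 2)
        return p[i] * math.sqrt(r_s * (1 - r_s)) / n[i]

    for i in range(k):           # Stage 1: one test from each partition
        run_test(i)
    for _ in range(N - k):       # Stages 2 and 3: one test at a time
        run_test(max(range(k), key=score))

    r_hat = [s / m for s, m in zip(passed, n)]   # final unsmoothed estimates
    return n, r_hat, sum(p_i * r_i for p_i, r_i in zip(p, r_hat))


# Hypothetical example with two partitions.
n, r_hat, R_hat = fully_sequential([0.5, 0.5], [0.5, 0.9], N=500)
print(n, R_hat)
```

Ties between partitions are broken in favour of the lowest index, which is another arbitrary choice of the sketch.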

Multistage sequential sampling

By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson & Wu⁶ proposed a multistage sequential design shown to be first order optimal. Instead of making an allocation decision for one test at a time, the multistage sequential sampling allocates test cases among the partitions in $L$ stages, in batches of sizes $S_{1}, S_{2}, \dots, S_{L}$, where $L$ and the batch sizes are both pre-specified. The multistage sequential design is defined as follows:

Stage 1:

We start with an initial sample of $S_{1}$ test cases, which are allocated among the $k$ partitions with a balanced allocation scheme, such that:

$S_{1, i} = \frac{S_{1}}{k}$

and estimate the reliability for partition $i$ by:

${\hat{R}}_{1, i} = \frac{\sum_{m = 1}^{S_{1, i}} X_{i m}}{S_{1, i}}$

The estimated optimal proportion for partition $i$ is therefore:

${\hat{C}}_{i} (S_{1}) = \frac{p_{i} \sqrt{{\hat{R}}_{1, i} (1 - {\hat{R}}_{1, i})}}{\sum_{j = 1}^{k} p_{j} \sqrt{{\hat{R}}_{1, j} (1 - {\hat{R}}_{1, j})}}$

Stage 2 through L:

At stage $j$, $2 \leq j \leq L$, for partition $i = 1, 2, \dots, k - 1$, the test cases are distributed in the following way:

$S_{j, i} = (\sum_{l = 1}^{j} S_{l}) {\hat{C}}_{i} ({\bar{S}}_{j - 1});$

and

$S_{j, k} = \sum_{l = 1}^{j} S_{l} - \sum_{i = 1}^{k - 1} S_{j, i}$

where

${\bar{S}}_{j - 1} = \sum_{y = 1}^{j - 1} S_{y}$

At the final stage $L$, the cumulative number of tests allocated to partition $i$ is:

$N_{i} = \min \{ N - \sum_{j = 1, j \neq i}^{k - 1} S_{L - 1, j}, \; \max (N {\hat{C}}_{i} ({\bar{S}}_{L - 1}), S_{L - 1, i}) \}$

and

$N_{k} = N - \sum_{i = 1}^{k - 1} N_{i}$

Among several ways of determining the number of cases at each sampling stage, the simplest one is to select:

$S_{1} = S_{2} = \dots = S_{L} = N / L$

However, stage sizes, especially the initial stage size, can be chosen by following some general criteria. A good initial stage size is $\sqrt{N}$ when a two-stage sampling scheme is considered. More generally, Rekab⁹ shows that for a two-stage procedure, $S_{1}$ must be chosen such that:

$\lim_{N \to \infty} S_{1} = \infty, \text{ and } \lim_{N \to \infty} \frac{S_{1}}{N} = 0 \qquad (4.1)$
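The multistage procedure can be sketched as a Python simulation. Several details are assumptions of this sketch: outcomes are simulated from hypothetical true reliabilities, the proportions ${\hat{C}}_{i}$ use a smoothed estimate (passed + 1) / (n + 2) to avoid a zero denominator, and within each stage the batch is handed out one test at a time to the partition furthest below its cumulative target, which keeps the counts integral and never removes tests from a partition that is already above its target. The first stage size is assumed to be at least $k$.

```python
import math
import random


def multistage(p, r_true, stage_sizes, seed=0):
    """Simulate a multistage design with batch sizes S_1..S_L; return (n, R_hat)."""
    rng = random.Random(seed)
    k = len(p)
    n = [0] * k        # cumulative tests per partition
    passed = [0] * k   # tests processed correctly per partition

    def run_test(i):
        passed[i] += int(rng.random() < r_true[i])  # simulated Bernoulli(R_i) outcome
        n[i] += 1

    def estimated_shares():
        # ASSUMPTION: smoothed reliability estimates keep every weight positive.
        w = []
        for i in range(k):
            r_s = (passed[i] + 1) / (n[i] + 2)
            w.append(p[i] * math.sqrt(r_s * (1 - r_s)))
        total = sum(w)
        return [w_i / total for w_i in w]

    cumulative = 0
    for stage, s_j in enumerate(stage_sizes):
        cumulative += s_j
        # Stage 1 is balanced; later stages use the shares estimated from earlier data.
        shares = [1 / k] * k if stage == 0 else estimated_shares()
        targets = [cumulative * c for c in shares]   # cumulative targets S_{j,i}
        for _ in range(s_j):
            # ASSUMPTION: give each test to the partition with the largest shortfall.
            run_test(max(range(k), key=lambda i: targets[i] - n[i]))

    r_hat = [s / m for s, m in zip(passed, n)]
    return n, sum(p_i * r_i for p_i, r_i in zip(p, r_hat))


# Hypothetical two-stage example with N = 400 and S_1 = sqrt(N) = 20.
n, R_hat = multistage([0.5, 0.5], [0.5, 0.9], stage_sizes=[20, 380])
print(n, R_hat)
```

Because the shares are frozen at the start of each stage, outcomes observed during a stage only influence the allocation of later stages, which is the feature that separates this design from the fully sequential one.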

Accelerated sampling scheme

By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wu⁶ proposed an accelerated sequential design shown to be first order optimal. The accelerated sampling scheme combines the fully sequential sampling scheme and the multistage sampling scheme. It is defined as follows:

Stage 1:

The procedure starts with an initial sample $S_{1}$, which satisfies the conditions specified in (4.1). Then, allocate $S_{1}$ equally among the $k$ partitions.

Stage 2 through $L - 1$:

At stage $j$, $2 \leq j \leq L - 1$, for partition $i = 1, 2, \dots, k - 1$, the test cases are distributed in the following way:

$S_{j, i} = (\sum_{l = 1}^{j} S_{l}) {\hat{C}}_{i} ({\bar{S}}_{j - 1});$

and

$S_{j, k} = \sum_{l = 1}^{j} S_{l} - \sum_{i = 1}^{k - 1} S_{j, i}$

where

${\bar{S}}_{j - 1} = \sum_{y = 1}^{j - 1} S_{y}$

At each stage, $S_{j}$ must satisfy the same two conditions as $S_{1}$.

Stage L:

In the final stage, we adopt a fully sequential approach by taking the next test from partition $i$ if, for all $j \neq i$,

$\frac{n_{l, i}}{n_{l, j}} < \frac{p_{i} \sqrt{{\hat{R}}_{l, i} (1 - {\hat{R}}_{l, i})}}{p_{j} \sqrt{{\hat{R}}_{l, j} (1 - {\hat{R}}_{l, j})}}$

until all the test cases have been allocated. Note that $n_{l, i}$ and $n_{l, j}$ are the cumulative numbers of test cases allocated to partitions $i$ and $j$ after a total of $l$ test cases have been allocated, where

$\sum_{j = 1}^{L - 1} S_{j, i} \leq l \leq \sum_{j = 1}^{L - 1} S_{j, i} + N - \sum_{j = 1}^{L - 1} S_{j}$

Therefore, the estimate for the whole system is finally obtained as:

$\hat{R} = \sum_{i = 1}^{k} p_{i} {\hat{R}}_{i}$
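The accelerated scheme can be sketched in Python as batch stages followed by one-at-a-time allocation. As assumptions of this sketch, outcomes are simulated from hypothetical true reliabilities, a smoothed estimate (passed + 1) / (n + 2) is used whenever an allocation decision is made, batches are filled by repeatedly choosing the partition furthest below its cumulative target, and the argument `batch_sizes` (a name introduced here) holds $S_{1}, \dots, S_{L - 1}$ with $S_{1} \geq k$.

```python
import math
import random


def accelerated(p, r_true, N, batch_sizes, seed=0):
    """Simulate batch stages 1..L-1, then a fully sequential final stage."""
    if sum(batch_sizes) > N or batch_sizes[0] < len(p):
        raise ValueError("need S_1 >= k and S_1 + ... + S_{L-1} <= N")
    rng = random.Random(seed)
    k = len(p)
    n = [0] * k        # cumulative tests per partition
    passed = [0] * k   # tests processed correctly per partition

    def run_test(i):
        passed[i] += int(rng.random() < r_true[i])  # simulated Bernoulli(R_i) outcome
        n[i] += 1

    def weights():
        # ASSUMPTION: smoothed estimates keep p_i * sqrt(R(1 - R)) strictly positive.
        out = []
        for i in range(k):
            r_s = (passed[i] + 1) / (n[i] + 2)
            out.append(p[i] * math.sqrt(r_s * (1 - r_s)))
        return out

    cumulative = 0
    for stage, s_j in enumerate(batch_sizes):       # stages 1 .. L-1
        cumulative += s_j
        if stage == 0:
            shares = [1 / k] * k                    # balanced first stage
        else:
            w = weights()
            shares = [w_i / sum(w) for w_i in w]    # estimated proportions C_hat_i
        targets = [cumulative * c for c in shares]
        for _ in range(s_j):
            run_test(max(range(k), key=lambda i: targets[i] - n[i]))

    while sum(n) < N:                               # stage L: fully sequential
        w = weights()
        run_test(max(range(k), key=lambda i: w[i] / n[i]))

    r_hat = [s / m for s, m in zip(passed, n)]
    return n, sum(p_i * r_i for p_i, r_i in zip(p, r_hat))


# Hypothetical example: N = 400, batches of 20 and 180, then 200 single tests.
n, R_hat = accelerated([0.5, 0.5], [0.5, 0.9], N=400, batch_sizes=[20, 180])
print(n, R_hat)
```

The split between batch tests and single tests in the example is arbitrary and is not taken from the simulations reported here.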

Optimality of sequential designs

First order optimality of these three sequential designs was established by Rekab, Thompson & Wu,^5–7 although the focus here is on minimizing the variance incurred by the sequential designs rather than the Bayes risk incurred by these designs. For estimating the difference of the means of two independent normal populations, Woodroofe and Hardwick¹² adopted a quasi-Bayesian approach. They determined an asymptotic lower bound for the integrated risk and proposed a three-stage design that is second order optimal. For estimating the difference of the means of two general one-parameter exponential families, Rekab and Tahir¹⁰ adopted a fully Bayesian approach with conjugate priors. They determined an asymptotic second order lower bound for the Bayes risk.

Monte Carlo simulations

We consider the case where the test domain is partitioned into two subdomains $D_{1}$ and $D_{2}$ with reliabilities $R_{1}$ and $R_{2}$, respectively, and with equal usage probabilities $p_{1} = p_{2}$. Second order optimality of the three sequential designs is investigated through Monte Carlo simulations.

(R₁,R₂)	N=300	N=500	N=800	N=2000	N=5000	N=8000
0.1,0.9	28.1562	3.2104	0.739	2.4225	9.304	0.7133
0.5,0.2	29.038	6.102	0.63	7.9375	4.67	4.05
0.5,0.5	1.5972	0.146	0.212	0.9125	0.512	0.4933
0.5,0.9	80.7747	71.9032	19.951	7.96	5.796	1.2666
0.9,0.3	64.0246	53.8008	5.5497	3.2565	15.2546	2.6517

Table 1 $N^{2} (\mathrm{Var}(\Delta) - \mathrm{Var}(O))$ by Fully Sequential Scheme

(R₁,R₂)	N=300	N=500	N=800	N=2000	N=5000	N=8000
0.1,0.9	10.9835	0.0288	4.0718	55.7962	2.332	1.8422
0.5,0.2	11.8285	19.172	2.0555	5.51	0.8432	7.1258
0.5,0.5	0.02749	0.0459	0.175	0.4568	0.47324	0.1064
0.5,0.9	37.6665	31.6257	17.6417	3.6377	6.8483	9.2952
0.9,0.3	20.2034	2.86352	7.6161	3.1589	11.7869	5.2018

Table 2 $N^{2} (\mathrm{Var}(\Delta) - \mathrm{Var}(O))$ by Multistage Scheme

(R₁,R₂)	N=300	N=500	N=800	N=2000	N=5000	N=8000
0.1,0.9	30.5713	6.587	6.4568	17.4023	2.1138	1.0098
0.5,0.2	8.1424	6.1346	7.1683	0.8968	0.1462	2.1009
0.5,0.5	0.03801	0.23284	0.0319	0.7025	0.064	0.0392
0.5,0.9	37.6665	31.6257	17.6417	3.6377	6.8483	9.2952
0.9,0.3	48.4148	13.817	7.8473	8.4656	5.8734	1.6708

Table 3 $N^{2} (\mathrm{Var}(\Delta) - \mathrm{Var}(O))$ by Accelerated Scheme

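Tables 1, 2 and 3 seem to indicate that the quantity $N^{2} (\mathrm{Var}(\Delta) - \mathrm{Var}(O))$ remains bounded as $N$ grows, where $\mathrm{Var}(\Delta)$ is the variance incurred by the sequential design $\Delta$. A bounded value means that the excess variance over the optimal allocation is of order $1 / N^{2}$, which is what second order optimality requires.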
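A Monte Carlo estimate of this quantity can be sketched in Python for the fully sequential design. How $\mathrm{Var}(\Delta)$ was computed for the tables is not stated, so this sketch makes its own assumption: it evaluates (1.1) at the true reliabilities and the allocation realized in each run, then averages over runs. The smoothed estimate in the allocation step, the function names, and the number of replications are also assumptions, so the output is not expected to reproduce the tabulated values.

```python
import math
import random


def allocate_fully_sequential(p, r_true, N, rng):
    """Return the allocation (n_1, ..., n_k) produced by one simulated run."""
    k = len(p)
    n, passed = [0] * k, [0] * k

    def run_test(i):
        passed[i] += int(rng.random() < r_true[i])  # simulated Bernoulli(R_i) outcome
        n[i] += 1

    def score(i):
        r_s = (passed[i] + 1) / (n[i] + 2)          # ASSUMPTION: smoothed estimate
        return p[i] * math.sqrt(r_s * (1 - r_s)) / n[i]

    for i in range(k):
        run_test(i)
    for _ in range(N - k):
        run_test(max(range(k), key=score))
    return n


def scaled_excess_variance(p, r_true, N, replications=200, seed=0):
    """Monte Carlo estimate of N^2 * (Var(Delta) - Var(O))."""
    rng = random.Random(seed)
    a = [p_i * math.sqrt(r_i * (1 - r_i)) for p_i, r_i in zip(p, r_true)]
    var_opt = sum(a) ** 2 / N                       # lower bound (2.1)
    total = 0.0
    for _ in range(replications):
        n = allocate_fully_sequential(p, r_true, N, rng)
        # ASSUMPTION: variance (1.1) at the true R_i and the realized allocation.
        total += sum(a_i ** 2 / n_i for a_i, n_i in zip(a, n))
    return N ** 2 * (total / replications - var_opt)


for N in (100, 200, 400):
    print(N, scaled_excess_variance([0.5, 0.5], [0.5, 0.9], N))
```

By the lower bound (2.1), each run contributes a non-negative excess, so the estimate cannot be negative apart from floating-point rounding.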

Conclusion

Second order optimal designs are more efficient than first order optimal designs, especially when the total number of test cases is very large. This is the main argument that led us to investigate the second order optimality of these three dynamic designs. Simulation studies seem to indicate that these designs are second order optimal. We conjecture that second order optimality may be obtained theoretically as well.

It is very common in parametric estimation to use the squared error loss. However, in reliability estimation one should distinguish between the cost of overestimating and underestimating the system reliability. Examples of practical loss functions were presented by Stüger:²

$l (\hat{R}, R) = c_{o} {(\hat{R} - R)}^{2} I_{\{\hat{R} > R\}} + c_{u} {(R - \hat{R})}^{2} I_{\{\hat{R} < R\}}$

and by Granger:¹

$l (\hat{R}, R) = c_{o} (\hat{R} - R) I_{\{\hat{R} > R\}} + c_{u} (R - \hat{R}) I_{\{\hat{R} < R\}}$

where $c_{o}$ and $c_{u}$ represent the overestimation and underestimation costs, respectively.
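The two loss functions can be written directly in Python. The function names and the cost values in the example are hypothetical and chosen only to show the asymmetry.

```python
def asymmetric_squared_loss(r_hat, r, c_over, c_under):
    """Squared-error loss with separate costs for over- and underestimation."""
    if r_hat > r:
        return c_over * (r_hat - r) ** 2
    return c_under * (r - r_hat) ** 2    # equals zero when r_hat == r


def asymmetric_linear_loss(r_hat, r, c_over, c_under):
    """Absolute-error loss with separate costs for over- and underestimation."""
    if r_hat > r:
        return c_over * (r_hat - r)
    return c_under * (r - r_hat)         # equals zero when r_hat == r


# Hypothetical costs: overestimating reliability is four times as costly.
c_over, c_under = 4.0, 1.0
over_sq = asymmetric_squared_loss(0.95, 0.90, c_over, c_under)
under_sq = asymmetric_squared_loss(0.85, 0.90, c_over, c_under)
over_lin = asymmetric_linear_loss(0.95, 0.90, c_over, c_under)
under_lin = asymmetric_linear_loss(0.85, 0.90, c_over, c_under)
print(over_sq, under_sq, over_lin, under_lin)
```

With these costs, an overestimate of 0.05 is penalized four times as heavily as an underestimate of the same size under either loss.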

Acknowledgments

None.

Conflicts of interest

The authors declare that there are no conflicts of interest.

References

Sankaran M. The discrete Poisson-Lindley distribution. Biometrics. 1970;26(1):145–149.
Ghitany ME, Al-Mutairi DK. Estimation Methods for the discrete Poisson-Lindley distribution. Journal of Statistical Computation and Simulation. 2009;79(1):1–9.
Shanker R, Mishra A. A two-parameter Poisson-Lindley distribution. International Journal of Statistics and Systems. 2014;9(1):79–85.
Shanker R, Mishra A. A two-parameter Lindley distribution. Statistics in Transition new Series. 2013;14(1):45–56.
Shanker R, Mishra A. A quasi Poisson-Lindley distribution (submitted). 2015.
Shanker R, Mishra A. A quasi Lindley distribution. African journal of Mathematics and Computer Science Research. 2013;6(4):64–71.
Shanker R, Sharma S, Shanker R. A Discrete two-Parameter Poisson Lindley Distribution. Journal of Ethiopian Statistical Association. 2012;21:15–22.
Shanker R, Sharma S, Shanker R. A two-parameter Lindley distribution for modeling waiting and survival times data. Applied Mathematics. 2013;4:363–368.
Shanker R, Tekie AL. A new quasi Poisson-Lindley distribution. International Journal of Statistics and Systems. 2014;9(1):87–94.
Shanker R, Amanuel AG. A new quasi Lindley distribution. International Journal of Statistics and Systems. 2013;8(2):143–156.
Johnson NL, Kotz S, Kemp AW. Univariate Discrete Distributions, 2nd ed. John Wiley & sons Inc; 1992.
Fisher RA, Corpet AS, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology. 1943;12(1):42–58.
Kempton RA. A generalized form of Fisher’s logarithmic series. Biometrika. 1975;62(1):29–38.
Tripathi RC, Gupta RC. A generalization of the log-series distribution. Comm. in Stat. (Theory and Methods). 1985;14(8):1779–1799.
Mishra A, Shanker R. Generalized logarithmic series distribution-Its nature and applications. Proceedings of the Vth International Symposium on Optimization and Statistics. 2002:155–168.
Loeschke V, Kohler W. Deterministic and Stochastic models of the negative binomial distribution and the analysis of chromosomal aberrations in human leukocytes. Biometrische Zeitschrift. 1976;18:427–451.
Janardan KG, Schaeffer DJ. Models for the analysis of chromosomal aberrations in human leukocytes. Biometrical Journal. 1977;(8):599–612.
Mc Guire JU, Brindley TA, Bancroft TA. The distribution of European corn-borer larvae pyrausta in field corn. Biometrics. 1957;13:65–78.
Catcheside DG, Lea DE, Thoday JM. Types of chromosome structural change induced by the irradiation on Tradescantia microspores. Journal of Genetics. 1946;47:113–136.
Catcheside DG, Lea DE, Thoday JM. The production of chromosome structural changes in Tradescantia microspores in relation to dosage, intensity and temperature. Journal of Genetics. 1946;47:137–149.
Lindley DV. Fiducial distributions and Bayes theorem. Journal of Royal Statistical Society Ser. B. 1958;20:102–107.