
 
 
Research Article Volume 2 Issue 4
     
 
	Second order optimality of sequential designs with application in software reliability estimation
 Kamel Rekab,  
    
 
   
    
    
  
    
    
   
      
      
        
        Regret for the inconvenience: we are taking measures to prevent fraudulent form submissions by extractors and page crawlers. Please type the correct Captcha word to see email ID.
        
        
 
 
 
          
     
    
    
    
    
    
        
        
       
     
   
 
    
   Xing Song  
  
Department of Mathematics and Statistics, University of Missouri-Kansas City, USA
Correspondence: Kamel Rekab, Department of Mathematics and Statistics, University of Missouri-Kansas City, PO Box 32464 Kansas City, MO 64171, USA
Received: April 13, 2015 | Published: April 29, 2015
Citation: Rekab K, Song X. Second order optimality of sequential designs with application in software reliability estimation. Biom Biostat Int J. 2015;2(4):109-113. DOI: 10.15406/bbij.2015.02.00037
 Download PDF
        
       
 
 
 
 
 
Abstract
  We propose three efficient sequential designs in the software  reliability estimation. The fully sequential design the multistage sequential  design and the accelerated sequential design. These designs make allocation  decisions dynamically throughout the testing process. We then refine these  estimated reliabilities in an iterative manner as we sample. Monte Carlo  simulation seems to indicate that these sequential designs are second order  optimal.
Keywords: software  reliability, partition testing, fully sequential design, multistage sequential  design, accelerated sequential design
 
Introduction
  Reliability of a system is an important aspect of any system  design since any user of the system would expect some type of guarantee that  the system will function to some level of confidence. Failing to meet such  guarantee will result in disastrous consequences. On the other hand, overly  exceeding such guarantee level may incur additional and unnecessary expense to  the developers. Moreover, for any non-trivial software system, an exhaustive  testing among the entire input domain can be very expensive. By adopting the  partition testing strategy, we attempt to break up the testable input domain of  possible test cases into partitions, which must be non-overlapping, such that  if test case 
 belongs to partition 
 , then no partition other than 
will contain 
. Sayre and Poore1–11 have given several possible  mechanics to partition the domain into finitely many subdomains, 
, such that :
which allows us to define the system reliability by a weighted sum  of reliabilities of these 
 subdomains, i.e.
Where 
 denotes the system reliability 
and is the reliability of each subdomain 
 ; and  
, parameters of the operational profile is the  likelihood of this test case belongs to partition 
 , which are assumed to be known.12  As mentioned above, a complete testing of any  software system of non-trivial size is practically impossible, 
 are usually  unknown parameters to us. So as to gain knowledge about 
 , we must distribute the 
 test cases among these 
partitions, and generate reasonable estimates for each
. Specifically, we denote 
 as sizes of the samples which are taken from sub domain 
  , respectively, where 
  
  We model the outcome of the  
taken from the 
 partition as a Bernoulli random variable 
 such that:
 
  
    and each 
 follows a Bernoulli distribution with parameter 
. Then, the estimate of the overall system reliability,  
denoted by 
 can thus be defined as:
  
    where 
 is the estimate of 
 after 
 test cases have been allocated to partition  such that:
    
    and 
 
Optimal sampling scheme
  Ideally, we would like to execute all possible test paths through  the software and determine the true overall reliability of the system. In  practice though, resources are often limited, sample test cases must be chosen  and allocated strategically to attain the best reliability estimate possible  given all kinds of constraints. One of the criteria of distributing test cases  among the 
 partitions, which proceeds from rewriting  (1.1) as follows:
  
  which is bounded below by the first term:
  
  with equality of (2.1) to hold is and only if:
  
  for all 
 .  Hence, the optimal allocation is determined by:
  
  for 
 , and 
  
  and the variance incurred by this optimal allocation is:
  
  Note that the optimal allocation depends on the actual  reliabilities 
 which are unknown. Therefore the optimal  sampling scheme is not practical.  It is  this shortcoming that motivates us to adopt such dynamic allocation approaches  that will be discussed in the following three sections.
 
Fully sequential sampling scheme
  By adopting a fully Bayesian approach with Beta priors, Rekab,  Thompson and Wei5 proposed a fully sequential design shown to be first order  optimal.  The fully sequential design is  defined as follows;
  
    We first test one case from each of the  partitions and estimate the reliability for  each of these partitions.
Stage 1:
  After 
 cases have been allocated, 
 the next test will be taken from  partition 
 if for all ,  
  
  where 
 is the cumulative test cases  allocated to partition 
 after 
 tests have been allocated and the  current  estimated reliability for  partition 
 is determined by:
  
Stage 2: 
  Repeat step 2 sequentially until all the  test 
cases are allocated, and the final  estimate of reliability for partition
  is:
  
And thus, the estimate of the overall reliability of the system  is:
 
Multistage sequential sampling
  By adopting a fully Bayesian approach with Beta priors, Rekab,  Thompson & Wei6 proposed a multistage sequential design shown to be first  order optimal.  Instead of making an allocation decision  for each test at a time, the multistage sequential sampling allocates 
 test cases among the partitions in 
 stages in batches, where 
 and 
  are both pre-specified. The multistage  sequential design is defined as follows:
Stage 1:
  We start with an initial sample of 
 test cases, which are allocated among the 
 partitions with a balanced allocation scheme,  such that:
  
  and estimate the reliability for partition 
  by: 
  
  Therefore,
  
Stage 2 through L:
At stage
 for partition the test cases are distributed by the  following way:
for  partition  
the test cases are distributed by the following way:
and
where
At the final stage 
 ,  the cumulative number of tests allocated to partition 
 is:
and 
Among several ways of determining the number of cases at each sampling  stage, the simplest one is to select:
However, choosing stage sizes, especially the  initial stage size, can be done by following  some general criteria,  a good initial  stage size can be 
 for when a   two stage sampling scheme is considered, and more generally, Rekab9  shows that for  a two stage procedure, 
 must be chosen such that:
 
Accelerated sampling scheme
  By adopting a fully Bayesian approach with Beta priors, Rekab,  Thompson and Wei6 proposed an accelerated sequential design shown to be  first order optimal. The accelerated  sampling scheme combines the fully sequential sampling scheme and the  multistage sampling scheme. It is defined as follows:
Stage 1:
  The procedure starts with an initial sample 
, which satisfies the conditions  specified in (4.1). Then, allocate 
 equally among 
  partitions   
Stage 2 through 
At stage 
  
  for partition 
 the test cases are distributed by the  following way:
and
where
At each stage, 
 must satisfy the two conditions as 
 .
Stage L:
In the final stage, we adopt a fully sequential approach by  allocating one test from partition
 ,  if for all 
,
until all the 
test  cases have been allocated. Note that 
 , 
 are the cumulative test cases  allocated to partition 
  and 
 after the allocations of a total of 
 test cases, where
Therefore, the estimate for the whole system is finally obtained  as:
 
Optimality of sequential designs
  First  order optimality of these three sequential designs was established by Rekab,  Thompson & Wu,5–7 although the focus here is on minimizing the variance  incurred by the sequential designs rather than minimizing the Bayes risk  incurred by these designs. For estimating the mean difference of two  independent normal populations, Woodroofe and Hardwick12 adopted a  quasi-Bayesian approach. They determined an asymptotic lower bound for the  integrated risk and proposed a three-stage design that is second-order  optimal.  For estimating the mean  difference of two general one -parameter exponential family, Rekab and Tahir10 adopted a fully Bayesian approach with conjugate priors. They determined  an asymptotic second order lower bound for the Bayes risk.
 
Monte carlo simulations
  We consider the case where the test domain is partitioned into two  subdomains 
 and 
 with reliability 
 and 
 respectively with equal usage  probabilities 
Second order optimality of the three  sequential designs is investigated through Monte Carlo simulations.
    	
| 
 (R1,R2) 
 | 
 N=300 
 | 
 N=500 
 | 
 N=800 
 | 
 N=2000 
 | 
 N=5000 
 | 
 N=8000 
 | 
| 
 0.1,0.9 
 | 
 28.1562 
 | 
 3.2104 
 | 
 0.739 
 | 
 2.4225 
 | 
 9.304 
 | 
 0.7133 
 | 
| 
 0.5,0.2 
 | 
 29.038 
 | 
 6.102 
 | 
 0.63 
 | 
 7.9375 
 | 
 4.67 
 | 
 4.05 
 | 
| 
 0.5,0.5 
 | 
 1.5972 
 | 
 0.146 
 | 
 0.212 
 | 
 0.9125 
 | 
 0.512 
 | 
 0.4933 
 | 
| 
 0.5,0.9 
 | 
 80.7747 
 | 
 71.9032 
 | 
 19.951 
 | 
 7.96 
 | 
 5.796 
 | 
 1.2666 
 | 
| 
 0.9,0.3 
 | 
 64.0246 
 | 
 53.8008 
 | 
 5.5497 
 | 
 3.2565 
 | 
 15.2546 
 | 
 2.6517 
 | 
 
  Table 1   
    by Fully Sequential Scheme 
 
 
 
  
    	
| 
 (R1,R2) 
 | 
 N=300 
 | 
 N=500 
 | 
 N=800 
 | 
 N=2000 
 | 
 N=5000 
 | 
 N=8000 
 | 
| 
 0.1,0.9 
 | 
 10.9835 
 | 
 0.0288 
 | 
 4.0718 
 | 
 55.7962 
 | 
 2.332 
 | 
 1.8422 
 | 
| 
 0.5,0.2 
 | 
 11.8285 
 | 
 19.172 
 | 
 2.0555 
 | 
 5.51 
 | 
 0.8432 
 | 
 7.1258 
 | 
| 
 0.5,0.5 
 | 
 0.02749 
 | 
 0.0459 
 | 
 0.175 
 | 
 0.4568 
 | 
 0.47324 
 | 
 0.1064 
 | 
| 
 0.5,0.9 
 | 
 37.6665 
 | 
 31.6257 
 | 
 17.6417 
 | 
 3.6377 
 | 
 6.8483 
 | 
 9.2952 
 | 
| 
 0.9,0.3 
 | 
 20.2034 
 | 
 2.86352 
 | 
 7.6161 
 | 
 3.1589 
 | 
 11.7869 
 | 
 5.2018 
 | 
 
  Table 2   
    by Multistage Scheme 
 
 
 
  
    	| 
 (R1,R2) 
 | 
 N=300 
 | 
 N=500 
 | 
 N=800 
 | 
 N=2000 
 | 
 N=5000 
 | 
 N=8000 
 | 
| 
 0.1,0.9 
 | 
 30.5713 
 | 
 6.587 
 | 
 6.4568 
 | 
 17.4023 
 | 
 2.1138 
 | 
 1.0098 
 | 
| 
 0.5,0.2 
 | 
 8.1424 
 | 
 6.1346 
 | 
 7.1683 
 | 
 0.8968 
 | 
 0.1462 
 | 
 2.1009 
 | 
| 
 0.5,0.5 
 | 
 0.03801 
 | 
 0.23284 
 | 
 0.0319 
 | 
 0.7025 
 | 
 0.064 
 | 
 0.0392 
 | 
| 
 0.5,0.9 
 | 
 37.6665 
 | 
 31.6257 
 | 
 17.6417 
 | 
 3.6377 
 | 
 6.8483 
 | 
 9.2952 
 | 
| 
 0.9,0.3 
 | 
 48.4148 
 | 
 13.817 
 | 
 7.8473 
 | 
 8.4656 
 | 
 5.8734 
 | 
 1.6708 
 | 
 
  Table 3  
    by Accelerated Scheme 
  
 
 
  Table I, II, III seem to indicate  that the speed 
 is bounded.
 
 
Conclusion
  Second optimal designs are more efficient than the first order  optimal designs especially when the total number of cases is very large. This  is the main argument that led us to investigate the second order optimality of  these three dynamic designs. Simulation studies seem to indicate that these  designs are second order optimal.  We  conjecture that second order optimally may be obtained theoretically as well.
  
  It is very common in parametric estimation to use the squared  error loss. However, in reliability estimation one should distinguish between  the cost of overestimating and underestimating the system reliability.    Examples of practical loss functions were  presented by Stüger:2
  
  and by Granger:1
  
  where 
represents the overestimation and  underestimation costs, respectively.  
 
Acknowledgments
 Conflicts of interest
  Authors declare that there are no conflicts of interests.
 
References
  
    - Sankaran M. The discrete Poisson-Lindley distribution. Biometrics. 1970;26(1):145–149.
 
- Ghitany ME, Al-Mutairi DK. Estimation Methods for  the discrete Poisson-Lindley distribution. Journal  of Statistical Computation and Simulation. 2009;79(1):1–9.
 
- Shanker R, Mishra A. A two-parameter  Poisson-Lindley distribution. International  Journal of Statistics and Systems. 2014;9(1):79–85.
 
- Shanker  R, Mishra A. A two-parameter Lindley distribution. Statistics in Transition new Series. 2013;14(1):45–56.
 
- Shanker  R, Mishra A. A quasi Poisson-Lindley distribution (submitted). 2015.
 
- Shanker R, Mishra A. A quasi Lindley  distribution. African journal of  Mathematics and Computer Science Research. 2013;6(4):64–71.
 
- Shanker R, Sharma S, Shanker R. A Discrete  two-Parameter Poisson Lindley Distribution. Journal  of Ethiopian Statistical Association. 2012;21:15–22.
 
- Shanker  R, Sharma S, Shanker R. A two-parameter Lindley distribution for modeling waiting  and survival times data. Applied  Mathematics. 2013;4:363–368.
 
- Shanker  R, Tekie AL. A new quasi Poisson-Lindley distribution. International Journal of Statistics and Systems. 2014;9(1):87–94.
 
- Shanker  R, Amanuel AG. A new quasi Lindley distribution. International Journal of Statistics and Systems. 2013;8(2):143–156.
 
- Johnson  NL, Kotz S, Kemp AW. Univariate Discrete Distributions, 2nd ed. John Wiley  & sons Inc; 1992.
 
- Fisher RA, Corpet AS, Williams CB. The relation  between the number of species and the number of individuals in a random sample  of an animal population. Journal of  Animal Ecology. 1943;12(1):42–58.
 
- Kempton RA. A generalized form of Fisher’s  logarithmic series. Biometrika. 1975;62(1):29–38.
 
- Tripathi  RC, Gupta RC. A generalization of the log-series distribution. Comm. in Stat. (Theory and Methods). 1985;14(8):1779–1799.
 
- Mishra  A, Shanker R. Generalized logarithmic series distribution-Its nature and  applications. Proceedings of the Vth  International Symposium on  Optimization and Statistics. 2002:155–168.
 
- Loeschke  V, Kohler W. Deterministic and Stochastic models of the negative binomial  distribution and the analysis of chromosomal aberrations in human leukocytes. Biometrische Zeitschrift. 1976;18:427–451.
 
- Janardan  KG, Schaeffer DJ. Models for the analysis of chromosomal aberrations in human  leukocytes. Biometrical Journal. 1977;(8):599–612.
 
- Mc  Guire JU, Brindley TA, Bancroft TA. The distribution of European corn-borer  larvae pyrausta in field corn. Biometrics.  1957;13:65–78.
 
- Catcheside  DG, Lea DE, Thoday JM. ypes of chromosome structural change induced by the  irradiation on Tradescantia microspores. Journal  of Genetics. 1946;47:113–136.
 
- Catcheside  DG, Lea DE, Thoday JM. The production of chromosome structural changes in  Tradescantia microspores in relation to dosage, intensity and temperature. Journal of Genetics. 1946;47:137–149.
 
- Lindley  DV. Fiducial distributions and Bayes theorem. Journal of Royal Statistical Society Ser. B. 1958;20:102–107. 
 
  
 
  
  ©2015 Rekab, et al. This is an open access article distributed under the terms of the, 
 which 
permits unrestricted use, distribution, and build upon your work non-commercially.