Parametric and some nonparametric statistical inferences and models are valid only under certain assumptions. One of the most common assumptions in the literature is symmetry of the underlying distribution. If the underlying distribution is not symmetric, the question becomes how to define appropriate location and scale measures. Thus, to choose the appropriate statistical analysis, we need to check the underlying assumptions, including symmetry. Most tests of symmetry available in the literature have low statistical power and fail to detect a small but meaningful asymmetry in the population. Examples of such tests have been suggested by Butler,1 Rothman and Woodroofe,2 Hill and Rao,3 McWilliams4 and Ozturk.5 McWilliams's4 runs test of symmetry is more powerful than those provided by Butler,1 Rothman and Woodroofe2 and Hill and Rao3 against various asymmetric alternatives. Tajuddin6 proposed a test for symmetry based on the Wilcoxon two-sample test and found it to be more powerful than the runs test.
Baklizi7 suggested a runs test of symmetry based on the conditional distribution and demonstrated that it performed slightly better than the unconditional test by McWilliams.4 Baklizi's test is also very robust to misspecification of the median. Modarres and Gastwirth8-9 modified McWilliams's4 runs test by using Wilcoxon scores to weight the runs. Their procedure improved the power for testing symmetry when the center of the distribution is known. However, their test did not perform well when the asymmetry is concentrated in regions close to the median. Samawi10 investigated the use of extreme ranked set sampling (ERSS). Samawi et al.11 used ERSS to provide a more powerful runs test of symmetry. Finally, Samawi et al.12 used the overlap coefficient to test for symmetry and showed that their test procedure is competitive with the other available tests of symmetry. This paper uses ERSS to provide a more powerful overlap coefficient test of symmetry.
The overlap measure (OVL) is defined as the area of intersection of the graphs of two probability density functions. It measures the similarity, that is, the agreement or closeness, of the two probability distributions. The OVL measure was originally introduced by Weitzman.13 Recently, several authors, including Bradley and Piantadosi,14 Inman and Bradley,15 Clemons,16 Reiser and Faraggi,17 Clemons and Bradley,18 Mulekar and Mishra,19 Al-Saidy et al.,20 Schmid and Schmidt,21 Al-Saleh and Samawi,22 and Samawi and Al-Saleh,23 have considered this measure. The sampling behavior of a nonparametric estimator of the OVL using naive kernel density estimation was examined by Clemons and Bradley,18 using Monte Carlo and bootstrap techniques.
Let $f_1(x)$ and $f_2(x)$ be two probability density functions. Assume samples of observations are drawn from continuous distributions. The overlap measure used in the literature is Weitzman's measure (1970), defined as
$$\Delta = \int \min\{f_1(x), f_2(x)\}\,dx.$$
The overlap measure of two densities assumes values between 0 and 1. An overlap value close to 0 indicates extreme inequality of the two density functions, and an overlap value of 1 indicates exact equality. In most statistical applications, the data are assumed to be a simple random sample (SRS). The cost of quantifying sampling units can be reduced by using ranked set sampling (RSS), which McIntyre24 introduced for estimating the population mean; the name RSS was given to the procedure later. As a variation of RSS, extreme ranked set sampling (ERSS) was introduced and investigated by Samawi et al.10
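As a small numerical illustration of the overlap measure, the sketch below evaluates it for two normal densities by integrating the pointwise minimum on a grid. The function names (`normal_pdf`, `overlap`), the integration limits and the choice of normal densities are assumptions made for this example only; they are not taken from the study.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def overlap(f1, f2, lo, hi, steps=20000):
    """Approximate the integral of min{f1, f2} over [lo, hi] (trapezoidal rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        total += weight * min(f1(x), f2(x))
    return total * h

# Identical densities overlap completely; shifting one of them reduces the overlap.
same = overlap(lambda x: normal_pdf(x, 0.0, 1.0),
               lambda x: normal_pdf(x, 0.0, 1.0), -10.0, 10.0)
shifted = overlap(lambda x: normal_pdf(x, 0.0, 1.0),
                  lambda x: normal_pdf(x, 1.0, 1.0), -10.0, 11.0)
print(round(same, 4), round(shifted, 4))
```

For two unit-variance normal densities whose means differ by one, the exact overlap is $2\Phi(-0.5) \approx 0.617$, which gives a simple check on the numerical integration.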
The RSS procedure is as follows. First, randomly sample a group of sampling units of size $r^2$ from the target population. Randomly partition the group into $r$ disjoint subsets, each having a pre-assigned size $r$. In most practical situations, the size $r$ will be two, three or four. Rank the elements in each subset by a suitable method of ranking, such as prior information, visual inspection or the judgment of the subject-matter experimenter. The $i$th order statistic from the $i$th subset, $X_{(i:r)}$, $i = 1, \ldots, r$, is then quantified (actually measured). Finally, $\{X_{(1:r)}, X_{(2:r)}, \ldots, X_{(r:r)}\}$ constitutes the RSS. This represents one complete cycle. The procedure can be repeated $m$ times as needed to get an RSS of size $n = mr$. A detailed explanation of univariate RSS and its variations may be found in Kaur et al.,25 Patil et al.,26 Kaur et al.27 and Sinha.28 An extreme ranked set sample (ERSS) of size $n = 2m$, as described by Samawi et al.,10 is obtained in a similar way, except that only the minimum or the maximum of each subset is quantified, giving an ERSS $\{X_{(1:r)1}, \ldots, X_{(1:r)m}, X_{(r:r)1}, \ldots, X_{(r:r)m}\}$.

Now consider testing the null hypothesis of symmetry for an underlying absolutely continuous distribution $F$ with density denoted by $f$:
$$H_0: f(x) = f(-x) \text{ for all } x \quad \text{versus} \quad H_1: f(x) \neq f(-x) \text{ for some } x.$$
It is clear that under the null hypothesis of symmetry, if we let $f_1(x) = f(x)$ and $f_2(x) = f(-x)$, then the overlap measure is equal to one, $\Delta = 1$, which will be our focus in this paper.
Samawi et al.12 used the overlap measure Δ to develop a new test of symmetry based on kernel density estimation of Δ. The availability of kernel density estimation in several statistical software packages also motivated us to use Δ when the sample in hand is an ERSS. This paper introduces a powerful test of symmetry based on the ERSS overlap measure. The overlap test of symmetry using ERSS and its asymptotic properties are introduced in Section 2. A simulation study is given in Section 3. An illustration using cardiac output data from neonates, along with final comments, is given in Section 4.
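The RSS and ERSS sampling schemes described in this section can be sketched in a few lines. The example below is only illustrative: it assumes perfect ranking (units are ranked by their actual values, whereas in practice ranking uses judgment or a concomitant variable), draws from a hypothetical standard normal population, and takes the ERSS as $m$ set minima followed by $m$ set maxima. The function names `rss_cycle` and `erss` are invented for the example.

```python
import random

def rss_cycle(draw, r, rng):
    """One RSS cycle: r sets of size r; keep the i-th smallest unit of the i-th set."""
    return [sorted(draw(rng) for _ in range(r))[i] for i in range(r)]

def erss(draw, r, m, rng):
    """ERSS of size 2m: minima of m sets of size r, then maxima of m further sets."""
    minima = [min(draw(rng) for _ in range(r)) for _ in range(m)]
    maxima = [max(draw(rng) for _ in range(r)) for _ in range(m)]
    return minima, maxima

rng = random.Random(2024)

def draw(g):
    """Hypothetical population: standard normal."""
    return g.gauss(0.0, 1.0)

cycle = rss_cycle(draw, 3, rng)
minima, maxima = erss(draw, r=3, m=15, rng=rng)
print(cycle)
print(len(minima) + len(maxima))  # total ERSS size n = 2m
```

Only the $2m$ retained units would need to be measured; the remaining units in each set are used for ranking alone, which is where the cost saving comes from.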
Test of Symmetry Based on ERSS Overlap Measure
Let $\{X_{(1:r)1}, \ldots, X_{(1:r)m}, X_{(r:r)1}, \ldots, X_{(r:r)m}\}$ be an ERSS from an absolutely continuous and differentiable distribution $F$ having known median. Without loss of generality, we will assume the median to be zero. When the median or the center of the distribution is unknown, the data can be centered at a consistent estimate of the median. The implications of such centering for the asymptotic properties are not intuitively clear. Therefore, further investigation is needed to study the robustness of the proposed test of symmetry and to compare it with other available tests of symmetry in the case of an unknown median. In this paper we discuss only the case in which the median of the underlying distribution is assumed known.

Consider the test for symmetry
$$H_0: f(x) = f(-x) \text{ for all } x \quad \text{versus} \quad H_1: f(x) \neq f(-x) \text{ for some } x.$$
Under the assumption of symmetry, $F(-x) = 1 - F(x)$. Let $f_{(1)}$ be the density function of the first order statistic $X_{(1:r)}$ and $f_{(r)}$ be the density function of the $r$th order statistic $X_{(r:r)}$, each from a random sample of size $r$. Under the assumption of symmetry, it can be shown that
$$f_{(1)}(x) = f_{(r)}(-x).$$
If we let $f_1(x) = f_{(1)}(x)$ and $f_2(x) = f_{(r)}(-x)$, the null hypothesis of symmetry is equivalent to $f_1(x) = f_2(x)$ for all $x$, and under the null hypothesis
$$\Delta = \int \min\{f_1(x), f_2(x)\}\,dx = 1.$$
Therefore, an equivalent hypothesis for testing symmetry is
$$H_0: \Delta = 1 \quad \text{versus} \quad H_1: \Delta < 1.$$
We propose using $\hat{\Delta}$, as it is a consistent nonparametric estimator of Δ using ERSS. Under the null hypothesis of symmetry and some mild regularity assumptions, which will be discussed later in this paper, we will derive the asymptotic distribution of $\hat{\Delta}$ to use it as a test of symmetry, say:
, (1)
for large $n = 2m$. An asymptotic test procedure at significance level $\alpha$ is to reject $H_0$ if
,
where $z_{\alpha}$ is the upper $\alpha$ percentile of the standard normal distribution.
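The relation between the densities of the set minimum and the set maximum under symmetry follows from the standard order-statistic densities for a sample of size $r$. The derivation below is a sketch that assumes $F$ has density $f$ and is symmetric about zero, and writes $f_{(1)}$ and $f_{(r)}$ for the densities of the minimum and the maximum (notation assumed here):

```latex
\begin{align*}
f_{(1)}(x) &= r\,[1 - F(x)]^{r-1} f(x), \qquad f_{(r)}(x) = r\,[F(x)]^{r-1} f(x), \\
f_{(r)}(-x) &= r\,[F(-x)]^{r-1} f(-x) = r\,[1 - F(x)]^{r-1} f(x) = f_{(1)}(x).
\end{align*}
```

The middle equality uses $F(-x) = 1 - F(x)$ and $f(-x) = f(x)$. Conversely, integrating $f_{(1)}(t) = f_{(r)}(-t)$ over $t \le x$ gives $[1 - F(x)]^r = [F(-x)]^r$, hence $F(-x) = 1 - F(x)$, so equality of the two densities characterizes symmetry about zero.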
Kernel estimation of Δ using ERSS
Based on the results of Schmid and Schmidt21 and Anderson et al.29 we will study the asymptotic properties using ERSS. Using one of the several available nonparametric density estimation procedures, see for example Wegman,30-31 Van Kerm32 and Chen and Kelton,33 one can use the overlap coefficient estimators for inferential purposes.
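Let $\{X_{(1:r)1}, \ldots, X_{(1:r)m}, X_{(r:r)1}, \ldots, X_{(r:r)m}\}$ be an ERSS from a differentiable distribution $F$ having known median. Without loss of generality, assume the median to be zero. Let $X_{(1:r)1}, \ldots, X_{(1:r)m}$ denote the random sample of minima in the ERSS and $X_{(r:r)1}, \ldots, X_{(r:r)m}$ denote the random sample of maxima in the ERSS, where $m = n/2$. We will use a kernel function $K$ that satisfies the condition
$$\int_{-\infty}^{\infty} K(x)\,dx = 1. \quad (2)$$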
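The kernel $K$ is usually taken to be a symmetric density function with mean 0 and finite variance; an example is the standard normal density. The kernel estimators of $f_1(w_i)$ and $f_2(w_i)$, respectively, are
$$\hat{f}_1(w_i) = \frac{1}{m h_1} \sum_{j=1}^{m} K\!\left(\frac{w_i - X_{(1:r)j}}{h_1}\right) \quad (3)$$
and
$$\hat{f}_2(w_i) = \frac{1}{m h_2} \sum_{j=1}^{m} K\!\left(\frac{w_i + X_{(r:r)j}}{h_2}\right), \quad (4)$$
for $i = 1, \ldots, C$, where $X_{(1:r)j}$ and $X_{(r:r)j}$, $j = 1, \ldots, m$, are the ERSS minima and maxima (the maxima enter with reversed sign because $f_2$ is the density of the reflected maximum), and $C$ is the number of bins, which depends on the sample size. In practice, we suggest taking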
(or the default setting of the software). Also,
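$h_1$ and $h_2$ are the bandwidths of the kernel estimators, satisfying the conditions that $h_k \to 0$ and $m h_k \to \infty$, $k = 1, 2$, as $m \to \infty$. There are many choices of the bandwidths $h_1$ and $h_2$; however, in our procedure we use Silverman's34 suggestion as follows. Using the normal distribution as the parametric family, the bandwidths of the kernel estimators are
$$h_1 = 0.9\,A_1\,m^{-1/5} \quad \text{and} \quad h_2 = 0.9\,A_2\,m^{-1/5}, \quad (5)$$
where $A_1 = \min\{\text{standard deviation of the minima},\ (\text{interquartile range of the minima})/1.34\}$ and $A_2 = \min\{\text{standard deviation of the maxima},\ (\text{interquartile range of the maxima})/1.34\}$. These were found to be adequate choices of the bandwidth for minimizing the integrated mean squared error (IMSE),
$$\mathrm{IMSE}(\hat{f}_k) = E \int \left[\hat{f}_k(w) - f_k(w)\right]^2 dw, \quad k = 1, 2. \quad (6)$$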
The bins used are as follows: Let
, where
and
The bins will be selected as
where
,
is an initial value chosen based on the minimum value used in R’s calculation and
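Using the aforementioned kernel estimators, the nonparametric kernel estimator of Δ is given by
$$\hat{\Delta} = \int \min\{\hat{f}_1(w), \hat{f}_2(w)\}\,dw, \quad (7)$$
which can be approximated by a trapezoidal rule, resulting in
$$\hat{\Delta} \approx \sum_{i=1}^{C-1} \frac{w_{i+1} - w_i}{2} \left[\min\{\hat{f}_1(w_i), \hat{f}_2(w_i)\} + \min\{\hat{f}_1(w_{i+1}), \hat{f}_2(w_{i+1})\}\right].$$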
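The estimation steps of this section (rule-of-thumb bandwidths, kernel estimates on a grid of bin points, and trapezoidal integration of the pointwise minimum) can be sketched as follows. This is a minimal pure-Python illustration rather than the SAS procedure used in the study: the Gaussian kernel, the constant 1.34 in the bandwidth rule, the grid of 401 points extended three bandwidths beyond the data, and all function names are assumptions made for the example.

```python
import math
import random
import statistics

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * m**(-1/5)."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(statistics.stdev(x), (q3 - q1) / 1.34)
    return 0.9 * spread * len(x) ** (-0.2)

def kde(x, h, grid):
    """Gaussian kernel density estimate of the sample x evaluated on grid."""
    c = 1.0 / (len(x) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((w - xi) / h) ** 2) for xi in x) for w in grid]

def overlap_estimate(minima, maxima, bins=401):
    """Estimate the overlap from ERSS minima and maxima (median assumed to be zero)."""
    reflected = [-y for y in maxima]  # sample from the density of the reflected maximum
    h1, h2 = silverman_bandwidth(minima), silverman_bandwidth(reflected)
    pad = 3.0 * max(h1, h2)
    lo, hi = min(minima + reflected) - pad, max(minima + reflected) + pad
    step = (hi - lo) / (bins - 1)
    grid = [lo + i * step for i in range(bins)]
    g = [min(a, b) for a, b in zip(kde(minima, h1, grid), kde(reflected, h2, grid))]
    return step * (sum(g) - 0.5 * (g[0] + g[-1]))  # trapezoidal rule, equal spacing

rng = random.Random(7)
r, m = 3, 50
shift = math.log(2.0)  # median of Exponential(1), used to center the skewed example

# Symmetric case: standard normal population.
sym_min = [min(rng.gauss(0.0, 1.0) for _ in range(r)) for _ in range(m)]
sym_max = [max(rng.gauss(0.0, 1.0) for _ in range(r)) for _ in range(m)]
# Skewed case: Exponential(1) shifted so that its median is zero.
skw_min = [min(rng.expovariate(1.0) - shift for _ in range(r)) for _ in range(m)]
skw_max = [max(rng.expovariate(1.0) - shift for _ in range(r)) for _ in range(m)]

delta_sym = overlap_estimate(sym_min, sym_max)
delta_skw = overlap_estimate(skw_min, skw_max)
print(round(delta_sym, 3), round(delta_skw, 3))
```

The estimate for the symmetric population should be noticeably closer to one than the estimate for the skewed population, although for finite $m$ the kernel estimate stays below one even under exact symmetry.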
Asymptotic properties of $\hat{\Delta}$
The nonparametric kernel estimator of Δ is based on the univariate kernel for density estimation, $K$. Some of the necessary regularity conditions imposed on the univariate kernel for density estimation (see, for example, Silverman,34 Wand and Jones35 and Schmid and Schmidt21) are stated below:
1.
2.
3.
4.
To show consistency of $\hat{\Delta}$, we use some of the asymptotic properties of kernel density estimators from Silverman34 and Wand and Jones35 under simple random sampling (SRS). Suppose that assumptions 1-4 hold, that the density $f$ is continuous at each $w_i$, $i = 1, 2, \ldots, C$, and that $F(x)$ is absolutely continuous. Note that the distribution functions of the minimum and the maximum, $F_{(1)}$ and $F_{(r)}$, are also absolutely continuous; therefore, the following apply:
(8)
(9)
Since both variances converge to zero and for
as
, then
for all continuous points,
of
Also, if $f_1$ and $f_2$ are uniformly continuous, then the kernel density estimates are strongly consistent.
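As in Samawi et al.,12 we can rewrite $\hat{\Delta}$ as follows. For any two numbers $a$ and $b$,
$$\min\{a, b\} = \frac{a + b}{2} - \frac{|a - b|}{2},$$
then
$$\min\{\hat{f}_1(w), \hat{f}_2(w)\} = \frac{\hat{f}_1(w) + \hat{f}_2(w)}{2} - \frac{|\hat{f}_1(w) - \hat{f}_2(w)|}{2}.$$
Thus, since each kernel estimate integrates to one, $\hat{\Delta}$ can be written as
$$\hat{\Delta} = 1 - \frac{1}{2} \int \left|\hat{f}_1(w) - \hat{f}_2(w)\right| dw.$$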
Using the above results and under the null hypothesis of symmetry, i.e.
, then,
(10)
To prove that
, where
=1 under the null hypothesis, we need to show that
.Now
Since,
for all continuous points w of
we have
Now for a given
, let
, we have
then
Clearly
in probability, i.e.
. Hence
The asymptotic distribution of $\hat{\Delta}$ under the null hypothesis, using the results derived by Anderson et al.,29 is as follows. Let
,
,
and
Let
,
,
and
.Under the above assumption, we have the following asymptotic result:
where,
,
are independent standard normal variables and
However, under the null hypothesis that
then the above result is reduced to
Simulation study
To gain some insight into the performance of our new test of symmetry based on $\hat{\Delta}$, we conducted the following simulation. We compared our proposed test with the tests of symmetry of McWilliams,4 Modarres and Gastwirth8 and Samawi et al.12 (the last based on the overlap measure).
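McWilliams's4 runs test is described as follows. For any random sample of size $n$, let $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ denote the sample values ordered from smallest to largest according to their absolute values (signs are retained), and let $S_1, S_2, \ldots, S_n$ denote indicator variables designating the signs of the $Y_{(i)}$ values [$S_i = 1$ if $Y_{(i)} \geq 0$, and $S_i = 0$ otherwise]. The test statistic used for testing symmetry is
$$R = \text{the number of runs in the } S_1, S_2, \ldots, S_n \text{ sequence} = 1 + \sum_{j=2}^{n} I_j, \quad \text{where } I_j = \begin{cases} 0 & \text{if } S_j = S_{j-1}, \\ 1 & \text{otherwise.} \end{cases}$$
The test rejects the null hypothesis if $R$ is smaller than a critical value $c_{\alpha}$ at significance level $\alpha$. However, the Modarres and Gastwirth8 test is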
, where,
If p=0, the terms are Wilcoxon scores. Otherwise, they are percentile-modified scores. The Modarres and Gastwirth9 test is a hybrid of a sign test in the first stage and a percentile-modified two-sample Wilcoxon test in the second stage.
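The runs statistic of McWilliams described above is simple to compute. The sketch below is illustrative only: it assumes continuous data centered at a known median of zero (no ties in absolute value), and it obtains a lower-tail p-value from the standard fact that, under symmetry about zero, the signs are independent fair coin flips, so the number of sign changes is Binomial($n - 1$, 1/2). The function names and the small data set are invented for the example.

```python
from math import comb

def runs_statistic(sample):
    """Number of runs of signs after ordering the values by absolute value."""
    ordered = sorted(sample, key=abs)
    signs = [1 if v >= 0 else 0 for v in ordered]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def runs_pvalue(runs, n):
    """P(R <= runs) under symmetry, where R - 1 is Binomial(n - 1, 1/2)."""
    return sum(comb(n - 1, k) for k in range(runs)) / 2 ** (n - 1)

# Hypothetical sample in which the large absolute values are all positive.
data = [0.3, -0.1, 0.8, 1.5, -0.2, 2.4, 0.9, 3.1]
r_obs = runs_statistic(data)
p_obs = runs_pvalue(r_obs, len(data))
print(r_obs, p_obs)
```

Few runs indicate that observations far from the center tend to fall on one side, which is why the test rejects for small values of the statistic.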
In this simulation, SAS version 9.2 {proc kde; method=srot} is used. The generalized lambda distribution (see Ramberg and Schmeiser36) is used in our simulation with the following sets of parameters:
1-
2-
3-
4-
5-
6-
7-
8-
9-
To generate the observations, we used
$$x = \lambda_1 + \frac{u^{\lambda_3} - (1 - u)^{\lambda_4}}{\lambda_2},$$
where $u$ is a uniform random number. The significance level considered is $\alpha = 0.05$, with sample sizes n=30, 50, and 100. Our simulation is based on 1,000 simulated samples. The 95% and 99% confidence intervals of the true probability of type I error under the null hypothesis with $\alpha = 0.05$ are (0.0457, 0.0543) and (0.0435, 0.0575), respectively. Note that in the tables below, the values of skewness and kurtosis are from McWilliams.4
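Generating observations from the generalized lambda distribution amounts to inverse-transform sampling: a uniform random number is plugged into the quantile function. The sketch below uses the Ramberg and Schmeiser parameterization; the parameter set shown is a commonly quoted symmetric choice that approximates the standard normal and is used here purely for illustration, not as one of the parameter sets of the simulation study. The function names are assumptions.

```python
import random

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Ramberg-Schmeiser generalized lambda quantile function."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2

def gld_sample(n, lams, rng):
    """Inverse-transform sampling: plug uniform(0, 1) draws into the quantile function."""
    return [gld_quantile(rng.random(), *lams) for _ in range(n)]

# Illustrative symmetric parameter set (lam3 == lam4); not from the study's list.
lams = (0.0, 0.1975, 0.1349, 0.1349)
sample = gld_sample(30, lams, random.Random(1))
print(len(sample), gld_quantile(0.5, *lams))
```

When $\lambda_3 = \lambda_4$ the quantile function satisfies $Q(u) + Q(1 - u) = 2\lambda_1$, so the distribution is symmetric about $\lambda_1$; unequal values of $\lambda_3$ and $\lambda_4$ produce skewed alternatives.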
Table 1a first shows the estimated probability of type I error. Our test is an asymptotic test with a slight bias in the estimation of Δ and of the variance for small sample sizes. For sample sizes larger than 30, the test has an estimated probability of type I error close to the nominal value 0.05. Tables 1a and 1b show that the $\hat{\Delta}$-based test is more powerful than the tests of McWilliams4 and Baklizi.7 In all cases, our proposed procedure is also more efficient than the tests of symmetry proposed by Modarres and Gastwirth,8 Modarres and Gastwirth9 and Samawi et al.12 For all of the tests in the comparison, the power increases as the sample size increases. Finally, the power of the $\hat{\Delta}$-based test increases as the set size r increases.
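Whether an estimated type I error is compatible with the nominal level can be judged with a normal-approximation interval around the nominal value, whose width depends on the number of simulated samples. The helper below is a sketch under that approximation; the function name and the use of $z = 1.96$ are assumptions for the example.

```python
import math

def mc_interval(p, reps, z):
    """Interval p +/- z * sqrt(p(1 - p) / reps) for a simulated rejection rate."""
    half = z * math.sqrt(p * (1.0 - p) / reps)
    return p - half, p + half

results = {}
for reps in (1000, 10000):
    lo, hi = mc_interval(0.05, reps, 1.96)
    results[reps] = (round(lo, 4), round(hi, 4))
    print(reps, results[reps])
```

Under this approximation, 1,000 replications give roughly (0.0365, 0.0635) at the 95% level, while the narrower interval (0.0457, 0.0543) corresponds to about 10,000 replications; the number of replications is therefore left as a parameter.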
Illustration using noninvasive measurement of cardiac output by electrical velocimetry in neonate data
The samples selected in our illustration are from a study designed to evaluate the effectiveness of a new technology, electrical velocimetry (EV), for non-invasive measurement of cardiac output (CO) and stroke volume (SV) in neonates.37 One of the research questions is whether the CO measure is the same for low-birth-weight infants and non-low-birth-weight infants. Low birth weight is defined as less than 1.5 kg. Thus we compared CO for neonates with birth weight less than 1.5 kg with CO for neonates with birth weight greater than or equal to 1.5 kg.
As is frequently the case for this type of study, the underlying distribution is assumed normal, or at least symmetric. In either case, a test of symmetry is almost never considered in determining how to proceed with the analysis. Based on the conclusions of a test of symmetry, the analyst can choose the most powerful test for location. Therefore, before deciding on the test procedure, we need to check the assumption of symmetry of the underlying distribution of CO for the premature and term infants, with birth weight less than 1.5 kg and with birth weight greater than or equal to 1.5 kg.
| Distribution | n |  | Samawi et al.12 |  |  | r=2 | r=3 | r=4 | r=5 |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 30 | 0.047 | 0.069 | 0.053 | 0.055 | 0.046 | 0.056 | 0.056 | 0.044 |
|  | 50 | 0.050 | 0.054 | 0.048 | 0.048 | 0.054 | 0.056 | 0.049 | 0.049 |
|  | 100 | 0.064 | 0.053 | 0.050 | 0.052 | 0.048 | 0.053 | 0.046 | 0.051 |
| (2) | 30 | 0.297 | 0.495 | 0.583 | 0.656 | 0.751 | 0.906 | 0.973 | 0.999 |
|  | 50 | 0.476 | 0.836 | 0.846 | 0.949 | 0.997 | 1.000 | 1.000 | 1.000 |
|  | 100 | 0.776 | 0.999 | 0.990 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
| (3) | 30 | 0.438 | 0.852 | 0.761 | 0.762 | 0.960 | 0.975 | 0.999 | 0.999 |
|  | 50 | 0.683 | 0.966 | 0.950 | 0.992 | 1.000 | 1.000 | 1.000 | 1.000 |
|  | 100 | 0.927 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| (4) | 30 | 0.117 | 0.375 | 0.172 | 0.280 | 0.384 | 0.398 | 0.413 | 0.482 |
|  | 50 | 0.131 | 0.512 | 0.251 | 0.544 | 0.689 | 0.706 | 0.766 | 0.808 |
|  | 100 | 0.223 | 0.767 | 0.414 | 0.883 | 0.929 | 0.940 | 0.958 | 0.985 |
| (5) | 30 | 0.145 | 0.459 | 0.234 | 0.407 | 0.484 | 0.569 | 0.616 | 0.716 |
|  | 50 | 0.192 | 0.580 | 0.356 | 0.736 | 0.832 | 0.846 | 0.889 | 0.921 |
|  | 100 | 0.338 | 0.846 | 0.588 | 0.972 | 0.983 | 0.985 | 0.991 | 0.997 |

Table 1A Probability of Type I Error under the Null Hypothesis ($\alpha = 0.05$)
Table 2a contains two selected SRS samples of CO, one for neonates with birth weight less than 1.5 kg and one for neonates with birth weight greater than or equal to 1.5 kg. Table 2a also contains two selected ERSS samples of CO for the same two groups. Since the CO measure and birth weight are positively correlated, the ranking was performed based on birth weights. In each ERSS sample in Table 2a, the first half consists of first order statistics (the minima) and the second half consists of third order statistics (the maxima).
Table 2b gives the results of the runs and overlap tests of symmetry for the underlying distribution of CO. For all samples, we reject the assumption that the underlying distribution is symmetric. Table 2c gives the results of the Mann-Whitney test for two independent samples; it shows that there is a significant difference on average in the CO measures between the low-birth-weight neonates (less than 1.5 kg) and the non-low-birth-weight neonates (greater than or equal to 1.5 kg).
Based on our simulation and real data example, the proposed test of symmetry based on the ERSS overlap measure appears to outperform the other tests of symmetry in the literature in terms of power. Our test is more sensitive in detecting a slight asymmetry in the underlying distribution than the other tests proposed in the literature. Drawing an ERSS is easier than drawing an ordinary RSS or other RSS variations. Also, the kernel density estimation literature is very rich, and many of the proposed and improved methods are available in statistical software such as SAS™, S-plus, Stata and R. Since overlap measures can be used in multivariate as well as univariate cases, our proposed test of symmetry can be extended to multivariate cases for diagonal symmetry, conditional symmetry and other types of symmetry. In addition, our test procedure and kernel density estimation are valid for large sample sizes (n>30) and under some regularity conditions, such as light-tailed underlying distribution functions. However, our simulation indicates that our procedure still performs better than the other tests for sample sizes of n=30 or larger and for different underlying distributions.
| Case # | n |  | Samawi et al.12 |  |  | r=2 | r=3 | r=4 | r=5 |
|---|---|---|---|---|---|---|---|---|---|
| (6) | 30 | 0.050 | 0.155 | 0.055 | 0.068 | 0.253 | 0.304 | 0.327 | 0.431 |
|  | 50 | 0.056 | 0.166 | 0.060 | 0.077 | 0.500 | 0.717 | 0.739 | 0.747 |
|  | 100 | 0.051 | 0.207 | 0.068 | 0.130 | 0.793 | 0.875 | 0.911 | 0.942 |
| (7) | 30 | 0.090 | 0.196 | 0.096 | 0.166 | 0.325 | 0.410 | 0.458 | 0.520 |
|  | 50 | 0.097 | 0.236 | 0.125 | 0.284 | 0.656 | 0.667 | 0.758 | 0.835 |
|  | 100 | 0.124 | 0.354 | 0.176 | 0.589 | 0.891 | 0.946 | 0.963 | 0.974 |
| (8) | 30 | 0.534 | 1.000 | 0.830 | 0.806 | 1.000 | 1.000 | 1.000 | 1.000 |
|  | 50 | 0.744 | 1.000 | 0.972 | 0.995 | 1.000 | 1.000 | 1.000 | 1.000 |
|  | 100 | 0.972 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| (9) | 30 | 0.560 | 1.000 | 0.865 | 0.808 | 1.000 | 1.000 | 1.000 | 1.000 |
|  | 50 | 0.816 | 1.000 | 0.985 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 |
|  | 100 | 0.976 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

Table 1B (continued) Power of Overlap-Based Test and Runs Tests under Alternative Hypotheses ($\alpha = 0.05$)
*Results are taken from Modarres and Gastwirth (1996) and Modarres and Gastwirth (1998), respectively.
| SRS: Birth-weight <1.5 kg | SRS: Birth-weight ≥1.5 kg | ERSS: Birth-weight <1.5 kg | ERSS: Birth-weight ≥1.5 kg |
|---|---|---|---|
| 0.05 | 0.21 | 0.050 | 0.100 |
| 0.06 | 0.16 | 0.050 | 0.120 |
| 0.06 | 0.12 | 0.050 | 0.132 |
| 0.05 | 0.20 | 0.060 | 0.220 |
| 0.05 | 0.13 | 0.060 | 0.150 |
| 0.06 | 0.22 | 0.060 | 0.228 |
| 0.06 | 0.22 | 0.090 | 0.178 |
| 0.08 | 0.15 | 0.080 | 0.182 |
| 0.10 | 0.22 | 0.080 | 0.215 |
| 0.08 | 0.20 | 0.080 | 0.158 |
| 0.08 | 0.23 | 0.070 | 0.155 |
| 0.08 | 0.25 | 0.070 | 0.218 |
| 0.07 | 0.23 | 0.080 | 0.208 |
| 0.11 | 0.16 | 0.128 | 0.220 |
| 0.08 | 0.22 | 0.128 | 0.350 |
| 0.13 | 0.16 | 0.208 | 0.350 |
| 0.16 | 0.22 | 0.165 | 0.360 |
| 0.12 | 0.33 | 0.162 | 0.420 |
| 0.18 | 0.26 | 0.188 | 0.270 |
| 0.16 | 0.31 | 0.188 | 0.440 |
| 0.16 | 0.29 | 0.262 | 0.510 |
| 0.19 | 0.27 | 0.372 | 0.510 |
| 0.37 | 0.51 | 0.358 | 0.520 |
| 0.16 | 0.51 | 0.222 | 0.540 |
| 0.18 | 0.52 | 0.202 | 0.520 |
| 0.11 | 0.52 | 0.182 | 0.520 |
|  | 0.53 |  | 0.530 |
|  | 0.73 |  | 0.730 |

Table 2A Selected samples of CO data
| Sample type | CO Measure | N | Runs test | P-value | OVL test | P-value |
|---|---|---|---|---|---|---|
| SRS | Birth-weight <1.5 kg | 26 | 6 | 0.003 | -6.414 | <0.00001 |
|  | Birth-weight ≥1.5 kg | 28 | 4 | <0.00001 | -4.356 | <0.00001 |
| ERSS | Birth-weight <1.5 kg | 26 | 2 | <0.00001 | -14.540 | <0.00001 |
|  | Birth-weight ≥1.5 kg | 28 | 2 | <0.00001 | -4.729 | <0.00001 |

Table 2B Runs test of symmetry with summary statistics
Table 2B has the results of the runs and overlap tests of symmetry for the underlying distribution of CO. For all samples, we reject the assumption that the underlying distribution is symmetric.
Mann-Whitney U test (difference of medians of CO between birth weight <1.5 kg and birth weight ≥1.5 kg)

| Sample Type | Test statistic | P-value |
|---|---|---|
| SRS | 59.5 | <0.00001 |
| ERSS | 118 | <0.00001 |

Table 2C Mann-Whitney test for two independent samples
Table 2C shows that there is a significant difference on average in the CO measures between the low-birth-weight neonates (less than 1.5 kg) and the non-low-birth-weight neonates (greater than or equal to 1.5 kg).