A more efficient nonparametric test of symmetry based on overlapping coefficient

doi:10.15406/bbij.2014.01.00015

eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 1 Issue 3


Hani M Samawi,


Robert Vogel

Jiann-Ping Hsu College of Public Health, Georgia Southern University, USA

Correspondence: Hani M Samawi, Jiann-Ping Hsu College of Public Health, Georgia Southern University, USA, Tel 912 478 1345

Received: November 14, 2014 | Published: December 16, 2014

Citation: Samawi HM, Vogel R. A more efficient nonparametric test of symmetry based on overlapping coefficient. Biom Biostat Int J. 2014;1(3):75-83. DOI: 10.15406/bbij.2014.01.00015


Abstract

In this paper we provide a more efficient nonparametric test of symmetry based on the empirical overlap coefficient, using kernel density estimation applied to extreme order statistics, namely an extreme ranked set sample (ERSS). An intensive simulation study is conducted to examine the power of the proposed test, and it reveals that the proposed test of symmetry is at least as powerful as currently available tests of symmetry. An illustration is provided using cardiac output and body weight of neonates in a neonatal intensive care unit.

Keywords: test of symmetry, power of the test, bootstrap method, overlap coefficients, Weitzman's measure, extreme ranked set sample, kernel density estimation, AMS: 62G10

Abbreviations

ERSS, extreme ranked set sample; SRS, simple random sample; RSS, ranked set sampling

Introduction

Parametric and some nonparametric statistical inferences and models are valid only under certain assumptions. One of the most common assumptions in the literature is symmetry of the underlying distribution. If the underlying distribution is not symmetric, the question becomes how to define appropriate location and scale measures. Thus, to choose an appropriate statistical analysis, we need to check the underlying assumptions, including symmetry. Most tests of symmetry available in the literature have low statistical power and fail to detect small but meaningful asymmetry in the population. Examples of such tests were proposed by Butler,¹ Rothman and Woodroofe,² Hill and Rao,³ McWilliams⁴ and Ozturk.⁵ McWilliams's⁴ runs test of symmetry is more powerful than those of Butler,¹ Rothman and Woodroofe² and Hill and Rao³ against various asymmetric alternatives. Tajuddin⁶ proposed a test of symmetry based on the Wilcoxon two-sample test and found it to be more powerful than the runs test.

Baklizi⁷ suggested a runs test of symmetry based on the conditional distribution and demonstrated that it performs slightly better than the unconditional test of McWilliams.⁴ Baklizi's test is also very robust to misspecification of the median. Modarres and Gastwirth⁸⁻⁹ proposed a modification of McWilliams's⁴ runs test that uses Wilcoxon scores to weight the runs. Their procedure improved the power of testing symmetry when the center of the distribution is known; however, it did not perform well when the asymmetry is concentrated in regions close to the median. Samawi¹⁰ investigated the use of the extreme ranked set sample (ERSS). Samawi et al.¹¹ used ERSS to provide a more powerful runs test of symmetry. Finally, Samawi et al.¹² used the overlap coefficient to test for symmetry and showed that their test procedure is competitive with other available tests of symmetry. This paper uses ERSS to provide a more powerful overlap coefficient test of symmetry.

The overlap measure (OVL) is defined as the area of intersection of the graphs of two probability density functions. It measures the similarity, that is, the agreement or closeness, of the two probability distributions. The OVL measure was originally introduced by Weitzman.¹³ Several authors, including Bradley and Piantadosi,¹⁴ Inman and Bradley,¹⁵ Clemons,¹⁶ Reiser and Faraggi,¹⁷ Clemons and Bradley,¹⁸ Mulekar and Mishra,¹⁹ Al-Saidy et al.,²⁰ Schmid and Schmidt,²¹ Al-Saleh and Samawi²² and Samawi and Al-Saleh,²³ have since studied this measure. The sampling behavior of a nonparametric estimator of the OVL based on naive kernel density estimation was examined by Clemons and Bradley¹⁸ using Monte Carlo and bootstrap techniques.

Let $f_{1}(x)$ and $f_{2}(x)$ be two probability density functions, and assume that the samples of observations are drawn from continuous distributions. The overlap measure used in the literature is Weitzman's measure (1970), defined as $Δ = \int \min\{f_{1}(x), f_{2}(x)\}\,dx$.
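As a quick numerical illustration of Weitzman's $Δ$, the Python sketch below (an illustration only, not part of the method of this paper) integrates $\min\{f_{1}, f_{2}\}$ with the trapezoidal rule for two normal densities with equal variance, for which the closed form $2Φ(-|μ_{1} - μ_{2}|/(2σ))$ is available for comparison. The choice of N(0, 1) and N(1, 1), the integration range and the helper name `overlap_trapezoid` are assumptions made for the example.

```python
from statistics import NormalDist


def overlap_trapezoid(f1, f2, lo, hi, num_points=2001):
    """Approximate Weitzman's Delta = integral of min(f1, f2) by the trapezoidal rule on [lo, hi]."""
    step = (hi - lo) / (num_points - 1)
    mins = [min(f1(lo + k * step), f2(lo + k * step)) for k in range(num_points)]
    # Trapezoidal rule: full weight on interior points, half weight on the two end points.
    return step * (sum(mins) - 0.5 * (mins[0] + mins[-1]))


# Two normal densities with a common standard deviation (illustrative choice).
p = NormalDist(mu=0.0, sigma=1.0)
q = NormalDist(mu=1.0, sigma=1.0)
approx = overlap_trapezoid(p.pdf, q.pdf, -10.0, 11.0)

# Closed form for equal variances: 2 * Phi(-|mu1 - mu2| / (2 * sigma)).
exact = 2.0 * NormalDist().cdf(-abs(p.mean - q.mean) / (2.0 * p.stdev))
print(round(approx, 4), round(exact, 4))
```

Identical densities give $Δ = 1$ and widely separated ones give values near 0, matching the range described next.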

The overlap measure of two densities takes values between 0 and 1. An overlap value close to 0 indicates extreme inequality of the two density functions, and an overlap value of 1 indicates exact equality. In most statistical applications the data are assumed to be a simple random sample (SRS). Cost savings in quantifying sampling units can be achieved by using the method described by McIntyre²⁴ for estimating the population mean; this procedure was later called ranked set sampling (RSS). As a variation of RSS, the extreme ranked set sample (ERSS) was introduced and investigated by Samawi et al.¹⁰

We first describe the RSS procedure. Randomly select a group of $r^{2}$ sampling units from the target population and randomly partition it into $r$ disjoint subsets, each of size $r$. In most practical situations, $r$ will be two, three or four. Rank the elements within each subset by a suitable method, such as prior information, visual inspection or the judgment of the subject-matter expert. The $i$-th order statistic of the $i$-th subset, $X_{i(i)}$, $i = 1, ..., r$, is then quantified (actually measured). The values $X_{1(1)}, X_{2(2)}, ..., X_{r(r)}$ constitute the RSS and represent one complete cycle. The procedure can be repeated $m$ times, as needed, to obtain an RSS of size $n = mr$. Detailed explanations of univariate RSS and its variations may be found in Kaur et al.,²⁵ Patil et al.,²⁶ Kaur et al.²⁷ and Sinha.²⁸ An extreme ranked set sample (ERSS) of size $n = 2m$, as described by Samawi et al.,¹⁰ is obtained in the same way as an RSS except that only the minimum or the maximum of each subset is quantified, giving the ERSS $X_{1(1)}, X_{2(1)}, ..., X_{m(1)}; X_{1(r)}, X_{2(r)}, ..., X_{m(r)}$.

Now consider testing the null hypothesis of symmetry for an underlying absolutely continuous distribution $F(\cdot)$ with density $f(\cdot)$: $H_{0}: f(x) = f(-x)$ for all $x$ versus $H_{a}: f(x) \neq f(-x)$ for some $x$. Under the null hypothesis of symmetry, if we let $f_{1}(x) = f(x)$ and $f_{2}(x) = f(-x)$, then the overlap measure equals one ($Δ = 1$); this property is the focus of this paper. Samawi et al.¹² used the overlap measure $Δ$ to develop a test of symmetry based on kernel density estimation of $Δ$. The availability of kernel density estimation in several statistical software packages also motivated us to use $Δ$ when the sample at hand is an ERSS. This paper introduces a powerful test of symmetry based on the ERSS overlap measure.
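The two sampling schemes above can be sketched in Python as follows. This is a minimal sketch assuming perfect ranking: the ranking step simply sorts the simulated values, whereas in practice ranking is done by judgment or a concomitant variable without measuring the units. The helper names `rss`, `erss` and `standard_normal`, and the standard normal population, are illustrative assumptions.

```python
import random


def standard_normal(rng):
    """Illustrative population: standard normal draws."""
    return rng.gauss(0.0, 1.0)


def rss(draw, r, m, rng):
    """RSS of size n = m * r: in each cycle, the i-th smallest unit of the i-th set of size r is measured.

    Perfect ranking is assumed, so ranking is done by sorting the simulated values.
    """
    sample = []
    for _ in range(m):                        # m cycles
        for i in range(r):                    # i-th set within the cycle
            ranked_set = sorted(draw(rng) for _ in range(r))
            sample.append(ranked_set[i])      # i-th order statistic X_i(i)
    return sample


def erss(draw, r, m, rng):
    """ERSS of size n = 2m: minima of m sets of size r and maxima of m other sets."""
    minima = [min(draw(rng) for _ in range(r)) for _ in range(m)]
    maxima = [max(draw(rng) for _ in range(r)) for _ in range(m)]
    return minima, maxima


rng = random.Random(2014)
sample_rss = rss(standard_normal, r=3, m=4, rng=rng)
mins, maxs = erss(standard_normal, r=3, m=15, rng=rng)
print(len(sample_rss), len(mins) + len(maxs))  # 12 30
```

Only $2m$ of the $2mr$ sampled units are measured in the ERSS, which is where the cost saving comes from when ranking is cheap.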
The overlap test of symmetry using ERSS and its asymptotic properties are introduced in Section 2. A simulation study is given in Section 3. Illustrations using cardiac output data from neonates, along with final comments, are given in Section 4.

Test of Symmetry Based on ERSS Overlap Measure

Let $X_{1(1)}, X_{2(1)}, ..., X_{m(1)}; X_{1(r)}, X_{2(r)}, ..., X_{m(r)}$ be an ERSS from an absolutely continuous and differentiable distribution $F(\cdot)$ with known median. Without loss of generality, we assume the median to be zero. When the median or center of the distribution is unknown, the data can be centered at a consistent estimate of the median. The implications of such centering for the asymptotic properties are not intuitively clear, so further investigation is needed to study the robustness of the proposed test and to compare it with other tests of symmetry when the median is unknown. In this paper we discuss only the case in which the median of the underlying distribution is known. Consider the test of symmetry $H_{0}: f(x) = f(-x)$ for all $x$ versus $H_{a}: f(x) \neq f(-x)$ for some $x$. Under symmetry, $F(x) = 1 - F(-x)$. Let $f_{(1)}(x)$ and $f_{(r)}(x)$ be the density functions of the first and $r$-th order statistics, $X_{(1)}$ and $X_{(r)}$, of a random sample of size $r$. Since $f_{(1)}(x) = r[1 - F(x)]^{r-1} f(x)$ and $f_{(r)}(x) = r[F(x)]^{r-1} f(x)$, symmetry gives $f_{(1)}(-x) = r[1 - F(-x)]^{r-1} f(-x) = r[F(x)]^{r-1} f(x) = f_{(r)}(x)$. If we let $f_{1}(x) = f_{(1)}(-x)$ and $f_{2}(x) = f_{(r)}(x)$, the null hypothesis of symmetry is equivalent to $H_{0}: f_{(1)}(-x) = f_{(r)}(x)$, and under the null hypothesis $Δ = 1$. Therefore, an equivalent hypothesis for testing symmetry is $H_{0}: Δ = 1$ versus $H_{a}: Δ < 1$. We propose using ${\hat{Δ}}_{ERSS}$, a consistent nonparametric estimator of $Δ$ based on ERSS. Under the null hypothesis of symmetry and some mild regularity assumptions, discussed later in this paper, we derive the asymptotic distribution of ${\hat{Δ}}_{ERSS}$ and use it to form a standardized test statistic $z_{0}$ such that

$z_{0} \sim N(0, 1)$ (1)

for large $n = 2m$. An asymptotic level-$α$ test rejects $H_{0}$ if $z_{0} < -z_{α}$, where $z_{α}$ is the upper $100α\%$ point of the standard normal distribution.
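The order-statistic identity behind the reformulated hypotheses can be checked numerically. The sketch below assumes known parent densities, so it computes the population $Δ$ rather than the estimator ${\hat{Δ}}_{ERSS}$: for a standard normal parent with $r = 3$, $\int \min\{f_{(1)}(-x), f_{(r)}(x)\}\,dx$ should be one, while for an exponential parent shifted to have median zero it should fall below one. The parents, $r = 3$, the integration ranges and all function names are illustrative assumptions; `reject_symmetry` only applies the rejection rule $z_{0} < -z_{α}$ to a given value of $z_{0}$.

```python
import math
from statistics import NormalDist


def order_stat_densities(f, F, r):
    """Return (f_(1), f_(r)): densities of the minimum and maximum of r i.i.d. draws."""
    f_min = lambda x: r * (1.0 - F(x)) ** (r - 1) * f(x)
    f_max = lambda x: r * F(x) ** (r - 1) * f(x)
    return f_min, f_max


def population_delta(f, F, r, lo, hi, num_points=4001):
    """Delta = integral of min(f_(1)(-x), f_(r)(x)) dx, by the trapezoidal rule on [lo, hi]."""
    f_min, f_max = order_stat_densities(f, F, r)
    step = (hi - lo) / (num_points - 1)
    xs = [lo + k * step for k in range(num_points)]
    vals = [min(f_min(-x), f_max(x)) for x in xs]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def reject_symmetry(z0, alpha=0.05):
    """Asymptotic level-alpha rule: reject H0 when z0 < -z_alpha."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z0 < -z_alpha


# Symmetric parent: standard normal, so Delta should be (numerically) one.
std = NormalDist()
delta_sym = population_delta(std.pdf, std.cdf, r=3, lo=-10.0, hi=10.0)

# Skewed parent (illustrative): exponential shifted so that its median is zero.
SHIFT = math.log(2.0)


def exp_pdf(x):
    return math.exp(-(x + SHIFT)) if x + SHIFT >= 0 else 0.0


def exp_cdf(x):
    return 1.0 - math.exp(-(x + SHIFT)) if x + SHIFT >= 0 else 0.0


delta_skew = population_delta(exp_pdf, exp_cdf, r=3, lo=-1.0, hi=10.0)
print(round(delta_sym, 6), round(delta_skew, 3))
```

For this skewed parent the reflected minimum and the maximum have quite different densities, so the population overlap is well below one, which is the departure the test is designed to detect.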

Kernel estimation of Δ using ERSS

Based on the results of Schmid and Schmidt²¹ and Anderson et al.,²⁹ we study the asymptotic properties of the estimator under ERSS. Any of several available nonparametric density estimation procedures (see, for example, Wegman,³⁰⁻³¹ Van Kerm³² and Chen and Kelton³³) can be used to construct overlap coefficient estimators for inferential purposes.

Let $X_{1(1)}, X_{2(1)}, ..., X_{m(1)}; X_{1(r)}, X_{2(r)}, ..., X_{m(r)}$ be an ERSS from a differentiable distribution $F(\cdot)$ with known median; without loss of generality, assume the median is zero. Here $X_{1(1)}, X_{2(1)}, ..., X_{m(1)}$ denotes the random sample of minima and $X_{1(r)}, X_{2(r)}, ..., X_{m(r)}$ the random sample of maxima in the ERSS, where $n = 2m$. We use a kernel function $K$ that satisfies the condition
$\int_{-\infty}^{\infty} K(x)\,dx = 1$. (2)

The kernel $K$ is usually taken to be a symmetric density function with mean 0 and finite variance; an example is the standard normal density. The kernel estimators of $f_{(1)}(-w_{i})$ and $f_{(r)}(w_{i})$, $i = 1, 2, ..., C$, are, respectively,
${\hat{f}}_{(1)}(-w_{i}) = \frac{1}{m h_{-}} \sum_{j=1}^{m} K\left(\frac{-w_{i} - X_{j(1)}}{h_{-}}\right)$ (3)
and
${\hat{f}}_{(r)}(w_{i}) = \frac{1}{m h_{+}} \sum_{j=1}^{m} K\left(\frac{w_{i} - X_{j(r)}}{h_{+}}\right)$, (4)
where $C$ is the number of bins, which depends on the sample size. In practice, we suggest taking $C = \lfloor \sqrt{m} \rfloor$, the integer part of $\sqrt{m}$ (or the default setting of the software). The bandwidths $h_{-}$ and $h_{+}$ of the kernel estimators satisfy $h_{-}, h_{+} > 0$, $h_{-}, h_{+} \to 0$, $m h_{-} \to \infty$ and $m h_{+} \to \infty$ as $m \to \infty$. There are many possible choices of the bandwidths $(h_{-}, h_{+})$; in our procedure we use Silverman's³⁴ suggestion. Using the normal distribution as the parametric family, the bandwidths of the kernel estimators are
$h_{-} = 0.9 A_{-} m^{-1/5}$ and $h_{+} = 0.9 A_{+} m^{-1/5}$, (5)

where $A_{-} = \min\{\mathrm{SD}(X_{1(1)}, ..., X_{m(1)}),\ \mathrm{IQR}(X_{1(1)}, ..., X_{m(1)})/1.349\}$ and $A_{+} = \min\{\mathrm{SD}(X_{1(r)}, ..., X_{m(r)}),\ \mathrm{IQR}(X_{1(r)}, ..., X_{m(r)})/1.349\}$, with SD and IQR denoting the standard deviation and interquartile range. These were found to be adequate bandwidth choices for minimizing the integrated mean squared error (IMSE),
$IMSE = \int E[{\hat{f}}_{(i)}(x) - f_{(i)}(x)]^{2}\,dx$, $i = 1, r$. (6)
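The kernel estimators (3) and (4) with the rule-of-thumb bandwidth (5) can be sketched in Python as below. This is a minimal sketch assuming a Gaussian kernel; it uses `statistics.quantiles` for the quartiles, which may differ slightly from the quartile definition used by SAS proc kde, and the simulated maxima of sets of size $r = 3$ are purely illustrative. The function names are hypothetical.

```python
import math
import random
from statistics import quantiles, stdev


def gaussian_kernel(z):
    """Standard normal density, used here as the symmetric kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(sample):
    """h = 0.9 * min(SD, IQR / 1.349) * m^(-1/5), following equation (5)."""
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.349)
    return 0.9 * spread * len(sample) ** (-0.2)


def kernel_density(sample, h):
    """Return the estimator x -> (1 / (m h)) * sum_j K((x - X_j) / h)."""
    m = len(sample)

    def f_hat(x):
        return sum(gaussian_kernel((x - xj) / h) for xj in sample) / (m * h)

    return f_hat


# Illustrative data: m = 50 maxima of sets of size r = 3 from N(0, 1).
rng = random.Random(1)
maxima = [max(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(50)]
h_plus = silverman_bandwidth(maxima)
f_hat_max = kernel_density(maxima, h_plus)   # estimates f_(r)(w)
# For the minima, equation (3) would be kernel_density(minima, h_minus)(-w).
```

Because each estimate is an average of unit-mass kernels, it integrates to one, which is what the absolute-difference form of $Δ$ used later relies on.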

The bins are constructed as follows. Let $R_{1} = \max(R_{(1)}, R_{(r)})$, where $R_{(1)} = \mathrm{range}(X_{1(1)}, X_{2(1)}, ..., X_{m(1)})$ and $R_{(r)} = \mathrm{range}(X_{1(r)}, X_{2(r)}, ..., X_{m(r)})$. The bins are $w_{i} = w_{i-1} + δ_{x}$, $i = 2, ..., C$, where $w_{1}$ is an initial value chosen based on the minimum value used in computing $R_{1}$, and $δ_{x} = R_{1}/C$.
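A possible reading of the bin construction is sketched below. The text does not pin down $w_{1}$ exactly, so this sketch assumes, as a labeled hypothesis, that $w_{1}$ is the smallest value among the reflected minima $-X_{j(1)}$ and the maxima $X_{j(r)}$ (the estimator of $f_{(1)}$ is evaluated at $-w$); `make_bins` is a hypothetical helper name.

```python
def make_bins(minima, maxima, num_bins):
    """Hypothetical grid w_1, ..., w_C following the bin description above.

    ASSUMPTION: w_1 is the smallest value among the reflected minima (-X_j(1))
    and the maxima X_j(r); the text leaves the exact starting point open.
    """
    r_min = max(minima) - min(minima)    # R_(1): range of the minima
    r_max = max(maxima) - min(maxima)    # R_(r): range of the maxima
    r1 = max(r_min, r_max)               # R_1
    delta_x = r1 / num_bins              # bin width delta_x = R_1 / C
    w1 = min(min(-x for x in minima), min(maxima))
    return [w1 + i * delta_x for i in range(num_bins)], delta_x
```

With $C = \lfloor \sqrt{m} \rfloor$ one would call `make_bins(minima, maxima, math.isqrt(len(minima)))`.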

Using the aforementioned kernel estimators, the nonparametric kernel estimator of $Δ$ is given by
${\hat{Δ}}_{ERSS} = \int_{ℝ} \min\left({\hat{f}}_{(1)}(-w), {\hat{f}}_{(r)}(w)\right) dw$, (7)

which can be approximated by a trapezoidal rule, resulting in
${\hat{Δ}}_{ERSS} \approx \sum_{i=2}^{C} \frac{δ_{x}}{2} \left[\min\left({\hat{f}}_{(1)}(-w_{i}), {\hat{f}}_{(r)}(w_{i})\right) + \min\left({\hat{f}}_{(1)}(-w_{i-1}), {\hat{f}}_{(r)}(w_{i-1})\right)\right].$

Asymptotic properties of ${\hat{Δ}}_{ERSS}$

The nonparametric kernel estimator ${\hat{Δ}}_{ERSS}$ of $Δ$ is based on a univariate kernel for density estimation, $K: ℝ \to ℝ$. Some of the regularity conditions usually imposed on the kernel (see, for example, Silverman,³⁴ Wand and Jones³⁵ and Schmid and Schmidt²¹) are stated below:
1. $\int_{ℝ} K(z)\,dz = 1$.
2. $\int_{ℝ} z^{β} K(z)\,dz = 0$ for $β = 1, ..., d-1$, and $\int_{ℝ} |z|^{d} K(z)\,dz < \infty$, where $1 < d \in ℕ$.
3. $R = \int_{ℝ} K^{2}(z)\,dz < \infty$.
4. $h_{-}, h_{+} > 0$, $h_{-}, h_{+} \to 0$, $m h_{-} \to \infty$, $m h_{+} \to \infty$, $\frac{m h_{-}}{\log m} \to \infty$ and $\frac{m h_{+}}{\log m} \to \infty$.

To show consistency of ${\hat{Δ}}_{ERSS}$, we use some asymptotic properties of kernel density estimators under simple random sampling (SRS) from Silverman³⁴ and Wand and Jones.³⁵ Assume conditions 1-4, that the density $f: ℝ \to ℝ$ is continuous at each $w_{i}$, $i = 1, 2, ..., C$, and that $F(x)$ is absolutely continuous. The order-statistic densities $f_{(1)}(x) = r[1 - F(x)]^{r-1} f(x)$ and $f_{(r)}(x) = r[F(x)]^{r-1} f(x)$ then inherit these properties, so the following apply:
$\mathrm{Bias}({\hat{f}}_{(1)}(-w_{i})) = o(1)$ and $\mathrm{Bias}({\hat{f}}_{(r)}(w_{i})) = o(1)$, as $h_{-} \to 0$ and $h_{+} \to 0$, respectively, (8)
$\begin{array}{l} \mathrm{Var}({\hat{f}}_{(1)}(-w_{i})) = \frac{f_{(1)}(-w_{i})}{m h_{-}} \int_{ℝ} K^{2}(z)\,dz + o\left(\frac{1}{m h_{-}}\right) \text{ and} \\ \mathrm{Var}({\hat{f}}_{(r)}(w_{i})) = \frac{f_{(r)}(w_{i})}{m h_{+}} \int_{ℝ} K^{2}(z)\,dz + o\left(\frac{1}{m h_{+}}\right). \end{array}$ (9)

Since $h_{-}, h_{+} > 0$, $h_{-}, h_{+} \to 0$ and $m h_{-}, m h_{+} \to \infty$ as $m \to \infty$, both biases and both variances converge to zero, so the mean squared errors vanish and ${\hat{f}}_{(1)}(-w_{i}) \overset{P}{\to} f_{(1)}(-w_{i})$ and ${\hat{f}}_{(r)}(w_{i}) \overset{P}{\to} f_{(r)}(w_{i})$ for all continuity points $w_{i}$ of $f_{(1)}$ and $f_{(r)}$. Moreover, if $f_{(i)}(\cdot)$, $i = 1, r$, are uniformly continuous, the kernel density estimates are strongly consistent. As in Samawi et al.,¹² we can rewrite $Δ$: for any two numbers $a$ and $b$, $\min(a, b) = \frac{a + b}{2} - \frac{|a - b|}{2}$, and since $f_{(1)}(-x)$ and $f_{(r)}(x)$ each integrate to one, $Δ = 1 - \frac{1}{2} \int |f_{(1)}(-x) - f_{(r)}(x)|\,dx$. Thus the trapezoidal approximation ${\hat{Δ}}_{ERSS} \approx \sum_{i=2}^{C} \frac{δ_{x}}{2} [\min({\hat{f}}_{(1)}(-w_{i}), {\hat{f}}_{(r)}(w_{i})) + \min({\hat{f}}_{(1)}(-w_{i-1}), {\hat{f}}_{(r)}(w_{i-1}))]$ can be written as
$\begin{array}{l} {\hat{Δ}}_{ERSS} = 1 - \frac{1}{2} \int |{\hat{f}}_{(1)}(-w) - {\hat{f}}_{(r)}(w)|\,dw \\ \approx 1 - \frac{1}{2} \sum_{i=2}^{C} \frac{δ_{x}}{2} \left[|{\hat{f}}_{(1)}(-w_{i}) - {\hat{f}}_{(r)}(w_{i})| + |{\hat{f}}_{(1)}(-w_{i-1}) - {\hat{f}}_{(r)}(w_{i-1})|\right]. \end{array}$

Using the above results under the null hypothesis of symmetry, i.e. $f_{(1)}(-w) = f_{(r)}(w)$, we have
$\begin{array}{l} |{\hat{f}}_{(1)}(-w) - {\hat{f}}_{(r)}(w)| = |({\hat{f}}_{(1)}(-w) - f_{(1)}(-w)) + (f_{(r)}(w) - {\hat{f}}_{(r)}(w))| \\ \leq |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)| + |f_{(r)}(w) - {\hat{f}}_{(r)}(w)| \overset{P}{\to} 0. \end{array}$ (10)

To prove that ${\hat{Δ}}_{ERSS} \overset{P}{\to} Δ$, where $Δ = 1$ under the null hypothesis, we need to show that $|{\hat{Δ}}_{ERSS} - Δ| \overset{P}{\to} 0$. Now
$\begin{array}{l} |{\hat{Δ}}_{ERSS} - Δ| = \frac{1}{2} \left| \int |{\hat{f}}_{(1)}(-w) - {\hat{f}}_{(r)}(w)|\,dw - \int |f_{(1)}(-w) - f_{(r)}(w)|\,dw \right| \\ \leq \frac{1}{2} \int \left| |{\hat{f}}_{(1)}(-w) - {\hat{f}}_{(r)}(w)| - |f_{(1)}(-w) - f_{(r)}(w)| \right| dw \\ \leq \frac{1}{2} \int \left| [{\hat{f}}_{(1)}(-w) - {\hat{f}}_{(r)}(w)] - [f_{(1)}(-w) - f_{(r)}(w)] \right| dw \\ \leq \frac{1}{2} \int |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)|\,dw + \frac{1}{2} \int |f_{(r)}(w) - {\hat{f}}_{(r)}(w)|\,dw. \end{array}$

Since ${\hat{f}}_{(1)}(-w) \overset{P}{\to} f_{(1)}(-w)$ and ${\hat{f}}_{(r)}(w) \overset{P}{\to} f_{(r)}(w)$ for all continuity points $w$ of $f_{(1)}$ and $f_{(r)}$, we have
$\lim_{m \to \infty} | {\hat{Δ}}_{E R S S} - Δ | \leq \frac{1}{2} \lim_{m \to \infty} \int | {\hat{f}}_{(1)} (- w) - f_{(1)} (- w) | d w + \frac{1}{2} \lim_{m \to \infty} \int | f_{(r)} (w) - {\hat{f}}_{(r)} (w) | d w .$

Now, for a given $ε > 0$, let $A = \{w: |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)| > ε/2\}$ and $A_{1} = \{w: |f_{(r)}(w) - {\hat{f}}_{(r)}(w)| > ε/2\}$. We have
$\begin{array}{l} \lim_{m \to \infty} |{\hat{Δ}}_{ERSS} - Δ| \leq \frac{1}{2} \lim_{m \to \infty} \int |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)|\,dw + \frac{1}{2} \lim_{m \to \infty} \int |f_{(r)}(w) - {\hat{f}}_{(r)}(w)|\,dw \\ = \frac{1}{2} \lim_{m \to \infty} \Big\{ \int_{A} |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)|\,dw + \int_{A^{c}} |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)|\,dw \\ \quad + \int_{A_{1}} |f_{(r)}(w) - {\hat{f}}_{(r)}(w)|\,dw + \int_{A_{1}^{c}} |f_{(r)}(w) - {\hat{f}}_{(r)}(w)|\,dw \Big\}, \end{array}$
then
$\begin{array}{l} \lim_{m \to \infty} |{\hat{Δ}}_{ERSS} - Δ| \leq \frac{1}{2} \lim_{m \to \infty} \Big\{ P(\{w: |{\hat{f}}_{(1)}(-w) - f_{(1)}(-w)| > ε/2\}) + ε/2 \\ \quad + P(\{w: |f_{(r)}(w) - {\hat{f}}_{(r)}(w)| > ε/2\}) + ε/2 \Big\}. \end{array}$

Since $ε > 0$ is arbitrary and, by the pointwise consistency shown above, the probability terms vanish as $m \to \infty$, we obtain $|{\hat{Δ}}_{ERSS} - Δ| \overset{P}{\to} 0$. Hence

${\hat{Δ}}_{ERSS} \overset{P}{\to} Δ$.
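The trapezoidal approximation of (7) and its absolute-difference form can be compared directly. In the Python sketch below, the two density arguments are any callables returning ${\hat{f}}_{(1)}(-w)$ and ${\hat{f}}_{(r)}(w)$; to keep the example self-contained, two known normal densities stand in for the kernel estimates (an assumption for illustration). The two forms agree up to how far the trapezoidal mass of each density departs from one, which is exactly the condition used when passing from $\min$ to $1 - \frac{1}{2}|\cdot|$.

```python
from statistics import NormalDist


def delta_hat_min_form(f1_at, f2_at, grid):
    """Trapezoidal sum of min(f1_at(w), f2_at(w)) over an equally spaced grid."""
    dx = grid[1] - grid[0]
    mins = [min(f1_at(w), f2_at(w)) for w in grid]
    # Python index i = 1..len-1 corresponds to the terms i = 2, ..., C of the sum.
    return sum(dx / 2.0 * (mins[i] + mins[i - 1]) for i in range(1, len(grid)))


def delta_hat_abs_form(f1_at, f2_at, grid):
    """1 - (1/2) * trapezoidal sum of |f1_at(w) - f2_at(w)| over the same grid."""
    dx = grid[1] - grid[0]
    diffs = [abs(f1_at(w) - f2_at(w)) for w in grid]
    return 1.0 - 0.5 * sum(dx / 2.0 * (diffs[i] + diffs[i - 1]) for i in range(1, len(grid)))


# Stand-ins for the kernel estimates (assumption): w -> f_hat_(1)(-w) and w -> f_hat_(r)(w).
f1_at = NormalDist(0.0, 1.0).pdf
f2_at = NormalDist(1.0, 1.0).pdf
grid = [-10.0 + 0.01 * k for k in range(2101)]
est_min = delta_hat_min_form(f1_at, f2_at, grid)
est_abs = delta_hat_abs_form(f1_at, f2_at, grid)
print(round(est_min, 4), round(est_abs, 4))
```

If the grid truncates part of either density, the absolute-difference form no longer matches the min form, so the grid should cover the support of both estimates.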

The asymptotic distribution of ${\hat{Δ}}_{ERSS}$ under the null hypothesis follows from the results derived by Anderson et al.²⁹ Let $f_{1}(x) = f_{(1)}(-x)$ and $f_{2}(x) = f_{(r)}(x)$,
$C_{f_{1}, f_{2}} = \{x \in ℝ: f_{1}(x) = f_{2}(x) > 0\}$, $C_{f_{1}} = \{x \in ℝ: f_{1}(x) < f_{2}(x)\}$ and $C_{f_{2}} = \{x \in ℝ: f_{1}(x) > f_{2}(x) > 0\}$. Let $n_{1} = n_{2} = m$, $h_{-} = h_{+} = h$, $p_{0} = P(X \in C_{f_{1}, f_{2}})$, $p_{1} = P(X \in C_{f_{1}})$ and $p_{2} = P(X \in C_{f_{2}})$. Under the above assumptions, we have the following asymptotic result:
$\sqrt{m}({\hat{Δ}}_{ERSS} - Δ) - a_{m} \Rightarrow N(0, v)$,
where $v = p_{0} σ_{0}^{2} + σ_{1}^{2}$, $σ_{1}^{2} = p_{1}(1 - p_{1}) + p_{2}(1 - p_{2})$,
$a_{m} = \sqrt{\frac{R}{h}} \int_{C_{f_{1}, f_{2}}} f_{(r)}^{1/2}(x)\,dx \cdot E(\min\{Z_{1}, Z_{2}\})$,
$σ_{0}^{2} = R \int_{T_{0}} \mathrm{cov}\left(\min\{Z_{1}, Z_{2}\}, \min\{ρ(t) Z_{1} + \sqrt{1 - ρ(t)^{2}}\, Z_{3}, ρ(t) Z_{2} + \sqrt{1 - ρ(t)^{2}}\, Z_{4}\}\right) dt$,
$ρ(t) = \frac{1}{R} \int_{ℝ} K(u) K(u + t)\,du$, $R = \int_{ℝ} K^{2}(u)\,du$, $Z_{1}, Z_{2}, Z_{3}$ and $Z_{4}$ are independent standard normal variables, and $T_{0} = \{t \in ℝ: |t| < 1\}$. Under the null hypothesis, $f_{1}(x) = f_{(1)}(-x) = f_{2}(x) = f_{(r)}(x)$, so $C_{f_{1}, f_{2}}$ contains the whole support, $p_{0} = 1$ and $p_{1} = p_{2} = 0$; hence $v = σ_{0}^{2}$ and the above result reduces to
$\sqrt{m}({\hat{Δ}}_{ERSS} - Δ) - a_{m} \Rightarrow N(0, σ_{0}^{2})$ under $H_{0}$.
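The kernel-dependent constants in the limiting distribution can be evaluated numerically. The sketch below assumes a Gaussian kernel, for which $R = 1/(2\sqrt{π})$ and $ρ(t) = e^{-t^{2}/4}$, and uses Monte Carlo to estimate $E(\min\{Z_{1}, Z_{2}\})$ (known to equal $-1/\sqrt{π}$) and the covariance inside the integral defining $σ_{0}^{2}$. It does not compute $a_{m}$ or $σ_{0}^{2}$ themselves; the function names, integration range and number of draws are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def trapezoid(g, lo, hi, num_points=4001):
    """Composite trapezoidal rule for a scalar function g on [lo, hi]."""
    step = (hi - lo) / (num_points - 1)
    vals = [g(lo + k * step) for k in range(num_points)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# R = integral of K^2(u) du; equals 1 / (2 sqrt(pi)) for the Gaussian kernel.
R = trapezoid(lambda u: gaussian_kernel(u) ** 2, -12.0, 12.0)


def rho(t):
    """rho(t) = (1 / R) * integral of K(u) K(u + t) du; exp(-t^2 / 4) for the Gaussian kernel."""
    return trapezoid(lambda u: gaussian_kernel(u) * gaussian_kernel(u + t), -12.0, 12.0) / R


def min_moments(t, n_draws=100_000, seed=0):
    """Monte Carlo estimates of E(min{Z1, Z2}) and of the covariance in the sigma_0^2 integrand."""
    rng = random.Random(seed)
    r = rho(t)
    s = math.sqrt(max(0.0, 1.0 - r * r))   # guard against tiny negative rounding
    a, b = [], []
    for _ in range(n_draws):
        z1, z2, z3, z4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        a.append(min(z1, z2))
        b.append(min(r * z1 + s * z3, r * z2 + s * z4))
    mean_a = sum(a) / n_draws
    mean_b = sum(b) / n_draws
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n_draws - 1)
    return mean_a, cov


print(R, rho(1.0))
```

At $t = 0$ the two minima coincide, so the covariance reduces to $\mathrm{Var}(\min\{Z_{1}, Z_{2}\}) = 1 - 1/π$, a convenient check on the simulation.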

Simulation study

To gain some insight into the performance of the new test of symmetry based on ${\hat{Δ}}_{ERSS}$, we conducted the following simulation, comparing the proposed test with the tests of symmetry of McWilliams,⁴ Modarres and Gastwirth⁸⁻⁹ and Samawi et al.¹² (based on the overlap measure).

McWilliams's⁴ runs test is described as follows. For a random sample of size $n$, let $Y_{(1)}, Y_{(2)}, ..., Y_{(n)}$ denote the sample values ordered from smallest to largest in absolute value (signs are retained), and let $S_{1}, S_{2}, ..., S_{n}$ be indicator variables for the signs of the $Y_{(j)}$ values, with $S_{j} = 1$ if $Y_{(j)}$ is nonnegative and 0 otherwise. The test statistic for symmetry is $R^{*}$, the number of runs in the sequence $S_{1}, S_{2}, ..., S_{n}$, that is, $R^{*} = 1 + \sum_{j=2}^{n} I_{j}$, where
$I_{j} = \begin{cases} 0 & \text{if } S_{j} = S_{j-1}, \\ 1 & \text{if } S_{j} \neq S_{j-1}. \end{cases}$
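The runs statistic $R^{*}$ can be computed as in the Python sketch below, assuming the data are centered at the known median (zero by default); `mcwilliams_runs` is a hypothetical helper name, and ties in absolute value keep their input order.

```python
def mcwilliams_runs(sample, median=0.0):
    """R* = number of sign runs after ordering the centered data by absolute value."""
    centered = sorted((x - median for x in sample), key=abs)
    signs = [1 if y >= 0 else 0 for y in centered]   # S_j = 1 for nonnegative Y_(j)
    # R* = 1 + sum of I_j, where I_j = 1 when S_j differs from S_{j-1}.
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if cur != prev)


print(mcwilliams_runs([0.3, -0.1, 2.0, -1.5, 0.7]))  # 4
```

Here the ordered values are $-0.1, 0.3, 0.7, -1.5, 2.0$, giving the sign sequence 0, 1, 1, 0, 1 with three sign changes.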

The test rejects the null hypothesis at significance level $α$ if $R^{*}$ is smaller than a critical value $c_{α}$. The Modarres and Gastwirth⁸ test statistic is $M_{p} = 1 + \sum_{j = 2 + n_{p}}^{n} ϕ(j) I_{j}$, where
$ϕ(j) = \begin{cases} j - n_{p} & \text{if } j > n_{p}, \\ 0 & \text{otherwise,} \end{cases}$
and $n_{p}$ is an integer.

If $p = 0$, the scores are Wilcoxon scores; otherwise, they are percentile-modified scores. The Modarres and Gastwirth⁹ test $(W_{0.80})$ is a hybrid test that applies the sign test in the first stage and a percentile-modified two-sample Wilcoxon test in the second stage.
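A sketch of $M_{p}$ in Python follows. The text only states that $n_{p}$ is an integer; this sketch assumes $n_{p} = \lfloor np \rfloor$, so that $p = 0$ gives Wilcoxon scores. Centering at a known median and the function name `modarres_gastwirth_mp` are also assumptions.

```python
import math


def modarres_gastwirth_mp(sample, p, median=0.0):
    """M_p = 1 + sum_{j = 2 + n_p}^{n} phi(j) I_j, with phi(j) = j - n_p for j > n_p.

    ASSUMPTION: n_p = floor(n * p); the text only says n_p is an integer.
    """
    centered = sorted((x - median for x in sample), key=abs)
    signs = [1 if y >= 0 else 0 for y in centered]
    n = len(signs)
    n_p = math.floor(n * p)
    total = 0
    for j in range(2 + n_p, n + 1):        # j is 1-based, as in the formula
        if signs[j - 1] != signs[j - 2]:   # I_j = 1 when S_j differs from S_{j-1}
            total += j - n_p               # phi(j) = j - n_p, since j > n_p here
    return 1 + total


print(modarres_gastwirth_mp([0.3, -0.1, 2.0, -1.5, 0.7], p=0.0))   # 12
print(modarres_gastwirth_mp([0.3, -0.1, 2.0, -1.5, 0.7], p=0.25))  # 8
```

Compared with $R^{*}$, sign changes among the larger absolute values receive larger weights, and with $p > 0$ the changes among the smallest $n_{p}$ values are ignored.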

In this simulation, SAS version 9.2 (proc kde; method=srot) is used. The generalized lambda distribution (see Ramberg and Schmeiser³⁶) is used with the following sets of parameters:
1- $λ_{1} = 0, λ_{2} = 0.197454, λ_{3} = 0.134915, λ_{4} = 0.134915$ (symmetric),
2- $λ_{1} = 0, λ_{2} = 1, λ_{3} = 1.4, λ_{4} = 0.25,$
3- $λ_{1} = 0, λ_{2} = 1, λ_{3} = 0.00007, λ_{4} = 0.1,$
4- $λ_{1} = 3.586508, λ_{2} = 0.04306, λ_{3} = 0.025213, λ_{4} = 0.094029,$
5- $λ_{1} = 0, λ_{2} = - 1, λ_{3} = - 0.0075, λ_{4} = - 0.03,$
6- $λ_{1} = - 0.116734, λ_{2} = - 0.351663, λ_{3} = - 0.13, λ_{4} = - 0.16,$
7- $λ_{1} = 0, λ_{2} = - 1, λ_{3} = - 0.1, λ_{4} = - 0.18,$
8- $λ_{1} = 0, λ_{2} = - 1, λ_{3} = - 0.001, λ_{4} = - 0.13,$
9- $λ_{1} = 0, λ_{2} = -1, λ_{3} = -0.0001, λ_{4} = -0.17$.

To generate the observations we used $x_{i} = λ_{1} + \frac{1}{λ_{2}}\left(u_{i}^{λ_{3}} - (1 - u_{i})^{λ_{4}}\right)$, $i = 1, ..., m$, where $u_{i}$ is a uniform random number. The significance level is $α = 0.05$, with sample sizes $n = 30, 50$ and $100$. Our simulation is based on 1,000 simulated samples. The 95% and 99% confidence intervals for the true probability of type I error under the null hypothesis with $α = 0.05$ are (0.0457, 0.0543) and (0.0435, 0.0575), respectively. Note that in the tables below, the values of skewness $(α_{3})$ and kurtosis $(α_{4})$ are taken from McWilliams.⁴
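Inverse-transform sampling from the generalized lambda distribution, together with the normal-approximation interval $α \pm z\sqrt{α(1-α)/B}$ for a Monte Carlo rejection rate based on $B$ replications, can be sketched as below. This is an illustrative sketch: the function names and seed are assumptions, and the interval formula is the standard Wald approximation rather than a method stated in the text.

```python
import math
import random
from statistics import NormalDist


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Generalized lambda quantile: lam1 + (u**lam3 - (1 - u)**lam4) / lam2."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def gld_sample(size, lams, rng):
    """Inverse-transform sampling from the generalized lambda distribution."""
    out = []
    while len(out) < size:
        u = rng.random()          # uniform on [0, 1)
        if u == 0.0:              # avoid 0 ** (negative lambda) for cases 5-9
            continue
        out.append(gld_quantile(u, *lams))
    return out


def type1_error_interval(alpha, replications, level):
    """Normal-approximation interval for a Monte Carlo rejection rate whose true value is alpha."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / replications)
    return alpha - half_width, alpha + half_width


CASE_1 = (0.0, 0.197454, 0.134915, 0.134915)   # symmetric case from the list above
CASE_5 = (0.0, -1.0, -0.0075, -0.03)
rng = random.Random(7)
x = gld_sample(30, CASE_5, rng)
```

For example, `type1_error_interval(0.05, 10_000, 0.95)` gives approximately (0.0457, 0.0543).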

Table 1A first reports the estimated probability of type I error (distribution 1). Our test is an asymptotic test, with slight bias in the estimation of $Δ$ and of its variance for small sample sizes. For sample sizes larger than 30, the estimated probability of type I error appears close to the nominal value 0.05. Tables 1A and 1B show that the ${\hat{Δ}}_{ERSS}$-based test is more powerful than the tests of McWilliams⁴ and Baklizi.⁷ In all alternative cases considered, the proposed procedure is at least as powerful as the tests of symmetry proposed by Modarres and Gastwirth,⁸ Modarres and Gastwirth⁹ and Samawi et al.¹² For all tests in the comparison, power generally increases with sample size. Finally, the power of the ${\hat{Δ}}_{ERSS}$-based test increases as the set size $r$ increases.

Illustration using noninvasive measurement of cardiac output by electrical velocimetry in neonate data

The samples in our illustration come from a study designed to evaluate the effectiveness of a new technology, Electrical Velocimetry (E.V.), for non-invasive measurement of cardiac output (CO) and stroke volume (SV) in neonates.³⁷ One of the research questions is whether CO is the same for low-birth-weight and non-low-birth-weight infants, with low birth weight defined as less than 1.5 kg. Thus, we compared CO for neonates with birth weight less than 1.5 kg with that of neonates with birth weight greater than or equal to 1.5 kg.

As is frequently the case in this type of study, the underlying distribution is assumed to be normal, or at least symmetric, and a test of symmetry is almost never considered in determining how to proceed with the analysis. Based on the conclusions of a test of symmetry, the analyst can choose the most powerful test for location. Therefore, before deciding on the test procedure, we need to check the assumption of symmetry of the underlying distribution of CO for the premature and term infants, that is, for those with birth weight less than 1.5 kg and those with birth weight greater than or equal to 1.5 kg.

Distribution	n	$R^{*}$	${\hat{Δ}}_{SRS}$ (Samawi et al.¹²)	$M_{0.25}$	$W_{0.80}$	${\hat{Δ}}_{ERSS}$, r = 2	${\hat{Δ}}_{ERSS}$, r = 3	${\hat{Δ}}_{ERSS}$, r = 4	${\hat{Δ}}_{ERSS}$, r = 5
(1) $\begin{array}{l} λ_{1} = 0, λ_{2} = 0.197454, \\ λ_{3} = 0.134915, \\ λ_{4} = 0.134915, \\ α_{3} = 0, α_{4} = 3.0 \end{array}$	30	0.047	0.069	0.053	0.055	0.046	0.056	0.056	0.044
	50	0.050	0.054	0.048	0.048	0.054	0.056	0.049	0.049
	100	0.064	0.053	0.050	0.052	0.048	0.053	0.046	0.051
(2) $\begin{array}{l} λ_{1} = 0, λ_{2} = 1, \\ λ_{3} = 1.4, λ_{4} = 0.25 \\ α_{3} =0.5, α_{4} = 2.2 \end{array}$	30	0.297	0.495	0.583	0.656	0.751	0.906	0.973	0.999
	50	0.476	0.836	0.846	0.949	0.997	1.000	1.000	1.000
	100	0.776	0.999	0.990	0.999	1.000	1.000	1.000	1.000
(3) $\begin{array}{l} λ_{1} = 0, λ_{2} = 1, \\ λ_{3} = 0.00007, \\ λ_{4} = 0.1, \\ α_{3} = 1.5, α_{4} = 5.8 \end{array}$	30	0.438	0.852	0.761	0.762	0.960	0.975	0.999	0.999
	50	0.683	0.966	0.950	0.992	1.000	1.000	1.000	1.000
	100	0.927	1.000	0.999	1.000	1.000	1.000	1.000	1.000
(4) $\begin{array}{l} λ_{1} = 3.586508, \\ λ_{2} = 0.04306, \\ λ_{3} = 0.025213, \\ λ_{4} = 0.094029 \\ α_{3} = 0.9, α_{4} = 4.2 \end{array}$	30	0.117	0.375	0.172	0.280	0.384	0.398	0.413	0.482
	50	0.131	0.512	0.251	0.544	0.689	0.706	0.766	0.808
	100	0.223	0.767	0.414	0.883	0.929	0.940	0.958	0.985
(5) $\begin{array}{l} λ_{1} = 0, λ_{2} = - 1, \\ λ_{3} = - 0.0075, \\ λ_{4} = - 0.03, \\ α_{3} = 1.5, α_{4} = 7.5 \end{array}$	30	0.145	0.459	0.234	0.407	0.484	0.569	0.616	0.716
	50	0.192	0.580	0.356	0.736	0.832	0.846	0.889	0.921
	100	0.338	0.846	0.588	0.972	0.983	0.985	0.991	0.997

Table 1A Probability of type I error under the null hypothesis (distribution 1) and power under alternative hypotheses (distributions 2-5) ($α = 0.05$)

Table 2A contains two selected SRS samples of CO, one for neonates with birth weight less than 1.5 kg and one for neonates with birth weight greater than or equal to 1.5 kg. Table 2A also contains two selected ERSS samples of CO for the same two birth-weight groups. Since CO and birth weight are positively correlated, the ranking was performed on birth weight. In each ERSS sample in Table 2A, the first half consists of first order statistics (minima) and the second half of third order statistics (maxima).

Table 2B gives the results of the runs and overlap tests of symmetry for the underlying distribution of CO. For all samples, we reject the assumption that the underlying distribution is symmetric. Table 2C gives the results of the Mann-Whitney test for two independent samples, which shows a significant difference in CO between the low-birth-weight neonates (less than 1.5 kg) and the non-low-birth-weight neonates (greater than or equal to 1.5 kg).

Based on our simulation and real data example, the proposed test of symmetry based on the ERSS overlap measure appears to outperform other tests of symmetry in the literature in terms of power. Our test is more sensitive in detecting slight asymmetry in the underlying distribution than other tests proposed in the literature. Drawing an ERSS is easier than drawing an ordinary RSS or other RSS variations. Also, the kernel density estimation literature is very rich, and many of the proposed and improved methods are available in statistical software such as SAS™, S-plus, Stata and R. Since overlap measures can be used in multivariate as well as univariate settings, our proposed test of symmetry can be extended to multivariate cases for diagonal symmetry, conditional symmetry and other types of symmetry. In addition, our test procedure and kernel density estimation are justified for large sample sizes (n > 30) under some regularity conditions, such as light-tailed underlying distributions. However, our simulation indicates that our procedure still performs better than the other tests for sample sizes of n = 30 or larger and for a range of underlying distributions.

Distribution	n	$R^{*}$	${\hat{Δ}}_{SRS}$	$M_{0.25}$	$W_{0.80}$	${\hat{Δ}}_{ERSS}$, r = 2	${\hat{Δ}}_{ERSS}$, r = 3	${\hat{Δ}}_{ERSS}$, r = 4	${\hat{Δ}}_{ERSS}$, r = 5
(6) $\begin{array}{l} λ_{1} = - 0.116734, \\ λ_{2} = - 0.351663, \\ λ_{3} = - 0.13, λ_{4} = - 0.16, \\ α_{3} = 0.8, α_{4} = 11.4 \end{array}$	30	0.050	0.155	0.055	0.068	0.253	0.304	0.327	0.431
	50	0.056	0.166	0.060	0.077	0.500	0.717	0.739	0.747
	100	0.051	0.207	0.068	0.130	0.793	0.875	0.911	0.942
(7) $\begin{array}{l} λ_{1} = 0, λ_{2} = - 1, \\ λ_{3} = - 0.1, \\ λ_{4} = - 0.18, \\ α_{3} = 2.0, α_{4} = 21.2 \end{array}$	30	0.090	0.196	0.096	0.166	0.325	0.410	0.458	0.520
	50	0.097	0.236	0.125	0.284	0.656	0.667	0.758	0.835
	100	0.124	0.354	0.176	0.589	0.891	0.946	0.963	0.974
(8) $\begin{array}{l} λ_{1} = 0, λ_{2} = - 1, \\ λ_{3} = - 0.001, \\ λ_{4} = - 0.13, \\ α_{3} = 3.16, α_{4} = 23.8 \end{array}$	30	0.534	1.000	0.830	0.806	1.000	1.000	1.000	1.000
	50	0.744	1.000	0.972	0.995	1.000	1.000	1.000	1.000
	100	0.972	1.000	1.000	1.000	1.000	1.000	1.000	1.000
(9) $\begin{array}{l} λ_{1} = 0, λ_{2} = - 1, \\ λ_{3} = - 0.0001, \\ λ_{4} = - 0.17 \\ α_{3} = 3.88, α_{4} = 40.7 \end{array}$	30	0.560	1.000	0.865	0.808	1.000	1.000	1.000	1.000
	50	0.816	1.000	0.985	0.997	1.000	1.000	1.000	1.000
	100	0.976	1.000	1.000	1.000	1.000	1.000	1.000	1.000

Table 1B (continued) Power of overlap-based tests and runs tests under alternative hypotheses ($α = 0.05$)
*Results for $M_{0.25}$ and $W_{0.80}$ are taken from Modarres and Gastwirth (1996) and Modarres and Gastwirth (1998), respectively.

SRS		ERSS
Birth-weight <1.5 kg	Birth-weight ≥1.5 kg	Birth-weight <1.5 kg	Birth-weight ≥1.5 kg
0.05	0.21	0.050	0.100
0.06	0.16	0.050	0.120
0.06	0.12	0.050	0.132
0.05	0.20	0.060	0.220
0.05	0.13	0.060	0.150
0.06	0.22	0.060	0.228
0.06	0.22	0.090	0.178
0.08	0.15	0.080	0.182
0.10	0.22	0.080	0.215
0.08	0.20	0.080	0.158
0.08	0.23	0.070	0.155
0.08	0.25	0.070	0.218
0.07	0.23	0.080	0.208
0.11	0.16	0.128	0.220
0.08	0.22	0.128	0.350
0.13	0.16	0.208	0.350
0.16	0.22	0.165	0.360
0.12	0.33	0.162	0.420
0.18	0.26	0.188	0.270
0.16	0.31	0.188	0.440
0.16	0.29	0.262	0.510
0.19	0.27	0.372	0.510
0.37	0.51	0.358	0.520
0.16	0.51	0.222	0.540
0.18	0.52	0.202	0.520
0.11	0.52	0.182	0.520
	0.53		0.530
	0.73		0.730

Table 2A Selected samples of CO data

	CO Measure	N	Runs test	P-value	OVL test	P-value
SRS	Birth-weight <1.5	26	6	0.003	-6.414	<0.00001
SRS	Birth-weight ≥1.5	28	4	<0.00001	-4.356	<0.00001
ERSS	Birth-weight <1.5	26	2	<0.00001	-14.540	<0.00001
ERSS	Birth-weight ≥1.5	28	2	<0.00001	-4.729	<0.00001

Table 2B Runs test of symmetry with summary statistics


	Mann-Whitney U test (difference in median CO between the <1.5 kg and ≥1.5 kg birth-weight groups)
Sample type	U statistic	P-value
SRS	59.5	<0.00001
ERSS	118	<0.00001

Table 2C Mann-Whitney test for two-Independent Samples
