The variance of the variance of samples from a finite population

Eungchun Cho, Kentucky State University∗ Moon Jung Cho, Bureau of Labor Statistics
John Eltinge, Bureau of Labor Statistics
Key Words: Sample Variance; Randomization Variance; Polykays; Moments of Finite Population
Abstract: A direct derivation of the randomization variance of the sample variance $V(\bar{x})$ and related formulae is presented. Examples for the special cases of uniformly distributed populations are given.
1 Introduction
Introductory courses in the randomization approach to survey inference generally begin with a relatively parsimonious development based on without-replacement selection of simple random samples. For an arbitrary finite population one establishes the unbiasedness of the sample mean $\bar{x}$ for the corresponding finite population mean; evaluates the randomization variance of the sample mean; and develops an unbiased estimator $V(\bar{x})$ for this randomization variance. One subsequently uses similar developments for related randomized designs, e.g., stratified random sampling and some forms of cluster sampling. In applications of this material to practical problems, it is often important to evaluate $V(V(\bar{x}))$, the variance of the variance estimator. For example, some cluster sample designs may be considered problematic if the resulting $V(\bar{x})$ is unstable, i.e., has an unreasonably large variance.
This note presents a relatively simple, direct derivation of the randomization variance of $V(\bar{x})$ and related quantities. This derivation is pedagogically appealing because it builds directly on the standard whole-sample approach used in introductory texts such as Cochran [1], and does not require students to work with the more elaborate "polykay" approach used by Tukey [5] and Wishart [8].
[email protected]

2 Functions on Simple Random Samples

Consider a finite population of $N$ numbers $A = [a_1, a_2, \ldots, a_N]$. Let $L_{n,A}$ be the list of all possible samples of $n$ elements selected without replacement from $A$:

$$L_{n,A} = [S_1, S_2, \ldots, S_\alpha], \qquad \alpha = \binom{N}{n} = \frac{N!}{n!\,(N-n)!} \qquad (1)$$

One selects a without-replacement simple random sample of size $n$ from $A$ by selecting one element from $L_{n,A}$ in such a way that each sample $S_j$ has probability $1/\alpha$ of being selected. Consider a function $f$ on $L_{n,A}$, that is, $f$ assigns each sample $S \in L_{n,A}$ a value $f(S)$. Two prominent examples of $f$ are the sample mean and the sample variance:

$$\bar{a}(S) = \frac{1}{n} \sum_{a_i \in S} a_i \qquad (2)$$

$$v(S) = \frac{1}{n-1} \sum_{a_i \in S} \{a_i - \bar{a}(S)\}^2 \qquad (3)$$

Evaluation of the randomization properties of $f(S)$ for $S \in L_{n,A}$ is conceptually straightforward. For example, $E\{f(S)\}$, the expected value of $f(S)$, is obtained by computing its arithmetic average taken over the $\binom{N}{n}$ equally likely samples in $L_{n,A}$:

$$E\{f(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} f(S) \qquad (4)$$

$V\{f(S)\}$, the variance of $f(S)$, is defined to be the expectation of the squared deviation $[f(S) - E\{f(S)\}]^2$:

$$V\{f(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} \left[f(S) - E\{f(S)\}\right]^2 \qquad (5)$$
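For small populations, definitions (4) and (5) can be evaluated by direct enumeration of $L_{n,A}$. The following Python sketch (an illustration added for concreteness; the function name and the example population are arbitrary choices, not from the paper) computes $E\{f(S)\}$ and $V\{f(S)\}$ for the sample mean and the sample variance:

```python
from itertools import combinations
from statistics import mean, variance

def randomization_moments(A, n, f):
    """E{f(S)} and V{f(S)} per (4)-(5), by enumerating all
    binomial(N, n) without-replacement samples S of size n from A."""
    values = [f(S) for S in combinations(A, n)]
    e = mean(values)
    v = mean((x - e) ** 2 for x in values)  # divide by alpha, not alpha - 1
    return e, v

# Arbitrary example population
A = [2, 5, 7, 11]
print(randomization_moments(A, 2, mean))      # f = sample mean a(S)
print(randomization_moments(A, 2, variance))  # f = sample variance v(S)
```

Note that `statistics.variance` uses the divisor $n-1$, matching (3), while the outer variance in (5) divides by $\alpha$ itself.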

3 The Variance of the Sample Variance

Routine arguments (e.g., Cochran [1, Theorems 2.1, 2.2 and 2.4]) show

$$E\{\bar{a}(S)\} = \bar{A} \qquad (6)$$

$$E\{v(S)\} = V(A) \qquad (7)$$

$$V\{\bar{a}(S)\} = \left(1 - \frac{n}{N}\right) \frac{V(A)}{n} \qquad (8)$$

where $\bar{a}(S)$ is the mean of the sample $S$, $v(S)$ is the variance of the sample $S$, $\bar{A} = \sum_{i=1}^{N} a_i / N$ is the mean of $A$, and

$$V(A) = \frac{1}{N-1} \sum_{i=1}^{N} \left(a_i - \bar{A}\right)^2 \qquad (9)$$

is the full finite-population analogue of the sample variance $v(S)$. The principal task in this paper is to obtain a relatively simple expression for the variance of $v(S)$,

$$V\{v(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} \{v(S) - V(A)\}^2 \qquad (10)$$

in terms of the $a_i$'s in the underlying population. The formula is useful for estimating the variance of the variance when straightforward computation from the definition is practically impossible due to the combinatorial explosion.
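Identities (6)–(8) are easy to confirm numerically by brute force. A minimal sketch (added for illustration; the population below is an arbitrary choice):

```python
from itertools import combinations
from statistics import mean, variance

A = [1.0, 4.0, 6.0, 7.0, 12.0]   # arbitrary example population
N, n = len(A), 3

means = [mean(S) for S in combinations(A, n)]
varis = [variance(S) for S in combinations(A, n)]
A_bar, V_A = mean(A), variance(A)

assert abs(mean(means) - A_bar) < 1e-9                  # (6): E{a(S)} = A-bar
assert abs(mean(varis) - V_A) < 1e-9                    # (7): E{v(S)} = V(A)
v_of_mean = mean((m - A_bar) ** 2 for m in means)
assert abs(v_of_mean - (1 - n / N) * V_A / n) < 1e-9    # (8)
```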

4 The Main Result

The following theorem gives a formula for the variance of the sample variance under simple random sampling without replacement.

Theorem. Let $A = [a_1, a_2, \ldots, a_N]$ be a list of $N$ numbers, $N \ge 4$. Let $L_{n,A}$ be the list of all possible samples of $n$ numbers selected without replacement from $A$, $2 \le n \le N-1$:

$$L_{n,A} = [S_1, S_2, \ldots, S_\alpha]$$

where $\alpha = \binom{N}{n} = N!/\,n!(N-n)!$, $S_i \subset A$, and $|S_i| = n$. Let $V_n$ be the list of the sample variances $v(S_i)$ for each $S_i \in L_{n,A}$,

$$V_n = [v(S_1), v(S_2), \ldots, v(S_\alpha)]$$

Then $E\left[v(S) - E\{v(S)\}\right]^2$, the variance of the variances of all the samples of $A$ of size $n$, is given by

$$V\{v(S)\} = C_1 \sum_{i=1}^{N} a_i^4 + C_2 \sum_{i \ne j} a_i^3 a_j + C_3 \sum_{i < j} a_i^2 a_j^2 + C_4 \sum_{\substack{i \ne j,\, i \ne k \\ j < k}} a_i^2 a_j a_k + C_5 \sum_{i < j < k < l} a_i a_j a_k a_l$$
i
where

$$C_1 = \frac{N-n}{N^2\, n}$$

$$C_2 = -4\, \frac{N-n}{N^2 (N-1)\, n}$$

$$C_3 = 2\, \frac{N(N-1)\left((n-1)^2 + 2\right) - n(n-1)\left((N-1)^2 + 2\right)}{N^2 (N-1)^2\, n (n-1)}$$

$$C_4 = -4\, \frac{N(N-1)(n-2)(n-3) - n(n-1)(N-2)(N-3)}{N^2 (N-1)^2 (N-2)\, n (n-1)}$$

$$C_5 = 24\, \frac{N(N-1)(n-2)(n-3) - n(n-1)(N-2)(N-3)}{N^2 (N-1)^2 (N-2)(N-3)\, n (n-1)}$$

Sketch of Proof. The proof involves determining the coefficients of all the fourth-degree terms $a_i^4$, $a_i^3 a_j$, $a_i^2 a_j^2$, $a_i^2 a_j a_k$, and $a_i a_j a_k a_l$ that appear in the summation. The summations in the formula are arranged so that all like terms are combined and thus appear only once; for example, $a_i a_j a_k a_l$ appears only for the indices arranged in increasing order $i < j < k < l$. Recall that $\alpha$ is the number of without-replacement samples $S$ of $A$ of size $n$, so the variance of the variances of the samples $S$ of $A$ of size $n$ is

$$\frac{1}{\alpha} \sum_{S} \{v(S) - V(A)\}^2 = \frac{1}{\alpha} \sum_{S} v(S)^2 - V(A)^2$$

where the sum is taken over all the samples $S$ in $L_{n,A}$. For the second term, $V(A)^2$, it follows that

$$V(A)^2 = \frac{\displaystyle\sum_i a_i^4 + 2 \sum_{i<j} a_i^2 a_j^2}{N^2} \;-\; 4\, \frac{\displaystyle\sum_{i \ne j} a_i^3 a_j + \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{N^2 (N-1)} \;+\; 4\, \frac{\displaystyle\sum_{i<j} a_i^2 a_j^2 + 2 \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k + 6 \sum_{i<j<k<l} a_i a_j a_k a_l}{N^2 (N-1)^2}$$
Similarly, for each individual sample $S$ it follows that

$$v(S)^2 = \frac{\displaystyle\sum_{a_i \in S} a_i^4 + 2 \sum_{i<j} a_i^2 a_j^2}{n^2} \;-\; 4\, \frac{\displaystyle\sum_{i \ne j} a_i^3 a_j + \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n^2 (n-1)} \;+\; 4\, \frac{\displaystyle\sum_{i<j} a_i^2 a_j^2 + 2 \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k + 6 \sum_{i<j<k<l} a_i a_j a_k a_l}{n^2 (n-1)^2}$$

where all the sums run over the elements $a_i, a_j, a_k, a_l \in S$.

Each term in $\sum_{S \in L_{n,A}} v(S)^2$ is transformed to give

$$\sum_{S \in L_{n,A}} v(S)^2 = \frac{\displaystyle\binom{N-1}{n-1} \sum_i a_i^4 + 2 \binom{N-2}{n-2} \sum_{i<j} a_i^2 a_j^2}{n^2} \;-\; \frac{\displaystyle 4 \binom{N-2}{n-2} \sum_{i \ne j} a_i^3 a_j + 4 \binom{N-3}{n-3} \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n^2 (n-1)} \;+\; \frac{\displaystyle 4 \binom{N-2}{n-2} \sum_{i<j} a_i^2 a_j^2 + 8 \binom{N-3}{n-3} \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k + 24 \binom{N-4}{n-4} \sum_{i<j<k<l} a_i a_j a_k a_l}{n^2 (n-1)^2}$$

where the sums now run over the whole population, each binomial coefficient counting the samples $S$ that contain the indicated elements.
Simplification of the binomial coefficients leads to

$$\frac{1}{\alpha} \sum_{S \in L_{n,A}} v(S)^2 = \frac{\displaystyle\sum_i a_i^4}{n N} + 2(n-1)\, \frac{\displaystyle\sum_{i<j} a_i^2 a_j^2}{n (N-1) N} - 4\, \frac{\displaystyle\sum_{i \ne j} a_i^3 a_j}{n (N-1) N} - 4(n-2)\, \frac{\displaystyle\sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n (N-2)(N-1) N}$$

$$+\; 4\, \frac{\displaystyle\sum_{i<j} a_i^2 a_j^2}{n (n-1)(N-1) N} + 8(n-2)\, \frac{\displaystyle\sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n (n-1)(N-2)(N-1) N} + 24 (n-2)(n-3)\, \frac{\displaystyle\sum_{i<j<k<l} a_i a_j a_k a_l}{n (n-1)(N-3)(N-2)(N-1) N}$$

Substitution of the expressions for $V(A)^2$ and $\frac{1}{\alpha}\sum_S v(S)^2$ into the expression for the variance of the variances of the samples $S$ of $A$ leads to the result of the theorem.
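As a check, the theorem can be compared against the definition (10) by brute-force enumeration. The sketch below (an independent numerical verification added for illustration; the function names and the test population are arbitrary) agrees to floating-point accuracy:

```python
from itertools import combinations
from statistics import mean, variance

def vv_bruteforce(A, n):
    """V{v(S)} by direct enumeration over all samples, per (10)."""
    vs = [variance(S) for S in combinations(A, n)]
    m = mean(vs)
    return mean((v - m) ** 2 for v in vs)

def vv_theorem(A, n):
    """V{v(S)} from the closed form of the theorem."""
    N = len(A)
    # Monomial symmetric sums over the whole population
    m4 = sum(a ** 4 for a in A)
    m31 = sum(A[i] ** 3 * A[j] for i in range(N) for j in range(N) if i != j)
    m22 = sum(A[i] ** 2 * A[j] ** 2 for i in range(N) for j in range(i + 1, N))
    m211 = sum(A[i] ** 2 * A[j] * A[k]
               for i in range(N)
               for j in range(N) for k in range(j + 1, N)
               if i != j and i != k)
    m1111 = sum(A[i] * A[j] * A[k] * A[l]
                for i in range(N) for j in range(i + 1, N)
                for k in range(j + 1, N) for l in range(k + 1, N))
    B = N*(N-1)*(n-2)*(n-3) - n*(n-1)*(N-2)*(N-3)  # bracket shared by C4, C5
    C1 = (N - n) / (N**2 * n)
    C2 = -4 * (N - n) / (N**2 * (N - 1) * n)
    C3 = (2 * (N*(N-1)*((n-1)**2 + 2) - n*(n-1)*((N-1)**2 + 2))
          / (N**2 * (N-1)**2 * n * (n-1)))
    C4 = -4 * B / (N**2 * (N-1)**2 * (N-2) * n * (n-1))
    C5 = 24 * B / (N**2 * (N-1)**2 * (N-2) * (N-3) * n * (n-1))
    return C1*m4 + C2*m31 + C3*m22 + C4*m211 + C5*m1111

A = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]  # arbitrary test population, N = 6
for n in range(2, 6):
    assert abs(vv_theorem(A, n) - vv_bruteforce(A, n)) < 1e-6
```

The enumeration grows as $\binom{N}{n}$, so this check is feasible only for small $N$; the closed form, by contrast, costs only the fourth-degree population sums.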
The formula for $V\{v(S)\}$ becomes considerably simpler for a population with a discrete uniform distribution on a finite interval; in this case $A$ is a finite arithmetic sequence. This occurs, for example, in the important special case of equal-probability systematic sampling (Cochran [1], Chapter 8).

Corollary 1. Let $A = [1, 2, \ldots, N]$, $N \ge 3$. Let $S$, $L_{n,A}$ and $v(S)$ be as in the theorem above. Then the variance of the sample variances is

$$V\{v(S)\} = \frac{N (N+1)(N-n)\, (2nN + 3n + 3N + 3)}{360\, n(n-1)} \qquad (11)$$
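A direct numerical check of (11) by enumeration (added for illustration; the function names are arbitrary):

```python
from itertools import combinations
from statistics import mean, variance

def vv_bruteforce(N, n):
    """V{v(S)} for A = [1, 2, ..., N] by direct enumeration."""
    vs = [variance(S) for S in combinations(range(1, N + 1), n)]
    m = mean(vs)
    return mean((v - m) ** 2 for v in vs)

def vv_corollary1(N, n):
    """Closed form (11)."""
    return N * (N + 1) * (N - n) * (2*n*N + 3*n + 3*N + 3) / (360 * n * (n - 1))

for N in range(4, 8):
    for n in range(2, N):
        assert abs(vv_bruteforce(N, n) - vv_corollary1(N, n)) < 1e-9
```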

For a more general arithmetic sequence:

Corollary 2. Let $A = [a_0,\, a_0 + d,\, \ldots,\, a_0 + (N-1)\, d]$, $N \ge 3$. Then the variance of $v(S)$ is

$$V\{v(S)\} = \frac{N (N+1)(N-n)\, (2nN + 3n + 3N + 3)}{360\, n(n-1)}\, d^4$$

Corollary 3. Let $A$ be the list of $N$ numbers uniformly distributed on the interval $[1/N, 1]$, $A = \left[\frac{1}{N}, \frac{2}{N}, \ldots, \frac{N-1}{N}, \frac{N}{N}\right]$, $N \ge 3$. The variance of $v(S)$ is

$$V\{v(S)\} = \frac{\left(1 + \frac{1}{N}\right)\left(1 - \frac{n}{N}\right)\left(2n + 3 + \frac{3n}{N} + \frac{3}{N}\right)}{360\, n(n-1)} \qquad (12)$$

$V\{v(S)\}$ approaches $(2n+3)/\{360\, n(n-1)\}$ as $N$ approaches $\infty$. For the two simplest special cases, $n = 2$ and $n = N-1$, we have

$$V\{v(S)\} = \frac{\left(1 + \frac{1}{N}\right)\left(1 - \frac{2}{N}\right)\left(7 + \frac{9}{N}\right)}{720}, \quad \text{if } n = 2 \qquad (13)$$

and

$$V\{v(S)\} = \frac{\left(1 + \frac{1}{N}\right)\left(1 + \frac{2}{N}\right)}{180\, (N-1)(N-2)}, \quad \text{if } n = N-1 \qquad (14)$$
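Since each element of Corollary 3's population is $1/N$ times the corresponding element of $[1, \ldots, N]$, every $v(S)$ scales by $1/N^2$ and $V\{v(S)\}$ by $1/N^4$. The sketch below (added for illustration; function names are arbitrary) checks (12) and the $n = 2$ case (13) by enumeration:

```python
from itertools import combinations
from statistics import mean, variance

def vv_scaled(N, n):
    """V{v(S)} for A = [1/N, 2/N, ..., 1] by direct enumeration."""
    A = [i / N for i in range(1, N + 1)]
    vs = [variance(S) for S in combinations(A, n)]
    m = mean(vs)
    return mean((v - m) ** 2 for v in vs)

def eq12(N, n):
    """Closed form (12)."""
    return ((1 + 1/N) * (1 - n/N) * (2*n + 3 + 3*n/N + 3/N)) / (360 * n * (n - 1))

for N in range(4, 8):
    for n in range(2, N):
        assert abs(vv_scaled(N, n) - eq12(N, n)) < 1e-9
    # special case (13), n = 2
    assert abs(vv_scaled(N, 2) - (1 + 1/N)*(1 - 2/N)*(7 + 9/N)/720) < 1e-9
```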
References
1. W. G. Cochran, Sampling Techniques (3rd ed.), John Wiley, 1977.
2. R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989.
3. J. W. Tukey, Some sampling simplified, Journal of the American Statistical Association, 45 (1950), 501-519.
4. J. W. Tukey, Keeping moment-like sampling computation simple, The Annals of Mathematical Statistics, 27 (1956), 37-54.
5. J. W. Tukey, Variances of variance components: I. Balanced designs, The Annals of Mathematical Statistics, 27 (1956), 722-736.
6. J. W. Tukey, Variances of variance components: II. Unbalanced single classifications, The Annals of Mathematical Statistics, 28 (1957), 43-56.
7. J. W. Tukey, Variance components: III. The third moment in a balanced single classification, The Annals of Mathematical Statistics, 28 (1957), 378-384.
8. J. Wishart, Moment coefficients of the k-statistics in samples from a finite population, Biometrika, 39 (1952), 1-13.