Solutions to Estimation Exercises

January 2020

Exercise 1 (optional)

This exercise is for those wishing to understand stratified estimation a bit better. It is to be calculated by hand (or using R, a spreadsheet or a calculator). Note: You might have to round up or down. Consider this (very small) population:


Each stratum has 12 units, i.e. \(N = 48, N_1 = N_2 = N_3 = N_4 = 12\)

And we have that:

\(n_1 = 6\), \(n_2 = 4\), \(n_3 = 3\), \(n_4 = 2\).

The units drawn satisfy (for ease of computation):

\(y_1 = y_{1_1} = y_{1_2} = y_{1_3} = y_{1_4} = y_{1_5} = y_{1_6} = 4\),
\(y_2 = y_{2_1} = y_{2_2} = y_{2_3} = y_{2_4} = 6\),
\(y_3 = y_{3_1} = y_{3_2} = y_{3_3} = 8\),
\(y_4 = y_{4_1} = y_{4_2} = 12\).

Calculate an estimate of the population total \(\hat{t}_y = \sum_{i=1}^4 \sum_{j=1}^{n_i} d_i y_{i_j}\)

We have that:

\(d_1 = \frac{N_1}{n_1} = \frac{12}{6} = 2\), \(d_2 = \frac{N_2}{n_2} = \frac{12}{4} = 3\),

\(d_3 = \frac{N_3}{n_3} = \frac{12}{3} = 4\), \(d_4 = \frac{N_4}{n_4} = \frac{12}{2} = 6\).

Thus,

\(\hat{t}_y = \sum_{i=1}^4 \sum_{j=1}^{n_i} d_i y_{i_j} =\)

\(\sum_{j=1}^6 2 \cdot 4 + \sum_{k=1}^4 3 \cdot 6 + \sum_{l=1}^3 4 \cdot 8 + \sum_{m=1}^2 6 \cdot 12 =\)

\((6 \cdot 2 \cdot 4) + (4 \cdot 3 \cdot 6) + (3 \cdot 4 \cdot 8) + (2 \cdot 6 \cdot 12) = 48 + 72 + 96 + 144 = 360\)
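The hand calculation can also be checked with a few lines of base R. This is only a minimal sketch: the vector names N_h, n_h, y_h and d_h are chosen here for illustration and do not appear in the course material.

```r
# Stratum sizes, sample sizes and the (constant) sampled values from Exercise 1
N_h <- c(12, 12, 12, 12)
n_h <- c(6, 4, 3, 2)
y_h <- c(4, 6, 8, 12)

# Design weight per stratum: d_i = N_i / n_i
d_h <- N_h / n_h

# All sampled values within a stratum are equal, so stratum i
# contributes n_i * d_i * y_i to the estimated total
contrib <- n_h * d_h * y_h
t_hat <- sum(contrib)

print(d_h)      # 2 3 4 6
print(contrib)  # 48 72 96 144
print(t_hat)    # 360
```

Because \(n_i d_i = N_i\), each stratum's contribution is simply \(N_i\) times the stratum sample mean.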

Exercise 2

Use the dataset mydata_estimation.

Given your proportional sample from last session, calculate a direct and a calibrated estimate of the number of employees in 2020. Use Nr_empl_reg for calibration.

Don’t forget to check your \(g_i\)’s.

What is the ratio of the standard error of the direct estimator (from the proportional sample) to the standard error of the calibrated estimator?

Solution to exercise 2

First, load package survey and your data:

library(survey)

load(file="mydata_estimation.Rda")
load(file="mysampprop.Rda")
load(file="mysamopt.Rda")

Make them into survey objects:

# Design weights are the inverse inclusion probabilities
mysampprop$dwgt <- 1/mysampprop$Prob

# Stratified simple random sample; fpc is supplied as the sampling fraction
srs.design.prop <- svydesign(ids=~1,
                        strata=~stratum,
                        weights=~dwgt,
                        fpc = ~Prob,
                        data=mysampprop)

# Same construction for the optimal sample
mysampopt$dwgt <- 1/mysampopt$Prob

srs.design.opt <- svydesign(ids=~1,
                        strata=~stratum,
                        weights=~dwgt,
                        fpc = ~Prob,
                        data=mysampopt)

Direct estimate:

(emp20.hat <- svytotal(~Nr_empl_2020, srs.design.prop))

##                total    SE
## Nr_empl_2020 2605997 43373

Calibrated estimate:

(cgoal <- c('(Intercept)'=nrow(mydata_estimation),
            'Nr_empl_reg'=sum(mydata_estimation$Nr_empl_reg)))

## (Intercept) Nr_empl_reg
##       10000     2627857

cal.svyobj <- calibrate(srs.design.prop,
                        formula=~Nr_empl_reg,
                        population=cgoal)

(empl20.cal <- svytotal(~Nr_empl_2020, cal.svyobj))

##                total    SE
## Nr_empl_2020 2619382 12389

# Calibrated weights, and g-weights as the ratio of calibrated to design weights
mysampprop$cw <- weights(cal.svyobj)
mysampprop$gw <- mysampprop$cw / mysampprop$dwgt

Check your \(g_i\)’s (they look OK):

hist(mysampprop$gw, col = "red", xlim = c(0.85, 1.06))
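To see what the \(g_i\)’s represent, here is a minimal base-R sketch of linear calibration done by hand on a toy sample. Everything in it is hypothetical (the simulated x, the population size N_pop, the totals in goal); it is not the course data, and it is assumed rather than verified here that calibrate() with its default settings performs this same linear calibration.

```r
set.seed(1)

# Hypothetical toy sample: n units drawn from a population of N_pop units
n <- 50
N_pop <- 1000
x <- rpois(n, lambda = 250)   # auxiliary variable, known from a register
d <- rep(N_pop / n, n)        # design weights

# Hypothetical known population totals: number of units and total of x
goal <- c(N_pop, 251000)

# Linear calibration: w_i = d_i * g_i with g_i = 1 + x_i' lambda
X <- cbind(1, x)
A <- t(X) %*% (d * X)                      # sum over i of d_i x_i x_i'
lambda <- solve(A, goal - colSums(d * X))  # solves the calibration equations
g <- as.vector(1 + X %*% lambda)
w <- d * g

# The calibrated weights reproduce the known totals
print(colSums(w * X))

# The g-weights should be positive and close to 1
print(range(g))
```

If some \(g_i\) were negative or far from 1, the calibration would be distorting the design weights heavily, which is why the histogram check is worth doing.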

The ratio is

SE(emp20.hat)/SE(empl20.cal)

##              Nr_empl_2020
## Nr_empl_2020     3.500841

Exercise 3

Repeat exercise 2 for the optimal sample.

In which case do you gain most from calibrating (i.e. in which case, proportional or optimal, do you have the largest ratio)?

Solution to exercise 3

We already made a survey object, so all we have to do is:

Direct estimate:

(emp20.opt <- svytotal(~Nr_empl_2020, srs.design.opt))

##                total    SE
## Nr_empl_2020 2617548 12816

Calibrated estimate:

(cgoal <- c('(Intercept)'=nrow(mydata_estimation),
            'Nr_empl_reg'=sum(mydata_estimation$Nr_empl_reg)))

## (Intercept) Nr_empl_reg
##       10000     2627857

cal.svyobj.opt <- calibrate(srs.design.opt,
                            formula=~Nr_empl_reg,
                            population=cgoal)

(empl20.cal.opt <- svytotal(~Nr_empl_2020, cal.svyobj.opt))

##                total     SE
## Nr_empl_2020 2618897 6968.6

# Calibrated weights, and g-weights as the ratio of calibrated to design weights
mysampopt$cw <- weights(cal.svyobj.opt)
mysampopt$gw <- mysampopt$cw / mysampopt$dwgt

Check your \(g_i\)’s (they look OK):

hist(mysampopt$gw, col = "red", xlim = c(0.98, 1.05))

The ratio is

SE(emp20.opt)/SE(empl20.cal.opt)

##              Nr_empl_2020
## Nr_empl_2020     1.839129

Since the ratio of the standard errors for the proportional sample was \(3.500841\), compared with \(1.839129\) for the optimal sample, we gained most in terms of improving the standard error from calibrating the proportional sample.
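The comparison can be re-derived from the standard errors printed in the outputs of exercises 2 and 3. The sketch below simply types those (rounded) values in by hand; the variable names are made up for illustration, and the results agree with the R output only up to rounding.

```r
# Standard errors as printed in the outputs of exercises 2 and 3
se_prop_direct <- 43373
se_prop_cal    <- 12389
se_opt_direct  <- 12816
se_opt_cal     <- 6968.6

# Ratio of direct to calibrated standard error for each sample
ratio_prop <- se_prop_direct / se_prop_cal
ratio_opt  <- se_opt_direct / se_opt_cal

print(round(c(proportional = ratio_prop, optimal = ratio_opt), 2))
# proportional 3.50, optimal 1.84
```

Note also that the calibrated estimate from the proportional sample (SE 12389) ends up slightly more precise than the direct estimate from the optimal sample (SE 12816).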

Exercise 4

Calculate the means of Nr_empl_reg (corresponding to number of employees in 2019), Nr_empl_2020, turnover_2019 and turnover_2020.

Hint: The function svymean(~x, my_svyobject) will give you the mean of variable x from the survey object my_svyobject.

Which mean has changed most percentage-wise from 2019 to 2020?

Solution to exercise 4

First the means:

(empl19_m <- svymean(~Nr_empl_reg, srs.design.opt))

##               mean     SE
## Nr_empl_reg 262.64 1.2985

(empl20_m <- svymean(~Nr_empl_2020, srs.design.opt))

##                mean     SE
## Nr_empl_2020 261.75 1.2816

(to19_m <- svymean(~turnover_2019, srs.design.opt))

##                  mean     SE
## turnover_2019 1312889 6988.4

(to20_m <- svymean(~turnover_2020, srs.design.opt))

##                  mean     SE
## turnover_2020 1312616 7087.7

Then the changes:

(perc_empl <- abs(1 - empl19_m[1]/empl20_m[1])*100)

## Nr_empl_reg
##   0.3392753

(perc_to <- abs(1 - to19_m[1]/to20_m[1])*100)

## turnover_2019
##    0.02081124

The average number of employees has changed more than the average turnover percentage-wise.
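As a cross-check, the change can also be expressed relative to the 2019 mean. The sketch below uses the rounded means printed above, typed in by hand; taking 2019 as the base year is an assumption about the convention (the solution above divides by the 2020 mean), and the variable names are illustrative only.

```r
# Means as printed in the outputs above (rounded)
empl_2019 <- 262.64
empl_2020 <- 261.75
to_2019   <- 1312889
to_2020   <- 1312616

# Percentage change from 2019 to 2020, with 2019 as the base
chg_empl <- (empl_2020 - empl_2019) / empl_2019 * 100
chg_to   <- (to_2020 - to_2019) / to_2019 * 100

print(round(c(employees = chg_empl, turnover = chg_to), 2))
# employees -0.34, turnover -0.02
```

Both means fell slightly, and with either base year the relative change in the number of employees is the larger of the two.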