1 Make no adjustments, publish ‘value unknown’


ESTIMATION METHODS – MISSING VALUES


Comparing small areas
The Swedish Employment Register contains data on gainfully employed persons,
with industrial classification for the local unit where a person is employed, as well
as the person’s highest level of completed education. These variables are imported
from the Business Register and the Education Register. Both sources contain
missing values.
According to the data in Chart 12.1, missing values regarding educational level
are 1.7%, and missing values regarding industry are 79/5647 = 1.4%. Of the population in the Employment Register regarding the entire population aged 16–64,
(79 + 93 – 2)/5647 = 3% lack values for industrial classification and/or educational
level.
Chart 12.1 Population aged 16–64 by educational level and industrial classification, year t
Industries: A–F production of goods, G–K private services, L–Q public services

Thousands of persons
                              Employed within industry
Educational level               A–F    G–K    L–Q  Unknown  Not empl.  Total pop.
<9 yrs, Comp. school 9 yrs      273    273    141       17        611        1315
Upper secondary 2 yrs           403    404    416       22        324        1570
Upper secondary 3 yrs           229    369    217       15        252        1081
University < 3 yrs              107    188    273       11        177         757
University ≥ 3 yrs               86    204    410       12        119         830
Education unknown                 5     10      4        2         73          93
Entire pop. aged 16–64         1103   1448   1462       79       1556        5647

Per cent of column total
                              Employed within industry
Educational level               A–F    G–K    L–Q  Unknown  Not empl.  Total pop.
<9 yrs, Comp. school 9 yrs     24.7   18.9    9.7     21.8       39.3        23.3
Upper secondary 2 yrs          36.6   27.9   28.5     28.5       20.8        27.8
Upper secondary 3 yrs          20.7   25.5   14.9     18.7       16.2        19.1
University < 3 yrs              9.7   13.0   18.7     13.6       11.4        13.4
University ≥ 3 yrs              7.8   14.1   28.0     15.4        7.7        14.7
Education unknown               0.4    0.7    0.3      2.0        4.7         1.7
Entire pop. aged 16–64          100    100    100      100        100         100


Chart 12.1 illustrates the way in which register-based statistics with missing values
have traditionally been reported by Statistics Sweden. There are several different
reasons for this kind of reporting: the missing value rate is considered small and it
is judged to be too complicated to adjust. Our opinion is that the missing value rate
is not small at all; that it is quite possible to adjust, and that it is the responsibility
of the statistical office to adjust for the effects of missing values.
In the two-way table above, there are missing values in both spanning variables.
Even if the total missing value rate only amounts to 3%, this results in a table that
is difficult to interpret. The patterns are disturbed by rows and columns with missing values in the table. By adjusting for missing values, the tables will be easier to
interpret by users of the statistics.
However, the size of this missing value rate varies when different municipalities
or other small categories are compared. Chart 12.2 shows the highest and lowest
shares of missing values for the country’s municipalities.
Chart 12.2 Employment Register year t and t+1 – lowest and highest missing value rates for population aged 16–64 in Sweden’s municipalities
                                       Lowest   Highest   Lowest     Highest
Missing values                         year t   year t    year t+1   year t+1
Education unknown                      0.3%     3.7%      0.4%       4.6%
Industry unknown                       0.5%     6.0%      0.4%       3.7%
Both education and industry unknown    1.0%     7.0%      1.0%       7.5%


Even if the missing value rate is not large at the national level, it can be significant
at the municipality level, and it can vary between years. Chart 12.2 confirms this,
which means that missing values make comparisons between municipalities and
other small categories difficult. Many users will forget the category ‘industry
unknown’ when they compare different municipalities with regard to, for example,
the size of the service sector.
How can a municipality with a 1% missing value rate be compared to another
municipality where the missing value rate is 7%? To make such comparisons,
adjustments must be made for missing values; otherwise the level estimates for
different municipalities will not be comparable. In Chart 12.3, we have adjusted for
missing values shown in Chart 12.1. Those with ‘education unknown’ were proportionately allocated across the educational levels in the same column, and then those
with ‘industry unknown’ were proportionately allocated across the three industries
on the same row.
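The two-step proportional allocation can be sketched in a few lines of Python. This is only an illustration of the principle, not the production method: the cell counts are the rounded thousands from Chart 12.1, and the function and variable names are hypothetical.

```python
# Cell counts (thousands of persons) from Chart 12.1.
# Columns: A-F, G-K, L-Q, industry unknown, not employed.
KNOWN_ROWS = {
    "<9 yrs, Comp. school 9 yrs": [273, 273, 141, 17, 611],
    "Upper secondary 2 yrs": [403, 404, 416, 22, 324],
    "Upper secondary 3 yrs": [229, 369, 217, 15, 252],
    "University < 3 yrs": [107, 188, 273, 11, 177],
    "University >= 3 yrs": [86, 204, 410, 12, 119],
}
EDUCATION_UNKNOWN = [5, 10, 4, 2, 73]  # the 'education unknown' row


def allocate_proportionately(known_rows, education_unknown):
    """Spread 'education unknown' within each column, then 'industry unknown' within each row.

    Returns, for each educational level, the adjusted cells
    [A-F, G-K, L-Q, not employed].
    """
    n_cols = len(education_unknown)
    col_known = [sum(cells[j] for cells in known_rows.values()) for j in range(n_cols)]
    adjusted = {}
    for level, cells in known_rows.items():
        # Step 1: scale every column so that it absorbs its 'education unknown' cell.
        step1 = [
            c * (col_known[j] + education_unknown[j]) / col_known[j]
            for j, c in enumerate(cells)
        ]
        # Step 2: spread 'industry unknown' over the three industries on the same row.
        industries, unknown, not_employed = step1[:3], step1[3], step1[4]
        factor = (sum(industries) + unknown) / sum(industries)
        adjusted[level] = [x * factor for x in industries] + [not_employed]
    return adjusted


adjusted = allocate_proportionately(KNOWN_ROWS, EDUCATION_UNKNOWN)
for level, cells in adjusted.items():
    print(level, [round(x) for x in cells])
```

Because the inputs are rounded to thousands, individual cells may differ slightly from figures calculated on unrounded register data.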
Chart 12.3 Population aged 16–64 by educational level and industrial classification, year t
Estimates adjusted for missing values
Industries: A–F production of goods, G–K private services, L–Q public services

Thousands of persons
                              Employed within industry
Educational level               A–F    G–K    L–Q  Not empl.  Total pop.
<9 yrs, Comp. school 9 yrs      281    282    145        641        1337
Upper secondary 2 yrs           413    415    425        340        1597
Upper secondary 3 yrs           234    378    222        264        1100
University < 3 yrs              110    193    279        186         770
University ≥ 3 yrs               88    209    418        125         844
Entire population aged 16–64   1125   1476   1490       1556        5647

Per cent of column total
                              Employed within industry
Educational level               A–F    G–K    L–Q  Not empl.  Total pop.
<9 yrs, Comp. school 9 yrs     25.0   19.1    9.8       41.2        23.7
Upper secondary 2 yrs          35.7   28.1   28.6       21.8        28.3
Upper secondary 3 yrs          20.8   25.6   14.9       17.0        19.5
University < 3 yrs              9.8   13.1   18.8       12.0        13.6
University ≥ 3 yrs              7.8   14.1   28.1        8.0        14.9
Entire population aged 16–64  100.0  100.0  100.0      100.0       100.0


The table adjusted for missing values is easier to interpret, and the corresponding
tables regarding different municipalities can also be compared. The argument that
the missing value rate is small at an overall level does not justify the practice of not
adjusting, as the missing value rate can differ substantially between different small
areas.
Missing values vary over time
Attention should be paid to missing values in register-based statistics, and they
should be adjusted for in the same way as in sample surveys. In register surveys,
however, missing values are usually not adjusted for but are reported as a separate
‘value unknown’ category. If the missing value rate varies over time and no
corrections are made, comparability over time will be poor.
Example: Missing values in the Patient Register
The Patient Register kept by the Swedish National Board of Health and Welfare
contains data on those who have received care in hospitals. The diagnoses should
be registered, but missing values for this variable result in an underestimation of
the number of patients with a particular diagnosis. Missing value rates can vary
strongly between years and from region to region, depending on how well the
administrative systems in different hospitals function.


Chart 12.4 shows the difference between a time series that has not been adjusted
for missing values in the variable ‘diagnosis’ and a time series that has been adjusted.
Chart 12.4 Falling accidents among boys aged 0–12 years in Norrbotten County
Number of accidents per 1 000 boys per year, 1987–2000
[Line chart with two series, Adjusted and Unadjusted; vertical axis 0–8.]
The time series pattern becomes incorrect when the missing value rate varies over
time. Comparisons of uncorrected values for different regions are misleading if the
regions have different missing value rates.

Example: Missing values in the Swedish Education Register
Missing value rates in the Education Register have varied greatly over time, and
these variations produce apparent changes in the various time series as seen in
Chart 12.5. Adjustments should be made for the effects of the missing values to
give users of these register-based statistics a correct picture of the time series
patterns.
Chart 12.5 Effects of missing values on time series from the Education Register
Population aged 16–74 by educational level 1985–2000, per cent
[Line chart; vertical axis 0–35 per cent. Series: Less than 9 yrs, Comp. 9 yrs,
Upper secondary 2 yrs, Upper secondary 3 yrs, University < 3 yrs, University ≥ 3 yrs,
Postgraduate, Education unknown.]
Between 1989 and 1990, missing value rates decreased from 5.7% to 1.4%. This was
due to the data that were collected for the Population Census 1990. All series
apart from compulsory school and postgraduate education increased between 1989
and 1990, increases which are to a large extent apparent. There is also a time
series level shift between 1999 and 2000 due to changed educational classifications
and the addition of new sources.


12.2 Adjustment for missing values using weights
Which methods can be used to adjust for missing values? We show the simplest
method, straight expansion, for fictitious data from the Education Register, in order
to illustrate the principle. A better adjustment could be made for the Education
Register, if consideration were given to other variables such as age and sex. But
even this simple adjustment is better than none at all.
The weights in the register are adjusted or calibrated in accordance with the notation and methods given in Section 11.3. Straight expansion means that only one
calibration condition is used: the total number of observations shall be the number
of observations including those with missing values.
The adjustment is made for the Education Register in Chart 12.6, where the original weight di = 1 (0 for the observations with missing value).
If the population consists of 6 386 015 persons, and the number of missing values
is 106 051 and there are data for 6 279 964 persons, then the adjustment factor gi
would be 6 386 015/6 279 964 = 1.01689.
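The arithmetic of straight expansion can be checked with a short Python sketch. The figures are those quoted in the text; the function names are hypothetical and serve only to illustrate the single calibration condition.

```python
def straight_expansion_factor(population_size, n_known):
    """Adjustment factor g: every observation with a known value gets the same factor."""
    return population_size / n_known


def adjusted_weight(has_value, g):
    """w_i = d_i * g_i, with d_i = 1 for known values and d_i = 0 for missing values."""
    d = 1 if has_value else 0
    return d * g


N = 6_386_015          # persons in the register population
n_missing = 106_051    # persons with missing educational level
n_known = N - n_missing

g = straight_expansion_factor(N, n_known)
print(n_known, round(g, 5))

# The calibration condition: the adjusted weights sum to the population size.
total_weight = n_known * adjusted_weight(True, g) + n_missing * adjusted_weight(False, g)
print(round(total_weight))
```

A refinement along the lines mentioned in the text would compute a separate factor within each combination of age category and sex instead of one factor for the whole register.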
Chart 12.6 Adjustment for missing values in the Education Register with weights
Person       Sex  Age  Educational level          di         di gi = wi
PIN1         M    18   Compulsory school 9 yrs    1          1.01689
PIN2         F    72   Less than 9 yrs            1          1.01689
PIN3         M    33   Upper secondary 2 yrs      1          1.01689
PIN4         M    62   Upper secondary 3 yrs      1          1.01689
PIN5         F    71   Missing value              0          0
PIN6         F    26   University ≥ 3 yrs         1          1.01689
PIN7         M    54   Postgraduate               1          1.01689
PIN8         M    67   Missing value              0          0
PIN9         F    39   Less than 9 yrs            1          1.01689
...
PIN6386015   M    53   University < 3 yrs         1          1.01689
Total:                                            6 279 964  6 386 015

The weights wi are used to calculate estimates, which are adjusted for missing
values. The unadjusted estimates and the adjusted estimates are compared in
Charts 12.7 and 12.8.
Chart 12.7 Unadjusted table
Education Register 2001
Educational level          000s     % of pop.
Less than 9 yrs             755      11.8
Comp. school 9 yrs          939      14.7
Upper secondary 2 yrs     1 747      27.4
Upper secondary 3 yrs     1 142      17.9
University < 3 yrs          802      12.6
University ≥ 3 yrs          848      13.3
Postgraduate                 48       0.7
Education unknown           106       1.7
Population aged 16–74     6 386     100.0

Chart 12.8 Adjusted for missing values
Education Register 2001
Educational level          000s     % of pop.
Less than 9 yrs             767      12.0
Comp. school 9 yrs          954      14.9
Upper secondary 2 yrs     1 776      27.8
Upper secondary 3 yrs     1 162      18.2
University < 3 yrs          816      12.8
University ≥ 3 yrs          862      13.5
Postgraduate                 48       0.8
Population aged 16–74     6 386     100.0

Adjusting weights in a data matrix means that all estimates are consistently adjusted and comparability over time is improved, which is clear from Chart 12.9. Where
the missing value rate varies between years, the adjusted weights would be different for different years.


Chart 12.9 Population aged 16–74 by educational level 1985–2000, per cent
A. Unadjusted series
B. Series adjusted for missing values
[Two line charts; vertical axis 0–35 per cent. Series: Less than 9 yrs, Comp. 9 yrs,
Upper secondary 2 yrs, Upper secondary 3 yrs, University < 3 yrs, University ≥ 3 yrs,
Postgraduate, Education unknown.]

12.3 Adjustment for missing values by imputation
Another way to adjust for the effects of missing values is to form imputed values
when variable values are missing. The missing values are then replaced by synthetic values. There are two different ways of forming such values:
– The value is formed randomly using one or more probability distributions. This
method applies to qualitative variables.
– The value is formed using a (deterministic) model in the same way as with
derived variables, as described in Section 8.2.3.
The advantage of imputation of variable values is that it avoids the need to calculate with weights, and the distributions of all other variables in the register remain
unchanged.
Imputed values for qualitative variables formed randomly
Chart 12.10 shows how to form imputed values for the variable educational level.
According to Swedish law, the value for persons PIN5 and PIN8 should not be
imputed as we are dealing with a register of persons. Instead, synthetic observations that do not have personal identification numbers should be formed.
The imputation, which corresponds to straight expansion, is carried out as follows: the observations with missing values for educational level are used to form
the same number of synthetic observations. These synthetic observations are given
values for the educational level variable completely at random, drawn from the
same distribution as that of the persons for whom educational level is known.
Random numbers are added to the register, and a data matrix without personal
identification numbers is then created. The random numbers are values of a
technical variable used internally.


Chart 12.10 Adjustment for missing values in the Education Register with imputation
A. Actual register year t
Person       Sex  Age  Educational level          Random number
PIN1         M    18   Compulsory school 9 yrs    0.7771
PIN2         F    72   Less than 9 yrs            0.3168
PIN3         M    33   Upper secondary 2 yrs      0.3096
PIN4         M    62   Upper secondary 3 yrs      0.8667
PIN5         F    71   Missing value              0.1749
PIN6         F    26   University ≥ 3 yrs         0.4114
PIN7         M    54   Postgraduate               0.1605
PIN8         M    67   Missing value              0.5536
PIN9         F    39   Less than 9 yrs            0.5513
...
PIN6386015   M    53   University < 3 yrs         0.7828

B. Data matrix for analysis year t
Sex  Age  Educational level          Educational level imputed
M    18   Compulsory school 9 yrs    No
F    72   Less than 9 yrs            No
M    33   Upper secondary 2 yrs      No
M    62   Upper secondary 3 yrs      No
F    71   Compulsory school 9 yrs    Yes
F    26   University ≥ 3 yrs         No
M    54   Postgraduate               No
M    67   Upper secondary 3 yrs      Yes
F    39   Less than 9 yrs            No
...
M    53   University < 3 yrs         No

C. Probability distribution based on frequency table in Chart 12.8
Educational level          Share of population   Accumulated share
Less than 9 yrs            0.120                 0.120
Compulsory school 9 yrs    0.149                 0.269
Upper secondary 2 yrs      0.278                 0.547
Upper secondary 3 yrs      0.182                 0.729
University < 3 yrs         0.128                 0.857
University ≥ 3 yrs         0.135                 0.992
Postgraduate               0.008                 1.000
Population aged 16–74      1.000

Random numbers in the register are uniformly distributed between 0 and 1. Persons
with a random number between 0 and 0.120 are given the level less than 9 years,
and those with a random number between 0.120 and 0.269 are given the level
compulsory school 9 years, etc.
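The translation from a stored random number to an imputed educational level can be sketched as a lookup in the accumulated shares. This is a minimal illustration using the distribution in Chart 12.10 part C; the function name and the use of binary search are choices made for this example.

```python
from bisect import bisect_right

LEVELS = [
    "Less than 9 yrs",
    "Compulsory school 9 yrs",
    "Upper secondary 2 yrs",
    "Upper secondary 3 yrs",
    "University < 3 yrs",
    "University >= 3 yrs",
    "Postgraduate",
]
# Accumulated shares from Chart 12.10, part C.
ACCUMULATED = [0.120, 0.269, 0.547, 0.729, 0.857, 0.992, 1.000]


def impute_level(random_number, accumulated=ACCUMULATED, levels=LEVELS):
    """Map a uniform random number in [0, 1) to an educational level.

    The level chosen is the first one whose accumulated share exceeds
    the random number.
    """
    index = bisect_right(accumulated, random_number)
    return levels[min(index, len(levels) - 1)]


# The two persons with missing values in Chart 12.10, with their stored random numbers.
print(impute_level(0.1749))  # PIN5
print(impute_level(0.5536))  # PIN8
```

Storing the random number in the register, rather than drawing a new one each time, means the same imputed value is reproduced whenever the data matrix is rebuilt.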

By using the relationships between age, sex and educational level, the imputation
can be improved. For different combinations of the age category and sex, different
frequency distributions for educational level are used. Chart 12.11 compares three
such distributions. There are significant differences between these distributions,
which means that it is possible to improve the adjustment for missing values by
using different distributions when the values are randomly distributed for different
combinations of sex and age.
Chart 12.11 Frequency table by age and sex, Education Register year t
                          Accumulated share   Accumulated share   Accumulated share
Educational level         Men aged 65–74      Women aged 65–74    Both aged 16–74
Less than 9 yrs           0.466               0.455               0.120
Comp. school 9 yrs        0.507               0.532               0.269
Upper secondary 2 yrs     0.700               0.821               0.547
Upper secondary 3 yrs     0.837               0.858               0.729
University < 3 yrs        0.901               0.918               0.857
University ≥ 3 yrs        0.988               0.998               0.992
Postgraduate              1.000               1.000               1.000


In Chart 12.12, the same register is used with the same random numbers as previously. However, the random numbers have been translated here into educational
level by using other frequency tables. Women in the age category 65–74 with a
random number between 0 and 0.455 are given the level less than 9 years. Men in
the age category 65–74 with a random number between 0.507 and 0.700 are given
the level upper secondary 2 years. In the same way, younger persons with missing
value are given an imputed value using frequency tables for their age categories
and sex.
Chart 12.12 Adjustment for missing values in the Education Register with imputation
A. Actual register year t
Person       Sex  Age  Educational level          Random number
PIN1         M    18   Compulsory school 9 yrs    0.7771
PIN2         F    72   Less than 9 yrs            0.3168
PIN3         M    33   Upper secondary 2 yrs      0.3096
PIN4         M    62   Upper secondary 3 yrs      0.8667
PIN5         F    71   Missing value              0.1749
PIN6         F    26   University ≥ 3 yrs         0.4114
PIN7         M    54   Postgraduate               0.1605
PIN8         M    67   Missing value              0.5536
PIN9         F    39   Less than 9 yrs            0.5513
...
PIN6386015   M    53   University < 3 yrs         0.7828

B. Data matrix for analysis year t
Sex  Age  Educational level          Educ. level imputed
M    18   Compulsory school 9 yrs    No
F    72   Less than 9 yrs            No
M    33   Upper secondary 2 yrs      No
M    62   Upper secondary 3 yrs      No
F    71   Less than 9 yrs            Yes
F    26   University ≥ 3 yrs         No
M    54   Postgraduate               No
M    67   Upper secondary 2 yrs      Yes
F    39   Less than 9 yrs            No
...
M    53   University < 3 yrs         No


PIN5, a 71-year-old woman, is given the educational level less than 9 yrs, which
differs from the imputation in Chart 12.10, where she is given compulsory school
9 yrs. The imputed level for PIN8 also changes to a shorter period of education.
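Imputation within categories of sex and age can be sketched by choosing the accumulated distribution that matches each person. The two distributions for ages 65–74 come from Chart 12.11. Falling back to the overall distribution for groups without a table of their own is an assumption made only to keep the example short, and the names are hypothetical.

```python
from bisect import bisect_right

LEVELS = [
    "Less than 9 yrs",
    "Compulsory school 9 yrs",
    "Upper secondary 2 yrs",
    "Upper secondary 3 yrs",
    "University < 3 yrs",
    "University >= 3 yrs",
    "Postgraduate",
]
# Accumulated shares from Chart 12.11, keyed by (sex, age category).
ACCUMULATED_BY_GROUP = {
    ("M", "65-74"): [0.466, 0.507, 0.700, 0.837, 0.901, 0.988, 1.000],
    ("F", "65-74"): [0.455, 0.532, 0.821, 0.858, 0.918, 0.998, 1.000],
}
# Assumption for this sketch: groups without their own table use the overall one.
ACCUMULATED_OVERALL = [0.120, 0.269, 0.547, 0.729, 0.857, 0.992, 1.000]


def impute_level(random_number, sex, age_category):
    """Translate a stored random number using the distribution for the person's group."""
    accumulated = ACCUMULATED_BY_GROUP.get((sex, age_category), ACCUMULATED_OVERALL)
    index = bisect_right(accumulated, random_number)
    return LEVELS[min(index, len(LEVELS) - 1)]


print(impute_level(0.1749, "F", "65-74"))  # PIN5, woman aged 71
print(impute_level(0.5536, "M", "65-74"))  # PIN8, man aged 67
```

With the same random numbers as before, only the frequency table changes, which is why the imputed levels for PIN5 and PIN8 differ between Charts 12.10 and 12.12.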
When is it appropriate to use randomly imputed values?
The above method is appropriate when describing a qualitative variable with missing values, possibly divided into different categories, such as age, sex and region.
After a high-quality imputation, the levels are more comparable between categories
and over time than if no adjustments had been made for missing values.
If the relationship between a variable y and a qualitative variable x is to be studied, where the x variable has missing values, then randomly imputed values for the
x variable should not be used. For instance, randomly imputed educational levels
would not be appropriate to use when describing the average monthly salaries for
different educational levels. In this case, it would be better to calculate the average
salary only for persons for whom the educational level is known.
Imputed values formed using a deterministic model
The imputation method used in Chart 12.12 above utilises the relationship between
the x variables age and sex, and the y variable educational level. However, the
imputed values are also formed randomly. For a particular combination of age and
sex, educational level is not determined exactly but instead randomly. We give
some examples below of imputation methods where the values of the x variables
determine the imputed values exactly. The models used for this type of imputation
are called deterministic models.
Section 8.2.3 discusses how derived variables can be formed with deterministic
causal models. Imputed variable values can be formed in a similar way. The difference is that derived variable values are calculated for all objects in the data matrix,
while imputed variable values are only formed for those objects that have missing
values.


When editing work is carried out, missing values are detected or certain variable
values are found to be implausible and must be rejected. This leads to the calculation of imputed values in close connection with the editing work. The editing case
studies presented in Chapter 9 contain several examples of imputation methods.
When editing the Income Register, it was discovered that social assistance had
not been reported for some municipalities (see Section 9.2.1). For households in
these municipalities, the previous year’s values are therefore imputed. A simple
model, this year’s assistance = previous year’s assistance, is used when imputing.
On a household level, therefore, modelling errors or imputation errors can occur if
the year’s assistance differs from the previous year’s assistance. Attempts should
be made to use models that make imputation errors as small as possible.
A special data collection is advisable when the demands for quality are so high
that imputation errors cannot be accepted. The objects that lack values for an
important variable can then provide the missing values via a questionnaire or
interview. Section 9.2.2 describes the editing of the Income Statement Register.
For all income statements, a local unit identity should be given. When these data
are missing or considered implausible, the employer is contacted.
The editing of enterprise income declarations provides examples of different
types of imputation methods:
– Data on the number of full-year employees are taken from annual reports. If
these data are missing, imputed values are formed by calculating an estimate of
the number of full-year employees by dividing the enterprise’s wage sum by the
average wage per full-year employee in the industry. The average wage for the
industry has previously been calculated using those enterprises for which the
number of full-year employees and wage sums are known.
– The register population in Structural Business Statistics lacks economic variable
values for some enterprises. Data on industry and the number of full-year employees have been imported from the Business Register. For enterprises where
industry, number of full-year employees and economic variables are known, tables are formed with the mean values for the different economic variables, by
industry and number of full-year employees. These tables represent a form of
model, which for given values of industry and number of full-year employees
shows how imputed values should be formed using the calculated mean values.
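The first of these methods, wage sum divided by the industry's average wage, can be sketched as follows. The enterprise records are invented for illustration, the field names are hypothetical, and computing the industry average as total wage sum divided by total full-year employees is an assumption about how the average is formed.

```python
def average_wage_by_industry(enterprises):
    """Average wage per full-year employee, from enterprises where both values are known."""
    totals = {}
    for e in enterprises:
        if e["employees"] is not None:
            wage_sum, employees = totals.get(e["industry"], (0.0, 0.0))
            totals[e["industry"]] = (wage_sum + e["wage_sum"], employees + e["employees"])
    return {industry: w / n for industry, (w, n) in totals.items()}


def impute_employees(enterprises):
    """Return copies of the records with missing employee counts imputed and flagged."""
    average_wage = average_wage_by_industry(enterprises)
    result = []
    for e in enterprises:
        row = dict(e, imputed=False)
        if row["employees"] is None and row["industry"] in average_wage:
            # Deterministic model: employees = wage sum / average wage in the industry.
            row["employees"] = row["wage_sum"] / average_wage[row["industry"]]
            row["imputed"] = True
        result.append(row)
    return result


# Invented example data (wage sums in thousands).
enterprises = [
    {"id": "E1", "industry": "DK", "wage_sum": 3000.0, "employees": 10},
    {"id": "E2", "industry": "DK", "wage_sum": 1500.0, "employees": 5},
    {"id": "E3", "industry": "DK", "wage_sum": 1200.0, "employees": None},
]
for row in impute_employees(enterprises):
    print(row["id"], row["employees"], row["imputed"])
```

Keeping an explicit flag for imputed values, as in the charts of this chapter, lets users exclude or inspect the imputed records when the model is in doubt.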

12.4 Missing values in a system of registers
When different registers are integrated and variables are imported from one register
to other registers, quality flaws such as missing values are also imported into these
other registers.
For example, the industry variable is created in the Business Register and is then
imported into other business registers, activity registers, registers of persons and
real estate registers. It is therefore not sufficient to adjust for missing values in
the industry variable in one register only; the adjustment method must adjust
consistently for missing values of this variable throughout the register system.


After trying to reduce the missing values rate by using more sources, and perhaps
also by collecting information from certain categories of objects, the estimates of
register-based statistics should be adjusted for missing values. Here we compare
the two methods of adjustment: using weights or imputing values.
Adjustment for missing values with weights in a system of registers
The Population, Education and Employment Registers relate to the population on
31 December of a particular year. There are no missing values in the Population
Register; the Education Register contains missing values in the educational level
variable; and the Employment Register contains missing values in the educational
level and industrial classification variables.
If each register is adjusted separately for missing values using weights, the
weights for the same person will be different in the three different registers. This is
illustrated in Chart 12.13. Statistics from the three registers will then be inconsistent; for example, the number of 18-year-old men will be different (PIN1 has
different weights in Chart 12.13 parts A, B and C).
If statistics from different registers that relate to the same population are to be
consistent, weights must be calculated jointly, and the same weights must be used
for all the registers. This can be difficult to achieve. Our conclusion is that adjustment for missing values using weights will cause problems for coordination and
consistency within the register system.
Chart 12.13 Adjustment for missing values using weights in a system of registers
A. Population Register
Person  Sex  Age  di
PIN1    M    18   1
PIN2    F    72   1
PIN3    M    33   1
PIN4    M    62   1
PIN5    F    71   1
PIN6    F    26   1
PIN7    M    54   1
PIN8    M    67   1
PIN9    F    39   1

B. Education Register
PIN     Educ. level           di gi
PIN1    Comp school 9 yrs     1.01689
PIN2    Less than 9 yrs       1.01689
PIN3    Upper 2nd 2 yrs       1.01689
PIN4    Upper 2nd 3 yrs       1.01689
PIN5    Missing value         0
PIN6    University ≥ 3 yrs    1.01689
PIN7    Postgraduate          1.01689
PIN8    Missing value         0
PIN9    Less than 9 yrs       1.01689

C. Employment Register 16–64 years
PIN     Industry   Educ. level           di gi
PIN1    DK         Comp school 9 yrs     1.02930
-       -          -                     -
PIN3    Missing    Upper 2nd 2 yrs       0
PIN4    DM         Upper 2nd 3 yrs       1.02183
-       -          -                     -
PIN6    DB         University ≥ 3 yrs    1.02326
PIN7    DK         Postgraduate          1.02326
-       -          -                     -
PIN9    DM         Less than 9 yrs       1.02930


Note: Three persons, PIN2, PIN5 and PIN8, are not gainfully employed according
to the Employment Register, and they are not 16–64 years old. The weights di gi in
Chart 12.13B are the same as in Chart 12.6, and the weights di gi in C have been
calculated by comparing the number of persons in corresponding cells of Charts 12.1
and 12.3. For example, 281/273 = 1.02930.
Adjustment for missing values using imputation in a system of registers
If different registers in the system are adjusted for missing values using imputation
as described in Section 12.3, the statistics from different registers could be completely consistent. At the same time as a variable is imported, the random numbers
(or imputed values) used in the original register are also imported. Imputations can
then be made which are consistent between the different registers.
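The principle of importing imputed values together with the variable can be sketched as a simple lookup. The values are those shown in Charts 12.14 to 12.16. The person-to-enterprise links are an assumption read from the layout of Chart 12.16, and all names are hypothetical.

```python
# Business Register data matrix after imputation: enterprise -> (industry, imputed flag).
BUSINESS_MATRIX = {
    "LegU1": ("DB", False),
    "LegU2": ("DK", False),
    "LegU3": ("DM", True),   # industry was missing and has been imputed
    "LegU4": ("DA", False),
    "LegU5": ("DK", False),
}
# Education Register data matrix after imputation: person -> (level, imputed flag).
EDUCATION_MATRIX = {
    "PIN1": ("Comp school 9 yrs", False),
    "PIN2": ("Less than 9 yrs", False),
    "PIN3": ("Upper 2nd 2 yrs", False),
    "PIN4": ("Upper 2nd 3 yrs", False),
    "PIN5": ("Comp school 9 yrs", True),
    "PIN6": ("University >= 3 yrs", False),
    "PIN7": ("Postgraduate", False),
    "PIN8": ("Upper 2nd 3 yrs", True),
}
# Assumed links between employed persons and enterprises.
EMPLOYER = {"PIN1": "LegU5", "PIN3": "LegU3", "PIN4": "LegU2", "PIN6": "LegU1", "PIN7": "LegU5"}


def build_employment_matrix(employer, business, education):
    """Import values and imputation flags from the source registers without re-imputing."""
    rows = {}
    for pin, (level, level_imputed) in education.items():
        industry, industry_imputed = business.get(employer.get(pin), (None, None))
        rows[pin] = {
            "industry": industry,
            "industry_imputed": industry_imputed,
            "education": level,
            "education_imputed": level_imputed,
        }
    return rows


employment = build_employment_matrix(EMPLOYER, BUSINESS_MATRIX, EDUCATION_MATRIX)
print(employment["PIN3"])
print(employment["PIN5"])
```

Because the Employment Register never draws its own random numbers here, a person or enterprise has the same imputed value in every register that imports the variable.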


The example below shows how it is possible to import educational level from the
Education Register and industrial classification from the Business Register to the
Employment Register. Missing values in all three of these registers can then be
replaced with the imputed values in a consistent way.
Chart 12.14 Adjustment for missing values in the Education Register using imputation
A. Actual register
Person  Sex  Age  Educational level     Random number
PIN1    M    18   Comp school 9 yrs     0.7771
PIN2    F    72   Less than 9 yrs       0.3168
PIN3    M    33   Upper 2nd 2 yrs       0.3096
PIN4    M    62   Upper 2nd 3 yrs       0.8667
PIN5    F    71   Missing value         0.1749
PIN6    F    26   University ≥ 3 yrs    0.4114
PIN7    M    54   Postgraduate          0.1605
PIN8    M    67   Missing value         0.5536

B. Data matrix for analysis
Sex  Age  Educational level     Educ. level imputed
M    18   Comp school 9 yrs     No
F    72   Less than 9 yrs       No
M    33   Upper 2nd 2 yrs       No
M    62   Upper 2nd 3 yrs       No
F    71   Comp school 9 yrs     Yes
F    26   University ≥ 3 yrs    No
M    54   Postgraduate          No
M    67   Upper 2nd 3 yrs       Yes


Chart 12.15 Adjustment for missing values in the Business Register using imputation
A. Actual register
Enterprise  Industry        Random number
LegU1       DB              0.0316
LegU2       DK              0.6444
LegU3       Missing value   0.3978
LegU4       DA              0.2846
LegU5       DK              0.2044

B. Data matrix for analysis
Enterprise  Industry  Industry imputed
LegU1       DB        No
LegU2       DK        No
LegU3       DM        Yes
LegU4       DA        No
LegU5       DK        No


Chart 12.16 Adjustment for missing values in the Employment Register with imputation
A. Actual register
                               Random number                        Random number
Person  Enterprise  Industry   (industry)     Educational level     (education)
PIN1    LegU5       DK         0.2044         Comp school 9 yrs     0.7771
PIN2    -           -          -              Less than 9 yrs       0.3168
PIN3    LegU3       Missing    0.3978         Upper 2nd 2 yrs       0.3096
PIN4    LegU2       DK         0.6444         Upper 2nd 3 yrs       0.8667
PIN5    -           -          -              Missing value         0.1749
PIN6    LegU1       DB         0.0316         University ≥ 3 yrs    0.4114
PIN7    LegU5       DK         0.2044         Postgraduate          0.1605
PIN8    -           -          -              Missing value         0.5536

B. Data matrix for analysis
Person  Industry  Industry imputed  Educational level     Educ. level imputed
PIN1    DK        No                Comp school 9 yrs     No
PIN2    -         -                 Less than 9 yrs       No
PIN3    DM        Yes               Upper 2nd 2 yrs       No
PIN4    DK        No                Upper 2nd 3 yrs       No
PIN5    -         -                 Comp school 9 yrs     Yes
PIN6    DB        No                University ≥ 3 yrs    No
PIN7    DK        No                Postgraduate          No
PIN8    -         -                 Upper 2nd 3 yrs       Yes


12.5 Conclusions
We conclude from this discussion that adjustment for missing values should be
made; adjustments must be coordinated; and imputation is the most appropriate
method for the adjustment of missing values in a register system.
Within the system, the Education and the Business Registers are responsible for
the adjustment of missing values for education and industry, respectively. Other
registers should then use these adjustments.

CHAPTER 13

Estimation Methods –
Coverage Problems
Three issues regarding coverage problems are discussed in this chapter: How can
overcoverage and undercoverage be reduced? How can estimates be adjusted for
overcoverage? How should undercoverage be handled? Weights and the calibration
of weights can be used as supplementary estimation methods, and it may also be
necessary to combine registers with sample surveys.
Coverage problems are often neglected today. If you do not have a register system or do not know how to use a register system, then you may not be aware of
many coverage problems. But once you start using the register system and combine
registers, you will find differences in coverage. This problem is made clear in one
of the first examples in this book, in Section 1.5.5. The general methodology that
should be used is noted in Chart 1.1:
Transformation principle
Administrative registers should be transformed into statistical registers.
All relevant sources should be used and combined during this transformation.

One important aim of the principle to use all relevant sources is to achieve as good
coverage as possible. As the base registers have such a strategic role in the production system, the work of improving the base register’s coverage should have high
priority.

13.1 Reducing overcoverage and undercoverage
What administrative sources should be used and how should they be used to reduce
coverage problems? Earlier in the book, we note that both Statistics Sweden’s
Population Register (Section 7.3.1) and Business Register (Section 7.3.6) have
problems with overcoverage and undercoverage. We will use these two registers
when discussing how coverage problems arise and how coverage can be improved.
13.1.1 Coverage problems in the Population Register
The administrative population register is maintained by, for example, the National
Tax Agency or an authority responsible for national identity cards and voter registration, or the municipalities. This population register should be updated with
Register-based Statistics: Statistical Methods for Administrative Data, Second Edition. Anders Wallgren and Britt Wallgren.
© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.