2 Case studies – editing register data

expertise is further strengthened when documenting the work and the measures
taken to correct data.
Contacts with suppliers have several important effects. Firstly, the staff at the
administrative authority should be informed about how and for what purpose their
data are used at Statistics Sweden. The staff at the authority should have an understanding of the consequences of lack of quality for users of the statistics. Contacts
with the suppliers are also important for subject-matter expertise of the staff at
Statistics Sweden. This is why the staff working with the I&T Register have regular meetings with the National Tax Agency twice a year. These contacts are also
used in the important work of identifying new administrative sources.
The registers within Statistics Sweden’s register system that are used as
sources for the I&T Register should, in principle, not need to be checked again by
the staff at the I&T Register – the checks should have been carried out on the
primary register.
Step 2 – Final checking of the entire register
In the first step above, all the data from each authority are checked. In the next
step, all variables from all sources are combined in one total register so that the
different sources can be compared through consistency editing. All the derived
variables are then formed. New consistency checks can then be carried out, for
example checking that the sum of a variable's values from one source agrees with the
corresponding sum from another source. Additional errors can be identified in this way. The total register
consists of around 9 million records with 500 variables.
Example: A subset of four variables was checked by macro-editing with respect to
sums and number of persons with values for these variables. Comparisons were
made with corresponding variables from the previous year. Everything looked
quite normal. A derived variable describing a special kind of income was then created from the variables in this subset:
Income = Variable1 + Variable2 – Variable3 – Variable4

It was noticed that about 120 000 persons had a negative value regarding this kind
of income. In the previous year, only approximately 1 600 persons had negative
income. After checking, it was found that Variable3 and Variable4 were sums of
monthly values, where the value for April had been counted twice by the administrative authority. An important lesson from this example is that the error was found
through the derived variable. Another lesson is that the work done by an administrative authority when preparing a delivery to the statistical office can generate
errors; close cooperation is necessary to reduce this error source.
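
The check in this example can be sketched in code. This is only an illustration of the idea, not the procedure used at Statistics Sweden: the record layout, the field names v1 to v4 and the threshold max_ratio are assumptions made for the sketch.

```python
def derived_income(record):
    """Income = Variable1 + Variable2 - Variable3 - Variable4."""
    return record["v1"] + record["v2"] - record["v3"] - record["v4"]


def count_negative(records):
    """Number of persons whose derived income is negative."""
    return sum(1 for r in records if derived_income(r) < 0)


def negative_count_is_suspicious(current, previous, max_ratio=5.0):
    """Flag the delivery when the number of negative values has grown by
    more than max_ratio compared with the previous year.
    max_ratio is an arbitrary, illustrative threshold."""
    if previous == 0:
        return current > 0
    return current / previous > max_ratio


# Toy data: in the second record v3 is inflated, as if one month
# had been counted twice, which pushes the derived income below zero.
records = [
    {"v1": 1000, "v2": 200, "v3": 300, "v4": 100},
    {"v1": 500, "v2": 0, "v3": 650, "v4": 50},
]
negatives = count_negative(records)
```

With the counts reported in the example (about 120 000 negative values against about 1 600 the year before), the ratio is 75, so a check of this kind would flag the delivery even though each of the four input variables looked normal on its own.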
Step 3 – Checking estimates
In this step, all important tables are formed using the whole register as the basis.
Estimates are checked and compared with the previous year’s values. In addition, a
number of simulations are carried out using the FASIT model, for the sole purpose
of testing data quality. If, for example, the housing benefit remains unchanged in
the model, then the model should generate model values that agree with the previously produced tables.



9.2.2 Editing work with the Income Statement Register
The register of all income statements is used to calculate both region-specific and
industry-specific wage sums, and is used when the Activity Register and the Employment Register are created. This section gives an account of the editing work
carried out on the definitive income statements, which are received by Statistics
Sweden up to October. The income statements are checked by those responsible for
the Income Statement Register and the edited register is then used as the source for
other registers within the register system.
Checking population definitions
The first step in the editing process involves checking that the number of received
income statements agrees with those sent from the National Tax Agency.
The second step is to create a data matrix with the final income statements according to all the amendments in the consignment. The National Tax Agency does
not change input data – when the data provider (in this case the employer) submits
amendments to it, new records are created equivalent to deletion, amendment or
replacement of previous records. Processing is therefore required in the register to
remove invalid records and to check for duplicate records. The variable values for
around 10 300 records are carried across from the original income statement to the
amendment record, as the amendment record may be incomplete.
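
The second step can be illustrated with a small sketch. The record layout below (the keys 'key', 'action' and 'values', and the action codes) is hypothetical and is not the National Tax Agency's format; the sketch only shows the principle of applying amendment records in delivery order and carrying values across from the original statement when the amendment is incomplete.

```python
def build_final_statements(records):
    """Apply amendment records, in delivery order, to the original statements.

    Each record is a dict with the hypothetical keys:
      'key'    - identifies the income statement
      'action' - 'original', 'delete', 'amend' or 'replace'
      'values' - dict of variable values (None = not reported)
    Returns a dict mapping each surviving key to its final values.
    """
    final = {}
    for rec in records:
        key, action = rec["key"], rec["action"]
        if action == "delete":
            # The earlier statement is no longer valid.
            final.pop(key, None)
        elif action == "amend" and key in final:
            # Carry across values that the incomplete amendment lacks.
            merged = dict(final[key])
            merged.update(
                {k: v for k, v in rec["values"].items() if v is not None}
            )
            final[key] = merged
        else:
            # 'original', 'replace', or an amendment with no original.
            # A duplicate key simply overwrites the earlier record.
            final[key] = dict(rec["values"])
    return final


records = [
    {"key": "A", "action": "original", "values": {"income": 100, "tax": 30}},
    {"key": "B", "action": "original", "values": {"income": 200, "tax": 60}},
    {"key": "A", "action": "amend", "values": {"income": 120, "tax": None}},
    {"key": "B", "action": "delete", "values": {}},
    {"key": "C", "action": "original", "values": {"income": 50, "tax": 10}},
    {"key": "C", "action": "replace", "values": {"income": 55, "tax": 11}},
]
final = build_final_statements(records)
```

In this toy delivery, statement A keeps the tax value from the original record because the amendment did not report it, statement B is removed, and statement C is replaced in full.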
The third step in the editing work is to check all identities. As income statements
can contain individual and enterprise identities, both personal identification numbers (PIN) and organisation numbers (BIN) should be checked. Around 7 600 personal
identification numbers were incorrect, of which 5 000 could be corrected automatically.
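
The text does not say which identity checks are applied. One check that is possible for Swedish identity numbers is the check digit: the last digit of a ten-digit personal identification number is a Luhn (modulus 10) check digit. The sketch below tests only that property; the function name is hypothetical, and a production check would also validate the date part and handle twelve-digit formats.

```python
def luhn_valid(number):
    """Return True if a ten-digit identity number passes the Luhn
    (modulus 10) test. Separators such as '-' are ignored."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) != 10:
        return False
    total = 0
    for position, digit in enumerate(digits):
        if position % 2 == 0:   # 1st, 3rd, 5th ... digit is doubled
            digit *= 2
            if digit > 9:
                digit -= 9      # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For the widely used example number 811218-9876 the weighted digit sum is 50, so the number passes; changing the last digit makes it fail. A failed check digit only signals that the number is wrong, so correcting it automatically requires additional information.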
The fourth step involves matching the personal identification numbers in the income statements with those in the Population Register for 31 December, and
matching enterprise identities against the Business Register for March. In both
cases, several non-matching records are found – the Income Statement Register
contains personal identification numbers that are missing in the Population Register
and enterprise identities that are missing in the Business Register.
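
The matching in the fourth step amounts to finding identities that occur in the income statements but not in the reference register. A minimal sketch, assuming toy identities and hypothetical field names 'pin' and 'bin':

```python
def missing_from_register(identities, register):
    """Identities found in the income statements but not in the register."""
    return sorted(set(identities) - set(register))


# Toy data with made-up identities.
statements = [
    {"pin": "P1", "bin": "B1"},
    {"pin": "P2", "bin": "B9"},
    {"pin": "P7", "bin": "B1"},
]
population_register = {"P1", "P2", "P3"}   # persons at 31 December
business_register = {"B1", "B2"}           # enterprises in March

missing_pins = missing_from_register(
    (s["pin"] for s in statements), population_register
)
missing_bins = missing_from_register(
    (s["bin"] for s in statements), business_register
)
```

The two resulting lists correspond to the non-matching records mentioned above and are the starting point for further investigation.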
Checking variable values
In the fifth step, deviation errors are checked using 16 different probability checks.
The relation between earned income and tax is used in several ways; in addition, a
search is made for records with extremely high earned income or tax. Around
5 000 records with extreme values are detected from these checks. These are
checked in a simple way and only a few are checked with the National Tax Agency. After these checking stages, each income statement is accepted, replaced by a
new statement or taken out of the register.
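
The 16 checks themselves are not specified in the text. The sketch below only illustrates the two kinds of check that are mentioned, a relation between earned income and tax and a search for extreme values. All thresholds and field names are assumptions chosen for the illustration.

```python
def flag_suspect(record, max_ratio=0.6,
                 max_income=10_000_000, max_tax=5_000_000):
    """Return a list of reasons why an income statement looks suspect.
    An empty list means the record passed these illustrative checks.
    All thresholds are hypothetical."""
    reasons = []
    income, tax = record["income"], record["tax"]
    if income > max_income:
        reasons.append("extreme income")
    if tax > max_tax:
        reasons.append("extreme tax")
    if income > 0:
        # Relation between tax and earned income.
        ratio = tax / income
        if ratio < 0 or ratio > max_ratio:
            reasons.append("implausible tax/income ratio")
    elif tax > 0:
        reasons.append("tax without income")
    return reasons
```

Records that return a non-empty list would be passed on for manual review; as described above, only a few of them would need to be checked with the National Tax Agency.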
Checking of the most important variable
The most important phase in editing work involves checking that employed persons are linked to the correct local units. This link is crucial for the whole register
system as it makes it possible to report gainfully employed persons by industry
sector and region. Difficulties arise with this link when enterprises have more than



one local unit. Although the employer has a duty to indicate the local unit on every
income statement, this information is often missing and sometimes implausible.
Implausible local unit numbers are identified by comparing the number of employees with corresponding data in the Business Register and with data from the previous year’s version of the Income Statement Register. Plausibility in terms of commuting distance is also considered.
When a local unit is missing, or appears unreasonable, on the income statement
from enterprises with more than one local unit, the employer is contacted via a
special data collection using a register update questionnaire. Those responsible for
the Income Statement Register work together with those responsible for the Business Register to capture changes regarding the local unit’s municipality code and
industrial classification code.
Output editing
The Income Statement Register is used as a source for the Employment Register.
By checking the output from the Employment Register, the quality of the Income
Statement Register is also checked. Detailed tables with employed persons by
industry sector and municipality are assembled and compared with the previous
year’s tables. Deviations are checked and the results of these checks are documented. This documentation is very useful, as many users make inquiries after publication and
question the results. Where documentation exists, those who are in contact with the
users can respond that ‘we have checked and the results are correct as far as we can
9.2.3 What more can be learned from these examples?
The examples above show that the administrative data received at Statistics Sweden may contain errors that require checking at the micro level. Once these errors
have been detected, they are often easy to correct. The requirements placed on the checking procedure depend on how the register is to be used. Statistics Sweden’s statistical registers are often used for research. Such advanced analysis requires higher quality at the micro level than the production of simple tables does,
and it therefore places higher demands on the checks. The requirements are highest for longitudinal studies.
Subject-matter expertise and contacts with suppliers
An overall conclusion is that subject-matter expertise is of great importance for the
effectiveness of the editing and checks. For surveys with their own data collection,
it is sufficient to be familiar with the survey in question, which is rarely changed.
With register surveys, however, it is necessary to be familiar with the administrative system that generated the data. An administrative system can contain many
complicated variables that are changed often.
The example also shows the importance of cooperation and development of expertise within the working group that receives the administrative registers, and of
having good contacts with the authorities supplying the data. Furthermore, cooperation between different teams working with related registers should be encouraged so that the administrative data are used effectively.



If the staff at the statistical office ‘live with the data at micro level’, the learning
process is ongoing, which leads to better subject-matter expertise. This learning
process is strengthened by close contacts with users.
Additional data collection may be necessary
When a variable in the administrative data is seen to be of insufficient quality to be
used for statistical purposes, it may be necessary to conduct additional data collection to attain a sufficiently high level of quality. One example of this is the editing
work of checking that employed persons are linked to the correct local units in the
Income Statement Register. To achieve sufficient quality, some employers are
contacted via a special data collection using a register update questionnaire.


Editing, quality assurance and survey design 1

Many countries are increasingly using administrative data to produce statistics
describing society. Before any administrative registers were used for statistical
purposes, the production system was based on maps or address lists, and enumerators and interviewers were sent out to interview households and enterprises.
When more and more administrative registers are used, the national statistical
system is gradually changed into a register-based statistical production system.
Sample surveys and traditional censuses are replaced by register surveys that do
not require the collection of statistical data. Sample surveys also become register-based. The statistical units can be directly sampled from statistical registers – the
sample survey design and the estimation methods are improved as register information can be used.
This transition has consequences for survey design and quality assessment, but
most people at the national statistical institutes may not yet be aware of these
consequences. To understand these consequences it is necessary to fully understand
the requirements and possibilities of the register system that is the basis of almost
all production of statistics after the transition.
Understanding of the role of the register system is limited today – most people at a national statistical office (NSO) are fully occupied with their own survey and have little time to study
other surveys and make comparisons between related surveys. Managers and
methodologists may also have a limited understanding of the register system.
9.3.1 Survey design in a register-based production system
When an NSO gains access to microdata from administrative registers, there are
two approaches to survey design:
– With the traditional approach, we start with the survey content we want. For
example, we want to conduct an income survey and then we start planning for an
income register. We search for administrative sources that can be used when an
income register is created and develop methods that should be used. This kind of
survey design is described in Section 7.2.

– With the systems approach, introduced in Laitila, Wallgren and Wallgren (2012),
we systematically analyse each administrative source and try to find out how it
should be used within the production system or register system. For example, if
we analyse income self-assessment from persons, we will find that this source
can be used in many ways. It can be used for an Income Register and for sample
surveys regarding income of households. It can also be used to improve coverage of the Population Register, the Job Register and the Business Register. The
Structural Business Statistics survey can also use this source as there is information regarding sole traders.

1 This section is based on Laitila, Wallgren and Wallgren (2013).

Survey design consists of the efforts to maximise the quality of estimates generated
by a specific survey, subject to cost or budget constraints. By quality we as a rule
mean accuracy, but other quality dimensions can be included such as relevance,
comparability and coherence. Biemer (2010) uses the term ‘fitness for use’ for this
broader quality concept.
The transition from a production system without registers into a register-based
system will, for example, reduce the costs for a Population and Housing Census
and a Labour Force Survey. It will also be possible to improve quality. Census
information can be produced every year, and the accuracy of the LFS will be
improved when better auxiliary variables can be used.
9.3.2 Quality assessment in a register-based production system
Different kinds of survey errors are utilised as planning criteria when we work with
survey design. For the design of sample surveys, this planning work is well known
and widely discussed.
How should the corresponding planning process for register surveys be structured? In Laitila, Wallgren and Wallgren (2012), we describe the systems approach
to survey design as consisting of the four steps illustrated in Chart 9.9. Each administrative source is analysed in the following way:
1. Metadata regarding the source are analysed. The relevance is determined as
described in Section 7.2.3.
2. Microdata from the source are analysed. Aspects of accuracy are determined as
described in Section 7.2.3.
3. The source is compared with its base register. Some aspects of accuracy of the
source and the base register are determined and a decision is made if the source
can be used to improve the base register. This kind of editing is described in
Section 9.1.2.
4. The source is compared with all surveys in the system containing similar variables. Aspects of accuracy of the source and the surveys used for comparisons
are determined. It is also determined whether the source can be combined with
other sources for a new survey and whether the source can be used to improve
other surveys.