2 Measuring quality – why and how?

THEORY AND QUALITY OF REGISTER-BASED STATISTICS

Holt maintains that the most important aspect of quality when it comes to register-based statistics is not accuracy, but relevance.
Nanopoulos (2001) maintains that countries like Denmark that have well-integrated register systems need a conceptual apparatus regarding errors in statistics, which will be different from that required by countries that mainly carry out
sample surveys and censuses.
We agree with the authors cited above, and our conclusion is that it is important
to consider the following when discussing the quality of register statistics:
– It is necessary to distinguish between surveys with their own data collection and
register surveys. Otherwise, there is a risk of uncritically using the traditional
error models developed for sample surveys and censuses.
– There should be a distinction between the quality of a register survey and that of
a register, as a register has many possible uses.
Sample surveys and censuses are carried out with one particular use in mind and
quality issues generally focus on the estimates. In the case of a statistical register,
many different uses are possible – such a register may serve not only current surveys but also future ones.
Similar to other surveys, the quality of a register survey also relates to one specific use of the register and focuses on the quality of the estimates, particularly
their relevance and accuracy in relation to the purpose of the survey. Describing
quality is here a question of indicating whether the quality of the survey is good or
bad.
However, the quality of an administrative or statistical register is not related to
one particular use and, when describing quality in this respect, it is important to
indicate what characteristics the register has, thereby implying the uses to which it
may be put. The quality of the register will affect the quality of the surveys based
upon it, and is determined by three factors:
– the administrative systems that generate the input data on which the register is
based;
– the possibilities offered by the system of statistical registers with regard to
improving coverage, content of variables and consistency; and
– the processing methods used to produce the register.
Chart 15.4 Input and output data and the production process
Input data:
A. Monthly reports from employers
B. Yearly income statements from employers
C. Yearly income tax returns from enterprises

Register-statistical processing:
– Register system
– Processing methods

Output data:
– Quarterly Gross Pay Register
– Job Register
– Employment Register
– Income Register
– Yearly Gross Pay Register
– Structural Business Statistics survey
– Micro simulation model, enterprises

In Chart 15.4 three administrative sources are used to create seven statistical registers. The output quality of the estimates produced with these seven registers depends on the input data quality of the three sources, the possibilities offered by the
register system, and the processing methods used when these seven statistical
registers are created. These aspects of the production process are illustrated in
Chart 15.4.
Quality of registers and register surveys
The quality of the register should be described in general terms, so that potential
users can see whether it suits their purposes. The description should relate to the
various areas of application that may be of interest. We distinguish between three
ways of using registers and the corresponding quality aspects:
– Cross-sectional quality: what comparisons can be made within the register?
– Time series quality: what comparisons can be made over time on the aggregated
level?
– Longitudinal quality: what comparisons can be made at the micro level over
time?
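The three ways of using a register can be illustrated with a small sketch. The toy register below, its variables (person_id, year, region, income) and all values are hypothetical and serve only to show what kind of comparison each quality aspect refers to.

```python
from collections import defaultdict

# Hypothetical toy register: (person_id, year, region, income). Not real data.
REGISTER = [
    (1, 2010, "North", 200),
    (2, 2010, "South", 300),
    (3, 2010, "South", 250),
    (1, 2011, "North", 220),
    (2, 2011, "North", 310),
    (3, 2011, "South", 240),
]


def cross_sectional(register, year):
    """Comparison within the register for one year: mean income by region."""
    groups = defaultdict(list)
    for _pid, y, region, income in register:
        if y == year:
            groups[region].append(income)
    return {region: sum(v) / len(v) for region, v in groups.items()}


def time_series(register):
    """Comparison over time on the aggregated level: total income per year."""
    totals = defaultdict(int)
    for _pid, y, _region, income in register:
        totals[y] += income
    return dict(totals)


def longitudinal(register, year_from, year_to):
    """Comparison at the micro level over time: income change per person."""
    by_key = {(pid, y): income for pid, y, _region, income in register}
    persons = sorted({pid for pid, _y, _region, _income in register})
    return {
        pid: by_key[(pid, year_to)] - by_key[(pid, year_from)]
        for pid in persons
        if (pid, year_from) in by_key and (pid, year_to) in by_key
    }
```

The longitudinal comparison is the most demanding of the three: it only works if the same person can be identified in both years, whereas the time series comparison needs only comparable aggregates.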
The quality of a register survey should be described for one particular use of the
register. Is the quality of the estimates good or bad for this intended use? The
relevance and accuracy of the estimates should be described. Chart 15.5 compares
the ways of describing overall quality.
Chart 15.5 The overall quality of registers and register surveys
Quality                   Register                        Particular register survey

Relevance                 Only definitions are given      Are the definitions adequate and
                                                          functional? This is discussed in detail.

Cross-sectional quality   What comparisons can be made?   The quality is described only for the
Time series quality       What comparisons can be made?   particular use. Is the quality good or
Longitudinal quality      What comparisons can be made?   bad? The quality of estimates is
                                                          described.


The views of users and producers on quality
How are register-based statistics used? What are the requirements of users regarding quality? Biemer and Lyberg (2003) discuss quality in sample surveys. The
starting point is one estimate, and the total error for this estimate is divided into 12
components. Errors that occur during the different stages of a sample survey can be
either random or systematic. Random errors make the estimates uncertain but do
not cause distortion. On the other hand, systematic errors cause distortion, which is
to say that the value sought is overestimated or underestimated.
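The difference between the two kinds of error can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the true value, the size of the systematic error and the spread of the random error are hypothetical, and each estimate is simply modelled as the true value plus a fixed distortion plus normally distributed noise.

```python
import random
import statistics


def simulate_estimates(true_value, bias, noise_sd, n_repeats, seed=1):
    """Repeat a hypothetical survey n_repeats times.

    Each estimate = true value + systematic error (bias) + random error.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n_repeats)]


def summarize(estimates, true_value):
    """Distortion is the average deviation; uncertainty is the spread."""
    return {
        "distortion": statistics.mean(estimates) - true_value,
        "uncertainty": statistics.stdev(estimates),
    }


TRUE_VALUE = 100.0  # hypothetical value sought

# Random errors only: estimates scatter around the true value.
random_only = summarize(
    simulate_estimates(TRUE_VALUE, bias=0.0, noise_sd=5.0, n_repeats=10_000),
    TRUE_VALUE,
)

# Random plus systematic errors: same scatter, but centred on the wrong value.
with_bias = summarize(
    simulate_estimates(TRUE_VALUE, bias=3.0, noise_sd=5.0, n_repeats=10_000),
    TRUE_VALUE,
)
```

In this sketch the uncertainty is the same in both cases, but only the second set of estimates is distorted: on average the value sought is overestimated by about the assumed systematic error.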
Chart 15.6 Risk of random and systematic errors by major error source
Error source            Risk of random errors   Risk of systematic errors
Specification error     Low                     High
Frame error             Low                     High
Nonresponse error       Low                     High
Measurement error       High                    High
Data processing error   High                    High
Sampling error          High                    Low

Source: Biemer and Lyberg (2003, p. 59)

This is the producer’s view of quality. For the producer it is important to know
which parts of the survey function well and which parts function badly. Based on
this knowledge, the parts where the most serious errors occur can be improved.
Platek and Särndal (2001) discuss the quality of official statistics from the user’s
point of view. The user is interested in answers to such questions as, ‘Can I trust
these statistics?’ and ‘Are they suitable for my purposes?’ Users want a guarantee
of quality. Can the statisticians give it? What form should it assume?
The answers to these questions given by statistical offices are often insufficiently
clear. Platek and Särndal claim that data quality means different things to different
categories of staff in a statistical office:
– statistical methodologists regard it as a question of accuracy;
– subject matter specialists regard it as a question of content and presentation;
– informatics specialists regard it as a question of the efficient functioning of data
systems and processing; and
– managers regard it as a question of the functioning of budgets and time plans.
From detailed knowledge on quality to a comprehensive picture of quality
The discussion in this section focuses on the gap between in-depth knowledge and
a comprehensive overview. The different staff categories have thorough, in-depth
knowledge of a great many different factors that affect quality. The methodologist
thinks in terms of the different phases of the survey and all the sources of error that
may exist in these phases, while the IT specialist thinks of the production system
and all processing errors that can occur. Both have extensive knowledge, but may
lack the comprehensive overview that the user needs.
The subject-matter specialists generally have the closest contact with the users of
statistics, which is why they should provide this comprehensive overview of quality. However, to do so they must be in close contact with the methodologists and the
IT specialists as well as being able to understand the users’ needs.
Detailed knowledge of the different quality components is acquired using the
quality assurance guidelines described in Section 15.3 in relation to the processes
involved in producing the statistical registers. On the basis of this knowledge, an
overall appraisal can be made of the input data quality of the administrative registers that are used as sources in the production processes. The same methods for
quality assurance can also be used to analyse the errors in statistical registers,
register surveys and the nonsampling errors in sample surveys.
Errors in sample surveys and register surveys
The quality of a sample survey or census is primarily determined by how well the
data collection process functions. This process is fairly similar in all surveys which
have their own data collection. Therefore, the following list of the most important
quality factors will apply more or less to all sample surveys or censuses: sampling
errors, nonresponse errors and measurement errors.
The fact that the same methodological problems are encountered in all surveys
facilitates discussions with colleagues, methodology development and the establishment of guidelines. On the other hand, the different sample surveys do not
affect one another – nonresponse in the Labour Force Survey, for instance, does
not affect the Living Conditions Survey or the survey on Deliveries and New
Orders in Industry. Although the most important quality factors will probably not
be the same for all register surveys, the quality of one register will, in general,
affect the quality of others. Surveys based on data collection and register surveys
can thus be compared in the following way:
Surveys based on data collection:
– Same quality issues in all surveys
– Quality of one survey does not affect other surveys

Register surveys:
– Different quality issues in different surveys
– Quality of one survey affects many other surveys

Until now, there has not been much exchange of experience concerning register-statistical methodology and quality issues; however, it is our hope that a common
terminology and a common perspective will stimulate such exchange.

15.3 Analysing administrative sources – input data quality
The quality of administrative data depends on how the data are generated and how
they have been recorded. The statistical office that receives administrative data
must analyse the quality of each source. This knowledge is needed to decide how a
source should be treated and used; and it is a basis for descriptions of the quality of
the final estimates, the output quality.
From Chart 2.6 Quality of different kinds of administrative data
                                   Statistical data, data not   Legally important   Decisions made
                                   used for administration      data                by an authority
Identities        Handwritten      Very bad                     Bad                 Bad
                  Pre-printed      Good                         Better              Better
                  Online check     Better                       Best                Best
Other variables   Paper form       Bad                          Good                Good
                  PC or internet   Good                         Better              Better

A large part of the administrative data used by Statistics Sweden consists of
taxation data that are legally important. Identities are pre-printed on paper forms or
checked online by an authority when data are recorded. When taxpayers submit
information to the National Tax Agency they often use the internet, and data are
then corrected or edited directly by the taxpayer. Some administrative sources are
suitable for statistical purposes; others are not. We give two examples below
that illustrate this difference.
Example: The codes that police officers use when reporting crime are often cited as an example of low-quality administrative data. Färnström (2013) discusses different
quality issues related to crime codes. In fact, the codes used by the Swedish
police are often a combination of administrative data of good quality and statistical
data of bad quality. The administrative part concerns the type of crime, while the statistical part of the code can consist of context information describing the victim, the
relation between victim and offender and other statistically interesting details. The
coding system also has some weaknesses that lead to bad quality; categories such
as ‘other types of theft’ are easy for stressed police officers to use.
Example: A common opinion is that taxpayers only submit data that serve their
purposes and that tax data are consequently of low quality. Most people would like
to pay as little tax as possible, so the deductions they claim may be higher than is
justifiable. The example below is taken from a leading Swedish newspaper.
80 per cent of Swedish people’s tax deductions are pure tax evasion
Taxpayers submit errors worth billions in their tax declarations. Complicated rules and unclear legislation have made it hard for the country’s tax authorities to check all the deductions. Errors can be found
primarily in the deductions for share transactions, management fees and other share-related charges.

Deductions for the sales of shares
– 1/3 of all share sales contain errors
– 700 000 taxpayers report profit of around SEK 50 billion and losses of around SEK 10 billion
– Tax errors are difficult to judge and amount to billions of Swedish kronor
– Many inadvertent errors occur because of the complicated rules

Deductions for management fees
– 125 000 taxpayers claimed deductions of a total of SEK 515 million
– 66% of these deductions contain incorrect information
– Tax errors can in total be calculated at SEK 90 million
– A deduction for fees for fund managers is the most common error, the fee is deducted automatically

Deductions for other expenditure
– 700 000 taxpayers claimed deductions of a total of SEK 2.8 billion
– 82% of these deductions contain incorrect information
– Tax errors can in total be calculated to amount to around SEK 700 million

The headline exaggerates in several ways. ‘80 per cent’ is an overstatement, and the ‘tax
evasion’ often stems from misunderstandings caused by complicated rules:
– Deductions for share sales: the errors are largely unintentional.
– Deductions for management fees: the errors are on average 17% (90/515) and
the most common error may be unintentional due to misunderstanding.
– Deductions for other expenditure: 82% of these deductions contain errors but the
deductions are on average 25% (700/2800) incorrect.
Another perspective is gained by comparing these errors with the total income
of all those filing tax declarations: the error is 0.3%.
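The two average error rates quoted above can be recomputed directly from the newspaper figures. The sketch below uses only the amounts given in the article (in SEK million); the function and variable names are our own.

```python
# Amounts from the newspaper example, in SEK million.
DEDUCTIONS = {
    "management fees": {"claimed": 515, "error": 90},
    "other expenditure": {"claimed": 2800, "error": 700},
}


def error_share(claimed, error):
    """Share of the claimed deductions that is estimated to be in error."""
    return error / claimed


# Error share per deduction type, as a fraction between 0 and 1.
SHARES = {name: error_share(**amounts) for name, amounts in DEDUCTIONS.items()}
```

The share of deductions that contain some error (66% and 82% in the article) is thus a different measure from the share of the deducted amount that is in error, and it is the latter that matters for the statistics.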
The fact that deductions in the declarations are too high, and that consequently
the tax is too low, does not mean that the statistics in the Income Register are of
low quality, even though they are based on these declarations. Assume that we
have data for a person who makes excessively high deductions on her/his tax
declaration, but otherwise declares correctly:
Income from employment             257 600   The income is correct.
Deductions for other expenditure    25 500   The deduction is too high but is accepted.
Taxable income                     232 100   Taxable income is incorrect according to the tax rules
                                             but is not used for the statistics.
Tax                                100 000   The tax is incorrect and too low according to the tax
                                             rules, but statistically it is correct, as this is the tax that
                                             the person actually paid.
Disposable income                  157 600   Statistically correct
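The arithmetic of this example can be written out as a short sketch. It assumes, as the figures above imply, that taxable income is income minus deductions and that disposable income is income minus the tax actually paid; real income statistics involve further components that are left out here.

```python
def taxable_income(income, deductions):
    """Administrative quantity: follows the declaration, including any
    excessive deduction that was accepted."""
    return income - deductions


def disposable_income(income, tax_paid):
    """Statistical quantity: depends only on the correct income and on the
    tax the person actually paid (simplified definition, an assumption)."""
    return income - tax_paid


INCOME = 257_600      # income from employment, correct
DEDUCTIONS = 25_500   # too high, but accepted
TAX_PAID = 100_000    # too low according to the rules, but actually paid

TAXABLE = taxable_income(INCOME, DEDUCTIONS)
DISPOSABLE = disposable_income(INCOME, TAX_PAID)
```

The excessive deduction enters disposable income only through the tax that was actually paid, which is why the statistical variable remains correct even though the taxable income is wrong according to the tax rules.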