4 Intra- and Inter-laboratory Reproducibility of the Test Method



Validation in Support of Internationally Harmonised OECD Test Guidelines…


sufficiently standardised to enable the assay to be reproduced across laboratories;

further inter-calibration of equipment and test material might help; (2) the number

of laboratories could be increased for a better characterisation of the spread of possible responses generated by the test. Conversely, performance can also show that

between-laboratory results from e.g., four testing facilities do overlap to a great

extent and could have been demonstrated with fewer participating laboratories. Importantly, one needs to know a priori the expected range of response values, the natural variability of the response measured, and how this range relates to the magnitude of the response for a range of test chemicals

(from weakly active to potent test chemicals). Ideally, the natural range of variability of the response can be indicated in the Test Guideline and each testing facility

can build its own historical control database accordingly.
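As a rough illustration of how a facility might use such a historical control database, the sketch below derives an acceptance range from past control values and checks a new control result against it. The mean ± 2 SD rule, the function names and the numbers are assumptions made for illustration only; a Test Guideline may specify a different acceptance criterion.

```python
from statistics import mean, stdev


def control_limits(historical_controls, k=2.0):
    """Return (lower, upper) limits as mean +/- k * sample SD of past control values."""
    if len(historical_controls) < 2:
        raise ValueError("at least two historical control values are needed")
    m = mean(historical_controls)
    s = stdev(historical_controls)  # sample standard deviation (n - 1 denominator)
    return m - k * s, m + k * s


def within_limits(value, limits):
    """True if a new control value falls inside the historical range."""
    lower, upper = limits
    return lower <= value <= upper


# Hypothetical control responses from earlier runs in one facility
historical = [100.0, 104.0, 96.0, 102.0, 98.0, 101.0, 99.0, 100.0]
limits = control_limits(historical)

print(limits)
print(within_limits(103.0, limits))  # a control value close to the historical mean
print(within_limits(110.0, limits))  # a control value far from the historical mean
```

A control result outside the range would not by itself invalidate a study, but it could prompt a check of the test system before the data are interpreted.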

Repeatability of the results over time within the same testing facility is also part

of the reproducibility assessment of the test method. Generally, for complex test

methods, a number of proficiency chemicals are defined post-validation on the basis

of the applicability domain and the dynamic range (i.e. spread of responses in the

dataset) of the test method. Proficiency chemicals are then recommended in the

OECD Test Guideline, serving as a benchmark of responsiveness of the test system when establishing the method for routine use. The proficiency chemicals are

also recommended when a testing facility goes through changes in e.g. change of


The coefficient of variation (CV) or the standard deviation (SD) of the measured endpoint can be used, for example, to assess how reproducible a method is. It is not possible to give an absolute value for an acceptable CV or SD, because it will depend on the nature of the endpoint measured. For example, for body or organ weight measurement data, intra-laboratory CVs below 20–30 % are considered achievable and acceptable (Fentem et al. 1998). However, for hormone measurements, for example, variability is typically much higher from one test organism to another, resulting in a larger SD or CV for a given group. The inter-laboratory variability will

usually be higher than the intra-laboratory variability and should be considered

together with other information on the performance of the test. For that reason,

building an internal historical control database over time is important as an internal

benchmark of the stability of the test system.
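The calculation behind these statistics can be sketched as follows. The laboratory names and replicate values are hypothetical, and using the CV of the laboratory means as a measure of inter-laboratory variability is a simplifying assumption; a validation study would normally use a pre-specified statistical analysis plan.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD expressed as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical replicate measurements (same chemical, same dose) per laboratory
lab_results = {
    "lab_A": [10.0, 11.0, 9.0],
    "lab_B": [12.0, 13.0, 11.0],
    "lab_C": [8.0, 9.0, 7.0],
}

# Intra-laboratory CV: spread of replicates within each laboratory
intra_lab_cv = {lab: cv_percent(vals) for lab, vals in lab_results.items()}

# Inter-laboratory CV (simplified): spread of the laboratory means
lab_means = [mean(vals) for vals in lab_results.values()]
inter_lab_cv = cv_percent(lab_means)

for lab, cv in intra_lab_cv.items():
    print(f"{lab}: intra-laboratory CV = {cv:.1f} %")
print(f"inter-laboratory CV = {inter_lab_cv:.1f} %")
```

In this invented dataset the inter-laboratory CV is larger than any of the intra-laboratory CVs, which is the pattern the text describes as usual.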


Test Chemicals

Demonstration of the test method’s performance should be based on the testing of

reference chemicals representative of the types of substances for which the test

method will be used. The number of so-called reference chemicals and their representativity (in terms of modes of action, range of physical-chemical properties, range of possible application/use, etc.) have been and continue to be the subject of much

discussion. Usually the validation management group is tasked to make a proposal

on the number and identity of test chemicals, that should be representative of what


A. Gourmelon and N. Delrue

toxicity(ies) the assay is expected to correctly detect (e.g. one or several modes of

action, strong and weakly active chemicals, liquids and solids). In the area of traditional ecotoxicity testing, an apical endpoint is measured (e.g. survival, growth or

reproduction) and no unique mode of action is involved, just baseline toxicity. In

these cases a few moderately toxic chemicals are tested in a large number of testing

facilities; this is usually considered adequate and sufficient to demonstrate the reproducibility of the method. However, for in vitro assays intended to be used as alternatives to, and replacements for, existing in vivo assays that measure a range of activities

(positive, negative, strongly active, weakly active, acting via several mechanisms or

modes of action), it will be very important to have a good understanding of the predictive capacity of the test method being validated using a broad selection of chemicals. Ratios of sensitivity and specificity are calculated, and this cannot be done in a meaningful way if the number of chemicals is not statistically justified or is too low (e.g. lower than about 20 for each type of expected outcome). Additionally,

these chemicals need to represent a range of activities that characterise the capacity

of the assay. For instance, if the assay is intended to discriminate positive versus

negative chemicals for the targeted biological effect, then a range of strong and borderline positives and negatives needs to be tested in the validation; if the assay is

intended to discriminate among strong, moderate, weak and negative chemicals,

then chemicals representative of the dynamic range of the assay need to be tested.
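One way to see why the number of chemicals per expected outcome matters is to look at the uncertainty around an estimated sensitivity. The sketch below uses the Wilson score interval, a standard confidence interval for a proportion; the choice of this interval, the 95 % level and the example counts are assumptions for illustration and are not taken from the validation principles themselves.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95 % coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (90 %) obtained with different numbers of
# positive reference chemicals: (correctly identified, total tested)
cases = [(9, 10), (18, 20), (90, 100)]
intervals = {n: wilson_interval(k, n) for k, n in cases}

for n, (low, high) in intervals.items():
    print(f"n = {n:3d}: 90 % sensitivity, interval {low:.2f} to {high:.2f}")
```

The observed sensitivity is identical in the three cases, but the interval narrows as more chemicals are tested, so a figure based on a handful of chemicals says much less about the true performance of the assay.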

Coding or blind testing of chemicals is a good practice to eliminate bias where it

can influence the outcome of the assay. However, the entire assay does not necessarily have to be performed blindly. Quite often, measurements that are obtained by

using electronic-type equipment (i.e. where there is no possibility of subjective

reading or assessment) do not necessitate coding or blind testing as an absolute

requirement. Nevertheless, it is always possible to code part of the study without jeopardising safety in the laboratory or requiring heavy and costly management of the study. Blind evaluation of histopathology has proven challenging for the experts involved, who typically need to compare slides and to understand both the normal appearance of the tissue and the lesions or findings that can be expected. Guidance documents have been developed at the OECD to share and communicate best practice in the review and peer-review of histopathology slides; these endeavours made it possible to catalogue pathological findings and to associate such findings with a scoring system that facilitates a semi-quantitative analysis of endpoints, with a view to decreasing the subjectivity of the evaluation (OECD 2009c, 2010b, c).


Performance of the Test Method

The performance of the test method should be evaluated in relation to relevant information from the species of concern, and existing relevant toxicity data. This principle

is particularly pertinent for alternative test methods that are intended to substitute an

existing test, and for which the predictive capacity needs to be as high as possible; typically, predictivity in the range of 85–90 % is achievable. For the protection of human

health or the environment, the rate of false negatives should be as low as possible,




to facilitate the regulatory acceptance of the validated test method. Also, the selection of good in vivo data for the biological effect of interest is essential for an indisputable characterisation of predictivity. The in vivo data are typically obtained from

animal tests, which can occasionally present an issue if the animal model used (e.g.

rabbit) has anatomical differences compared with humans (e.g. eye sac present in rabbits

and not in humans), which may result in more severe effects resulting from the test

than those that could be expected from human exposure to the test chemical (e.g.

eye irritation test). Consequently, the database against which an alternative method

is validated is only as good as the model’s relevance can be, given the differences

between animals and humans.

One concern with surrogate models used to make predictions about the qualitative and

quantitative toxicological properties of a chemical substance is the potential loss in

the dynamic range of possible responses or effects compared to the target organism’s

response (i.e. human being, environmental species). By simplifying the test system

to a tissue or a cell, sensitivity and specificity to the chemical stressor inevitably

decrease. For regulators who have to ensure a sufficient level of protection, the rate

of false negatives is critical for future acceptance of a test method; the rate of false

positives, indicating the specificity of the method, is also important and should

remain as low as possible, but it is less critical for the purpose of protecting human health or the environment. A structured and formal validation programme, where many chemicals having good quality in vivo data are carefully selected, helps generate

sensitivity and specificity measures. These measures can then help determine how

the validated test method can be used with confidence in a regulatory context.
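The measures discussed above can be sketched as a simple comparison of test-method calls against in vivo reference classifications. The function name and the data are hypothetical; the definitions of sensitivity, specificity and the false negative and false positive rates are the standard ones.

```python
def predictive_performance(in_vivo, predicted):
    """Compare binary test-method calls with in vivo reference classifications.

    Both arguments are sequences of booleans (True = positive), one per chemical.
    """
    if len(in_vivo) != len(predicted):
        raise ValueError("both sequences must cover the same chemicals")
    tp = sum(1 for ref, call in zip(in_vivo, predicted) if ref and call)
    fn = sum(1 for ref, call in zip(in_vivo, predicted) if ref and not call)
    tn = sum(1 for ref, call in zip(in_vivo, predicted) if not ref and not call)
    fp = sum(1 for ref, call in zip(in_vivo, predicted) if not ref and call)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference chemical")
    return {
        "sensitivity": tp / (tp + fn),          # positives correctly identified
        "specificity": tn / (tn + fp),          # negatives correctly identified
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
        "false_positive_rate": fp / (tn + fp),  # equals 1 - specificity
        "accuracy": (tp + tn) / len(in_vivo),
    }


# Hypothetical validation set: 10 in vivo positives followed by 10 in vivo negatives
in_vivo = [True] * 10 + [False] * 10
# The test method misses 1 positive and wrongly flags 2 negatives
predicted = [True] * 9 + [False] + [True] * 2 + [False] * 8

performance = predictive_performance(in_vivo, predicted)
for name, value in performance.items():
    print(f"{name}: {value:.2f}")
```

Reporting the false negative rate separately from overall accuracy reflects the point made in the text: a method with good overall accuracy may still be unacceptable to regulators if it misses too many positives.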


Accordance with the Principles of GLP

Ideally, all data supporting the validity of a test method should be obtained in accordance with the principles of GLP. At the OECD, member countries and some nonmember countries have decided to adhere to the system of Mutual Acceptance of

Data by applying Good Laboratory Practice and using OECD Test Guidelines,

because they see benefits in doing so. GLP includes quality assurance of the studies performed. For regulators who have not been involved in the conduct of a study, but who bear responsibility for how regulatory decisions are made, a system that assures third parties that the studies supporting a hazard conclusion were conducted and documented following agreed standards is important to enable data

exchange and acceptance globally. Nevertheless, GLP certification of a laboratory

participating in a validation study is not a requirement.


Expert Review

All data supporting the assessment of the validity of the test method should be available for expert review. The validation report usually provides access to data summarised in a way to facilitate the evaluation by the reviewer. The statistical



procedures and tests used to analyse the data should be described, so that the logic

can be followed by independent statisticians if needed. Typically statisticians and

reviewers will pay attention to raw data from control and treated groups and any

subsequent data transformation applied, the mean or the median, the standard

error, the pertinence of the statistical test used and the statistical difference the test

can detect, and whether a pattern in the dose/concentration-response exists. For

in vitro methods that make use of a prediction model in the interpretation of data,

reviewers will be interested in the model and how it is used to reach conclusions;

also the consistent treatment of equivocal results is important in building confidence

in the test method. Data owners are free to publish the outcome of the validation

exercise in the scientific peer-reviewed literature. However, the Working Group of

the National Coordinators of the Test Guidelines Programme is keen on having

access to a stand-alone validation report that usually provides more details than an

article in a peer-reviewed journal. The Working Group of the National Coordinators

of the Test Guidelines Programme, or a sub-group of it, also reviews the validation

report and can ask for more details if needed, in particular if such information is

critical to further acceptance of the Test Guideline.

At the OECD, various approaches to the review and peer-review of validation

studies have been used, and are accepted by member countries, provided transparency and clarity are guaranteed. Experience shows it is not easy to find truly independent experts who have never been involved in discussions about a specific test

method for a given area of hazard assessment and have no vested interest in it. Equally important to the Working Group of the National Coordinators of the Test Guidelines

Programme is the transparency at all stages of the validation, and the clarity in

opinions expressed in the review, including possible interest or conflict of interest of

the expert providing his/her views. At the end of the review or peer-review process,

the Working Group of the National Coordinators of the Test Guidelines Programme (WNT) takes a decision, based on mutual agreement or consensus, on the regulatory acceptance of the method as an OECD Test Guideline.

Several formal peer-reviews have been organised successfully by validation

centres in the United States, Japan and in Europe, in particular for in vitro methods. Experts from various countries participate in the peer-review panels and have

dedicated meetings to discuss whether the validation has been successful in demonstrating the relevance and reliability of the method, and give an opinion about

the scientific validity of the test method. The questions addressed by the peer-review panel generally mirror the validation principles and may address additional

questions on specific aspects of the validation. The validation report and subsequent peer-review report or recommendations are then brought to the attention of

the Working Group of the National Coordinators of the Test Guidelines Programme

which approves, or not, the Test Guideline in the light of all information available.

This approach to peer-review is the most formal one, but cannot always be implemented given the resources involved, unless a country or region is paying for it.

Occasionally, it has been used for the peer-review of endocrine disruption-related

test methods or very new test methods (OECD 2007, 2011). Alternatively, the

Working Group of the National Coordinators of the Test Guidelines Programme

considers that the outcome of the validation may also be reviewed by existing




OECD expert groups who have the opportunity to discuss issues on the performance of the method and propose solutions that may facilitate regulatory acceptance. A number of validation reports supporting the development of Test

Guidelines, reviewed by an Expert Group, are being endorsed by the Working

Group of the National Coordinators of the Test Guidelines Programme, and published in the OECD Series on Testing and Assessment. In this way, reports supporting the validation status are referenced in the Test Guidelines and remain

accessible to the public.



The above-mentioned validation principles are generally applicable across the range

of test methods entering a validation study. The process used for the validation

should remain flexible and adaptable (or modular), taking into account pre-existing

information on the status of a test method, experience in performing the method, the

intended purpose and use/place of the method (i.e. stand-alone replacement method,

alternative method, part of a battery of assays, etc.). These preliminary considerations are useful in determining the extent of validation remaining, either prospectively or retrospectively, depending on the quality of the data available. Although various

approaches are possible, it is important that decisions on how to conduct a validation study be guided by clear purposes for each phase of the validation. In a prospective validation study, not all purposes and questions can be addressed at once,

and several phases may be necessary, typically 2–3 phases, depending on what supporting information already exists to determine the objectives of a given phase of validation. From experience at the OECD with the validation of endocrine disruption-related assays, separate portions of the validation programme have been organised

to address different purposes: the demonstration of inter-laboratory variability, the

demonstration of the relevance of the assay for the detection of a range of chemical

activities, and the blind-testing of some chemicals. Not all laboratories were necessarily involved in all portions of the validation programme, and these portions have

either been performed one after the other, or in combinations. The commonality

between most validation studies is the availability of a management group that

defines together with the lead laboratory the gaps and the specific objectives to be

addressed in a validation programme composed of several types of studies.

Over the last 20 years, validation studies have been performed by individual

member countries, by test method developers, by established validation centres in

countries/regions or under the auspices of the OECD. Most of the validation programmes

have resulted in the adoption of an OECD Test Guideline, with a few exceptions.

There are now several examples that can illustrate different situations, to name a few

in the area of alternative test methods:

• ICCVAM evaluation of the Human Skin Corrosion test (TG 431),

• ECVAM validation of the Fish Embryo Toxicity Test (TG 236),



• JP METI validation of the Estrogen receptor-stably transfected transactivation assay (ER-STTA), for the screening of agonistic activity of chemicals (TG 455),

• US EPA validation of the Steroidogenesis Assay (TG 456).

Each of these cases has resulted in the regulatory acceptance and adoption of an

OECD Test Guideline. Also, for each test method under consideration, the project to

develop an internationally agreed Test Guideline can be proposed to the OECD at

various stages of development. Most importantly, sponsors of test methods need to

be engaged in discussions with international experts and regulators at an early stage

of method development, at the OECD or elsewhere in meetings of scientific societies or expert networks and fora. These early discussions represent an important step

to gauge interest from peers and regulators, to get feedback from the regulatory community on important issues to be addressed in the validation and information needed

by regulators to make decisions. Each project and test method will have its specificities in terms of validation needs. Especially nowadays for some hazard areas, a

number of internationally agreed Test Guidelines already exist, so the requirements

for new test methods for the same hazard endpoint will be different (e.g. they will

need to demonstrate superiority compared to other methods to be accepted by regulators who do not want a plethora of similar methods doing the same thing).


Other Elements Influencing Regulatory Acceptance

The test methods adopted as OECD Test Guidelines are intended to generate valid

and high quality data to support chemical safety regulations in member countries.

Advances in life sciences allow the continuous development of new and improved

alternative testing methods. The regulatory acceptance of validated alternative testing methods at the OECD level represents the ultimate step leading to their regulatory implementation. Other upstream factors come into play, namely an enabling

policy environment for the development of the alternative methods. In Europe, the

regulatory framework for cosmetic products aims to strike the right balance among

policy mechanisms that facilitate the regulatory acceptance of non-animal methods.

In the last two decades, investments in research enabled the emergence of a wealth

of candidate methods, in particular for topical toxicity testing. The recent ban on

animal testing for cosmetics in the European Union has pushed further the development of non-animal methods, while setting time pressure to get valid and acceptable

test methods. The validation process has been applied to filter methods of sufficient

relevance, reliability and predictive capacity. In recent examples of OECD Test

Guidelines, the regulatory acceptance has only been possible when protection of

human health was not jeopardized. The interpretation of negative results and the

potential for a test method to generate false negatives remain difficult issues for

regulators. Negative results from a given source or test are generally better accepted when they are interpreted in a framework where other

available sources of information and possible alternative tests are also integrated

and show concordant results. This means that test methods tend to be less and less




regarded as stand-alone: the context, the purpose, the mechanistic understanding,

the predictive capacity (including the rate of false negatives) and possible agreed

and harmonised testing strategies to combine sources of information are key elements for further regulatory acceptance.

For more complex endpoints such as reproductive toxicity and carcinogenicity,

the tasks remaining to be undertaken are daunting. Further efforts to understand

toxicity pathways and to build integrated approaches to testing and assessment

(IATA) are needed, in addition to rigorous validation of individual methods.

These efforts will set the scientific basis and provide the context under which alternatives to animal testing can be consistently and safely applied by regulatory bodies. Recently, the Syrian Hamster Embryo Cell Transformation Assay went through a validation programme, but it is still not accepted as an OECD Test Guideline. One apparent reason was the lack of a framework to guide regulators on

how this assay could be used in the regulatory assessment of substances, mainly

non-genotoxic carcinogens. Also, the limitation in the predictive capacity and the

limited understanding of the underlying mechanisms of action appeared to hamper

its regulatory acceptance. There are certainly lessons to be learnt for future test

methods aiming to address complex hazard endpoints. In particular there is a need

for a careful consideration of the regulatory context, possible purpose and use of a

test method and data generated, and its applicability domain.



Challenges Ahead

Complex Endpoints Need Integrated Approaches to Testing and Assessment (IATA)

For more complex endpoints, alternative methods have to be combined in some

ways to provide meaningful predictions. If possible, mode(s) of action will have to

be known or postulated for hypotheses and toxicity pathways to be elaborated. The

validation will then be very useful to demonstrate the relevance of the method and

its utility for a given regulatory purpose. Obviously, the reliability of the method

will also have to be established, but provided the procedures are well described the

reproducibility of assays tends to be more straightforward with improved techniques, properly calibrated equipment and standardised practices.

Having a clear scope and realistic goal for use of a test method also facilitates its

future regulatory acceptance. Experience at the OECD with some assays has shown

that regulatory acceptance is hampered by unrealistic or changing objectives over the course of validation and Test Guideline development. Also, when the expected need and possible use of the assay are not well defined, the selection of reference chemicals cannot be optimal and this can cause problems in meeting the objectives of a

validation study.

It is illusory to believe that all mechanisms of action will soon be discovered and

that future alternative test methods will all be mechanistically-based. Although this



is wishful thinking about what the future should be, test method developers and regulators will continue to rely on alternative methods where mechanisms are as yet unknown, but the utility and predictive capacity of the method are experimentally

established for a given hazard endpoint. Beyond experimental validation, given the

number of alternative methods addressing the same endpoint, developers and regulators should endeavour to talk together to agree on frameworks of application,

describing how they complement each other and which test should be used under

what circumstances. These frameworks, also called integrated approaches to testing and assessment (IATA), may or may not be articulated around modes of action or adverse outcome pathways. Frameworks can also be developed in the absence of complete knowledge of the mode of action or adverse outcome pathway, as long as they propose a harmonised, meaningful and efficient use of methods to reach a conclusion. At

OECD, these IATAs are the best place where methods and approaches to testing can

be explained, with their advantages and limitations, and where harmonised testing

strategies can be proposed (OECD 2014). For regulators, it is also reassuring that

despite a choice of possible alternative methods, they are guided by an agreed

framework for their application.


High Throughput Screening (HTS) Assays May Need a Streamlined Validation Process

High-throughput screening techniques are increasingly used beyond research and development, and experience gained to date raises expectations that results of these

techniques may be used for screening and priority setting of chemicals in regulatory

programmes. From a manual method to the equivalent (ultra) high-throughput

methods, the principle of the test (e.g. binding to a receptor) and material used (e.g.

transformed cell line) may not differ substantially, thus the relevance of the test in

itself remains the same regardless of the throughput level. The main hurdles in the

validation may be of a technical nature, and validation principles may need to be

revisited and adapted to these new techniques.

For HTS assays that have an equivalent in vitro manual assay validated, the validation is greatly facilitated by having a well-defined list of reference chemicals that

has been used in the validation of the equivalent manual method and possibly other

similar methods. Adequate calibration of the equipment used and a sufficient number of internal, positive and negative controls are important for establishing the reliability of the test system. Given the small volumes of individual test chambers (i.e. microwells on the plates) and the high degree of robotisation, one issue could be higher variability affecting the accuracy with which the results predict a biological response. This

may potentially be compensated by a higher level of standardisation and precision

enabled by the robotisation of procedures and lesser human handling. For HTS

assays that have no equivalent manual in vitro assay validated, one consideration

might be whether validation of the manual method is a pre-requisite, whether it will

facilitate the validation of the HTS method, or whether it is superfluous and not

needed for further regulatory acceptance.
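As an illustration of how positive and negative controls on a plate can be used to judge the quality of an HTS run, the sketch below computes the Z'-factor, a statistic widely used in the screening community. The chapter does not name this statistic, so its use here is an assumption, and the control readings are invented.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (SD_pos + SD_neg) / |mean_pos - mean_neg|."""
    mu_pos, mu_neg = mean(positive_controls), mean(negative_controls)
    if mu_pos == mu_neg:
        raise ValueError("control means are identical; Z' is undefined")
    spread = stdev(positive_controls) + stdev(negative_controls)
    return 1.0 - 3.0 * spread / abs(mu_pos - mu_neg)


# Hypothetical signal readings from control wells on a single plate
positive_wells = [100.0, 98.0, 102.0, 100.0]
negative_wells = [10.0, 12.0, 8.0, 10.0]

plate_quality = z_prime(positive_wells, negative_wells)
print(f"Z' = {plate_quality:.3f}")
```

Values close to 1 indicate a wide separation between the controls relative to their variability; a commonly quoted rule of thumb treats values above about 0.5 as indicating a robust assay, although acceptance criteria would need to be agreed for each method.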




Furthermore, there is a limited number of testing facilities around the world

equipped to perform these advanced techniques, given the costly investments involved.

These facilities are expected to represent highly performing laboratories where all

calibration and quality control procedures are in place and working well. Provided

this assumption is correct, the reproducibility of HTS techniques across

laboratories should not be the main challenge of the validation process. As these

tools will be used on large numbers of chemicals for screening purposes, it will be

important to have assays with large applicability domains or multiple assays that

together cover a large applicability domain, to build confidence of regulators that

the test system does not miss positive effects. As a consequence, a streamlined

approach to validation may be needed to address these relevant aspects. Testing

more chemicals in fewer laboratories would make sense for an efficient use of

resources (Judson et al. 2013). Other principles of the validation, such as having a

detailed protocol and a description of the relationship between the test method’s

endpoint(s) and the biological phenomenon of interest, certainly remain important

pre-requisites for any future acceptance of methods as OECD Test Guidelines, if

such methods are expected to be covered by the Mutual Acceptance of Data.


Conclusions and Concepts to Preserve

Existing OECD-agreed validation principles will most likely remain generally relevant and applicable to address challenges associated with the validation of future test methods. Some adaptations may be needed, but demonstration of relevance and reliability will continue to play a central role as a pre-requisite for regulatory acceptance. Although methods and techniques are becoming more and more sophisticated and require a good level of proficiency, having harmonised standards for generating reliable results globally remains an important goal for the efficient use of resources. The Mutual Acceptance of Data among OECD member and partner countries is essential to maintain efficiency in testing and assessment of chemicals; trustworthy methods and harmonised approaches will continue to be needed. It is also important to continue to promote the OECD validation principles globally so that new techniques and assays emerging from science are supported by good-quality data generated using best practice to appraise their utility, potential for validation, and further regulatory acceptance.


Balls M et al (1990) Report and recommendations of the CAAT/ERGATT workshop on the validation of toxicity test procedures. ATLA 18:313–337

Balls M et al (1995) Practical aspects of the validation of toxicity test procedures: the report and recommendations of ECVAM workshop 5. ATLA 23:129–147
