Chapter 4. Numerical Methods for Value-at-Risk
FIGURE 4.1 The probability that a loss is greater than value-at-risk, given by the area of the shaded region, is equal to $1 - \alpha$.
in-depth, general, algorithmic, and mathematical discussions, we have a personal preference
for [Mor96a, Hul00, DP97].²
In financial markets, risk is caused by uncertainty about the value of an investment in
the future. The value of a portfolio is a function of a set of risk factors. Risk factor is the
generic term for a financial variable related to market prices of selected reference securities,
for example, equity indices, interest rates, foreign exchange rates, and commodity futures
prices. Market risk is the risk that the value of a portfolio declines as a consequence of
changes in the risk-factor values. Therefore, to model market risk we need to understand how
risk factors evolve over time.
Consistent with the hypothesis of absence of arbitrage, we will assume that the changes
in risk factors are random. Although historical data is of limited use for predicting changes in risk
factors, it can be used to fit statistical models of the risk factors and their correlations.
In our examples, we use stocks as elementary risk factors, although the methodology applies
to a wide range of financial instruments.
A simple formula for value-at-risk can be obtained in the case where an $n \times 1$ vector of relative changes $R$ in the market risk factors is a multivariate normal random variable with mean vector $\mu$ and covariance matrix $C$, and if one assumes that the change in portfolio value $\Delta\Pi$ can be approximated by an affine function of the relative changes in the risk factors:

$$\Delta\Pi \approx \Theta + \Delta^{T} R \tag{4.2}$$

Throughout this chapter we shall use superscript $T$ to denote the transpose. Note: We are using $\Delta\Pi$ to denote the change in portfolio value, i.e., $\Delta\Pi = \Pi_t - \Pi_0$ for a time lapse $t$, whereas $\Delta$ in the dot product is the vector of sensitivities with respect to the returns (i.e., the delta Greeks of the portfolio), as defined later.
² The Web site www.gloriamundi.org is an excellent source for information and links to papers on value-at-risk.
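Since $\Delta\Pi - \Theta = \Delta^{T} R$ is a sum of normal random variables, it is itself normal. The distribution is determined by its mean and variance,

$$E[\Delta\Pi - \Theta] = E[\Delta^{T} R] = \Delta^{T} E[R] = \Delta^{T}\mu \tag{4.3}$$

$$E\big[(\Delta\Pi - \Theta - \Delta^{T}\mu)^2\big] = E\big[(\Delta^{T}(R - \mu))^2\big] = \Delta^{T} E\big[(R - \mu)(R - \mu)^{T}\big]\Delta = \Delta^{T} C \Delta \tag{4.4}$$

So $\Delta\Pi$ is the normal random variable

$$\Delta\Pi = z\sqrt{\Delta^{T} C \Delta} + \Delta^{T}\mu + \Theta \tag{4.5}$$

where $z \sim N(0, 1)$. Hence, inverting equation (4.1) while using $N^{-1}(1 - \alpha) = -N^{-1}(\alpha)$ gives the value-at-risk

$$\mathrm{VaR} = N^{-1}(\alpha)\sqrt{\Delta^{T} C \Delta} - \Theta - \Delta^{T}\mu \tag{4.6}$$

where $N^{-1}(\cdot)$ is the inverse of the standard normal cdf.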
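The closed-form result (4.6) can be evaluated in a few lines. The following is a minimal sketch, assuming the linear model with normal returns; the function name, the argument names (`delta` for the sensitivity vector, `theta` for the constant term, `cov` for the covariance matrix $C$), and all numerical values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def linear_normal_var(delta, mu, cov, theta=0.0, alpha=0.99):
    """Value-at-risk of the linear model with normal returns, equation (4.6)."""
    n = len(delta)
    # Mean of the change in portfolio value: theta + delta^T mu
    mean = theta + sum(delta[i] * mu[i] for i in range(n))
    # Variance of the change in portfolio value: delta^T C delta
    variance = sum(delta[i] * cov[i][j] * delta[j]
                   for i in range(n) for j in range(n))
    # Inverse standard normal cdf evaluated at the confidence level alpha
    z_alpha = NormalDist().inv_cdf(alpha)
    return z_alpha * sqrt(variance) - mean


# Hypothetical two-factor portfolio (sensitivities in currency units,
# daily return means and covariances); not taken from the BCE/CTRa data.
delta = [1_000_000.0, 500_000.0]
mu = [0.0005, 0.0002]
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]

var_99 = linear_normal_var(delta, mu, cov, theta=0.0, alpha=0.99)
print(f"99% one-day VaR: {var_99:,.0f}")
```

With `alpha=0.5` the quantile term vanishes and the function returns minus the expected change in value, which is a quick sanity check on the sign conventions.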
The linear model with normal relative changes has a closed-form solution, but it suffers
from two serious problems. First, real-world returns have fatter tails than normal distributions. The model will therefore underestimate the likelihood of extreme returns, which
as a consequence may lead to inaccurate estimates of value-at-risk. Second, for portfolios
with derivatives, the change in value is a nonlinear function. The local error in the linear
approximation will therefore often be unacceptable, a property that is exacerbated by dynamic
hedging strategies that use the linearization to eliminate risk locally. To compute value-at-risk
for models that take these difficulties into account is a substantially harder task.
Let $S_t$ be the process for a risk factor. Returns on $S_t$ over the time horizon $[0, t]$ can be defined either as arithmetic returns

$$R_t = \frac{S_t - S_0}{S_0} = \frac{\Delta S_t}{S_0}$$

or as the log-return,

$$\tilde{R}_t = \log S_t - \log S_0 \tag{4.7}$$
Log-returns have the advantage that one can aggregate returns over time by addition. In the multivariate case, $S_t$ is a vector of prices and returns are taken componentwise. Of course the two are closely related. With $\Delta S_t = S_t - S_0$, the difference,

$$R_t - \tilde{R}_t = \frac{1}{2}\left(\frac{\Delta S_t}{S_0}\right)^2 + O\!\left(\left(\frac{\Delta S_t}{S_0}\right)^3\right)$$

is typically negligibly small for estimation purposes, and either type of return can safely be approximated by the other. In the examples that follow, we choose log-returns.
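To make the size of the difference concrete, the sketch below compares the two return definitions for a few price moves and also checks that log-returns add up over consecutive periods. The prices are hypothetical and the function names are illustrative only.

```python
from math import log


def arithmetic_return(s0, st):
    """Arithmetic return (S_t - S_0) / S_0."""
    return (st - s0) / s0


def log_return(s0, st):
    """Log-return log S_t - log S_0, equation (4.7)."""
    return log(st) - log(s0)


s0 = 40.0  # hypothetical starting price
for st in (40.4, 42.0, 48.0):
    r = arithmetic_return(s0, st)
    r_log = log_return(s0, st)
    # The gap between the two is close to the leading term 0.5 * R^2
    print(f"R = {r:.4f}  log-return = {r_log:.4f}  "
          f"difference = {r - r_log:.6f}  0.5*R^2 = {0.5 * r ** 2:.6f}")

# Log-returns over consecutive periods aggregate by addition
two_step = log_return(40.0, 42.0) + log_return(42.0, 48.0)
one_step = log_return(40.0, 48.0)
print(f"sum of period log-returns = {two_step:.6f}, full-period = {one_step:.6f}")
```

For daily moves of around one percent the gap is of the order of $10^{-5}$, which is why either definition can be used for estimation.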
Because the return is dimensionless, i.e., the quantity does not have a unit, return models
are preferred over models for prices. We consider a model in which the returns, sampled
at equally spaced points in time, form a sequence $\{R_i\}_{i=1}^{\infty}$ of independent and identically
distributed random variables. This means that stock prices are discrete-time Markov chains
with an infinite state space [Ros00]. Choosing different distributions gives different models
in this family.
[Figure: "Closing prices, 1997–2001" for BCE and Canadian Tire (CTRa); vertical axis: Price (CDN); horizontal axis: year.]
FIGURE 4.2 Daily closing prices for BCE and Canadian Tire from January 1997 to December 2001.
Visual inspection of historical time series gives clues on the key statistical properties.
Figure 4.2 shows the daily closing prices over 4 years for two Canadian stocks traded on the
Toronto Stock Exchange (TSX): Bell Canada Enterprises (BCE) and Canadian Tire (CTRa).
The scatter plot in Figure 4.3 shows that the daily returns form a cloud of samples around the origin in what resembles a multivariate unimodal distribution. The time series can be divided into segments with the same time span as the returns in the model $\{R_i\}_{i=1}^{\infty}$. For each time interval, the relative return can be computed as

$$R_i = \frac{S_i - S_{i-1}}{S_{i-1}}, \qquad i = 1, \ldots, d \tag{4.8}$$

where $S_{i-1}$ and $S_i$ are, respectively, the prices at the beginning and end of the time interval. Since the returns $\{R_i\}_{i=1}^{\infty}$ in the model are independent and identically distributed, the computed (observed) returns $r_i$ are viewed, rightly or wrongly, as independent samples from the same distribution. After settling on a family of distributions for the random-walk increments, the parameters of this distribution can be estimated from the time series of returns $\{r_i\}_{i=1}^{d}$.
Many generalizations of the random-walk model have been proposed to correct shortcomings revealed in empirical studies; see, for instance, [CLM97]. Over time periods of a few days one can make the simplifying assumption that the returns $\{R_i\}_{i=1}^{\infty}$ are independent and identically distributed. For time periods spanning more than a few years, however, the returns are not identically distributed. To obtain the current reading and forecast for the volatility, it is standard practice to use only recent data, to use a weighting scheme that attributes a lesser weight to older data, or to model the intertemporal dependencies by means of more elaborate statistical models, such as ARCH and GARCH [Eng82, Bol86, Nel91, Hul00].
FIGURE 4.3 Scatter plot of relative returns for BCE and Canadian Tire.
4.1 Risk-Factor Models
Recall that, in the random-walk model, returns are modeled as a sequence $\{R_i\}_{i=1}^{\infty}$ of independent and identically distributed random variables. In this section, we discuss three instances of this model, each with a different distribution for the random variables: the normal random walk, the asymmetric Student's t-distribution, and the nonparametric density estimator due to Parzen [Par61]. The methods will be generalized to the multivariate case in the next section.
4.1.1 The Lognormal Model
In the lognormal model, the distribution of log-returns

$$R_i \sim N(\mu, \sigma^2), \qquad i = 1, 2, \ldots \tag{4.9}$$

is normal with mean $\mu$ and volatility $\sigma$. The mean can be estimated using the sample returns

$$\hat{\mu} = \frac{1}{d}\sum_{i=1}^{d} r_i \tag{4.10}$$

and the variance by

$$\hat{\sigma}^2 = \frac{1}{d-1}\sum_{i=1}^{d} (r_i - \hat{\mu})^2 \tag{4.11}$$
See, for instance, [LM86]. Some authors advocate using estimators that give more weight to
recent returns than to old ones (see, for example, [Mor96a, Hul00]).
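The equally weighted estimators (4.10) and (4.11) translate directly into code. This is a minimal sketch using a short, hypothetical price series; the function names are illustrative, and the recency-weighted alternatives mentioned above are not implemented here.

```python
from math import log, sqrt


def log_returns(prices):
    """Log-returns between consecutive closing prices."""
    return [log(p1) - log(p0) for p0, p1 in zip(prices, prices[1:])]


def fit_normal(returns):
    """Sample mean (4.10) and unbiased sample variance (4.11)."""
    d = len(returns)
    mu_hat = sum(returns) / d
    var_hat = sum((r - mu_hat) ** 2 for r in returns) / (d - 1)
    return mu_hat, var_hat


# Hypothetical closing prices, for illustration only
prices = [40.0, 40.6, 39.9, 40.3, 41.2, 40.8, 41.5]
returns = log_returns(prices)
mu_hat, var_hat = fit_normal(returns)
print(f"mean = {mu_hat:.6f}, variance = {var_hat:.6f}, volatility = {sqrt(var_hat):.6f}")
```

Because log-returns telescope, the estimated mean equals the full-period log-return divided by the number of periods.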
To illustrate the performance, we estimate the parameters $\mu$ and $\sigma^2$ for daily returns for the BCE time series. Figure 4.4 shows the quantile-quantile plot³ for the fitted normal distribution. It is clear that the normal model is a good approximation for small returns, but for both the negative and positive tails the distribution does not fit the data. Fat tails are typical for stock returns; to estimate value-at-risk, where we need to compute tail quantiles, the normal model is less suitable. The next two subsections explore different approaches to construct random-walk models with more realistic tails.
FIGURE 4.4 Quantile-quantile plot for the normal random walk with parameters estimated from 4 years of daily returns for BCE.
³ A quantile-quantile plot is a method for comparing two distributions. Given a set of observations, we use it to compare the empirical distribution and a distribution fitted to this data. Sorting the observations gives the empirical cumulative distribution function (cdf). Each observation, which corresponds to a quantile, and the corresponding quantile for the fitted distribution are marked in the plot. If the two distributions are the same, the points fall on the diagonal reference line. Deviations from the diagonal line indicate that one distribution has fatter or thinner tails with respect to the other. To learn more about this, the reader is referred to the relevant numerical project in Part II.
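The points of a quantile-quantile plot against a fitted normal distribution can be computed as follows. This is a sketch under two assumptions that are not taken from the text: the plotting positions $(i - 0.5)/d$, which are one common convention, and a small hypothetical set of returns with one large move in each tail.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(returns):
    """Pair each sorted observation with the matching fitted-normal quantile."""
    d = len(returns)
    fitted = NormalDist(fmean(returns), stdev(returns))
    ordered = sorted(returns)
    # Plotting position (i - 0.5) / d is an assumed convention
    return [(fitted.inv_cdf((i - 0.5) / d), r)
            for i, r in enumerate(ordered, start=1)]


# Hypothetical daily returns, for illustration only
returns = [-0.052, -0.021, -0.012, -0.008, -0.005, -0.003, -0.001, 0.000,
           0.002, 0.004, 0.006, 0.009, 0.013, 0.019, 0.047]

points = normal_qq_points(returns)
for theoretical, observed in points:
    print(f"fitted quantile = {theoretical:+.4f}   observed = {observed:+.4f}")
```

Points whose observed value lies further from the center than the fitted quantile, in either tail, indicate fatter tails than the normal model allows.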
4.1.2 The Asymmetric Student’s t Model
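Student's t-distributions have fat tails. The density for a t-distributed random variable with $\nu$ degrees of freedom is

$$p_T(x; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad x \in \mathbb{R} \tag{4.12}$$

the mean is $\mu = 0$, and the variance for $\nu > 2$ is

$$\sigma^2 = \frac{\nu}{\nu - 2} \tag{4.13}$$

The normalization factor involves the gamma function $\Gamma(\cdot)$. The degrees of freedom $\nu$ control the fatness of the tails; as $\nu \to \infty$, the distribution converges to the normal distribution.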
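An alternative to the normal model is to define a random walk with t-distributed increments. Since the fatness of the tails can be different for negative and positive returns, we generalize this idea and let each random variable in the sequence $\{R_i\}_{i=1}^{\infty}$ be distributed as

$$A = m + \sigma\sqrt{2(1-\alpha)}\,\sqrt{\frac{\nu_+ - 2}{\nu_+}}\; B\,|T_+| + \sigma\sqrt{2\alpha}\,\sqrt{\frac{\nu_- - 2}{\nu_-}}\; (B - 1)\,|T_-| \tag{4.14}$$

The random variables $T_+$ and $T_-$ are t-distributed with degrees of freedom $\nu_+$ and $\nu_-$, respectively. The random variable $B$ is a Bernoulli random variable; $B$ takes the value 0 or 1, each with probability 0.5. The random variables $T_-$, $T_+$, and $B$ are independent. The parameter $\sigma > 0$ sets the overall scale, and $\alpha \in (0, 1)$ is the fraction of the second moment about $m$ contributed by the negative half. We say that $A$ is an asymmetric Student's t-distributed random variable. The density, figuratively a density made up of Student's t pdfs cut in half, is

$$p(x) = \begin{cases} \dfrac{1}{\sigma\sqrt{2\alpha}}\sqrt{\dfrac{\nu_-}{\nu_- - 2}}\; p_T\!\left(\dfrac{x - m}{\sigma\sqrt{2\alpha}}\sqrt{\dfrac{\nu_-}{\nu_- - 2}};\, \nu_-\right) & \text{if } x \le m, \\[2ex] \dfrac{1}{\sigma\sqrt{2(1-\alpha)}}\sqrt{\dfrac{\nu_+}{\nu_+ - 2}}\; p_T\!\left(\dfrac{x - m}{\sigma\sqrt{2(1-\alpha)}}\sqrt{\dfrac{\nu_+}{\nu_+ - 2}};\, \nu_+\right) & \text{if } x > m. \end{cases} \tag{4.15}$$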
Since the two regions each make up half of the density, m is the median of the distribution,
and, with a little algebra, it is easy to derive moment properties relative to the median. We
then have the following result, whose proof is left as an exercise.
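Proposition 4.1. Suppose that $\nu_- > 4$ and $\nu_+ > 4$. Then an asymmetric t-distributed random variable, defined by equation (4.14), satisfies the following moment properties:
(i) The expectation is

$$\mu^{(1)} = E[A - m] = \sigma\sqrt{\frac{2}{\pi}}\left[\frac{\Gamma\!\left(\frac{\nu_+ + 1}{2}\right)}{\Gamma\!\left(\frac{\nu_+}{2}\right)}\,\frac{\sqrt{1-\alpha}\,\sqrt{\nu_+ - 2}}{\nu_+ - 1} - \frac{\Gamma\!\left(\frac{\nu_- + 1}{2}\right)}{\Gamma\!\left(\frac{\nu_-}{2}\right)}\,\frac{\sqrt{\alpha}\,\sqrt{\nu_- - 2}}{\nu_- - 1}\right]$$

(ii) The second moment is

$$\mu^{(2)} = E\big[(A - m)^2\big] = \sigma^2$$

(iii) The second conditional moments are, for negative values,

$$\mu_-^{(2)} = E\big[(A - m)^2 \mid A \le m\big] = 2\alpha\sigma^2$$

and, for positive values,

$$\mu_+^{(2)} = E\big[(A - m)^2 \mid A > m\big] = 2(1-\alpha)\sigma^2$$

(iv) The fourth conditional moments are, for negative values,

$$\mu_-^{(4)} = E\big[(A - m)^4 \mid A \le m\big] = 4\alpha^2\sigma^4\left(3 + \frac{6}{\nu_- - 4}\right)$$

and, for positive values,

$$\mu_+^{(4)} = E\big[(A - m)^4 \mid A > m\big] = 4(1-\alpha)^2\sigma^4\left(3 + \frac{6}{\nu_+ - 4}\right)$$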
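The moment properties can be checked numerically by simulation. The sketch below assumes the representation $A = m + c_+ B\,|T_+| + c_- (B - 1)\,|T_-|$ with scale factors $c_+ = \sigma\sqrt{2(1-\alpha)(\nu_+ - 2)/\nu_+}$ and $c_- = \sigma\sqrt{2\alpha(\nu_- - 2)/\nu_-}$, which is our reading of equation (4.14). The parameter values, the seed, and the function names are hypothetical, and Student's t variates are generated as a standard normal divided by the square root of a scaled chi-square variate.

```python
import random
from math import gamma, pi, sqrt


def sample_student_t(nu, rng):
    """Draw a Student's t variate as Z / sqrt(V / nu), V chi-square(nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(nu/2, scale 2)
    return z / sqrt(v / nu)


def sample_asymmetric_t(m, sigma, alpha, nu_minus, nu_plus, rng):
    """Draw one variate from the assumed form of equation (4.14)."""
    if rng.random() < 0.5:  # B = 1: positive half
        scale = sigma * sqrt(2.0 * (1.0 - alpha) * (nu_plus - 2.0) / nu_plus)
        return m + scale * abs(sample_student_t(nu_plus, rng))
    # B = 0: negative half
    scale = sigma * sqrt(2.0 * alpha * (nu_minus - 2.0) / nu_minus)
    return m - scale * abs(sample_student_t(nu_minus, rng))


def expected_mean_deviation(sigma, alpha, nu_minus, nu_plus):
    """E[A - m] as given by property (i)."""
    def half(nu, weight):
        return (gamma((nu + 1) / 2) / gamma(nu / 2)
                * sqrt(weight * (nu - 2)) / (nu - 1))
    return sigma * sqrt(2.0 / pi) * (half(nu_plus, 1.0 - alpha) - half(nu_minus, alpha))


# Hypothetical parameter values, chosen only for illustration
m, sigma, alpha, nu_minus, nu_plus = 0.0005, 0.02, 0.6, 8.0, 12.0
rng = random.Random(12345)
draws = [sample_asymmetric_t(m, sigma, alpha, nu_minus, nu_plus, rng)
         for _ in range(200_000)]

neg = [(a - m) ** 2 for a in draws if a <= m]
pos = [(a - m) ** 2 for a in draws if a > m]
mean_dev = sum(a - m for a in draws) / len(draws)
second = (sum(neg) + sum(pos)) / len(draws)

print(f"share of draws at or below m: {len(neg) / len(draws):.4f}")
print(f"E[A - m]: sample {mean_dev:.6f}, "
      f"formula {expected_mean_deviation(sigma, alpha, nu_minus, nu_plus):.6f}")
print(f"second moment / sigma^2: {second / sigma ** 2:.4f}")
print(f"negative half / (2 alpha sigma^2): "
      f"{sum(neg) / len(neg) / (2 * alpha * sigma ** 2):.4f}")
print(f"positive half / (2 (1 - alpha) sigma^2): "
      f"{sum(pos) / len(pos) / (2 * (1 - alpha) * sigma ** 2):.4f}")
```

The three ratios should be close to one, and about half of the draws should fall on each side of $m$. The fourth moments are left out of the check because their sample estimates converge slowly for small degrees of freedom.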
Once the moment properties are known, estimating the parameters in the model is straightforward. The first step is to compute the median $m$ of the observed returns $\{r_i\}_{i=1}^{d}$ by sorting the samples and taking $m$ to be the order-$(k+1)$ value if $d = 2k + 1$ is odd, or the average of the order-$k$ and order-$(k+1)$ values if $d = 2k$ is even. Then find the sample estimate for the second moment

$$\hat{\sigma}^2 = \frac{1}{d-1}\sum_{i=1}^{d} (r_i - m)^2$$

We then estimate the contribution to the second moment from the negative and the positive halves. Let $d = d_- + d_+$, where $d_-$ and $d_+$ are the number of observations less than and greater than $m$, respectively. Then

$$\hat{\alpha} = \frac{1}{2\hat{\sigma}^2 d_-}\sum_{r_i \le m} (r_i - m)^2$$

Finally, using the sample estimates for the fourth moments,

$$\hat{\mu}_-^{(4)} = \frac{1}{d_-}\sum_{r_i \le m} (r_i - m)^4 \qquad \text{and} \qquad \hat{\mu}_+^{(4)} = \frac{1}{d_+}\sum_{r_i > m} (r_i - m)^4$$

we can solve for estimates of the degrees of freedom $\nu_-$ and $\nu_+$,

$$\hat{\nu}_- = \frac{6}{\dfrac{\hat{\mu}_-^{(4)}}{4\hat{\alpha}^2\hat{\sigma}^4} - 3} + 4 \qquad \text{and} \qquad \hat{\nu}_+ = \frac{6}{\dfrac{\hat{\mu}_+^{(4)}}{4(1-\hat{\alpha})^2\hat{\sigma}^4} - 3} + 4$$
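The estimation steps can be collected in one function. The following is a sketch of the moment-matching procedure, assuming that the fourth conditional moments are estimated by averaging over each half and that the degrees of freedom follow from inverting property (iv) of Proposition 4.1. Two choices are ours rather than the text's: observations equal to the median are counted in the negative half so that $d = d_- + d_+$, and the degrees of freedom are reported as infinite when a half shows no excess kurtosis. The data are hypothetical.

```python
from statistics import median


def fit_asymmetric_t(returns):
    """Moment-based estimates of m, sigma^2, alpha, nu_minus and nu_plus."""
    d = len(returns)
    m = median(returns)
    sigma2 = sum((r - m) ** 2 for r in returns) / (d - 1)

    # Assumption: samples equal to m are assigned to the negative half
    neg = [r - m for r in returns if r <= m]
    pos = [r - m for r in returns if r > m]
    d_minus, d_plus = len(neg), len(pos)

    alpha = sum(x ** 2 for x in neg) / (2.0 * sigma2 * d_minus)
    mu4_minus = sum(x ** 4 for x in neg) / d_minus
    mu4_plus = sum(x ** 4 for x in pos) / d_plus

    def degrees_of_freedom(mu4, weight):
        excess = mu4 / (4.0 * weight ** 2 * sigma2 ** 2) - 3.0
        # Hypothetical guard: without excess kurtosis the inversion has no
        # finite solution, so report the normal limit instead
        return 6.0 / excess + 4.0 if excess > 0.0 else float("inf")

    return {
        "m": m,
        "sigma2": sigma2,
        "alpha": alpha,
        "nu_minus": degrees_of_freedom(mu4_minus, alpha),
        "nu_plus": degrees_of_freedom(mu4_plus, 1.0 - alpha),
    }


# Hypothetical daily returns with a heavier negative tail
returns = [-0.040, -0.015, -0.010, -0.008, -0.006, -0.005, -0.004, -0.003,
           -0.002, -0.001, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
           0.009, 0.012, 0.025]

params = fit_asymmetric_t(returns)
for name, value in params.items():
    print(f"{name} = {value:.6g}")
```

On this sample the negative half carries more than half of the second moment and receives the smaller degrees-of-freedom estimate, which is the pattern the model is designed to capture. Fourth-moment estimates are noisy, so a sample of this size is far too small for practical use.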
The advantage of the asymmetric t model over the normal model is that, as illustrated
by the quantile-quantile plot in Figure 4.5, the tails of the empirical distribution can be
reproduced more accurately. However, this improvement comes at a price, since the pdf has
a discontinuity at the center. The jump is counterintuitive and the implementation of this
model is more difficult, but in comparison to the advantage of increased accuracy these are
minor concerns.
FIGURE 4.5 BCE quantile-quantile plot for the random walk model with the asymmetric t model.
4.1.3 The Parzen Model
A nonparametric density estimator is an alternative to using a parametric method, such as either of the first two examples. Let $\{r_i\}_{i=1}^{d}$ be samples from a distribution with an unknown pdf, $p(x)$. In [Par61] Parzen develops and analyzes a family of estimates of the form

$$p_d(x) = \frac{1}{dh}\sum_{i=1}^{d} K\!\left(\frac{x - r_i}{h}\right) \tag{4.16}$$

initially suggested by Rosenblatt in [Ros56]. In our examples, we use the weighting function [TT90]

$$K(x) = \frac{15}{16}\left(1 - x^2\right)^2 \qquad \text{for } |x| \le 1 \tag{4.17}$$

Note that $K(x) \ge 0$ is a kernel function that integrates to unity. Parzen shows that, if $p(x)$ is sufficiently smooth, $p_d(x)$ is asymptotically unbiased and, for an optimal sequence of $h$-values, the mean square error converges to zero as⁴

$$E\big[(p_d(x) - p(x))^2\big] = O\!\left(d^{-4/5}\right)$$

We refer to a random walk using the Parzen estimate (4.16) for the pdf as the Parzen model.
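A direct implementation of the estimate (4.16) with the weighting function (4.17) is short. The sketch below also integrates the kernel in closed form to obtain the corresponding cdf estimate, which is the quantity needed for quantiles. It assumes that the kernel is zero outside $|x| \le 1$ and sets $h$ to the sample standard deviation; the function names and the data are hypothetical. Every sample is summed on each call, so this is the plain evaluation without any interpolation scheme for speed.

```python
from statistics import stdev


def biweight_kernel(u):
    """Weighting function (4.17), taken to be zero outside |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def biweight_cdf(u):
    """Integral of the kernel from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 15.0 / 16.0 * (u - 2.0 * u ** 3 / 3.0 + u ** 5 / 5.0)


def parzen_pdf(x, samples, h):
    """Density estimate (4.16) at the point x."""
    return sum(biweight_kernel((x - r) / h) for r in samples) / (len(samples) * h)


def parzen_cdf(x, samples, h):
    """Cdf estimate obtained by integrating (4.16) term by term."""
    return sum(biweight_cdf((x - r) / h) for r in samples) / len(samples)


# Hypothetical daily returns, for illustration only
returns = [-0.040, -0.015, -0.010, -0.008, -0.006, -0.005, -0.004, -0.003,
           -0.002, -0.001, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
           0.009, 0.012, 0.025]
h = stdev(returns)  # smoothing parameter set to the sample standard deviation

for x in (-0.03, -0.01, 0.0, 0.01, 0.03):
    print(f"x = {x:+.3f}  pdf = {parzen_pdf(x, returns, h):8.3f}  "
          f"cdf = {parzen_cdf(x, returns, h):.4f}")
```

The estimate vanishes more than $h$ beyond the smallest and largest samples, which is the compact support of the density estimate.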
Similar to the asymmetric t model, the Parzen model can recreate the fat tails more accurately than the normal model, and it also seems to have a slight advantage over the asymmetric t model, as illustrated by the quantile-quantile plot in Figure 4.6. The advantage of using a nonparametric model is that it does not rely on specific assumptions about the shape of the density. There are three disadvantages to the Parzen model. First, the optimal smoothing parameter $h$ is unknown. While experimenting with different stocks, we have found that taking $h$ equal to the standard deviation works well.⁵ Second, for our choice of weighting function, the density estimate has compact support. However, the support covers the region of interest for value-at-risk calculations, so it should have a minor influence on the result. Third, evaluating equation (4.16) or the corresponding cumulative distribution function (cdf) for different values of $x$ is expensive for large samples. In our implementation, we avoid summing over all sample points by using cubic splines to approximate the cdf and the pdf.
FIGURE 4.6 BCE quantile-quantile plot for the random-walk model with the Parzen density estimate.
⁴ Parzen presents a theory for density estimates of the form of equation (4.16), with general weighting functions $K(x)$. Let $h_d \to 0$ as the number of samples $d \to \infty$. He shows that density estimates of the form of equation (4.16) converge (pointwise in a mean square sense) to a continuous pdf as $d \to \infty$. More precisely, given a sequence of smoothing parameters $\{h_d\}_{d=1}^{\infty}$ with $\lim_{d\to\infty} h_d = 0$ and $\lim_{d\to\infty} d\,h_d = \infty$,

$$E\big[(p_d(x) - p(x))^2\big] \to 0 \qquad \text{as } d \to \infty$$

The sequence of smoothing parameters giving optimal rate of convergence depends on both the point $x$ and the pdf $p(x)$ as well as the weighting function $K(x)$. See Parzen [Par61] for examples of and details about general weighting functions.
4.1.4 Multivariate Models
So far we have only considered models for the return on a single risk factor. In general,
portfolios depend on many risk factors. Therefore we must extend the one-dimensional
random-walk models, presented in the previous sections, to the multivariate case.
In the multivariate random walk, $\{R_i\}_{i=1}^{\infty}$ is a sequence of $\mathbb{R}^n$-valued vectors of random variables. The random vectors are independent and identically distributed. The difficulty in constructing a realistic multivariate model is that returns on the risk factors are typically dependent, as exemplified by Figure 4.7. To approximate the dependence structure without introducing an overly complex model, we restrict our attention to multivariate models where the random vectors $\{R_i\}_{i=1}^{\infty}$ satisfy

$$R_i = A^{-1} X_i + b \tag{4.18}$$

Moreover, we assume that the random vector $X$ has independent components and the pdf is a product of one-dimensional density functions:

$$p(x) = p_1(x_1) \cdots p_n(x_n)$$
We postpone the discussion about how to choose the linear transformation, i.e., the matrix A
and the vector b, to Section 4.3, after discussing portfolios of derivatives.
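The structure of equation (4.18) can be illustrated with two risk factors. In the sketch below the matrix `A`, the vector `b`, and the use of normal distributions for the independent components are all hypothetical placeholders; the actual choice of the linear transformation is the subject of Section 4.3, and any of the one-dimensional models of this section could supply the components.

```python
import random


def invert_2x2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0.0:
        raise ValueError("A must be invertible")
    return [[s / det, -q / det], [-r / det, p / det]]


def transform(a_inv, x, b):
    """Map independent components x to returns R = A^{-1} x + b, as in (4.18)."""
    n = len(b)
    return [sum(a_inv[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


# Hypothetical linear transformation (a rotation) and shift
A = [[0.8, 0.6],
     [-0.6, 0.8]]
b = [0.0004, 0.0002]
A_inv = invert_2x2(A)

# Independent components, drawn here from normal distributions as a stand-in
rng = random.Random(7)
sample = [transform(A_inv, [rng.gauss(0.0, 0.02), rng.gauss(0.0, 0.008)], b)
          for _ in range(5)]
for r in sample:
    print(f"R = ({r[0]:+.5f}, {r[1]:+.5f})")
```

Although the components of $X$ are independent, the transformed returns are dependent whenever $A^{-1}$ mixes the components, which is how the model reproduces the dependence between risk factors.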
To find a stochastic process to model stock prices in continuous time is a more difficult problem. Returns are often modeled by stochastic differential equations (SDEs). As discussed in Chapter 1, Brownian motion is the natural continuous-time generalization of a random walk with normal increments. In this model, the return process is a constant-coefficient SDE, $dR = \mu\,dt + \sigma\,dW_t$. Like the normal model for stock prices, geometric Brownian motion underestimates the likelihood of large returns: It does not have fat tails.
Many different types of continuous-time models have been proposed and studied in the literature, in particular for pricing derivatives. If the returns are a stationary Markov process, then, for example, the sequence $\{r_i\}_{i=1}^{d}$ of historical returns can be used to find an estimate for the transition density, the time-dependent probability density $p(r, t)$ representing the density for the return $r$ at time $t$. Figure 4.8 shows the Parzen estimate for the transition density for the stock BCE. A good model has a transition density that is close to this estimate.
⁵ This choice of $h$ may work well in our examples, but it is not a satisfactory solution in general since a fixed smoothing parameter does not give convergence as the number of samples $d \to \infty$. The estimate converges for a sequence of smoothing parameters that decrease to zero as the number of samples increases (see [Par61] for details).
FIGURE 4.7 Principal components superimposed on the scatter plot for the returns on BCE and CTRa.
FIGURE 4.8 Parzen estimate of the transition density for BCE.