Thursday, April 19, 2012

Extremes, the Generalized Pareto Distribution, and MLE

In a recent post I discussed some of my work relating to modelling extreme values in various economic data-sets. The work that my colleagues and I have been undertaking focuses on the use of the Generalized Pareto distribution (GPD). The estimation of the parameters of this model facilitates estimates of Value at Risk (VaR) and Expected Shortfall (ES).

There are various ways of estimating the parameters of the GPD but, not surprisingly, maximum likelihood estimation (MLE) is a common choice. However, there are some real traps when it comes to estimating the GPD using MLE, and they're worth knowing about if you're into this sort of thing.

One of the major results of Extreme Value Theory (EVT) relates to the asymptotic distribution of “block maxima” – i.e., the maximum values of blocks, or snapshots, of data from an unknown underlying distribution. The Fisher and Tippett (1928) Theorem tells us that if these maxima are suitably normalized, then they converge in distribution to one of only three forms – Gumbel, Fréchet, or Weibull. This is an extreme value analogue to conventional central limit theory.

These three distributions can be encompassed by a single one – the Generalized Extreme Value (GEV) family of distributions. If individual data values {X1, X2, ...}, rather than blocks of data, are available then it is inefficient to artificially “block” them and estimate a GEV distribution. The detailed data information can be used more efficiently by modelling the distribution of the values that are “extreme” (i.e., exceed some high threshold value).

Just as the Fisher and Tippett (1928) Theorem tells us that the GEV distribution is appropriate, asymptotically, if we are using "blocked" data, results obtained by Pickands (1975) show that the "exceedances" above a high threshold will follow the GPD (asymptotically). See also Coles (2001).
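To make the peaks-over-threshold idea concrete, here is a minimal Python sketch. It assumes the common (xi, sigma) parameterization of the GPD, with shape xi and scale sigma; the function names `gpd_cdf` and `excesses_over` are purely illustrative and don't come from any package.

```python
import math

def gpd_cdf(y, xi, sigma):
    """GPD distribution function for an excess y (shape xi, scale sigma > 0)."""
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        # Exponential distribution is the limiting case as xi -> 0
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:
        # Only possible when xi < 0: y is at or beyond the upper end point -sigma/xi
        return 1.0
    return 1.0 - z ** (-1.0 / xi)

def excesses_over(data, u):
    """Amounts by which the observations exceed the threshold u."""
    return [x - u for x in data if x > u]
```

For example, `excesses_over(losses, u)` turns a (hypothetical) list of losses into the sample that the GPD is fitted to. Note how few observations survive once `u` is set high.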

I won't get side-tracked here into a discussion of the choice of the threshold value. However, we're faced with an interesting problem: we end up using an estimator for the GPD parameters that is justified on the basis of its asymptotic properties (consistency and asymptotic efficiency in the case of MLE, or consistency in the case of the Method of Moments (MOM) estimator). However, as we're working with observations in the tail of the distribution, we typically have samples (for estimation purposes) that are very small, even though the original series of data may have contained thousands of observations.

So, in practice it's really important to know about the small-sample properties of the estimators that we use for the GPD, and hence for VaR and ES. And that's where the fun begins!
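To see why the parameter estimates matter so much, here is a sketch of the standard peaks-over-threshold formulas that turn a fitted GPD into VaR and ES. It assumes the (xi, sigma) parameterization and the usual empirical estimate n_exceed / n_total for the probability of exceeding the threshold u. The function name and arguments are illustrative only.

```python
import math

def gpd_var_es(q, u, xi, sigma, n_total, n_exceed):
    """VaR and ES at confidence level q from a GPD fitted to excesses over u."""
    if not (1.0 - n_exceed / n_total < q < 1.0):
        raise ValueError("q must lie in the tail above the threshold")
    if xi >= 1.0:
        raise ValueError("ES is not finite when xi >= 1")
    # Tail probability 1 - q, rescaled by the exceedance frequency
    ratio = (n_total / n_exceed) * (1.0 - q)
    if abs(xi) < 1e-12:
        # Exponential limit (xi -> 0)
        var = u - sigma * math.log(ratio)
        es = var + sigma
    else:
        var = u + (sigma / xi) * (ratio ** (-xi) - 1.0)
        es = (var + sigma - xi * u) / (1.0 - xi)
    return var, es
```

Both quantities are non-linear in xi and sigma, so any small-sample bias in the parameter estimates carries straight through to the risk measures.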

There's a limited amount of Monte Carlo simulation evidence that addresses this particular point. A good example is the paper by Hosking and Wallis (1987), which covers both MLE and MOM. The latter estimator has some big limitations when it comes to the GPD - the moments of this distribution exist only under certain (unobservable) conditions, and you can't use MOM if the moments don't exist!
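Here's a minimal sketch of the MOM estimator, assuming the (xi, sigma) parameterization, in which the GPD mean is sigma / (1 - xi) and the variance is sigma^2 / ((1 - xi)^2 (1 - 2 xi)). Matching these to the sample mean and variance gives closed-form estimates. The function name is illustrative.

```python
import statistics

def gpd_mom(excesses):
    """Method-of-moments estimates (xi, sigma) from excesses over a threshold."""
    m = statistics.mean(excesses)
    s2 = statistics.variance(excesses)   # sample variance (n - 1 divisor)
    r = m * m / s2                       # estimates 1 - 2*xi
    xi = 0.5 * (1.0 - r)
    sigma = 0.5 * m * (1.0 + r)
    return xi, sigma
```

Notice that the estimate of xi is below 0.5 by construction, because `r` is always positive. The population variance exists only when xi < 0.5, so if the true tail is heavier than that, the estimator still returns a tidy-looking number that is meaningless.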

Sticking with MLE, there are some different issues that have to be considered. You can find a full discussion in Giles, Feng and Godwin (2011) - working paper version here. Briefly, though, the MLE's of the shape and scale parameters aren't defined in certain situations, because the density takes infinite values for certain combinations of the parameter values. In addition, the usual regularity conditions are satisfied only in certain situations.
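To make the "infinite values" point concrete, here is a sketch of the GPD log-likelihood in the (xi, sigma) parameterization, written so that parameter values outside the support return minus infinity rather than a NaN. The function name is illustrative. When xi < -1, the log-likelihood grows without bound as sigma approaches -xi times the largest excess from above.

```python
import math

def gpd_loglik(xi, sigma, excesses):
    """GPD log-likelihood; -inf when (xi, sigma) is inadmissible for the data."""
    if sigma <= 0:
        return -math.inf
    n = len(excesses)
    if abs(xi) < 1e-12:
        # Exponential limit (xi -> 0)
        return -n * math.log(sigma) - sum(excesses) / sigma
    total = 0.0
    for y in excesses:
        z = 1.0 + xi * y / sigma
        if z <= 0:
            # Observation lies outside the support implied by (xi, sigma)
            return -math.inf
        total += math.log(z)
    return -n * math.log(sigma) - (1.0 + 1.0 / xi) * total

# Unboundedness when xi < -1: push sigma towards -xi * max(excesses)
ys = [0.5, 1.0, 2.0, 4.0]
for eps in (1e-2, 1e-6, 1e-12):
    print(eps, gpd_loglik(-2.0, 2.0 * (max(ys) + eps), ys))
```

The printed values keep increasing as `eps` shrinks, which is exactly the kind of behaviour that can send a general-purpose optimizer off to a spurious "maximum". If you hand the negative of this function to a minimizer, the inadmissible region becomes plus infinity.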

The first-order conditions for maximizing the likelihood function are highly non-linear functions of the parameters, and it can be quite tricky to ensure that you've actually obtained a global maximum of the likelihood function, and that at the same time you've avoided the problems noted in the last paragraph. I strongly suspect that there are a few MLE results out there for the GPD that are totally spurious - and the authors won't even be aware of it.

Helen and Ryan and I ran into these issues in the simulation work that we undertook to evaluate the small-sample performance of the MLE (and other estimators) for the GPD. In the end, we were able to resolve all of our difficulties by using the algorithm proposed by Grimshaw (1993). As far as I'm concerned, this is the definitive way to implement the MLE for the parameters of the GPD. We were really lucky, because Scott Grimshaw very kindly supplied us with R code that we were able to incorporate into our own R routines.
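The sketch below is not Grimshaw's algorithm, and it is not the R code mentioned above. It only illustrates the kind of one-dimensional reduction that, as I understand it, his approach builds on. In the (xi, sigma) parameterization, write tau = xi / sigma. For a given tau, the likelihood is maximized by xi(tau) = mean of log(1 + tau * y_i), with sigma(tau) = xi(tau) / tau, so the search is over tau alone. I use a crude grid search, restrict attention to xi >= -1 (where the likelihood is bounded), and pick an arbitrary upper limit for tau. The function name and the defaults are my own assumptions.

```python
import math

def gpd_profile_fit(excesses, grid_size=2000, tau_max=None):
    """Grid search over tau = xi/sigma; returns (loglik, xi, sigma)."""
    n = len(excesses)
    y_max = max(excesses)
    mean_y = sum(excesses) / n
    lower = -1.0 / y_max            # need 1 + tau*y > 0 for every observation
    upper = tau_max if tau_max is not None else 10.0 / mean_y  # arbitrary choice
    # Start from the exponential limit (tau -> 0): xi = 0, sigma = sample mean
    best = (-n * math.log(mean_y) - n, 0.0, mean_y)
    for j in range(1, grid_size):
        tau = lower + (upper - lower) * j / grid_size
        if abs(tau) < 1e-10:
            continue
        xi = sum(math.log1p(tau * y) for y in excesses) / n
        if xi < -1.0:
            continue                # likelihood is unbounded in this region
        sigma = xi / tau
        loglik = -n * math.log(sigma) - n * xi - n   # profile log-likelihood
        if loglik > best[0]:
            best = (loglik, xi, sigma)
    return best
```

A grid search is slow and only as accurate as the grid, but it makes it obvious when the profile has more than one local maximum. That is the situation in which a two-parameter optimizer started from a poor initial value can settle on the wrong answer.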

All of this came about because Helen and Ryan and I were deriving analytic bias-correction formulae for the MLE's in this problem. Even though the likelihood equations can't be solved analytically, we were able to obtain second-order analytic bias corrections by using the approach of Cox and Snell (1968). I actually have several papers with various co-authors that do this sort of thing for a range of problems - but more on this in another post.

In short, we showed that our bias correction is extremely effective, and it dominates using the bootstrap to bias correct. You can read the details for yourself, here.
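For comparison, here is a generic sketch of bootstrap bias correction for a scalar estimator. It uses simple nonparametric resampling, which may not be the bootstrap scheme used in the paper, and the function name and defaults are illustrative. The bias is estimated as the average of the bootstrap estimates minus the original estimate, and then subtracted.

```python
import random

def bootstrap_bias_correct(sample, estimator, n_boot=500, seed=0):
    """Return the bootstrap bias-corrected value of a scalar estimator."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    boot = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        boot.append(estimator(resample))
    bias = sum(boot) / n_boot - theta_hat
    return theta_hat - bias     # equivalently 2*theta_hat - mean(boot)
```

Each call re-estimates the model `n_boot` times, so for the GPD every one of those replications has to survive the same likelihood traps as the original fit. An analytic correction needs only the one fit.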

The take-away message from this post is a simple one. Be very careful if you're estimating the parameters of the generalized Pareto distribution. If you decide to use the method of moments estimator, you may end up doing something really silly. If, as is quite likely, you use the MLE, then you will need to be very careful indeed. At the very least, take a look at Scott Grimshaw's paper, and you'll quickly see that there's a lot more to the problem than you might have thought.


Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.

Cox, D. R. and E. J. Snell, 1968. A general definition of residuals. Journal of the Royal Statistical Society, B, 30, 248-275.

Fisher, R. A. and L. H. C. Tippett, 1928. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24, 180-190.

Giles, D. E., H. Feng and R. T. Godwin, 2011. Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution. Econometrics Working Paper 1105, Department of Economics, University of Victoria.

Grimshaw, S. D., 1993. Computing maximum likelihood estimates for the generalized Pareto distribution. Technometrics, 35, 185-191.

Hosking, J. R. M. and J. R. Wallis, 1987. Parameter and quantile estimation for the generalized Pareto distribution. Technometrics, 29, 339-349.

Pickands, J., 1975. Statistical inference using extreme order statistics. Annals of Statistics, 3, 119-131.

© 2012, David E. Giles


  1. Thank you, Mr. Giles
    I was trying to estimate ksi and beta of GPD distribution, by computing the log-likelihood function by hand and then using nlminb and optim in R, but did not succeed. I get NA and NaN function evaluations.

    Unfortunately it is impossible to download Grimshaw paper.


    1. Sergey - the POT package in R should be fine for most purposes. It allows you to use MLE and a range of other estimators.

      Dave Giles

  2. Thanks Mr. Dave Giles,
    My name is Ruliff Demsy, from Indonesia, and I am currently running a research project on EVT analysis with POT.
    I have read about your paper according your research about EVT back then in 2007.

    I have also searched through your code section and there is no code available for EVT in EViews or R.
    I am also confused about running MSE and bias tests for selecting the best-fit parameter.
    Would you mind to let me take a look into your code?

    Thanks a lot Sir!!

    1. Ruliff - As I said in my response to the previous comment - use the POT package in R. There are other R packages too - e.g. evd.

    2. Thanks Mr. Giles
      You also mentioned McNeil & Frey in your paper back then in 2007.
      Generally speaking, I want to reproduce McNeil and Frey methods in 2000 to produce 1 day estimate of VaR EVT by taking advantage of the GARCH modelling.

      Do you have any suggestions or warnings regarding the methods that they proposed? It is kind of hard to bring GARCH and modelling the residuals with EVT together. Is there any all-in-one package in R to reproduce their method? Sorry to bother you with so many questions, I'm quite new to R.

      Looking forward for your response.

    3. Ruliff - I don't think you can do it using one package. You can use the fGarch package for the first stage and then POT for the GPD analysis.