Fundación BBVA

Multivariate Statistics



This bibliography is not intended to be complete but rather gives the main literature about biplots so that the reader can continue to learn about this method.


Multivariate Analysis of Ecological Data

This appendix lists various bibliographical resources, with short annotations, for further reading:

Study design and data analysis

  • Anderson D.R., K.P. Burnham, W.R. Gould, and S. Cherry. "Concerns about finding effects that are actually spurious". Wildlife Society Bulletin 29 (2001): 311-316.
    The authors discuss the characteristics of research studies that are more exposed to the risk of finding spurious effects, and propose various ways to cope with the problem and avoid spurious results.

  • Beninger P.G., I. Boldina, and S. Katsanevakis. "Strengthening statistical usage in marine ecology". Journal of Experimental Marine Biology and Ecology 426-427 (2012): 97-108.
    A review of common statistical fallacies in the ecological literature and how to avoid them.

  • Cottingham K.L., J.T. Lennon, and B.L. Brown. "Knowing when to draw the line: designing more informative ecological experiments". Frontiers in Ecology and the Environment 3 (2005): 145-152.
    Review of experimental design options for ANOVA and regression types of ecological studies.

  • Day R.W., and G.P. Quinn. "Comparisons of treatments after an analysis of variance in ecology". Ecological Monographs 59 (1989): 433-463.
    A review of approaches for comparisons of treatments following an ANOVA, including parametric and non-parametric tests, with discussion of pitfalls and solutions when dealing with hypothesis testing under unplanned multiple comparisons.

  • Graham M.H., and M.S. Edwards. "Statistical significance versus fit: estimating the importance of individual factors in ecological analysis of variance". Oikos 93 (2001): 503-515.
    The importance of effect size estimation and the available tools for variance decomposition in the context of complex ANOVA designs are presented clearly and succinctly.

  • Maindonald J. The design of research studies - A statistical perspective. Part I: planning and reporting, 2000, 120 p. https://digitalcollections.anu.edu.au/bitstream/1885/41533/2/GS00_2.pdf
    A very informative introduction to the design of experimental and observational studies.

  • Nakagawa S., and I.C. Cuthill. "Effect size, confidence interval and statistical significance: a practical guide for biologists". Biological Reviews 82 (2007): 591-605.
    An indispensable review of effect size estimation. R code for performing the analyses discussed in the paper can be downloaded at: http://www.bristol.ac.uk/biology/research/staff/cuthill.i

  • Nakagawa S., and R.P. Freckleton. "Missing inaction: the dangers of ignoring missing data". Trends in Ecology and Evolution 23 (2008): 592-596.
    The authors warn against deletion of cases with missing observations due to the ensuing reduced statistical power and increased estimation bias, and provide a compact review of how to deal properly with missing data.

  • Parkhurst D.F. "Statistical significance tests: equivalence and reverse tests should reduce misinterpretation". Bioscience 51 (2001): 1051-1057.
    A gentle introduction to the concepts and methods of equivalence and reverse testing to help avoid pitfalls of results interpretation in classical null statistical hypothesis testing.

  • Quinn G., and M. Keough. Experimental Design and Data Analysis for Biologists. Cambridge, UK: Cambridge University Press, 2002.
    Introductory textbook on study design and data analysis. Popular in undergraduate and graduate courses on experimental study design and ecological statistics. Useful background material to refresh ideas while reading Multivariate Analysis of Ecological Data (MAED).

  • Regan H.M., M. Colyvan, and M.A. Burgman. "A taxonomy and treatment of uncertainty for ecology and conservation biology". Ecological Applications 12 (2002): 618-628.
    A thorough discussion of sources of uncertainty in ecology and how to deal with them.

  • Scheiner S.M., and J. Gurevitch. Design and Analysis of Ecological Experiments. Oxford: Oxford University Press, 2001.
    A valuable collection of chapters by several authors dealing with study design, statistical modeling, spatial data analysis and meta-analysis.

  • Warton D.I., and F.K.C. Hui. "The arcsine is asinine: the analysis of proportions in ecology". Ecology 92 (2011): 3-10.
    A brief review of useful transformations for proportions with some warnings against established traditions when dealing with this type of data.

 

Statistical modelling

  • Bolker B.M. Ecological Models and Data in R. Princeton, NJ: Princeton University Press, 2008.
    A gentle introduction to ecological modeling with clear and well structured coverage of maximum likelihood models and estimation.

  • Bolker B.M., M.E. Brooks, C.J. Clark, S.W. Geange, J.R. Poulsen, M.H. Stevens, and J.S. White. "Generalized linear mixed models: a practical guide for ecology and evolution". Trends in Ecology and Evolution 24 (2009): 127-135.
    The paper reviews how to deal with non-normal data that include random effects with the help of Generalized Linear Mixed Models.

  • Clark J.S. Models for Ecological Data: an Introduction. Princeton, NJ: Princeton University Press, 2007.
    Rigorous and rich introduction to statistical modelling, including approaches to temporal and spatial data. A companion Lab manual provides examples using R.

  • Grueber C.E., S. Nakagawa, R.L. Laws, and I.G. Jamieson. "Multimodel inference in ecology and evolution: challenges and solutions". Journal of Evolutionary Biology 24 (2011): 699-711.
    A comprehensive review of model selection and multimodel inference introducing basic concepts and approaches in a clear and balanced way.

  • Hilborn R., and C. Mangel. The Ecological Detective: Confronting Models with Data. Princeton, NJ: Princeton University Press, 1997.
    The book provides a very good introduction to theoretical and statistical modeling in ecology, explaining concepts, principles and protocols to the uninitiated.

  • Hobbs N.T., and R. Hilborn. "Alternatives to statistical hypothesis testing in ecology: a guide to self teaching". Ecological Applications 16 (2006): 5-19.
    A clear and concise introduction to statistical modeling, maximum likelihood estimation, model selection, Bayesian analysis and meta-analysis.

  • Stephens P.A., S.W. Buskirk, and C.M. Del Rio. "Inference in ecology and evolution". Trends in Ecology and Evolution 22 (2007): 192-197.
    The authors discuss the limitations of traditional null hypothesis significance tests and suggest relying instead on more useful approaches, which they briefly review, such as effect size estimation and model selection.

 

Multivariate analysis

  • Anderson M.J. "Permutation tests for univariate or multivariate analysis of variance and regression". Canadian Journal of Fisheries and Aquatic Sciences 58 (2001): 626-639.
    An informative and concise review of the rationale and applications of permutation tests in experimental and observational studies with complex designs.

  • Anderson M.J., and T.J. Willis. "Canonical analysis of principal coordinates: a useful method of constrained ordination for ecology". Ecology 84 (2003): 511-525.
    A flexible method for constrained ordination capable of accommodating any distance or dissimilarity matrix.

  • Beals E.W. "Bray-Curtis ordination: an effective strategy for analysis of multivariate ecological data". Advances in Ecological Research 14 (1984): 1-55.
    A good introduction to Bray-Curtis (or polar) ordination, also covering other methods.


  • Borcard D., F. Gillet, and P. Legendre. Numerical Ecology with R. New York: Springer, 2011.
    Compact introduction to multivariate statistics, including multivariate analysis of spatial and temporal data. It is the R companion to Numerical Ecology by Legendre and Legendre, 2012.

  • Gauch H. G. Jr. Multivariate Analysis in Community Ecology. Cambridge, UK: Cambridge University Press, 1982.
    A thorough introduction to gradient analysis that relates ecological theory to statistical methods, clarifying the rationale behind the approach.

  • Greenacre M.J. Correspondence Analysis in Practice, 2nd Edition. London: Chapman & Hall / CRC, 2007. Free download of the Spanish edition, published by the BBVA Foundation, 2008, at www.multivariatestatistics.org.
    Comprehensive introduction to correspondence analysis, multiple correspondence analysis, subset correspondence analysis and canonical correspondence analysis.

  • Greenacre M.J. Biplots in Practice. Madrid: BBVA Foundation, 2010. Free download from www.multivariatestatistics.org.
    A practical introduction to biplots, the concept underlying many multivariate methods that reduce dimensionality in large data sets, and visualize the results.

  • Greenacre M.J. "Correspondence analysis of raw data". Ecology 91 (2010): 958-963.
    Alternative approach to analyzing abundance or biomass matrices where the data are not expressed relative to the row and column margins, in contrast to regular CA and CCA where relative amounts are analysed.

  • Greenacre M.J. "The contributions of rare objects in correspondence analysis". Ecology 94 (2013): 241-249.
    Shows that CA and CCA are not unduly affected by the presence of rare species in an ecological data set, contrary to a popular misconception that these analyses are over-sensitive to species that occur sparsely and in low abundance.

  • Greenacre M.J. "Fuzzy coding in constrained ordinations". Ecology 94 (2013): 280-286.
    The use of fuzzy coding for explanatory variables in the CCA context, demonstrating the benefits and also how to choose the number of fuzzy categories.

  • Greenacre M.J. "Contribution biplots". Journal of Computational and Graphical Statistics 22 (2013): 107-122.
    An alternative scaling of the results of ordination methods such as PCA, CA, LRA, CCA and RDA, where the variables that contribute most to the solution are immediately detectable in the ordination.

  • Greenacre M.J., and P.J. Lewi. "Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio-scale measurements". Journal of Classification 26 (2009): 29-54.
    Demonstrates clearly the advantage of weighting of variables in log-ratio analysis, as is done in regular CA, as well as the ability of log-ratio biplots to diagnose multiplicative models when variables line up in the ordinations.

  • Jackson D.A. "Stopping rules in principal component analysis: a comparison of heuristical and statistical approaches". Ecology 74 (1993): 2204-2214.
    A short introduction to some available options for evaluating the significance of principal components.

  • James F.C., and C.E. McCulloch. "Multivariate analysis in ecology and systematics: panacea or Pandora's box?". Annual Review of Ecology and Systematics 21 (1990): 129-166.
    An early review of multivariate statistical applications in ecology, at a time when increased computer and software availability made these methods available to all ecologists.

  • Johnson, R.A., and D.W. Wichern. Applied Multivariate Statistical Analysis, 6th edition. New Jersey: Prentice Hall, 2007.
    Widely read book reviewing applications of multivariate methods for biologists, physicists and sociologists.

  • Jongman R.H.G., C.J.F. Ter Braak, and O.F.R. van Tongeren. Data Analysis in Community and Landscape Ecology. Cambridge, UK: Cambridge University Press, 1995.
    A balanced treatment of ecological data analysis including extensive treatment of multivariate methods by a group of statisticians and ecologists with a strong quantitative background.

  • Legendre P., and M.J. Anderson. "Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments". Ecological Monographs 69 (1999): 1-24.
    Introduction to the flexible tool of distance-based redundancy analysis by the authors who later developed and generalized this useful numerical approach.

  • Legendre P., and E.D. Gallagher. "Ecologically meaningful transformations for ordination of species data". Oecologia 129 (2001): 271-280.
    Review of effective transformations of species data that allow relevant information to be extracted when the data are subjected to ordination analysis.

  • Legendre P., and L. Legendre. Numerical Ecology, 3rd English edition. Amsterdam: Elsevier, 2012, 853 p.
    Classic introduction to statistics for ecologists with very good coverage of ecological data and multivariate methods. It has an R companion [Borcard et al. 2011].

  • Leps J., and P. Smilauer. Multivariate Analysis of Ecological Data using Canoco. Cambridge, UK: Cambridge University Press, 2003.
    Much more than a handbook for Canoco applications, the book is an informative review of multivariate methods with many inspiring ecological examples.

  • Maindonald J., and J. Braun. Data Analysis and Graphics with R. An Example Based Approach, 3rd edition. Cambridge, UK: Cambridge University Press, 2011.
    The book provides a comprehensive overview of data analysis including parametric and non-parametric methods, statistical modeling and multivariate methods with R examples.

  • Manly B.F.J. Multivariate Statistical Methods: a Primer, 3rd edition. London: Chapman and Hall, 2004.
    A gentle introduction to multivariate methods blessed by the clear expository style of a distinguished and successful author.

  • Manly B.F.J. Randomization, Bootstrap and Monte Carlo Methods in Biology, 3rd edition. London: Chapman and Hall, 2007.
    The book provides a comprehensive overview of resampling and permutation methods with many relevant biological example applications.

  • McGarigal K., S. Cushman, and S. Stafford. Multivariate Statistics for Wildlife and Ecology Research. New York: Springer, 2000.
    Introduction to multivariate statistics in ecology and wildlife management, focusing on practical applications.

  • Palmer M.W. "Putting things in even better order: the advantages of canonical correspondence analysis". Ecology 74 (1993): 2215-2230.
    A clear exposition of the advantages of Canonical Correspondence Analysis (CCA) applied to ecological data.

  • Peres-Neto P.R., and D.A. Jackson. "How well do multivariate data sets match? The advantages of a procrustean superimposition approach over the Mantel test". Oecologia 129 (2001): 169-178.
    The use and value of procrustean superimposition to compare (match) multivariate data sets.

  • Peres-Neto P.R., and D.A. Jackson. "The importance of scaling of multivariate analysis in ecological studies". Ecoscience 8 (2001): 522-526.
    A clear survey of the role of scaling in multivariate ecological data analysis, making use of intuitive graphical presentations to stress the important concepts and their implications.

  • Peres-Neto P.R., D.A. Jackson, and K.M. Somers. "Giving meaningful interpretation to ordination axes: assessing loading significance in principal component analysis". Ecology 84 (2003): 2347-2363.
    The authors compare a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power.

  • Pielou, E.C. The Interpretation of Ecological Data: a Primer on Classification and Ordination. New York: John Wiley & Sons, Inc., 1984.
    An early review of ecological data analysis linking ecological and statistical concepts to ease the interpretation of results.

  • Ter Braak, C.J.F. "Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis". Ecology 67 (1986): 1167-1179.
    Another citation classic by an author who has made important contributions to the field, ensuring wide availability of the new methods via software development (Canoco, in collaboration with Smilauer; see Leps and Smilauer 2003).

  • Ter Braak, C.J.F., and I.C. Prentice. "A theory of gradient analysis". Advances in Ecological Research 18 (1988): 271-313.
    A classic introduction to gradient analysis theory and its ecological applications.

  • Yee T.W. "Constrained additive ordination". Ecology 87 (2006): 203-213.
    The paper introduces Constrained Additive Ordination (CAO) models, described as "loosely speaking, [...] Generalized Additive Models fitted to a very small number of latent variables". The paper provides the R code to implement the CAO methodology with some clear example applications.

  • Zuur A.F., E.N. Ieno, and C.S. Elphick. "A protocol for data exploration to avoid common statistical problems". Methods in Ecology and Evolution 1 (2010): 3-14.
    The authors provide a protocol for data exploration, discussing "current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables; and provide advice on how to address these problems when they arise". The paper also provides R code to implement the protocol.

  • Zuur A.F., E.N. Ieno, N.J. Walker, A.A. Saveliev, and G.M. Smith. Mixed Effects Models and Extensions in Ecology with R. New York: Springer, 2009.
    A very popular introduction to mixed effects models rich with relevant ecological examples based on the kind of 'messy' data ecologists need to cope with.


 

Biplots in Practice

The term "biplot" originates in Ruben Gabriel's Biometrika paper in 1971:
Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika 58, 453-467.
This paper, which at the time of writing has 1008 citations on Google Scholar and 682 on the Science Citation Index (ISI Web of Knowledge), is widely regarded as the origin of the idea.


A less cited paper by Ruben Gabriel, but nevertheless one of my favourite ones on the biplot, appeared the following year in the Journal of Applied Meteorology (Ruben was also well-known for his work as a statistician in weather modification projects):
Gabriel, K.R. (1972). Analysis of meteorological data by means of canonical decompositions and biplots. Journal of Applied Meteorology 11, 1071-1077.


Another gem is by Dan Bradu and Ruben Gabriel in Technometrics in 1978:
Bradu, D. and Gabriel, K.R. (1978). The biplot as a diagnostic tool for models of two-way tables. Technometrics 20, 47-68.


Other authors also had the idea of adding variables to an existing configuration of points to make joint displays, although they did not call them biplots. For example, Doug Carroll's vector model for preferences is a biplot:
Carroll, J.D. (1972). Individual differences and multidimensional scaling. In R.N. Shepard, A.K. Romney, and S.B. Nerlove, eds, Multidimensional Scaling: Theory and Applications in the Behavioral Sciences (Vol. 1), 105-155. Seminar Press, New York.


Only one book exists to date specifically on the topic of biplots, by John Gower and David Hand:
Gower, J.C. and Hand, D.J. (1996). Biplots. Chapman & Hall, London, UK.


This book is very complete, both on linear and nonlinear biplots, giving a rigorous theoretical treatment of the subject. Another book by John Gower is with coauthors Sugnet Gardner-Lubbe and Niel le Roux:
Gower, J.C., Gardner-Lubbe, S. and le Roux, N. (2010). Understanding Biplots. Wiley, Chichester, UK.


As far as the vast literature on the singular value decomposition (SVD) is concerned, I mention only two sources. The first is by Gene Golub, author of one of the landmark algorithms for the SVD, and dates from 1971, which seems to be an important year for the biplot:
Golub, G.H. and Reinsch, C. (1971). The singular value decomposition and least squares solutions. In: J.H. Wilkinson and C. Reinsch, eds, Handbook for Automatic Computation, 134-151. Springer-Verlag, Berlin.


The other is a classic book by Paul Green and Doug Carroll, originally published in 1976, where I first saw the geometric interpretation of the SVD (called "basic structure" by these authors). This book is invaluable as a practical introduction to matrix and vector geometry in multivariate analysis:
Green, P.E. and Carroll, J.D. (1997). Mathematical Tools for Applied Multivariate Analysis, Revised Edition. Academic Press, New York.


Most books or articles that treat the methods presented in this book will have a section or chapter on biplots and their interpretation in the context of that method. This is just a tiny selection of some of the literature that can be consulted, and by no means the primary references:

Principal component analysis

  • Jolliffe, I.T. (2002). Principal Component Analysis (2nd edition). Springer, New York.

Log-ratio analysis (unweighted form)

  • Aitchison, J. and Greenacre, M. (2002). Biplots of compositional data. Applied Statistics 51, 375-392.

Log-ratio analysis (weighted form)

  • Greenacre, M. and Lewi, P.J. (2009). Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio scale measurements. Journal of Classification 26, 29-54.

Correspondence analysis


Multiple correspondence analysis

  • Greenacre, M. and Blasius, J., eds (2006). Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC Press, London.
  • Michailidis, G. and de Leeuw, J. (1998). The Gifi system for descriptive multivariate analysis. Statistical Science 13, 307-336.

Discriminant analysis/centroid biplots

  • Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning (2nd edition). Springer, New York. This book may be freely downloaded at www-stat.stanford.edu

Constrained biplots

  • Legendre, P. and Legendre, L. (1998). Numerical Ecology (2nd edition). Elsevier, Amsterdam.


Finally, we give some internet resources on R packages relevant to this book, Biplots in Practice (in alphabetical order of package names).