Hypervolume Functions

The hypervolume concept was originally proposed by Hutchinson (1957), who described a species' niche as an n-dimensional hypervolume in environmental space. It provides a way to calculate the volume of a niche for a given species and thus the niche overlap between two species. It helps to infer how the niche breadth of species contributes to the co-occurrence of different species in the same location at the same time. The hypervolume functions provided by this package are adapted from the R package MVNH (https://github.com/lvmuyang/MVNH).

An Overview

The MVNH framework provides parametric measures for analyzing ecological niches using the multivariate normal distribution model. This framework offers powerful tools for quantifying and comparing the size and dissimilarity of species' niches, with each measure being partitionable into biologically meaningful components.

The framework models a species' niche as a multivariate normal distribution in environmental space, where:

  • Each environmental variable represents one dimension of the niche.
  • The mean vector represents the niche optimum.
  • The covariance matrix represents the niche breadth and shape.
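
As a minimal sketch of this model (not the package's own code), the two parameters of the fitted distribution can be estimated from presence records with Julia's Statistics standard library. The matrix `env` and its values are hypothetical and only illustrate the shape of the data.

```julia
using Statistics

# Hypothetical presence records: each row is one observation of the species,
# each column is one environmental variable (one niche dimension).
env = [ 0.2  1.1;
        0.5  0.9;
       -0.3  1.4;
        0.1  1.0]

niche_optimum = vec(mean(env, dims=1))   # mean vector: one entry per variable
niche_shape   = cov(env)                 # covariance matrix: breadth and shape

println(niche_optimum)
println(niche_shape)
```

The diagonal of `niche_shape` holds the variance (breadth) along each variable, and the off-diagonal entries describe how the variables co-vary within the niche.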

There are four hypervolume functions in this package:

  • MVNH_det calculates the total hypervolume of a species' niche based on the determinant of the covariance matrix (generalized variance). This measure can be partitioned into the variance of each environmental variable and a correlation component.
  • MVNH_dissimilarity calculates the Bhattacharyya distance between two species' niches, providing a comprehensive measure of niche differentiation.
  • average_MVNH_det calculates the mean hypervolume across multiple species in a community, providing an overall measure of niche size at the community level.
  • average_MVNH_dissimilarity calculates the mean Bhattacharyya distance between all unique pairs of species in a community, providing a measure of overall niche differentiation.

Practical Considerations

  • Statistical assumptions: This framework assumes that the environmental data follow a multivariate normal distribution.

    • When encountering skewed variables, apply appropriate transformations to normalize distributions.
    • Be aware that as you increase the number of variables, you face greater challenges with:
      • Variable interdependence (collinearity) which can drive determinant values toward zero
      • Potential violations of the multivariate normality assumption
    • Address variable interdependence through either:
      • Thoughtful pre-selection of ecologically meaningful variables with direct influence on species distributions
      • Application of dimension reduction methods such as PCA (principal component analysis)
        • Important note: PCA creates orthogonal axes, which forces the correlation component to 1.0, eliminating correlation structure information.
        • For datasets containing multiple distinct groups of related environmental variables (such as climate factors, soil properties, or topographic features), consider using generalized canonical variables to identify the most representative variables within each natural category while preserving the ecological relationships between different variable groups.
  • Measurement standardization: Before analysis, standardize all environmental variables to comparable scales to prevent variables with larger numerical ranges from disproportionately influencing results.
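
The sketch below illustrates two of these points with standard-library calls only: z-score standardization of variables measured on different scales, and how near-collinearity drives the correlation component det(COV)/prod(variances) toward zero. All data values and the helper name `standardize` are hypothetical and not part of the package.

```julia
using Statistics, LinearAlgebra

# Hypothetical raw measurements on very different scales:
# column 1 in degrees Celsius, column 2 in millimetres of rainfall.
raw = [12.0  800.0;
       15.0  650.0;
        9.0 1200.0;
       18.0  400.0]

# z-score each column: subtract its mean, divide by its standard deviation.
standardize(x) = (x .- mean(x, dims=1)) ./ std(x, dims=1)
z = standardize(raw)
println(var(z, dims=1))          # each column now has variance ≈ 1

# Two nearly collinear variables: the second is almost exactly 2 × the first.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
collinear = hcat(x, 2 .* x .+ [0.01, -0.01, 0.0, 0.01, -0.01])
C = cov(collinear)
ratio = det(C) / prod(diag(C))   # correlation component, close to 0 here
println(ratio)
```

For two variables this ratio equals 1 - r², where r is their correlation, so it shrinks toward zero as the variables become redundant.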

The Functions

MetaCommunityMetrics.MVNH_det - Function
MVNH_det(data::DataFrame; var_names::Vector{String}=String[]) -> DataFrame

Calculate the niche hypervolume of a species based on environmental variables.

Arguments

  • data::DataFrame: DataFrame where each row represents an observation of a species and each column represents an environmental variable. Include presence records only; filter out absences beforehand.
  • var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names "variable1", "variable2", etc. will be used.

Returns

  • DataFrame: A DataFrame containing:
    • correlation: The correlation component (calculated as det(COV)/prod(variances))
    • One column for each environmental variable showing its variance
    • total: The total hypervolume (calculated as the determinant of the covariance matrix)

Details

  • Environmental variables are assumed to follow a multivariate normal distribution. If they do not, transform them toward normality before using this function.
  • Variables should be normalized before using this function to avoid bias from different scales
  • The function computes the covariance matrix of the input data, extracts variances, and calculates the determinant
  • This function is a Julia implementation of the MVNH_det function from the R package MVNH (GPL-3)
  • Original package and documentation: https://github.com/lvmuyang/MVNH
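
The computation described above can be sketched with the Statistics and LinearAlgebra standard libraries. This is an illustration under the stated formulas, not the package's implementation; the matrix `env` holds hypothetical standardized observations.

```julia
using Statistics, LinearAlgebra

# Hypothetical standardized observations (rows) of two variables (columns).
env = [-0.4  0.1;
        0.0  0.2;
       -0.7 -0.8;
        2.1  0.5;
       -0.4 -0.7]

C = cov(env)                            # sample covariance matrix
variances = diag(C)                     # one variance per environmental variable
total = det(C)                          # generalized variance (total hypervolume)
correlation = total / prod(variances)   # equals 1 when variables are uncorrelated

println((total = total, correlation = correlation, variances = variances))
```

The partition is multiplicative: `total` equals `correlation` times the product of the individual variances, which is why the correlation component lies between 0 and 1.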

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted

julia> data = @pipe df |> 
            filter(row -> row[:Presence] > 0, _) |>
            filter(row -> row[:Species] == "BA", _) |>
            select(_, [:normalized_temperature, :normalized_precipitation])
143×2 DataFrame
 Row │ normalized_temperature  normalized_precipitation 
     │ Float64                 Float64                  
─────┼──────────────────────────────────────────────────
   1 │            -0.37813                    0.13009
   2 │            -0.00856861                 0.237183
   3 │            -0.664638                  -0.772406
   4 │             2.05431                    0.451875
   5 │            -0.39968                   -0.719024
  ⋮  │           ⋮                        ⋮
 139 │             1.85574                   -0.583737
 140 │             0.0953878                  1.21099
 141 │            -1.02227                    1.33501
 142 │            -0.400246                  -0.438892
 143 │            -0.817817                   0.418038
                                        133 rows omitted

julia> result = MVNH_det(data; var_names=["Temperature", "Precipitation"])
1×4 DataFrame
 Row │ total    correlation  Temperature  Precipitation 
     │ Float64  Float64      Float64      Float64       
─────┼──────────────────────────────────────────────────
   1 │ 1.15268     0.999732     0.962495        1.19792
MetaCommunityMetrics.MVNH_dissimilarity - Function
MVNH_dissimilarity(data_1::DataFrame, data_2::DataFrame; var_names::Vector{String}=String[]) -> DataFrame

Calculate niche dissimilarity between two species based on their environmental variables, using the Bhattacharyya distance and its components.

Arguments

  • data_1::DataFrame: DataFrame for the first species, where each row represents an observation and each column represents an environmental variable. Include presence records only; filter out absences beforehand.
  • data_2::DataFrame: DataFrame for the second species, with the same structure as data_1.
  • var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names "variable1", "variable2", etc. will be used.

Returns

  • DataFrame: A dataframe containing three metrics and their components:

    • "Bhattacharyya_distance": The total Bhattacharyya distance and its components
    • "Mahalanobis_distance": The Mahalanobis component of the Bhattacharyya distance
    • "Determinant_ratio": The determinant ratio component of the Bhattacharyya distance

    Each metric contains:

    • total: The total value of the respective distance measure
    • correlation: The correlation component of the distance measure
    • One value for each environmental variable showing its contribution to the distance

Details

  • The Bhattacharyya distance is calculated as the sum of two components:
    1. Mahalanobis component: (1/8) × (μ₁-μ₂)ᵀ × ((S₁+S₂)/2)⁻¹ × (μ₁-μ₂)
    2. Determinant ratio component: (1/2) × log(det((S₁+S₂)/2) / sqrt(det(S₁) × det(S₂)))
  • Each component is further decomposed into individual variable contributions and correlation effects
  • Environmental variables are assumed to follow a multivariate normal distribution. If they do not, transform them toward normality before using this function.
  • Variables should be normalized before using this function to avoid bias from different scales
  • This function is a Julia implementation inspired by the MVNH R package (GPL-3)
  • Original package and documentation: https://github.com/lvmuyang/MVNH
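
The two components of the Bhattacharyya distance can be sketched directly from the formulas above. The function name `bhattacharyya_sketch` and the data are hypothetical; the sketch returns only the totals of each component and does not reproduce the package's per-variable decomposition.

```julia
using Statistics, LinearAlgebra

# Returns the Mahalanobis component, the determinant ratio component,
# and their sum for two sets of observations (rows = observations).
function bhattacharyya_sketch(x1::AbstractMatrix, x2::AbstractMatrix)
    μ1, μ2 = vec(mean(x1, dims=1)), vec(mean(x2, dims=1))
    S1, S2 = cov(x1), cov(x2)
    S = (S1 + S2) / 2                       # averaged covariance matrix
    d = μ1 - μ2
    mahalanobis = dot(d, S \ d) / 8         # (1/8) × dᵀ × S⁻¹ × d
    det_ratio = 0.5 * log(det(S) / sqrt(det(S1) * det(S2)))
    return (total = mahalanobis + det_ratio,
            mahalanobis = mahalanobis,
            det_ratio = det_ratio)
end

# Hypothetical presence records for two species, two variables each.
sp1 = [0.1 0.3; -0.5 0.8; 0.9 -0.2; 0.4 0.6; -0.3 -0.4]
sp2 = [1.2 0.1; 0.7 -0.6; 1.9 0.5; 1.1 -0.1; 0.4 0.9]

r = bhattacharyya_sketch(sp1, sp2)
println(r)
println(bhattacharyya_sketch(sp1, sp1).total)   # identical niches: approximately 0
```

The Mahalanobis component grows with the distance between niche optima, while the determinant ratio component grows when the two niches differ in size or shape.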

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted


julia> data_1 = @pipe df |> 
            filter(row -> row[:Presence] > 0, _) |>
            filter(row -> row[:Species] == "BA", _) |>
            select(_, [:normalized_temperature, :normalized_precipitation])
143×2 DataFrame
 Row │ normalized_temperature  normalized_precipitation 
     │ Float64                 Float64                  
─────┼──────────────────────────────────────────────────
   1 │            -0.37813                    0.13009
   2 │            -0.00856861                 0.237183
   3 │            -0.664638                  -0.772406
   4 │             2.05431                    0.451875
   5 │            -0.39968                   -0.719024
  ⋮  │           ⋮                        ⋮
 139 │             1.85574                   -0.583737
 140 │             0.0953878                  1.21099
 141 │            -1.02227                    1.33501
 142 │            -0.400246                  -0.438892
 143 │            -0.817817                   0.418038
                                        133 rows omitted

julia> data_2 = @pipe df |> 
            filter(row -> row[:Presence] > 0, _) |>
            filter(row -> row[:Species] == "SH", _) |>
            select(_, [:normalized_temperature, :normalized_precipitation])
58×2 DataFrame
 Row │ normalized_temperature  normalized_precipitation 
     │ Float64                 Float64                  
─────┼──────────────────────────────────────────────────
   1 │              -0.229864                  1.84371
   2 │               0.460218                 -0.624328
   3 │              -1.03283                  -1.16451
   4 │               0.675006                 -0.120586
   5 │               0.40729                   0.20034
  ⋮  │           ⋮                        ⋮
  54 │              -0.870299                 -0.235392
  55 │               0.504555                 -1.50887
  56 │               2.03065                  -0.740789
  57 │              -0.174396                  0.448461
  58 │               0.547169                  1.03257
                                         48 rows omitted
                   
julia> result = MVNH_dissimilarity(data_1, data_2; var_names=["Temperature", "Precipitation"])
3×5 DataFrame
 Row │ metric                  total       correlation  Temperature  Precipitation 
     │ String                  Float64     Float64      Float64      Float64       
─────┼─────────────────────────────────────────────────────────────────────────────
   1 │ Bhattacharyya_distance  0.00980771  0.00015205    0.00388058     0.00577508
   2 │ Mahalanobis_distance    0.00664862  5.06232e-6    0.00234902     0.00429454
   3 │ Determinant_ratio       0.00315908  0.000146988   0.00153156     0.00148054
MetaCommunityMetrics.average_MVNH_det - Function
average_MVNH_det(data::DataFrame, presence_absence::Vector{Int}, species::AbstractVector; 
                 var_names::Vector{String}=String[]) -> Float64

Calculate the average niche hypervolume across multiple species in a community dataset.

Arguments

  • data::DataFrame: DataFrame containing environmental variables where each row represents an observation.
  • presence_absence::Vector{Int}: Vector indicating presence (1) or absence (0) for each observation in data.
  • species::AbstractVector: Vector of species identifiers (strings), one for each observation in data.
  • var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names will be used.

Returns

  • Float64: The average hypervolume across all species with presence data.

Details

  • For each unique species, the function:
    1. Filters observations where the species is present (presence_absence > 0)
    2. Calculates the niche hypervolume using the MVNH_det function
    3. Extracts the total hypervolume value
  • The function then computes the mean of all individual species hypervolumes
  • Species with no presence data are skipped in the calculation
  • Environmental variables are assumed to follow a multivariate normal distribution. If they do not, transform them toward normality before using this function.
  • Variables should be normalized before using this function to avoid bias from different scales
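
The steps above can be sketched on plain arrays as follows. The helper `average_det_sketch` and the toy data are hypothetical; the sketch takes det(cov) as the hypervolume and assumes every species that is present has enough records to estimate a covariance matrix.

```julia
using Statistics, LinearAlgebra

# Mean of det(cov) over all species that have at least one presence record.
function average_det_sketch(env::AbstractMatrix,
                            presence::AbstractVector{<:Integer},
                            species::AbstractVector)
    totals = Float64[]
    for sp in unique(species)
        rows = (species .== sp) .& (presence .> 0)   # presences of this species
        any(rows) || continue                        # skip species never present
        push!(totals, det(cov(env[rows, :])))        # total hypervolume
    end
    return mean(totals)
end

# Hypothetical data: species A and B have four presences each (plus one
# absence row for A); species C is never present and is skipped.
env = [0.0 0.0; 1.0 0.0; 0.0 1.0; 1.0 1.0; 5.0 5.0;
       0.0 0.0; 2.0 0.0; 0.0 2.0; 2.0 2.0; 3.0 3.0]
presence = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
species  = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C"]

avg = average_det_sketch(env, presence, species)
println(avg)   # mean of 1/9 (A) and 16/9 (B), approximately 0.944
```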

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted

julia> data = @pipe df |> 
           select(_, [:normalized_temperature, :normalized_precipitation])
53352×2 DataFrame
   Row │ normalized_temperature  normalized_precipitation 
       │ Float64                 Float64                  
───────┼──────────────────────────────────────────────────
     1 │               0.829467              -1.4024
     2 │              -1.12294               -0.0519895
     3 │              -0.409808              -0.803663
     4 │              -1.35913               -0.646369
     5 │               0.0822                 1.09485
   ⋮   │           ⋮                        ⋮
 53348 │              -0.571565              -0.836345
 53349 │              -2.33729               -0.398522
 53350 │               0.547169               1.03257
 53351 │              -0.815015               0.95971
 53352 │               0.48949               -1.59416
                                        53342 rows omitted

julia> result = average_MVNH_det(data, Vector{Int64}(df.Presence), df.Species; var_names=["Temperature", "Precipitation"])
1.2103765096417536
MetaCommunityMetrics.average_MVNH_dissimilarity - Function
average_MVNH_dissimilarity(data::DataFrame, presence_absence::Vector{Int}, species::AbstractVector; 
                          var_names::Vector{String}=String[]) -> Float64

Calculate the average niche dissimilarity between all unique pairs of species in a community dataset using Bhattacharyya distance.

Arguments

  • data::DataFrame: DataFrame containing environmental variables where each row represents an observation.
  • presence_absence::Vector{Int}: Vector indicating presence (1) or absence (0) for each observation in data.
  • species::AbstractVector: Vector of species identifiers (strings), one for each observation in data.
  • var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names will be used.

Returns

  • Float64: The average Bhattacharyya distance across all unique species pairs.

Details

  • For each unique pair of species, the function:
    1. Filters observations where each species is present (presence_absence > 0)
    2. Calculates the niche dissimilarity using the MVNH_dissimilarity function
    3. Extracts the total Bhattacharyya distance value
  • The function then computes the mean of all pairwise Bhattacharyya distances
  • Species pairs where either species has no presence data are skipped
  • Each species pair is processed only once (i.e., sp1-sp2 is calculated, but sp2-sp1 is skipped)
  • Environmental variables are assumed to follow a multivariate normal distribution
  • Variables should be normalized before using this function to avoid bias from different scales
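
A minimal sketch of the pairwise loop described above, on plain arrays. The helper names `bhattacharyya_total` and `average_dissimilarity_sketch` and the toy data are hypothetical; the sketch assumes every species that is present has enough records to estimate a non-singular covariance matrix.

```julia
using Statistics, LinearAlgebra

# Total Bhattacharyya distance between two sets of observations (rows).
function bhattacharyya_total(x1, x2)
    S1, S2 = cov(x1), cov(x2)
    S = (S1 + S2) / 2
    d = vec(mean(x1, dims=1)) - vec(mean(x2, dims=1))
    return dot(d, S \ d) / 8 + 0.5 * log(det(S) / sqrt(det(S1) * det(S2)))
end

function average_dissimilarity_sketch(env, presence, species)
    ids = unique(species)
    distances = Float64[]
    # j always starts above i, so each unordered pair is visited exactly once.
    for i in 1:length(ids)-1, j in i+1:length(ids)
        rows_i = (species .== ids[i]) .& (presence .> 0)
        rows_j = (species .== ids[j]) .& (presence .> 0)
        (any(rows_i) && any(rows_j)) || continue   # skip pairs lacking presences
        push!(distances, bhattacharyya_total(env[rows_i, :], env[rows_j, :]))
    end
    return mean(distances)
end

# Hypothetical data: A and B are present, C is never present, so only the
# A-B pair contributes to the average.
env = [0.0 0.0; 1.0 0.0; 0.0 1.0; 1.0 1.0;
       0.0 0.0; 2.0 0.0; 0.0 2.0; 2.0 2.0; 3.0 3.0]
presence = [1, 1, 1, 1, 1, 1, 1, 1, 0]
species  = ["A", "A", "A", "A", "B", "B", "B", "B", "C"]

avg = average_dissimilarity_sketch(env, presence, species)
println(avg)   # 0.075 + log(1.25), approximately 0.298
```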

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted

julia> data = @pipe df |> 
           select(_, [:normalized_temperature, :normalized_precipitation])    
53352×2 DataFrame
   Row │ normalized_temperature  normalized_precipitation 
       │ Float64                 Float64                  
───────┼──────────────────────────────────────────────────
     1 │               0.829467              -1.4024
     2 │              -1.12294               -0.0519895
     3 │              -0.409808              -0.803663
     4 │              -1.35913               -0.646369
     5 │               0.0822                 1.09485
   ⋮   │           ⋮                        ⋮
 53348 │              -0.571565              -0.836345
 53349 │              -2.33729               -0.398522
 53350 │               0.547169               1.03257
 53351 │              -0.815015               0.95971
 53352 │               0.48949               -1.59416
                                        53342 rows omitted

julia> result = average_MVNH_dissimilarity(data, Vector{Int64}(df.Presence), df.Species; var_names=["Temperature", "Precipitation"])     
0.03059942936454443

References

  • Lu, Muyang, Kevin Winner, and Walter Jetz. A unifying framework for quantifying and comparing n‐dimensional hypervolumes. Methods in Ecology and Evolution 12.10, 1953-1968 (2021). https://doi.org/10.1111/2041-210X.13665