Hypervolume Functions
Hypervolume is a niche concept originally proposed by Hutchinson (1957). It provides a way to calculate the volume of a niche for a given species and thus the niche overlap between two species. It helps to infer how the niche breadth of species has contributed to the co-occurrence of different species in the same location at the same time. The hypervolume functions provided by this package are adapted from the R package MVNH (https://github.com/lvmuyang/MVNH).
An Overview
The MVNH framework provides parametric measures for analyzing ecological niches using the multivariate normal distribution model. This framework offers powerful tools for quantifying and comparing the size and dissimilarity of species' niches, with each measure being partitionable into biologically meaningful components.
The framework models a species' niche as a multivariate normal distribution in environmental space, where:
- Each environmental variable represents one dimension of the niche.
- The mean vector represents the niche optimum.
- The covariance matrix represents the niche breadth and shape.
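As a minimal sketch of this model, the two quantities can be estimated from a matrix of presence records using only Julia's standard library. The data values and the names `X`, `niche_optimum`, and `niche_shape` are hypothetical and are not part of the package.

```julia
using Statistics, LinearAlgebra

# Hypothetical presence records: rows are observations, columns are environmental variables
X = [ 0.2  1.1;
     -0.5  0.3;
      1.4 -0.8;
      0.0  0.6;
     -1.1 -1.2]

niche_optimum = vec(mean(X, dims=1))  # mean vector: one entry per environmental variable
niche_shape = cov(X)                  # covariance matrix: variances on the diagonal
```

The diagonal of `niche_shape` describes the breadth along each variable, and the off-diagonal entries describe how the variables co-vary.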
There are four hypervolume functions in this package:
- MVNH_det calculates the total hypervolume of a species' niche based on the determinant of the covariance matrix (generalized variance). This measure can be partitioned into the variance of each environmental variable and a correlation component.
- MVNH_dissimilarity calculates the Bhattacharyya distance between two species' niches, providing a comprehensive measure of niche differentiation.
- average_MVNH_det calculates the mean hypervolume across multiple species in a community, providing an overall measure of niche size at the community level.
- average_MVNH_dissimilarity calculates the mean Bhattacharyya distance between all unique pairs of species in a community, providing a measure of overall niche differentiation.
Practical Considerations
Statistical assumptions: This framework relies on multivariate normal distribution of environmental data.
- When encountering skewed variables, apply appropriate transformations to normalize distributions.
- Be aware that as you increase the number of variables, you face greater challenges with:
  - Variable interdependence (collinearity) which can drive determinant values toward zero
  - Potential violations of the multivariate normality assumption
- Address variable interdependence through either:
  - Thoughtful pre-selection of ecologically meaningful variables with direct influence on species distributions
  - Application of dimension reduction methods such as PCA (principal component analysis)
    - Important note: PCA creates orthogonal axes, which forces the correlation component to 1.0, eliminating correlation structure information.
- For datasets containing multiple distinct groups of related environmental variables (such as climate factors, soil properties, or topographic features), consider using generalized canonical variables to identify the most representative variables within each natural category while preserving the ecological relationships between different variable groups.
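The effect of collinearity on the determinant can be illustrated with a small sketch. The vectors below are made-up values and `correlation_component` is a hypothetical helper written for this illustration, following the det(COV)/prod(variances) definition used by MVNH_det.

```julia
using Statistics, LinearAlgebra

x = [-1.5, -0.5, 0.0, 0.5, 1.5]
y_weak = [0.5, -1.5, 1.5, -0.5, 0.0]                      # weakly correlated with x
y_collinear = 2 .* x .+ [0.01, -0.01, 0.0, 0.01, -0.01]   # almost a linear function of x

# Correlation component: determinant of the covariance matrix divided by the product of variances
correlation_component(X) = det(cov(X)) / prod(diag(cov(X)))

c_weak = correlation_component(hcat(x, y_weak))            # close to 1
c_collinear = correlation_component(hcat(x, y_collinear))  # close to 0
```

With two variables this ratio equals one minus the squared correlation, so nearly collinear variables shrink it, and with it the total hypervolume, toward zero.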
Measurement standardization: Before analysis, standardize all environmental variables to comparable scales to prevent variables with larger numerical ranges from disproportionately influencing results.
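One common way to standardize is a z-score transformation. The sketch below assumes that approach; `zscore` is a hypothetical helper and the values are made up, so treat it as an illustration and not as the package's own preprocessing.

```julia
using Statistics

# Hypothetical helper: center to mean 0 and scale to standard deviation 1
zscore(v) = (v .- mean(v)) ./ std(v)

temperature = [12.0, 15.5, 9.8, 20.1, 17.3]            # made-up values, degrees Celsius
precipitation = [800.0, 1200.0, 450.0, 950.0, 1500.0]  # made-up values, millimetres

temperature_z = zscore(temperature)
precipitation_z = zscore(precipitation)
```

After this step both variables are on the same unitless scale, so neither dominates the covariance matrix because of its measurement unit.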
The Functions
MetaCommunityMetrics.MVNH_det - Function
MVNH_det(data::DataFrame; var_names::Vector{String}=String[]) -> DataFrame
Calculate the niche hypervolume of a species based on environmental variables.
Arguments
- data::DataFrame: DataFrame where each row represents an observation of a species (presence only; absences must be filtered out beforehand) and columns represent environmental variables.
- var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names "variable1", "variable2", etc. will be used.
Returns
- DataFrame: A DataFrame containing:
  - correlation: The correlation component (calculated as det(COV)/prod(variances))
  - One column for each environmental variable showing its variance
  - total: The total hypervolume (calculated as the determinant of the covariance matrix)
Details
- Environmental variables are assumed to follow a multivariate normal distribution; otherwise, transforming them toward normality is recommended before using this function.
- Variables should be normalized before using this function to avoid bias from different scales.
- The function computes the covariance matrix of the input data, extracts variances, and calculates the determinant.
- This function is a Julia implementation of the MVNH_det function from the R package MVNH (GPL-3).
- Original package and documentation: https://github.com/lvmuyang/MVNH
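The computation described above can be sketched with the standard library alone. `hypervolume_sketch` is a hypothetical stand-in that works on a plain matrix, not the package source, and the input values are made up.

```julia
using Statistics, LinearAlgebra

# Hypothetical stand-in for the documented steps; not the package implementation
function hypervolume_sketch(X::AbstractMatrix)
    S = cov(X)                             # covariance matrix of the environmental variables
    variances = diag(S)                    # one variance per variable
    total = det(S)                         # total hypervolume (generalized variance)
    correlation = total / prod(variances)  # correlation component
    return (total = total, correlation = correlation, variances = variances)
end

# Made-up presence records: rows are observations, columns are variables
X = [-1.5  0.5;
     -0.5 -1.5;
      0.0  1.5;
      0.5 -0.5;
      1.5  0.0]

r = hypervolume_sketch(X)
```

By construction the total equals the correlation component multiplied by the product of the variances, which is the partition reported in the returned columns.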
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics, UnicodePlots
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> data = @pipe df |>
filter(row -> row[:Presence] > 0, _) |>
filter(row -> row[:Species] == "BA", _) |>
select(_, [:normalized_temperature, :normalized_precipitation])
143×2 DataFrame
Row │ normalized_temperature normalized_precipitation
│ Float64 Float64
─────┼──────────────────────────────────────────────────
1 │ -0.37813 0.13009
2 │ -0.00856861 0.237183
3 │ -0.664638 -0.772406
4 │ 2.05431 0.451875
5 │ -0.39968 -0.719024
⋮ │ ⋮ ⋮
139 │ 1.85574 -0.583737
140 │ 0.0953878 1.21099
141 │ -1.02227 1.33501
142 │ -0.400246 -0.438892
143 │ -0.817817 0.418038
133 rows omitted
julia> result = MVNH_det(data; var_names=["Temperature", "Precipitation"])
1×4 DataFrame
Row │ total correlation Temperature Precipitation
│ Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────
1 │ 1.15268 0.999732 0.962495 1.19792
MetaCommunityMetrics.MVNH_dissimilarity - Function
MVNH_dissimilarity(data_1::DataFrame, data_2::DataFrame; var_names::Vector{String}=String[]) -> DataFrame
Calculate niche dissimilarity between two species based on their environmental variables, using the Bhattacharyya distance and its components.
Arguments
- data_1::DataFrame: DataFrame for the first species, where each row represents an observation (presence only; absences must be filtered out beforehand) and columns represent environmental variables.
- data_2::DataFrame: DataFrame for the second species, with the same structure as data_1.
- var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names "variable1", "variable2", etc. will be used.
Returns
- DataFrame: A DataFrame containing three metrics and their components:
  - "Bhattacharyya_distance": The total Bhattacharyya distance and its components
  - "Mahalanobis_distance": The Mahalanobis component of the Bhattacharyya distance
  - "Determinant_ratio": The determinant ratio component of the Bhattacharyya distance
Each metric contains:
- total: The total value of the respective distance measure
- correlation: The correlation component of the distance measure
- One value for each environmental variable showing its contribution to the distance
Details
- The Bhattacharyya distance is calculated as the sum of two components:
  - Mahalanobis component: (1/8) × (μ₁-μ₂)ᵀ × ((S₁+S₂)/2)⁻¹ × (μ₁-μ₂)
  - Determinant ratio component: (1/2) × log(det((S₁+S₂)/2) / sqrt(det(S₁) × det(S₂)))
- Each component is further decomposed into individual variable contributions and correlation effects.
- Environmental variables are assumed to follow a multivariate normal distribution; otherwise, transforming them toward normality is recommended before using this function.
- Variables should be normalized before using this function to avoid bias from different scales.
- This function is a Julia implementation inspired by the MVNH R package (GPL-3).
- Original package and documentation: https://github.com/lvmuyang/MVNH
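The two components listed above can be written out directly. The sketch below is a hypothetical illustration on plain matrices (`bhattacharyya_sketch` is not a package function) and it omits the further per-variable and correlation decomposition.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper: Bhattacharyya distance between two sets of observations
function bhattacharyya_sketch(X1::AbstractMatrix, X2::AbstractMatrix)
    μ1, μ2 = vec(mean(X1, dims=1)), vec(mean(X2, dims=1))
    S1, S2 = cov(X1), cov(X2)
    S = (S1 + S2) / 2                                        # pooled covariance
    d = μ1 - μ2
    mahalanobis = dot(d, S \ d) / 8                          # (1/8) dᵀ S⁻¹ d
    det_ratio = log(det(S) / sqrt(det(S1) * det(S2))) / 2    # (1/2) log determinant ratio
    return (total = mahalanobis + det_ratio, mahalanobis = mahalanobis, det_ratio = det_ratio)
end

# Made-up data: the second niche is the first one shifted by 1 along the first variable
X1 = [-1.5 0.5; -0.5 -1.5; 0.0 1.5; 0.5 -0.5; 1.5 0.0]
X2 = X1 .+ [1.0 0.0]

r = bhattacharyya_sketch(X1, X2)
```

Because the two made-up niches share the same covariance matrix, the determinant ratio component is essentially zero here and the whole distance comes from the difference in niche optima.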
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> data_1 = @pipe df |>
filter(row -> row[:Presence] > 0, _) |>
filter(row -> row[:Species] == "BA", _) |>
select(_, [:normalized_temperature, :normalized_precipitation])
143×2 DataFrame
Row │ normalized_temperature normalized_precipitation
│ Float64 Float64
─────┼──────────────────────────────────────────────────
1 │ -0.37813 0.13009
2 │ -0.00856861 0.237183
3 │ -0.664638 -0.772406
4 │ 2.05431 0.451875
5 │ -0.39968 -0.719024
⋮ │ ⋮ ⋮
139 │ 1.85574 -0.583737
140 │ 0.0953878 1.21099
141 │ -1.02227 1.33501
142 │ -0.400246 -0.438892
143 │ -0.817817 0.418038
133 rows omitted
julia> data_2 = @pipe df |>
filter(row -> row[:Presence] > 0, _) |>
filter(row -> row[:Species] == "SH", _) |>
select(_, [:normalized_temperature, :normalized_precipitation])
58×2 DataFrame
Row │ normalized_temperature normalized_precipitation
│ Float64 Float64
─────┼──────────────────────────────────────────────────
1 │ -0.229864 1.84371
2 │ 0.460218 -0.624328
3 │ -1.03283 -1.16451
4 │ 0.675006 -0.120586
5 │ 0.40729 0.20034
⋮ │ ⋮ ⋮
54 │ -0.870299 -0.235392
55 │ 0.504555 -1.50887
56 │ 2.03065 -0.740789
57 │ -0.174396 0.448461
58 │ 0.547169 1.03257
48 rows omitted
julia> result = MVNH_dissimilarity(data_1, data_2; var_names=["Temperature", "Precipitation"])
3×5 DataFrame
Row │ metric total correlation Temperature Precipitation
│ String Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────────────────────────────────
1 │ Bhattacharyya_distance 0.00980771 0.00015205 0.00388058 0.00577508
2 │ Mahalanobis_distance 0.00664862 5.06232e-6 0.00234902 0.00429454
3 │ Determinant_ratio 0.00315908 0.000146988 0.00153156 0.00148054
MetaCommunityMetrics.average_MVNH_det - Function
average_MVNH_det(data::DataFrame, presence_absence::Vector{Int}, species::AbstractVector;
var_names::Vector{String}=String[]) -> Float64
Calculate the average niche hypervolume across multiple species in a community dataset.
Arguments
- data::DataFrame: DataFrame containing environmental variables where each row represents an observation.
- presence_absence::Vector{Int}: Vector indicating presence (1) or absence (0) for each observation in data.
- species::AbstractVector: Vector containing species identifiers corresponding to each observation in data, which must be a vector of strings.
- var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names will be used.
Returns
- Float64: The average hypervolume across all species with presence data.
Details
- For each unique species, the function:
  - Filters observations where the species is present (presence_absence > 0)
  - Calculates the niche hypervolume using the MVNH_det function
  - Extracts the total hypervolume value
- The function then computes the mean of all individual species hypervolumes.
- Species with no presence data are skipped in the calculation.
- Environmental variables are assumed to follow a multivariate normal distribution; otherwise, transforming them toward normality is recommended before using this function.
- Variables should be normalized before using this function to avoid bias from different scales.
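The per-species loop described above might look roughly like the following sketch. It is a hypothetical illustration on a plain matrix (`average_hypervolume_sketch` is not a package function), it uses det(cov(...)) in place of a call to MVNH_det, and it assumes every retained species has at least two presence records.

```julia
using Statistics, LinearAlgebra

# Hypothetical sketch of the averaging logic; not the package implementation
function average_hypervolume_sketch(X::AbstractMatrix, presence::AbstractVector{<:Integer},
                                    species::AbstractVector)
    totals = Float64[]
    for sp in unique(species)
        rows = (species .== sp) .& (presence .> 0)  # presence records of this species
        any(rows) || continue                       # skip species with no presence data
        push!(totals, det(cov(X[rows, :])))         # total hypervolume of this species
    end
    return mean(totals)
end

# Made-up data: species "C" is never present and is therefore skipped
X = [0.0 0.0; 1.0 0.0; 0.0 1.0;
     0.0 0.0; 2.0 0.0; 0.0 2.0;
     5.0 5.0]
species = ["A", "A", "A", "B", "B", "B", "C"]
presence = [1, 1, 1, 1, 1, 1, 0]

avg = average_hypervolume_sketch(X, presence, species)
```

Working on the full dataset with separate presence and species vectors mirrors the documented call, where filtering happens inside the function instead of beforehand.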
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> data = @pipe df |>
select(_, [:normalized_temperature, :normalized_precipitation])
53352×2 DataFrame
Row │ normalized_temperature normalized_precipitation
│ Float64 Float64
───────┼──────────────────────────────────────────────────
1 │ 0.829467 -1.4024
2 │ -1.12294 -0.0519895
3 │ -0.409808 -0.803663
4 │ -1.35913 -0.646369
5 │ 0.0822 1.09485
⋮ │ ⋮ ⋮
53348 │ -0.571565 -0.836345
53349 │ -2.33729 -0.398522
53350 │ 0.547169 1.03257
53351 │ -0.815015 0.95971
53352 │ 0.48949 -1.59416
53342 rows omitted
julia> result = average_MVNH_det(data, Vector{Int64}(df.Presence), df.Species; var_names=["Temperature", "Precipitation"])
1.2103765096417536
MetaCommunityMetrics.average_MVNH_dissimilarity - Function
average_MVNH_dissimilarity(data::DataFrame, presence_absence::Vector{Int}, species::AbstractVector;
var_names::Vector{String}=String[]) -> Float64
Calculate the average niche dissimilarity between all unique pairs of species in a community dataset using Bhattacharyya distance.
Arguments
- data::DataFrame: DataFrame containing environmental variables where each row represents an observation.
- presence_absence::Vector{Int}: Vector indicating presence (1) or absence (0) for each observation in data.
- species::AbstractVector: Vector containing species identifiers corresponding to each observation in data, which must be a vector of strings.
- var_names::Vector{String}=String[]: Optional vector specifying names for the environmental variables. If empty, default names will be used.
Returns
- Float64: The average Bhattacharyya distance across all unique species pairs.
Details
- For each unique pair of species, the function:
  - Filters observations where each species is present (presence_absence > 0)
  - Calculates the niche dissimilarity using the MVNH_dissimilarity function
  - Extracts the total Bhattacharyya distance value
- The function then computes the mean of all pairwise Bhattacharyya distances.
- Species pairs where either species has no presence data are skipped.
- Each species pair is processed only once (i.e., sp1-sp2 is calculated, but sp2-sp1 is skipped).
- Environmental variables are assumed to follow a multivariate normal distribution.
- Variables should be normalized before using this function to avoid bias from different scales.
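The pairwise logic above can be sketched as a loop over unique, unordered species pairs. Both functions below are hypothetical helpers on plain matrices, not package code; `bhattacharyya` is a compact version of the formula documented for MVNH_dissimilarity, and the data are made up.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper: total Bhattacharyya distance between two sets of observations
function bhattacharyya(X1, X2)
    S1, S2 = cov(X1), cov(X2)
    S = (S1 + S2) / 2
    d = vec(mean(X1, dims=1)) - vec(mean(X2, dims=1))
    return dot(d, S \ d) / 8 + log(det(S) / sqrt(det(S1) * det(S2))) / 2
end

# Hypothetical sketch of the averaging logic; not the package implementation
function average_dissimilarity_sketch(X, presence, species)
    sp = unique(species)
    distances = Float64[]
    for i in 1:length(sp)-1, j in i+1:length(sp)   # each unordered pair exactly once
        rows_i = (species .== sp[i]) .& (presence .> 0)
        rows_j = (species .== sp[j]) .& (presence .> 0)
        (any(rows_i) && any(rows_j)) || continue   # skip pairs lacking presence data
        push!(distances, bhattacharyya(X[rows_i, :], X[rows_j, :]))
    end
    return mean(distances)
end

# Made-up data: B and C are copies of A shifted along the first variable; D is never present
A = [0.0 0.0; 1.0 0.0; 0.0 1.0]
X = vcat(A, A .+ [1.0 0.0], A .+ [2.0 0.0], [9.0 9.0])
species = vcat(repeat(["A", "B", "C"], inner=3), ["D"])
presence = vcat(ones(Int, 9), [0])

avg = average_dissimilarity_sketch(X, presence, species)
```

Starting the inner index at `i + 1` is what ensures sp1-sp2 is computed while sp2-sp1 and self-pairs are never visited.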
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames, Statistics
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> data = @pipe df |>
select(_, [:normalized_temperature, :normalized_precipitation])
53352×2 DataFrame
Row │ normalized_temperature normalized_precipitation
│ Float64 Float64
───────┼──────────────────────────────────────────────────
1 │ 0.829467 -1.4024
2 │ -1.12294 -0.0519895
3 │ -0.409808 -0.803663
4 │ -1.35913 -0.646369
5 │ 0.0822 1.09485
⋮ │ ⋮ ⋮
53348 │ -0.571565 -0.836345
53349 │ -2.33729 -0.398522
53350 │ 0.547169 1.03257
53351 │ -0.815015 0.95971
53352 │ 0.48949 -1.59416
53342 rows omitted
julia> result = average_MVNH_dissimilarity(data, Vector{Int64}(df.Presence), df.Species; var_names=["Temperature", "Precipitation"])
0.03059942936454443
References
- Lu, Muyang, Kevin Winner, and Walter Jetz. A unifying framework for quantifying and comparing n‐dimensional hypervolumes. Methods in Ecology and Evolution 12.10, 1953-1968 (2021). https://doi.org/10.1111/2041-210X.13665