Dispersal-Niche Continuum Index (DNCI) Functions
The Dispersal-Niche Continuum Index (DNCI) functions in MetaCommunityMetrics quantify the balance between dispersal and niche processes within a metacommunity, providing insight into community structure and the relative influence of these two key ecological drivers. The function DNCI_multigroup in this package is adapted from the R package DNCImper: Assembly process identification based on SIMPER analysis. These methods, originally developed by Clarke (1993) and later refined by Gibert & Escarguel (2019) and Vilmi et al. (2021), offer powerful tools for identifying the processes underlying species assembly in metacommunities.
Background
The DNCI functions are built around the PerSIMPER and DNCI analyses. PerSIMPER builds on the Similarity Percentage (SIMPER) analysis developed by Clarke (1993), which assesses the contribution of individual taxa to the overall average dissimilarity (OAD) between groups of assemblages. PerSIMPER extends this by comparing empirical SIMPER plots with randomized plots generated through matrix permutation, which helps identify whether niche processes, dispersal processes, or both are driving community assembly.
The DNCI (Dispersal-Niche Continuum Index) further extends this approach by transforming the qualitative results of PerSIMPER into a quantitative index, providing a straightforward measure of the influence of niche and dispersal processes on community structure.
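To make the step from PerSIMPER to a single index more concrete, the sketch below shows one way the index can be assembled from standardized effect sizes. It follows our reading of Vilmi et al. (2021) and the DNCImper R package and is not taken from the MetaCommunityMetrics source. The function name `dnci_from_distances` and the inputs `E_d`, `E_n` and `E_dn` are hypothetical: they stand for per-permutation deviations between the empirical SIMPER profile and the dispersal, niche, and dispersal-plus-niche null profiles. The factor of two between the spread and the confidence interval is also an assumption, although it agrees with the CI_DNCI and S_DNCI columns in the example output on this page.

```julia
using Statistics

# Hypothetical helper, NOT part of MetaCommunityMetrics.
# E_d, E_n, E_dn are assumed inputs: one deviation value per permutation for the
# dispersal, niche, and dispersal+niche null models, respectively.
function dnci_from_distances(E_d::AbstractVector, E_n::AbstractVector, E_dn::AbstractVector)
    μ, σ = mean(E_dn), std(E_dn)
    ses_d = (E_d .- μ) ./ σ              # standardized effect size, dispersal model
    ses_n = (E_n .- μ) ./ σ              # standardized effect size, niche model
    dnci = mean(ses_d) - mean(ses_n)     # negative: dispersal side, positive: niche side
    s = sqrt(var(ses_d) + var(ses_n))    # assumed spread of the difference
    return (DNCI = dnci, S_DNCI = s, CI_DNCI = 2s)
end

dnci_from_distances([2.0, 4.0], [1.0, 3.0], [0.0, 2.0])
```

Standardizing both effect sizes against the same null distribution is what puts the dispersal and niche models on a common scale, so their difference can be read as a position on a continuum.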
Functionality Overview
The DNCI functions in MetaCommunityMetrics allow you to analyze the processes driving species assembly within your dataset. By comparing empirical data with randomized permutations, you can determine the extent to which niche and dispersal processes influence the structure of metacommunities. Before calculating the DNCI, sites must be grouped into clusters, because the DNCI is computed by comparing community composition between spatial groups. This package provides a function to perform the necessary clustering, which is not available in the equivalent R package.
When the DNCI value is significantly below zero, dispersal processes are likely the dominant drivers of community composition. In contrast, a DNCI value significantly above zero suggests that niche processes play a primary role in shaping community composition. If the DNCI value is not significantly different from zero, it indicates that dispersal and niche processes contribute equally to variations in community composition.
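The interpretation rule above can be written as a small decision function. This is a minimal sketch under one stated assumption: "significantly different from zero" is taken to mean that the interval DNCI ± CI_DNCI excludes zero. The helper name `interpret_dnci` and the returned symbols are hypothetical and not part of the package.

```julia
# Hypothetical helper, NOT part of MetaCommunityMetrics.
# ASSUMPTION: a DNCI is "significant" when the interval DNCI ± CI excludes zero.
function interpret_dnci(dnci::Real, ci::Real)
    if dnci + ci < 0
        return :dispersal            # whole interval lies below zero
    elseif dnci - ci > 0
        return :niche                # whole interval lies above zero
    else
        return :no_dominant_process  # interval overlaps zero
    end
end

interpret_dnci(-0.421799, 2.74032)   # returns :no_dominant_process
interpret_dnci(-3.0, 1.0)            # returns :dispersal
interpret_dnci(2.5, 1.0)             # returns :niche
```

The first call uses the DNCI and CI_DNCI reported for groups 1 and 2 in the DNCI_multigroup example further down this page.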
The Functions
MetaCommunityMetrics.create_clusters — Function
create_clusters(time::AbstractVector, latitude::Vector{Float64}, longitude::Vector{Float64}, site::AbstractVector, species::AbstractVector, presence::AbstractVector) -> Dict{Int, DataFrame}
This function creates clusters (groupings of sites) for each unique time step in a dataset, which can then be used for calculating the DNCI. Only presence-absence data can be used. Please remove singletons (taxa/species that occur at only one site within a time step) before using this function.
Arguments
- time::AbstractVector: Vector or single value representing sampling dates. Can be strings, integers, or any other type.
- latitude::Vector{Float64}: A vector indicating the latitude of each site.
- longitude::Vector{Float64}: A vector indicating the longitude of each site.
- site::AbstractVector: A vector indicating the spatial location of each site. At least 10 sites are required for clustering.
- species::AbstractVector: A vector indicating the species present at each site.
- presence::AbstractVector: A vector indicating the presence (1) or absence (0) of species at each site.
Returns
- Dict{Int, DataFrame}: A dictionary where each key is a unique time point from the input data and the corresponding value is a DataFrame for that time step. Each DataFrame contains the following columns: Time, Latitude, Longitude, Site, Species, Presence, and Group (the assigned cluster).
Details
This function performs hierarchical clustering on the geographical coordinates of sampling sites at each unique time step, assuming that organism dispersal occurs within the study region. It checks and adjusts the groupings so that the following conditions are met: at least 2 clusters, a minimum of 5 sites per cluster, and variation in the number of taxa/species and in the number of sites per group not exceeding 40% and 30%, respectively. These conditions are critical for calculating an unbiased DNCI value. If any condition is not fulfilled, the function issues a warning and the Group column for that time step is returned as missing.
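The conditions listed above can be expressed as a simple acceptance check. The sketch below is only an illustration and not the package's implementation. In particular, "variation" is assumed here to mean (max - min) / max across groups, which may differ from the definition used internally. The names `relative_spread` and `clusters_acceptable` are hypothetical.

```julia
# Hypothetical helpers, NOT part of MetaCommunityMetrics.
# ASSUMPTION: "variation" across groups is measured as (max - min) / max.
relative_spread(x) = (maximum(x) - minimum(x)) / maximum(x)

function clusters_acceptable(sites_per_group::Vector{Int}, taxa_per_group::Vector{Int})
    length(sites_per_group) >= 2 || return false              # at least 2 clusters
    minimum(sites_per_group) >= 5 || return false             # at least 5 sites per cluster
    relative_spread(sites_per_group) <= 0.30 || return false  # sites vary by at most 30%
    relative_spread(taxa_per_group) <= 0.40 || return false   # taxa vary by at most 40%
    return true
end

clusters_acceptable([5, 6, 5], [10, 8, 9])   # returns true
clusters_acceptable([5, 10], [10, 10])       # returns false (site counts too uneven)
```

Two clusters of at least five sites each is also why at least 10 sites are needed per time step.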
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> total_presence_df = @pipe df|>
groupby(_,[:Species,:Sampling_date_order])|>
combine(_,:Presence=>sum=>:Total_Presence) |>
filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
Row │ Species Sampling_date_order Total_Presence
│ String3 Int64 Int64
─────┼──────────────────────────────────────────────
1 │ BA 41 2
2 │ BA 50 2
3 │ BA 51 8
4 │ BA 52 19
5 │ BA 53 18
⋮ │ ⋮ ⋮ ⋮
787 │ SH 56 3
788 │ SH 60 4
789 │ SH 70 3
790 │ SH 73 5
791 │ SH 117 4
781 rows omitted
julia> non_empty_site_df = @pipe df|>
innerjoin(_, total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true)|>
groupby(_, [:Sampling_date_order, :plot]) |>
combine(_, :Presence=>sum=>:Total_N) |>
filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
Row │ Sampling_date_order plot Total_N
│ Int64 Int64 Int64
──────┼─────────────────────────────────────
1 │ 1 1 1
2 │ 1 2 1
3 │ 1 6 1
4 │ 1 8 1
5 │ 1 9 1
⋮ │ ⋮ ⋮ ⋮
2541 │ 117 20 6
2542 │ 117 21 4
2543 │ 117 22 4
2544 │ 117 23 5
2545 │ 117 24 5
2535 rows omitted
julia> filtered_df = @pipe df|>
innerjoin(_, non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation Total_N
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64 Int64
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024 1
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895 1
3 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369 1
4 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485 1
5 │ 2010 1 16 1 11 BA 0 0 35.5 -108.0 1.24515 1.62621 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
48351 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345 5
48352 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522 3
48353 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257 6
48354 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971 4
48355 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416 5
48345 rows omitted
julia> result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
5 => 285×7 DataFrame…
56 => 437×7 DataFrame…
35 => 437×7 DataFrame…
55 => 456×7 DataFrame…
110 => 456×7 DataFrame…
114 => 418×7 DataFrame…
60 => 456×7 DataFrame…
30 => 380×7 DataFrame…
32 => 418×7 DataFrame…
6 => 342×7 DataFrame…
67 => 437×7 DataFrame…
45 => 437×7 DataFrame…
117 => 456×7 DataFrame…
73 => 437×7 DataFrame…
⋮ => ⋮
julia> result[1]
266×7 DataFrame
Row │ Time Latitude Longitude Site Species Presence Group
│ Int64 Float64 Float64 Int64 String3 Int64 Missing
─────┼───────────────────────────────────────────────────────────────
1 │ 1 35.0 -110.0 1 BA 0 missing
2 │ 1 35.0 -109.5 2 BA 0 missing
3 │ 1 35.5 -109.5 8 BA 0 missing
4 │ 1 35.5 -109.0 9 BA 0 missing
5 │ 1 35.5 -108.0 11 BA 0 missing
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
262 │ 1 36.0 -110.0 13 SH 0 missing
263 │ 1 36.0 -109.0 15 SH 0 missing
264 │ 1 36.5 -109.5 20 SH 0 missing
265 │ 1 36.5 -109.0 21 SH 0 missing
266 │ 1 36.5 -108.0 23 SH 0 missing
256 rows omitted
julia> result[3]
285×7 DataFrame
Row │ Time Latitude Longitude Site Species Presence Group
│ Int64 Float64 Float64 Int64 String3 Int64 Int64?
─────┼──────────────────────────────────────────────────────────────
1 │ 3 35.0 -110.0 1 BA 0 1
2 │ 3 35.0 -109.5 2 BA 0 1
3 │ 3 35.0 -108.5 4 BA 0 2
4 │ 3 35.5 -109.5 8 BA 0 1
5 │ 3 35.5 -108.0 11 BA 0 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
281 │ 3 36.0 -107.5 18 SH 0 2
282 │ 3 36.5 -109.5 20 SH 0 3
283 │ 3 36.5 -109.0 21 SH 0 3
284 │ 3 35.5 -110.0 7 SH 0 1
285 │ 3 36.5 -108.0 23 SH 0 2
275 rows omitted
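Because time steps that fail the clustering conditions come back with a Group column of missing values, it can be useful to list the time steps that are usable for the DNCI before going further. The sketch below is a minimal illustration. It uses NamedTuples as stand-ins for the DataFrames in the returned dictionary so that it runs without extra packages. The variable names `mock_result` and `usable_times` are hypothetical, and the same `.Group` access should work on the actual result.

```julia
# Stand-in for the Dict{Int, DataFrame} returned by create_clusters.
# NamedTuples are used only so this runs without DataFrames.
mock_result = Dict(
    1 => (Site = [1, 2, 8], Group = [missing, missing, missing]),
    3 => (Site = [1, 2, 4], Group = Union{Int,Missing}[1, 1, 2]),
)

# Keep the time steps whose Group column holds at least one assigned cluster.
usable_times = sort([t for (t, tbl) in mock_result if !all(ismissing, tbl.Group)])
# usable_times == [3]
```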
MetaCommunityMetrics.plot_clusters — Function
plot_clusters(latitude::Vector{Float64}, longitude::Vector{Float64}, group::AbstractVector, output_file="clusters.svg") -> String
Visualizes clustering results by generating an SVG image displaying the geographic coordinates and cluster assignments of sampling sites.
Arguments
- latitude::Vector{Float64}: A vector of latitude coordinates of the sampling sites.
- longitude::Vector{Float64}: A vector of longitude coordinates of the sampling sites.
- group::AbstractVector: A vector indicating the group assignments for each data point.
- output_file::String="clusters.svg": The filename for the output SVG visualization. Default is "clusters.svg".
Returns
- String: The path to the created SVG file.
Details
- The function generates a standalone SVG file that can be viewed in any web browser or image viewer.
- Each cluster is assigned a unique color, and sampling sites are plotted based on their geographic coordinates.
- The visualization includes a legend identifying each cluster.
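To illustrate the kind of output described above, the following sketch builds a small SVG string by hand: one circle per site, colored by group, plus one legend entry per group. It is not the package's implementation. The function name `sketch_cluster_svg`, the palette, and the layout parameters are all hypothetical choices made for this example.

```julia
# Hypothetical sketch, NOT the implementation used by plot_clusters.
function sketch_cluster_svg(latitude::Vector{Float64}, longitude::Vector{Float64},
                            group::AbstractVector;
                            width::Int = 400, height::Int = 300, margin::Int = 30)
    palette = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]  # arbitrary colors
    labels = unique(group)
    # Colors repeat if there are more groups than palette entries.
    color_of = Dict(g => palette[mod1(i, length(palette))] for (i, g) in enumerate(labels))

    # Linearly rescale a coordinate into pixel space; guards against a zero range.
    rescale(v, lo, hi, len) = hi == lo ? len / 2 : (v - lo) / (hi - lo) * len
    lon_lo, lon_hi = extrema(longitude)
    lat_lo, lat_hi = extrema(latitude)

    io = IOBuffer()
    println(io, """<svg xmlns="http://www.w3.org/2000/svg" width="$width" height="$height">""")
    for (lat, lon, g) in zip(latitude, longitude, group)
        x = margin + rescale(lon, lon_lo, lon_hi, width - 2margin)
        # SVG y grows downward, so flip the latitude axis.
        y = height - margin - rescale(lat, lat_lo, lat_hi, height - 2margin)
        println(io, """  <circle cx="$x" cy="$y" r="5" fill="$(color_of[g])"/>""")
    end
    for (i, g) in enumerate(labels)  # one legend entry per group
        println(io, """  <text x="5" y="$(15i)" fill="$(color_of[g])">Group $g</text>""")
    end
    println(io, "</svg>")
    return String(take!(io))
end

svg = sketch_cluster_svg([35.0, 35.5, 36.0], [-110.0, -109.5, -109.0], [1, 1, 2])
```

Writing the returned string to a file with `write("sketch.svg", svg)` gives a file that a browser can open.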
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> total_presence_df=@pipe df|>
groupby(_,[:Species,:Sampling_date_order])|>
combine(_,:Presence=>sum=>:Total_Presence) |>
filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
Row │ Species Sampling_date_order Total_Presence
│ String3 Int64 Int64
─────┼──────────────────────────────────────────────
1 │ BA 41 2
2 │ BA 50 2
3 │ BA 51 8
4 │ BA 52 19
5 │ BA 53 18
⋮ │ ⋮ ⋮ ⋮
787 │ SH 56 3
788 │ SH 60 4
789 │ SH 70 3
790 │ SH 73 5
791 │ SH 117 4
781 rows omitted
julia> non_empty_site_df = @pipe df|>
innerjoin(_, total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true)|>
groupby(_, [:Sampling_date_order, :plot]) |>
combine(_, :Presence=>sum=>:Total_N) |>
filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
Row │ Sampling_date_order plot Total_N
│ Int64 Int64 Int64
──────┼─────────────────────────────────────
1 │ 1 1 1
2 │ 1 2 1
3 │ 1 6 1
4 │ 1 8 1
5 │ 1 9 1
⋮ │ ⋮ ⋮ ⋮
2541 │ 117 20 6
2542 │ 117 21 4
2543 │ 117 22 4
2544 │ 117 23 5
2545 │ 117 24 5
2535 rows omitted
julia> filtered_df = @pipe df|>
innerjoin(_, non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation Total_N
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64 Int64
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024 1
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895 1
3 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369 1
4 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485 1
5 │ 2010 1 16 1 11 BA 0 0 35.5 -108.0 1.24515 1.62621 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
48351 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345 5
48352 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522 3
48353 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257 6
48354 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971 4
48355 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416 5
48345 rows omitted
julia> clustering_result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
5 => 285×7 DataFrame…
56 => 437×7 DataFrame…
35 => 437×7 DataFrame…
55 => 456×7 DataFrame…
110 => 456×7 DataFrame…
114 => 418×7 DataFrame…
60 => 456×7 DataFrame…
30 => 380×7 DataFrame…
32 => 418×7 DataFrame…
6 => 342×7 DataFrame…
67 => 437×7 DataFrame…
45 => 437×7 DataFrame…
117 => 456×7 DataFrame…
73 => 437×7 DataFrame…
⋮ => ⋮
julia> clustering_result[3]
285×7 DataFrame
Row │ Time Latitude Longitude Site Species Presence Group
│ Int64 Float64 Float64 Int64 String3 Int64 Int64?
─────┼──────────────────────────────────────────────────────────────
1 │ 3 35.0 -110.0 1 BA 0 1
2 │ 3 35.0 -109.5 2 BA 0 1
3 │ 3 35.0 -108.5 4 BA 0 2
4 │ 3 35.5 -109.5 8 BA 0 1
5 │ 3 35.5 -108.0 11 BA 0 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
281 │ 3 36.0 -107.5 18 SH 0 2
282 │ 3 36.5 -109.5 20 SH 0 3
283 │ 3 36.5 -109.0 21 SH 0 3
284 │ 3 35.5 -110.0 7 SH 0 1
285 │ 3 36.5 -108.0 23 SH 0 2
275 rows omitted
julia> plot_clusters(clustering_result[3].Latitude, clustering_result[3].Longitude, clustering_result[3].Group; output_file="clusters.svg")
The resulting plot shows the clustering result for time step 3 based on geographic coordinates.
MetaCommunityMetrics.DNCI_multigroup — Function
DNCI_multigroup(comm::Matrix, groups::Vector, Nperm::Int=1000; Nperm_count::Bool=true) -> DataFrame
Calculates the dispersal-niche continuum index (DNCI) for multiple groups, a metric proposed by Vilmi et al. (2021). The DNCI quantifies the balance between dispersal and niche processes within a metacommunity, providing insight into community structure and the relative influence of these two key ecological drivers. Please remove singletons (taxa/species that occur at only one site within a time step) before using this function.
Arguments
- comm::Matrix: A presence-absence data matrix where rows represent observations (e.g., sites or samples) and columns represent species.
- groups::Vector: A vector indicating the group membership for each row in the comm matrix. You can use the create_clusters function to generate the group membership.
- Nperm::Int=1000: The number of permutations for significance testing. Default is 1000.
- Nperm_count::Bool=true: A flag indicating whether the number of permutations is printed. Default is true.
Returns
- DataFrame: A DataFrame containing the DNCI value, the associated confidence interval (CI_DNCI), and the variance (S_DNCI) for each pair of groups.
Details
- The function calculates the DNCI for each pair of groups in the input data.
- When the DNCI value is significantly below zero, dispersal processes are likely the dominant drivers of community composition.
- In contrast, a DNCI value significantly above zero suggests that niche processes play a primary role in shaping community composition.
- If the DNCI value is not significantly different from zero, it indicates that dispersal and niche processes contribute equally to spatial variations in community composition at a given time point.
- Please remove singletons (taxa/species that occur at only one site within a time step) before using this function.
- Caution: High frequencies of empty sites can bias DNCI values toward zero; DNCI not significantly different from zero in such cases may indicate insufficient ecological variation for reliable process detection rather than genuine equal relative contributions of dispersal and niche processes.
- This function is a translation/adaptation of a function from the R package DNCImper, licensed under GPL-3.
- Original package and documentation available at: https://github.com/Corentin-Gibert-Paleontology/DNCImper
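Since singletons and empty sites both affect the result, it can help to inspect the community matrix before passing it to DNCI_multigroup. The sketch below is a hypothetical pre-check and not part of the package: `check_comm` and the names of its returned fields are assumptions made for illustration. It reports the column indices of species occurring at exactly one site, the columns of species that never occur, and the row indices of sites with no species.

```julia
# Hypothetical helper, NOT part of MetaCommunityMetrics.
# comm: presence-absence matrix, rows = sites, columns = species.
function check_comm(comm::AbstractMatrix{<:Integer})
    occurrences = vec(sum(comm; dims = 1))   # number of sites occupied by each species
    richness = vec(sum(comm; dims = 2))      # number of species present at each site
    return (
        singleton_species = findall(==(1), occurrences),
        absent_species = findall(==(0), occurrences),
        empty_sites = findall(==(0), richness),
    )
end

report = check_comm([1 0 0; 1 1 0; 0 0 0])
# report.singleton_species == [2]
# report.absent_species == [3]
# report.empty_sites == [3]
```

The check only reports indices, so the decision about what to drop stays with the analyst.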
Example
julia> using MetaCommunityMetrics, Pipe, DataFrames, Random
julia> df = load_sample_data()
53352×12 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895
3 │ 2010 1 16 1 4 BA 0 0 35.0 -108.5 -0.409808 -0.803663
4 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369
5 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
53348 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345
53349 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522
53350 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257
53351 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971
53352 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416
53342 rows omitted
julia> total_presence_df=@pipe df|>
groupby(_,[:Species,:Sampling_date_order])|>
combine(_,:Presence=>sum=>:Total_Presence) |>
filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
Row │ Species Sampling_date_order Total_Presence
│ String3 Int64 Int64
─────┼──────────────────────────────────────────────
1 │ BA 41 2
2 │ BA 50 2
3 │ BA 51 8
4 │ BA 52 19
5 │ BA 53 18
⋮ │ ⋮ ⋮ ⋮
787 │ SH 56 3
788 │ SH 60 4
789 │ SH 70 3
790 │ SH 73 5
791 │ SH 117 4
781 rows omitted
julia> non_empty_site_df = @pipe df|>
innerjoin(_, total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true)|>
groupby(_, [:Sampling_date_order, :plot]) |>
combine(_, :Presence=>sum=>:Total_N) |>
filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
Row │ Sampling_date_order plot Total_N
│ Int64 Int64 Int64
──────┼─────────────────────────────────────
1 │ 1 1 1
2 │ 1 2 1
3 │ 1 6 1
4 │ 1 8 1
5 │ 1 9 1
⋮ │ ⋮ ⋮ ⋮
2541 │ 117 20 6
2542 │ 117 21 4
2543 │ 117 22 4
2544 │ 117 23 5
2545 │ 117 24 5
2535 rows omitted
julia> filtered_df = @pipe df|>
innerjoin(_, non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
Row │ Year Month Day Sampling_date_order plot Species Abundance Presence Latitude Longitude normalized_temperature normalized_precipitation Total_N
│ Int64 Int64 Int64 Int64 Int64 String3 Int64 Int64 Float64 Float64 Float64 Float64 Int64
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 2010 1 16 1 1 BA 0 0 35.0 -110.0 0.829467 -1.4024 1
2 │ 2010 1 16 1 2 BA 0 0 35.0 -109.5 -1.12294 -0.0519895 1
3 │ 2010 1 16 1 8 BA 0 0 35.5 -109.5 -1.35913 -0.646369 1
4 │ 2010 1 16 1 9 BA 0 0 35.5 -109.0 0.0822 1.09485 1
5 │ 2010 1 16 1 11 BA 0 0 35.5 -108.0 1.24515 1.62621 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
48351 │ 2023 3 21 117 9 SH 0 0 35.5 -109.0 -0.571565 -0.836345 5
48352 │ 2023 3 21 117 10 SH 0 0 35.5 -108.5 -2.33729 -0.398522 3
48353 │ 2023 3 21 117 12 SH 1 1 35.5 -107.5 0.547169 1.03257 6
48354 │ 2023 3 21 117 16 SH 0 0 36.0 -108.5 -0.815015 0.95971 4
48355 │ 2023 3 21 117 23 SH 0 0 36.5 -108.0 0.48949 -1.59416 5
48345 rows omitted
julia> clustering_result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
5 => 285×7 DataFrame…
56 => 437×7 DataFrame…
35 => 437×7 DataFrame…
55 => 456×7 DataFrame…
110 => 456×7 DataFrame…
114 => 418×7 DataFrame…
60 => 456×7 DataFrame…
30 => 380×7 DataFrame…
32 => 418×7 DataFrame…
6 => 342×7 DataFrame…
67 => 437×7 DataFrame…
45 => 437×7 DataFrame…
117 => 456×7 DataFrame…
73 => 437×7 DataFrame…
⋮ => ⋮
julia> clustering_result[3]
285×7 DataFrame
Row │ Time Latitude Longitude Site Species Presence Group
│ Int64 Float64 Float64 Int64 String3 Int64 Int64?
─────┼──────────────────────────────────────────────────────────────
1 │ 3 35.0 -110.0 1 BA 0 1
2 │ 3 35.0 -109.5 2 BA 0 1
3 │ 3 35.0 -108.5 4 BA 0 2
4 │ 3 35.5 -109.5 8 BA 0 1
5 │ 3 35.5 -108.0 11 BA 0 2
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
281 │ 3 36.0 -107.5 18 SH 0 2
282 │ 3 36.5 -109.5 20 SH 0 3
283 │ 3 36.5 -109.0 21 SH 0 3
284 │ 3 35.5 -110.0 7 SH 0 1
285 │ 3 36.5 -108.0 23 SH 0 2
275 rows omitted
julia> group_df = @pipe filtered_df |>
filter(row -> row[:Sampling_date_order] == 3, _) |>
select(_, [:plot, :Species, :Presence]) |>
innerjoin(_, clustering_result[3], on = [:plot => :Site, :Species], makeunique = true)|>
select(_, [:plot, :Species, :Presence, :Group]) |>
unstack(_, :Species, :Presence, fill=0)
15×21 DataFrame
Row │ plot Group BA DM DO DS NA OL OT PB PE PF PH PL PM PP RF RM RO SF SH
│ Int64 Int64? Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 │ 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
3 │ 4 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4 │ 8 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 │ 11 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
11 │ 18 2 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
12 │ 20 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
13 │ 21 3 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
14 │ 7 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
15 │ 23 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
5 rows omitted
julia> comm= @pipe group_df |>
select(_, Not([:plot,:Group])) |>
Matrix(_)
15×19 Matrix{Int64}:
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
julia> Random.seed!(1234)
julia> DNCI_result = DNCI_multigroup(comm, group_df.Group, 1000; Nperm_count = false)
3×5 DataFrame
Row │ group1 group2 DNCI CI_DNCI S_DNCI
│ Int64 Int64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────
1 │ 1 2 -0.421799 2.74032 1.37016
2 │ 1 3 -1.33813 1.95422 0.977109
3 │ 2 3 -1.39344 4.03394 2.01697
References
- Clarke, K. R. Non-parametric multivariate analyses of changes in community structure. Australian Journal of Ecology 18, 117-143 (1993). https://doi.org/10.1111/j.1442-9993.1993.tb00438.x
- Gibert, C. & Escarguel, G. PER-SIMPER—A new tool for inferring community assembly processes from taxon occurrences. Global Ecology and Biogeography 28, 374-385 (2019). https://doi.org/10.1111/geb.12859
- Vilmi, A. et al. Dispersal–niche continuum index: a new quantitative metric for assessing the relative importance of dispersal versus niche processes in community assembly. Ecography 44, 370-379 (2021). https://doi.org/10.1111/ecog.05356