Dispersal-Niche Continuum Index (DNCI) Functions

The Dispersal-Niche Continuum Index (DNCI) functions in MetaCommunityMetrics quantify the balance between dispersal and niche processes within a metacommunity, providing insight into community structure and the relative influence of these two key ecological drivers. The function DNCI_multigroup in this package is adapted from the R package DNCImper (Assembly process identification based on SIMPER analysis). The approach builds on SIMPER (Clarke, 1993), its permutation-based extension PerSIMPER (Gibert & Escarguel, 2019), and the DNCI (Vilmi et al., 2021), and offers tools for identifying the processes underlying species assembly in metacommunities.

Background

The DNCI functions are built around the PerSIMPER and DNCI analyses. PerSIMPER is based on the Similarity Percentage (SIMPER) analysis developed by Clarke (1993), which assesses the contribution of individual taxa to the overall average dissimilarity (OAD) between groups of assemblages. PerSIMPER extends SIMPER by comparing empirical SIMPER plots with randomized plots generated through matrix permutation, which helps identify whether niche processes, dispersal processes, or both are driving community assembly.
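
To make the SIMPER step concrete, the sketch below decomposes the average Bray-Curtis dissimilarity between two groups of presence-absence samples into per-species contributions, following the classic SIMPER idea (Clarke, 1993). It is a plain-Julia illustration for intuition only, not the code used by MetaCommunityMetrics or DNCImper; the function name simper_contributions is hypothetical.

```julia
# Hypothetical helper: per-species SIMPER contributions between two groups.
# A, B: presence-absence matrices (rows = sites, columns = species, same species order).
# Returns (contrib, oad): the average contribution of each species to the
# between-group Bray-Curtis dissimilarity, and the overall average dissimilarity (OAD).
function simper_contributions(A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    size(A, 2) == size(B, 2) || throw(ArgumentError("A and B must have the same species columns"))
    contrib = zeros(Float64, size(A, 2))
    npairs = 0
    for i in axes(A, 1), k in axes(B, 1)
        a = view(A, i, :)
        b = view(B, k, :)
        denom = sum(a) + sum(b)
        denom == 0 && continue  # two empty sites: Bray-Curtis is undefined, skip the pair
        # Bray-Curtis for this pair is sum(abs.(a .- b)) / denom,
        # so species j contributes abs(a[j] - b[j]) / denom to it.
        contrib .+= abs.(a .- b) ./ denom
        npairs += 1
    end
    npairs > 0 || throw(ArgumentError("no pair of sites with any species present"))
    contrib ./= npairs
    oad = sum(contrib)  # the contributions add up to the overall average dissimilarity
    return contrib, oad
end

A = [1 1 0 0;
     1 0 1 0]
B = [0 1 1 1;
     0 0 1 1]
contrib, oad = simper_contributions(A, B)
share = contrib ./ oad  # fraction of the OAD explained by each species
```

Species with the largest share are the ones SIMPER would list first; PerSIMPER then asks whether the shape of this ranked profile differs from profiles obtained on permuted matrices.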

The DNCI (Dispersal-Niche Continuum Index) further extends this approach by transforming the qualitative results of PerSIMPER into a quantitative index, providing a straightforward measure of the influence of niche and dispersal processes on community structure.
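
A generic way to turn "empirical versus permuted" into a number is a standardized effect size, SES = (observed - mean(null)) / sd(null). The sketch below computes the SES of the average between-group Bray-Curtis dissimilarity under one simple null model: each species column is shuffled across sites, which keeps the number of sites occupied by each species fixed. This null model and the helper names are assumptions chosen for illustration; they are not the specific permutation schemes or formulas used by PerSIMPER or DNCI_multigroup.

```julia
using Random, Statistics

# Average Bray-Curtis dissimilarity between every row of A and every row of B.
function mean_between_bc(A::AbstractMatrix, B::AbstractMatrix)
    total = 0.0
    n = 0
    for i in axes(A, 1), k in axes(B, 1)
        denom = sum(view(A, i, :)) + sum(view(B, k, :))
        denom == 0 && continue  # skip pairs of empty sites
        total += sum(abs.(view(A, i, :) .- view(B, k, :))) / denom
        n += 1
    end
    return total / n
end

# Hypothetical helper: SES of the between-group dissimilarity for two groups.
# comm: sites x species presence-absence matrix; groups: one label per row.
function ses_between_bc(comm::AbstractMatrix, groups::AbstractVector; nperm::Int = 999,
                        rng::AbstractRNG = Random.default_rng())
    labels = unique(groups)
    length(labels) == 2 || throw(ArgumentError("exactly two groups expected"))
    in1 = groups .== labels[1]
    in2 = groups .== labels[2]
    observed = mean_between_bc(comm[in1, :], comm[in2, :])
    null = Vector{Float64}(undef, nperm)
    perm = copy(comm)
    for p in 1:nperm
        # Assumed null model: shuffle each species column independently.
        for j in axes(comm, 2)
            perm[:, j] = shuffle(rng, comm[:, j])
        end
        null[p] = mean_between_bc(perm[in1, :], perm[in2, :])
    end
    return (observed - mean(null)) / std(null)
end

rng = MersenneTwister(42)
comm = [1 1 0 0; 1 1 0 0; 1 0 0 0; 0 0 1 1; 0 0 1 1; 0 1 1 1]
groups = [1, 1, 1, 2, 2, 2]
ses = ses_between_bc(comm, groups; nperm = 199, rng = rng)
```

A large positive SES here means the two groups are more distinct than the null model predicts; the DNCI condenses comparisons of this general kind into a single signed value.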

Functionality Overview

The DNCI functions in MetaCommunityMetrics let you analyze the processes driving species assembly within your dataset. By comparing empirical data with randomized permutations, you can determine the extent to which niche and dispersal processes shape the structure of a metacommunity. Before calculating the DNCI, sites must be grouped into clusters, because the DNCI compares community composition across spatial groups. This package provides a function to perform the necessary clustering, which is not available in the original R package. When the DNCI value is significantly below zero, dispersal processes are likely the dominant drivers of community composition. In contrast, a DNCI value significantly above zero suggests that niche processes play a primary role in shaping community composition. If the DNCI value is not significantly different from zero, dispersal and niche processes are inferred to contribute equally to variations in community composition.
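
The decision rule above can be written as a small helper. This sketch assumes that a DNCI value is "significantly" different from zero when the interval DNCI ± CI_DNCI excludes zero, i.e. it treats CI_DNCI as the half-width of the interval reported by DNCI_multigroup. The function name interpret_dnci is hypothetical and not part of MetaCommunityMetrics.

```julia
# Hypothetical helper: classify one DNCI estimate.
# Assumes ci is the half-width of the interval (dnci - ci, dnci + ci).
function interpret_dnci(dnci::Real, ci::Real)
    ci >= 0 || throw(ArgumentError("ci must be non-negative"))
    if dnci + ci < 0
        return "dispersal-dominated"            # whole interval below zero
    elseif dnci - ci > 0
        return "niche-dominated"                # whole interval above zero
    else
        return "not distinguishable from zero"  # interval overlaps zero
    end
end

interpret_dnci(-3.2, 1.5)  # "dispersal-dominated"
interpret_dnci(0.4, 1.1)   # "not distinguishable from zero"
```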

The Functions

MetaCommunityMetrics.create_clusters (Function)
create_clusters(time::AbstractVector, latitude::Vector{Float64}, longitude::Vector{Float64}, site::AbstractVector, species::AbstractVector, presence::AbstractVector) -> Dict{Int, DataFrame}

This function creates clusters (groupings of sites) for each unique time step in a dataset, which can then be used for calculating the DNCI. Only presence-absence data can be used. Please remove singletons (taxa/species occurring at only one site within a time step) before using this function.
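
The example below does this filtering with DataFrames and Pipe. For readers working with plain vectors, the same preprocessing can be sketched in Base Julia on the long-format inputs that create_clusters accepts: drop species-time combinations present at no more than one site, then drop site-time combinations left with no presences. The helper name drop_singletons is hypothetical.

```julia
# Hypothetical helper: remove singletons, then empty sites, from long-format data.
# Observation i is (times[i], site[i], species[i], presence[i]).
# Returns the indices of the observations to keep.
function drop_singletons(times::AbstractVector, site::AbstractVector,
                         species::AbstractVector, presence::AbstractVector)
    # 1. Number of sites occupied by each species at each time step.
    occ = Dict{Tuple{eltype(times), eltype(species)}, Int}()
    for i in eachindex(times)
        key = (times[i], species[i])
        occ[key] = get(occ, key, 0) + presence[i]
    end
    # Keep species-time combinations present at more than one site.
    keep = [occ[(times[i], species[i])] > 1 for i in eachindex(times)]

    # 2. Richness of each site-time combination after removing singletons.
    richness = Dict{Tuple{eltype(times), eltype(site)}, Int}()
    for i in eachindex(times)
        keep[i] || continue
        key = (times[i], site[i])
        richness[key] = get(richness, key, 0) + presence[i]
    end
    # Drop observations whose site is empty at that time step.
    return [i for i in eachindex(times) if keep[i] && get(richness, (times[i], site[i]), 0) > 0]
end

t_obs    = [1, 1, 1, 1, 1, 1]
site_obs = [1, 2, 3, 1, 2, 3]
sp_obs   = ["A", "A", "A", "B", "B", "B"]
pr_obs   = [1, 1, 0, 1, 0, 0]
kept = drop_singletons(t_obs, site_obs, sp_obs, pr_obs)  # species B is a singleton; site 3 ends up empty
```

The returned indices can then subset each input vector (for example t_obs[kept]) before calling create_clusters.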

Arguments

  • time::AbstractVector: A vector indicating the sampling date or time step of each observation. Can contain strings, integers, or any other type.
  • latitude::Vector{Float64}: A vector indicating the latitude of each site.
  • longitude::Vector{Float64}: A vector indicating the longitude of each site.
  • site::AbstractVector: A vector identifying the site of each observation. At least 10 sites are required for clustering.
  • species::AbstractVector: A vector indicating the species of each observation.
  • presence::AbstractVector: A vector indicating the presence (1) or absence (0) of the species at the site.

Returns

  • Dict{Int, DataFrame}: A dictionary where each key is a unique time point from the input data and the corresponding value is a DataFrame for that time step. Each DataFrame contains the columns Time, Latitude, Longitude, Site, Species, Presence, and Group (the assigned cluster, or missing if the clustering conditions could not be met).

Details

This function performs hierarchical clustering on the geographic coordinates of the sampling sites at each unique time step, assuming that organism dispersal occurs within the study region. It includes checks and adjustments to ensure that the following conditions are met: at least 2 clusters, at least 5 sites per cluster, and variation among groups of no more than 40% in the number of taxa/species and no more than 30% in the number of sites. These conditions are critical for calculating an unbiased DNCI value; if any of them cannot be fulfilled at a time step, the function issues a warning and assigns missing to the Group column for that time step.
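
The acceptance conditions listed above can be expressed as a check on a candidate grouping. This sketch assumes that "variation" means the coefficient of variation (standard deviation divided by the mean) of the per-group counts; the exact measure used internally is not stated here, and the helper name grouping_ok is hypothetical, so treat it as an illustration rather than the package's logic.

```julia
using Statistics

# Coefficient of variation: an assumed measure of "variation" between groups.
cv(x) = std(x) / mean(x)

# Hypothetical check of a candidate grouping at one time step.
# sites_per_group[g]: number of sites in group g
# taxa_per_group[g]:  number of taxa present in group g
function grouping_ok(sites_per_group::AbstractVector{<:Integer},
                     taxa_per_group::AbstractVector{<:Integer};
                     min_groups = 2, min_sites = 5,
                     max_taxa_cv = 0.40, max_sites_cv = 0.30)
    length(sites_per_group) == length(taxa_per_group) ||
        throw(DimensionMismatch("one count per group is required in both vectors"))
    length(sites_per_group) >= min_groups || return false  # at least 2 clusters
    all(>=(min_sites), sites_per_group) || return false    # at least 5 sites per cluster
    cv(taxa_per_group) <= max_taxa_cv || return false      # taxa variation <= 40%
    cv(sites_per_group) <= max_sites_cv || return false    # site variation <= 30%
    return true
end

grouping_ok([6, 7, 6], [10, 12, 9])  # true under these assumptions
grouping_ok([6, 4], [10, 12])        # false: one group has fewer than 5 sites
```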

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted
                                                                                          
julia> total_presence_df = @pipe df|>
                        groupby(_,[:Species,:Sampling_date_order])|>
                        combine(_,:Presence=>sum=>:Total_Presence) |>
                        filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
 Row │ Species  Sampling_date_order  Total_Presence 
     │ String3  Int64                Int64          
─────┼──────────────────────────────────────────────
   1 │ BA                        41               2
   2 │ BA                        50               2
   3 │ BA                        51               8
   4 │ BA                        52              19
   5 │ BA                        53              18
  ⋮  │    ⋮              ⋮                 ⋮
 787 │ SH                        56               3
 788 │ SH                        60               4
 789 │ SH                        70               3
 790 │ SH                        73               5
 791 │ SH                       117               4
                                    781 rows omitted

julia> non_empty_site_df = @pipe df|>
                    innerjoin(_,  total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true)|>
                    groupby(_, [:Sampling_date_order, :plot]) |>
                    combine(_, :Presence=>sum=>:Total_N) |>
                    filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
  Row │ Sampling_date_order  plot   Total_N 
      │ Int64                Int64  Int64   
──────┼─────────────────────────────────────
    1 │                   1      1        1
    2 │                   1      2        1
    3 │                   1      6        1
    4 │                   1      8        1
    5 │                   1      9        1
  ⋮   │          ⋮             ⋮       ⋮
 2541 │                 117     20        6
 2542 │                 117     21        4
 2543 │                 117     22        4
 2544 │                 117     23        5
 2545 │                 117     24        5
                           2535 rows omitted

julia> filtered_df = @pipe df|>
                  innerjoin(_,  non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation  Total_N 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                   Int64   
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024             1
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895          1
     3 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369           1
     4 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485            1
     5 │  2010      1     16                    1     11  BA               0         0      35.5     -108.0                1.24515                1.62621            2
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮                 ⋮
 48351 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345           5
 48352 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522           3
 48353 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257            6
 48354 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971            4
 48355 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416            5
                                                                                                                                                     48345 rows omitted

julia> result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
  5   => 285×7 DataFrame…
  56  => 437×7 DataFrame…
  35  => 437×7 DataFrame…
  55  => 456×7 DataFrame…
  110 => 456×7 DataFrame…
  114 => 418×7 DataFrame…
  60  => 456×7 DataFrame…
  30  => 380×7 DataFrame…
  32  => 418×7 DataFrame…
  6   => 342×7 DataFrame…
  67  => 437×7 DataFrame…
  45  => 437×7 DataFrame…
  117 => 456×7 DataFrame…
  73  => 437×7 DataFrame…
  ⋮   => ⋮
julia> result[1]
266×7 DataFrame
 Row │ Time   Latitude  Longitude  Site   Species  Presence  Group   
     │ Int64  Float64   Float64    Int64  String3  Int64     Missing 
─────┼───────────────────────────────────────────────────────────────
   1 │     1      35.0     -110.0      1  BA              0  missing 
   2 │     1      35.0     -109.5      2  BA              0  missing 
   3 │     1      35.5     -109.5      8  BA              0  missing 
   4 │     1      35.5     -109.0      9  BA              0  missing 
   5 │     1      35.5     -108.0     11  BA              0  missing 
  ⋮  │   ⋮       ⋮          ⋮        ⋮       ⋮        ⋮         ⋮
 262 │     1      36.0     -110.0     13  SH              0  missing 
 263 │     1      36.0     -109.0     15  SH              0  missing 
 264 │     1      36.5     -109.5     20  SH              0  missing 
 265 │     1      36.5     -109.0     21  SH              0  missing 
 266 │     1      36.5     -108.0     23  SH              0  missing 
                                                     256 rows omitted

julia> result[3]
285×7 DataFrame
 Row │ Time   Latitude  Longitude  Site   Species  Presence  Group  
     │ Int64  Float64   Float64    Int64  String3  Int64     Int64? 
─────┼──────────────────────────────────────────────────────────────
   1 │     3      35.0     -110.0      1  BA              0       1
   2 │     3      35.0     -109.5      2  BA              0       1
   3 │     3      35.0     -108.5      4  BA              0       2
   4 │     3      35.5     -109.5      8  BA              0       1
   5 │     3      35.5     -108.0     11  BA              0       2
  ⋮  │   ⋮       ⋮          ⋮        ⋮       ⋮        ⋮        ⋮
 281 │     3      36.0     -107.5     18  SH              0       2
 282 │     3      36.5     -109.5     20  SH              0       3
 283 │     3      36.5     -109.0     21  SH              0       3
 284 │     3      35.5     -110.0      7  SH              0       1
 285 │     3      36.5     -108.0     23  SH              0       2
                                                    275 rows omitted
MetaCommunityMetrics.plot_clusters (Function)
plot_clusters(latitude::Vector{Float64}, longitude::Vector{Float64}, group::AbstractVector, output_file="clusters.svg") -> String

Visualizes clustering results by generating an SVG image displaying the geographic coordinates and cluster assignments of sampling sites.

Arguments

  • latitude::Vector{Float64}: A vector of latitude coordinates of the sampling sites.
  • longitude::Vector{Float64}: A vector of longitude coordinates of the sampling sites.
  • group::AbstractVector: A vector indicating the group assignments for each data point.
  • output_file::String="clusters.svg": The filename for the output SVG visualization. Default is "clusters.svg".

Returns

  • String: The path to the created SVG file.

Details

  • The function generates a standalone SVG file that can be viewed in any web browser or image viewer.
  • Each cluster is assigned a unique color, and sampling sites are plotted based on their geographic coordinates.
  • The visualization includes a legend identifying each cluster.
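
To show what such a standalone SVG involves, here is a minimal Base-Julia sketch that writes one colored circle per site (longitude on the x-axis, latitude on the y-axis) and a text legend. The function name write_cluster_svg, the palette, and the layout are assumptions for illustration; this is not how plot_clusters is implemented.

```julia
# Hypothetical sketch: write sites as colored circles, one color per group.
function write_cluster_svg(latitude::AbstractVector{<:Real}, longitude::AbstractVector{<:Real},
                           group::AbstractVector, path::AbstractString;
                           width::Int = 400, height::Int = 300, margin::Int = 40)
    length(latitude) == length(longitude) == length(group) ||
        throw(DimensionMismatch("latitude, longitude and group must have equal length"))
    palette = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]
    labels = unique(group)  # `missing` (unassigned sites) is kept as its own label
    color = Dict{Any, String}()
    for (k, g) in enumerate(labels)
        color[g] = ismissing(g) ? "#999999" : palette[mod1(k, length(palette))]
    end
    lonmin, lonmax = extrema(longitude)
    latmin, latmax = extrema(latitude)
    # Map coordinates to pixels; latitude is flipped so that north is at the top.
    sx(lon) = margin + (lon - lonmin) / max(lonmax - lonmin, 1e-9) * (width - 2margin)
    sy(lat) = height - margin - (lat - latmin) / max(latmax - latmin, 1e-9) * (height - 2margin)
    open(path, "w") do io
        println(io, "<svg xmlns='http://www.w3.org/2000/svg' width='$width' height='$height'>")
        for i in eachindex(latitude, longitude, group)
            x = round(sx(longitude[i]); digits = 1)
            y = round(sy(latitude[i]); digits = 1)
            println(io, "  <circle cx='$x' cy='$y' r='5' fill='$(color[group[i]])'/>")
        end
        # Simple text legend in the top-left corner.
        for (k, g) in enumerate(labels)
            println(io, "  <text x='5' y='$(15k)' fill='$(color[g])'>Group $g</text>")
        end
        println(io, "</svg>")
    end
    return path
end

path = write_cluster_svg([35.0, 35.0, 35.5, 36.0], [-110.0, -109.5, -109.5, -108.0],
                         [1, 1, 2, missing], joinpath(tempdir(), "clusters_sketch.svg"))
```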

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted
                                                                                          
julia> total_presence_df=@pipe df|>
                        groupby(_,[:Species,:Sampling_date_order])|>
                        combine(_,:Presence=>sum=>:Total_Presence) |>
                        filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
 Row │ Species  Sampling_date_order  Total_Presence 
     │ String3  Int64                Int64          
─────┼──────────────────────────────────────────────
   1 │ BA                        41               2
   2 │ BA                        50               2
   3 │ BA                        51               8
   4 │ BA                        52              19
   5 │ BA                        53              18
  ⋮  │    ⋮              ⋮                 ⋮
 787 │ SH                        56               3
 788 │ SH                        60               4
 789 │ SH                        70               3
 790 │ SH                        73               5
 791 │ SH                       117               4
                                    781 rows omitted

julia> non_empty_site_df = @pipe df|>
                    innerjoin(_,  total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true)|>
                    groupby(_, [:Sampling_date_order, :plot]) |>
                    combine(_, :Presence=>sum=>:Total_N) |>
                    filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
  Row │ Sampling_date_order  plot   Total_N 
      │ Int64                Int64  Int64   
──────┼─────────────────────────────────────
    1 │                   1      1        1
    2 │                   1      2        1
    3 │                   1      6        1
    4 │                   1      8        1
    5 │                   1      9        1
  ⋮   │          ⋮             ⋮       ⋮
 2541 │                 117     20        6
 2542 │                 117     21        4
 2543 │                 117     22        4
 2544 │                 117     23        5
 2545 │                 117     24        5
                           2535 rows omitted

julia> filtered_df = @pipe df|>
                  innerjoin(_,  non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation  Total_N 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                   Int64   
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024             1
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895          1
     3 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369           1
     4 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485            1
     5 │  2010      1     16                    1     11  BA               0         0      35.5     -108.0                1.24515                1.62621            2
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮                 ⋮
 48351 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345           5
 48352 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522           3
 48353 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257            6
 48354 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971            4
 48355 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416            5
                                                                                                                                                     48345 rows omitted

julia> clustering_result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
  5   => 285×7 DataFrame…
  56  => 437×7 DataFrame…
  35  => 437×7 DataFrame…
  55  => 456×7 DataFrame…
  110 => 456×7 DataFrame…
  114 => 418×7 DataFrame…
  60  => 456×7 DataFrame…
  30  => 380×7 DataFrame…
  32  => 418×7 DataFrame…
  6   => 342×7 DataFrame…
  67  => 437×7 DataFrame…
  45  => 437×7 DataFrame…
  117 => 456×7 DataFrame…
  73  => 437×7 DataFrame…
  ⋮   => ⋮


julia> clustering_result[3]
285×7 DataFrame
 Row │ Time   Latitude  Longitude  Site   Species  Presence  Group  
     │ Int64  Float64   Float64    Int64  String3  Int64     Int64? 
─────┼──────────────────────────────────────────────────────────────
   1 │     3      35.0     -110.0      1  BA              0       1
   2 │     3      35.0     -109.5      2  BA              0       1
   3 │     3      35.0     -108.5      4  BA              0       2
   4 │     3      35.5     -109.5      8  BA              0       1
   5 │     3      35.5     -108.0     11  BA              0       2
  ⋮  │   ⋮       ⋮          ⋮        ⋮       ⋮        ⋮        ⋮
 281 │     3      36.0     -107.5     18  SH              0       2
 282 │     3      36.5     -109.5     20  SH              0       3
 283 │     3      36.5     -109.0     21  SH              0       3
 284 │     3      35.5     -110.0      7  SH              0       1
 285 │     3      36.5     -108.0     23  SH              0       2
                                                    275 rows omitted

julia> plot_clusters(clustering_result[3].Latitude, clustering_result[3].Longitude, clustering_result[3].Group; output_file="clusters.svg")

This plot shows the clustering result for time step 3, based on the geographic coordinates of the sampling sites. [Figure: Cluster Plot]

MetaCommunityMetrics.DNCI_multigroup (Function)
DNCI_multigroup(comm::Matrix, groups::Vector, Nperm::Int=1000; Nperm_count::Bool=true) -> DataFrame

Calculates the dispersal-niche continuum index (DNCI) for multiple groups, a metric proposed by Vilmi et al. (2021). The DNCI quantifies the balance between dispersal and niche processes within a metacommunity, providing insight into community structure and the relative influence of these two key ecological drivers. Please remove singletons (taxa/species occurring at only one site within a time step) before using this function.

Arguments

  • comm::Matrix: A presence-absence data matrix where rows represent observations (e.g., sites or samples) and columns represent species.
  • groups::Vector: A vector indicating the group membership for each row in the comm matrix. You can use the create_clusters function to generate the group membership.
  • Nperm::Int=1000: The number of permutations for significance testing. Default is 1000.
  • Nperm_count::Bool=true: A flag indicating whether the number of permutations is printed. Default is true.
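
DNCI_multigroup expects a site-by-species matrix, whereas create_clusters returns long-format rows (one row per site and species, with Site, Species, Presence, and Group columns). The sketch below pivots such rows for a single time step into a comm matrix and a matching groups vector using Base Julia only. The helper name to_comm_matrix is hypothetical, and it assumes each site carries exactly one group label at that time step.

```julia
# Hypothetical helper: pivot long-format rows of one time step into
# (comm, groups, sites, taxa), with one row of comm per site.
function to_comm_matrix(site::AbstractVector, species::AbstractVector,
                        presence::AbstractVector, group::AbstractVector)
    sites = unique(site)
    taxa = unique(species)
    row = Dict(s => r for (r, s) in enumerate(sites))
    col = Dict(t => c for (c, t) in enumerate(taxa))
    comm = zeros(Int, length(sites), length(taxa))
    site_group = Dict{eltype(site), eltype(group)}()
    for i in eachindex(site, species, presence, group)
        comm[row[site[i]], col[species[i]]] = presence[i]
        # The first label seen for a site is stored; later rows must agree with it.
        g = get!(site_group, site[i], group[i])
        isequal(g, group[i]) || throw(ArgumentError("site $(site[i]) has more than one group label"))
    end
    groups = [site_group[s] for s in sites]
    return comm, groups, sites, taxa
end

site     = [1, 1, 2, 2, 3, 3]
species  = ["BA", "SH", "BA", "SH", "BA", "SH"]
presence = [1, 0, 1, 1, 0, 1]
group    = [1, 1, 1, 1, 2, 2]
comm, groups, sites, taxa = to_comm_matrix(site, species, presence, group)
```

With real data, the Site, Species, Presence, and Group columns of one entry of the create_clusters result (for a time step whose groups are not missing) could be pivoted this way and the outputs passed as DNCI_multigroup(comm, groups), following the signature above.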

Returns

  • DataFrame: A DataFrame containing, for each pair of groups, the DNCI value, its associated confidence interval (CI_DNCI), and its variance (S_DNCI).

Details

  • The function calculates the DNCI for each pair of groups in the input data.
  • When the DNCI value is significantly below zero, dispersal processes are likely the dominant drivers of community composition.
  • In contrast, a DNCI value significantly above zero suggests that niche processes play a primary role in shaping community composition.
  • If the DNCI value is not significantly different from zero, it indicates that dispersal and niche processes contribute equally to spatial variations in community composition at a given time point.
  • Caution: High frequencies of empty sites can bias DNCI values toward zero; DNCI not significantly different from zero in such cases may indicate insufficient ecological variation for reliable process detection rather than genuine equal relative contributions of dispersal and niche processes.
  • This function is a translation/adaptation of a function from the R package DNCImper, licensed under GPL-3.
  • Original package and documentation available at: https://github.com/Corentin-Gibert-Paleontology/DNCImper
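
Several of the points above can be inspected before running DNCI_multigroup: the share of empty sites (rows of comm with no species), any remaining singleton species, and the group pairs for which a DNCI will be reported (k groups give k(k - 1)/2 pairs). The sketch below is a hypothetical pre-check in Base Julia; dnci_precheck is not part of MetaCommunityMetrics, and no threshold for a "high" share of empty sites is implied.

```julia
# Hypothetical pre-check for one time step's presence-absence matrix and grouping.
function dnci_precheck(comm::AbstractMatrix{<:Integer}, groups::AbstractVector)
    size(comm, 1) == length(groups) ||
        throw(DimensionMismatch("one group label per row of comm is required"))
    # Share of sites (rows) with no species present.
    empty_share = count(iszero, sum(comm; dims = 2)) / size(comm, 1)
    # Species (columns) present at exactly one site, i.e. singletons.
    singleton_species = findall(==(1), vec(sum(comm; dims = 1)))
    # Unordered group pairs; k groups give k * (k - 1) / 2 pairs.
    labels = sort(unique(groups))
    group_pairs = [(labels[a], labels[b]) for a in eachindex(labels) for b in eachindex(labels) if a < b]
    return (empty_share = empty_share, singleton_species = singleton_species,
            group_pairs = group_pairs)
end

comm = [1 0 1;
        0 0 0;
        1 1 0;
        0 0 1]
groups = [1, 2, 2, 3]
check = dnci_precheck(comm, groups)  # one empty site, species 2 is a singleton, 3 group pairs
```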

Example

julia> using MetaCommunityMetrics, Pipe, DataFrames, Random

julia> df = load_sample_data()
53352×12 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                  
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895
     3 │  2010      1     16                    1      4  BA               0         0      35.0     -108.5               -0.409808              -0.803663
     4 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369
     5 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮
 53348 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345
 53349 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522
 53350 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257
 53351 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971
 53352 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416
                                                                                                                                            53342 rows omitted
                                                                                          
julia> total_presence_df = @pipe df |>
                        groupby(_, [:Species, :Sampling_date_order]) |>
                        combine(_, :Presence => sum => :Total_Presence) |>
                        filter(row -> row[:Total_Presence] > 1, _)
791×3 DataFrame
 Row │ Species  Sampling_date_order  Total_Presence 
     │ String3  Int64                Int64          
─────┼──────────────────────────────────────────────
   1 │ BA                        41               2
   2 │ BA                        50               2
   3 │ BA                        51               8
   4 │ BA                        52              19
   5 │ BA                        53              18
  ⋮  │    ⋮              ⋮                 ⋮
 787 │ SH                        56               3
 788 │ SH                        60               4
 789 │ SH                        70               3
 790 │ SH                        73               5
 791 │ SH                       117               4
                                    781 rows omitted

julia> non_empty_site_df = @pipe df |>
                    innerjoin(_, total_presence_df, on = [:Species, :Sampling_date_order], makeunique = true) |>
                    groupby(_, [:Sampling_date_order, :plot]) |>
                    combine(_, :Presence => sum => :Total_N) |>
                    filter(row -> row[:Total_N] > 0, _)
2545×3 DataFrame
  Row │ Sampling_date_order  plot   Total_N 
      │ Int64                Int64  Int64   
──────┼─────────────────────────────────────
    1 │                   1      1        1
    2 │                   1      2        1
    3 │                   1      6        1
    4 │                   1      8        1
    5 │                   1      9        1
  ⋮   │          ⋮             ⋮       ⋮
 2541 │                 117     20        6
 2542 │                 117     21        4
 2543 │                 117     22        4
 2544 │                 117     23        5
 2545 │                 117     24        5
                           2535 rows omitted

julia> filtered_df = @pipe df |>
                  innerjoin(_, non_empty_site_df, on = [:plot, :Sampling_date_order], makeunique = true)
48355×13 DataFrame
   Row │ Year   Month  Day    Sampling_date_order  plot   Species  Abundance  Presence  Latitude  Longitude  normalized_temperature  normalized_precipitation  Total_N 
       │ Int64  Int64  Int64  Int64                Int64  String3  Int64      Int64     Float64   Float64    Float64                 Float64                   Int64   
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │  2010      1     16                    1      1  BA               0         0      35.0     -110.0                0.829467              -1.4024             1
     2 │  2010      1     16                    1      2  BA               0         0      35.0     -109.5               -1.12294               -0.0519895          1
     3 │  2010      1     16                    1      8  BA               0         0      35.5     -109.5               -1.35913               -0.646369           1
     4 │  2010      1     16                    1      9  BA               0         0      35.5     -109.0                0.0822                 1.09485            1
     5 │  2010      1     16                    1     11  BA               0         0      35.5     -108.0                1.24515                1.62621            2
   ⋮   │   ⋮      ⋮      ⋮             ⋮             ⋮       ⋮         ⋮         ⋮         ⋮          ⋮                ⋮                        ⋮                 ⋮
 48351 │  2023      3     21                  117      9  SH               0         0      35.5     -109.0               -0.571565              -0.836345           5
 48352 │  2023      3     21                  117     10  SH               0         0      35.5     -108.5               -2.33729               -0.398522           3
 48353 │  2023      3     21                  117     12  SH               1         1      35.5     -107.5                0.547169               1.03257            6
 48354 │  2023      3     21                  117     16  SH               0         0      36.0     -108.5               -0.815015               0.95971            4
 48355 │  2023      3     21                  117     23  SH               0         0      36.5     -108.0                0.48949               -1.59416            5
                                                                                                                                                     48345 rows omitted

julia> clustering_result = create_clusters(filtered_df.Sampling_date_order, filtered_df.Latitude, filtered_df.Longitude, filtered_df.plot, filtered_df.Species, filtered_df.Presence)
Warning: Cluster count fell below 2 at time 1, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 2, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 10, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 14, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 17, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 43, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 87, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 89, which is not permissible for clustering. Groups assigned as missing.
Warning: Cluster count fell below 2 at time 99, which is not permissible for clustering. Groups assigned as missing.
Dict{Int64, DataFrame} with 117 entries:
  5   => 285×7 DataFrame…
  56  => 437×7 DataFrame…
  35  => 437×7 DataFrame…
  55  => 456×7 DataFrame…
  110 => 456×7 DataFrame…
  114 => 418×7 DataFrame…
  60  => 456×7 DataFrame…
  30  => 380×7 DataFrame…
  32  => 418×7 DataFrame…
  6   => 342×7 DataFrame…
  67  => 437×7 DataFrame…
  45  => 437×7 DataFrame…
  117 => 456×7 DataFrame…
  73  => 437×7 DataFrame…
  ⋮   => ⋮


julia> clustering_result[3]
285×7 DataFrame
 Row │ Time   Latitude  Longitude  Site   Species  Presence  Group  
     │ Int64  Float64   Float64    Int64  String3  Int64     Int64? 
─────┼──────────────────────────────────────────────────────────────
   1 │     3      35.0     -110.0      1  BA              0       1
   2 │     3      35.0     -109.5      2  BA              0       1
   3 │     3      35.0     -108.5      4  BA              0       2
   4 │     3      35.5     -109.5      8  BA              0       1
   5 │     3      35.5     -108.0     11  BA              0       2
  ⋮  │   ⋮       ⋮          ⋮        ⋮       ⋮        ⋮        ⋮
 281 │     3      36.0     -107.5     18  SH              0       2
 282 │     3      36.5     -109.5     20  SH              0       3
 283 │     3      36.5     -109.0     21  SH              0       3
 284 │     3      35.5     -110.0      7  SH              0       1
 285 │     3      36.5     -108.0     23  SH              0       2
                                                    275 rows omitted

julia> group_df = @pipe filtered_df |>
                  filter(row -> row[:Sampling_date_order] == 3, _) |>
                  select(_, [:plot, :Species, :Presence]) |>
                  innerjoin(_, clustering_result[3], on = [:plot => :Site, :Species], makeunique = true) |>
                  select(_, [:plot, :Species, :Presence, :Group]) |>
                  unstack(_, :Species, :Presence, fill=0)
15×21 DataFrame
 Row │ plot   Group   BA     DM     DO     DS     NA     OL     OT     PB     PE     PF     PH     PL     PM     PP     RF     RM     RO     SF     SH    
     │ Int64  Int64?  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     1       1      0      1      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   2 │     2       1      0      1      0      0      0      0      0      1      0      0      0      0      0      0      0      0      0      0      0
   3 │     4       2      0      0      0      0      0      0      0      0      0      0      0      0      0      1      0      0      0      0      0
   4 │     8       1      0      1      1      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   5 │    11       2      0      1      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  ⋮  │   ⋮      ⋮       ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮      ⋮
  11 │    18       2      0      0      0      0      0      0      0      1      0      0      0      0      0      1      0      0      0      0      0
  12 │    20       3      0      0      0      0      0      0      0      0      0      0      0      0      0      1      0      0      0      0      0
  13 │    21       3      0      0      0      0      0      0      0      1      0      0      0      0      0      0      0      0      0      0      0
  14 │     7       1      0      0      0      0      0      0      1      0      0      0      0      0      0      1      0      0      0      0      0
  15 │    23       2      0      0      0      0      0      0      1      0      0      0      0      0      0      0      0      0      0      0      0
                                                                                                                                            5 rows omitted

julia> comm = @pipe group_df |>
                  select(_, Not([:plot, :Group])) |>
                  Matrix(_)
15×19 Matrix{Int64}:
 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
 0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0

julia> using Random; Random.seed!(1234);

julia> DNCI_result = DNCI_multigroup(comm, group_df.Group, 1000; Nperm_count = false)
3×5 DataFrame
 Row │ group1  group2  DNCI       CI_DNCI  S_DNCI   
     │ Int64   Int64   Float64    Float64  Float64  
─────┼──────────────────────────────────────────────
   1 │      1       2  -0.421799  2.74032  1.37016
   2 │      1       3  -1.33813   1.95422  0.977109
   3 │      2       3  -1.39344   4.03394  2.01697
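Reading the `DNCI_multigroup` output: each row compares one pair of site groups (`group1`, `group2`). In the values printed above, `CI_DNCI` is exactly twice `S_DNCI`, which is consistent with `CI_DNCI` being the half-width of a confidence interval around `DNCI`. The sketch below assumes that reading, and also assumes that "significantly different from zero" means the interval `DNCI ± CI_DNCI` excludes zero (our understanding of the convention in Vilmi et al. (2021)). The helper `classify_dnci` is hypothetical and is not part of MetaCommunityMetrics; it only applies the interpretation rule described at the start of this section to the printed numbers.

```julia
# Hypothetical helper (not part of MetaCommunityMetrics).
# Assumes `ci` is the half-width of the confidence interval around `dnci`,
# so the estimate is significant only if DNCI ± CI excludes zero.
function classify_dnci(dnci::Real, ci::Real)
    lower, upper = dnci - ci, dnci + ci
    if upper < 0
        return :dispersal   # whole interval below zero: dispersal dominates
    elseif lower > 0
        return :niche       # whole interval above zero: niche processes dominate
    else
        return :balanced    # interval contains zero: no significant difference
    end
end

# Values copied from the DNCI_multigroup output above (group pairs 1-2, 1-3, 2-3).
dnci    = [-0.421799, -1.33813, -1.39344]
ci_dnci = [2.74032, 1.95422, 4.03394]

for (pair, d, ci) in zip(["1-2", "1-3", "2-3"], dnci, ci_dnci)
    println(pair, " => ", classify_dnci(d, ci))
end
```

For the time step used here, all three intervals contain zero, so each pair prints `balanced`: under these assumptions, neither dispersal nor niche processes dominate for any pair of groups. With the result table itself, the same check can be broadcast over its columns, e.g. `classify_dnci.(DNCI_result.DNCI, DNCI_result.CI_DNCI)`.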

References

  1. Clarke, K. R. Non-parametric multivariate analyses of changes in community structure. Australian Journal of Ecology 18, 117-143 (1993). https://doi.org/10.1111/j.1442-9993.1993.tb00438.x
  2. Gibert, C. & Escarguel, G. PER-SIMPER—A new tool for inferring community assembly processes from taxon occurrences. Global Ecology and Biogeography 28, 374-385 (2019). https://doi.org/10.1111/geb.12859
  3. Vilmi, A. et al. Dispersal–niche continuum index: a new quantitative metric for assessing the relative importance of dispersal versus niche processes in community assembly. Ecography 44, 370-379 (2021). https://doi.org/10.1111/ecog.05356