--- title: "Getting started" output: rmarkdown::html_vignette: toc: true fig_caption: true fontsize: 11pt documentclass: article bibliography: references.bib csl: reference-style.csl vignette: > %\VignetteIndexEntry{Getting started} %\VignetteEngine{knitr::rmarkdown_notangle} --- ```{r, include = FALSE} # define dummy variables so that vignette passes package checks tas_features <- raster::raster(matrix(1)) ``` ```{r, include = FALSE} # define variables for vignette figures and code execution h <- 3.5 w <- 3.5 is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_", "_R_CHECK_LICENSE_") %in% names(Sys.getenv())) || !identical(Sys.getenv("MY_UNIVERSE"), "") || any(c("CI", "GITHUB_ACTIONS", "GITHUB_SHA") %in% names(Sys.getenv())) knitr::opts_chunk$set( fig.align = "center", eval = !is_check, purl = !is_check, R.options = list(width = 80) ) ``` ```{r, include = FALSE} # set up print method print <- function(x, ...) { if (!interactive() && inherits(x, "ConservationProblem")) { prioritizr::knit_print.ConservationProblem(x) } else if (!interactive() && inherits(x, "OptimizationProblem")) { prioritizr::knit_print.OptimizationProblem(x) } else { base::print(x) } } ``` ## Introduction The aim of this tutorial is to provide a short introduction to the _prioritizr R_ package. It is also intended to help conservation planners familiar the _Marxan_ decision support tool [@r3] start using the package for their work. ## Data Let's load the packages and data used in this tutorial. Since this tutorial uses data from the _prioritizrdata R_ package, please ensure that it is installed. The data used in this tutorial were obtained from the _Introduction to Marxan_ course and the Australian Government's National Vegetation Information System. ```{r, message = FALSE} # load packages library(prioritizrdata) library(prioritizr) library(sf) library(terra) library(vegan) library(cluster) # set seed for reproducibility set.seed(500) # load planning unit data tas_pu <- get_tas_pu() # load feature data tas_features <- get_tas_features() ``` Let's have a look at the planning unit data. The `tas_pu` object contains planning units represented as spatial polygons (i.e., a `sf::st_sf()` object). This object has three columns that denote the following information for each planning unit: a unique identifier (`id`), unimproved land value (`cost`), and current conservation status (`locked_in`). Planning units that have at least half of their area overlapping with existing protected areas are denoted with a locked in `TRUE` value, otherwise they are denoted with a value of `FALSE`. We will also set the costs for existing protected areas to zero, so that existing protected areas aren't included in the the cost of the prioritization. ```{r, fig.width = w, fig.height = h} # print planning unit data print(tas_pu) # set costs for existing protected areas to zero tas_pu$cost <- tas_pu$cost * !tas_pu$locked_in # plot map of planning unit costs plot(st_as_sf(tas_pu[, "cost"]), main = "Planning unit costs") # plot map of planning unit coverage by protected areas plot(st_as_sf(tas_pu[, "locked_in"]), main = "Protected area coverage") ``` Now, let's look at the conservation feature data. The `tas_features` object describes the spatial distribution of the features. Specifically, the feature data are a multi-layer raster (i.e., a `terra::rast()` object). Each layer corresponds to a different vegetation community. Within each layer, cells values denote the presence (using value of 1) or absence (using value of 0) of the vegetation community across the study area. ```{r, fig.width = 4.5, fig.height = 4.5} # print planning unit data print(tas_features) # plot map of the first four vegetation classes plot(tas_features[[1:4]]) ``` ## Problem formulation Now we will formulate a conservation planing problem. To achieve this, we first specify which objects contain the planning unit and feature data (using the `problem()` function). Next, we specify that we want to use the minimum set objective function (using the `add_min_set_objective()` function). This objective function indicates that we wish to minimize the total cost of planning units selected by the prioritization. We then specify boundary penalties to reduce spatial fragmentation in the resulting prioritization (using the `add_boundary_penalties()` function; see the [_Calibrating trade-offs_ vignette](calibrating_trade-offs_tutorial.html) for details on calibrating the penalty value). We also specify representation targets to ensure the resulting prioritization provides adequate coverage of each vegetation community (using the `add_relative_targets()` function). Specifically, we specify targets to ensure at least 17% of the spatial extent of each vegetation community (based on the [Aichi Target 11](https://www.cbd.int/sp/targets/)). Additionally, we set constraints to ensure that planning units predominately covered by existing protected areas are selected by the prioritization (using the `add_locked_in_constraints()` function). Finally, we specify that the prioritization should either select -- or not select -- planning units for prioritization (using the `add_binary_decisions()` function). ```{r, fig.width = w, fig.height = h} # build problem p1 <- problem(tas_pu, tas_features, cost_column = "cost") %>% add_min_set_objective() %>% add_boundary_penalties(penalty = 0.005) %>% add_relative_targets(0.17) %>% add_locked_in_constraints("locked_in") %>% add_binary_decisions() # print problem print(p1) ``` ## Prioritization We can now solve the problem formulation (`p1`) to generate a prioritization (using the `solve()` function). The _prioritizr_ R package supports a range of different exact algorithm solvers, including _Gurobi_, _IBM CPLEX_, _CBC_, _HiGHS_, _Rsymphony_, and _lpsymphony_. Although there are benefits and limitations associated with each of these different solvers, they should return similar results. Note that you will need at least one solver installed on your system to generate prioritizations. Since we did not specify a solver when building the problem, the _prioritizr R_ package will automatically select the best available solver installed. We recommend using the _Gurobi_ solver if possible, and have used it for this tutorial (see the _Gurobi Installation Guide_ vignette for installation instructions). After solving the problem, the prioritization will be stored in the `solution_1` column of the `s1` object. ```{r, fig.width = w, fig.height = h} # solve problem s1 <- solve(p1) # plot map of prioritization plot( st_as_sf(s1[, "solution_1"]), main = "Prioritization", pal = c("grey90", "darkgreen") ) ``` ## Feature representation Let's examine how well the vegetation communities are represented by existing protected areas and the prioritization. ```{r, fig.width = 7} # create column with existing protected areas tas_pu$pa <- round(tas_pu$locked_in) # calculate feature representation statistics based on existing protected areas tc_pa <- eval_target_coverage_summary(p1, tas_pu[, "pa"]) print(tc_pa) # calculate feature representation statistics based on the prioritization tc_s1 <- eval_target_coverage_summary(p1, s1[, "solution_1"]) print(tc_s1) # explore representation by existing protected areas ## calculate number of features adequately represented by existing protected ## areas sum(tc_pa$met) ## summarize representation (values show percent coverage) summary(tc_pa$relative_held * 100) ## visualize representation (values show percent coverage) hist(tc_pa$relative_held * 100, main = "Feature representation by existing protected areas", xlim = c(0, 100), xlab = "Percent coverage of features (%)") # explore representation by prioritization ## summarize representation (values show percent coverage) summary(tc_s1$relative_held * 100) ## calculate number of features adequately represented by the prioritization sum(tc_s1$met) ## visualize representation (values show percent coverage) hist( tc_s1$relative_held * 100, main = "Feature representation by prioritization", xlim = c(0, 100), xlab = "Percent coverage of features (%)" ) ``` We can see that representation of the vegetation communities by existing protected areas is remarkably poor. For example, many of the vegetation communities have nearly zero coverage by existing protected areas. In other words, are almost entirely absent from existing protected areas. We can also see that all vegetation communities have at least 17% coverage by the prioritization -- meaning that it meets the representation targets for all of the features. ## Evaluating importance After generating the prioritization, we can examine the relative importance of planning units selected by the prioritization. This can be useful to identify critically important planning units for conservation -- in other words, places that contain biodiversity features which cannot be represented anywhere else -- and schedule implementation of the prioritization. To achieve this, we will use an incremental rank approach [@r66]. Briefly, this approach involves generating incremental prioritizations with increasing budgets, wherein planning units selected in a previous increment are locked in to the following solution. Additionally, locked out constraints are used to ensure that only planning units selected in the original solution are available for selection. If you're interested, other approaches for examining importance are also available (see [`?importance`](https://prioritizr.net/reference/importance.html)). ```{r, message = FALSE, results = "hide"} # calculate relative importance imp_s1 <- eval_rank_importance(p1, s1["solution_1"], n = 10) print(imp_s1) # manually set locked in planning units to -1 to help with visualization, # this way we can easily see the importance scores for the priority areas imp_s1$rs[tas_pu$locked_in] <- -1 ``` ```{r, fig.width = w, fig.height = h} # plot map of importance scores plot(st_as_sf(imp_s1[, "rs"]), main = "Overall importance") ``` ## Portfolios Conservation planning exercises often involve generating multiple different prioritizations. This can help decision makers consider different options, and provide starting points for building consensus among stakeholders. To generate a range of different prioritizations given the same problem formulation, we can use portfolio functions. Here we will use the gap portfolio to generate 1000 solutions that are within 20% of optimality. Please note that you will need to have the *Gurobi* solver installed to use this specific portfolio. If you don't have access to *Gurobi*, you could try using the shuffle portfolio instead (using the `add_shuffle_portfolio()` function). ```{r} # create new problem with a portfolio added to it p2 <- p1 %>% add_gap_portfolio(number_solutions = 1000, pool_gap = 0.2) # print problem print(p2) # generate prioritizations prt <- solve(p2) print(prt) ``` After generating all these prioritizations, we now want some way to visualize them. Because it would be onerous to look at each and every prioritization individually, we will use statistical analyses to help us. We can visualize the differences between these different prioritizations -- based on which planning units they selected -- using a hierarchical cluster analysis [@r35]. ```{r, fig.height = 4.5, fig.width = 7, fig.show = "hold"} # extract solutions prt_results <- sf::st_drop_geometry(prt) prt_results <- prt_results[, startsWith(names(prt_results), "solution_")] # calculate pair-wise distances between different prioritizations for analysis prt_dists <- vegan::vegdist(t(prt_results), method = "jaccard", binary = TRUE) # run cluster analysis prt_clust <- hclust(as.dist(prt_dists), method = "average") # visualize clusters opar <- par() par(oma = c(0, 0, 0, 0), mar= c(0, 4.1, 1.5, 2.1)) plot( prt_clust, labels = FALSE, sub = NA, xlab = "", main = "Different prioritizations in portfolio" ) suppressWarnings(par(opar)) ``` We can see that there are approximately six main groups of prioritizations in the portfolio. To explore these different groups, let's conduct another cluster analysis (i.e., a _k_-medoids analysis) to extract the most representative prioritization from each of these groups. In other words, we will run another statistical analysis to find the most central prioritization within each group. ```{r, fig.width = 7, fig.height = 5} # run k-medoids analysis prt_med <- pam(prt_dists, k = 6) # extract names of prioritizations that are most central for each group. prt_med_names <- prt_med$medoids print(prt_med_names) # create a copy of prt and set values for locked in planning units to -1 # so we can easily visualize differences between prioritizations prt2 <- prt[, prt_med_names] prt2[which(tas_pu$locked_in > 0.5), prt_med_names] <- -1 # plot a map showing main different prioritizations # dark grey: locked in planning units # grey: planning units not selected # green: selected planning units plot(st_as_sf(prt2), pal = c("grey60", "grey90", "darkgreen")) ``` ## _Marxan_ compatibility The _prioritizr R_ package provides functionality to help _Marxan_ users generate prioritizations. Specifically, it can import conservation planning data prepared for _Marxan_, and can generate prioritizations using a similar problem formulation as _Marxan_ [based on @r1]. Indeed, the problem formulation presented earlier in this vignette is very similar to that used by _Marxan_. The key difference is that the problem formulation we specified earlier uses "hard constraints" for feature representation, and _Marxan_ uses "soft constraints" for feature representation. This means that prioritization we generated earlier was mathematically guaranteed to reach the targets for all features. However, if we used _Marxan_ to generate the prioritization, then we could have produced a prioritization that would fail to reach targets (depending the _Species Penalty Factors_ used to generate the prioritization). In addition to these differences in terms problem formulation, the _prioritizr R_ package uses exact algorithms -- instead of the simulated annealing algorithm -- which ensures that we obtain prioritizations that are near optimal. Here we will show the _prioritizr R_ package can import _Marxan_ data and generate a prioritization. To begin with, let's import a conservation planning data prepared for _Marxan_. ```{r} # import data ## planning unit data pu_path <- system.file("extdata/marxan/input/pu.dat", package = "prioritizr") pu_data <- read.csv(pu_path, header = TRUE, stringsAsFactors = FALSE) print(head(pu_data)) ## feature data spec_path <- system.file( "extdata/marxan/input/spec.dat", package = "prioritizr" ) spec_data <- read.csv(spec_path, header = TRUE, stringsAsFactors = FALSE) print(head(spec_data)) ## amount of each feature within each planning unit data puvspr_path <- system.file( "extdata/marxan/input/puvspr.dat", package = "prioritizr" ) puvspr_data <- read.csv(puvspr_path, header = TRUE, stringsAsFactors = FALSE) print(head(puvspr_data)) ## boundary data bound_path <- system.file( "extdata/marxan/input/bound.dat", package = "prioritizr" ) bound_data <- read.table(bound_path, header = TRUE, stringsAsFactors = FALSE) print(head(bound_data)) ``` After importing the data, we can now generate a prioritization based on the _Marxan_ problem formulation (using the `marxan_problem()` function). **Please note that this function does not generate prioritizations using _Marxan_.** Instead, it uses the data to create an optimization problem formulation similar to _Marxan_ -- using hard constraints instead of soft constraints -- and uses an exact algorithm solver to generate a prioritization. ```{r} # create problem p3 <- marxan_problem( pu_data, spec_data, puvspr_data, bound_data, blm = 0.0005 ) # print problem print(p3) # solve problem s3 <- solve(p3) # print first six rows of solution object print(head(s3)) ``` ## Conclusion This tutorial shows how the _prioritizr R_ package can be used to build a conservation problem, generate a prioritization, and evaluate it. Although we explored just a few functions, the package provides many different functions so that you can build and custom-tailor conservation planning problems to suit your needs. To learn more about the package, please see the package vignettes for [an overview of the package](package_overview.html), [instructions for installing the _Gurobi_ optimization suite](gurobi_installation_guide.html), [benchmarks comparing the performance of different solvers](solver_benchmarks.html), and [a record of publications that have cited the package](publication_record.html). In addition to this tutorial, the package also provides tutorials on [incorporating connectivity into prioritizations](connectivity_tutorial.html), [calibrating trade-offs between different criteria](calibrating_trade-offs_tutorial.html) (e.g., total cost and spatial fragmentation), and [creating prioritizations that have multiple management zones or management actions](management_zones_tutorial.html). ## References