Data Preparation
To train the deepSSF model, we need to crop out local layers for each of our covariates at every observed step in the trajectory. These local layers must be centred on the current step (as the movement kernel is centred on the centre cell), and the next step becomes our ‘target’.
An example for a single step may look something like the image below, where we have a local covariate that is 101 x 101 cells (an odd number so we have a central cell) that are 25 m x 25 m each, resulting in a map that is 2525 m x 2525 m. This is a distance of 1262.5 m to the nearest edge from the central cell, which covers about 97% of the observed step lengths.
What we are trying to predict, or the ‘target’, is the location of the next step, which might look something like this:
Therefore, for each step we need the cropped out ‘local’ layers, which are 101 x 101 cells with the current location in the centre, and the target, which is the location of the next step. Once we have those for every step we are ready to start training the deep learning model. We also want the temporal covariates, but these we just save in a dataframe, which we turn into grids for training during the deepSSF_train scripts.
We save the local spatial layers as one stack per covariate, with a layer for each step. So if we had layers for NDVI and slope that we wanted to use as covariates, and there were 1,000 steps in the trajectory, we would save a raster stack object for NDVI with 1,000 layers, one for slope with 1,000 layers, and then another stack with 1,000 layers for the target (which has values of 0 everywhere except at the location of the next step, which is 1).
When we import these raster stacks into Python we turn them into NumPy arrays and then turn them into PyTorch tensors (which are required for training the model using PyTorch), and stack the covariates together along a different dimension that represents each of the covariates, ready for training.
Another option is to save every local layers for each covariate separately, and have the filenames stored in a dataframe, with a column for each of the covariates. This setup is a more common approach for importing image data for training deep learning models, and we may opt for this in future applications of the approach, particularly when there is more data as the raster objects can be quite large.
Scripts
There is a script for preparing the data using ‘derived’ covariates (NDVI, canopy cover, herbaceous vegetation and slope), and a script for preparing data using Sentinel-2 imagery, which has 12 Sentinel-2 bands, as well as slope. We could have cropped out the local layers for all of the covariates (derived + Sentinel-2) in a single script, although each one takes a while to run, so we thought it best to keep them separate in case all layers aren’t needed (it should be straightforward to add in your own layers).
Both of the data preparation scripts are written in R.