In this script, we will assess how well the deepSSF model predicts the next step in the observed trajectory.
Whilst the realism and emergent properties of simulated trajectories are difficult to assess, we can validate deepSSF models on their predictive performance at the next step, for each of the habitat selection, movement and next-step probability surfaces. Provided the probability surfaces are normalised to sum to one, they can be compared directly to the predictions of typical step selection functions (SSFs) generated from the same local covariates. This approach not only allows for comparison between models, but is also informative as to when in the observed trajectory the model performs well or poorly. That can be assessed across the entire tracking period or for each hour of the day, and can lead to critical evaluation of the covariates used by the models and to model refinement.
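To make the comparison concrete, here is a minimal sketch of how a next-step probability is extracted from normalised surfaces. The arrays `hab_log_density` and `move_log_density`, and the next-step pixel location `(row, col)`, are hypothetical stand-ins for the model outputs and observed locations used in the full workflow below.
Code
import numpy as np

# Hypothetical example surfaces (uniform log-densities) standing in for model output
hab_log_density = np.zeros((101, 101))    # habitat selection log-density
move_log_density = np.zeros((101, 101))   # movement log-density

# Combine and normalise so the next-step surface sums to one
step_log_density = hab_log_density + move_log_density
step_probs = np.exp(step_log_density)
step_probs /= step_probs.sum()

# Probability assigned to the observed next step (row, col in local pixel coordinates)
row, col = 50, 60                          # hypothetical next-step location
print(step_probs[row, col])                # comparable to an SSF's normalised prediction at the same cell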
Import packages
Code
# If using Google Colab, uncomment the following line
# !pip install rasterio

import sys
print(sys.version)  # Print Python version in use

import numpy as np                          # Array operations
import matplotlib.pyplot as plt             # Plotting library
import torch                                # Main PyTorch library
import torch.optim as optim                 # Optimization algorithms
import torch.nn as nn                       # Neural network modules
import os                                   # Operating system utilities
import glob                                 # File path pattern matching
import pandas as pd                         # Data manipulation
import rasterio                             # Geospatial raster data
from datetime import datetime, timedelta    # Date/time utilities
from rasterio.plot import show              # Plot raster data

import deepSSF_model                        # Import the deepSSF model
import deepSSF_utils                        # Import the deepSSF utilities

# Get today's date
today_date = datetime.today().strftime('%Y-%m-%d')
3.12.5 | packaged by Anaconda, Inc. | (main, Sep 12 2024, 18:18:29) [MSC v.1929 64 bit (AMD64)]
If using Google Colab, uncomment the following lines
The file directories will also need to be changed to match the location of the files in your Google Drive.
Code
# from google.colab import drive
# drive.mount('/content/drive')
Import the GPS data
We only use this for selecting a spatial extent for the area we want to predict over.
Code
# Select the id of the data that the model was trained on
buffalo_id = 2005
n_samples = 10297  # 2005 has 10297 samples

# Specify the path to your CSV file
csv_file_path = f'../buffalo_local_data_id/buffalo_{buffalo_id}_data_df_lag_1hr_n{n_samples}.csv'

# Read the CSV file into a DataFrame
buffalo_df = pd.read_csv(csv_file_path)

# Display the first few rows of the DataFrame
print(buffalo_df.head())

# Lag the values in the 'bearing' column by one index
buffalo_df['bearing_tm1'] = buffalo_df['bearing'].shift(1)

# Pad the missing value with a specified value, e.g., 0
buffalo_df['bearing_tm1'] = buffalo_df['bearing_tm1'].fillna(0)

# Display the first few rows of the DataFrame
print(buffalo_df.head())
Instead of importing the stacks of local layers (one for each step), here we import the spatial covariates for the full extent we want to predict over. We use an extent that covers all of the observed locations, which we refer to as the ‘landscape’.
NDVI
We have monthly NDVI layers for 2018 and 2019, which we import as a stack. The files don’t import with a time component, so we will use a function further down that indexes them correctly.
Code
# For monthly NDVI
file_path = '../mapping/cropped rasters/ndvi_monthly.tif'

# Read the raster file
with rasterio.open(file_path) as src:

    # Read all of the raster bands into a single array
    ndvi_landscape = src.read([i for i in range(1, src.count + 1)])

    # Get the metadata of the raster
    ndvi_meta = src.meta
    raster_transform = src.transform

    # Print the metadata to check for a time component
    print("Metadata:", ndvi_meta)

    # Check for specific time-related metadata
    if 'TIFFTAG_DATETIME' in src.tags():
        print("Time component found:", src.tags()['TIFFTAG_DATETIME'])
    else:
        print("No explicit time component found in metadata.")

# The rasters don't contain a time component, so we will use a function later
# to index the layers correctly
Metadata: {'driver': 'GTiff', 'dtype': 'float32', 'nodata': nan, 'width': 2400, 'height': 2280, 'count': 24, 'crs': CRS.from_wkt('LOCAL_CS["GDA94 / Geoscience Australia Lambert",UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]'), 'transform': Affine(25.0, 0.0, 0.0,
0.0, -25.0, -1406000.0)}
No explicit time component found in metadata.
Prepare the NDVI data
There are a few things we need to do to prepare the landscape layers.
First, we need to ensure that there are no NA values in the data. For NDVI we will replace any NA values with -1 (which denotes water), as in our case that is typically why they were set to NA.
Secondly, the model expects the covariates to be on the same scale as the training data. We will therefore scale the NDVI data using the same min and max scaling parameters as the training data. To get these, there are min and max print statements in the deepSSF_train.ipynb script. When we plot the NDVI data below we will see that the values no longer range exactly from 0 to 1, because some values in the landscape layers fall outside the range of the training data.
Code
# Check the coordinate reference system
print("NDVI metadata:")
print(ndvi_meta)
print("\n")

# Have a look at the affine transformation parameters that are used to convert
# pixel coordinates to geographic coordinates and vice versa
print("Affine transformation parameters:")
print(raster_transform)
print("\n")

# Check the shape (layers, rows, columns) of the raster
print("Shape of the raster:")
print(ndvi_landscape.shape)

# Replace NaNs in the original array with -1, which represents water
ndvi_landscape = np.nan_to_num(ndvi_landscape, nan=-1.0)

# Scaling parameters from the stack of local layers (training data)
ndvi_max = 0.8220
ndvi_min = -0.2772

# Convert the numpy array to a PyTorch tensor
ndvi_landscape_tens = torch.from_numpy(ndvi_landscape)

# Normalising the data
ndvi_landscape_norm = (ndvi_landscape_tens - ndvi_min) / (ndvi_max - ndvi_min)

# Show two example layers of the scaled NDVI data
layer_index = 1
plt.imshow(ndvi_landscape_norm[layer_index, :, :].numpy())
plt.colorbar()
plt.title(f'NDVI layer index {layer_index}')
plt.show()

layer_index = 8
plt.imshow(ndvi_landscape_norm[layer_index, :, :].numpy())
plt.colorbar()
plt.title(f'NDVI layer index {layer_index}')
plt.show()
Canopy cover
Code
# Path to the canopy cover raster file
file_path = '../mapping/cropped rasters/canopy_cover.tif'

# Read the raster file
with rasterio.open(file_path) as src:
    # Read the first raster band as a separate variable
    canopy_landscape = src.read(1)
    # Get the metadata of the raster
    canopy_meta = src.meta
Prepare the canopy cover data
As with the NDVI data, we need to ensure that there are no NA values in the data.
As the canopy cover values in the landscape layer are within the same range as the training data, we see that the values range from 0 to 1.
Code
# Check the canopy metadata
print("Canopy metadata:")
print(canopy_meta)
print("\n")

# Check the shape (rows, columns) of the canopy raster
print("Shape of canopy raster:")
print(canopy_landscape.shape)
print("\n")

# Check for NA values in the canopy raster
print("Number of NA values in the canopy raster:")
print(np.isnan(canopy_landscape).sum())

# Define the maximum and minimum canopy values from the stack of local layers
canopy_max = 82.5000
canopy_min = 0.0

# Convert the canopy data from a NumPy array to a PyTorch tensor
canopy_landscape_tens = torch.from_numpy(canopy_landscape)

# Normalise the canopy data
canopy_landscape_norm = (canopy_landscape_tens - canopy_min) / (canopy_max - canopy_min)

# Visualise the normalised canopy cover
plt.imshow(canopy_landscape_norm.numpy())
plt.colorbar()
plt.title('Canopy Cover')
plt.show()
Canopy metadata:
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.3999999521443642e+38, 'width': 2400, 'height': 2280, 'count': 1, 'crs': CRS.from_wkt('LOCAL_CS["GDA94 / Geoscience Australia Lambert",UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]'), 'transform': Affine(25.0, 0.0, 0.0,
0.0, -25.0, -1406000.0)}
Shape of canopy raster:
(2280, 2400)
Number of NA values in the canopy raster:
0
Herbaceous vegetation
Code
# Path to the herbaceous vegetation raster file
file_path = '../mapping/cropped rasters/veg_herby.tif'

# Read the raster file
with rasterio.open(file_path) as src:
    # Read the first raster band as a separate variable
    herby_landscape = src.read(1)
    # Get the metadata of the raster
    herby_meta = src.meta
Code
# Check the herbaceous metadata
print("Herbaceous metadata:")
print(herby_meta)
print("\n")

# Check the shape (rows, columns) of the herbaceous raster
print("Shape of herbaceous raster:")
print(herby_landscape.shape)
print("\n")

# Check for NA values in the herbaceous raster
print("Number of NA values in the herbaceous vegetation raster:")
print(np.isnan(herby_landscape).sum())

# Define the maximum and minimum herbaceous values from the stack of local layers
herby_max = 1.0
herby_min = 0.0

# Convert the herbaceous data from a NumPy array to a PyTorch tensor
herby_landscape_tens = torch.from_numpy(herby_landscape)

# Normalise the herbaceous data
herby_landscape_norm = (herby_landscape_tens - herby_min) / (herby_max - herby_min)

# Visualise the normalised herbaceous cover
plt.imshow(herby_landscape_norm.numpy())
plt.colorbar()
plt.show()
Herbaceous metadata:
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.3999999521443642e+38, 'width': 2400, 'height': 2280, 'count': 1, 'crs': CRS.from_wkt('LOCAL_CS["GDA94 / Geoscience Australia Lambert",UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]'), 'transform': Affine(25.0, 0.0, 0.0,
0.0, -25.0, -1406000.0)}
Shape of herbaceous raster:
(2280, 2400)
Number of NA values in the herbaceous vegetation raster:
0
Slope
Code
# Path to the slope raster file
file_path = '../mapping/cropped rasters/slope.tif'

# Read the raster file
with rasterio.open(file_path) as src:
    # Read the first raster band as a separate variable
    slope_landscape = src.read(1)
    # Get the metadata of the raster
    slope_meta = src.meta
Code
# Check the slope metadata
print("Slope metadata:")
print(slope_meta)
print("\n")

# Check the shape (rows, columns) of the slope landscape raster
print("Shape of slope landscape raster:")
print(slope_landscape.shape)
print("\n")

# Check for NA values in the slope raster
print("Number of NA values in the slope raster:")
print(np.isnan(slope_landscape).sum())

# Replace NaNs in the slope array with 0.0
slope_landscape = np.nan_to_num(slope_landscape, nan=0.0)

# Define the maximum and minimum slope values from the stack of local layers
slope_max = 12.2981
slope_min = 0.0006

# Convert the slope landscape data from a NumPy array to a PyTorch tensor
slope_landscape_tens = torch.from_numpy(slope_landscape)

# Normalise the slope landscape data
slope_landscape_norm = (slope_landscape_tens - slope_min) / (slope_max - slope_min)

# Visualise the slope landscape (note: displaying the original tensor, not the normalised data)
plt.imshow(slope_landscape_tens.numpy())
plt.colorbar()
plt.show()
Slope metadata:
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': nan, 'width': 2400, 'height': 2280, 'count': 1, 'crs': CRS.from_wkt('LOCAL_CS["GDA94 / Geoscience Australia Lambert",UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]'), 'transform': Affine(25.0, 0.0, 0.0,
0.0, -25.0, -1406000.0)}
Shape of slope landscape raster:
(2280, 2400)
Number of NA values in the slope raster:
9356
Convert between numpy array and raster
To check that we can go back and forth between numpy arrays (with pixel coordinates) and rasters (with geographic coordinates), we will convert the slope numpy array to a raster.
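The affine transform itself can also be used to move between the two coordinate systems. Here is a short sketch using the `raster_transform` read from the NDVI raster above; the same inverse transform is used later to convert observed GPS locations to pixel coordinates.
Code
# (column, row) pixel coordinates -> (easting, northing) geographic coordinates
x_geo, y_geo = raster_transform * (1200, 1000)
print(x_geo, y_geo)

# And back again: geographic coordinates -> (column, row) pixel coordinates
col, row = ~raster_transform * (x_geo, y_geo)
print(col, row)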
Code
# Create a figure and axis with matplotlib
fig, ax = plt.subplots(figsize=(6, 6))

# Convert the slope_landscape (numpy array) to a raster and plot with the rasterio library
rasterio.plot.show(slope_landscape, transform=raster_transform, ax=ax, cmap='viridis')

# Set the title and labels
ax.set_title('Raster with Geographic Coordinates')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Show the plot
plt.show()
Subset function (with padding)
Now that we have our landscape layers imported, we need a way to crop out the local layers that can be fed into the deepSSF model as covariates.
We will use the same subset function as in the simulation script, which we stored in the deepSSF_utils.py script. This function will take the landscape layers and a set of coordinates, and return the local layers for those coordinates.
This function also pads the local layers in case a location falls off the edge of the landscape (as a simulated individual might), which we retain here (although we won’t need that functionality).
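In the code below the function is called simply as `subset_raster`. A minimal sketch of how it might be brought into the namespace is shown here; the function name `subset_raster_with_padding_torch` is an assumption, so check deepSSF_utils.py for the actual name. The function takes a 2D tensor, the geographic (x, y) location, the window size in pixels and the affine transform, and returns the cropped window plus the pixel coordinates of the window’s origin.
Code
# Hypothetical alias - the actual function name in deepSSF_utils.py may differ
subset_raster = deepSSF_utils.subset_raster_with_padding_torch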
Use the subset function to crop out the local layers for all covariates. Try different locations using the x and y coordinates, which are in geographic coordinates (x = easting/longitude, y = northing/latitude).
Code
# Pick a location (x, y) from the buffalo DataFrame
x = buffalo_df['x1_'].iloc[0]
y = buffalo_df['y1_'].iloc[0]

# Define the size of the window to extract
window_size = 101

# Select the NDVI layer index
which_ndvi = 1

# Extract subsets from the raster layers using the custom function.
# Each call centres the window at the specified (x, y) location and applies padding where necessary.
ndvi_subset, origin_x, origin_y = subset_raster(ndvi_landscape_norm[which_ndvi, :, :], x, y, window_size, raster_transform)
canopy_subset, origin_x, origin_y = subset_raster(canopy_landscape_norm, x, y, window_size, raster_transform)
herby_subset, origin_x, origin_y = subset_raster(herby_landscape_norm, x, y, window_size, raster_transform)
slope_subset, origin_x, origin_y = subset_raster(slope_landscape_norm, x, y, window_size, raster_transform)

# Create a 2x2 grid of subplots with a fixed figure size
fig, axs = plt.subplots(2, 2, figsize=(10, 8))

# Plot the NDVI subset
im0 = axs[0, 0].imshow(ndvi_subset.numpy(), cmap='viridis')
fig.colorbar(im0, ax=axs[0, 0], shrink=0.8)
axs[0, 0].set_title('NDVI Subset')

# Plot the canopy cover subset
im1 = axs[0, 1].imshow(canopy_subset.numpy(), cmap='viridis')
fig.colorbar(im1, ax=axs[0, 1], shrink=0.8)
axs[0, 1].set_title('Canopy Cover Subset')

# Plot the herbaceous vegetation subset
im2 = axs[1, 0].imshow(herby_subset.numpy(), cmap='viridis')
fig.colorbar(im2, ax=axs[1, 0], shrink=0.8)
axs[1, 0].set_title('Herbaceous Vegetation Subset')

# Plot the slope subset
im3 = axs[1, 1].imshow(slope_subset.numpy(), cmap='viridis')
fig.colorbar(im3, ax=axs[1, 1], shrink=0.8)
axs[1, 1].set_title('Slope Subset')
Text(0.5, 1.0, 'Slope Subset')
Create a mask for edge cells
Due to the padding at the edges of the covariates, convolutional layers create artifacts that can affect the colour scale of the predictions when plotting. To avoid this, we will create a mask that we can apply to the predictions to remove the edge cells.
Code
# Create a mask to remove the edge values for plotting
# (as it affects the colour scale)
x_mask = np.ones_like(ndvi_subset)
y_mask = np.ones_like(ndvi_subset)

# Mask out bordering cells
x_mask[:, :3] = -np.inf
x_mask[:, 98:] = -np.inf
y_mask[:3, :] = -np.inf
y_mask[98:, :] = -np.inf
Load the model
Set the device for the model
Code
# Run on the GPU, or on the CPU if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using {device} device")
Using cpu device
Define the parameters for the model
Here we enter the specific parameter values and hyperparameters for the model. These are the values that will be used to instantiate the model.
Code
# Define the parameters for the model
params_dict = {
    "batch_size": 32,                # number of samples in each batch
    "image_dim": 101,                # number of pixels along the edge of each local patch/image
    "pixel_size": 25,                # number of metres along the edge of a pixel
    "dim_in_nonspatial_to_grid": 4,  # number of scalar predictors that are converted to a grid and appended to the spatial features
    "dense_dim_in_nonspatial": 4,    # change this to however many other scalar predictors you have (bearing, velocity etc)
    "dense_dim_hidden": 128,         # number of nodes in the hidden layers
    "dense_dim_out": 128,            # number of nodes in the output of the fully connected block (FCN)
    "dense_dim_in_all": 2500,        # + 128, number of inputs entering the fully connected block once the nonspatial features have been concatenated to the spatial features
    "input_channels": 4 + 4,         # number of spatial layers in each image + number of scalar layers that are converted to a grid
    "output_channels": 4,            # number of filters to learn
    "kernel_size": 3,                # size of the 2D moving windows / kernels that are being learned
    "stride": 1,                     # stride used when applying the kernel; reduces the output dimension if set to greater than 1
    "kernel_size_mp": 2,             # size of the kernel used in max pooling operations
    "stride_mp": 2,                  # stride used in max pooling operations
    "padding": 1,                    # amount of padding to apply to images prior to applying the 2D convolution
    "num_movement_params": 12,       # number of parameters used to parameterise the movement kernel
    "dropout": 0.1,                  # proportion of nodes that are dropped out in the dropout layers
    "device": device
}
As described in the deepSSF_train.ipynb script, we saved the model definition into a file named deepSSF_model.py. We can instantiate the model by importing that file (which was done when importing the other packages) and passing the parameter dictionary to the model class defined in that script.
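A minimal sketch of the instantiation step is shown below; the names `ModelParams` and `ConvJointModel` are assumptions, so check deepSSF_model.py for the actual class names used in your copy of the script.
Code
# Hypothetical class names - check deepSSF_model.py for the actual names
params = deepSSF_model.ModelParams(params_dict)            # wrap the parameter dictionary
model = deepSSF_model.ConvJointModel(params).to(device)    # instantiate the model and move it to the device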
Code
# # Load the model weights
# print(model.state_dict())

model.load_state_dict(torch.load(f'model_checkpoints/checkpoint_CNN_buffalo2005_2025-02-04.pt',
                                 map_location=torch.device('cpu'),
                                 weights_only=True))

# print(model.state_dict())
# model.eval()
<All keys matched successfully>
Setup simulation parameters
To generate the next-step predictions we need a few extra functions.
Firstly, we need to index the NDVI layers correctly, based on the time of the observed location. We’ll do this by creating a function that takes the day of the year of the observed location and returns the correct index for the NDVI layers.
Recall that Python indexes from 0, so when the month_index is equal to 2 for instance, this will index the third layer, which is for March.
Code
# Create a mapping from day of the year to month index
def day_to_month_index(day_of_year):
    # Calculate the year and the day within that year
    base_date = datetime(2018, 1, 1)  # base date for the calculation, which is when the NDVI layers start
    date = base_date + timedelta(days=int(day_of_year) - 1)
    year_diff = date.year - base_date.year
    # Month index (0-based, accounting for the year change)
    month_index = (date.month - 1) + (year_diff * 12)
    return month_index

yday = 70  # day of the year, which is March 11th
month_index = day_to_month_index(yday)
print(month_index)
2
Next-step probability values
We can now calculate the next-step probabilities for each observed step. As we generate habitat selection, movement and next-step probability surfaces, we can get the predicted probability values for each one, which can be compared to the respective process in the SSF.
The process for generating the next-step probabilities is as follows:
1. Get the current location of the individual
2. Crop out the local layers for the current location
3. Run the model on the local layers to get the habitat selection, movement and next-step probability surfaces
4. Get the predicted probability values at the location of the next step
5. Store the predicted probability values and export them as a csv for comparison with the SSF
First, select the data to generate prediction values for. For testing the function we can select a subset.
Code
# To select a subset of samples to test the function
# test_data = buffalo_df.iloc[0:10]

# To select all of the data
test_data = buffalo_df

# Get the number of samples in the test data
n_samples = len(test_data)
print(f'Number of samples: {n_samples}')

# Create empty vectors to store the predicted probabilities
habitat_probs = np.repeat(0., n_samples)
move_probs = np.repeat(0., n_samples)
next_step_probs = np.repeat(0., n_samples)
Number of samples: 10103
Directory for saving next-step prediction plots
Code
# Output directory for saving the next-step prediction plots
output_dir = f'../outputs/next_step_validation/id{buffalo_id}'
os.makedirs(output_dir, exist_ok=True)
Loop over each step
Code
# Start at 1 so the bearing at t - 1 is available
for i in range(1, n_samples):

    sample = test_data.iloc[i]

    # Current location (x1, y1)
    x = sample['x1_']
    y = sample['y1_']
    # Convert geographic coordinates to pixel coordinates
    px, py = ~raster_transform * (x, y)

    # Next step location (x2, y2)
    x2 = sample['x2_']
    y2 = sample['y2_']
    # Convert geographic coordinates to pixel coordinates
    px2, py2 = ~raster_transform * (x2, y2)

    # The difference in x and y coordinates
    d_x = x2 - x
    d_y = y2 - y
    # print('d_x and d_y are ', d_x, d_y)  # Debugging

    # Temporal covariates
    hour_t2_sin = sample['hour_t2_sin']
    hour_t2_cos = sample['hour_t2_cos']
    yday_t2_sin = sample['yday_t2_sin']
    yday_t2_cos = sample['yday_t2_cos']

    # Bearing of previous step (t - 1)
    bearing = sample['bearing_tm1']

    # Hour of the day (for saving the plot)
    hour_t2 = sample['hour_t2']

    # Day of the year
    yday = sample['yday_t2']
    # Convert day of the year to month index
    month_index = day_to_month_index(yday)
    # print(month_index)

    # Extract the subset of the covariates at the location of x1, y1
    # NDVI
    ndvi_subset, origin_x, origin_y = subset_raster(ndvi_landscape_norm[month_index, :, :], x, y, window_size, raster_transform)
    # Canopy cover
    canopy_subset, origin_x, origin_y = subset_raster(canopy_landscape_norm, x, y, window_size, raster_transform)
    # Herbaceous vegetation
    herby_subset, origin_x, origin_y = subset_raster(herby_landscape_norm, x, y, window_size, raster_transform)
    # Slope
    slope_subset, origin_x, origin_y = subset_raster(slope_landscape_norm, x, y, window_size, raster_transform)

    # Location of the next step in local pixel coordinates
    px2_subset = px2 - origin_x
    py2_subset = py2 - origin_y
    # print('px2_subset and py2_subset are ', px2_subset, py2_subset)  # Debugging

    # Extract the value of the covariates at the location of x2, y2
    # value = ndvi_subset.detach().cpu().numpy()[(int(py2_subset), int(px2_subset))]

    # Stack the channels along a new axis
    x1 = torch.stack([ndvi_subset, canopy_subset, herby_subset, slope_subset], dim=0)
    # Add a batch dimension (required to be the correct dimension for the model)
    x1 = x1.unsqueeze(0)
    # print(x1.shape)

    # Convert the scalar covariates to PyTorch tensors
    hour_t2_sin_tensor = torch.tensor(hour_t2_sin).float()
    hour_t2_cos_tensor = torch.tensor(hour_t2_cos).float()
    yday_t2_sin_tensor = torch.tensor(yday_t2_sin).float()
    yday_t2_cos_tensor = torch.tensor(yday_t2_cos).float()

    # Stack tensors
    x2 = torch.stack((hour_t2_sin_tensor.unsqueeze(0),
                      hour_t2_cos_tensor.unsqueeze(0),
                      yday_t2_sin_tensor.unsqueeze(0),
                      yday_t2_cos_tensor.unsqueeze(0)),
                     dim=1)
    # print(x2)
    # print(x2.shape)

    # Put bearing in the correct dimension (batch_size, 1)
    bearing = torch.tensor(bearing).float().unsqueeze(0).unsqueeze(0)
    # print(bearing)
    # print(bearing.shape)

    # -------------------------------------------------------------------------
    # Run the model
    # -------------------------------------------------------------------------
    model_output = model((x1, x2, bearing))

    # -------------------------------------------------------------------------
    # Habitat selection probability
    # -------------------------------------------------------------------------
    hab_density = model_output.detach().cpu().numpy()[0, :, :, 0]
    hab_density_exp = np.exp(hab_density)
    # print(np.sum(hab_density_exp))  # Should be 1

    # Store the probability of habitat selection at the location of x2, y2
    # These probabilities are normalised in the model function
    habitat_probs[i] = hab_density_exp[(int(py2_subset), int(px2_subset))]
    # print('Habitat probability value = ', habitat_probs[i])

    # -------------------------------------------------------------------------
    # Movement probability
    # -------------------------------------------------------------------------
    move_density = model_output.detach().cpu().numpy()[0, :, :, 1]
    move_density_exp = np.exp(move_density)
    # print(np.sum(move_density_exp))  # Should be 1

    # Store the movement probability at the location of x2, y2
    # These probabilities are normalised in the model function
    move_probs[i] = move_density_exp[(int(py2_subset), int(px2_subset))]
    # print('Movement probability value = ', move_probs[i])

    # -------------------------------------------------------------------------
    # Next-step probability
    # -------------------------------------------------------------------------
    step_density = hab_density + move_density
    step_density_exp = np.exp(step_density)
    # print('Sum of step density exp = ', np.sum(step_density_exp))  # Won't be 1
    step_density_exp_norm = step_density_exp / np.sum(step_density_exp)
    # print('Sum of step density exp norm = ', np.sum(step_density_exp_norm))  # Should be 1

    # Store the next-step probability at the location of x2, y2
    next_step_probs[i] = step_density_exp_norm[(int(py2_subset), int(px2_subset))]
    # print('Next-step probability value = ', next_step_probs[i])

    # -------------------------------------------------------------------------
    # Plot the next-step predictions
    # -------------------------------------------------------------------------
    # Plot the first few probability surfaces - change the condition to i < n_samples to plot all
    if i < 6:

        # Mask out bordering cells
        hab_density_mask = hab_density * x_mask * y_mask
        move_density_mask = move_density * x_mask * y_mask
        step_density_mask = step_density * x_mask * y_mask

        # Create a mask for the next step
        next_step_mask = np.ones_like(hab_density)
        next_step_mask[int(py2_subset), int(px2_subset)] = -np.inf

        # Plot the outputs
        fig_out, axs_out = plt.subplots(2, 2, figsize=(10, 8))

        # Plot NDVI
        im1 = axs_out[0, 0].imshow(ndvi_subset.numpy(), cmap='viridis')
        axs_out[0, 0].set_title('NDVI')
        fig_out.colorbar(im1, ax=axs_out[0, 0], shrink=0.7)

        # Habitat selection log-probability
        im2 = axs_out[0, 1].imshow(hab_density_mask * next_step_mask, cmap='viridis')
        axs_out[0, 1].set_title('Habitat selection log-probability')
        fig_out.colorbar(im2, ax=axs_out[0, 1], shrink=0.7)

        # Movement log-probability
        im3 = axs_out[1, 0].imshow(move_density_mask * next_step_mask, cmap='viridis')
        axs_out[1, 0].set_title('Movement log-probability')
        fig_out.colorbar(im3, ax=axs_out[1, 0], shrink=0.7)

        # Next-step log-probability
        im4 = axs_out[1, 1].imshow(step_density_mask * next_step_mask, cmap='viridis')
        axs_out[1, 1].set_title('Next-step log-probability')
        fig_out.colorbar(im4, ax=axs_out[1, 1], shrink=0.7)

        filename_covs = f'{output_dir}/id{buffalo_id}_step{i+1}_yday{yday}_hour{hour_t2}.png'
        plt.tight_layout()
        plt.savefig(filename_covs, dpi=600, bbox_inches='tight')
        plt.show()
        plt.close()  # Close the figure to free memory
As each cell has a probability value, we can calculate what the probability would be if the model provided no information at all and each cell was equally likely to be the next step. This is just 1 divided by the total number of cells.
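This null probability is referenced as `null_prob` in the plots below; a minimal sketch of the calculation, using the 101 x 101 local window dimension defined above, is:
Code
# Null probability: a uniform distribution over the 101 x 101 local window
null_prob = 1.0 / (window_size ** 2)
print(f'Null probability: {null_prob:.2e}')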
Code
# Rolling window size (number of steps to average over)
rolling_window_size = 100

# Convert to pandas Series and compute rolling means
rolling_mean_habitat = pd.Series(habitat_probs).rolling(window=rolling_window_size, center=True).mean()
rolling_mean_movement = pd.Series(move_probs).rolling(window=rolling_window_size, center=True).mean()
rolling_mean_next_step = pd.Series(next_step_probs).rolling(window=rolling_window_size, center=True).mean()
Plot the probabilities
We can get an idea of how variable the probabilities are for the habitat selection and movement surfaces, and for the next-step probabilities, by plotting them across the trajectory.
Code
# Plot the habitat probs through time as a line graph
plt.plot(habitat_probs[habitat_probs > 0], color='blue', label='Habitat Probabilities')
plt.plot(rolling_mean_habitat[rolling_mean_habitat > 0], color='red', label='Rolling Mean')
plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
plt.xlabel('Index')
plt.ylabel('Probability')
plt.title('Habitat Probability')
plt.legend()  # Add legend to differentiate lines
plt.show()

# Plot the movement probs through time as a line graph
plt.plot(move_probs[move_probs > 0], color='blue', label='Movement Probabilities')
plt.plot(rolling_mean_movement[rolling_mean_movement > 0], color='red', label='Rolling Mean')
plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
plt.xlabel('Index')
plt.ylabel('Probability')
plt.title('Movement Probability')
plt.legend()  # Add legend to differentiate lines
plt.show()

# Plot the next-step probs through time as a line graph
plt.plot(next_step_probs[next_step_probs > 0], color='blue', label='Next Step Probabilities')
plt.plot(rolling_mean_next_step[rolling_mean_next_step > 0], color='red', label='Rolling Mean')
plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
plt.xlabel('Index')
plt.ylabel('Probability')
plt.title('Next Step Probability')
plt.legend()  # Add legend to differentiate lines
plt.show()
Save the probabilities
We can save the probabilities to a csv file to compare with the SSF probabilities.
Code
# Output directory for saving the probability values
output_dir = f'../outputs/next_step_validation'
os.makedirs(output_dir, exist_ok=True)

# Append the probabilities to the dataframe
buffalo_df['habitat_probs'] = habitat_probs
buffalo_df['move_probs'] = move_probs
buffalo_df['next_step_probs'] = next_step_probs

csv_filename = f'{output_dir}/deepSSF_id{buffalo_id}_n{len(test_data)}_{today_date}.csv'
print(csv_filename)
buffalo_df.to_csv(csv_filename, index=True)
Calculate probability values for other individuals
Here we want to take the model that we fitted to the focal individual, and see how well it can predict the steps of other individuals, which are in quite different areas of the landscape and occupy different environmental spaces.
We follow exactly the same process as above (but with the different locations). We are still predicting the next step of an individual, but over different trajectories.
We consider this an out-of-sample test of the model.
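One way to summarise the probabilities computed above (and those computed below for the other individuals) is the mean next-step probability, or the mean log-probability, compared against the null probability. This is a sketch only and not part of the original workflow; it assumes `next_step_probs` and `null_prob` are already in memory.
Code
# Sketch: summarise next-step predictive performance for one set of probabilities
valid = next_step_probs[next_step_probs > 0]      # drop the unfilled first entry
mean_prob = np.mean(valid)                        # mean next-step probability
mean_log_prob = np.mean(np.log(valid))            # mean log-probability (log-score)
print(f'Mean next-step probability: {mean_prob:.2e} (null = {null_prob:.2e})')
print(f'Mean next-step log-probability: {mean_log_prob:.2f}')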
Load dataframes for each individual
As a check that the function is giving us what we expect, we also load in the focal individual’s data and calculate the probabilities for that individual (which should give exactly the same values as above).
Code
# Step 1: Specify the directory containing the CSV files for each individual
data_dir = '../buffalo_local_data_id/'  # Replace with the actual path to your CSV files

# Step 2: Use glob to get a list of all CSV files matching the pattern
csv_files = glob.glob(os.path.join(data_dir, 'buffalo_*.csv'))
print(f'Found {len(csv_files)} CSV files')
print('\n'.join(csv_files))

# To use all of the individuals (including the focal individual 2005)
csv_files = csv_files
# csv_files = [csv_files[0]]  # to select a single id
# print(csv_files)
Code
for j in range(0, len(csv_files)):

    # Read one of the CSV files into a DataFrame
    buffalo_df = pd.read_csv(csv_files[j])

    # Extract the ID from the filename for saving the outputs
    buffalo_id = os.path.basename(csv_files[j]).split('_')[1]

    # Lag the bearing values by one index (to get the previous bearing)
    buffalo_df['bearing_tm1'] = buffalo_df['bearing'].shift(1)

    # Pad the missing value with a specified value, e.g., 0
    buffalo_df['bearing_tm1'] = buffalo_df['bearing_tm1'].fillna(0)

    test_data = buffalo_df
    n_samples = len(test_data)

    # Create empty vectors to store the predicted probabilities
    habitat_probs = np.repeat(0., n_samples)
    move_probs = np.repeat(0., n_samples)
    next_step_probs = np.repeat(0., n_samples)

    # Start at 1 so the bearing at t - 1 is available
    for i in range(1, n_samples):

        sample = test_data.iloc[i]

        # Current location (x1, y1)
        x = sample['x1_']
        y = sample['y1_']
        # Convert geographic coordinates to pixel coordinates
        px, py = ~raster_transform * (x, y)

        # Next step location (x2, y2)
        x2 = sample['x2_']
        y2 = sample['y2_']
        # Convert geographic coordinates to pixel coordinates
        px2, py2 = ~raster_transform * (x2, y2)

        # The difference in x and y coordinates
        d_x = x2 - x
        d_y = y2 - y
        # print('d_x and d_y are ', d_x, d_y)  # Debugging

        # Temporal covariates
        hour_t2_sin = sample['hour_t2_sin']
        hour_t2_cos = sample['hour_t2_cos']
        yday_t2_sin = sample['yday_t2_sin']
        yday_t2_cos = sample['yday_t2_cos']

        # Bearing of previous step (t - 1)
        bearing = sample['bearing_tm1']

        # Day of the year
        yday = sample['yday_t2']
        # Convert day of the year to month index
        month_index = day_to_month_index(yday)
        # print(month_index)

        # Extract the subset of the covariates at the location of x1, y1
        # NDVI
        ndvi_subset, origin_x, origin_y = subset_raster(ndvi_landscape_norm[month_index, :, :], x, y, window_size, raster_transform)
        # Canopy cover
        canopy_subset, origin_x, origin_y = subset_raster(canopy_landscape_norm, x, y, window_size, raster_transform)
        # Herbaceous vegetation
        herby_subset, origin_x, origin_y = subset_raster(herby_landscape_norm, x, y, window_size, raster_transform)
        # Slope
        slope_subset, origin_x, origin_y = subset_raster(slope_landscape_norm, x, y, window_size, raster_transform)

        # Location of the next step in local pixel coordinates
        px2_subset = px2 - origin_x
        py2_subset = py2 - origin_y
        # print('px2_subset and py2_subset are ', px2_subset, py2_subset)  # Debugging

        # Stack the channels along a new axis and add a batch dimension
        x1 = torch.stack([ndvi_subset, canopy_subset, herby_subset, slope_subset], dim=0)
        x1 = x1.unsqueeze(0)
        # print(x1.shape)

        # Convert the scalar covariates to PyTorch tensors
        hour_t2_sin_tensor = torch.tensor(hour_t2_sin).float()
        hour_t2_cos_tensor = torch.tensor(hour_t2_cos).float()
        yday_t2_sin_tensor = torch.tensor(yday_t2_sin).float()
        yday_t2_cos_tensor = torch.tensor(yday_t2_cos).float()

        # Stack tensors
        x2 = torch.stack((hour_t2_sin_tensor.unsqueeze(0),
                          hour_t2_cos_tensor.unsqueeze(0),
                          yday_t2_sin_tensor.unsqueeze(0),
                          yday_t2_cos_tensor.unsqueeze(0)),
                         dim=1)
        # print(x2.shape)

        # Put bearing in the correct dimension (batch_size, 1)
        bearing = torch.tensor(bearing).float().unsqueeze(0).unsqueeze(0)
        # print(bearing)
        # print(bearing.shape)

        # -------------------------------------------------------------------------
        # Run the model
        # -------------------------------------------------------------------------
        model_output = model((x1, x2, bearing))

        # -------------------------------------------------------------------------
        # Habitat selection probability
        # -------------------------------------------------------------------------
        hab_density = model_output.detach().cpu().numpy()[0, :, :, 0]
        hab_density_exp = np.exp(hab_density)
        # print(np.sum(hab_density_exp))  # Should be 1

        # Store the probability of habitat selection at the location of x2, y2
        # These probabilities are normalised in the model function
        habitat_probs[i] = hab_density_exp[(int(py2_subset), int(px2_subset))]
        # print('Habitat probability value = ', habitat_probs[i])

        # -------------------------------------------------------------------------
        # Movement probability
        # -------------------------------------------------------------------------
        move_density = model_output.detach().cpu().numpy()[0, :, :, 1]
        move_density_exp = np.exp(move_density)
        # print(np.sum(move_density_exp))  # Should be 1

        # Store the movement probability at the location of x2, y2
        # These probabilities are normalised in the model function
        move_probs[i] = move_density_exp[(int(py2_subset), int(px2_subset))]
        # print('Movement probability value = ', move_probs[i])

        # -------------------------------------------------------------------------
        # Next-step probability
        # -------------------------------------------------------------------------
        step_density = hab_density + move_density
        step_density_exp = np.exp(step_density)
        # print('Sum of step density exp = ', np.sum(step_density_exp))  # Won't be 1
        step_density_exp_norm = step_density_exp / np.sum(step_density_exp)
        # print('Sum of step density exp norm = ', np.sum(step_density_exp_norm))  # Should be 1

        # Store the next-step probability at the location of x2, y2
        next_step_probs[i] = step_density_exp_norm[(int(py2_subset), int(px2_subset))]
        # print('Next-step probability value = ', next_step_probs[i])

    # Convert to pandas Series and compute rolling means
    rolling_mean_habitat = pd.Series(habitat_probs).rolling(window=rolling_window_size, center=True).mean()
    rolling_mean_movement = pd.Series(move_probs).rolling(window=rolling_window_size, center=True).mean()
    rolling_mean_next_step = pd.Series(next_step_probs).rolling(window=rolling_window_size, center=True).mean()

    # Plot the habitat probs through time as a line graph
    plt.plot(habitat_probs[habitat_probs > 0], color='blue', label='Habitat Probabilities')
    plt.plot(rolling_mean_habitat[rolling_mean_habitat > 0], color='red', label='Rolling Mean')
    plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
    plt.xlabel('Index')
    plt.ylabel('Probability')
    plt.title(f'Habitat Probability - ID {buffalo_id}')
    plt.legend()  # Add legend to differentiate lines
    plt.show()

    # Plot the movement probs through time as a line graph
    plt.plot(move_probs[move_probs > 0], color='blue', label='Movement Probabilities')
    plt.plot(rolling_mean_movement[rolling_mean_movement > 0], color='red', label='Rolling Mean')
    plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
    plt.xlabel('Index')
    plt.ylabel('Probability')
    plt.title(f'Movement Probability - ID {buffalo_id}')
    plt.legend()  # Add legend to differentiate lines
    plt.show()

    # Plot the next-step probs through time as a line graph
    plt.plot(next_step_probs[next_step_probs > 0], color='blue', label='Next Step Probabilities')
    plt.plot(rolling_mean_next_step[rolling_mean_next_step > 0], color='red', label='Rolling Mean')
    plt.axhline(y=null_prob, color='black', linestyle='--', label='Null Probability')  # null prob reference line
    plt.xlabel('Index')
    plt.ylabel('Probability')
    plt.title(f'Next Step Probability - ID {buffalo_id}')
    plt.legend()  # Add legend to differentiate lines
    plt.show()

    # Append the probabilities to the dataframe
    buffalo_df['habitat_probs'] = habitat_probs
    buffalo_df['move_probs'] = move_probs
    buffalo_df['next_step_probs'] = next_step_probs

    # Save the data to a CSV file
    csv_filename = f'{output_dir}/deepSSF_id{buffalo_id}_n{len(test_data)}_{today_date}.csv'
    print(f'File saved as: {csv_filename}')
    buffalo_df.to_csv(csv_filename, index=True)