Introduction
Zonal statistics provide valuable insights into spatial data analysis by summarizing raster values within predefined zones. Python offers powerful libraries like rasterio and rasterstats to efficiently compute zonal statistics. In this blog post, we will explore how to calculate zonal statistics for a GeoTIFF file with multiple bands using Python. We will walk through the necessary code and provide explanations along the way.
Prerequisites: To follow along with this tutorial, make sure you have the following prerequisites installed:
- Python (3.6 or higher)
- rasterio library (pip install rasterio)
- rasterstats library (pip install rasterstats)
Step 1: Import the required libraries
We begin by importing the necessary libraries: rasterio and rasterstats. The former is used for reading the raster dataset, while the latter handles the zonal statistics calculations.
import rasterio
from rasterstats import zonal_stats
Step 2: Open the GeoTIFF file
Next, we need to open the GeoTIFF file containing the raster data. Use the rasterio.open function to accomplish this. Specify the path to the GeoTIFF file as the argument. Make sure you are inside a with block to ensure the file is properly closed.
with rasterio.open('raster.tif') as src:
Step 3: Iterate over each band
Since we have a GeoTIFF file with multiple bands, we need to iterate over each band individually to calculate zonal statistics. We use the range function to iterate from the first band (1) to the total number of bands present in the raster dataset (src.count + 1).
for band in range(1, src.count + 1):
Step 4: Calculate zonal statistics
Within the band iteration loop, call the zonal_stats function from the rasterstats library. Pass the following arguments:
- Use src.read(band) to read the data for the current band.
- Specify the affine transformation matrix of the raster dataset using affine=src.transform.
- Optionally, you can specify the statistics you want to calculate using the stats parameter. For example, “mean” will compute the mean value.
stats = zonal_stats('zones.shp', src.read(band), affine=src.transform, stats="mean")
Step 5: Process the results
Within the band iteration loop, call the zonal_stats function from the rasterstats library. Pass the following arguments:
print(f"Band {band}: Mean = {stats[0]['mean']}")
Step 6: Conclusion
Calculating zonal statistics using Python and a GeoTIFF file with multiple bands is made easy with the rasterio and rasterstats libraries. By following the steps outlined in this blog post, you can leverage the power of Python to analyze spatial data within predefined zones.
CODE:
import rasterio
from rasterstats import zonal_stats
# Step 2: Open the GeoTIFF file
with rasterio.open('raster.tif') as src:
# Step 3: Iterate over each band
for band in range(1, src.count + 1):
# Step 4: Calculate zonal statistics
stats = zonal_stats('zones.shp', src.read(band), affine=src.transform, stats="mean")
# Step 5: Process the results
# For example, print the mean value
print(f"Band {band}: Mean = {stats[0]['mean']}")
If you want a json output:
import rasterio
from rasterstats import zonal_stats
import json
# Step 2: Open the GeoTIFF file
with rasterio.open('raster.tif') as src:
# Step 3: Iterate over each band
for band in range(1, src.count + 1):
# Step 4: Calculate zonal statistics
stats = zonal_stats('zones.shp', src.read(band), affine=src.transform, stats="mean")
# Step 5: Save the results to a JSON file
output_file = f'band{band}_stats.json'
with open(output_file, 'w') as f:
json.dump(stats, f)
# Step 6: Process the results
# For example, print the mean value
print(f"Band {band}: Mean = {stats[0]['mean']}")
# Optionally, you can also load the JSON file and further process the statistics
with open(output_file, 'r') as f:
loaded_stats = json.load(f)
# Perform additional processing with loaded_stats
In this updated code, we added the necessary code to save the zonal statistics for each band to a JSON file. The json.dump function is used to write the statistics to the JSON file. The name of the output file is generated dynamically based on the band number.
After saving the JSON file, you can process the statistics further, print them, or load the JSON file at a later time for additional analysis. The code demonstrates loading the JSON file (band{band}_stats.json) and assigns the loaded statistics to the loaded_stats variable. You can perform any desired additional processing on the loaded statistics as needed.
Remember to replace ‘raster.tif’ with the path to your actual GeoTIFF file and ‘zones.shp’ with the path to your shapefile defining the zones.
Final Thoughts:
Zonal statistics are a fundamental tool in spatial analysis, enabling the extraction of valuable information from raster datasets. Python provides a straightforward and efficient way to calculate zonal statistics for GeoTIFF files with multiple bands. By utilizing the rasterio and rasterstats libraries, you can explore and analyze raster data within specific zones, opening up a wide range of possibilities for spatial data analysis.