Raster Classification and Reclassification
modelling Level 3


Convert elevation to slope classes. Transform NDVI to vegetation categories. Turn temperature into climate zones. Classification transforms continuous raster values into discrete categories using thresholds, ranges, and decision rules. This model derives classification methods and shows how to choose breakpoints.

Prerequisites: thresholding, decision rules, classification schemes, histogram analysis


1. The Question

How do you convert a continuous slope raster, computed from an elevation model, into slope classes: “flat”, “gentle”, “steep”, “very steep”?

Reclassification transforms raster values using decision rules:

Examples:

  • Slope categories: 0-5° = flat, 5-15° = gentle, 15-30° = steep, >30° = very steep
  • Land cover from NDVI: <0.2 = bare, 0.2-0.4 = sparse vegetation, 0.4-0.6 = moderate, >0.6 = dense
  • Habitat suitability: Combine elevation + slope + aspect into “suitable” vs “unsuitable”
  • Fire risk zones: Temperature + humidity + vegetation → low/medium/high risk

The mathematical question: Given continuous input values, how do we assign them to discrete classes efficiently and meaningfully?

Key decisions:

  • Number of classes: Too few → information loss; too many → complexity
  • Breakpoints: Where to split? Equal intervals? Natural breaks? Quantiles?
  • Edge handling: Is 15.0° “gentle” or “steep”?

2. The Conceptual Model

Classification vs. Reclassification

Classification: Assign raw values to meaningful categories

  • Satellite imagery → land cover classes
  • Temperature values → climate zones

Reclassification: Transform one categorical raster to another

  • 10 land cover types → 3 broad categories (urban/forest/agriculture)
  • Detailed soil types → simplified drainage classes

Both use the same mathematical framework.

Classification Schemes

1. Equal Interval

Divide value range into equal-width bins.

\[\text{Class } i: \left[\min + i \cdot \frac{\max - \min}{n}, \min + (i+1) \cdot \frac{\max - \min}{n}\right)\]

Example: Elevation 0-1000m, 5 classes → each class spans 200m

Pros: Simple, intuitive
Cons: May produce empty classes or very unbalanced class sizes
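A minimal NumPy sketch of the equal-interval formula; the function name `equal_interval_breaks` and the sample elevations are illustrative assumptions. It returns the n + 1 class edges from the minimum to the maximum, ignoring NaN (NoData) cells.

```python
import numpy as np

def equal_interval_breaks(values, n_classes):
    """Return n_classes + 1 evenly spaced breakpoints from min to max (NaNs ignored)."""
    lo, hi = np.nanmin(values), np.nanmax(values)
    return np.linspace(lo, hi, n_classes + 1)

# Hypothetical elevation cells spanning 0-1000 m
elevation = np.array([[0.0, 250.0],
                      [640.0, 1000.0]])
breaks = equal_interval_breaks(elevation, 5)   # 0, 200, 400, 600, 800, 1000 m
```

Only the minimum and maximum matter here, which is exactly why a single extreme value can stretch every class.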

2. Quantiles (Equal Count)

Each class contains (approximately) the same number of pixels; ties at a breakpoint can make the counts unequal.

k-th percentile: The value below which k% of the data falls.

Example: 4 classes → breakpoints at 25th, 50th, 75th percentiles

Pros: Balanced class sizes
Cons: Breakpoints may not align with natural boundaries
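As a sketch, assuming NumPy and a hypothetical helper `quantile_breaks`, the breakpoints are just the quantiles at 0, 1/k, ..., 1. With 100 distinct values and 4 classes, each class receives 25 pixels; with many tied values the counts would only be approximately equal.

```python
import numpy as np

def quantile_breaks(values, n_classes):
    """Breakpoints at the 0, 1/k, ..., 1 quantiles (NaNs ignored)."""
    probs = np.linspace(0.0, 1.0, n_classes + 1)
    return np.nanquantile(values, probs)

values = np.arange(1, 101)                        # 100 distinct "pixels"
breaks = quantile_breaks(values, 4)               # min, 25th, 50th, 75th percentile, max
classes = np.digitize(values, breaks[1:-1]) + 1   # interior breaks -> classes 1..4
print(np.bincount(classes)[1:])                   # 25 pixels in each class
```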

3. Natural Breaks (Jenks)

Minimize within-class variance, maximize between-class variance.

Objective: Find the breaks that create the most homogeneous classes.

Algorithm: Dynamic programming to optimize:

\[\min \sum_{i=1}^{k} \sum_{x \in \text{class}_i} (x - \bar{x}_i)^2\]

Pros: Respects data distribution
Cons: Computationally expensive, breakpoints change with data
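The objective above can be solved exactly with a straightforward dynamic program over the sorted values, shown here as a sketch (the function name `jenks_breaks` is hypothetical, and this is not Jenks' original procedure; GIS packages typically ship faster, optimized variants). It runs in O(k·n²) time, which illustrates the "computationally expensive" point: it is fine for a histogram or a sample, not for millions of raw pixels.

```python
import numpy as np

def jenks_breaks(values, n_classes):
    """Exact natural breaks by dynamic programming, O(k * n^2).

    Minimizes the total within-class sum of squared deviations over
    contiguous groups of the sorted values. Returns each class's upper bound.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    s = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])   # prefix sums of squares

    def sse(i, j):
        # Sum of squared deviations of x[i:j] from its own mean (j exclusive).
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest SSE for splitting the first j values into c classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):   # candidate last class is x[i:j]
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], cut[c][j] = candidate, i

    # Walk back through the stored cuts to recover each class's upper bound.
    uppers, j = [], n
    for c in range(n_classes, 0, -1):
        uppers.append(float(x[j - 1]))
        j = cut[c][j]
    return uppers[::-1]

print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))   # upper bounds 3, 12, 22
```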

4. Standard Deviation

Classes based on deviations from mean.

\[\text{Class boundaries: } \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma\]

Pros: Statistical meaning (normal distribution)
Cons: Assumes normal distribution (often violated)
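A sketch of the standard-deviation boundaries, assuming NumPy; `std_dev_breaks` is a hypothetical name. Note that `np.nanstd` defaults to the population standard deviation (`ddof=0`); using the sample standard deviation would shift the boundaries slightly.

```python
import numpy as np

def std_dev_breaks(values, multiples=(-2, -1, 0, 1, 2)):
    """Class boundaries at mean + m * std for each multiple m (NaNs ignored)."""
    mu, sigma = np.nanmean(values), np.nanstd(values)
    return mu + np.asarray(multiples, dtype=float) * sigma

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # mean 5, std 2
breaks = std_dev_breaks(values)           # 1, 3, 5, 7, 9
classes = np.digitize(values, breaks)     # 0 = below mu - 2 sigma ... 5 = at/above mu + 2 sigma
```

The two outermost classes are open-ended, so five boundaries yield six classes.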

5. Manual/Expert

Domain expert specifies meaningful thresholds.

Example: Slope classes from geomorphology literature

  • 0-2°: Flat (flooding possible)
  • 2-5°: Gentle (easy to build on)
  • 5-15°: Moderate (erosion risk increases)
  • 15-30°: Steep (difficult access)
  • >30°: Very steep (landslide risk)

Pros: Incorporates domain knowledge
Cons: Subjective, may not fit specific dataset


3. Building the Mathematical Model

Simple Threshold Classification

Binary classification:

\[z_{\text{out}} = \begin{cases} 1 & \text{if } z_{\text{in}} \geq T \\ 0 & \text{if } z_{\text{in}} < T \end{cases}\]

Example: Water detection from elevation

  • Threshold $T = 0$ m (sea level)
  • Output: 1 = land, 0 = water
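The binary rule maps directly onto a vectorized comparison. This is a minimal sketch assuming NumPy; `threshold` and the sample elevations are illustrative, and NoData handling is left out.

```python
import numpy as np

def threshold(raster, t):
    """Binary classification: 1 where raster >= t, otherwise 0."""
    return np.where(np.asarray(raster) >= t, 1, 0).astype(np.uint8)

# Hypothetical elevation values in metres; T = 0 m (sea level)
elevation = np.array([[-3.0, 0.0],
                      [12.5, -0.2]])
land_mask = threshold(elevation, 0.0)   # 1 = land, 0 = water
print(land_mask)
```

A cell at exactly 0 m counts as land because the rule uses ≥.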

Multi-Class Range Classification

Define breakpoints: $b_0 < b_1 < b_2 < \cdots < b_n$

Classification function:

\[\text{class}(z) = \begin{cases} 1 & \text{if } b_0 \leq z < b_1 \\ 2 & \text{if } b_1 \leq z < b_2 \\ \vdots \\ n & \text{if } b_{n-1} \leq z < b_n \end{cases}\]

Implementation:

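def classify(value, breaks):
    # 1-based class for sorted breaks [b0, ..., bn]: [b_{i-1}, b_i), last class closed at bn
    if value < breaks[0] or value > breaks[-1]:
        raise ValueError("value lies outside [b0, bn]")
    for i, upper in enumerate(breaks[1:], start=1):
        if value < upper:
            return i
    return len(breaks) - 1  # value == bn belongs to the last class, n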
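Calling a scalar function once per pixel is slow for a whole raster. A vectorized sketch of the same class rule, assuming NumPy: `np.digitize` on the interior breaks gives the half-open classes, and the last class is closed at bₙ. The name `classify_raster` and the choice of 0 as a NoData code for out-of-range or NaN cells are assumptions for illustration.

```python
import numpy as np

def classify_raster(raster, breaks):
    """Vectorized range classification for sorted breaks [b0, ..., bn].

    Returns 1..n using half-open classes [b_{i-1}, b_i), with the last class
    closed at bn. Cells outside [b0, bn] (including NaN) get 0 as a NoData code.
    """
    raster = np.asarray(raster, dtype=float)
    breaks = np.asarray(breaks, dtype=float)
    classes = np.digitize(raster, breaks[1:-1]) + 1
    inside = (raster >= breaks[0]) & (raster <= breaks[-1])
    return np.where(inside, classes, 0)

slope = np.array([[0.0, 4.9, 5.0],
                  [14.9, 30.0, 45.0]])
print(classify_raster(slope, [0, 5, 15, 30]))   # 45 deg lies above b3 = 30, so it gets 0
```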

Lookup Table Reclassification

Map specific input values to output values.

Lookup table:

Input value    Output value
1 (Forest)     1 (Vegetation)
2 (Grass)      1 (Vegetation)
3 (Crops)      1 (Vegetation)
4 (Urban)      2 (Developed)
5 (Water)      3 (Water)

Function:

\[z_{\text{out}} = \text{LUT}[z_{\text{in}}]\]

Efficient with arrays/dictionaries.
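With small integer codes, the lookup table can be a plain NumPy array and reclassification becomes a single indexing operation. In this sketch, index 0 as a NoData code is an assumption, and the land cover raster is made up.

```python
import numpy as np

# The lookup table above as an array indexed by input code;
# index 0 is an assumed NoData code that maps to 0.
#                 code: 0  1  2  3  4  5
lut = np.array([0, 1, 1, 1, 2, 3], dtype=np.uint8)

land_cover = np.array([[1, 2, 4],
                       [5, 3, 1]])   # 1=Forest 2=Grass 3=Crops 4=Urban 5=Water
broad = lut[land_cover]              # 1=Vegetation 2=Developed 3=Water
print(broad)
```

Array indexing needs small non-negative integer codes; for sparse or very large codes, a Python dict (for example applied via `np.vectorize(mapping.get)`) is a simpler, if slower, alternative.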

Fuzzy Classification

Instead of hard boundaries, use membership functions.

Example - “Moderate slope” membership:

\[\mu_{\text{moderate}}(s) = \begin{cases} 0 & s < 5 \\ \frac{s - 5}{10} & 5 \leq s < 15 \\ 1 & 15 \leq s < 25 \\ \frac{35 - s}{10} & 25 \leq s < 35 \\ 0 & s \geq 35 \end{cases}\]

Value between 0 and 1 indicates degree of membership.

Advantage: Represents uncertainty at boundaries.
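The piecewise membership function above is a trapezoid with corners at 5, 15, 25, and 35°. One compact way to evaluate it, sketched here with NumPy under a hypothetical name `trapezoid_membership`, is to take the smaller of the rising and falling ramps and clip the result to [0, 1].

```python
import numpy as np

def trapezoid_membership(s, a, b, c, d):
    """Trapezoidal membership: 0 below a, rises on [a, b], 1 on [b, c],
    falls on [c, d], 0 at and above d (requires a < b <= c < d)."""
    s = np.asarray(s, dtype=float)
    rising = (s - a) / (b - a)
    falling = (d - s) / (d - c)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

slopes = np.array([0, 5, 10, 15, 20, 30, 35, 40])
mu = trapezoid_membership(slopes, 5, 15, 25, 35)   # 0, 0, 0.5, 1, 1, 0.5, 0, 0
```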


4. Worked Example by Hand

Problem: Classify this temperature raster (°C) into 3 categories using equal intervals.

Input:

    j=0  j=1  j=2  j=3
i=0  10   15   20   25
i=1  12   18   22   28
i=2  14   16   24   30
i=3  11   19   26   32

Categories:

  • Cold (1)
  • Moderate (2)
  • Hot (3)

Solution

Step 1: Find range

\[\min = 10°C, \quad \max = 32°C, \quad \text{range} = 32 - 10 = 22°C\]

Step 2: Calculate interval width

\[\text{width} = \frac{22}{3} \approx 7.33°C\]

Step 3: Define breakpoints

  • $b_0 = 10$
  • $b_1 = 10 + \tfrac{22}{3} \approx 17.33$
  • $b_2 = 10 + 2 \cdot \tfrac{22}{3} \approx 24.67$
  • $b_3 = 32$

Classes:

  • Cold (1): [10, 17.33)
  • Moderate (2): [17.33, 24.67)
  • Hot (3): [24.67, 32]

Step 4: Classify each cell

Row 0:

  • 10 < 17.33 → 1 (Cold)
  • 15 < 17.33 → 1
  • 20 ∈ [17.33, 24.67) → 2 (Moderate)
  • 25 ≥ 24.67 → 3 (Hot)

Row 1:

  • 12 → 1, 18 → 2, 22 → 2, 28 → 3

Row 2:

  • 14 → 1, 16 → 1, 24 → 2, 30 → 3

Row 3:

  • 11 → 1, 19 → 2, 26 → 3, 32 → 3

Output:

    j=0  j=1  j=2  j=3
i=0   1    1    2    3
i=1   1    2    2    3
i=2   1    1    2    3
i=3   1    2    3    3

Class counts:

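  • Cold (1): 6 cells
  • Moderate (2): 5 cells
  • Hot (3): 5 cells

Nearly balanced here (a perfectly even split would be 16/3 ≈ 5.33 cells per class), but only because these temperatures happen to be spread fairly evenly across the range. Equal intervals fix the class widths, not the class counts; quantiles are the method that targets equal counts.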
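The hand calculation can be checked in a few lines of NumPy. This sketch builds the equal-interval edges with `np.linspace`, assigns classes with `np.digitize` on the interior edges (so the maximum, 32, lands in the top class), and counts cells per class.

```python
import numpy as np

temperature = np.array([[10, 15, 20, 25],
                        [12, 18, 22, 28],
                        [14, 16, 24, 30],
                        [11, 19, 26, 32]], dtype=float)

n_classes = 3
edges = np.linspace(temperature.min(), temperature.max(), n_classes + 1)  # 10, 17.33, 24.67, 32
classes = np.digitize(temperature, edges[1:-1]) + 1   # 1=Cold, 2=Moderate, 3=Hot
counts = np.bincount(classes.ravel(), minlength=n_classes + 1)[1:]

print(classes)
print(counts)   # cells per class: 6, 5, 5
```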


5. Computational Implementation


Key insight: The choice of method can change the resulting classes substantially; there is no single “correct” classification.
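To see how much the method matters, here is a small comparison sketch on a synthetic, right-skewed sample; the seeded log-normal data stands in for a raster and is an assumption, not data from this article. Quantile breaks put 250 of the 1000 values in each class, while equal intervals pile most values into the lowest class.

```python
import numpy as np

# Synthetic right-skewed sample standing in for a raster (an assumption,
# not data from this article); seeded so the run is repeatable.
rng = np.random.default_rng(42)
values = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
k = 4

edges_by_method = {
    "equal interval": np.linspace(values.min(), values.max(), k + 1),
    "quantile": np.quantile(values, np.linspace(0.0, 1.0, k + 1)),
}

counts_by_method = {}
for name, edges in edges_by_method.items():
    classes = np.digitize(values, edges[1:-1]) + 1   # classes 1..k
    counts_by_method[name] = np.bincount(classes, minlength=k + 1)[1:]
    print(f"{name:>15}: {counts_by_method[name].tolist()}")
```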


6. Interpretation

Slope Classification Example

From DEM to actionable information:

1. Calculate slope (degrees) from DEM
2. Classify:
   - 0-2°: Suitable for farming, flooding risk
   - 2-5°: Good for construction
   - 5-15°: Moderate difficulty, erosion control needed
   - 15-30°: Forestry, recreation only
   - >30°: Hazard zones, protect from development

Result: Planning tool, not just numbers.
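Both steps can be sketched with NumPy. Here slope comes from `np.gradient` (central differences inside the grid, one-sided differences at the edges), a simple stand-in for the 3×3 kernels such as Horn's method that GIS packages typically use; the function names and the synthetic DEM are illustrative assumptions.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from a DEM using finite differences (np.gradient)."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Interior breakpoints from the list above: 2, 5, 15, 30 degrees -> classes 1..5
SLOPE_BREAKS = [2, 5, 15, 30]

def slope_classes(slope):
    return np.digitize(slope, SLOPE_BREAKS) + 1   # 1 = 0-2 deg ... 5 = >30 deg

# Hypothetical 4 x 5 DEM with 10 m cells rising 10 m per cell eastward (45 deg)
dem = np.tile(np.arange(5) * 10.0, (4, 1))
print(slope_classes(slope_degrees(dem, cell_size=10.0)))   # every cell -> class 5
```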

NDVI to Land Cover

Thresholds from literature:

NDVI < 0.1: Water, barren land
0.1-0.2: Sparse vegetation (desert)
0.2-0.4: Grassland, shrubland
0.4-0.6: Cropland, mixed vegetation
0.6-0.8: Dense vegetation (forest)
>0.8: Very dense vegetation (rainforest)

These thresholds are indicative: validate them against ground truth from field surveys, since suitable cut-offs vary with sensor, season, and region.
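A sketch of the NDVI workflow, assuming NumPy and band reflectances in the same units: compute NDVI = (NIR − Red)/(NIR + Red), then classify with the table's breakpoints. The helper names, the shortened labels, and the use of 0 for NaN (NoData) cells are assumptions for illustration.

```python
import numpy as np

# Interior breakpoints from the table above (first class < 0.1, last > 0.8)
NDVI_BREAKS = [0.1, 0.2, 0.4, 0.6, 0.8]
NDVI_LABELS = ["water/barren", "sparse", "grass/shrub",
               "crop/mixed", "dense", "very dense"]

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (nir - red) / total, np.nan)

def ndvi_classes(values):
    """Class codes 1..6 (NDVI_LABELS[code - 1] gives the name); 0 for NaN."""
    values = np.asarray(values, dtype=float)
    codes = np.digitize(values, NDVI_BREAKS) + 1
    return np.where(np.isnan(values), 0, codes)

codes = ndvi_classes(ndvi(nir=[0.5, 0.4, 0.05], red=[0.1, 0.2, 0.1]))
print([NDVI_LABELS[c - 1] for c in codes])   # dense, grass/shrub, water/barren
```

Masking NaN explicitly matters because `np.digitize` would otherwise place NaN cells in the top class.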

Multi-Criteria Suitability

Combine multiple factors:

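import numpy as np

# slope, aspect (degrees clockwise from north) and soil_type are co-registered
# 2-D arrays; lookup_table is a NumPy array mapping soil codes to soil classes.
slope_class = np.digitize(slope, [5, 15, 30]) + 1        # 1 = 0-5 deg (flat) ... 4 = >30 deg
aspect_class = np.digitize(aspect, [90, 180, 270]) + 1   # quadrants 1..4; 2-3 = 90-270 deg (south-facing)
soil_class = lookup_table[soil_type]

suitability = ((slope_class == 1)
               & np.isin(aspect_class, [2, 3])
               & np.isin(soil_class, [1, 2])).astype(np.uint8)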

Boolean result: Suitable (1) or not (0).


7. What Could Go Wrong?

Arbitrary Breakpoints

Equal interval on skewed data:

Data: [1, 1, 2, 2, 2, 3, 3, 50]
Equal intervals (4 classes):
  [1, 13.25): 7 values → Class 1
  [13.25, 25.5): 0 values → Class 2
  [25.5, 37.75): 0 values → Class 3
  [37.75, 50]: 1 value → Class 4

Problem: Empty classes, unbalanced.

Solution: Use quantiles or remove outliers first.

Sensitivity to Outliers

One extreme value shifts all breakpoints:

Data: [10, 12, 14, 15, 16, 18, 20, 1000]
Equal intervals (4 classes) → bin width (1000 - 10) / 4 = 247.5, so seven of the eight values fall in class 1

Solution:

  • Remove outliers before classification
  • Use robust statistics (median, IQR)
  • Clip extreme values
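A sketch combining the last two remedies on the data above: clip values to Tukey's fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) before computing equal intervals. The helper names and the 1.5 factor are conventional choices assumed here, not requirements.

```python
import numpy as np

def equal_interval_counts(values, k):
    """Number of values in each of k equal-width classes."""
    edges = np.linspace(values.min(), values.max(), k + 1)
    classes = np.digitize(values, edges[1:-1]) + 1
    return np.bincount(classes, minlength=k + 1)[1:]

def clip_iqr(values, factor=1.5):
    """Clip values to Tukey fences [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - factor * iqr, q3 + factor * iqr)

data = np.array([10, 12, 14, 15, 16, 18, 20, 1000], dtype=float)
print(equal_interval_counts(data, 4))             # raw:     7, 0, 0, 1
print(equal_interval_counts(clip_iqr(data), 4))   # clipped: 2, 3, 2, 1 (1000 -> 26)
```

Clipping keeps the outlier cell in the map (now in the top class) instead of deleting it, while letting the other values spread across all four classes.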

Loss of Information

Continuous to categorical loses detail:

Original: 15.2°, 15.8° (0.6° difference)
Classified: Both → Class 3 "steep" (appear identical)

Original: 14.9°, 15.1° (0.2° difference)
Classified: 14.9° → Class 2 "gentle", 15.1° → Class 3 "steep" (appear very different)

Problem: Boundary artifacts.

Solution: Use buffer zones or fuzzy classification.

Inappropriate Method

Quantiles on categorical data:

Land cover codes: [1, 1, 1, 2, 2, 3, 3, 3]
Quantile classification → meaningless

Solution: Only classify continuous data. Reclassify categorical via lookup tables.


8. Extension: Unsupervised Classification

Automated clustering finds natural groups in data.

K-means algorithm:

1. Initialize k cluster centers randomly
2. Assign each pixel to nearest center
3. Recompute centers as mean of assigned pixels
4. Repeat 2-3 until convergence

For multi-band imagery:

pixel = [band1, band2, band3, ..., bandN]
distance = sqrt(sum((pixel - center)²))

Advantage: No manual thresholds needed.

Disadvantage: Classes may not align with semantic categories.

Example: Cluster a 7-band Landsat image into 10 spectral clusters automatically; an analyst then labels each cluster with a land cover type.
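Steps 1-4 can be sketched as plain NumPy k-means (Lloyd's algorithm) on an (n_pixels, n_bands) array. The function name, iteration cap, seed, and the synthetic 3-band image are assumptions; production work would more likely use a library such as scikit-learn's `KMeans`, which by default uses k-means++ initialization.

```python
import numpy as np

def kmeans(pixels, k, n_iter=100, seed=0):
    """Plain k-means on an (n_pixels, n_bands) array; returns labels, centers."""
    rng = np.random.default_rng(seed)
    # 1. Initialize centers as k randomly chosen pixels
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # 2. Assign each pixel to the nearest center (Euclidean distance)
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute centers as the mean of their pixels (keep empty ones)
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # 4. Stop when the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical 3-band image, reshaped from (rows, cols, bands) to (pixels, bands)
image = np.concatenate([np.full((2, 3, 3), 10.0), np.full((2, 3, 3), 200.0)])
labels, centers = kmeans(image.reshape(-1, 3), k=2)
class_map = labels.reshape(image.shape[:2])   # back to (rows, cols)
```

The cluster numbers are arbitrary; which cluster is "water" or "forest" still has to be decided by inspecting the centers.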


9. Math Refresher: Quantiles and Percentiles

Definition

p-th quantile ($Q_p$): Value below which fraction $p$ of data falls.

Example: Median = 0.5 quantile (50th percentile)

Calculation

For sorted data $x_1 \leq x_2 \leq \cdots \leq x_n$:

Position:

\[\text{pos} = p \times (n - 1) + 1\]

If position is integer: $Q_p = x_{\text{pos}}$

If fractional: Linearly interpolate between $x_{\lfloor\text{pos}\rfloor}$ and $x_{\lceil\text{pos}\rceil}$

Example: Find 0.25 quantile of [1, 2, 3, 4, 5]

\[\text{pos} = 0.25 \times (5 - 1) + 1 = 2\] \[Q_{0.25} = x_2 = 2\]
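The position formula translates directly into code; this sketch (hypothetical name `quantile_1based`) converts the 1-based positions to Python's 0-based indices and interpolates linearly when the position is fractional.

```python
import numpy as np

def quantile_1based(data, p):
    """p-quantile via pos = p * (n - 1) + 1 on 1-based sorted positions,
    linearly interpolating when pos is fractional."""
    x = sorted(data)
    pos = p * (len(x) - 1) + 1
    lower = int(np.floor(pos))
    upper = int(np.ceil(pos))
    frac = pos - lower
    # Convert 1-based positions to 0-based list indices.
    return x[lower - 1] + frac * (x[upper - 1] - x[lower - 1])

print(quantile_1based([1, 2, 3, 4, 5], 0.25))   # 2.0, as in the example above
```

This is the same linear interpolation that NumPy's `np.quantile` uses by default, so the two agree for any data and p.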

For Classification

Divide into k equal-count classes:

Breakpoints at quantiles: $Q_{1/k}, Q_{2/k}, \ldots, Q_{(k-1)/k}$

Example: 4 classes → breaks at 0.25, 0.5, 0.75 quantiles


Summary

  • Classification converts continuous rasters to discrete categories
  • Five methods: Equal interval, quantile, natural breaks, standard deviation, manual
  • Equal interval: Fixed-width bins, simple but may be unbalanced
  • Quantile: Equal-count bins, balanced but may group dissimilar values
  • Natural breaks: Minimizes within-class variance, computationally expensive
  • Manual thresholds: Domain expert knowledge, most meaningful for applications
  • Reclassification: Uses lookup tables to transform categorical rasters
  • Fuzzy classification: Membership functions instead of hard boundaries
  • Applications: Slope classes, land cover mapping, suitability analysis
  • Challenges: Outliers, information loss, boundary artifacts
  • Method choice depends on data distribution and application requirements

This completes Cluster K (Raster Foundations)! We’ve covered resampling (33), map algebra (34), and classification (35).

Next: Model 36 launches Cluster M (Terrain Analysis) with viewshed and line-of-sight analysis!

