Raster Classification and Reclassification

Converting continuous values to discrete categories

2026-02-27

Canada’s national land cover dataset (30-metre resolution, updated roughly every five years from Landsat imagery) classifies every pixel in the country into one of seventeen categories: water, wetland, cropland, urban, treed upland, and so on. The classification is the product of a decision process applied to billions of pixels, each represented by a stack of spectral reflectance values in multiple bands. The output, discrete categories on a continuous landscape, is what land managers, carbon accountants, and biodiversity assessors actually work with. But the categories are not in the data; they are imposed on it. And the choice of where to draw the boundaries between classes has a larger effect on the results than most users appreciate.

Raster classification converts continuous fields (elevation, slope, temperature, spectral reflectance, vegetation index) into discrete categories. The conversion requires two decisions: how many classes to create, and where to place the breakpoints between them. Equal-interval classification (divide the range into equal steps) is intuitive but produces classes with very unequal numbers of pixels if the data is skewed. Quantile classification (equal numbers of pixels per class) balances class sizes but may split a cluster of very similar values across two classes. Natural breaks (Jenks optimisation) minimises within-class variance and often produces the most cartographically satisfying result. However, the exact dynamic-programming algorithm is quadratic in the number of data values (O(kn²) for n values and k classes) and requires care on large datasets. This model derives and compares all three schemes, introduces the confusion matrix for evaluating classification accuracy, and discusses when each approach is appropriate.

1. The Question

How do you convert a continuous elevation raster into slope classes: “flat”, “gentle”, “steep”, “very steep”?

Reclassification transforms raster values using decision rules:

Examples:
- Slope categories: 0-5° = flat, 5-15° = gentle, 15-30° = steep, >30° = very steep
- Land cover from NDVI: <0.2 = bare, 0.2-0.4 = sparse veg, 0.4-0.6 = moderate, >0.6 = dense
- Habitat suitability: Combine elevation + slope + aspect into “suitable” vs “unsuitable”
- Fire risk zones: Temperature + humidity + vegetation → low/medium/high risk

The mathematical question: Given continuous input values, how do we assign them to discrete classes efficiently and meaningfully?

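Key decisions:
- Number of classes: Too few → information loss; too many → complexity
- Breakpoints: Where to split? Equal intervals? Natural breaks? Quantiles?
- Edge handling: Is 15.0° “gentle” or “steep”? With the half-open intervals [b_{i-1}, b_i) used in Section 3, a value exactly on a breakpoint belongs to the higher class, so 15.0° is “steep”.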


2. The Conceptual Model

Classification vs. Reclassification

Classification: Assign raw values to meaningful categories
- Satellite imagery → land cover classes
- Temperature values → climate zones

Reclassification: Transform one categorical raster to another
- 10 land cover types → 3 broad categories (urban/forest/agriculture)
- Detailed soil types → simplified drainage classes

Both use the same mathematical framework.

Classification Schemes

1. Equal Interval

Divide value range into equal-width bins.

\text{Class } i: \left[\min + i \cdot \frac{\max - \min}{n},\ \min + (i+1) \cdot \frac{\max - \min}{n}\right), \quad i = 0, 1, \ldots, n-1 \quad \text{(the last class is closed at } \max\text{)}

Example: Elevation 0-1000m, 5 classes → each class spans 200m

Pros: Simple, intuitive
Cons: May have empty classes or very unbalanced distribution
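The formula translates directly into NumPy. Below is a minimal sketch. It assumes the raster is a NumPy array with NaN marking nodata cells. The name `equal_interval_breaks` is ours, not a library API.

```python
import numpy as np

def equal_interval_breaks(values, n_classes):
    """Return the n_classes + 1 breakpoints min, min + w, ..., max with w = (max - min) / n."""
    v = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(v), np.nanmax(v)      # NaN cells (assumed nodata) are ignored
    return np.linspace(lo, hi, n_classes + 1)

# Elevation example from the text: 0-1000 m in 5 classes gives 200 m steps.
elevation = np.array([[0.0, 120.0, 480.0],
                      [755.0, np.nan, 1000.0]])
breaks = equal_interval_breaks(elevation, 5)   # [0, 200, 400, 600, 800, 1000]
```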

2. Quantiles (Equal Count)

Each class contains an equal number of pixels (as nearly as tied values allow).

The k-th percentile is the value below which k% of the data falls.

Example: 4 classes → breakpoints at 25th, 50th, 75th percentiles

Pros: Balanced class sizes
Cons: Breakpoints may not align with natural boundaries
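Below is a sketch of quantile breaks using `np.quantile`. Its default method interpolates linearly between sorted values, which is the rule given in Section 9. The function name is ours.

The example reuses the skewed data from Section 7 and shows the tie limitation noted above. The 25th and 50th percentiles fall at 1.75 and 2, so every 2 lands in class 3 and class 2 is empty.

```python
import numpy as np

def quantile_breaks(values, n_classes):
    """Breakpoints at the 0, 1/k, 2/k, ..., 1 quantiles (k = n_classes), ignoring NaN."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    return np.quantile(v, np.linspace(0.0, 1.0, n_classes + 1))

data = np.array([1, 1, 2, 2, 2, 3, 3, 50])
breaks = quantile_breaks(data, 4)                  # [1, 1.75, 2, 3, 50]
classes = np.digitize(data, breaks[1:-1]) + 1      # half-open [b_{i-1}, b_i) -> classes 1..4
counts = np.bincount(classes, minlength=5)[1:]     # pixels per class: [2, 0, 3, 3]
```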

3. Natural Breaks (Jenks)

Minimize within-class variance, maximize between-class variance.

Objective: Find the breaks that create the most homogeneous classes.

Algorithm: Dynamic programming to optimize:

\min \sum_{i=1}^{k} \sum_{x \in \text{class}_i} (x - \bar{x}_i)^2

Pros: Respects data distribution
Cons: Computationally expensive (the exact dynamic program takes O(k·n²) time for n values and k classes); breakpoints change with the data
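The dynamic program can be written compactly with prefix sums. These give the within-class sum of squared deviations of any run of sorted values in O(1).

Below is a minimal pure-Python sketch of the exact optimum (often called Fisher-Jenks). It is not tuned for speed: the three nested loops make it O(k·n²). On large rasters, a common workaround is to run it on a random sample of pixel values, or to use an optimised library such as mapclassify. The function name is ours.

```python
def jenks_breaks(values, k):
    """Exact natural breaks (Fisher-Jenks) for k classes by dynamic programming.

    Returns [min, lower bound of class 2, ..., lower bound of class k, max], matching
    the half-open convention [b_{i-1}, b_i). Runs in O(k * n^2) for n values.
    """
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums of x and x^2, so the squared deviation of any run x[i:j] costs O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(x):
        s[idx + 1] = s[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] from its mean (requires j > i)."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total within-class SSD when x[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # start[c][j]: index where the last of those c classes begins.
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):          # the last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover where each class starts.
    starts = []
    j = n
    for c in range(k, 0, -1):
        j = start[c][j]
        starts.append(j)
    starts.reverse()                           # starts[0] is always 0
    return [x[0]] + [x[i] for i in starts[1:]] + [x[-1]]

print(jenks_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))   # [1, 10, 30, 31]
```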

4. Standard Deviation

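Classes based on deviations from the mean μ, in units of the standard deviation σ. The five boundaries below define six classes; the outer two are open-ended.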

\text{Class boundaries: } \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma

Pros: Statistical meaning (normal distribution)
Cons: Assumes normal distribution (often violated)
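Below is a sketch of the standard-deviation scheme in NumPy. It assumes the population standard deviation (`ddof=0`, NumPy's default), since the text does not say which to use. Because the outer classes are open-ended, the five boundaries go to `np.digitize` without adding min and max. The helper name is ours.

```python
import numpy as np

def std_dev_breaks(values, max_sd=2):
    """Boundaries mu + j*sigma for j = -max_sd, ..., max_sd (population sigma, NaN ignored)."""
    v = np.asarray(values, dtype=float)
    mu, sigma = np.nanmean(v), np.nanstd(v)        # np.nanstd uses ddof=0 by default
    return mu + sigma * np.arange(-max_sd, max_sd + 1)

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
breaks = std_dev_breaks(data)                 # mu = 5, sigma = 2 -> [1, 3, 5, 7, 9]
classes = np.digitize(data, breaks) + 1       # 1 = below mu - 2 sigma, ..., 6 = at or above mu + 2 sigma
```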

5. Manual/Expert

A domain expert specifies meaningful thresholds.

Example: Slope classes from geomorphology literature
- 0-2°: Flat (flooding possible)
- 2-5°: Gentle (easy to build on)
- 5-15°: Moderate (erosion risk increases)
- 15-30°: Steep (difficult access)
- >30°: Very steep (landslide risk)

Pros: Incorporates domain knowledge
Cons: Subjective, may not fit specific dataset
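Expert thresholds are applied exactly like any other breakpoints. Below is a sketch using `np.digitize` with the thresholds from the list above. With its default `right=False`, a value exactly on a threshold goes to the higher class, so 15.0° is "Steep" here. The names `SLOPE_BREAKS`, `SLOPE_LABELS` and `slope_labels` are illustrative.

```python
import numpy as np

# Thresholds (degrees) and labels from the geomorphology list above; the top class is open-ended.
SLOPE_BREAKS = [2, 5, 15, 30]
SLOPE_LABELS = np.array(["Flat", "Gentle", "Moderate", "Steep", "Very steep"])

def slope_labels(slope_deg):
    """Label slopes with half-open classes; a value exactly on a threshold takes the higher class."""
    idx = np.digitize(slope_deg, SLOPE_BREAKS)    # 0 for < 2°, ..., 4 for >= 30°
    return SLOPE_LABELS[idx]

print(slope_labels([0.5, 2.0, 14.9, 15.0, 45.0]))
```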


3. Building the Mathematical Model

Simple Threshold Classification

Binary classification:

z_{\text{out}} = \begin{cases} 1 & \text{if } z_{\text{in}} \geq T \\ 0 & \text{if } z_{\text{in}} < T \end{cases}

Example: Water detection from elevation
- Threshold T = 0 m (sea level)
- Output: 1 = land, 0 = water
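On a NumPy array, the threshold rule is a single `np.where`. Below is a sketch. It assumes NaN marks nodata; the output code 255 for those cells is our choice, so that nodata is not confused with class 0. The function name is hypothetical.

```python
import numpy as np

def threshold_classify(z, t, nodata_code=255):
    """1 where z >= t, 0 where z < t; NaN cells (assumed nodata) get nodata_code."""
    z = np.asarray(z, dtype=float)
    out = np.where(z >= t, 1, 0).astype(np.uint8)
    out[np.isnan(z)] = nodata_code
    return out

dem = np.array([[-2.0, 0.0],
                [3.5, np.nan]])
land = threshold_classify(dem, 0.0)    # [[0, 1], [1, 255]]
```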

Multi-Class Range Classification

Define breakpoints: b_0 < b_1 < b_2 < \cdots < b_n

Classification function:

\text{class}(z) = \begin{cases} 1 & \text{if } b_0 \leq z < b_1 \\ 2 & \text{if } b_1 \leq z < b_2 \\ \vdots \\ n & \text{if } b_{n-1} \leq z \leq b_n \end{cases}

Implementation:

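def classify(value, breaks):
    # Class i covers [b_{i-1}, b_i) for breaks = [b_0, ..., b_n]; class n is also
    # closed at b_n so the maximum gets a class (as in the worked example below).
    if not breaks[0] <= value <= breaks[-1]:
        raise ValueError(f"{value} is outside [{breaks[0]}, {breaks[-1]}]")
    for i, upper in enumerate(breaks[1:], start=1):
        if value < upper:
            return i
    return len(breaks) - 1  # value == b_n falls in the top class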
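A per-value loop is slow on a full raster. `np.digitize` performs the same half-open binning on a whole array in one vectorised call.

Below is a sketch. It assumes NumPy arrays and uses 0 as a nodata code for cells that are NaN or outside [b_0, b_n]. That code is our choice, not a standard. The function name is hypothetical.

```python
import numpy as np

def classify_raster(z, breaks):
    """Vectorised version of the piecewise rule: class i is [b_{i-1}, b_i), class n includes b_n."""
    z = np.asarray(z, dtype=float)
    b = np.asarray(breaks, dtype=float)
    # np.digitize returns i with b[i-1] <= z < b[i], which is already the 1-based class.
    cls = np.digitize(z, b)
    cls[z == b[-1]] = len(b) - 1                       # close the top class at b_n
    cls[(z < b[0]) | (z > b[-1]) | np.isnan(z)] = 0    # assumed nodata code
    return cls.astype(np.uint8)

classify_raster([10, 17.33, 24.67, 32, 5, 40], [10, 17.33, 24.67, 32])   # [1, 2, 3, 3, 0, 0]
```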

Lookup Table Reclassification

Map specific input values to output values.

Lookup table:

Input value      Output value
1 (Forest)       1 (Vegetation)
2 (Grass)        1 (Vegetation)
3 (Crops)        1 (Vegetation)
4 (Urban)        2 (Developed)
5 (Water)        3 (Water)

Function:

z_{\text{out}} = \text{LUT}[z_{\text{in}}]

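Efficient: with small integer codes the table can be stored as an array indexed by input value (or as a dictionary when codes are sparse), so each pixel needs a single lookup.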
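Below is a sketch of the lookup table as a NumPy array indexed by input code. Fancy indexing then reclassifies a whole raster in one step. Mapping the unused index 0 to 0 (nodata) is our assumption, and the small land cover grid is hypothetical.

```python
import numpy as np

# Index = input code (1 Forest, 2 Grass, 3 Crops, 4 Urban, 5 Water); value = output code.
# Index 0 is unused here and mapped to 0 (assumed nodata).
lut = np.array([0, 1, 1, 1, 2, 3], dtype=np.uint8)

land_cover = np.array([[1, 2, 4],
                       [5, 3, 3],
                       [4, 1, 5]])
broad = lut[land_cover]   # one array lookup per pixel: 1 Vegetation, 2 Developed, 3 Water
```

An array is the natural fit when codes are small consecutive integers. For sparse codes (say 110, 210, 810), a dictionary avoids allocating a large, mostly empty table.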

Fuzzy Classification

Instead of hard boundaries, use membership functions.

Example - “Moderate slope” membership:

\mu_{\text{moderate}}(s) = \begin{cases} 0 & s < 5 \\ \frac{s - 5}{10} & 5 \leq s < 15 \\ 1 & 15 \leq s < 25 \\ \frac{35 - s}{10} & 25 \leq s < 35 \\ 0 & s \geq 35 \end{cases}

Value between 0 and 1 indicates degree of membership.

Advantage: Represents uncertainty at boundaries.
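The membership function above is piecewise linear through (5, 0), (15, 1), (25, 1) and (35, 0), so `np.interp` evaluates it directly. Below is a sketch. The helper name is ours, and it assumes the four corners are strictly increasing.

```python
import numpy as np

def trapezoid_membership(s, a, b, c, d):
    """Membership rising linearly from 0 at a to 1 at b, 1 on [b, c], falling to 0 at d.

    Assumes a < b < c < d; np.interp returns the end values (0) outside [a, d].
    """
    return np.interp(s, [a, b, c, d], [0.0, 1.0, 1.0, 0.0])

slopes = np.array([0, 5, 10, 15, 20, 30, 35, 40])
mu_moderate = trapezoid_membership(slopes, 5, 15, 25, 35)   # [0, 0, 0.5, 1, 1, 0.5, 0, 0]
```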


4. Worked Example by Hand

Problem: Classify this temperature raster (°C) into 3 categories using equal intervals.

Input:

    j=0  j=1  j=2  j=3
i=0  10   15   20   25
i=1  12   18   22   28
i=2  14   16   24   30
i=3  11   19   26   32

Categories:
- Cold (1)
- Moderate (2)
- Hot (3)

Solution

Step 1: Find range

\min = 10°C, \quad \max = 32°C, \quad \text{range} = 32 - 10 = 22°C

Step 2: Calculate interval width

\text{width} = \frac{22}{3} \approx 7.33°C

Step 3: Define breakpoints

Classes:
- Cold (1): [10, 17.33)
- Moderate (2): [17.33, 24.67)
- Hot (3): [24.67, 32]

Step 4: Classify each cell

Row 0:
- 10 < 17.33 → 1 (Cold)
- 15 < 17.33 → 1
- 20 ∈ [17.33, 24.67) → 2 (Moderate)
- 25 ≥ 24.67 → 3 (Hot)

Row 1: 12 → 1, 18 → 2, 22 → 2, 28 → 3

Row 2: 14 → 1, 16 → 1, 24 → 2, 30 → 3

Row 3: 11 → 1, 19 → 2, 26 → 3, 32 → 3

Output:

    j=0  j=1  j=2  j=3
i=0   1    1    2    3
i=1   1    2    2    3
i=2   1    1    2    3
i=3   1    2    3    3

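Class counts:
- Cold (1): 6 cells
- Moderate (2): 5 cells
- Hot (3): 5 cells

A perfectly balanced split would put 16/3 ≈ 5.33 cells in each class. The counts come out close to that here only because these temperatures are spread fairly evenly over their range. Equal intervals make no attempt to balance counts (quantiles do), and on skewed data they can leave classes almost empty (see Section 7).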
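The hand calculation can be checked in a few lines of NumPy. This sketch recomputes the breakpoints with `np.linspace` and assigns classes with `np.digitize` using only the two interior breaks. Because the top class is open above 24.67, the maximum of 32 needs no special case. It then counts cells per class.

```python
import numpy as np

temps = np.array([[10, 15, 20, 25],
                  [12, 18, 22, 28],
                  [14, 16, 24, 30],
                  [11, 19, 26, 32]], dtype=float)

n_classes = 3
breaks = np.linspace(temps.min(), temps.max(), n_classes + 1)   # [10, 17.33, 24.67, 32]

# Interior breaks only: digitize gives 0, 1, 2; adding 1 gives Cold, Moderate, Hot.
classes = np.digitize(temps, breaks[1:-1]) + 1
counts = np.bincount(classes.ravel(), minlength=n_classes + 1)[1:]   # cells per class 1..3
```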


5. Computational Implementation

Below is an interactive raster classification tool.

<label>
  Classification method:
  <select id="class-method">
    <option value="equal-interval" selected>Equal Interval</option>
    <option value="quantile">Quantile (Equal Count)</option>
    <option value="std-dev">Standard Deviation</option>
    <option value="manual">Manual Thresholds</option>
  </select>
</label>
<label>
  Number of classes:
  <input type="range" id="n-classes" min="2" max="8" step="1" value="5">
  <span id="n-classes-value">5</span>
</label>
<div id="manual-controls" style="display:none;">
  <label>
    Threshold 1:
    <input type="range" id="manual-t1" min="0" max="100" step="5" value="30">
    <span id="manual-t1-val">30</span>
  </label>
  <label>
    Threshold 2:
    <input type="range" id="manual-t2" min="0" max="100" step="5" value="60">
    <span id="manual-t2-val">60</span>
  </label>
</div>
<label>
  Show histogram:
  <input type="checkbox" id="show-histogram" checked>
</label>
<canvas id="classify-canvas" width="700" height="400" style="border: 1px solid #ddd;"></canvas>
<p id="class-distribution"></p>

Try this:
- Equal interval: Fixed-width bins (may be unbalanced)
- Quantile: Balanced class sizes (breaks at data percentiles)
- Standard deviation: Statistical bins (assumes normal distribution)
- Manual: Set your own thresholds (red lines on histogram)
- Adjust class count: See how the distribution changes
- Histogram: Red lines show where the breaks fall in the data

Key insight: The choice of method changes the results substantially; there is no single “correct” classification.
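Without the interactive canvas, the same comparison can be run as a short script. Below is a sketch. It assumes a synthetic right-skewed sample (lognormal, fixed seed) stands in for a raster's values. It prints the breakpoints and per-class pixel counts for equal-interval and quantile classes.

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)   # synthetic right-skewed "raster"

def class_counts(values, breaks):
    """Pixels per class for ascending breaks [b_0, ..., b_n] that span the data."""
    classes = np.digitize(values, breaks[1:-1])   # 0 .. n-1; the maximum lands in the top class
    return np.bincount(classes, minlength=len(breaks) - 1)

k = 5
schemes = {
    "equal interval": np.linspace(values.min(), values.max(), k + 1),
    "quantile": np.quantile(values, np.linspace(0.0, 1.0, k + 1)),
}
results = {name: class_counts(values, b) for name, b in schemes.items()}
for name, b in schemes.items():
    print(f"{name:15s} breaks={np.round(b, 1)} counts={results[name]}")
```

On skewed data like this, most pixels fall in the first equal-interval class, while the quantile classes each hold about a fifth of the pixels.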


6. Interpretation

Slope Classification Example

From DEM to actionable information:

1. Calculate slope (degrees) from DEM
2. Classify:
   - 0-2°: Suitable for farming, flooding risk
   - 2-5°: Good for construction
   - 5-15°: Moderate difficulty, erosion control needed
   - 15-30°: Forestry, recreation only
   - >30°: Hazard zones, protect from development

Result: Planning tool, not just numbers.
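Step 1 needs a slope estimator. Below is a minimal sketch that assumes a NumPy DEM with square cells and uses central differences via `np.gradient`. Many GIS packages use Horn's 3 × 3 method instead, so results will differ somewhat on rough terrain. Step 2 then applies the thresholds above with `np.digitize`. The DEM and function name are hypothetical.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from central differences (one-sided at the edges)."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)   # rise over run per axis
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical 3 x 3 DEM on a 10 m grid, rising 10 m per column: slope is 45° everywhere.
dem = np.array([[0, 10, 20],
                [0, 10, 20],
                [0, 10, 20]])
slope = slope_degrees(dem, cell_size=10.0)
# Step 2: thresholds 2, 5, 15, 30 degrees give classes 1 (0-2°) to 5 (>30°).
slope_class = np.digitize(slope, [2, 5, 15, 30]) + 1
```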

NDVI to Land Cover

Thresholds from literature:

NDVI < 0.1: Water, barren land
0.1-0.2: Sparse vegetation (desert)
0.2-0.4: Grassland, shrubland
0.4-0.6: Cropland, mixed vegetation
0.6-0.8: Dense vegetation (forest)
>0.8: Very dense vegetation (rainforest)

Validated against ground truth from field surveys.
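Validation against field data is usually summarised in a confusion matrix. Here rows count reference (field) classes, columns count mapped classes, and the diagonal holds agreements; conventions vary, and some texts transpose it.

Below is a minimal sketch using hypothetical labels for ten field plots. The data and function name are illustrative only. It also computes overall, producer's and user's accuracy.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows = reference class, columns = predicted class; class codes run 1..n_classes."""
    ref = np.asarray(reference) - 1
    pred = np.asarray(predicted) - 1
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (ref, pred), 1)      # count each (reference, predicted) pair
    return cm

# Hypothetical field plots: surveyed class vs. class from the NDVI thresholds.
reference = [1, 1, 2, 2, 2, 3, 3, 3, 3, 1]
predicted = [1, 2, 2, 2, 3, 3, 3, 3, 2, 1]
cm = confusion_matrix(reference, predicted, 3)

overall_accuracy = np.trace(cm) / cm.sum()
producers_accuracy = np.diag(cm) / cm.sum(axis=1)   # per reference class (omission errors)
users_accuracy = np.diag(cm) / cm.sum(axis=0)       # per mapped class (commission errors)
```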

Multi-Criteria Suitability

Combine multiple factors:

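import numpy as np

# slope, aspect (degrees) and soil_type (integer codes) are same-shape arrays;
# lookup_table is an integer NumPy array mapping soil codes to soil classes.
slope_class = np.digitize(slope, [5, 15, 30]) + 1              # 1 = flat (< 5°) ... 4 = very steep (>= 30°)
aspect_class = np.digitize(aspect % 360, [90, 180, 270]) + 1   # quadrants: 1 = N-E ... 4 = W-N
soil_class = lookup_table[soil_type]

suitability = ((slope_class == 1)
               & np.isin(aspect_class, [2, 3])
               & np.isin(soil_class, [1, 2])).astype(np.uint8)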

Boolean result: Suitable (1) or not (0).


7. What Could Go Wrong?

Arbitrary Breakpoints

Equal interval on skewed data:

Data: [1, 1, 2, 2, 2, 3, 3, 50]
Equal intervals (4 classes):
  [1, 13.25): 7 values → Class 1
  [13.25, 25.5): 0 values → Class 2
  [25.5, 37.75): 0 values → Class 3
  [37.75, 50]: 1 value → Class 4

Problem: Empty classes, unbalanced.

Solution: Use quantiles or remove outliers first.

Sensitivity to Outliers

One extreme value shifts all breakpoints:

Data: [10, 12, 14, 15, 16, 18, 20, 1000]
Equal intervals with outlier → huge bins

Solution:
- Remove outliers before classification
- Use robust statistics (median, IQR)
- Clip extreme values
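Below is a sketch of the clipping option on the data above. Clipping to Tukey's fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) is one common robust choice, not the only one. Here it caps 1000 at 26 before the equal-interval breaks are computed. The outlier still gets a class (the top one) but no longer stretches the bins. The helper name is ours.

```python
import numpy as np

def equal_interval_counts(v, k):
    """Equal-interval breaks over [min, max] and the number of values in each class."""
    breaks = np.linspace(v.min(), v.max(), k + 1)
    return breaks, np.bincount(np.digitize(v, breaks[1:-1]), minlength=k)

data = np.array([10, 12, 14, 15, 16, 18, 20, 1000], dtype=float)
raw_breaks, raw_counts = equal_interval_counts(data, 4)       # bins 247.5 wide: [7, 0, 0, 1]

# Clip to Tukey's fences before computing breaks.
q1, q3 = np.percentile(data, [25, 75])                        # 13.5, 18.5
iqr = q3 - q1
clipped = np.clip(data, q1 - 1.5 * iqr, q3 + 1.5 * iqr)       # 1000 becomes 26
clip_breaks, clip_counts = equal_interval_counts(clipped, 4)  # [10, 14, 18, 22, 26]: [2, 3, 2, 1]
```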

Loss of Information

Continuous to categorical loses detail:

Using the slope classes from Section 1 (5-15° gentle, 15-30° steep):

Original: 15.2°, 15.8° (0.6° difference)
Classified: Both → Class 3 "steep" (appear identical)

Original: 14.9°, 15.1° (0.2° difference)
Classified: 14.9° → Class 2 "gentle", 15.1° → Class 3 "steep" (appear very different)

Problem: Boundary artifacts.

Solution: Use buffer zones or fuzzy classification.

Inappropriate Method

Quantiles on categorical data:

Land cover codes: [1, 1, 1, 2, 2, 3, 3, 3]
Quantile classification → meaningless

Solution: Only classify continuous data. Reclassify categorical via lookup tables.


8. Extension: Unsupervised Classification

Automated clustering finds natural groups in data.

K-means algorithm:

1. Initialize k cluster centers randomly
2. Assign each pixel to nearest center
3. Recompute centers as mean of assigned pixels
4. Repeat 2-3 until convergence

For multi-band imagery:

import numpy as np

# pixel and center: length-N arrays of band values (band 1 ... band N)
distance = np.sqrt(np.sum((pixel - center) ** 2))   # Euclidean distance in spectral space

Advantage: No manual thresholds needed.

Disadvantage: Classes may not align with semantic categories.

Example: Cluster a 7-band Landsat image into 10 spectral classes, which an analyst then labels with land cover types.
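The four steps can be sketched in NumPy as plain Lloyd's k-means. This is a minimal illustration. It assumes an image array of shape (rows, cols, bands) held in memory. The function name and the random-pixel initialisation are our choices.

The distance step builds an N × k × bands array, so for a full Landsat scene a library implementation such as scikit-learn's `KMeans` is the practical option.

```python
import numpy as np

def kmeans_pixels(image, k, n_iter=100, seed=0):
    """Lloyd's k-means on an image of shape (rows, cols, bands).

    Returns a (rows, cols) array of cluster labels 0..k-1 and the (k, bands) centers.
    """
    rows, cols, bands = image.shape
    X = image.reshape(-1, bands).astype(float)
    rng = np.random.default_rng(seed)
    # Step 1: initialise centers at k distinct, randomly chosen pixels.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each pixel to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each center as the mean of its pixels (keep it if empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(rows, cols), centers

# Toy 2 x 3 image with 2 bands and two well-separated spectral groups.
toy = np.array([[[0, 0], [0, 1], [1, 0]],
                [[10, 10], [10, 11], [11, 10]]], dtype=float)
labels, centers = kmeans_pixels(toy, k=2)
```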


9. Math Refresher: Quantiles and Percentiles

Definition

p-th quantile (Q_p): Value below which fraction p of data falls.

Example: Median = 0.5 quantile (50th percentile)

Calculation

For sorted data x_1 \leq x_2 \leq \cdots \leq x_n:

Position:

\text{pos} = p \times (n - 1) + 1

If position is integer: Q_p = x_{\text{pos}}

If fractional: Interpolate between x_{\lfloor\text{pos}\rfloor} and x_{\lceil\text{pos}\rceil}

Example: Find 0.25 quantile of [1, 2, 3, 4, 5]

\text{pos} = 0.25 \times (5 - 1) + 1 = 2

Q_{0.25} = x_2 = 2
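The position formula can be coded directly. Below is a sketch; the function name is ours. For these inputs, NumPy's default `np.quantile` method gives the same value, since it uses the same linear-interpolation rule.

```python
import math
import numpy as np

def quantile(data, p):
    """p-th quantile via pos = p(n - 1) + 1 (1-based) with linear interpolation."""
    x = sorted(data)
    pos = p * (len(x) - 1) + 1
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                          # 0 when pos is an integer
    # Subtract 1 to turn the 1-based positions into 0-based list indices.
    return x[lo - 1] + frac * (x[hi - 1] - x[lo - 1])

print(quantile([1, 2, 3, 4, 5], 0.25))      # 2.0 (pos = 2, so Q = x_2)
print(np.quantile([1, 2, 3, 4, 5], 0.25))   # 2.0 with NumPy's default method
```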

For Classification

Divide into k equal-count classes:

Breakpoints at quantiles: Q_{1/k}, Q_{2/k}, \ldots, Q_{(k-1)/k}

Example: 4 classes → breaks at 0.25, 0.5, 0.75 quantiles


Summary

This completes Cluster K (Raster Foundations)! We’ve covered resampling (33), map algebra (34), and classification (35).

Next: Model 36 launches Cluster M (Terrain Analysis) with viewshed and line-of-sight analysis!