Segmentation and Detection Metrics
- Object detection performance is primarily measured by Intersection over Union (IoU) and Mean Average Precision (mAP).
- Semantic and instance segmentation rely on pixel-wise accuracy, the Dice coefficient, and the Jaccard Index (another name for IoU) to evaluate overlap.
- Metrics must account for both classification accuracy (is the object correct?) and localization precision (is the boundary accurate?).
- Choosing the right metric depends on the class imbalance and the specific requirements of the downstream application.
Why It Matters
In autonomous driving, companies like Waymo and Tesla use mAP and IoU to evaluate how well their perception systems detect pedestrians, cyclists, and other vehicles. Because safety is paramount, these metrics are often calculated at different IoU thresholds (e.g., mAP@0.5 vs mAP@0.75) to ensure the model is not just detecting objects, but localizing them with high precision. Failure to maintain high IoU can lead to incorrect path planning and collision risks.
In medical imaging, radiologists use segmentation metrics like the Dice coefficient to evaluate automated tumor detection in MRI scans. Because tumors have irregular, fuzzy boundaries, pixel-wise accuracy is insufficient; the Dice coefficient ensures that the model captures the full extent of the lesion without including too much healthy tissue. This is critical for surgical planning and radiation therapy, where the precision of the mask directly impacts the treatment area.
In retail automation, companies like Amazon Go utilize object detection metrics to track items being removed from shelves. By monitoring the mAP of their shelf-scanning cameras, they ensure that the system correctly identifies products even when they are partially occluded by a customer's hand or another item. High-precision detection is essential for maintaining accurate inventory counts and ensuring that customers are billed correctly for the items they pick up.
How It Works
The Philosophy of Evaluation
In computer vision, we do not simply ask "did the model get it right?" because "right" is a complex concept. When a model detects a car, it must get the location (bounding box) correct and the classification (label) correct. If the box is shifted slightly, is it still a success? If the model detects two boxes for one car, is that a failure? Metrics for segmentation and detection are designed to answer these nuanced questions by quantifying the spatial alignment between the model's output and the ground truth.
Object Detection: The Precision-Recall Trade-off
Object detection models, such as YOLO or Faster R-CNN, produce a set of bounding boxes, each with an associated confidence score. To evaluate these, we define an IoU threshold (e.g., 0.5). If the IoU between a predicted box and a ground-truth box exceeds this threshold, and that ground truth has not already been matched by a higher-confidence prediction, we count the prediction as a True Positive (TP). Predictions that fail the threshold, or that duplicate an already-matched ground truth, are False Positives (FP). A ground-truth object with no matching prediction is a False Negative (FN). By varying the confidence threshold, we trace out a Precision-Recall curve. The area under this curve gives us the Average Precision (AP) for a single class, and averaging AP across all classes gives us the mAP.
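As a minimal sketch of this computation, the code below assumes detections for one class have already been matched against ground truth at a fixed IoU threshold (1 = TP, 0 = FP) and computes AP as the trapezoidal area under the resulting Precision-Recall curve. Real benchmarks differ in the details; COCO, for instance, first applies a monotone precision envelope and samples 101 recall points.

import numpy as np

def average_precision(scores, is_tp, num_gt):
    # Sort detections by descending confidence
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    # Cumulative TP/FP counts as the confidence threshold is lowered
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / num_gt
    # Prepend the conventional (recall=0, precision=1) starting point,
    # then take the trapezoidal area under the Precision-Recall curve
    p = np.concatenate(([1.0], precision))
    r = np.concatenate(([0.0], recall))
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

# Hypothetical detections for one class: 3 ground-truth objects,
# with each detection pre-matched at IoU >= 0.5 (1 = TP, 0 = FP)
scores = [0.9, 0.8, 0.6, 0.3]
is_tp = [1, 0, 1, 1]
print(f"AP: {average_precision(scores, is_tp, num_gt=3):.4f}")  # AP: 0.7639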
Segmentation: Pixel-Level Precision
Segmentation tasks—whether semantic (classifying every pixel) or instance (separating individual objects)—require a different approach. Since we are dealing with masks, we compare the predicted set of pixels against the ground truth set of pixels. The Dice coefficient is the most common metric here. Unlike detection, where we care about the box, in segmentation, we care about the "shape" and "boundary." A model might have high accuracy but fail to capture the thin, irregular boundaries of an object. Metrics like the Boundary IoU have been developed to specifically penalize models that struggle with these edges.
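A minimal sketch of these mask-level metrics, assuming binary NumPy masks of the same shape: Dice is 2|A∩B| / (|A| + |B|), while the Jaccard Index (IoU) is |A∩B| / |A∪B|. For a single mask pair the two are monotonically related, with Dice always at least as large as IoU.

import numpy as np

def dice_and_iou(pred_mask, gt_mask, eps=1e-7):
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Dice = 2|A ∩ B| / (|A| + |B|); eps guards against empty masks
    dice = 2.0 * intersection / (pred.sum() + gt.sum() + eps)
    # Jaccard / IoU = |A ∩ B| / |A ∪ B|
    iou = intersection / (union + eps)
    return dice, iou

# Toy 10x10 masks with partial overlap (36 pixels each, 16 shared)
pred_mask = np.zeros((10, 10)); pred_mask[2:8, 2:8] = 1
gt_mask = np.zeros((10, 10)); gt_mask[4:10, 4:10] = 1
dice, iou = dice_and_iou(pred_mask, gt_mask)
print(f"Dice: {dice:.4f}, IoU: {iou:.4f}")  # Dice: 0.4444, IoU: 0.2857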
Handling Class Imbalance and Edge Cases
Real-world datasets are rarely balanced. In autonomous driving, "road" pixels vastly outnumber "pedestrian" pixels. If we used simple accuracy, a model that predicts "road" for every pixel would achieve around 99% accuracy but would be useless. This is why we use weighted metrics or per-class averages such as mean IoU, as shown in the sketch below. Furthermore, edge cases like occlusion, where one object hides another, pose significant challenges. In these scenarios, standard IoU might fail to distinguish between the two objects, requiring more sophisticated metrics like Panoptic Quality (PQ), which combines segmentation and detection metrics to provide a holistic view of scene understanding.
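To make the imbalance problem concrete, here is a toy sketch (the scene and numbers are hypothetical): a "lazy" model labels every pixel as road, scores 99% pixel accuracy, and still has zero IoU on the pedestrian class, which the per-class mean IoU immediately exposes.

import numpy as np

# Toy semantic labels: 0 = road, 1 = pedestrian (hypothetical scene)
gt = np.zeros((100, 100), dtype=int)
gt[45:55, 45:55] = 1          # 100 pedestrian pixels out of 10,000
pred = np.zeros_like(gt)      # "lazy" model: predicts road everywhere

accuracy = (pred == gt).mean()  # dominated by the majority class

ious = []
for cls in (0, 1):
    inter = np.logical_and(pred == cls, gt == cls).sum()
    union = np.logical_or(pred == cls, gt == cls).sum()
    ious.append(inter / union if union else float("nan"))

print(f"Pixel accuracy: {accuracy:.2%}")                          # 99.00%
print(f"Road IoU: {ious[0]:.2f}  Pedestrian IoU: {ious[1]:.2f}")  # 0.99  0.00
print(f"Mean IoU: {np.nanmean(ious):.2f}")                        # 0.49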
Common Pitfalls
- Confusing Accuracy with IoU: Many beginners assume that high pixel accuracy means a model is performing well. In segmentation, if 99% of the image is background, a model that predicts "background" for every pixel gets 99% accuracy but has an IoU of 0 for the object of interest.
- Ignoring the Confidence Threshold: mAP is computed by sweeping over confidence scores, but most evaluation pipelines first discard detections below a minimum confidence. Changing that filter changes precision and recall, so the same model can report different mAP scores depending on how the pipeline is configured.
- Over-relying on a Single IoU Threshold: Relying solely on IoU = 0.5 can hide the fact that a model is "lazy" and produces loose boxes. Using mAP across multiple IoU thresholds (like the COCO metric) provides a much more robust assessment of localization quality; see the sketch after this list.
- Assuming NMS Is Part of the Metric: Non-Maximum Suppression (NMS) is a post-processing step, not a metric itself. Beginners often confuse the filtering of overlapping boxes with the evaluation of those boxes, leading to errors in how they implement their evaluation loops.
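To illustrate the single-threshold pitfall above, the following sketch sweeps a COCO-style range of IoU thresholds over one detection (the IoU value is roughly what the Sample Code below produces): the same box flips from TP to FP as the bar rises, so averaging AP over all ten thresholds rewards tight localization.

import numpy as np

# One loose detection: its IoU clears only the lenient thresholds.
iou = 0.68  # roughly the IoU of the box pair in the Sample Code below
for t in np.arange(0.50, 1.00, 0.05):  # 0.50, 0.55, ..., 0.95
    print(f"IoU threshold {t:.2f}: {'TP' if iou >= t else 'FP'}")
# COCO's mAP@[.5:.95] averages AP over all ten thresholds, so a model
# that only passes at 0.50-0.65 scores far below a tight localizer.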
Sample Code
def calculate_iou(boxA, boxB):
    # Box format: [x1, y1, x2, y2] (top-left and bottom-right corners)
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # Area of intersection (zero when the boxes do not overlap)
    interArea = max(0, xB - xA) * max(0, yB - yA)
    # Areas of both boxes
    boxAArea = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    boxBArea = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    # IoU = intersection / union; guard against degenerate boxes
    union = float(boxAArea + boxBArea - interArea)
    return interArea / union if union > 0 else 0.0

# Example usage:
pred = [50, 50, 150, 150]
gt = [60, 60, 160, 160]
print(f"IoU Score: {calculate_iou(pred, gt):.4f}")
# Output: IoU Score: 0.6807