Deduplication

Removes duplicate masks when the same object appears in multiple perspective views.

Algorithm

Masks are deduplicated using IoU (Intersection over Union) on spherical polygons:

  1. Process perspectives sequentially
  2. For each mask, check overlap with existing masks
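  3. If IoU >= min_iou, or the intersection ratio (intersection area / smaller mask area) >= 0.5 (min_intersection_ratio), treat the pair as duplicates and merge via union; with use_union=False, keep the higher-scoring mask instead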
  4. Otherwise, add as new mask
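
The four steps can be sketched in a few lines. This is a minimal illustration, not the library's implementation: 1D intervals stand in for spherical polygons, the names overlap_metrics and deduplicate are hypothetical, the 0.5 default for min_iou is an assumed value (the actual DEFAULT_MIN_IOU is not shown on this page), and each new mask is merged into the first existing mask it matches.

```python
from typing import List, Tuple

# (start, end): a 1D stand-in for a spherical polygon (assumption for illustration)
Interval = Tuple[float, float]


def overlap_metrics(a: Interval, b: Interval) -> Tuple[float, float]:
    """Return (iou, intersection_ratio) for two intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    union = len_a + len_b - inter
    iou = inter / union if union > 0 else 0.0
    smaller = min(len_a, len_b)
    ratio = inter / smaller if smaller > 0 else 0.0
    return iou, ratio


def deduplicate(
    perspectives: List[List[Interval]],
    min_iou: float = 0.5,    # assumed value, not the library default
    min_ratio: float = 0.5,  # the 0.5 quoted in step 3
) -> List[Interval]:
    kept: List[Interval] = []
    for masks in perspectives:                    # step 1: perspectives in order
        for mask in masks:                        # step 2: compare with existing masks
            for i, existing in enumerate(kept):
                iou, ratio = overlap_metrics(mask, existing)
                if iou >= min_iou or ratio >= min_ratio:
                    # step 3: merge via union (the union of two overlapping
                    # intervals is the interval that spans both)
                    kept[i] = (min(mask[0], existing[0]), max(mask[1], existing[1]))
                    break
            else:
                kept.append(mask)                 # step 4: no match, new mask
    return kept


views = [[(0.0, 10.0)], [(2.0, 12.0), (20.0, 25.0)], [(21.0, 23.0)]]
result = deduplicate(views)
print(result)
```

In this input, (2, 12) merges with (0, 10) through the IoU test, while (21, 23) has an IoU of only 0.4 with (20, 25) and is merged because it lies entirely inside it (intersection ratio 1.0).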

SphereMaskDeduplicationEngine

SphereMaskDeduplicationEngine(min_iou: float = DEFAULT_MIN_IOU, min_intersection_ratio: float = DEFAULT_MIN_INTERSECTION_RATIO)

Engine for detecting and removing duplicate sphere masks.

Uses GeoPandas for polygon intersection calculations and an IoU-based strategy for deciding which masks are duplicates.

Attributes:

Name | Type | Description
min_iou | float | Minimum IoU threshold to consider masks as duplicates.
min_intersection_ratio | float | Minimum intersection ratio (intersection area / smaller mask area) to consider masks as duplicates.

Initialize the deduplication engine.

Parameters:

Name | Type | Description | Default
min_iou | float | Minimum IoU to consider masks as duplicates. | DEFAULT_MIN_IOU
min_intersection_ratio | float | Minimum ratio of intersection area to the smaller mask's area, intersection / min(area1, area2). | DEFAULT_MIN_INTERSECTION_RATIO

Source code in src/panosam/dedup/detection.py
def __init__(
    self,
    min_iou: float = DEFAULT_MIN_IOU,
    min_intersection_ratio: float = DEFAULT_MIN_INTERSECTION_RATIO,
):
    """Initialize the deduplication engine.

    Args:
        min_iou: Minimum IoU to consider masks as duplicates.
        min_intersection_ratio: Minimum intersection/min(area) ratio.
    """
    self.min_iou = min_iou
    self.min_intersection_ratio = min_intersection_ratio
    # Cache for pre-loaded GeoDataFrames keyed by mask identifier
    self._gdf_cache: dict[str, gpd.GeoDataFrame] = {}

check_duplication

check_duplication(mask_1: SphereMaskResult, mask_2: SphereMaskResult) -> bool

Check if two masks are duplicates based on spatial overlap.

Parameters:

Name | Type | Description | Default
mask_1 | SphereMaskResult | First sphere mask. | required
mask_2 | SphereMaskResult | Second sphere mask. | required

Returns:

Type | Description
bool | True if masks are considered duplicates.

Source code in src/panosam/dedup/detection.py
def check_duplication(
    self, mask_1: SphereMaskResult, mask_2: SphereMaskResult
) -> bool:
    """Check if two masks are duplicates based on spatial overlap.

    Args:
        mask_1: First sphere mask.
        mask_2: Second sphere mask.

    Returns:
        True if masks are considered duplicates.
    """
    intersection = self._intersect_masks(mask_1, mask_2)

    if intersection is None:
        return False

    # Check IoU threshold
    if intersection.iou >= self.min_iou:
        return True

    # Check intersection ratio threshold
    if intersection.intersection_ratio >= self.min_intersection_ratio:
        return True

    return False
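
The two-threshold test can be tried in isolation. The sketch below is a hypothetical standalone function, not part of the library: it takes the two metrics as plain numbers, and the thresholds of 0.5 are assumed values chosen for the example. It shows why the intersection ratio is checked in addition to IoU: a small mask fully contained in a large one has a low IoU but a ratio of 1.0.

```python
def is_duplicate(
    iou: float,
    intersection_ratio: float,
    min_iou: float,
    min_intersection_ratio: float,
) -> bool:
    """Mirror the decision order above: IoU first, then intersection ratio."""
    return iou >= min_iou or intersection_ratio >= min_intersection_ratio


# A small mask (area 10) lying entirely inside a large mask (area 100):
# intersection area = 10, union area = 100.
nested_iou = 10 / 100             # low IoU
nested_ratio = 10 / min(10, 100)  # ratio against the smaller mask

nested = is_duplicate(nested_iou, nested_ratio, 0.5, 0.5)
disjoint = is_duplicate(0.0, 0.0, 0.5, 0.5)
print(nested, disjoint)
```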

deduplicate_list

deduplicate_list(masks: List[SphereMaskResult], use_union: bool = True) -> List[SphereMaskResult]

Remove duplicates within a single list of masks using incremental merging.

This handles objects that span multiple frames correctly by maintaining a master list and comparing each mask against it. When masks overlap, they can be merged using polygon union to capture the full extent of the object.

Parameters:

Name | Type | Description | Default
masks | List[SphereMaskResult] | List of sphere masks. | required
use_union | bool | If True, merge overlapping masks using polygon union. If False, keep only the highest-scoring mask. | True

Returns:

Type | Description
List[SphereMaskResult] | Deduplicated list of sphere masks.

Source code in src/panosam/dedup/detection.py
def deduplicate_list(
    self, masks: List[SphereMaskResult], use_union: bool = True
) -> List[SphereMaskResult]:
    """Remove duplicates within a single list of masks using incremental merging.

    This handles objects that span multiple frames correctly by maintaining
    a master list and comparing each mask against it. When masks overlap,
    they can be merged using polygon union to capture the full extent of the object.

    Args:
        masks: List of sphere masks.
        use_union: If True, merge overlapping masks using polygon union.
                  If False, keep only the highest-scoring mask.

    Returns:
        Deduplicated list of sphere masks.
    """
    if len(masks) <= 1:
        return list(masks)

    # Pre-load all GeoDataFrames into cache for faster processing
    self._preload_gdfs(masks)

    try:
        # Use incremental merging approach
        master_list: List[SphereMaskResult] = []

        for mask in masks:
            # Check overlaps ONE BY ONE (not all at once)
            indices_to_remove = []
            masks_to_merge = [mask]
            new_mask_survives = True

            for i, master_mask in enumerate(master_list):
                if not self.check_duplication(mask, master_mask):
                    continue

                # Found an overlap - compare scores
                if mask.score >= master_mask.score:
                    # New mask wins this comparison
                    indices_to_remove.append(i)
                    if use_union:
                        masks_to_merge.append(master_mask)
                else:
                    # Existing mask wins
                    if use_union:
                        # Merge into existing mask
                        indices_to_remove.append(i)
                        masks_to_merge = [master_mask, mask]
                    else:
                        # Discard new mask
                        new_mask_survives = False
                    break

            if new_mask_survives:
                # Remove all masks that are being merged
                for i in sorted(indices_to_remove, reverse=True):
                    master_list.pop(i)

                if use_union and len(masks_to_merge) > 1:
                    # Merge overlapping masks
                    merged = self._merge_masks(masks_to_merge)
                    if merged is not None:
                        master_list.append(merged)
                        self._update_cache_for_merged(merged)
                    else:
                        # Merge failed - keep the best scoring one
                        best = max(masks_to_merge, key=lambda m: m.score)
                        master_list.append(best)
                        self._update_cache_for_merged(best)
                else:
                    # No union or single mask - just add the mask
                    master_list.append(mask)
                    self._update_cache_for_merged(mask)

        # Final pass: validate and fix all masks
        return [self._validate_and_fix_mask(m) for m in master_list]
    finally:
        # Clear cache to free memory
        self._clear_cache()
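
The use_union=False path is easier to follow on a toy example. The sketch below is an assumption-laden stand-in, not the library code: ScoredMask, overlaps, and keep_best are hypothetical names, masks are 1D intervals, and a plain overlap test replaces check_duplication. It keeps the same control flow as the listing above: indices of beaten masks are collected, and they are only removed if the new mask survives every comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredMask:
    label: str
    start: float
    end: float
    score: float


def overlaps(a: ScoredMask, b: ScoredMask) -> bool:
    # Hypothetical stand-in for check_duplication: any shared extent counts.
    return min(a.end, b.end) > max(a.start, b.start)


def keep_best(masks: List[ScoredMask]) -> List[ScoredMask]:
    master: List[ScoredMask] = []
    for mask in masks:
        beaten: List[int] = []
        survives = True
        for i, existing in enumerate(master):
            if not overlaps(mask, existing):
                continue
            if mask.score >= existing.score:
                beaten.append(i)   # new mask wins this comparison
            else:
                survives = False   # existing mask wins: discard the new one
                break
        if survives:
            # Delete from the highest index down so earlier indices stay valid.
            for i in sorted(beaten, reverse=True):
                del master[i]
            master.append(mask)
    return master


a = ScoredMask("A", 0.0, 10.0, 0.6)
b = ScoredMask("B", 20.0, 30.0, 0.9)
c = ScoredMask("C", 5.0, 25.0, 0.8)   # beats A, loses to B
d = ScoredMask("D", 8.0, 22.0, 0.95)  # beats both A and B

first = [m.label for m in keep_best([a, b, c])]
second = [m.label for m in keep_best([a, b, d])]
print(first, second)
```

In the first call C outscores A but loses to B, so C is discarded and A stays in the list. In the second call D outscores both and replaces them.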

deduplicate_frames

deduplicate_frames(frames: List[List[SphereMaskResult]], use_union: bool = True) -> List[SphereMaskResult]

Deduplicate masks across multiple frames using incremental merging with union.

Processes frames one by one, maintaining a master list. Each mask from a new frame is compared against the entire master list. When masks overlap, they can be merged using polygon union to capture the full extent of objects that span multiple frames.

Parameters:

Name | Type | Description | Default
frames | List[List[SphereMaskResult]] | List of frames, each containing a list of sphere masks. | required
use_union | bool | If True, merge overlapping masks using polygon union. If False, keep only the highest-scoring mask. | True

Returns:

Type | Description
List[SphereMaskResult] | Deduplicated list of sphere masks.

Source code in src/panosam/dedup/detection.py
def deduplicate_frames(
    self,
    frames: List[List[SphereMaskResult]],
    use_union: bool = True,
) -> List[SphereMaskResult]:
    """Deduplicate masks across multiple frames using incremental merging with union.

    Processes frames one by one, maintaining a master list. Each mask from
    a new frame is compared against the entire master list. When masks overlap,
    they can be merged using polygon union to capture the full extent of objects
    that span multiple frames.

    Args:
        frames: List of frames, each containing a list of sphere masks.
        use_union: If True, merge overlapping masks using polygon union.
                  If False, keep only the highest-scoring mask.

    Returns:
        Deduplicated list of sphere masks.
    """
    if not frames:
        return []

    # Pre-load all GeoDataFrames from all frames into cache
    all_masks = [mask for frame in frames for mask in frame]
    self._preload_gdfs(all_masks)

    try:
        # Start with the first frame as the master list
        master_list: List[SphereMaskResult] = list(frames[0])

        # Process each subsequent frame
        for frame_masks in tqdm(
            frames[1:],
            desc="Deduplicating",
            unit="frame",
            initial=1,
            total=len(frames),
        ):
            for mask in frame_masks:
                # Check overlaps one by one
                indices_to_remove = []
                masks_to_merge = [mask]
                new_mask_survives = True

                for i, master_mask in enumerate(master_list):
                    if not self.check_duplication(mask, master_mask):
                        continue

                    # Found an overlap - compare scores
                    if mask.score >= master_mask.score:
                        # New mask wins this comparison
                        indices_to_remove.append(i)
                        if use_union:
                            masks_to_merge.append(master_mask)
                    else:
                        # Existing mask wins
                        if use_union:
                            # Merge into existing mask
                            indices_to_remove.append(i)
                            masks_to_merge = [master_mask, mask]
                        else:
                            new_mask_survives = False
                        break

                if new_mask_survives:
                    # Remove all masks that are being merged
                    for i in sorted(indices_to_remove, reverse=True):
                        master_list.pop(i)

                    if use_union and len(masks_to_merge) > 1:
                        # Merge overlapping masks
                        merged = self._merge_masks(masks_to_merge)
                        if merged is not None:
                            master_list.append(merged)
                            self._update_cache_for_merged(merged)
                        else:
                            # Merge failed - keep the best scoring one
                            best = max(masks_to_merge, key=lambda m: m.score)
                            master_list.append(best)
                            self._update_cache_for_merged(best)
                    else:
                        # No union or single mask - just add the mask
                        master_list.append(mask)
                        self._update_cache_for_merged(mask)

        # Final pass: validate and fix all masks
        return [self._validate_and_fix_mask(m) for m in master_list]
    finally:
        # Clear cache to free memory
        self._clear_cache()
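
The frame-by-frame flow can be reduced to a generic skeleton. The sketch below is a simplified illustration under stated assumptions: dedup_frames, is_dup, and merge are hypothetical names, sets of integers stand in for masks, and each new item is merged into the first match only. Like the listing above, it seeds the master list with the first frame as-is, so items within the first frame are not compared with each other.

```python
from typing import Callable, List, Set, TypeVar

T = TypeVar("T")


def dedup_frames(
    frames: List[List[T]],
    is_dup: Callable[[T, T], bool],
    merge: Callable[[T, T], T],
) -> List[T]:
    if not frames:
        return []
    master: List[T] = list(frames[0])   # first frame seeds the master list
    for frame in frames[1:]:
        for item in frame:
            for i, existing in enumerate(master):
                if is_dup(item, existing):
                    master[i] = merge(existing, item)
                    break
            else:
                master.append(item)     # no duplicate found: new entry
    return master


frames: List[List[Set[int]]] = [
    [{1, 2}, {2, 3}],   # overlapping items in the first frame are both kept
    [{3, 4}],           # shares 3 with {2, 3}, so the two are merged
    [{9}],              # unrelated, appended as a new entry
]
result = dedup_frames(
    frames,
    is_dup=lambda a, b: bool(a & b),  # toy rule: any shared element
    merge=lambda a, b: a | b,         # toy union
)
print(result)
```

If duplicates within a single frame are possible in the input, one option is to pass the flattened masks through deduplicate_list instead, since it starts from an empty master list.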

Data Classes

PolygonIntersection dataclass

PolygonIntersection(polygon_1_area: float, polygon_2_area: float, intersection_area: float, union_area: float, iou: float, intersection_ratio: float)

Result of polygon intersection analysis.

Attributes:

Name | Type | Description
polygon_1_area | float | Area of the first polygon.
polygon_2_area | float | Area of the second polygon.
intersection_area | float | Area of intersection.
union_area | float | Area of union.
iou | float | Intersection over Union ratio.
intersection_ratio | float | Intersection area / min(polygon_1_area, polygon_2_area).
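
The relationship between these fields can be shown with a small worked example. The sketch below is an illustration, not the library's code: IntersectionSketch and from_areas are hypothetical names, and the union area is derived by inclusion-exclusion, which may differ from how the library obtains it.

```python
from dataclasses import dataclass


@dataclass
class IntersectionSketch:
    """Hypothetical mirror of the PolygonIntersection fields."""
    polygon_1_area: float
    polygon_2_area: float
    intersection_area: float
    union_area: float
    iou: float
    intersection_ratio: float


def from_areas(area_1: float, area_2: float, intersection_area: float) -> IntersectionSketch:
    # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
    union_area = area_1 + area_2 - intersection_area
    iou = intersection_area / union_area if union_area > 0 else 0.0
    smaller = min(area_1, area_2)
    ratio = intersection_area / smaller if smaller > 0 else 0.0
    return IntersectionSketch(area_1, area_2, intersection_area, union_area, iou, ratio)


# Polygons of area 4 and 2 that share an area of 1.
r = from_areas(4.0, 2.0, 1.0)
print(r.union_area, r.iou, r.intersection_ratio)
```

Here the union area is 5, the IoU is 1/5, and the intersection ratio is 1/2, so the pair would pass a 0.5 intersection-ratio threshold while failing a 0.5 IoU threshold.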