Evaluation
Module: cutoop.eval_utils
This module contains gadgets used to evaluate pose estimation models.
The core structure for evaluation is the dataclass DetectMatch, which inherits from the dataclass GroundTruth and thus contains all information required for metric computation.
Evaluation Data Structure
Prediction and its matched ground truth data, provided for metric computation.
It contains 6D pose (rotation, translation, scale) information for both ground truth and predictions, as well as ground truth symmetry labels.
Camera intrinsics.
Confidence output by the detection model (N,).
Ground truth 3D affine transformation for each instance (Nx4x4), without scaling .
Ground truth class label of each object.
The number of objects annotated in GT ( NOT the number of detected objects).
Ground truth bounding box side lengths (Nx3), a.k.a. bbox_side_len in image_meta.ImageMetaData.
Ground truth symmetry labels (N,) (each element is a rotation.SymLabel ).
>>> list(map(str, result.gt_sym_labels))
['y-cone', 'x-flip', 'y-cone', 'y-cone', 'none', 'none', 'none', 'none', 'x-flip']
Path to the rgb image (for drawing).
Prediction of 3D affine transformation for each instance (Nx4x4).
Prediction of 3D bounding box sizes (Nx3).
Use a slice or an integer index to obtain a subset of the data.
Note that gt_n_objects is lost in the process.
>>> result[1:3]
DetectMatch(gt_affine=array([[[ 0.50556856, 0.06278258, 0.86049914, -0.2933575 ],
[-0.57130486, -0.72301346, 0.3884099 , 0.5845589 ],
[ 0.6465379 , -0.68797517, -0.3296649 , 1.7387577 ],
[ 0. , 0. , 0. , 1. ]],
[[ 0.38624182, 0.00722773, -0.92236924, 0.02853431],
[ 0.6271931 , -0.735285 , 0.25687516, 0.6061586 ],
[-0.6763476 , -0.67771953, -0.28853098, 1.6325369 ],
[ 0. , 0. , 0. , 1. ]]],
dtype=float32), gt_size=array([[0.34429058, 0.00904833, 0.19874577],
[0.32725419, 0.1347834 , 0.32831856]]), gt_sym_labels=array([SymLabel(any=False, x='half', y='none', z='none'),
SymLabel(any=False, x='none', y='any', z='none')], dtype=object), gt_class_labels=array([48, 9]), pred_affine=array([[[ 0.50556856, 0.06278258, 0.86049914, -0.2933575 ],
[-0.57130486, -0.72301346, 0.3884099 , 0.5845589 ],
[ 0.6465379 , -0.68797517, -0.3296649 , 1.7387577 ],
[ 0. , 0. , 0. , 1. ]],
[[ 0.38624182, 0.00722773, -0.92236924, 0.02853431],
[ 0.6271931 , -0.735285 , 0.25687516, 0.6061586 ],
[-0.6763476 , -0.67771953, -0.28853098, 1.6325369 ],
[ 0. , 0. , 0. , 1. ]]],
dtype=float32), pred_size=array([[0.3535623 , 0.00863385, 0.21614327],
[0.31499929, 0.1664661 , 0.41965601]]), image_path=['../../misc/sample/0000_color.png', '../../misc/sample/0000_color.png'], camera_intrinsics=[CameraIntrinsicsBase(fx=1075.41, fy=1075.69, cx=631.948, cy=513.074, width=1280, height=1024), CameraIntrinsicsBase(fx=1075.41, fy=1075.69, cx=631.948, cy=513.074, width=1280, height=1024)], detect_scores=None, gt_n_objects=None)
Calibrate the rotation of pose predictions according to the gt symmetry labels, using rotation.rot_canonical_sym().
Parameters: silent – enable this flag to hide the tqdm progress bar.
Note
This function does not modify the value in-place. Instead, it produces a new calibrated result.
Returns: a new DetectMatch .
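A minimal usage sketch; the method name calibrate_rotation is an assumption (the extracted docs omit the actual signature), while the silent parameter is documented above:
>>> calibrated = result.calibrate_rotation(silent=True)  # hypothetical method name
>>> assert calibrated is not result  # a new DetectMatch is produced; `result` is unchanged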
Concatenate multiple results into a single one. Note that detect_scores (resp. gt_n_objects) must either be all None or all ndarray (resp. int).
Returns: a DetectMatch combining all items.
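A usage sketch, assuming the operation is exposed as a classmethod named concat taking a list of results (both the name and the call convention are assumptions); result_a and result_b stand for two previously constructed DetectMatch objects:
>>> merged = DetectMatch.concat([result_a, result_b])  # hypothetical classmethod name and signature
>>> assert len(merged.gt_affine) == len(result_a.gt_affine) + len(result_b.gt_affine)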
Compute IoUs, rotation differences and translation shifts. This is useful if you need to compute other custom metrics based on them.
When computeIOU is set to False, the returned IoUs are a numpy array of zeros.
Parameters: use_precise_rot_error – use an analytic method for the rotation error instead of the discrete method (enumeration). The results should be slightly smaller.
Returns: (ious, theta_degree, shift_cm), where
- ious: (N,).
- theta_degree: (N,), unit is degree.
- shift_cm: (N,), unit is cm.
>>> iou, deg1, sht = result.criterion()
>>> iou, deg2, sht = result.criterion(use_precise_rot_error=True)
>>> assert np.abs(deg1 - deg2).max() < 0.05
Draw the gt and predicted bounding boxes on the image. Requires image_path and camera_intrinsics to be set.
Parameters:
- path – output path for rendered image
- index – which prediction to draw; keep the default None to draw everything on the same image.
- image_root – root directory of the images; image_path is assumed to store a path relative to it.
- draw_gt – whether to draw gt bbox.
- draw_pred – whether to draw predicted bbox.
- draw_label – whether to draw symmetry label on the object
- draw_pred_axes_length – specify a number to indicate the length of axes of the predicted pose.
- draw_gt_axes_length – specify a number to indicate the length of axes of the gt pose.
- thickness – specify line thickness.
>>> result.draw_image(
... path='source/_static/gr_1.png'
... ) # A
>>> result.draw_image(
... path='source/_static/gr_2.png',
... index=4,
... draw_label=False,
... draw_pred_axes_length=0.5,
... ) # B
>>> result.draw_image(
... path='source/_static/gr_3.png',
... draw_gt=False,
... ) # C
>>> result.draw_image(
... path='source/_static/gr_4.png',
... draw_pred=False,
... draw_label=False,
... draw_gt_axes_length=0.3,
... thickness=2,
... ) # D
- A (left top): Draw all boxes.
- B (right top): Draw one object.
- C (left bottom): Draw predictions.
- D (right bottom): Draw GT with poses.
Construct matching result from the output of a detection model.
Construct matching result from GT (i.e. use GT detection).
Compute several pose estimation metrics.
Parameters:
- iou_thresholds – threshold list for computing IoU acc. and mAP.
- pose_thresholds – rotation-translation threshold list for computing pose acc. and mAP.
- iou_auc_ranges – list of ranges to compute IoU AUC.
- pose_auc_range – degree range and shift range.
- auc_keep_curve – enable this flag to output curve points for drawing.
- criterion – since the computation of IoU is slow, you may cache the result of cutoop.eval_utils.DetectMatch.criterion() and provide it here, in exactly the same format.
- use_precise_rot_error – See eval_utils.DetectMatch.criterion() .
Returns: the returned format can be formalized as
- 3D IoU:
- average (mIoU): per-class average IoU and mean average IoU.
- accuracy: per-class IoU accuracy and mean accuracy over a list of thresholds.
- accuracy AUC: normalised AUC of the IoU’s accuracy-thresholds curve.
- average precision (detected mask, computed only if pred_score is provided): per-class IoU average precision and mean average precision, using detection confidence (mask score) as recall.
- Pose:
- average: per-class average rotation and translation error.
- accuracy: per-class degree-shift accuracy and mean accuracy over a list of thresholds.
- accuracy AUC: normalised AUC of the rotation’s, translation’s and pose’s accuracy-thresholds curve. For pose, AUC is generalised to “volume under surface”.
- average precision (detected mask, computed only if pred_score is provided): per-class degree-shift average precision and mean average precision over a list of thresholds, using detection confidence (mask score) as recall.
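A usage sketch showing how a cached criterion() result can be reused across metric computations (parameter names as listed above; relying on the default thresholds):
>>> crit = result.criterion()                                       # cache the slow IoU computation once
>>> m = result.metrics(criterion=crit)                              # reuse it here
>>> m_curves = result.metrics(criterion=crit, auc_keep_curve=True)  # also keep AUC curve points for drawing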
Camera intrinsics.
Confidence output by the detection model (N,).
Ground truth 3D affine transformation for each instance (Nx4x4), without scaling .
Ground truth class label of each object.
The number of objects annotated in GT ( NOT the number of detected objects).
Ground truth bounding box side lengths (Nx3), a.k.a. bbox_side_len in image_meta.ImageMetaData.
Ground truth symmetry labels (N,) (each element is a rotation.SymLabel ).
>>> list(map(str, result.gt_sym_labels))
['y-cone', 'x-flip', 'y-cone', 'y-cone', 'none', 'none', 'none', 'none', 'x-flip']
Path to the rgb image (for drawing).
Concatenate multiple results into a single one. Note that detect_scores (resp. gt_n_objects) must either be all None or all ndarray (resp. int).
Returns: a DetectOutput combining all items.
This dataclass is a subset of DetectMatch; it can be constructed directly from GT image data before running the inference process.
Camera intrinsics.
Ground truth 3D affine transformation for each instance (Nx4x4), without scaling .
Ground truth class label of each object.
Ground truth bounding box side lengths (Nx3), a.k.a. bbox_side_len in image_meta.ImageMetaData.
Ground truth symmetry labels (N,) (each element is a rotation.SymLabel ).
>>> list(map(str, result.gt_sym_labels))
['y-cone', 'x-flip', 'y-cone', 'y-cone', 'none', 'none', 'none', 'none', 'x-flip']
Path to the rgb image (for drawing).
Concatenate multiple results into a single one.
Returns: a GroundTruth combining all items.
Metrics Data Structure
Bases: object
See DetectMatch.metrics() .
The mean metrics of all occurred classes.
Mapping from class label to its metrics.
Write metrics to a text file in JSON format.
Parameters: mkdir – enable this flag to automatically create parent directory.
Load metrics data from a JSON file.
Bases: object
Rotation (towards 0) AUC.
Average rotation error (unit: degree).
IoU accuracies over the list of thresholds.
IoU average precision over the list of thresholds.
IoU (towards 1) AUC over a list of ranges.
Average IoU.
Pose accuracies over the list of rotation-translation thresholds.
Pose error average precision.
Pose error (both towards 0) VUS over a list of ranges.
Translation (towards 0) AUC.
Average translation error (unit: cm).
Bases: object
The normalised (divided by the range of thresholds) AUC (or volume under surface) value.
If preserved, this field contains the x coordinates of the curve.
If preserved, this field contains the y coordinates of the curve.
Functions
Find a maximum matching of a bipartite graph with approximately maximum weight, using linear_sum_assignment.
Parameters:
- weights_NxM – weight array of the bipartite graph
- min_weight (int) – minimum valid weight of matched edges
Returns: (l_matches, r_matches)
- l_matches: array of length N containing the matching index in r, or -1 if not matched
- r_matches: array of length M containing the matching index in l, or -1 if not matched
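The approach can be illustrated with scipy's linear_sum_assignment; the following is a sketch of the technique, not the library's exact implementation (match_sketch is a made-up name):
>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> def match_sketch(weights_NxM, min_weight=0):
...     rows, cols = linear_sum_assignment(weights_NxM, maximize=True)  # maximize total weight
...     N, M = weights_NxM.shape
...     l_matches = np.full(N, -1, dtype=int)
...     r_matches = np.full(M, -1, dtype=int)
...     for i, j in zip(rows, cols):
...         if weights_NxM[i, j] >= min_weight:  # drop assigned edges below the threshold
...             l_matches[i], r_matches[j] = j, i
...     return l_matches, r_matches
>>> l, r = match_sketch(np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.4]]))
>>> l.tolist(), r.tolist()
([0, 1, -1], [0, 1])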
Calculate average precision (AP).
The detailed algorithm is:
- Sort pred_indicator_M by pred_scores_M in decreasing order, and compute the precisions as precision_i = (number of true positives among the first i predictions) / i.
  Note: according to this issue, when different indicator values share an equal score, their relative order affects the final AP. To eliminate multiple possible outputs, we add a second sorting key: indicator values are sorted from high to low when their scores are equal.
- Compute the recalls as recall_i = (number of true positives among the first i predictions) / N_gt.
- Suffix-maximize the precisions: precision_i := max(precision_i, precision_{i+1}, ..., precision_M).
- Compute the average precision as the area under the resulting precision-recall curve (ref).
Parameters:
- pred_indicator_M – A 1D 0-1 array, 0 for false positive, 1 for true positive.
- pred_scores_M – Confidence (e.g. IoU) of the prediction, in the range [0, 1].
- N_gt – Number of ground truth instances. If not provided, it is set to M.
Returns: the AP value. If M == 0, return NaN.
>>> inds = np.array([1, 0, 1, 0, 1, 0, 0])
>>> scores = np.array([1, 1, 1, 1, 1, 1, 1])
>>> float(cutoop.eval_utils.compute_average_precision(inds, scores))
0.4285714328289032
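The steps above can be reproduced with a short reference sketch (a hand-written illustration rather than the library code; ap_sketch is a made-up name), reusing inds and scores from the example above:
>>> def ap_sketch(ind, scores, n_gt=None):
...     order = np.lexsort((-ind, -scores))  # sort by score desc.; positives first on equal scores
...     ind = ind[order]
...     n_gt = len(ind) if n_gt is None else n_gt
...     tp = np.cumsum(ind)                          # true positives among the first i predictions
...     precision = tp / np.arange(1, len(ind) + 1)
...     recall = tp / n_gt
...     precision = np.maximum.accumulate(precision[::-1])[::-1]  # suffix-maximize
...     return float(np.sum(precision * np.diff(np.concatenate([[0.0], recall]))))
>>> abs(ap_sketch(inds, scores) - 3 / 7) < 1e-6  # three true positives, seven GT instances
True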
Finds matches between prediction and ground truth instances. Two matched objects must belong to the same class; under this restriction, matches with higher IoU are more likely to be selected.
Scores are used to sort the predictions.
Returns: (gt_match, pred_match, overlaps)
- gt_match: 1-D array. For each GT mask, the index of the matched predicted mask, or -1 if unmatched.
- pred_match: 1-D array. For each predicted mask, the index of the matched GT mask, or -1 if unmatched.
- overlaps: [M, N] IoU overlaps (negative values indicate distinct classes).
Computes IoU overlaps between each pair of masks from two sets.
masks1, masks2: [Height, Width, instances]
Masks can be float arrays; values greater than 0.5 are treated as True, others as False.
Returns: an N x M array of IoUs.
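For intuition, pairwise mask IoUs can be computed with a single matrix product over the flattened, binarized masks; a minimal sketch assuming masks of shape [Height, Width, instances] (mask_ious_sketch is a made-up name):
>>> import numpy as np
>>> def mask_ious_sketch(masks1, masks2):
...     m1 = (masks1 > 0.5).reshape(-1, masks1.shape[-1]).astype(np.float64)  # [H*W, N]
...     m2 = (masks2 > 0.5).reshape(-1, masks2.shape[-1]).astype(np.float64)  # [H*W, M]
...     inter = m1.T @ m2                                        # pairwise intersection areas
...     union = m1.sum(0)[:, None] + m2.sum(0)[None, :] - inter  # pairwise union areas
...     return inter / np.maximum(union, 1e-12)                  # [N, M] IoU matrix
>>> a = np.zeros((4, 4, 1)); a[:2, :, 0] = 1  # top half of a 4x4 image
>>> b = np.zeros((4, 4, 1)); b[:, :2, 0] = 1  # left half of a 4x4 image
>>> round(float(mask_ious_sketch(a, b)[0, 0]), 4)  # intersection 4, union 12
0.3333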
Separate the original sequence into several subsequences, each of which belongs to a unique class label. The corresponding class labels are also returned.
Returns: a tuple (*sequences, labels), where sequences is a list of lists of ndarrays (one list per values argument, each containing one ndarray per occurred class), and labels is an ndarray whose length equals the number of occurred classes.
>>> from cutoop.eval_utils import group_by_class
>>> group_by_class(np.array([0, 1, 1, 1, 1, 0, 1]), np.array([0, 1, 4, 2, 8, 5, 7]))
(array([0, 1]), [array([0, 5]), array([1, 4, 2, 8, 7])])