
Performance for large amount of bounding boxes #20

Open
dinarkino opened this issue Nov 19, 2021 · 1 comment

@dinarkino

I noticed that calculating metrics for a large amount of data takes a long time, and only one CPU is heavily loaded (even when async_mode is set to True). Here is a test with the original example from the README:

import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 0, 0, 0],
    [515, 306, 595, 375, 0, 0, 0],
    [407, 386, 531, 476, 0, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 0, 0, 0]
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 0, 0.460851],
    [433, 260, 506, 336, 0, 0.269833],
    [518, 314, 603, 369, 0, 0.462608],
    [592, 310, 634, 388, 0, 0.298196],
    [403, 384, 517, 461, 0, 0.382881],
    [405, 429, 519, 470, 0, 0.369369],
    [433, 272, 499, 341, 0, 0.272826],
    [413, 390, 515, 459, 0, 0.619459]
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=1)

# add some samples to evaluation
for i in range(10):
    metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at the all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")

On my machine, this code takes around 300 ms. When I change the number of times preds and gt are added to metric_fn from 10 to 1000, it takes 10 seconds; with 10000 it takes 2 minutes. That seems like a drastic change, and 10000 iterations correspond to around 10000 * 8 = 80000 boxes. I noticed this behaviour while training a detection model, where measuring metrics on validation took around 10 minutes. Moreover, in my case htop shows only one processor loaded at ~100%, while the others stay at the same level as before the metrics calculation.

Is such a long computation time expected for a large number of bounding boxes? Are there any workarounds to make the computation faster?
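For what it's worth, one common speed-up when per-box Python loops dominate the profile is to compute all pairwise IoUs in a single broadcasted numpy expression. This is a generic sketch, not taken from this library's internals:

```python
import numpy as np

def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """IoU between every box in boxes_a and every box in boxes_b.

    Boxes are [xmin, ymin, xmax, ymax]; result has shape (len(a), len(b)).
    One broadcasted expression replaces a Python double loop over boxes.
    """
    # Intersection rectangle corners: (N, 1, 2) broadcast against (1, M, 2)
    tl = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    br = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(br - tl, a_min=0, a_max=None)  # zero width/height if disjoint
    inter = wh[..., 0] * wh[..., 1]

    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union
```

If the library already vectorizes this step, the remaining cost is likely in the sequential parts of matching and accumulation.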

@bes-dev
Owner

bes-dev commented Nov 19, 2021

@dinarkino Regarding "async mode": this mode allows the network to predict bounding boxes while the metric is computed in a parallel process; it does not parallelize the metric computation itself.

And I think it is expected behaviour that more data requires more computation. Some parts of our implementation are parallelized on the numpy side, while other parts run sequentially. It could probably be optimized further, but we haven't done that.
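As I understand this description, async_mode overlaps metric accumulation with the model's forward pass rather than speeding up the metric itself. A toy illustration of that producer/consumer split (the names here are placeholders, not this library's API):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def accumulate(preds, gt, store):
    """Placeholder for a per-batch accumulation step like metric_fn.add."""
    store.append((preds, gt))

def run_overlapped(batches, store):
    """Overlap accumulation for batch i with producing batch i+1.

    While the caller's loop body ("inference") prepares the next batch,
    the previous batch's accumulation runs in a worker thread.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for preds, gt in batches:
            if pending is not None:
                pending.result()  # wait for the previous accumulation
            pending = pool.submit(accumulate, preds, gt, store)
        if pending is not None:
            pending.result()
```

With this pattern the metric work hides behind inference time, but the metric code itself still runs on one core, which matches the single-CPU load reported above.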
