Recently, researchers from Nankai University, the University of Oxford and the University of California, Merced proposed Res2Net, a new building block for convolutional networks aimed at tasks such as object detection. The module can be easily integrated with other strong existing modules, and its test performance on ImageNet, CIFAR-100 and other datasets exceeds that of ResNet without increasing the computational load.
In 2015, ResNet, developed by four Chinese researchers including Kaiming He, became famous for making deep neural networks much faster and easier to train.
Now Res2Net, jointly developed by researchers from Nankai University, the University of Oxford and the University of California, Merced, builds on that idea. It can be easily integrated with other strong existing modules, and its test performance on ImageNet, CIFAR-100 and other datasets exceeds that of ResNet without increasing the computational load.
Ablation studies and experiments on representative computer vision tasks, namely object detection, class activation mapping and salient object detection, further validate the superiority of Res2Net over existing state-of-the-art baseline methods.
Multi-scale representation is of great significance for visual tasks such as object detection, semantic segmentation and salient object detection. With the new CNN module Res2Net, models can achieve better performance than those built on earlier CNN backbones such as ResNet, ResNeXt and DLA.
Res2Net: more powerful feature extraction with no extra computational load
Representing features on multiple scales is important for many visual tasks. Recent advances in the Convolutional Neural Network (CNN) backbone continue to demonstrate stronger multi-scale representation capabilities to achieve consistent performance gains across a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner.
In this paper, the researchers propose Res2Net, a new building block for CNNs, by constructing hierarchical residual-like connections within a single residual block. Res2Net represents multi-scale features at a more granular level and increases the range of receptive fields of each network layer.
In the above figure, the left side shows the basic block of the existing CNN architecture, and the right side shows the newly proposed Res2Net module. The new module has a stronger multi-scale feature extraction capability, but its computational load is similar to that of the block on the left. Specifically, the new module replaces the single group of 3×3 filters with several smaller groups of 3×3 filters, and these groups are connected in a hierarchical residual-like style. Because the connections inside the module resemble those of the Residual Network (ResNet), it is named Res2Net.
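The data flow described above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the function name res2net_receptive_fields, the symbols x_i, y_i and K_i, and the choice of four groups are assumptions introduced here. It assumes the input feature map is split into several channel groups, the first group is passed through unchanged, and each later group is added to the previous group's output before its own 3×3 filter is applied. The sketch only tracks how large a spatial neighbourhood each output group can see.

```python
from typing import List


def res2net_receptive_fields(scale: int, kernel: int = 3) -> List[int]:
    """Receptive field (pixels per side) of each output group y_1..y_s.

    Assumed connection pattern (notation is illustrative):
        y_1 = x_1                  # passed through, no filter
        y_2 = K_2(x_2)             # one 3x3 filter
        y_i = K_i(x_i + y_{i-1})   # for i > 2
    """
    if scale < 1:
        raise ValueError("scale must be at least 1")
    fields = [1]  # y_1 is the untouched input group: it sees a single pixel
    prev = 1      # widest receptive field among the inputs to the next filter
    for _ in range(2, scale + 1):
        # A stride-1 k x k filter widens the receptive field by (k - 1).
        prev = prev + (kernel - 1)
        fields.append(prev)
    return fields


print(res2net_receptive_fields(4))  # [1, 3, 5, 7]
```

Under these assumptions, a single block with four groups outputs features with four different receptive field sizes at once, which is the sense in which the module is multi-scale within one layer.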
Integration with other existing modules
The Res2Net module proposed in this article can be integrated into state-of-the-art backbone CNN models such as ResNet, ResNeXt and DLA. The researchers evaluated the Res2Net module on all of these models and demonstrated consistent performance gains over the baseline models on widely used datasets such as CIFAR-100 and ImageNet.
Because the Res2Net module on its own places no specific requirements on the overall network structure, and its multi-scale representation capability is independent of the layer-wise feature aggregation used in CNNs, it can be easily integrated into other strong existing CNN models such as ResNet, ResNeXt and DLA. The integrated models are called Res2Net, Res2NeXt and Res2Net-DLA.
Res2Net Module Performance and Test Results
ImageNet Dataset Test Results
Top-1 and top-5 test results on the ImageNet dataset
Test error rates of Res2Net-50 at different scales on the ImageNet dataset. The parameter w is the filter width and s is the scale.
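To make the two parameters concrete, the following sketch counts the weights in the 3×3 stage of a block for a given width w and scale s. It assumes, as an illustration not stated in this article, that the stage holds s groups of w channels each, that the first group has no filter, and that bias terms are ignored. The function names and the sample values w = 26 and s = 4 are hypothetical choices for the example.

```python
def conv3x3_weights(channels: int) -> int:
    """Weights in a 3x3 convolution with `channels` inputs and outputs (no bias)."""
    return 3 * 3 * channels * channels


def res2net_stage_weights(width: int, scale: int) -> int:
    """Weights in the 3x3 stage when it is split into `scale` groups of `width` channels.

    Assumes the first group is passed through without a filter, so only
    (scale - 1) groups carry a 3x3 convolution.
    """
    return (scale - 1) * conv3x3_weights(width)


w, s = 26, 4                             # illustrative setting, not taken from the article
total_channels = w * s                   # 104 channels enter the 3x3 stage
print(res2net_stage_weights(w, s))       # 18252
print(conv3x3_weights(total_channels))   # 97344
```

Under these assumptions, the split stage needs far fewer weights than a single 3×3 convolution over the same number of channels, which is one way a block could widen its channels while keeping its cost close to the baseline.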
CIFAR-100 dataset test results
Top-1 error rate and model size on the CIFAR-100 dataset
Test accuracy on the CIFAR-100 dataset at different model sizes
Comparison of class activation mappings for ResNet-50 and Res2Net-50
Visual comparison of semantic segmentation results of ResNet-101 and Res2Net-101
Comparison of salient object detection results for ResNet-50 and Res2Net-50 (Figure 7)
Conclusion and future direction
Res2Net has a simple structure and excellent performance, and it explores the multi-scale representation capability of CNNs at a more granular level. Res2Net reveals a new dimension, "scale", which is an essential and more effective factor in addition to the existing dimensions of depth, width and cardinality.
The Res2Net module can be easily integrated with existing modules. Image classification results on the CIFAR-100 and ImageNet benchmarks show that networks using the Res2Net module consistently outperform their counterparts, including ResNet, ResNeXt and DLA.
The superior performance of Res2Net has been demonstrated on several representative computer vision tasks, including class activation mapping, object detection and salient object detection. Multi-scale representation is critical to developing a broader range of applications in the future.
The source code for this article will be publicly released after the paper has been accepted.