
1. Although the CNN parameters can be shared across all categories, the region proposals are generated first, which is equivalent to producing a large number of sub-images for each image. Feeding every sub-image through the CNN for training makes the computation heavy and the efficiency low.
2. Since the region proposals vary in size, they must be warped to a fixed size, and the image is distorted in the process.
The only class-specific computations mentioned in the paper are the dot product between the feature vectors and the SVM weights, and NMS. This means the SVM weights are not shared: each class has its own SVM. The overall structure diagram of the algorithm shows that all regions generated from one image share a single CNN, while each SVM and each bounding-box regressor is specific to its category.
The paper mentions several techniques that improve the algorithm's mAP. Among the more useful ones:
For each SGD iteration, the mini-batch contains 32 positive boxes (pooled over all categories) and 96 background boxes, giving a mini-batch size of 128.
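This sampling scheme can be sketched as follows (a minimal illustration, assuming the positive and background RoIs have already been labelled; `sample_minibatch` and the list format are hypothetical, not from the paper's code):

```python
import random

def sample_minibatch(positive_rois, background_rois, n_pos=32, n_bg=96):
    """Sample one fine-tuning mini-batch: 32 positive boxes (all classes
    pooled together) plus 96 background boxes, for a total size of 128."""
    pos = random.sample(positive_rois, min(n_pos, len(positive_rois)))
    # If positives are scarce, pad with extra background boxes to keep size 128.
    n_bg_needed = n_pos + n_bg - len(pos)
    bg = random.sample(background_rois, min(n_bg_needed, len(background_rois)))
    return pos + bg
```

Sampling background boxes at a 3:1 ratio keeps the batch from being dominated by the far more numerous negatives.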
Selective Search is used to generate about 2000 candidate regions per image. Its goal is to reduce computation by avoiding the redundant candidate regions produced by sliding-window or exhaustive search. This was mentioned briefly earlier; it does not need to be implemented by hand, since a Selective Search implementation can be installed via pip and called directly.
The content of this part mainly refers to the original paper and the answer by Wang Bin_ICT.
The purpose of this section is to fine-tune the regions produced by Selective Search so that the predicted boxes lie closer to the ground-truth boxes. It amounts to learning a regression between the proposed boxes and the ground-truth boxes.
The main idea is:
1. For each region proposal P, the CNN computes the corresponding activation map (the output of the fifth convolutional layer, which is what the "5" in pool5 refers to); this represents the proposed region.
2. The activation map tensor is reshaped into a feature vector representing the region. This vector is multiplied by a weight matrix W to obtain a predicted transformation of the proposal's coordinates.
3. The regression is computed between the coordinates of the ground-truth region G and the transformation of the proposed region's coordinates, in order to learn the weight matrix W.
Through this process, you can see that the input of the regression is P (the features corresponding to P, plus the coordinates used to compute the transformation) and G (its coordinates), and the output is W.
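The coordinate transformation between a proposal P and a ground-truth box G can be written out concretely (a NumPy sketch following the (center_x, center_y, width, height) parameterization used in the R-CNN paper; the function names are illustrative):

```python
import numpy as np

def bbox_targets(P, G):
    """Regression targets t = (tx, ty, tw, th) mapping proposal P onto
    ground truth G. Both boxes are (center_x, center_y, width, height)."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return np.array([(Gx - Px) / Pw,      # center shift, scaled by proposal size
                     (Gy - Py) / Ph,
                     np.log(Gw / Pw),     # log-space scale change
                     np.log(Gh / Ph)])

def apply_deltas(P, t):
    """Inverse transform: predicted box from proposal P and predicted deltas t."""
    Px, Py, Pw, Ph = P
    tx, ty, tw, th = t
    return np.array([Px + Pw * tx, Py + Ph * ty,
                     Pw * np.exp(tw), Ph * np.exp(th)])
```

Applying the exact targets back to P recovers G, which is what makes these values usable as regression labels for learning W.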
The specific details of bounding-box regression are not described here; detailed explanations can be found in the paper and in the links.
The content of this part mainly refers to a blog post; original author: Kang Xing Tianxia.
The purpose of this technique is to select, among many overlapping windows, the one with the highest score, and to suppress the other low-scoring, highly overlapping windows using IoU. For example, in target detection there may be many windows around a single object; you select the window with the highest score and remove every other window whose IoU with it exceeds a threshold. Once all windows above the IoU threshold have been removed, you again select the highest-scoring remaining window and repeat the IoU screening.
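The greedy procedure just described can be sketched in a few lines of NumPy (boxes here are assumed to be in (x1, y1, x2, y2) corner format):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,).
    Returns the indices of the boxes that are kept."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                 # keep the current best window
        # IoU of the kept box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # suppress high-overlap boxes, then repeat on what remains
        order = order[1:][iou <= iou_thresh]
    return keep
```

In R-CNN this is applied per class, so boxes of different categories never suppress each other.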
1. Take an image and generate region proposals with Selective Search;
2. For each proposal, run the trained CNN and SVMs to predict its category;
3. Adjust the proposal box with the bounding-box regression parameters of the predicted category. During training this step is independent of the SVM, because the true category is known; during prediction, however, the SVM must first determine the category so that the regressor of the corresponding category can be used;
4. Use non-maximum suppression to reduce the number of boxes in the whole image.
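The four prediction steps above can be sketched as control flow with pluggable components (the proposal function, CNN, per-class SVMs, and per-class regressors are all hypothetical stand-ins here, not real trained models):

```python
import numpy as np

def predict(image, propose, cnn, svms, bbox_regressors, nms):
    """R-CNN inference control flow.
    propose: image -> list of boxes; cnn: (image, box) -> feature vector;
    svms: {class_name: SVM weight vector};
    bbox_regressors: {class_name: (features, box) -> refined box};
    nms: list of (class, score, box) -> filtered list."""
    detections = []
    for box in propose(image):                       # step 1: region proposals
        feat = cnn(image, box)
        # step 2: score the region with every per-class SVM, keep the best class
        scores = {c: w @ feat for c, w in svms.items()}
        cls = max(scores, key=scores.get)
        # step 3: refine the box with that class's own regressor
        refined = bbox_regressors[cls](feat, box)
        detections.append((cls, scores[cls], refined))
    return nms(detections)                           # step 4: suppress duplicates
```

Because the class is chosen in step 2, the class-specific regressor needed in step 3 is known by the time it is called, which is exactly the ordering constraint described above.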