EAST: An Efficient and Accurate Scene Text Detector

1 Overview

EAST is one of my favorite text detection papers this year, partly because an expert has already reimplemented it [2], which makes it easy to study.

The reasons I like it come mainly from the results. The accuracy is very good without introducing additional data; with the PVA backbone it runs relatively fast; and the experiments are fairly complete, including results on COCO-Text. As for what could be improved: first, the best result on each dataset comes from a different backbone, so the comparison is a bit unfair. Second, there are cases where the predicted box is inaccurate. Intuitively, regressing the angle does not seem very reliable, because this parameter has a large influence on the box. But without running experiments I have no real say; this is pure speculation.

In principle, it mainly combines the following points:

1. It adopts direct regression. No anchors or default boxes are designed; dense per-pixel prediction is performed directly.

2. The network architecture incorporates a U-shaped network, and it also includes a pixel segmentation loss.

2 Network architecture

First look at the backbone network, which is fairly clear. Features are connected back from stage 4 down to stage 1. To reduce computation, each feature fusion is followed by an operation that reduces the number of channels, which is the bottleneck known from other papers. Following this pattern, though, f1 should also be followed by a 1*1*256 and a 3*3*256 convolution. Without them the channel counts are asymmetric and f1 will have a larger influence. This kind of thing is hard to call, however, and has to be settled by experiment.

The main part of the paper then focuses on how to construct the regression target and the loss function.
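To make the "reduce the amount of calculation" argument concrete, here is a rough count of multiply-accumulates for one fusion step, with and without a 1x1 channel reduction in front of the 3x3 convolution. The channel counts and feature map size are hypothetical numbers I picked for illustration; they are not taken from the paper.

```python
def conv_macs(kernel, c_in, c_out, height, width):
    """Multiply-accumulates of a stride-1, same-padded convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out * height * width


# Hypothetical sizes for one fusion step (assumptions, not the paper's values):
# the concatenated features have 384 channels and we want 128 output channels.
c_concat, c_out, h, w = 384, 128, 32, 32

# Option A: apply the 3x3 convolution directly to the concatenated features.
direct_macs = conv_macs(3, c_concat, c_out, h, w)

# Option B: reduce channels with a 1x1 convolution first, then apply the 3x3.
bottleneck_macs = conv_macs(1, c_concat, c_out, h, w) + conv_macs(3, c_out, c_out, h, w)

print(direct_macs, bottleneck_macs)
```

With these made-up sizes the bottleneck version needs fewer than half the multiply-accumulates of the direct version, which is the point of cutting channels after each fusion.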

3 Regression target and loss function

3.1 Indirect regression and direct regression

For indirect regression, see [1]. EAST uses direct regression, so the difference needs explaining. Frameworks such as Faster R-CNN and SSD preset a handful or a few dozen anchors, also called default boxes, shown as the blue dashed boxes on the left side of the figure below. These boxes are the same at every point of the feature map, but object sizes vary widely, so indirect regression learns the green box on the basis of the blue box; what is learned is the relative position. To avoid exploding gradients and to get training to converge, the regression target and the regression loss need to be designed carefully. A rookie like me certainly could not design them, and understanding them is good enough. Most papers simply adopt the definition from Fast R-CNN.
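As a reminder of what "learning the relative position" means, here is a minimal sketch of the usual anchor-relative box parameterization from the R-CNN family. The function names `encode` and `decode` are mine, and boxes are assumed to be given as (center x, center y, width, height).

```python
import math


def encode(anchor, gt):
    """Regression target of a ground-truth box relative to an anchor.

    Both boxes are (cx, cy, w, h). The centre shift is divided by the anchor
    size and the size change is taken in log space, so the target depends on
    the anchor rather than on absolute pixel values.
    """
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(anchor, target):
    """Invert encode(): recover the box from an anchor and a regression target."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = target
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (50.0, 50.0, 32.0, 16.0)
gt_box = (58.0, 46.0, 40.0, 12.0)
target = encode(anchor, gt_box)
recovered = decode(anchor, target)
```

Decoding the encoded target gives back the ground-truth box, so the network only has to predict the four relative numbers in `target`.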

Direct regression gets rid of anchors and regresses directly to the ground truth, as shown on the right. The regression target can be the offsets from the current point to the four vertices of the ground truth, which requires 8 variables. The advantage is that there is no need to regress the angle, and the angle variable really matters: a tiny error in it puts the box far off. The disadvantage is that only a loss such as smooth L1 can be used, which is not scale invariant.
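A small sketch of the 8-variable target and of why smooth L1 is not scale invariant. The helper names are mine, and the "prediction" is simply the target inflated by 10%, so both boxes below have the same relative error.

```python
def vertex_offsets(point, quad):
    """8 regression targets: (dx, dy) from the point to each of the 4 vertices."""
    px, py = point
    return [c for (x, y) in quad for c in (x - px, y - py)]


def smooth_l1(pred, target):
    """Sum of smooth L1 terms: quadratic below 1, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


small_quad = [(0, 0), (10, 0), (10, 4), (0, 4)]
large_quad = [(x * 10, y * 10) for x, y in small_quad]  # same shape, 10x larger

small_target = vertex_offsets((5, 2), small_quad)
large_target = vertex_offsets((50, 20), large_quad)

# Same 10% relative error on both boxes.
small_loss = smooth_l1([1.1 * t for t in small_target], small_target)
large_loss = smooth_l1([1.1 * t for t in large_target], large_target)
```

The large box is penalized far more than the small one for the same relative error, so large text dominates the loss.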

Now let's take a closer look at EAST's regression target. Since ICDAR15 is annotated with slanted quadrilaterals, the author tried two methods: regressing the quadrilateral directly, and fitting a rotated rectangle to the quadrilateral and then regressing the rotated rectangle. According to the results, the rotated rectangle works better, so I'll be lazy here and only look at the rotated rectangle.

As shown above, the yellow dashed box on the left is the ground truth, and the solid pink box is the fitted rotated rectangle. The red, blue, green, and yellow arrows indicate the regression targets, that is, the distances from the point to the four sides. Although only 4 distance variables are needed, the angle still has to be regressed as well. In addition, whether you use 8 variables or 5, you must define the starting point of the quadrilateral. I personally feel the scheme adopted in the current EAST code must have some deficiencies, because within a certain range of angles a small change flips which vertex is chosen as the starting point, which is not very reasonable. Of course, a novice like me can't think of a better solution.
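The 5-variable target (four distances plus the angle) can be sketched as below. I assume the rotated rectangle is described by its centre, width, height, and rotation angle in radians, and that y grows downward as in image coordinates. This parameterization and the function name are my own choices for illustration, not the layout used in the EAST code.

```python
import math


def rbox_targets(point, center, width, height, theta):
    """Distances from a point to the four sides of a rotated rectangle.

    Returns (top, right, bottom, left, theta). The point is first expressed
    in the rectangle's own axes, where the sides are axis-aligned.
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    u = dx * cos_t + dy * sin_t    # coordinate along the width axis
    v = -dx * sin_t + dy * cos_t   # coordinate along the height axis
    top = v + height / 2.0
    bottom = height / 2.0 - v
    left = u + width / 2.0
    right = width / 2.0 - u
    return (top, right, bottom, left, theta)


# Axis-aligned case: rectangle spans x in [0, 10] and y in [0, 4].
flat = rbox_targets((3.0, 1.0), (5.0, 2.0), 10.0, 4.0, 0.0)

# Rotated case: opposite distances still sum to the rectangle's height and width.
tilted = rbox_targets((1.0, 0.5), (0.0, 0.0), 10.0, 4.0, math.pi / 6)
```

A pixel lies inside the rectangle exactly when all four distances are non-negative, which is also a handy check when generating training targets.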

3.2 Definition of loss

3.2.1 Regression loss

Let's start with the definition of the regression loss. For the four distance variables, the author uses an IoU loss, which is scale invariant.
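Here is a minimal sketch of an IoU loss on the four distances, assuming the loss is -log(IoU) and that the predicted and ground-truth rectangles share the same angle, so the overlap can be computed from the distances alone. The function name and the ordering (top, right, bottom, left) are my own conventions.

```python
import math


def iou_loss(pred, gt):
    """-log(IoU) of two rectangles given as distances from the same pixel.

    pred and gt are (top, right, bottom, left). Both rectangles contain the
    pixel, so the overlap along each axis is the sum of the smaller distances.
    """
    p_top, p_right, p_bottom, p_left = pred
    g_top, g_right, g_bottom, g_left = gt
    area_pred = (p_top + p_bottom) * (p_left + p_right)
    area_gt = (g_top + g_bottom) * (g_left + g_right)
    inter_h = min(p_top, g_top) + min(p_bottom, g_bottom)
    inter_w = min(p_left, g_left) + min(p_right, g_right)
    inter = inter_h * inter_w
    union = area_pred + area_gt - inter
    return -math.log(inter / union)


gt_small = (1.0, 7.0, 3.0, 3.0)
pred_small = (1.2, 6.0, 2.5, 3.5)

# The same boxes scaled up 10x give the same loss: this is the scale invariance.
gt_large = tuple(10 * d for d in gt_small)
pred_large = tuple(10 * d for d in pred_small)

loss_small = iou_loss(pred_small, gt_small)
loss_large = iou_loss(pred_large, gt_large)
```

Unlike the smooth L1 case, scaling both boxes leaves the loss unchanged, because the scale factor cancels in the ratio of intersection to union.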

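The angle loss, as given in the EAST paper, is

$L_\theta = 1 - \cos(\hat{\theta} - \theta^*)$

where $\hat{\theta}$ is the predicted angle and $\theta^*$ is the ground-truth angle. Using a trigonometric function keeps the loss normalized to the range 0 to 1.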
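A quick check of the normalization claim, as a minimal sketch that assumes the angle loss has the form 1 - cos(predicted angle - ground-truth angle) from the EAST paper and that both angles lie between -45 and 45 degrees.

```python
import math


def angle_loss(theta_pred, theta_gt):
    """1 - cos of the angle error (radians): 0 for a perfect prediction."""
    return 1.0 - math.cos(theta_pred - theta_gt)


# Sweep all pairs of angles in [-45, 45] degrees in 5-degree steps.
angles = [math.radians(a) for a in range(-45, 46, 5)]
losses = [angle_loss(p, g) for p in angles for g in angles]

smallest, largest = min(losses), max(losses)
```

With both angles restricted to that range the error is at most 90 degrees, so the loss stays between 0 and 1 (up to floating-point rounding).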

Finally, the regression loss is the weighted sum of the two losses above, with a weight of 10 on the angle loss. In the code implementation, the implementer first normalizes the output to (0, 1) with a sigmoid and then multiplies it by the corresponding range. For the distances the range is a size. The angle is more convenient: all text boxes are limited to the range of -45 to 45 degrees, since, for example, a 90-degree box can be replaced by a 0-degree one.
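The sigmoid-then-scale step can be sketched as follows. The parameter name `text_scale` and its default of 512 pixels are assumptions for illustration, as is the function name; check the actual implementation [2] for the values it uses.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def scale_outputs(dist_logits, angle_logit, text_scale=512.0):
    """Map raw network outputs to distances and an angle.

    text_scale is a hypothetical upper bound on the distances, in pixels.
    The angle is centred so that it falls in (-45, 45) degrees.
    """
    dists = [sigmoid(z) * text_scale for z in dist_logits]
    angle = (sigmoid(angle_logit) - 0.5) * math.pi / 2.0
    return dists, angle


dists, angle = scale_outputs([0.0, 2.0, -2.0, 5.0], 0.0)
_, big_angle = scale_outputs([0.0] * 4, 20.0)
_, small_angle = scale_outputs([0.0] * 4, -20.0)
```

However large the raw output gets, the decoded angle cannot leave the interval from -45 to 45 degrees, and the distances cannot exceed the chosen scale.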

3.2.2 Segmentation loss

As mentioned above, EAST also uses a segmentation loss. For robustness, however, the author does not treat every point inside the rotated rectangle as a text pixel. My guess at the reasons: in some words the first letter is uppercase and the rest are lowercase, so many pixels near the annotated edges are not text; in addition, the fitted rotated rectangle does not match the text very well. The author therefore takes a proportionally shrunk box inside it, shown as the light green box below. It feels to me, though, that marking the rest of the rotated rectangle as "don't care" would be better.

The loss follows the classic weighted sample loss used in HED. Because there are more negative samples, a coefficient is applied to the losses of the positive and negative samples: the more samples a class has, the smaller its weight.

The final overall loss consists of the regression loss and the segmentation loss, with equal weights of 1.
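One simple way to get a proportional inner box is to pull every vertex toward the centroid. This is a simplification I use for illustration; the shrinking rule in the paper and in the code [2] differs in its details, and the `ratio` value here is an assumption.

```python
def shrink_quad(quad, ratio=0.3):
    """Scale a quadrilateral toward its centroid.

    ratio is the fraction of each centre-to-vertex distance that is removed,
    so ratio=0.3 keeps 70% of the original extent. Hypothetical helper, not
    the shrinking procedure from the EAST code.
    """
    cx = sum(x for x, _ in quad) / len(quad)
    cy = sum(y for _, y in quad) / len(quad)
    keep = 1.0 - ratio
    return [(cx + (x - cx) * keep, cy + (y - cy) * keep) for x, y in quad]


box = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
inner = shrink_quad(box, ratio=0.3)
```

Only pixels inside `inner` would be labeled as text. The band between `inner` and `box` is where I would prefer a "don't care" label rather than a negative one.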
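The weighting idea can be sketched as a class-balanced cross-entropy. The positive class is weighted by the fraction of negative pixels and the negative class by the fraction of positive pixels, so the more frequent class gets the smaller weight. The function name and the clamping constant `eps` are my own additions.

```python
import math


def balanced_cross_entropy(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy over a flat list of pixels.

    pred holds probabilities in [0, 1], target holds 0/1 labels.
    beta is the fraction of negatives and is used as the weight for positives.
    """
    n = len(target)
    beta = 1.0 - sum(target) / n
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -beta * y * math.log(p) - (1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return total / n


# 1 text pixel and 9 background pixels.
target = [1] + [0] * 9

# Case A: the text pixel is missed, the background is perfect.
miss_positive = balanced_cross_entropy([0.1] + [0.0] * 9, target)

# Case B: the text pixel is right, one background pixel is wrong by the same margin.
miss_negative = balanced_cross_entropy([1.0, 0.9] + [0.0] * 8, target)
```

Missing the single text pixel costs about nine times as much as the equivalent mistake on a background pixel, which is the effect the coefficient is there to produce.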
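Putting the pieces together, the weights mentioned in these notes (10 for the angle term, 1 for the regression term as a whole) can be written as below. The symbol names are my own shorthand: $L_s$ is the segmentation loss, $L_{\mathrm{IoU}}$ the IoU loss on the four distances, and $L_\theta$ the angle loss.

```latex
L = L_s + \lambda_g L_g, \qquad
L_g = L_{\mathrm{IoU}} + \lambda_\theta L_\theta, \qquad
\lambda_g = 1, \quad \lambda_\theta = 10
```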

4 Non-maximum suppression

I've noticed that many related papers now design their own NMS. After reading the code, I feel this one is not influenced by soft NMS and the like; the main motivation is the inaccuracy of the regression. Personally, I think this is more of a big trick.
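As I understand the paper, its NMS first merges neighboring boxes by score-weighted averaging and only then runs a standard NMS, which is one way to smooth out inaccurate per-pixel regressions. Below is a simplified sketch of that merging pass. It assumes axis-aligned boxes (x1, y1, x2, y2, score) visited in scan order, and the threshold and function names are my own; the real implementation works on rotated boxes.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2, ...)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def weighted_merge(a, b):
    """Average the coordinates weighted by score; the scores add up."""
    sa, sb = a[4], b[4]
    coords = [(a[i] * sa + b[i] * sb) / (sa + sb) for i in range(4)]
    return (*coords, sa + sb)


def merge_neighbors(boxes, thresh=0.5):
    """Merge each box into the previous one when they overlap enough."""
    merged, last = [], None
    for box in boxes:
        if last is not None and iou(last, box) > thresh:
            last = weighted_merge(last, box)
        else:
            if last is not None:
                merged.append(last)
            last = box
    if last is not None:
        merged.append(last)
    return merged


boxes = [(0.0, 0.0, 10.0, 4.0, 0.9), (1.0, 0.0, 11.0, 4.0, 0.3), (50.0, 0.0, 60.0, 4.0, 0.8)]
result = merge_neighbors(boxes)
```

The first two boxes collapse into one box that sits closer to the higher-scoring one, while the distant third box is left alone. A standard NMS would then run on `result`.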

[1] He W, Zhang X Y, Yin F, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection[J]. arXiv preprint arXiv:1703.08289, 2017.

[2] https://github.com/argman/EAST
