Task: Text detection (can detect slanted text)

Contributions:
- Proposes an end-to-end fully convolutional network for text detection
- Can output geometry in two formats, quadrangles or rotated boxes, depending on the application
- Improves on the state-of-the-art methods
Core idea of the algorithm: the design borrows from U-Net. A U-shaped network produces two outputs: (1) pixel-level segmentation (score) predictions and (2) pixel-level geometry predictions. From these two outputs, the coordinates of the four vertices of each bounding box can be computed. NMS is then used to remove redundant and duplicate bounding boxes.
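The decoding step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the rotation direction, and the `(top, right, bottom, left)` distance order are assumptions, and plain greedy NMS on axis-aligned boxes stands in for whatever NMS variant the real pipeline applies to rotated boxes.

```python
import math


def restore_rbox(x, y, d_top, d_right, d_bottom, d_left, theta):
    """Return the corners (TL, TR, BR, BL) of the box predicted at pixel (x, y).

    Assumes the four distances are measured in the box's own frame and that
    theta rotates that frame around the pixel (image y axis points down).
    """
    local = [(-d_left, -d_top), (d_right, -d_top),
             (d_right, d_bottom), (-d_left, d_bottom)]
    c, s = math.cos(theta), math.sin(theta)
    # Rotate each corner offset by theta, then translate to the pixel position.
    return [(x + c * dx - s * dy, y + s * dx + c * dy) for dx, dy in local]


def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_axis_aligned(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


corners = restore_rbox(10, 10, d_top=2, d_right=6, d_bottom=2, d_left=4, theta=0.0)
kept = greedy_nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```

With `theta=0.0` the box is axis-aligned, so the corners are just the pixel position offset by the four distances. In the NMS call, the second box overlaps the first heavily and is suppressed, while the third is far away and kept.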
Algorithm flow:
- Training phase
- Test phase
- The U-Net network structure based on ResNet is shown in the figure below
Algorithm details:
- How to calculate ground truth?
  - Ground truth for the score map: shrink the original bounding box inward by 0.3r, where r is the length of its short side. I am not sure why this step is needed; is it to remove noise?
  - Ground truth for the geometry map: taking RBOX data as an example, as shown in the figure below, for each pixel inside the bounding box we compute its distances to the top, bottom, left, and right edges, along with the angle. For pixels outside the bounding box, the ground truth is set to 0.
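The two ground-truth maps described above can be sketched for the simplest case, a single axis-aligned box, where the angle is 0 everywhere. This is an illustrative sketch under that assumption: the function names, the `(x1, y1, x2, y2)` box format, and the `(top, right, bottom, left)` channel order are all hypothetical choices, and a real generator would also handle rotated quadrangles.

```python
def shrink_box(box, ratio=0.3):
    """Shrink an axis-aligned box (x1, y1, x2, y2) inward by ratio * short side."""
    x1, y1, x2, y2 = box
    r = min(x2 - x1, y2 - y1)  # short side length
    margin = ratio * r
    return (x1 + margin, y1 + margin, x2 - margin, y2 - margin)


def make_ground_truth(height, width, box):
    """Build the score map and the 4-channel distance map for one box."""
    x1, y1, x2, y2 = box
    sx1, sy1, sx2, sy2 = shrink_box(box)
    score = [[0] * width for _ in range(height)]
    geo = [[(0.0, 0.0, 0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Score is positive only inside the shrunk box.
            if sx1 <= x <= sx2 and sy1 <= y <= sy2:
                score[y][x] = 1
            # Geometry: distances to the original box edges; 0 outside the box.
            if x1 <= x <= x2 and y1 <= y <= y2:
                geo[y][x] = (y - y1, x2 - x, y2 - y, x - x1)
    return score, geo


score_map, geo_map = make_ground_truth(height=10, width=15, box=(2, 2, 12, 7))
```

For the box `(2, 2, 12, 7)` the short side is 5, so each side moves inward by 1.5 pixels. A pixel on the box border therefore has valid distances but a score of 0.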
- Loss function?
  - For the score map: class-balanced cross-entropy is used, which offsets the imbalance between positive and negative samples. Its definition and an implementation are as follows:
    
    $L_{s} = \text{balanced-xent}(\hat{Y}, Y^{*}) = -\beta Y^{*} \log \hat{Y} - (1 - \beta)(1 - Y^{*}) \log(1 - \hat{Y})$

    $\beta = 1 - \frac{\sum_{y^{*} \in Y^{*}} y^{*}}{|Y^{*}|}$
```
import tensorflow as tf

# FLAGS is assumed to be the flags object defined elsewhere in the training
# script; only FLAGS.batch_size_per_gpu is read here.


def cross_entropy(y_true_cls, y_pred_cls, training_mask):
    '''
    Class-balanced cross-entropy for the score map.
    :param y_true_cls: tensor, ground-truth score map (1 = text, 0 = background)
    :param y_pred_cls: tensor, predicted score map with values in [0, 1]
    :param training_mask: tensor, 1 where a pixel contributes to the loss, else 0
    :return: scalar loss tensor
    '''
    # Clip rather than add eps: adding eps can still leave 1 - y_pred at 0,
    # and log(0) is -inf. 1e-7 is large enough to survive float32 rounding.
    eps = 1e-7
    y_pred_cls = tf.clip_by_value(y_pred_cls * training_mask, eps, 1.0 - eps)
    y_true_cls = y_true_cls * training_mask
    each_y_true_sample = tf.split(y_true_cls, num_or_size_splits=FLAGS.batch_size_per_gpu, axis=0)
    each_y_pred_sample = tf.split(y_pred_cls, num_or_size_splits=FLAGS.batch_size_per_gpu, axis=0)
    loss = None
    for i in range(FLAGS.batch_size_per_gpu):
        cur_true = each_y_true_sample[i]
        cur_pred = each_y_pred_sample[i]
        # beta = 1 - (positive pixels / all pixels), computed per sample.
        # tf.size matches |Y*| in the formula whatever the map resolution is.
        num_pixels = tf.cast(tf.size(cur_true), cur_true.dtype)
        beta = 1 - tf.reduce_sum(cur_true) / num_pixels
        cur_loss = (-beta * cur_true * tf.math.log(cur_pred)
                    - (1 - beta) * (1 - cur_true) * tf.math.log(1 - cur_pred))
        loss = cur_loss if loss is None else loss + cur_loss
    return tf.reduce_mean(loss)
```
  - For geometry: we calculate its IoU loss, which is defined as follows:
    
    $L_{AABB} = -\log \text{IoU}(\hat{R}, R^{*}) = -\log \frac{|\hat{R} \cap R^{*}|}{|\hat{R} \cup R^{*}|}$

    $w_{i} = \min(\hat{d_{2}}, d_{2}^{*}) + \min(\hat{d_{4}}, d_{4}^{*})$

    $h_{i} = \min(\hat{d_{3}}, d_{3}^{*}) + \min(\hat{d_{1}}, d_{1}^{*})$

    $\hat{R} = (\hat{d_{1}} + \hat{d_{3}}) * (\hat{d_{2}} + \hat{d_{4}})$

    $R^{*} = (d_{1}^{*} + d_{3}^{*}) * (d_{2}^{*} + d_{4}^{*})$

    $|\hat{R} \cap R^{*}| = w_{i} * h_{i}$

    $|\hat{R} \cup R^{*}| = \hat{R} + R^{*} - |\hat{R} \cap R^{*}|$
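To make the IoU loss concrete, here is a small per-pixel sketch in plain Python. It follows the formula term by term; the function name, the tuple layout `(d1, d2, d3, d4)` read as (top, right, bottom, left), and the `eps` smoothing term are assumptions added for illustration and numerical safety.

```python
import math


def aabb_iou_loss(d_pred, d_true, eps=1e-6):
    """-log IoU between the predicted and ground-truth boxes at one pixel.

    Each argument is (d1, d2, d3, d4): distances from the pixel to the
    top, right, bottom, and left edges of the box.
    """
    p1, p2, p3, p4 = d_pred
    t1, t2, t3, t4 = d_true
    w_i = min(p2, t2) + min(p4, t4)      # width of the intersection
    h_i = min(p1, t1) + min(p3, t3)      # height of the intersection
    area_pred = (p1 + p3) * (p2 + p4)
    area_true = (t1 + t3) * (t2 + t4)
    inter = w_i * h_i
    union = area_pred + area_true - inter
    # eps keeps the ratio defined when both boxes are empty (assumption).
    return -math.log((inter + eps) / (union + eps))


perfect = aabb_iou_loss((2, 2, 2, 2), (2, 2, 2, 2))
too_wide = aabb_iou_loss((2, 4, 2, 4), (2, 2, 2, 2))
```

In the second call the predicted box is twice as wide as the ground truth, so the intersection is 16, the union is 32, and the loss is close to log 2. Because both boxes share the same pixel as their reference point, the intersection reduces to simple `min` operations rather than a general polygon clip.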
Open questions:
- In the actual implementation, the mask is not upsampled to the size of the original image, only to 1/4 of it. This is said to help detect small text, but I do not understand why.
- In the actual implementation, the score loss is dice loss rather than balanced cross-entropy.
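For comparison with the balanced cross-entropy above, a dice loss on the score map might look like the sketch below. It uses the common form 1 - 2|A∩B| / (|A| + |B|) on flattened maps; the function name, the `mask` argument, and the `eps` value are assumptions, not taken from any particular implementation.

```python
def dice_loss(y_true, y_pred, mask, eps=1e-5):
    """Dice loss over flattened score maps; mask zeroes out ignored pixels."""
    inter = sum(t * p * m for t, p, m in zip(y_true, y_pred, mask))
    total = (sum(t * m for t, m in zip(y_true, mask))
             + sum(p * m for p, m in zip(y_pred, mask)))
    # eps avoids division by zero when both maps are empty (assumption).
    return 1.0 - 2.0 * inter / (total + eps)


match = dice_loss([1, 1, 0, 0], [1.0, 1.0, 0.0, 0.0], [1, 1, 1, 1])
miss = dice_loss([1, 1, 0, 0], [0.0, 0.0, 1.0, 1.0], [1, 1, 1, 1])
```

Dice loss is a ratio of overlap to total positive mass, so it needs no explicit class weight such as β. That may be one reason an implementation would prefer it when text pixels are rare.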

[Paper reading] EAST: An Efficient and Accurate Scene Text Detector
