tags: PyTorch Machine Learning Deep Learning Artificial Intelligence Algorithm Python
@TOC
A comparison of the advantages and disadvantages of the MSE, MAE, and Smooth L1 loss functions
Mean Squared Error (MSE) is the most commonly used regression loss function. It is the mean of the squared differences between the predicted value f(x) and the target value y. The formula is as follows:
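$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr)^{2}$$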

The figure below shows the curve of the mean squared error, whose minimum is at the point where the predicted value equals the target value. We can see that the loss increases more and more rapidly as the error grows.

If there are outliers in the sample, MSE gives them higher weight, which comes at the cost of prediction accuracy on the normal data points and ultimately lowers the overall performance of the model, as shown below:

As can be seen, with the MSE loss function the outliers have a large influence: although there are only 5 outliers in the sample, the fitted line is still noticeably biased.
Mean Absolute Error (MAE) is another commonly used regression loss function. It is the mean of the absolute differences between the target value and the predicted value, and it represents the average magnitude of the prediction error without regard to its direction. (Note: Mean Bias Error, MBE, does take the direction of the error into account.) The range of MAE is 0 to ∞, and the formula is as follows:
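$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\bigl|f(x_i)-y_i\bigr|$$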

For data with outliers like the above, MAE works better than MSE.

Clearly, with the MAE loss function the influence of the outliers is much smaller, and the fitted line better characterizes the distribution of the normal data.
If the outliers (anomalous values) need to be detected, MSE can be chosen as the loss function; if the outliers are simply treated as corrupted data, MAE can be chosen as the loss function.
In short, as a loss function MAE is more robust and insensitive to outliers, but its derivative is discontinuous, it is less efficient to solve, and in deep learning it converges slowly. The derivative of MSE is easy to solve, but MSE is sensitive to outliers, although this can be avoided to a certain extent by handling the outliers appropriately.
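The difference in gradients is the root of this trade-off: the MSE gradient shrinks as the error shrinks, which helps convergence, while the MAE gradient has constant magnitude and flips sign at zero:

$$\frac{\partial\,\mathrm{MSE}}{\partial f(x_i)}=\frac{2}{n}\bigl(f(x_i)-y_i\bigr),\qquad \frac{\partial\,\mathrm{MAE}}{\partial f(x_i)}=\frac{1}{n}\,\mathrm{sign}\bigl(f(x_i)-y_i\bigr)$$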
In some cases, neither of these two loss functions meets the need. For example, suppose the target value of 90% of the samples in the data is 150, and the remaining 10% lie between 0 and 30. A model using MAE as the loss function may then ignore the 10% of anomalous points and predict 150 for all samples, because MAE drives the model to predict the median. A model using MSE, on the other hand, will produce many predictions between 0 and 30, because the model is pulled towards the anomalous points.
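The median-versus-mean behaviour can be seen directly by comparing the best constant prediction under each loss on toy data matching this example (a minimal sketch; the exact values are arbitrary and purely illustrative):

```python
import torch

# Toy data: 90% of the targets are 150, the remaining 10% lie between 0 and 30.
torch.manual_seed(0)
y = torch.cat([torch.full((90,), 150.0), torch.rand(10) * 30])

# The best constant prediction under MAE is the median, under MSE it is the mean.
print("MAE-optimal constant (median):", y.median().item())  # 150.0
print("MSE-optimal constant (mean):  ", y.mean().item())     # pulled down towards the outliers
```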
In this situation, both MSE and MAE are unsuitable. Simple remedies are to transform the target variable, or to use another loss function such as the Huber, Log-Cosh, or quantile loss.
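For reference, the Huber loss mentioned above combines the two behaviours explicitly; δ below is its threshold hyperparameter, and the Smooth L1 loss discussed next is essentially the δ = 1 case:

$$L_{\delta}\bigl(y,f(x)\bigr)=\begin{cases}\frac{1}{2}\bigl(y-f(x)\bigr)^{2}, & \bigl|y-f(x)\bigr|\le\delta\\ \delta\,\bigl|y-f(x)\bigr|-\frac{1}{2}\delta^{2}, & \text{otherwise}\end{cases}$$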



The loss function used for bounding-box regression in Faster R-CNN is Smooth L1. As the name suggests, Smooth L1 is the L1 loss after smoothing: the shortcoming of the L1 loss is that it has a kink at zero and is not smooth, so how do we make it smooth?
The Smooth L1 loss function is:

$$\mathrm{smooth}_{L_1}(x)=\begin{cases}0.5\,x^{2}, & \text{if } |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

where $x$ is the difference between the predicted value and the target value, i.e. $x = f(x_i) - y_i$.
Smooth L1 limits the gradient in two ways:

- when the difference between the prediction and the ground truth is large, the gradient magnitude is not excessively large;
- when the difference is small, the gradient is correspondingly small.
As can be seen from the above, the function is actually a piecewise function: on [-1, 1] it is an L2 loss, which fixes the non-smoothness of L1 at zero; outside [-1, 1] it is an L1 loss, which fixes the exploding-gradient problem caused by outliers.
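Taking the derivative makes the two-sided gradient limiting explicit ($x$ is the prediction error defined above):

$$\frac{\partial\,\mathrm{smooth}_{L_1}(x)}{\partial x}=\begin{cases}x, & \text{if } |x|<1\\ \pm 1, & \text{otherwise}\end{cases}$$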
PyTorch implementation 1
torch.nn.SmoothL1Loss(reduction='mean')
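A minimal usage sketch of the built-in module (the tensors below are arbitrary values, purely for illustration):

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss(reduction='mean')

# Arbitrary example values, just for illustration.
pred = torch.tensor([1.2, 0.3, -2.0])
target = torch.tensor([1.0, 0.0, 1.0])

loss = criterion(pred, target)
print(loss.item())  # averages 0.5*x^2 where |x| < 1 and |x| - 0.5 elsewhere
```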
PyTorch implementation 2
import torch

def _smooth_l1_loss(input, target, reduction='none'):
    # type: (Tensor, Tensor, str) -> Tensor
    t = torch.abs(input - target)
    # quadratic branch for errors below 1, linear branch otherwise
    ret = torch.where(t < 1, 0.5 * t ** 2, t - 0.5)
    if reduction != 'none':
        ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    return ret
You can also add a beta parameter to control for which range of errors the MSE-style (quadratic) branch is used and for which the MAE-style (linear) branch is used.
PyTorch implementation 3
def smooth_l1_loss(input, target, beta=1. / 9, reduction='none'):
    """
    Very similar to the smooth_l1_loss from PyTorch, but with
    the extra beta parameter.
    """
    n = torch.abs(input - target)
    cond = n < beta
    ret = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    if reduction != 'none':
        ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    return ret
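As a quick sanity check of the beta parameter (this sketch assumes the two functions defined above are in scope): with beta = 1 the custom version reduces to the plain Smooth L1 definition, while a smaller beta shrinks the quadratic region.

```python
import torch

x = torch.tensor([0.05, 0.5, 2.0])  # arbitrary prediction errors
y = torch.zeros(3)

print(smooth_l1_loss(x, y, beta=1.0))      # matches 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise
print(_smooth_l1_loss(x, y))               # same values as above
print(smooth_l1_loss(x, y, beta=1. / 9))   # quadratic region shrinks to |x| < 1/9
```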
For most CNNs, we generally use the L2 loss rather than the L1 loss, because the L2 loss converges much faster than the L1 loss.
For bounding-box regression, the squared loss function (L2 loss) could also be chosen, but the disadvantage of the L2 norm is that when outliers are present, those points account for the main component of the loss. For example, suppose the true value is 1 and we make 10 predictions, one of which is 1000 while the rest are around 1; the loss value is then clearly dominated by the 1000. Therefore, Fast R-CNN uses the somewhat milder absolute loss function (Smooth L1 loss), which grows linearly with the error rather than quadratically.
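A minimal numeric illustration of this point (values chosen to mirror the example above):

```python
import torch
import torch.nn.functional as F

# Ten predictions of a true value of 1; one prediction is a wild outlier.
target = torch.ones(10)
pred = torch.ones(10)
pred[0] = 1000.0

l2_loss = ((pred - target) ** 2).sum()                       # ~998001, dominated by the outlier
smooth_l1 = F.smooth_l1_loss(pred, target, reduction='sum')  # ~998.5, grows only linearly

print(l2_loss.item(), smooth_l1.item())
```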
The difference between Smooth L1 and the L1 loss is that the derivative of the L1 loss is not unique at 0, which may affect convergence. Smooth L1 solves this by using a quadratic function near 0 so that the loss is smooth there.
Advantages of Smooth L1: compared with the L1 loss, it is smooth and differentiable at 0; compared with the L2 loss, it is less sensitive to outliers, and its gradient is bounded for large errors, so gradients do not explode.