Neural Network Algorithm and Python (5): Prepare a Neural Network

Reference

Python neural network programming (by Tariq Rashid)

Gradient descent method

The previous section briefly introduced the basic idea of gradient descent in neural networks: the size of each weight update is guided by the rate of change of the error with respect to that weight. In practice, this rate of change is also multiplied by a parameter called the learning rate.
How do we use this principle in an algorithm? The core implementation problem is finding the rate of change of the error, and the natural tool for a rate of change is the derivative. From here on, the problem becomes quite concrete.

Let us step back and review our progress so far. Suppose we have built a neural network following the usual layered model, chosen the sigmoid (S-shaped) function as the activation function of every neuron, and picked random numbers as the initial weights of the links between neurons.

Next, we needed a way to update the weight of each link from known inputs and outputs, and we settled on backpropagating the errors. Originally only the output layer had errors, since the error of each output neuron is known directly as the difference between its target and its actual output. By passing those errors backward through the link weights, every neuron in every layer of the network obtains an error value for a given pair of input and target output.
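The backward pass described here can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `backpropagate_errors`, the layer sizes, and all numbers are made up for the example, and the errors are combined using the raw link weights (the transposed weight matrix applied to the output errors) without any normalization. Plain Python lists are used so that the sketch runs without third-party packages.

```python
# Sketch: pass output-layer errors back to the hidden layer through the
# link weights. All names and numbers are illustrative assumptions.

def backpropagate_errors(weights, output_errors):
    """weights[k][j] is the weight of the link from hidden node j to output node k.

    Returns one error value per hidden node: the sum of the output errors,
    each weighted by the link that connects it to that hidden node.
    """
    n_hidden = len(weights[0])
    return [
        sum(weights[k][j] * output_errors[k] for k in range(len(weights)))
        for j in range(n_hidden)
    ]

# 2 output nodes, 3 hidden nodes
w_hidden_output = [
    [2.0, 1.0, 0.5],
    [3.0, 4.0, 0.5],
]
output_errors = [0.8, 0.5]  # target minus actual output, per output node

hidden_errors = backpropagate_errors(w_hidden_output, output_errors)
print(hidden_errors)
```

A hidden node connected through large weights receives a larger share of the output error, which matches the intuition that it contributed more to the mistake.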

The question now becomes how each neuron should use its error to update the weights of its links. The error of a neuron is determined by the weights of all the links feeding into it, so the weight of every input link has some effect on that error. Because we want to update the links one at a time, the derivative we need is a partial derivative: the partial derivative of the neuron's error with respect to the weight of one particular input link.

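For an output neuron k with target t_k and actual output o_k, the partial derivative of the error E with respect to the weight w_{j,k} of one of its input links is

$$\frac{\partial E}{\partial w_{j,k}} = -(t_k - o_k) \cdot \mathrm{sigmoid}\Big(\sum_i w_{i,k}\, o_i\Big) \Big(1 - \mathrm{sigmoid}\Big(\sum_i w_{i,k}\, o_i\Big)\Big) \cdot o_j$$

Here t_k - o_k is the error of the neuron, o_j is the value that the preceding neuron j feeds into the link, and the sum is the total input the neuron receives over all of its incoming links, including the other ones. E is taken to be the squared error (t_k - o_k)^2; the constant factor 2 that the derivative produces is dropped because it only rescales the step and is absorbed by the learning rate.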
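A small numerical check can make this partial derivative more tangible. The sketch below assumes the error is E = 0.5 * (t_k - o_k)^2, so that the gradient is exactly -(t_k - o_k) * o_k * (1 - o_k) * o_j with no leftover constant, and compares that expression with a finite-difference estimate. The function names and all numbers are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_k(weights_k, hidden_outputs):
    """Output of neuron k: sigmoid of the weighted sum of its inputs."""
    return sigmoid(sum(w * o for w, o in zip(weights_k, hidden_outputs)))

def gradient(weights_k, hidden_outputs, target, j):
    """dE/dw_jk for the assumed error E = 0.5 * (target - o_k) ** 2."""
    o_k = output_k(weights_k, hidden_outputs)
    return -(target - o_k) * o_k * (1.0 - o_k) * hidden_outputs[j]

# Illustrative values: three links feeding one output neuron.
weights_k = [0.3, -0.2, 0.5]
hidden_outputs = [0.4, 0.7, 0.1]
target = 0.99
j = 1  # the link whose weight we differentiate with respect to

def error(ws):
    return 0.5 * (target - output_k(ws, hidden_outputs)) ** 2

# Central finite difference: nudge only weight j up and down.
eps = 1e-6
up = list(weights_k)
up[j] += eps
down = list(weights_k)
down[j] -= eps
numeric = (error(up) - error(down)) / (2 * eps)

analytic = gradient(weights_k, hidden_outputs, target, j)
print(analytic, numeric)
```

The two values should agree closely. The gradient is negative here because the output is below its target, so increasing this weight would reduce the error.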

Prepare data

Multiplying the value given by the formula above (the rate of change of the error with respect to the weight of one link) by the learning rate L we chose gives the adjustment for that weight. The weight is moved in the direction opposite to the gradient, so that the error decreases. Done by hand this process is tedious, but matrices and a computer let us update the weights of all the links quickly.
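As a rough sketch of this update, the code below applies new weight = old weight - L * dE/dw to every link of a single layer and repeats the step a number of times. It assumes a sigmoid activation and the gradient -(t_k - o_k) * o_k * (1 - o_k) * input_j; the function names, layer size, and numbers are illustrative, and nested lists stand in for the matrices a numerical library would normally provide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    """weights[k][j] links input j to output node k."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def update_weights(weights, inputs, targets, learning_rate):
    """One gradient-descent step over every link of the layer."""
    outputs = forward(weights, inputs)
    new_weights = []
    for row, o_k, t_k in zip(weights, outputs, targets):
        # dE/dw_jk = -(t_k - o_k) * o_k * (1 - o_k) * input_j
        slope_term = (t_k - o_k) * o_k * (1.0 - o_k)
        # Subtracting L * dE/dw turns the leading minus into a plus.
        new_weights.append(
            [w + learning_rate * slope_term * x for w, x in zip(row, inputs)]
        )
    return new_weights

def squared_error(weights, inputs, targets):
    return sum((t - o) ** 2 for t, o in zip(targets, forward(weights, inputs)))

weights = [[0.2, -0.4], [0.1, 0.3]]  # illustrative starting weights
inputs = [0.9, 0.5]
targets = [0.99, 0.01]
learning_rate = 0.3

error_before = squared_error(weights, inputs, targets)
for _ in range(100):
    weights = update_weights(weights, inputs, targets, learning_rate)
error_after = squared_error(weights, inputs, targets)
print(error_before, error_after)
```

The error after the repeated updates should be smaller than before, which is the behavior gradient descent is meant to produce.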

Next, we turn to choosing suitable inputs and initial weights for the network. Consider the sigmoid function shown in the figure below:

The sigmoid function has a noticeable slope only near zero, roughly in the interval (-1, 1); for inputs of large magnitude the curve is almost flat. Because gradient descent relies on that slope, we want the input values to fall in the region where the slope is large, which helps the network learn.
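To see how quickly the slope of the sigmoid (the S function) fades away from zero, the short sketch below prints the function value and its slope, s * (1 - s), at a few sample inputs. The sample points are arbitrary choices for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of the sigmoid, written in terms of its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -3.0, -1.0, 0.0, 1.0, 3.0, 10.0):
    print(f"x = {x:6.1f}   sigmoid = {sigmoid(x):.4f}   slope = {sigmoid_slope(x):.4f}")
```

The slope peaks at 0.25 for an input of 0, is still close to 0.2 at an input of 1, and is practically zero at an input of 10, where almost no learning signal remains.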

So once we choose the sigmoid as the activation function, we should try to keep the input data within the range (-1, 1). Likewise, the output of the sigmoid always lies strictly between 0 and 1: it approaches these limits but never reaches them. Target values of exactly 0.0 or 1.0 are therefore unreachable and would only push the weights toward ever larger values, so target values are generally chosen from 0.01 to 0.99.
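One possible way to prepare data along these lines is sketched below. It assumes raw readings in the range 0 to 255, which is a hypothetical example and not something fixed by the text, and it maps them linearly into 0.01 to 0.99; the helper names `rescale` and `make_targets` are likewise invented for illustration.

```python
def rescale(values, old_min, old_max, new_min=0.01, new_max=0.99):
    """Linearly map values from [old_min, old_max] to [new_min, new_max]."""
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

def make_targets(n_outputs, correct_index, low=0.01, high=0.99):
    """Target vector: 'high' for the correct output node, 'low' elsewhere."""
    targets = [low] * n_outputs
    targets[correct_index] = high
    return targets

raw = [0, 64, 128, 255]  # hypothetical raw readings in 0..255
inputs = rescale(raw, 0, 255)
targets = make_targets(3, correct_index=1)

print(inputs)
print(targets)
```

Starting the input range at 0.01 instead of 0.0 also avoids inputs of exactly zero, which would contribute nothing to a weight update because the update is proportional to the input value.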

Because the input and output values are relatively small, the initial weights should not be large either. Large weights produce large signals that push the activation function into its flat region and saturate the network. Initial weights are therefore generally chosen between -1.0 and 1.0.

To preserve the basic characteristics of the input signal, mathematicians have worked out a more specific rule of thumb: sample the initial weights from a range bounded by the reciprocal of the square root of the number of links coming into a neuron. If a neuron has three incoming links, its initial weights should lie between -1/√3 and +1/√3, that is, about ±0.577. Of course, many details depend on the situation in practice, such as the choice of activation function.
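The rule of thumb can be sketched as follows. The sketch assumes uniform sampling between -1/sqrt(n) and +1/sqrt(n), where n is the number of incoming links; other sampling schemes, such as a normal distribution scaled by the same quantity, are also used in practice. The function name `init_weights` and the seed are illustrative choices.

```python
import math
import random

def init_weights(n_inputs, n_outputs, seed=None):
    """Random weights in [-1/sqrt(n_inputs), +1/sqrt(n_inputs)].

    Returns a matrix with one row per receiving neuron and one column
    per incoming link.
    """
    rng = random.Random(seed)
    limit = 1.0 / math.sqrt(n_inputs)
    return [
        [rng.uniform(-limit, limit) for _ in range(n_inputs)]
        for _ in range(n_outputs)
    ]

# Two neurons, each with three incoming links: limit is about 0.577.
weights = init_weights(n_inputs=3, n_outputs=2, seed=0)
for row in weights:
    print(row)
```

The more links feed into a neuron, the narrower the range becomes, which keeps the summed input from drifting into the flat part of the activation function.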

At this point, the architecture, principles, and data selection of a neural network have been discussed. Next, we can try to use Python to build a real neural network.
