For a given perceptron, the weights and threshold are fixed, and together they represent a decision strategy. We can therefore change this strategy by adjusting the weights and the threshold.
One point about the threshold deserves mention: for convenience, it is usually expressed through its negative, b = -threshold, where b is called the bias.
With this change, the rule for computing the output becomes: if w1x1 + w2x2 + w3x3 + ... + b > 0, then output = 1; otherwise output = 0.
Now take a perceptron with two inputs x1 and x2, weights w1 = w2 = -2, and bias b = 3.
Clearly, output = 0 only when x1 = x2 = 1, because (-2)*1 + (-2)*1 + 3 = -1, which is less than 0. For every other input, output = 1.
In other words, this perceptron is exactly a "NAND gate"!
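The perceptron rule with a bias fits in a few lines of Python. This is a minimal sketch: the function names `perceptron` and `nand` are illustrative choices, not anything defined in the text. It hard-codes the weights w1 = w2 = -2 and bias b = 3 from above and prints the full truth table.

```python
def perceptron(weights, bias, inputs):
    """Step rule: output 1 if the weighted sum plus the bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Weights w1 = w2 = -2 and bias b = 3, as in the text
NAND_WEIGHTS = (-2, -2)
NAND_BIAS = 3

def nand(x1, x2):
    return perceptron(NAND_WEIGHTS, NAND_BIAS, (x1, x2))

if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", nand(x1, x2))
```

Only the row `1 1` produces 0, matching the calculation (-2)*1 + (-2)*1 + 3 = -1 < 0.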
In computer science, the NAND gate is special among all logic gates: any other gate can be built by combining NAND gates. This property is called NAND gate universality (gate universality).
Since a perceptron can implement a NAND gate with suitable weight and bias parameters, it can, in principle, implement any other logic gate as well.
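To make NAND universality concrete, here is a sketch that builds NOT, AND, OR and XOR purely out of the NAND perceptron (w1 = w2 = -2, b = 3). The gate constructions are the standard textbook ones; the helper names (`not_`, `and_`, `or_`, `xor`) are hypothetical and chosen only for this illustration.

```python
def perceptron(weights, bias, inputs):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def nand(a, b):
    # The NAND perceptron from the text: w1 = w2 = -2, b = 3
    return perceptron((-2, -2), 3, (a, b))

def not_(a):
    return nand(a, a)                    # NOT a = a NAND a

def and_(a, b):
    return not_(nand(a, b))              # AND = NOT(NAND)

def or_(a, b):
    return nand(not_(a), not_(b))        # De Morgan: a OR b = (NOT a) NAND (NOT b)

def xor(a, b):
    n = nand(a, b)                       # classic four-NAND construction
    return nand(nand(a, n), nand(b, n))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND", and_(a, b), "OR", or_(a, b), "XOR", xor(a, b))
```

Every gate above is nothing but NAND perceptrons wired together, which is exactly the sense in which perceptrons can express any logic gate.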
Therefore, perceptrons can also be wired together into a computer system, much like the Three-Body example mentioned earlier.
But this is hardly surprising. We already have ready-made computers, so building one this way only complicates things.
A single perceptron can do very little. To make complex decisions, many perceptrons must be connected together.
However, a real network may have tens of thousands or even hundreds of thousands of parameters. Configuring them by hand one at a time is a task that would probably never be finished.
This is where the most distinctive feature of neural networks comes in.
Instead of specifying every parameter ourselves, we provide training data and let the network learn, finding suitable values for all its parameters in the process.
The general idea is this: we tell the network which output we expect when the input takes a certain value. Each such piece of training data is called a training example.
This is like a teacher using concrete examples to teach students some abstract piece of knowledge.
Generally speaking, the more examples we give, the better the abstract knowledge gets across. The same holds for training neural networks.
We can feed thousands of training examples into the network, and the network automatically distills the abstract knowledge hidden behind them.
That knowledge is embodied in the values of all the network's weights and biases.
Suppose every parameter starts with some initial value. When we feed in a training example, the network computes a definite actual output from the current parameter values.
This actual output may differ from the expected output. At that point, we can try adjusting some of the parameters to bring the actual output as close as possible to the expected output.
Once all the training examples have been fed in and the parameters have been adjusted to (near-)optimal values, the actual output is very close to the expected output for each example, and training is complete.
If, by the end of training, the network gives correct (or nearly correct) responses to tens of thousands of examples, then when we input a piece of data it has never seen before, it should also have a high probability of making the decision we expect. This is how a neural network works.
But one question remains: during training, when the actual output differs from the expected output, how should each parameter be adjusted?
Before working out how, we should first ask: is it even feasible to obtain the desired output by adjusting the parameters?
In fact, for perceptron networks, this approach is essentially unworkable.
For example, in the 39-parameter perceptron network in the figure above, if we keep the input fixed and change the value of a single parameter, the final output is essentially unpredictable.
It may flip from 0 to 1 (or from 1 to 0), or it may stay the same. The root of the problem is that the inputs and outputs are binary: they can only be 0 or 1.
If we view the whole network as a function (with inputs and outputs), that function is not continuous.
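The discontinuity is easy to see even in a single perceptron. The 39-parameter network from the figure is not reproduced here; instead, this sketch reuses the NAND perceptron's weights (w1 = w2 = -2), fixes the input at x1 = x2 = 1 (weighted sum -4), and nudges the bias across 4.

```python
def perceptron(weights, bias, inputs):
    # Step rule: 1 if the weighted sum plus bias is positive, else 0
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

WEIGHTS = (-2, -2)
INPUTS = (1, 1)  # weighted sum = -4, so the output hinges on whether b > 4

if __name__ == "__main__":
    for bias in (3.999, 4.0, 4.001):
        print(f"b = {bias}: output = {perceptron(WEIGHTS, bias, INPUTS)}")
```

A change of only 0.002 in one parameter flips the output all the way from 0 to 1, while smaller changes on either side of 4 do nothing at all. There is no "slightly better" direction to move in, which is why this kind of network is so hard to train by gradual adjustment.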
Therefore, to make training possible, we need a neural network whose inputs and outputs can vary continuously over the real numbers. This is why sigmoid neurons were introduced.
The sigmoid neuron is the basic building block (though certainly not the only one) commonly used in modern neural networks. Its structure is similar to that of the perceptron, with two important differences.
First, its inputs are no longer limited to 0 and 1; each can be any real number between 0 and 1.
Second, its output is no longer limited to 0 and 1: the weighted sum of the inputs plus the bias is passed through a function called the sigmoid function, and the result is the output.
Specifically, let z = w1x1 + w2x2 + w3x3 + ... + b; then output = σ(z), where σ(z) = 1/(1 + e^(-z)).
σ(z) is a smooth, continuous function, and its output is also a real number between 0 and 1. That output can be fed directly as input to the next layer of neurons, which keeps those inputs between 0 and 1 as well.
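A sigmoid neuron differs from a perceptron only in its last step: the step rule is replaced by σ(z). The version below is a minimal sketch; it evaluates σ in two branches so that `math.exp` never receives a large positive argument (a common overflow guard, which is an implementation choice of this example rather than part of the definition).

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), computed without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # z < 0, so e is in (0, 1)
    return e / (1.0 + e)     # algebraically equal to 1 / (1 + e^(-z))

def sigmoid_neuron(weights, bias, inputs):
    """Weighted sum of real-valued inputs plus bias, passed through sigma."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

if __name__ == "__main__":
    # Same weights and bias as the NAND perceptron, now with real-valued inputs
    print(f"{sigmoid_neuron((-2, -2), 3, (1, 1)):.4f}")      # prints 0.2689 (sigma(-1))
    print(f"{sigmoid_neuron((-2, -2), 3, (0.5, 0.5)):.4f}")  # prints 0.7311 (sigma(1))
```

Where the perceptron would output a hard 0 or 1, the sigmoid neuron outputs a graded value between 0 and 1.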
It follows that once a neural network is built from sigmoid neurons, its inputs and outputs become continuous. That is, when we make a small change to the value of a parameter, the output changes only slightly. This makes it possible to train the network by adjusting the parameter values step by step.
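Repeating the bias-nudging experiment with a sigmoid neuron shows the difference. This sketch uses weights w1 = w2 = -2 and input x1 = x2 = 1 (weighted sum -4) and sweeps the bias across 4. Because the derivative σ'(z) = σ(z)(1 - σ(z)) never exceeds 1/4, shifting the bias by δ moves the output by at most δ/4; the helper `max_output_change` (a hypothetical name) checks that bound numerically on a grid of sample biases.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_neuron(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

WEIGHTS = (-2, -2)
INPUTS = (1, 1)  # weighted sum = -4

def max_output_change(delta, biases):
    """Largest |change in output| when each bias in `biases` is shifted by delta."""
    return max(abs(sigmoid_neuron(WEIGHTS, b + delta, INPUTS)
                   - sigmoid_neuron(WEIGHTS, b, INPUTS)) for b in biases)

if __name__ == "__main__":
    for bias in (3.999, 4.0, 4.001):
        print(f"b = {bias}: output = {sigmoid_neuron(WEIGHTS, bias, INPUTS):.6f}")
    delta = 0.002
    sample_biases = [i / 10 for i in range(81)]  # b from 0.0 to 8.0
    print("largest change:", max_output_change(delta, sample_biases), "bound:", delta / 4)
```

The output now drifts smoothly through 0.5 instead of jumping from 0 to 1, so a small parameter change produces a correspondingly small, predictable change in the output.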
Historically, many researchers have tried this. Michael Nielsen's book "Neural Networks and Deep Learning" also discusses this example of recognizing handwritten digits.
This network has only one hidden layer, so it is a shallow neural network. A true deep neural network has multiple hidden layers.
The neuron system is designed and built using the left-and-right-hemisphere design approach.
On the far right is the output layer, with 10 neurons representing the recognition results 0, 1, 2, ..., 9. Because of the sigmoid function σ(z), each output is necessarily a number between 0 and 1.
After we obtain the set of output values, the digit whose output is largest is taken as the final recognition result.
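Picking the largest output is an argmax over the ten output neurons. A minimal sketch; the function name `recognize` is hypothetical, and the output values in the example are made up purely for illustration rather than taken from a trained network.

```python
def recognize(outputs):
    """Return the digit (0-9) whose output neuron has the largest value."""
    if len(outputs) != 10:
        raise ValueError("expected exactly 10 outputs, one per digit 0-9")
    return max(range(10), key=lambda digit: outputs[digit])

if __name__ == "__main__":
    # Hypothetical output-layer activations, each between 0 and 1
    outputs = [0.02, 0.01, 0.05, 0.91, 0.03, 0.10, 0.01, 0.04, 0.20, 0.06]
    print(recognize(outputs))  # prints 3
```

If two outputs tie, `max` returns the smaller digit; any consistent tie-breaking rule would do.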
During training, the expected output is 1 for the correct digit and 0 for all the others. The hidden layer and the output layer are also fully connected.
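Putting the pieces together, here is a sketch of one forward pass through a fully connected 784-15-10 sigmoid network, plus the training target described above (1 for the correct digit, 0 elsewhere). In Nielsen's example the 784 inputs are the pixels of a 28×28 grayscale image. Several things here are assumptions for illustration only: the weights are random (Gaussian, scaled by 1/sqrt(n_in), a common initialization choice), the "image" is random noise rather than a real digit, and the label 3 is arbitrary.

```python
import math
import random

def sigmoid(z):
    # Two-branch form avoids overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_layer(n_in, n_out, rng):
    """Fully connected layer: a weight for every (neuron, input) pair, one bias per neuron.
    The scaled Gaussian initialization is an assumption of this sketch."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, biases

def layer_forward(layer, inputs):
    """Each neuron outputs sigma(weighted sum of all inputs + its bias)."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def expected_output(digit):
    """Training target: 1.0 for the correct digit, 0.0 for the other nine."""
    return [1.0 if d == digit else 0.0 for d in range(10)]

rng = random.Random(0)
hidden_layer = make_layer(784, 15, rng)   # input -> hidden
output_layer = make_layer(15, 10, rng)    # hidden -> output

image = [rng.random() for _ in range(784)]  # stand-in for 784 pixel intensities
actual = layer_forward(output_layer, layer_forward(hidden_layer, image))
target = expected_output(3)                 # arbitrary label for this example

if __name__ == "__main__":
    for digit, (a, t) in enumerate(zip(actual, target)):
        print(f"{digit}: actual {a:.3f}  expected {t:.0f}")
```

An untrained network like this generally produces outputs that do not match the target; training is the process of adjusting the weights and biases to close that gap.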
The network has 784*15 + 15*10 = 11910 weights and 15 + 10 = 25 biases, for a total of 11910 + 25 = 11935 parameters.
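The count generalizes to any fully connected network: each pair of adjacent layers contributes (size of one layer) × (size of the next) weights, and every non-input neuron has one bias. A short sketch that reproduces the numbers above for layer sizes [784, 15, 10]; the function name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases, total) for a fully connected feed-forward network."""
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # input nodes carry no bias
    return weights, biases, weights + biases

if __name__ == "__main__":
    print(count_parameters([784, 15, 10]))  # prints (11910, 25, 11935)
```

Passing a longer list, such as [784, 15, 15, 10], gives the count for a network with more hidden layers.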
This is an amazing number.