XOR: Introduction to Neural Networks, Part 1

As I said, there are many different kinds of activation functions, such as tanh, ReLU, and binary step, each with its own uses and qualities. For this example, we’ll be using what’s called the logistic sigmoid function. The most important thing to remember from this example is that the points didn’t all move the same way (some of them did not move at all). That effect is what we call “non-linear”, and it’s very important to neural networks.
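As a quick illustration (this snippet is mine, not from the article), the logistic sigmoid can be written in a couple of lines of NumPy:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```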

Attempt #1: The Single Layer Perceptron

As our XOR problem is a binary classification problem, we are using the binary_crossentropy loss. In this example there is only one output and 5 inputs, but it could be any number. The number of inputs and outputs is usually defined by your problem; the intermediate layers are there to let the network fit your data more exactly (which comes with some other implications).
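The article does not show its own model code, but a hedged Keras sketch of an XOR classifier trained with binary_crossentropy might look like this (for XOR I use two inputs; the hidden-layer size is an arbitrary choice):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(8, activation="tanh"),     # hidden layer (size chosen arbitrarily)
    layers.Dense(1, activation="sigmoid"),  # single output for binary classification
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=500, verbose=0)
print(model.predict(X).round().ravel())  # ideally [0, 1, 1, 0]
```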

Classification

Next, we will use the function np.random.shuffle() on the variable index_shuffle. Similarly, if we were to use the decision boundary line for the NAND operator here, it would also classify 3 out of 4 points correctly. To better visualize the above classification, let’s look at the graph below.
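The surrounding code is not shown in the article, but the shuffling step it refers to presumably looks something like this (all names other than index_shuffle are my assumptions):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

index_shuffle = np.arange(len(X))          # [0, 1, 2, 3]
np.random.shuffle(index_shuffle)           # shuffles the indices in place
X, y = X[index_shuffle], y[index_shuffle]  # reorder samples and labels together
```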

Weights and Biases

Like I said earlier, the random synaptic weights will most likely not give us the correct output on the first try. So we need a way to adjust the synaptic weights until the network starts producing accurate outputs and “learns” the trend. Notice that the artificial neural net has to output ‘1’ for the green and black points, and ‘0’ for the remaining ones. In other words, it needs to separate the green and black points from the purple and red points.
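To make “adjusting the weights” concrete, here is a minimal sketch of the classic perceptron update rule (the function and variable names are mine, not the article’s):

```python
import numpy as np

def perceptron_step(x, target, w, b, lr=0.1):
    # Predict with a hard threshold, then nudge weights toward the target
    pred = 1 if np.dot(w, x) + b > 0 else 0
    error = target - pred
    w = w + lr * error * x
    b = b + lr * error
    return w, b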

While taking the Udacity PyTorch Course by Facebook, I found it difficult to understand how the Perceptron works with logic gates (AND, OR, NOT, and so on). I decided to check online resources, but as of the time of writing, there was really no good explanation of how to go about it. So after some personal reading, I finally understood the approach, which is the reason for this Medium post.

A neural network is an advanced computational model, used in the field of artificial intelligence (AI), that loosely simulates how the human brain works. Training algorithms adjust its weights based on the input data and the prediction error.

To solve the XOR problem, we need to introduce multi-layer perceptrons (MLPs) and the backpropagation algorithm. MLPs are neural networks with one or more hidden layers between the input and output layers. These hidden layers allow the network to learn non-linear relationships between the inputs and outputs.
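As a small illustration (my own sketch, not the article’s code), such an MLP could be declared in PyTorch like this; a hidden layer of 2 units is the minimum that can represent XOR, though larger hidden layers are often easier to train:

```python
import torch.nn as nn

# Minimal MLP for XOR: 2 inputs -> 2 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(2, 2),   # hidden layer introduces the non-linearity together with...
    nn.Sigmoid(),      # ...this activation
    nn.Linear(2, 1),
    nn.Sigmoid(),
)
```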

Transfer learning reduces the amount of training data required and speeds up the training process. It also improves the accuracy of models by leveraging knowledge learned from related tasks. In the case of the XOR problem, transfer learning can be applied by using pre-trained models from similar binary classification tasks.

The classic matrix multiplication algorithm has O(n³) complexity. However, unsupervised learning techniques may not always provide accurate results compared to supervised learning techniques that rely on labeled examples. Although RNNs are suitable for processing sequential data, they pose a challenge when it comes to solving the XOR problem. This is because the XOR problem requires memorizing information over long periods of time, which is difficult for RNNs.

Then “1” means “this weight is going to multiply the first input” and “2” means “this weight is going to multiply the second input”. A clear non-linear decision boundary is created here with our generalized neural network, or MLP. Neural networks fall under supervised learning in machine learning, where the network is trained on a labeled dataset to predict on new, unseen data. The development of neural networks has been an exciting journey across intelligence, math, engineering, and neuroscience, and along that path they have evolved from simple to complex systems.

First, we’ll have to assign random weights to each synapse, just as a starting point. We then multiply our inputs by these random starting weights. Starting with random synaptic weights almost always leads to incorrect outputs, so these weights will need to be adjusted through a process I prefer to call “learning”. Let us understand why perceptrons cannot be used for XOR logic, using the outputs generated by the XOR operation and the corresponding graph for XOR shown below. We’ll initialize our weights and expected outputs as per the truth table of XOR.
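A minimal NumPy sketch of that initialization, with the XOR truth table and random starting weights (the layer shapes are my own assumption of a 2-2-1 network):

```python
import numpy as np

# XOR truth table: the output is 1 only when the two inputs differ
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
expected_output = np.array([[0], [1], [1], [0]])

np.random.seed(42)                               # for reproducibility
hidden_weights = np.random.uniform(size=(2, 2))  # 2 inputs -> 2 hidden units
output_weights = np.random.uniform(size=(2, 1))  # 2 hidden units -> 1 output
```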

  1. This architecture, while more complex than that of the classic perceptron network, is capable of achieving non-linear separation.
  2. And now let’s run all this code, which will train the neural network and calculate the error between the actual values of the XOR function and the outputs produced by the trained network (a sketch of such a training loop is shown after this list).
  3. Perceptrons got a lot of attention at that time, and later on many variations and extensions of perceptrons appeared.
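As item 2 mentions, the full training code is not reproduced here; below is a minimal NumPy sketch of what such a train-and-measure-error loop could look like (the hyperparameters and the 4-unit hidden layer are my own assumptions, and convergence depends on the random seed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Error between the XOR targets and the network's outputs (MSE-style gradient)
    error = y - out
    d_out = error * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Backpropagation update
    W2 += lr * h.T @ d_out
    b2 += lr * d_out.sum(axis=0, keepdims=True)
    W1 += lr * X.T @ d_h
    b1 += lr * d_h.sum(axis=0, keepdims=True)

print("mean squared error:", float(np.mean(error ** 2)))
print("predictions:", out.round().ravel())   # ideally [0, 1, 1, 0]
```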

Feedforward neural networks are a type of artificial neural network where the information flows in one direction, from input to output. For example, we can take the second number of the data set. The hidden-layer value h1 is obtained by applying the OR model to x_test, and h2 is obtained by applying the NAND model to x_test. Then, we obtain our prediction h3 by applying the AND model to h1 and h2. Now that we have created the class for logistic regression, we need to create the model for the AND logical operator.
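To make the composition concrete, here is a tiny NumPy sketch that uses exact logic functions in place of the article’s trained OR, NAND and AND models:

```python
import numpy as np

def OR(a, b):   return (a + b > 0.5).astype(int)
def NAND(a, b): return (a * b < 0.5).astype(int)
def AND(a, b):  return (a * b > 0.5).astype(int)

x_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h1 = OR(x_test[:, 0], x_test[:, 1])    # first "hidden" value
h2 = NAND(x_test[:, 0], x_test[:, 1])  # second "hidden" value
h3 = AND(h1, h2)                       # prediction: XOR of the original inputs
print(h3)  # [0 1 1 0]
```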

Using a random number generator, our starting weights are $0.03$ and $0.2$. As we move down the line, the classification (a real number) increases. When we stop at the collapsed points, the classification equals 1. All the previous images just show the modifications occurring due to each mathematical operation (matrix multiplication followed by vector sum). Because their coordinates are positive, the ReLU does not change their values. For the other points, the change happened because their negative coordinates were the y ones.
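The ReLU behaviour described here is easy to verify numerically; a quick sketch with made-up point coordinates (not the ones from the article’s figures):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

points = np.array([[1.0,  2.0],   # both coordinates positive: unchanged
                   [0.5, -1.0]])  # negative y coordinate: clipped to 0
print(relu(points))  # [[1.  2. ], [0.5 0. ]]
```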

Empirically, it is better to use the ReLU instead of the softplus. Furthermore, the dead ReLU is a more important problem than the non-differentiability at the origin. In the end, the pros (simple evaluation and simple slope) outweigh the cons (dead neurons and non-differentiability at the origin). If you want to read another explanation of why a stack of linear layers is still linear, please see Google’s Machine Learning Crash Course. Following the development proposed by Ian Goodfellow et al., let’s use the mean squared error function (just like a regression problem) for the sake of simplicity.
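For reference, the mean squared error is just the average of the squared differences between targets and predictions; a one-function NumPy sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared difference between targets and predictions
    return np.mean((y_true - y_pred) ** 2)
```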

The input vector \(x\) is then turned into a scalar value and passed into a non-linear sigmoid function. This sigmoid function compresses the whole infinite range into a more comprehensible range between 0 and 1. For the XOR gate, the truth table shows that the output is 1 only when the two inputs are complements of each other.
