Why is it that a perceptron, or a single-layer neural network, can't solve the XOR problem, or other problems that are linearly inseparable?

Before someone tells me to use Google: I have. All the neural network guides that are online or in books are so overly complicated, with masses of equations describing everything, that plain English is totally abandoned. I can't get my head around why exactly it is that we cannot draw a line between the two classes, or why we even need to draw a line at all!

The perceptron learning algorithm is something else I'm a bit confused about, so help with that would be welcome too!

Also, the perceptron by itself cannot do XOR, so why can a multilayer perceptron with a hidden layer handle linearly inseparable problems?

Here is the thing with "drawing a line". Draw a simple Cartesian plot. Consider that 1 = true and -1 = false. Put a dot in the graph for each possibility: (1,1), (1,-1), (-1,-1), (-1,1). According to XOR logic, only the points (-1,1) and (1,-1) yield "true" under XOR. Can you draw a single line on that graph such that the two "true" points are on one side of it and the two "false" points are on the other?
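
To make that concrete, here is a little brute-force sketch (my own illustration, nothing standard) that tries thousands of random lines `w1*x + w2*y + b = 0` and checks whether any of them separates the "true" points from the "false" ones. For AND it finds one quickly; for XOR it never can:

```python
import random

# The four input points, with 1 = true and -1 = false
points = [(1, 1), (1, -1), (-1, -1), (-1, 1)]
xor_label = {(1, 1): -1, (1, -1): 1, (-1, -1): -1, (-1, 1): 1}
and_label = {(1, 1): 1, (1, -1): -1, (-1, -1): -1, (-1, 1): -1}

def separable(labels, tries=100000):
    """Randomly search for a line w1*x + w2*y + b = 0 that puts all
    +1-labeled points strictly on one side and all -1 points on the other."""
    for _ in range(tries):
        w1, w2, b = (random.uniform(-2, 2) for _ in range(3))
        if all((w1 * x + w2 * y + b > 0) == (labels[(x, y)] == 1)
               for x, y in points):
            return True
    return False

random.seed(0)
print("AND separable:", separable(and_label))   # True: a line exists
print("XOR separable:", separable(xor_label))   # False: no line ever works
```

Of course, random search failing is not a proof, but for XOR it is easy to prove: adding the constraints for the four points gives a contradiction, so no line can exist.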

The point with perceptrons is that all they can do is compute a linear combination of the inputs and use that to fire a 0 or 1 output (possibly through some other basis function). A "linear combination" is essentially the equation of a line; in fact, a line is defined by a linear combination of the coordinate variables (in this case, the coordinate variables are the inputs). So, literally, the only thing that a perceptron can do is draw a line and tell you on which side the input lies (0 or 1). This is why we say that a perceptron can only deal with a linearly separable problem: that's all a perceptron does.
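
Since you also asked about the perceptron learning algorithm: the whole thing fits in a few lines. Here is a minimal sketch (my own toy code, using the ±1 encoding from above) of the classic rule, trained on AND, which is linearly separable:

```python
def predict(w, b, x):
    """A perceptron is just: the sign of a linear combination of the inputs."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1

def train(data, epochs=20, lr=0.1):
    """Classic perceptron learning rule: whenever a point is misclassified,
    nudge the weights toward (target = +1) or away from (target = -1) it."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            if predict(w, b, x) != target:  # update only on mistakes
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b

# AND is linearly separable, so the rule converges to a separating line
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, -1), -1), ((-1, 1), -1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # -> [1, -1, -1, -1]
```

Run the same training loop on XOR data and it will never converge, for exactly the reason above: there is no line for the weights to converge to.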

If you have a second layer in your ANN, you can get it to do two linear separations (i.e., two lines) and then combine those two outputs to figure out whether the XOR output should be 1 or 0. So, with one hidden layer, you can solve the XOR problem. It's that simple.
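
You can even write the two-line solution down by hand, no training needed. Here is one such hand-built network (one of many possible weight choices), using XOR = (x OR y) AND NOT (x AND y) with the ±1 encoding:

```python
def step(s):
    """Threshold unit: fires +1 if its linear combination is positive."""
    return 1 if s > 0 else -1

def xor_net(x, y):
    """Two hidden units draw two lines (OR and NAND); a third unit
    combines them with an AND. That composition is exactly XOR."""
    h1 = step(x + y + 1)      # line 1: computes OR
    h2 = step(-x - y + 1)     # line 2: computes NAND
    return step(h1 + h2 - 1)  # output: AND of the two hidden units

for x, y in [(1, 1), (1, -1), (-1, -1), (-1, 1)]:
    print((x, y), "->", xor_net(x, y))
# (1, 1) -> -1, (1, -1) -> 1, (-1, -1) -> -1, (-1, 1) -> 1
```

Each hidden unit is itself just a perceptron drawing one line; the hidden layer's job is to re-map the inputs into a space where the problem becomes linearly separable for the output unit.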

You must understand that ANNs are not magical. When I first got interested in this, I used to think that ANNs were like "simulating a brain" and could do awesome things. In reality, they can't. Yes, they kind of work the same way as the neurons in a brain, but the big difference is that the structure of the neurons in the brain is so deep (many, many layers), with cross-layer connections, cycles, states, and dynamics. In other words, a brain-like ANN goes way beyond our current capabilities, both in working out the math and in simulating it.

You must treat ANNs as just one of many techniques in machine learning, and often not a particularly good one. Bayesian inference methods, support-vector machines, clustering methods, locally-weighted regressions, Q-learning, genetic algorithms, etc., are amongst the many methods that work very well and are used extensively these days. In my own field (robotics), for example, you don't see people using ANNs much.

As for an explanation of the learning methods: for ANNs, the learning is almost always a gradient descent method (the so-called "back-propagation"). You should see it as just that, a gradient descent method; forget about the fluff and the vocabulary used by ANN fanatics.
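
To demystify that a bit: gradient descent just means repeatedly stepping downhill along the derivative of whatever you are minimizing. Back-propagation is the same idea applied to a network's weights, with the chain rule doing the bookkeeping. A toy sketch on a one-variable loss, f(w) = (w - 3)^2:

```python
def grad(w):
    """Derivative of the toy loss f(w) = (w - 3)^2."""
    return 2 * (w - 3)

w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * grad(w)  # step downhill along the gradient
print(round(w, 4))  # -> 3.0, the minimizer of the loss
```

In an ANN, `w` is the whole weight vector and `grad` is computed layer by layer via the chain rule, but the update rule is exactly this.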

Get a machine learning textbook and go through it in order; by the time you get to ANNs, you'll see what I mean.

Thank you for the very detailed description; it certainly helps my understanding a lot. I think the topics of ANNs and GAs are not very difficult, but when you read about them in the papers/books that have been published, they overcomplicate simple matters.