Hi

I'm working on implementing a neural network, but I'm having trouble calculating the error gradient. The problem is that I don't know much calculus, so I can't understand exactly what to do.

I found this Web page that explains it quite well, but I still just can't get it.
http://www.willamette.edu/~gorr/classes/cs449/linear2.html

Basically the part I'm trying to implement is the last function in that table.
delta w_i = u * (t_o - y_o) * y_i
I know that u is the learning rate, that t_o is the target, and that y_o is the actual output. I don't understand why it's multiplied by y_i (which I presume is the input). Is it the input received by the node in question, or is it something else?

Any help is greatly appreciated

ps. I posted this in C++ because my implementation is in C++

Last Post by bitRAKE

Yes, in $$\Delta w_i = \mu(t_o - y_o)y_i$$, the term $$y_i$$ is an individual input and $$w_i$$ is the weight that is applied to that particular input.

The notation in the tutorial that you linked to is confusing because they use $$y_o$$ to represent the output, and also use $$y$$ to represent the input vector (with $$y_i$$ representing each individual input).

You seem to understand the first part of the derivative (equation 4).

To see why the partial derivative in equation (5) equals $$y_i$$, remember that $$w$$ is a vector of weights, $$y$$ is a vector of inputs, and the output $$y_o$$ is the dot product $$w \cdot y$$. So

$$y_o = w_1y_1 + w_2y_2 + ... + w_iy_i + ... + w_ny_n$$

But when you take a partial derivative of $$y_o$$ with respect to $$w_i$$, all of the other $$w_j$$ terms are constants, and all of the $$y_j$$ terms are constants. The derivative of $$w_1y_1$$ with respect to $$w_i$$ equals 0, and similarly for all the other terms except for $$w_iy_i$$, so all that's left is
$$\frac{\partial y_o}{\partial w_i} = \frac{\partial}{\partial w_i}(w_iy_i) = y_i$$

Hope that helps.