Monday, March 19, 2012

Artificial neural networks Day 03

The multilayer feedforward neural network is the workhorse of the Neural Network Toolbox software. It can be used for both function fitting and pattern recognition problems. With the addition of a tapped delay line, it can also be used for prediction problems (see Focused Time-Delay Neural Network (timedelaynet)).

Feedforward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear relationships between input and output vectors. The linear output layer is most often used for function fitting (or nonlinear regression) problems.

On the other hand, if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig). This is the case when the network is used for pattern recognition problems (in which a decision is being made by the network).
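
For example, a minimal sketch (the hidden layer size of 10 is illustrative): with feedforwardnet you can replace the default linear output layer with logsig so the outputs stay between 0 and 1.

    net = feedforwardnet(10);             % one hidden layer of 10 sigmoid neurons
    net.layers{2}.transferFcn = 'logsig'; % constrain outputs to the range (0, 1)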

The workflow for the general neural network design process has seven primary steps:

  1. Collect data
  2. Create the network
  3. Configure the network
  4. Initialize the weights and biases
  5. Train the network
  6. Validate the network (post-training analysis)
  7. Use the network

The following steps demonstrate how to solve a fitting problem using a feedforward neural network.

1a. Define the input, e.g., load house_dataset; % house_dataset includes the input and target, called houseInputs and houseTargets, respectively.
1b. Define the target. % houseTargets is the target.
2. Define the network architecture, i.e., net = feedforwardnet;
3. Configure the network, i.e., net = configure(net, houseInputs, houseTargets); % This also initializes the weights and biases (you can re-initialize them later with net = init(net)).
4. Initialize the weights and biases again, if needed.
5. Train the network, i.e., [net, tr] = train(net, houseInputs, houseTargets); % The train command initializes the weights and biases as well, so the configure step can be skipped. 'tr' is the training record.
6. Validate the network, i.e., plotperf(tr) % Plot the performance progress.
7. Use the network, i.e., a = net(houseInputs(:,5)) % Find the output for the input vector at column 5,
OR a = net(houseInputs); % Compute the outputs for all the inputs in 'houseInputs'. A complete script combining these steps is sketched below.
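
Putting these steps together, a minimal end-to-end script for the house pricing example might look like this:

    load house_dataset;                               % provides houseInputs and houseTargets
    net = feedforwardnet;                             % default architecture (one hidden layer)
    net = configure(net, houseInputs, houseTargets);  % size the network to the data
    net = init(net);                                  % optional: re-initialize weights and biases
    [net, tr] = train(net, houseInputs, houseTargets);
    plotperf(tr);                                     % plot the performance progress
    a = net(houseInputs);                             % outputs for every input vector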

In particular, to generate some sample code to reproduce the function fitting examples shown above, you can run the neural fitting GUI, nftool. Select the house pricing data from the GUI, and after you have trained the network, click the Advanced Script button on the final pane of the GUI.

If you are interested in using a multilayer neural network for pattern recognition, use the pattern recognition GUI, nprtool. It will lead you through a similar set of design steps for pattern recognition problems, and can then generate example code demonstrating the many options that are available for pattern recognition networks.
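
The command-line counterpart of nprtool is patternnet. A rough sketch, assuming the iris_dataset that ships with the toolbox:

    load iris_dataset;                    % provides irisInputs and irisTargets
    net = patternnet(10);                 % 10 hidden neurons; illustrative size
    [net, tr] = train(net, irisInputs, irisTargets);
    classes = vec2ind(net(irisInputs));   % convert output vectors to class indices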

You would normally use Levenberg-Marquardt training (trainlm) for small and medium-sized networks, if you have enough memory available. If memory is a problem, then there are a variety of other fast algorithms available. For large networks you will probably want to use trainscg or trainrp.
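
Switching the training algorithm is a one-line change via the net.trainFcn property, e.g.:

    net = feedforwardnet(10);
    net.trainFcn = 'trainlm';    % Levenberg-Marquardt (the default): fast but memory-hungry
    % net.trainFcn = 'trainscg'; % scaled conjugate gradient: suits large networks
    % net.trainFcn = 'trainrp';  % resilient backpropagation: also memory-friendly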

The error surface of a multilayer network is not simple or well behaved. The problem is that nonlinear transfer functions in multilayer networks introduce many local minima in the error surface. As gradient descent is performed on the error surface, depending on the initial starting conditions, it is possible for the network solution to become trapped in one of these local minima. Although a multilayer backpropagation network with enough neurons can implement just about any function, backpropagation does not always find the correct weights for the optimum solution, so you may want to reinitialize and retrain several times.
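
A common workaround, sketched below with illustrative variable names, is to retrain from several random initializations (using init, as in step 3 above) and keep the best network:

    bestPerf = Inf;
    for k = 1:5
        net = init(net);                                   % fresh random weights and biases
        [candidate, tr] = train(net, houseInputs, houseTargets);
        perf = perform(candidate, houseTargets, candidate(houseInputs));
        if perf < bestPerf
            bestPerf = perf;                               % keep the best of the restarts
            bestNet = candidate;
        end
    end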

Networks are also sensitive to the number of neurons in their hidden layers. Too few neurons can lead to underfitting. Too many neurons can contribute to overfitting, in which all training points are well fitted, but the fitting curve oscillates wildly between these points.
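
The hidden layer size is simply the argument to feedforwardnet, so one crude approach (the sizes here are illustrative) is to sweep over a few candidates and compare validation error:

    for h = [5 10 20 40]
        net = feedforwardnet(h);                      % h neurons in the hidden layer
        [net, tr] = train(net, houseInputs, houseTargets);
        fprintf('h = %2d, best validation MSE = %g\n', h, tr.best_vperf);
    end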

In multilayer networks, sigmoid transfer functions are generally used in the hidden layers. These functions become essentially saturated when the magnitude of the net input is greater than three (exp(−3) ≅ 0.05). If this happens at the beginning of the training process, the gradients will be very small, and the network training will be very slow. In the first layer of the network, the net input is the product of the input and the weight, plus the bias. If the input is very large, then the weight must be very small in order to prevent the transfer function from becoming saturated. It is standard practice to normalize the inputs before applying them to the network.

Generally, the normalization step is applied to both the input vectors and the target vectors in the data set. In this way, the network output always falls into a normalized range. The network output can then be reverse transformed back into the units of the original target data when the network is put to use in the field.
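
The toolbox's mapminmax function does exactly this, rescaling each row to [−1, 1] and remembering the settings needed for the reverse transform:

    [pn, ps] = mapminmax(houseInputs);   % normalized inputs, plus settings ps
    [tn, ts] = mapminmax(houseTargets);  % normalized targets, plus settings ts
    % ... train a network on pn and tn ...
    an = net(pn);                        % output in the normalized range
    a = mapminmax('reverse', an, ts);    % back to the original target units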

Most of the network creation functions in the toolbox, including the multilayer network creation functions, such as feedforwardnet, automatically assign processing functions to your network inputs and outputs. These functions transform the input and target values you provide into values that are better suited for network training.
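
You can inspect or override these through the network object; for a two-layer feedforwardnet:

    net = feedforwardnet;
    net.inputs{1}.processFcns   % default: {'removeconstantrows', 'mapminmax'}
    net.outputs{2}.processFcns  % the targets get the same default treatment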

When training multilayer networks, the general practice is to first divide the data into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The error on the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set typically begins to rise. The network weights and biases are saved at the minimum of the validation set error. The third subset is the test set; its error is not used during training, but it is useful for comparing different models after training.
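
The split is controlled by net.divideFcn and net.divideParam; a sketch of the default random division:

    net.divideFcn = 'dividerand';       % random division (the default)
    net.divideParam.trainRatio = 0.70;  % 70% for computing the gradient
    net.divideParam.valRatio = 0.15;    % 15% for early stopping
    net.divideParam.testRatio = 0.15;   % 15% held out for post-training analysis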

Mean Squared Error is the average squared difference between outputs and targets. Lower values are better. Zero means no error.
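
In code, this is just the mean of the squared errors; with the default performance function ('mse') it matches what perform reports:

    outputs = net(houseInputs);
    errors = houseTargets - outputs;
    mseVal = mean(errors .^ 2)          % average squared difference
    % equivalently: perform(net, houseTargets, outputs)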

Regression R values measure the correlation between outputs and targets. An R value of 1 means a close relationship, 0 a random relationship.
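
This also answers the first open question below: the toolbox computes R with the regression function and plots it with plotregression. A sketch:

    outputs = net(houseInputs);
    [r, m, b] = regression(houseTargets, outputs);  % r: correlation; m, b: slope and intercept of the fit
    plotregression(houseTargets, outputs);          % regression plot of outputs against targets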

Regression for the output? How to compute?
What is training state?
What are nonlinear regression problems?

References
MATLAB 2011b Help Documentation
