An autoencoder is an artificial neural network capable of learning various coding patterns. In its simplest form, an autoencoder resembles a multilayer perceptron, containing an input layer, one or more hidden layers, and an output layer. The significant difference between a typical multilayer perceptron, or feedforward neural network, and an autoencoder lies in the number of nodes in the output layer: in an autoencoder, the output layer contains exactly as many nodes as the input layer. Instead of predicting target values from the input vector, the autoencoder must reconstruct its own inputs. The broad outline of the learning mechanism is as follows.
For each input x,
Do a feedforward pass to compute the activations at all hidden layers and at the output layer
Find the deviation between the computed outputs and the inputs using an appropriate error function
Backpropagate the error to update the weights
Repeat until the reconstruction error is satisfactory.
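The steps above can be sketched in NumPy. This is a minimal illustration under assumptions not stated in the text: a toy dataset whose inputs lie on a 3-dimensional subspace, a single sigmoid hidden layer of three units as the bottleneck, a squared-error loss, and plain full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 200 inputs that lie on a 3-dimensional subspace
# of an 8-dimensional space, so a 3-unit bottleneck can reconstruct them.
Z = rng.random((200, 3))
X = Z @ rng.random((3, 8))
X = X / X.max()                  # scale into [0, 1] for the sigmoid output

n_in, n_hid = 8, 3               # hidden layer smaller than input/output layer
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 0.5, []
for epoch in range(2000):
    # Step 1: feedforward pass through the hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Step 2: deviation between outputs and inputs (mean squared error).
    err = y - X
    losses.append(np.mean(err ** 2))
    # Step 3: backpropagate the error to update the weights.
    d_y = err * y * (1 - y)            # gradient through output sigmoid
    d_h = (d_y @ W2.T) * h * (1 - h)   # gradient through hidden sigmoid
    W2 -= lr * (h.T @ d_y) / len(X); b2 -= lr * d_y.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X); b1 -= lr * d_h.mean(axis=0)
    # Step 4: the loop repeats until the reconstruction is satisfactory.

# The hidden activations serve as compressed 3-dimensional codes.
codes = sigmoid(X @ W1 + b1)
```

Because the hidden layer is narrower than the input layer, the network cannot simply copy its input; it is forced to find a compact code.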
If the number of nodes in the hidden layers is fewer than the number of input/output nodes, then the activations of the last hidden layer serve as a compressed representation of the inputs. Conversely, when the hidden layer has more nodes than the input layer, an autoencoder can simply learn the identity function, which makes it useless in the majority of cases.
Convolutional Neural Networks
A convolutional neural network (CNN) is another variant of the feedforward multilayer perceptron. It is a type of feedforward neural network in which the individual neurons are arranged so that they respond to overlapping regions of the visual field.
A deep CNN works by consecutively modeling small pieces of information and combining them deeper in the network. One way to understand this is that the first layer tries to identify edges and forms templates for edge detection. Subsequent layers then combine these edges into progressively more complex shapes, and eventually into templates covering different object positions, illumination conditions, scales, and so on. The final layers match an input image against all the templates, and the final prediction behaves like a weighted sum of them. Deep CNNs can therefore model complex variations in appearance and give highly accurate predictions.
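The first-layer edge detection described above can be illustrated with a hand-written convolution. The 3x3 vertical-edge kernel below is an assumed, hand-picked example of the kind of filter a trained first layer often ends up resembling; it is not taken from any particular network.

```python
import numpy as np

# Toy 6x6 "image": left half dark (0), right half bright (1).
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# Assumed vertical-edge kernel: negative weights on the left column,
# positive on the right, so it fires on dark-to-bright transitions.
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

def conv2d(image, k):
    """Valid 2D cross-correlation, as used in CNN convolutional layers."""
    kh, kw = k.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

fmap = conv2d(img, kernel)
# The feature map responds only where the kernel straddles the
# dark-to-bright boundary, and is zero over the flat regions.
```

A real CNN learns many such kernels by backpropagation rather than having them specified by hand, but the sliding dot product it computes is exactly this.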
Such a network follows the visual mechanism of living organisms. The cells in the visual cortex are sensitive to small subregions of the visual field, called receptive fields. The subregions are arranged to cover the entire visual field, and the cells act as local filters over the input space. The backpropagation algorithm is used to train the parameters of each convolution kernel, and each kernel is replicated over the entire image with the same parameters. The convolution operators extract distinct features of the input. Besides the convolutional layer, the network contains a rectified linear unit layer, pooling layers that compute the max or average value of a feature over a region of the image, and a loss layer consisting of an application-specific loss function. Image recognition, video analysis, and natural language processing are major applications of such networks.