Are neural networks really intelligent? The first artificial neural network (ANN), the perceptron, was invented in 1958 by psychologist Frank Rosenblatt. It was intended to model how the human brain processes visual data and learns to recognize objects. Other researchers have since used similar ANNs to study human cognition. ANNs are directly inspired by, and partially modeled on, biological neural networks. Using different activation functions, they can model and process nonlinear relationships between inputs and outputs in parallel.
ANNs began as an attempt to exploit the architecture of the human brain to perform tasks that conventional algorithms had little success with. They soon reoriented towards improving empirical results, mostly abandoning attempts to remain true to their biological precursors. An artificial neural network consists of a collection of simulated neurons. Each neuron is a node that is connected to other nodes via links that correspond to biological axon-synapse-dendrite connections. Each link has a weight, which determines the strength of one node’s influence on another as shown in the figure below.
The above figure represents probably the simplest ANN, known as a feed-forward neural network, in which connections between the nodes do not form a cycle or loop. The above diagram covers many of the neural networks that have been introduced. Some have prevailed over others as the field evolved. One key architecture not mentioned above is the convolutional neural network (CNN).
Let us explore some of the different types of neural networks that have been used for different applications so far. The Asimov Institute has provided a great diagram summarising the different types of neural networks, as shown below. We shall follow the evolution of neural networks, although we shall not go deep into any one of them here.
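The feed-forward idea described above (weighted links between nodes, with no cycles) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the sigmoid activation, the layer sizes, and the hand-picked weights are assumptions made for the example, not values from any particular network.

```python
import math

def sigmoid(x):
    # A common activation function; any non-linear function could be used.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of `weights` holds the link weights feeding one neuron.
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    # Signals flow strictly from one layer to the next: no cycles.
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = feed_forward([1.0, 0.0], network)
print(output)
```

Changing a single weight changes how strongly one node influences the next, which is all that "learning" adjusts in such a network.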
A Hopfield network is a form of recurrent artificial neural network popularized by John Hopfield in 1982, but described earlier by Little in 1974. It introduced the concept of connecting the output back to the input cell.
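To make the feedback idea concrete, here is a minimal sketch of a Hopfield-style network in plain Python. It assumes units that take the values +1 or -1, a Hebbian rule for setting the weights, and unit-by-unit updates; the function names and the six-unit pattern are hypothetical choices for illustration.

```python
def train_hopfield(patterns):
    # Hebbian rule: units that are active together get a positive weight.
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, state, sweeps=5):
    # Each unit's output is fed back as input to every other unit.
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if total >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]  # last unit flipped
recovered = recall(weights, noisy)
print(recovered)  # [1, -1, 1, -1, 1, -1]
```

Starting from the corrupted pattern, the repeated feedback pulls the state back to the stored one, which is why such networks are often described as associative memories.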
A Boltzmann machine (also called a stochastic Hopfield network with hidden units) is a type of stochastic recurrent neural network. It was invented in 1985 by Geoffrey Hinton and Terry Sejnowski. It introduced the concept of a probabilistic hidden cell in addition to feeding the output back to the input cell.
A deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural networks, composed of multiple layers of latent variables, with connections between the layers but not between units within each layer. It used the concepts of probabilistic hidden cells in addition to feeding the output back to the input cell.
Thus the Hopfield network, the Boltzmann machine, and the deep belief network seem to represent a trend of evolution among neural networks in which feeding the output back to the input cell, probabilistic hidden cells, and matching input-output cells were progressively adopted. One could further argue that some of these ideas came from Markov chains, which use the probabilistic hidden cell concept but not feedback from output to input. This chain of evolution can be observed from left to right at the bottom of the above figure.
A recurrent neural network (RNN) is a type of artificial neural network in which connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feed-forward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. RNNs were based on David Rumelhart’s work in 1986 that built on the Hopfield networks.
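The role of the internal state can be illustrated with a minimal recurrent step in plain Python. This is a sketch under stated assumptions: a tanh activation, one input feature, two hidden units, and made-up weights; none of these come from a specific published model.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b):
    # New state depends on the current input AND the previous state.
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[k], x))
            + sum(w * hi for w, hi in zip(w_hh[k], h_prev))
            + b[k]
        )
        for k in range(len(h_prev))
    ]

def run_rnn(sequence, w_xh, w_hh, b):
    # The same weights are reused at every time step, so the
    # sequence can have any length.
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h

# Hypothetical weights: 1 input feature, 2 hidden units.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]

short_state = run_rnn([[1.0], [0.5]], w_xh, w_hh, b)
long_state = run_rnn([[1.0], [0.5], [-1.0], [0.2]], w_xh, w_hh, b)
print(short_state, long_state)
```

Both sequences end in a state of the same size even though their lengths differ, and feeding the same inputs in a different order generally gives a different final state.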
Long short-term memory (LSTM) is a deep learning system that builds on RNN and includes a recurrent memory cell in its architecture. LSTM can learn tasks that require memories of events that happened thousands or even millions of discrete time steps earlier. LSTM works even given long delays between significant events and can handle signals that mix low and high-frequency components. LSTM can learn to recognize context-sensitive languages unlike previous models based on hidden Markov models and similar concepts. LSTM networks were invented in 1997 by Hochreiter and Schmidhuber.
Gated recurrent units (GRUs) use a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho and others. The GRU is like a long short-term memory (LSTM) with a forget gate but has fewer parameters than LSTM. GRU’s performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit even better performance on certain smaller and less frequent datasets. However, GRUs have been known to struggle to learn simple languages that LSTM can learn relatively easily.
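The difference in parameter count can be worked out with simple arithmetic. The sketch below assumes the textbook formulation in which each gate or candidate block has one weight matrix over the concatenated input and hidden state plus one bias vector; some software libraries add extra bias terms, so their counts may differ slightly. The layer sizes are arbitrary examples.

```python
def block_params(input_size, hidden_size):
    # One weight matrix over [input, hidden] plus one bias vector.
    return hidden_size * (input_size + hidden_size) + hidden_size

def lstm_params(input_size, hidden_size):
    # Input gate, forget gate, output gate, and cell candidate.
    return 4 * block_params(input_size, hidden_size)

def gru_params(input_size, hidden_size):
    # Update gate, reset gate, and state candidate.
    return 3 * block_params(input_size, hidden_size)

lstm_count = lstm_params(10, 20)
gru_count = gru_params(10, 20)
print(lstm_count, gru_count)  # 2480 1860
```

Under these assumptions a GRU layer needs three quarters of the parameters of an LSTM layer of the same size.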
Convolutional neural networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery. They have applications in image and video recognition, recommender systems, image classification, and medical image analysis amongst others. Convolutional networks were inspired by biological vision, in particular the organization of the animal visual cortex. A practical application of CNNs was demonstrated in 2012 via AlexNet by Alex Krizhevsky et al.
As one can gather from the evolution of artificial neural networks over more than half a century, development in this space has alternated between trying to mimic how a human brain works and achieving high-performing neural networks for certain tasks. In this process, the more widely used artificial neural networks today seem to be CNN, RNN, and LSTM amongst a few others. Why have these networks prevailed over others? Is it because they have wider applicability? Is it because they have become more efficient? Or have they packed some intelligence that other networks have not?
Neural-network-based algorithms have largely relied on a combination of machine power and programming platforms for their execution. The non-linear patterns that early neural networks focused on could be processed largely with machine power and various activation functions. Only when neural network models started focusing on text, speech and images was it discovered that machine power and activation functions were not sufficient. Alex Krizhevsky et al demonstrated this with AlexNet in 2012. CNNs brought the concepts of ‘convolution’ and pooling into neural networks, which significantly improved image recognition. A CNN is able to preserve the two-dimensional structure of an image, which earlier neural networks could not.
It appears that by focusing on different types of applications, neural networks progressed in their processing ability. The progression came about by mimicking a certain aspect of the human/animal brain and incorporating it in a certain type of neural network’s processing schema. Clearly, this points to some aspect of intelligence entering into neural networks courtesy of different applications. Hence, the wider the range of applications, the more intelligent the neural networks used should become.
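To show what convolution and pooling do to a two-dimensional image, here is a small sketch in plain Python. The 5x5 image, the 2x2 edge-detecting kernel, and the function names are all made up for illustration; a real CNN learns its kernel values during training.

```python
def conv2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1).
    # Like most deep learning libraries, this does not flip the kernel.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(fmap, size=2):
    # Keep the strongest response in each non-overlapping size x size window.
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, each row is [0, 2, 0, 0]
pooled = max_pool(feature_map)       # [[2, 0], [2, 0]]
print(feature_map)
print(pooled)
```

The output is still a grid: the edge shows up in the same column where it sits in the image, so the spatial layout survives both steps, while pooling shrinks the grid and keeps only the strongest responses.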