A research note/proposal
Jie Bao
Dept of Computer Science
Iowa State University
Ames, IA 50010
baojie@cs.iastate.edu
2002-12-21
Have neural networks passed their golden age? Since the tasks they can undertake are quite limited, can they really be as wonderful a tool as we thought in the early 1990s?
When some people lose confidence, or regard neural nets as just another fancy theory that can hardly handle real-world problems, is it because we have expected too much of a child? Compared with a biological neural network, the artificial neural network (ANN) is so simple that, if we regard the brain as a computer, the most complex neural network of today is comparable only to a counter or, at most, a simple ALU. If we don't criticize a logic circuit because it can only undertake certain simple tasks, why are we so strict with contemporary ANNs?
It may be due to the excessive excitement of the 1980s, when BP and other new kinds of ANN were proposed. When the nonlinear mapping ability of ANNs was discovered, the field revived from the dark age that followed Minsky's critique of the perceptron. Dozens of ANNs were proposed during that decade and applied in various areas, such as control, memory, classification, and forecasting. However, the omnipotence of ANNs was exaggerated, since an ANN is usually no more than a nonlinear mapping (BP) or a nonlinear dynamic equation (e.g. Hopfield). When people recovered from the "neural network fever", some of them felt disappointed and turned their eyes to other new approaches.
What can ANNs do, and what can't they do? Are they structurally complex enough to handle real-world problems such as face recognition or automatic driving? Contemporary ANNs share some common characteristics:
- Their basic units are usually uniform, or of very limited types.
- The units are also organized in a fairly uniform way; that is, the connection density between neurons is usually uniform across different parts of the network. (ART, SOM and some competitive neural networks are less uniform.)
- The scale of the network can't be too large: usually fewer than a thousand neurons in engineering applications.
- There is no high-level hierarchy in the network; different parts of the net are usually functionally the same or belong to only a few fixed types.
Such a network can hardly be called a "neural network" when compared with the complexity of a biological neural network. Its complexity may be even less than that of a single hypercolumn in the cortex. Since even a biological neuron group of similar complexity is not powerful enough to be an "intelligent" unit, why do we expect a simple contemporary ANN to do better?
The brain is a system of about 10^10 neurons with an obviously hierarchical architecture. At the top level it is divided into the cerebellum, thalamus, hypothalamus, cortex, hippocampus, and cerebrum. The cerebrum is itself divided into two hemispheres: the left hemisphere is responsible for logical reasoning, and the right one is good at conceptual association. The cortex is further divided into many functional areas, such as the visual area and the olfactory area. In each of those areas, cells are arranged in a very orderly way: they are organized in narrow columns, which work together on a particular stimulus or task.
This type of hierarchical organization has many advantages over classical ANNs.
Reading: Macrostructure of brain
Human Physiology, Chapter 8: CNS, School of Medicine and Health Sciences, University of North Dakota
Figure: Functional Areas of the Brain (mid-sagittal view)
Reading: Microstructure of brain
Hypercolumns in Visual Cortex. Hubel & Wiesel (1977): the visual cortex is organised into 2 x 2 mm hypercolumns. Each hypercolumn represents a region of the visual field. Each hypercolumn contains a complete set of orientation columns. (From Visuelle Kognition, Adrian Schwaninger.)
It is natural to simulate the hierarchy of the brain in ANNs. In other words, we can use simple neural networks or neuron groups as basic units to construct more complex neural networks, with the learning task allotted at both global and local levels.
In fact, such methods have already been applied in both the structural and the functional design of ANNs.
In the functionally hierarchical design, the learning task is divided into smaller pieces, each undertaken by a single net; the learning results are then combined to give an overall final result. Research on this approach mostly concerns how to decompose tasks: extracting primary components/features, minimizing interaction between subtasks, and separating the knowledge base from the learning parts. Modular neural networks focus on this approach (see Modular Learning in Neural Networks by Tomas Hrycej).
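As one illustration of such a functional decomposition, the sketch below splits a three-class problem into three yes/no subtasks, trains one small perceptron on each, and combines the results by picking the module with the highest score. The one-vs-rest split, the toy data, and all names (`train_perceptron`, `train_modular`, `predict`, `SAMPLES`, `LABELS`) are assumptions made for this example; they are not taken from Hrycej's book.

```python
def train_perceptron(samples, targets, epochs=200, lr=1.0):
    """Train one threshold unit with the classic perceptron rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            total = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - (1 if total > 0 else 0)
            # Weights change only when the unit misclassifies the sample.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def score(module, x):
    """Raw activation of one trained module for input x."""
    weights, bias = module
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_modular(samples, labels):
    """Decompose the task: one module per class, each trained on 'class vs. rest'."""
    return {
        cls: train_perceptron(samples, [1 if lab == cls else 0 for lab in labels])
        for cls in sorted(set(labels))
    }


def predict(modules, x):
    """Combine the modules: the class whose module responds most strongly wins."""
    return max(modules, key=lambda cls: score(modules[cls], x))


# Toy data (an assumption for this sketch): three well-separated clusters.
SAMPLES = [(0, 3), (1, 3), (3, 0), (3, 1), (-3, -3), (-2, -3)]
LABELS = ["a", "a", "b", "b", "c", "c"]

modules = train_modular(SAMPLES, LABELS)
print([predict(modules, x) for x in SAMPLES])
```

Each module sees only its own subtask, so the modules can be trained independently of one another; only the final combination step looks at all of them.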
The structurally hierarchical design emphasizes connecting smaller networks into a big one. The member networks can be homogeneous (as in ensemble learning) or heterogeneous (as in hybrid learning). In this way, we can construct a "network of networks" that fulfills a task better than any of its sub-networks or, in some cases, does complex work that none of its sub-networks can do.
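A minimal sketch of the last point, assuming hand-picked weights rather than learned ones: no single threshold unit can compute XOR, but three such units wired together can. The names `threshold_unit`, `or_unit`, `nand_unit`, `and_unit` and `xor_network` are illustrative only.

```python
def threshold_unit(weights, bias):
    """Build a single threshold unit: output 1 if w.x + b > 0, else 0."""
    def unit(inputs):
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if total > 0 else 0
    return unit


# Three sub-networks, each a single unit with hand-picked (assumed) weights.
or_unit = threshold_unit([1, 1], -0.5)     # fires if at least one input is 1
nand_unit = threshold_unit([-1, -1], 1.5)  # fires unless both inputs are 1
and_unit = threshold_unit([1, 1], -1.5)    # fires only if both inputs are 1


def xor_network(inputs):
    """Network of networks: two units feed a third one."""
    hidden = [or_unit(inputs), nand_unit(inputs)]
    return and_unit(hidden)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network([a, b]))
```

The structure, not the individual unit, carries the extra capability: each member stays as simple as before, and only the way they are connected changes.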
Reading: Different hierarchy designs in traditional neural networks
In fact, even some basic neural nets have a hierarchical structure. For example, the sequence MP model, perceptron, multi-layer perceptron, BP network is a typical hierarchical architecture. However, such networks still have limited robustness and limited ability to learn from large-scale datasets. Because the connections in such networks are uniform, the inner structure of the knowledge can't be utilized to simplify the model. A large, high-dimensional dataset needs so many nodes that the network becomes very difficult to train and run.
Reading: Hierarchy in Feed forward network
Evolutionary Tree of Neural Network, by Jie Bao, 2002-03-07
Hierarchy O:
Hierarchy II: a group of multi-layer perceptrons connected in parallel. The core of backpropagation learning is the backward distribution of errors. In fact, it follows the same approach as the perceptron: gradient descent. The learning process can be viewed as the cooperation of a group of multi-layer perceptrons.
Our task is to
- Determine a general method for task decomposition among subnets.
- Try different kinds of connection between neural nets: the language way (as in agents) or the stimulus way (as weights).
- Use neural nets as agents in a hierarchical agent system.
The learning algorithms may differ for specific applications and neural networks, but the general approach should be the same (see the hierarchical competition for synergetic neural networks below). Just as we use subroutines in software engineering and encapsulated chips in hardware engineering, we can try to make the neural net an "intelligent block" in our intelligence engineering.
The hierarchical nature is quite friendly to distributed and incremental learning. Future application research on HierNN can focus on this problem.
Reading: Hierarchical Competition in Synergetic Neural Network
by Jie Bao, June 2001
From the discussion of the matching subnet, we can see that the classic Haken model is not suitable for very large-scale datasets because of the difficulty of constructing the adjoint vectors. We can solve this problem by hierarchical competition. Rather than a single global competition among all order parameters, we can introduce local competition into the evolution process (which is also the competition process) of the order parameters when the dataset is high-dimensional and large-scale. There are two possible approaches.
1) Hierarchical competition in the matching layer (sub-domain competition): classify the prototype patterns into groups in the matching subnet. The competition first takes place among groups, and then the winning order parameter (which represents the group / sub-domain the input vector may belong to) joins the local competition inside that sub-domain.
2) Hierarchical competition in the competition layer (local competition between order parameters): from the perspective of space cost, since the similarity between most patterns is quite low, we can convert the system of differential equations into sparse-matrix differential equations and then solve these lower-dimensional equations.
Both approaches are equivalent to decomposing the synergetic neural network into sub-networks; each sub-net does a local (lower-dimensional) computation and then joins the global, higher-dimensional competition.
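The first approach (sub-domain competition) can be sketched roughly as follows. This is a simplified stand-in, not the Haken dynamics: cosine similarity replaces the order parameters, a plain winner-take-all replaces the evolution equations, and each group is represented by the mean of its prototypes. The names `cosine`, `centroid`, `hierarchical_competition` and the toy `GROUPS` data are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for an order parameter."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Mean vector of a group of prototypes (assumed group representative)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def hierarchical_competition(groups, query):
    """Two-stage winner-take-all: first among groups, then inside the winner."""
    # Stage 1: global competition among group representatives.
    group_scores = {
        name: cosine(centroid(list(prototypes.values())), query)
        for name, prototypes in groups.items()
    }
    best_group = max(group_scores, key=group_scores.get)
    # Stage 2: local competition among the prototypes of the winning group only.
    local_scores = {
        label: cosine(prototype, query)
        for label, prototype in groups[best_group].items()
    }
    best_label = max(local_scores, key=local_scores.get)
    return best_group, best_label


# Toy prototype patterns (assumed), already classified into two sub-domains.
GROUPS = {
    "left": {"A": [1.0, 0.0, 0.0, 0.0], "B": [0.8, 0.6, 0.0, 0.0]},
    "right": {"C": [0.0, 0.0, 1.0, 0.0], "D": [0.0, 0.0, 0.6, 0.8]},
}

print(hierarchical_competition(GROUPS, [0.1, 0.0, 0.7, 0.7]))
```

With G groups of roughly equal size out of N prototypes, the input is compared against about G + N/G vectors instead of N, which is where the saving for large pattern sets would come from in this simplified picture.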
Appendix: Implementation of Neural Network
Copyright, Jie Bao, since Oct 2000