Abstract
Context:
Recently, the singular points of neural networks have attracted attention from the artificial intelligence community, and their interesting properties have been demonstrated. The objective of this study is to provide an overview of studies on the singularities of complex-valued neural networks.
Evidence Acquisition:
This review is based on the relevant literature on complex-valued neural networks and singular points.
Results:
A review of the available literature shows that the singular points of complex-valued neural networks have negative effects on learning, as do those of real-valued neural networks. However, the singular points of complex-valued neural networks are more benign than those of real-valued neural networks, and methods for improving the learning performance have been proposed.
Conclusions:
A complex-valued neural network could be a promising learning method from the viewpoint of singularities.
1. Context
An artificial neural network is a machine learning model that has recently attracted attention owing to its human-like abilities such as learning and generalization (1).
A complex-valued neural network extends the (real-valued) parameters (weights and threshold values) of a real-valued neural network to complex numbers. Its advantage is that multiplication by a complex number naturally expresses rotation. It is therefore suitable for processing complex-valued and two-dimensional data and has been applied to various fields such as communications, image processing, biological information processing, landmine detection, wind prediction, and independent component analysis (ICA) (2). Recently, a complex-valued firing-rate model was presented (3), which is an attempt to model the neural networks of an actual brain with complex numbers.
Generally, a hierarchical structure causes singular points. For example, consider a three-layered real-valued neural network. If the weight between a hidden neuron and an output neuron is equal to zero, then no value of the weight vector between that hidden neuron and the input neurons affects the output value of the network. Such a weight vector is called an unidentifiable parameter, and the corresponding point in the parameter space is a singular point. It has been shown that singular points affect the learning dynamics of learning models and that they can cause a standstill in learning (4-6).
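The unidentifiability described above can be checked numerically. The following is a minimal sketch, assuming a tanh activation and the hypothetical helper name three_layer_output (neither is fixed by the literature reviewed here): when the weight from a hidden neuron to the output neuron is zero, changing the incoming weight vector of that hidden neuron leaves the output unchanged.

```python
import numpy as np

def three_layer_output(x, W, b, v, v0):
    """Output of a real-valued three-layered network with one output neuron.

    x: inputs (L,), W: input-to-hidden weights (H, L), b: hidden thresholds (H,),
    v: hidden-to-output weights (H,), v0: output threshold (scalar).
    tanh is an assumed activation function.
    """
    hidden = np.tanh(W @ x + b)
    return float(np.tanh(v @ hidden + v0))

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)
v = np.array([0.0, 0.7])   # weight from hidden neuron 0 to the output is zero
v0 = 0.1

y_before = three_layer_output(x, W, b, v, v0)

# Replace the incoming weight vector of hidden neuron 0 with arbitrary values.
W_changed = W.copy()
W_changed[0] = 10.0 * rng.normal(size=3)
y_after = three_layer_output(x, W_changed, b, v, v0)

print(y_before, y_after)   # the two outputs coincide
```

Because the two outputs coincide for every input, no training data can determine the incoming weight vector of that hidden neuron.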
This paper reviews the current state of studies on the singularities of complex-valued neural networks.
2. Evidence Acquisition
This review is based on the relevant literature on complex-valued neural networks and singular points. Although the results in the literature are obtained by mathematical analyses and computer simulations, the use of mathematical expressions has been avoided as much as possible in this paper for the sake of simplicity.
3. Results
3.1. A Single Complex-Valued Neuron
The properties of the singular points of a single complex-valued neuron, the building block of a complex-valued neural network, have been described (7-9). There are two types of complex-valued neurons: one whose parameters (weight and threshold) are expressed in orthogonal coordinates (e.g., x + iy), and one whose parameters are expressed in polar coordinates (e.g., r exp[iθ]), called a polar-variable complex-valued neuron. The former model trivially has no singular points, whereas the polar-variable complex-valued neuron has many. These singular points give the polar-variable complex-valued neuron model various properties.
First, the parameters of a polar-variable complex-valued neuron are unidentifiable. That is, if the amplitude parameter r is equal to zero, then r exp[iθ] z = 0 holds for any input signal z, and no value of the phase parameter θ affects the output value of the neuron. Thus, the value of θ cannot be identified by learning. Therefore, θ is an unidentifiable parameter, and the polar-variable complex-valued neuron is unidentifiable.
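As a small illustration of this point, the sketch below evaluates the weighted input r exp[iθ] z for several phases. The function name polar_weighted_input and the sample values are assumptions made for illustration, and the threshold and the activation function are omitted.

```python
import cmath

def polar_weighted_input(r, theta, z):
    """Weighted input r * exp(i*theta) * z of a polar-variable neuron."""
    return r * cmath.exp(1j * theta) * z

z = 0.8 - 0.3j                 # an arbitrary input signal
phases = (0.0, 1.0, 2.5)

at_zero_amplitude = [polar_weighted_input(0.0, th, z) for th in phases]
at_nonzero_amplitude = [polar_weighted_input(0.5, th, z) for th in phases]

print(at_zero_amplitude)       # all zero: theta has no effect when r = 0
print(at_nonzero_amplitude)    # three different values when r = 0.5
```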
Second, a plateau phenomenon can occur during learning with the steepest descent method and a squared error. That is, there is a period during learning in which the learning error hardly decreases ("learning period" and "learning error" are standard terms in the field of neural networks). Thus, it has been experimentally suggested that unidentifiable parameters (singular points) degrade the learning speed (7-9).
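One way to see why learning can stall near such a singular point is to look at the gradient with respect to the phase. The sketch below is a simplification and not the setting of the cited experiments: it assumes a linear neuron without threshold or activation function, a single training pair (z, t), and the hypothetical function names squared_error and grad_theta. Under these assumptions the derivative of the squared error with respect to θ is proportional to r, so it shrinks to zero as r approaches zero.

```python
import cmath

def squared_error(r, theta, z, t):
    """Squared error |r*exp(i*theta)*z - t|^2 of a linear polar-variable neuron."""
    return abs(r * cmath.exp(1j * theta) * z - t) ** 2

def grad_theta(r, theta, z, t):
    """Analytic derivative of the squared error with respect to the phase theta.

    With w = r*exp(i*theta), dE/dtheta = 2 * Im(conj(t) * w * z), which is
    proportional to r and therefore vanishes at r = 0.
    """
    w = r * cmath.exp(1j * theta)
    return 2.0 * (t.conjugate() * w * z).imag

z, t, theta = 0.8 - 0.3j, 0.2 + 0.9j, 0.4   # illustrative values
for r in (1.0, 0.1, 0.01, 0.0):
    print(f"r = {r:4.2f}  dE/dtheta = {grad_theta(r, theta, z, t):.6f}")
```

A steepest descent step on θ is proportional to this derivative, so the phase is updated very slowly while the amplitude stays close to zero.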
Finally, it has been suggested experimentally that the steepest descent method with an amplitude-phase error (10) and the complex-valued natural gradient descent method (11) are effective for improving the learning performance.
3.2. Three-Layered Complex-Valued Neural Network
Since there have been no studies on three-layered complex-valued neural networks consisting of polar-variable complex-valued neurons, this section discusses only three-layered complex-valued neural networks consisting of complex-valued neurons represented in orthogonal coordinates.
3.2.1. Local Minima of a Three-Layered Complex-Valued Neural Network
Consider a three-layered complex-valued neural network with L input neurons, H hidden neurons, and one output neuron. The hierarchical structure of the three-layered complex-valued neural network yields three types of redundancies (Figure 1) (12). (a) In the upper part of Figure 1, the hidden neuron j never influences the output neuron because the weight v_{j} between the hidden neuron j and the output neuron is equal to zero. Thus, we can remove the hidden neuron j. (b) In the middle part of Figure 1, the output of the hidden neuron j is only a constant k because the weight vector between the input neurons and the hidden neuron j is equal to zero: w_{j} = 0. Then, we can remove the hidden neuron j and replace the threshold v_{0} of the output neuron with v_{0} + k. (c) In the lower part of Figure 1, we can remove the hidden neuron j_{2} and replace the weight v_{j1} between the hidden neuron j_{1} and the output neuron with v_{j1} + qv_{j2}, where q = 1, -1, i, or -i, and v_{j2} is the weight between the hidden neuron j_{2} and the output neuron, because w̃_{j1} = q w̃_{j2}. Here, w̃_{j1} is the vector that consists of the weight vector between the input neurons and the hidden neuron j_{1} and the threshold of the hidden neuron j_{1}, and w̃_{j2} is the corresponding vector for the hidden neuron j_{2}.
Figure 1. Three Types of Redundancies of a Three-Layered Complex-Valued Neural Network (© 2013, Elsevier. Used with permission)
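Redundancy (c) can be checked numerically with a small sketch. The assumptions are: a split-type tanh activation, the hypothetical helper names split_tanh and network_output, and the convention w̃_{j2} = q w̃_{j1}, under which the merged weight is v_{j1} + q v_{j2}; with the opposite convention, q is replaced by its inverse, which is again one of 1, -1, i, and -i. The key fact is that a split-type activation built from an odd function φ satisfies φ(qz) = qφ(z) for these four values of q.

```python
import numpy as np

def split_tanh(z):
    """Split-type activation: tanh applied to the real and imaginary parts."""
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

def network_output(x, W, theta, v, v0):
    """Three-layered complex-valued network with one output neuron.

    W: (H, L) hidden weights, theta: (H,) hidden thresholds,
    v: (H,) hidden-to-output weights, v0: output threshold.
    """
    hidden = split_tanh(W @ x + theta)
    return split_tanh(v @ hidden + v0)

rng = np.random.default_rng(0)

def crandn(n):
    """Random complex vector of length n (illustrative values only)."""
    return rng.normal(size=n) + 1j * rng.normal(size=n)

x = crandn(3)                  # inputs
w1 = crandn(3)                 # incoming weights of hidden neuron j1
theta1 = 0.3 - 0.2j            # threshold of hidden neuron j1
v0 = 0.1 + 0.4j                # threshold of the output neuron

max_diff = 0.0
for q in (1, -1, 1j, -1j):
    # Hidden neuron j2 has weights and threshold equal to q times those of j1.
    W = np.vstack([w1, q * w1])
    theta = np.array([theta1, q * theta1])
    v = crandn(2)
    full = network_output(x, W, theta, v, v0)
    # Reduced network: j2 removed, output weight of j1 replaced by v1 + q*v2.
    reduced = network_output(x, W[:1], theta[:1], np.array([v[0] + q * v[1]]), v0)
    max_diff = max(max_diff, abs(full - reduced))

print(max_diff)   # on the order of floating-point rounding error
```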
The three types of redundancies described above yield critical points, that is, points at which the derivative of the learning error is zero (12). There are three types of critical points: local minima, local maxima, and saddle points, which can be identified using the Hessian, as is well known. In the case of real-valued neural networks, the redundancies corresponding to redundancies (a) and (b) above inevitably yield saddle points, and the redundancy corresponding to redundancy (c) yields saddle points or local minima depending on the conditions (13). Fukumizu and Amari (13) confirmed by computer simulations that the local minima caused 50,000 plateaus, which had a strong negative influence on learning. It was proved that most of the local minima that Fukumizu and Amari (13) discovered could be resolved by extending the real-valued neural network to complex numbers; that is, most of the critical points caused by the hierarchical structure of the complex-valued neural network are saddle points, which is a prominent property of the complex-valued neural network (12). Note that this result concerns only the local minima caused by the hierarchical structure of the complex-valued neural network; local minima of other types might exist in the complex-valued neural network. Recently, it has been shown that there exists a reducibility of another type (called exceptional reducibility) (14). It is important to clarify how the exceptional reducibility is related to the local minima of complex-valued neural networks.
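The Hessian test mentioned above can be sketched as follows. This is the generic second-derivative test applied to toy functions, not the analysis of (12) or (13); the function name classify_critical_point and the tolerance are assumptions.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a critical point from the eigenvalues of a symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"
    # Some eigenvalue is (numerically) zero and no sign change is visible.
    return "inconclusive"

# Hessians at the origin of three toy functions of (x, y).
bowl = np.array([[2.0, 0.0], [0.0, 2.0]])      # x^2 + y^2
dome = np.array([[-2.0, 0.0], [0.0, -2.0]])    # -(x^2 + y^2)
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])   # x^2 - y^2
valley = np.array([[2.0, 0.0], [0.0, 0.0]])    # x^2, flat along y

for name, H in (("bowl", bowl), ("dome", dome), ("saddle", saddle), ("valley", valley)):
    print(name, "->", classify_critical_point(H))
```

The last case shows the limit of the test: when an eigenvalue is zero and the others share one sign, the Hessian alone does not decide the type of the critical point.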
3.2.2. Learning Dynamics of the Three-Layered Complex-Valued Neural Network in the Neighborhood of Singular Points
The linear combination structure in the updating rule for the learnable parameters of a complex-valued neural network increases the speed of moving away from singular points; thus, the complex-valued neural network is not easily influenced by singular points (15).
Consider a 1-1-1 complex-valued neural network (one input neuron, one hidden neuron, and one output neuron) and a 2-1-2 real-valued neural network (two input neurons, one hidden neuron, and two output neurons) for the sake of simplicity. The 2-1-2 real-valued neural network has seven learnable parameters (weights and thresholds), which is almost equal to the eight learnable real parameters (four complex numbers) of the 1-1-1 complex-valued neural network. Thus, a comparison of the learning dynamics of the two neural networks is fair.
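The parameter counts quoted above can be reproduced with a short sketch. The function name real_parameter_count is hypothetical; each complex parameter is counted as two real numbers.

```python
def real_parameter_count(n_in, n_hidden, n_out, complex_valued=False):
    """Number of real-valued learnable parameters of a three-layered network."""
    weights = n_in * n_hidden + n_hidden * n_out
    thresholds = n_hidden + n_out
    reals_per_parameter = 2 if complex_valued else 1
    return reals_per_parameter * (weights + thresholds)

real_212 = real_parameter_count(2, 1, 2)                          # 4 weights + 3 thresholds
complex_111 = real_parameter_count(1, 1, 1, complex_valued=True)  # 4 complex parameters

print(real_212, complex_111)   # prints "7 8"
```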
The average learning dynamics are investigated, assuming that the standard gradient learning method is used. The following are the explanatory equations of the learning dynamics of the two neural networks:
Here, φ is a real-valued activation function. A split-type activation function φ(x) + iφ(y) is used for the complex-valued neuron, where i = √-1 and z = x + iy is the net input to the complex-valued neuron. For example, if a = b = c and u_{1} = u_{2} = u_{3}, then Δ(Parameter of the complex-valued neural network) = 2aφ(u_{1}) = 2Δ(Parameter of the real-valued neural network) holds. Moreover, Δ(Parameter of the complex-valued neural network) is not easily equal to zero because aφ(u_{1}) is not necessarily equal to zero, even if one term in bφ(u_{2}) is almost equal to zero. Thus, we can expect the complex-valued neural network to move away from the singularity faster than the real-valued neural network.
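The argument can be illustrated with a schematic sketch. The update forms used here are assumptions inferred from the worked example in the text (a single term aφ(u_{1}) for the real-valued network and a sum of two terms bφ(u_{2}) + cφ(u_{3}) for the complex-valued network); they are not the exact equations of (15). The function names and the choice φ = tanh are also assumptions.

```python
import numpy as np

def delta_real(a, u1, phi=np.tanh):
    """Assumed schematic update of a real-valued parameter: one term."""
    return a * phi(u1)

def delta_complex(b, u2, c, u3, phi=np.tanh):
    """Assumed schematic update of a complex-valued parameter: two terms."""
    return b * phi(u2) + c * phi(u3)

# Case 1: equal coefficients and equal net inputs give an update twice as large.
a, u = 0.5, 0.3
ratio = delta_complex(a, u, a, u) / delta_real(a, u)
print(ratio)

# Case 2: one term almost vanishes, yet the sum stays away from zero.
nearly_zero_term = 0.5 * np.tanh(1e-8)
update = delta_complex(0.5, 1e-8, 0.5, 0.3)
print(nearly_zero_term, update)
```

In the second case a single-term update with the same vanishing factor would stall, whereas the two-term sum still changes the parameter, which is the intuition behind the faster escape from the singularity.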
3.2.3. Construction of Complex-Valued Neural Networks That Do Not Have Critical Points Based on a Hierarchical Structure
It has been shown that decomposing high-dimensional neural networks into equivalent low-dimensional neural networks yields neural networks that have no critical points based on the hierarchical structure (16). In the case of complex-valued neural networks, a 2-2-2 three-layered complex-valued neural network can be constructed from a 1-1-1 three-layered quaternionic neural network. Such a complex-valued neural network suffers comparatively little from the negative effects of singular points during learning because it has no critical points based on a hierarchical structure.
A 2-2-2 complex-valued neural network having no critical points based on a hierarchical structure can be constructed in practice as follows.
1. Consider a 1-1-1 quaternionic neural network (called NET 1 here). Let the weight between the input neuron and the hidden neuron be A = a + ib + jc + kd ∈ Q and the weight between the hidden neuron and the output neuron be B = α + iβ + jγ + kδ ∈ Q, where Q represents the set of quaternions. The quaternion is a four-dimensional number invented by W. R. Hamilton in 1843 (17). Let C = p + iq + jr + ks ∈ Q denote the threshold of the hidden neuron and D = µ + iν + jρ + kσ ∈ Q denote the threshold of the output neuron. For a technical reason, we assume that D = 0. The activation functions are defined by the following equations:
for the hidden neuron, and:
for the output neuron. For the sake of simplicity, we omitted the additional assumptions (see (16) for the details).
2. Create a 2-2-2 complex-valued neural network (called NET 2 here) by decomposing NET 1 described above, where a quaternion is decomposed into two complex numbers. That is, the quaternion A = a + ib + jc + kd ∈ Q representing the quaternionic weight between the input neuron and the hidden neuron is decomposed into the two complex numbers a′ = a + ib ∈ C and c′ = c + id ∈ C, where C is the set of complex numbers. Here, we use the Cayley-Dickson notation: the weight A between the input neuron and the hidden neuron of NET 1 can be written as follows:
A = a′ + c′j
where a′ = a + ib ∈ C and c′ = c + id ∈ C.
Similarly, the quaternion B = α + iβ + jγ + kδ ∈ Q representing the quaternionic weight between the hidden neuron and the output neuron is decomposed into the two complex numbers α′ = α + iβ ∈ C and γ′ = γ + iδ ∈ C. The quaternion C = p + iq + jr + ks ∈ Q representing the quaternionic threshold of the hidden neuron is decomposed into the two complex numbers p′ = p + iq ∈ C and r′ = r + is ∈ C. We use the activation function defined by the following equations for NET 2:
for the hidden neuron, and
for the output neuron.
NET 2 has no critical points based on a hierarchical structure (as a complex-valued neural network). See the literature (16) for the proof.
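The decomposition of a quaternion into two complex numbers used in step 2 can be sketched as follows. The sketch covers only the number decomposition, not the activation functions or the construction of NET 2, for which (16) should be consulted. The function names and sample values are hypothetical; the product rule for pairs follows from writing a quaternion as a′ + c′j with ij = k.

```python
def decompose(quaternion):
    """Split a + ib + jc + kd, given as (a, b, c, d), into (a + ib, c + id)."""
    a, b, c, d = quaternion
    return complex(a, b), complex(c, d)

def recompose(pair):
    """Inverse of decompose: (a', c') back to the 4-tuple (a, b, c, d)."""
    a_prime, c_prime = pair
    return (a_prime.real, a_prime.imag, c_prime.real, c_prime.imag)

def hamilton_product(p, q):
    """Quaternion product in real components."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def cayley_dickson_product(p, q):
    """Product of quaternions given as complex pairs (a', c') meaning a' + c'j."""
    (a1, c1), (a2, c2) = p, q
    return (a1 * a2 - c1 * c2.conjugate(), a1 * c2 + c1 * a2.conjugate())

A = (1.0, 2.0, 3.0, 4.0)      # illustrative quaternionic weight
B = (5.0, -6.0, 7.0, -8.0)    # illustrative quaternionic weight

direct = hamilton_product(A, B)
via_pairs = recompose(cayley_dickson_product(decompose(A), decompose(B)))
print(direct)
print(via_pairs)   # the same four components
```

The agreement of the two products shows that quaternionic arithmetic can be carried out entirely with pairs of complex numbers, which is what allows a quaternionic network to be rewritten as a complex-valued one.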
4. Conclusions
The research results presented in this paper probably only scratch the surface of the characteristics of the singularities of complex-valued neural networks. We believe that the results reviewed here will provide clues for analyzing the various types of singular points and for designing better complex-valued neural networks. Whether the complex-valued neural network has fewer local minima than the real-valued neural network remains an open problem; it is important but difficult. Section 3.2.3 describes a method for constructing a neural network that has no critical points based on a hierarchical structure. This is a theoretical result, and its empirical evaluation is left for future work.
References
1. Bengio Y. Learning deep architectures for AI. USA: Now Publishers Inc; 2009. p. 1-127.
2. Nitta T. Complex-Valued Neural Networks: Utilizing High-Dimensional Parameters. Pennsylvania, USA: Idea Group Inc (IGI); 2009. 504 p.
3. Schaffer ES, Ostojic S, Abbott LF. A complex-valued firing-rate model that approximates the dynamics of spiking networks. PLoS Comput Biol. 2013;9(10):e1003301. [PubMed ID: 24204236]. https://doi.org/10.1371/journal.pcbi.1003301.
4. Amari S, Park H, Ozeki T. Singularities affect dynamics of learning in neuromanifolds. Neural Comput. 2006;18(5):1007-65. [PubMed ID: 16595057]. https://doi.org/10.1162/089976606776241002.
5. Wei H, Zhang J, Cousseau F, Ozeki T, Amari S. Dynamics of learning near singularities in layered networks. Neural Comput. 2008;20(3):813-43. [PubMed ID: 18045020]. https://doi.org/10.1162/neco.2007.12-06-414.
6. Cousseau F, Ozeki T, Amari S. Dynamics of learning in multilayer perceptrons near singularities. IEEE Trans Neural Netw. 2008;19(8):1313-28. [PubMed ID: 18701364]. https://doi.org/10.1109/TNN.2008.2000391.
7. Nitta T. On the Singularity of a Single Complex-Valued Neuron [in Japanese]. IEICE Trans Inf Syst-D. 2010;J93-D(8):1614-21.
8. Nitta T. Plateau in a Polar Variable Complex-Valued Neuron. ICAART2014 the 6th International Conference on Agents and Artificial Intelligence. 6-8 March 2014; Institut Universitaire de Technologie, Angers, France. p. 526-31.
9. Nitta T. Learning dynamics of a single polar variable complex-valued neuron. Neural Comput. 2015;27(5):1120-41. [PubMed ID: 25774543]. https://doi.org/10.1162/NECO_a_00729.
10. Savitha R, Suresh S, Sundararajan N. Projection-based fast learning fully complex-valued relaxation neural network. IEEE Trans Neural Netw Learn Syst. 2013;24(4):529-41. [PubMed ID: 24808375]. https://doi.org/10.1109/TNNLS.2012.2235460.
11. Nitta T. Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks. Int J Adv Comput Sci Appl. 2014;5(7):193-8. https://doi.org/10.14569/ijacsa.2014.050729.
12. Nitta T. Local minima in hierarchical structures of complex-valued neural networks. Neural Netw. 2013;43:1-7. [PubMed ID: 23466503]. https://doi.org/10.1016/j.neunet.2013.02.002.
13. Fukumizu K, Amari S. Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Netw. 2000;13(3):317-27. https://doi.org/10.1016/s0893-6080(00)00009-5.
14. Kobayashi M. Exceptional reducibility of complex-valued neural networks. IEEE Trans Neural Netw. 2010;21(7):1060-72. [PubMed ID: 20550989]. https://doi.org/10.1109/TNN.2010.2048040.
15. Nitta T. Learning Dynamics of the Complex-Valued Neural Network in the Neighborhood of Singular Points. J Comput Commun. 2014;2(1):27-32. https://doi.org/10.4236/jcc.2014.21005.
16. Nitta T. Construction of Neural Networks that Do Not Have Critical Points Based on Hierarchical Structure. Int J Adv Comput Sci Appl. 2013;4(9):68-73. https://doi.org/10.14569/ijacsa.2013.040911.
17. Gürlebeck K, Habetha K, Sprößig W. Holomorphic functions in the plane and n-dimensional space. USA: Springer; 2008.