TY - JOUR
T1 - Neural network with unbounded activation functions is universal approximator
AU - Sonoda, Sho
AU - Murata, Noboru
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for fruitful comments and suggestions to improve the quality of the paper. The authors would like to express their appreciation toward Dr. Hideitsu Hino for his kind support with writing the paper. This work was supported by JSPS KAKENHI Grant Number 15J07517.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2017/9
Y1 - 2017/9
N2 - This paper presents an investigation of the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions. By deriving three reconstruction formulas using the Fourier slice theorem, the Radon transform, and Parseval's relation, it is shown that a neural network with unbounded activation functions still satisfies the universal approximation property. As an additional consequence, the ridgelet transform, or the backprojection filter in the Radon domain, is what the network learns after backpropagation. Subject to a constructive admissibility condition, the trained network can be obtained by simply discretizing the ridgelet transform, without backpropagation. Numerical examples not only support the consistency of the admissibility condition but also imply that some non-admissible cases result in low-pass filtering.
AB - This paper presents an investigation of the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions. By deriving three reconstruction formulas using the Fourier slice theorem, the Radon transform, and Parseval's relation, it is shown that a neural network with unbounded activation functions still satisfies the universal approximation property. As an additional consequence, the ridgelet transform, or the backprojection filter in the Radon domain, is what the network learns after backpropagation. Subject to a constructive admissibility condition, the trained network can be obtained by simply discretizing the ridgelet transform, without backpropagation. Numerical examples not only support the consistency of the admissibility condition but also imply that some non-admissible cases result in low-pass filtering.
KW - Admissibility condition
KW - Backprojection filter
KW - Bounded extension to L2
KW - Integral representation
KW - Lizorkin distribution
KW - Neural network
KW - Radon transform
KW - Rectified linear unit (ReLU)
KW - Ridgelet transform
KW - Universal approximation
UR - http://www.scopus.com/inward/record.url?scp=84960887035&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960887035&partnerID=8YFLogxK
U2 - 10.1016/j.acha.2015.12.005
DO - 10.1016/j.acha.2015.12.005
M3 - Article
AN - SCOPUS:84960887035
SN - 1063-5203
VL - 43
SP - 233
EP - 268
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
IS - 2
ER -