Activation Functions
Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark
Activation functions (AFs) are applied in neural networks, typically after an affine transformation that combines a layer's weights with its input features, and they are usually non-linear.
The rectified linear unit (ReLU) has been the most popular choice over the past decade, although the best choice is architecture dependent and many alternatives have emerged in recent years. In this section, you will find a continually updated list of activation functions.
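As a minimal sketch of this basic pattern, the snippet below applies ReLU to the output of an affine transformation. The layer sizes and the NumPy usage are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: element-wise max(0, z)."""
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # toy input features
W = rng.normal(size=(3, 4))   # layer weights
b = np.zeros(3)               # layer biases

z = W @ x + b                 # affine transformation
a = relu(z)                   # non-linear activation applied afterwards
print(a)
```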
The June 2022 paper begins by highlighting the importance of AFs in introducing non-linearity into neural networks, which is crucial for learning complex patterns and representations from data.
The authors then provide a detailed classification of AFs based on their characteristics and types, including Logistic Sigmoid/Tanh, Rectified Unit, Exponential Unit, Adaptive Unit, and Miscellaneous AFs.
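For orientation, the sketch below writes out one commonly used representative of each class, using their standard textbook definitions. The choice of representatives and the parameter defaults are illustrative assumptions, not the paper's own taxonomy.

```python
import numpy as np

def sigmoid(z):                 # Logistic Sigmoid/Tanh class
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                    # Rectified Unit class
    return np.maximum(0.0, z)

def elu(z, alpha=1.0):          # Exponential Unit class
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def prelu(z, a=0.25):           # Adaptive Unit class (slope `a` is learned in practice)
    return np.where(z > 0, z, a * z)
```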
One of the strengths of this survey is its in-depth coverage of each class of AFs.
The authors provide a thorough analysis of the properties and limitations of each AF, along with a discussion of their variants and improvements proposed in the literature. This information is particularly useful for practitioners looking to select an appropriate AF for their specific task and data type.
The paper also presents a clear and concise summary of the advantages and disadvantages of primary AFs, such as Logistic Sigmoid, Tanh, ReLU, and ELU, in terms of key factors like diminishing gradients, limited non-linearity, optimization difficulty, and computational efficiency.
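The diminishing-gradient point can be seen numerically. The toy check below, an illustrative sketch rather than an experiment from the paper, compares the sigmoid derivative with ReLU's at a few input magnitudes: the sigmoid gradient decays toward zero as the input grows, while ReLU's stays at one for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # peaks at 0.25 and decays toward zero as |z| grows

def relu_grad(z):
    return 1.0 if z > 0 else 0.0   # stays at 1 for all positive inputs

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  d(sigmoid)/dz={sigmoid_grad(z):.5f}  d(relu)/dz={relu_grad(z):.1f}")
```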
Another valuable contribution of this survey is the performance comparison conducted on benchmark datasets of different modalities using 18 state-of-the-art AFs with various types of networks.
This empirical evaluation provides practical insights into the performance of different AFs in real-world scenarios, which can guide researchers and practitioners in their choice of AFs for specific applications.
The authors also compare their survey with existing surveys and performance analyses, highlighting the comprehensive nature of their work and its importance in the current landscape of deep learning research.
In summary, this comprehensive survey of activation functions in deep learning provides a valuable resource for researchers and practitioners. The authors have thoroughly classified and analyzed a wide range of AFs, including Logistic Sigmoid and Tanh based, ReLU based, ELU based, and learning based adaptive AFs. The survey not only covers the theoretical aspects of AFs but also presents an extensive performance comparison on different types of data, such as image, text, and speech, using various state-of-the-art AFs and network architectures.
The authors highlight the strengths and limitations of each class of AFs, providing insights into their properties, such as output range, monotonicity, and smoothness. They also discuss the impact of weight initialization on the performance of AFs and the suitability of different AFs for various types of data and network architectures.
One of the key contributions of this paper is the experimental performance analysis, which compares 18 state-of-the-art AFs on benchmark datasets using different CNN models.
The results provide valuable guidance for practitioners in selecting appropriate AFs for their specific tasks and data types. For instance, the authors find that Softplus, ELU, and CELU perform well with MobileNet, while ReLU, Mish, and PDELU exhibit good performance with VGG16, GoogLeNet, and DenseNet.
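A common way to run this kind of comparison is to make the activation function a constructor argument of the model. The PyTorch sketch below is a hypothetical minimal setup; the `SmallCNN` architecture and its layer sizes are assumptions for illustration, not the benchmark models used in the paper.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy CNN whose activation function is swappable via a constructor argument."""

    def __init__(self, act_cls=nn.ReLU, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), act_cls(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), act_cls(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# The same toy architecture instantiated with a few of the AFs compared in the survey.
models = {name: SmallCNN(act) for name, act in
          [("relu", nn.ReLU), ("elu", nn.ELU), ("softplus", nn.Softplus), ("celu", nn.CELU)]}
```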
The convergence analysis of different AFs reveals that parametric AFs, such as PAU, PReLU, and PDELU, show better convergence as they can adapt to the data faster by learning their parameters during training. The authors also highlight the trade-off between accuracy and training time for different AFs, with ReLU, SELU, GELU, and Softplus striking a good balance.
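To make the idea of learning parameters from the data concrete, here is a minimal PReLU-style module with a single learnable negative-side slope. The class name and the initial slope value are assumptions for illustration; PyTorch's built-in `nn.PReLU` additionally offers channel-wise slopes.

```python
import torch
import torch.nn as nn

class LearnableSlopeReLU(nn.Module):
    """PReLU-style activation: the negative-side slope is learned by backpropagation."""

    def __init__(self, init_slope: float = 0.25):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(init_slope))  # single shared learnable slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x >= 0, x, self.slope * x)
```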
The survey also provides recommendations for selecting AFs based on the insights gained from the analysis.
The authors emphasize the importance of matching the complexity of the AF with the complexity of the model and dataset to avoid overfitting or under-convergence. They suggest avoiding Logistic Sigmoid and Tanh AFs for CNNs due to poor convergence and recommend exploring recently proposed AFs such as Swish, Mish, and PAU for different problems.
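For reference, two of the recommended AFs have simple closed forms: Swish is x · sigmoid(βx) (β = 1 gives the SiLU special case) and Mish is x · tanh(softplus(x)). The PyTorch sketch below writes both out; the function names are just illustrative wrappers.

```python
import torch
import torch.nn.functional as F

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Swish: x * sigmoid(beta * x); beta = 1 is the SiLU special case."""
    return x * torch.sigmoid(beta * x)

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))
```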
In conclusion, this survey offers a thorough and systematic review of activation functions in deep learning, covering a wide range of theoretical and practical aspects. The authors have successfully organized and presented the vast literature on AFs, providing valuable insights and recommendations for the deep learning community. This paper will serve as a valuable reference for researchers and practitioners working on developing and applying deep learning models for various tasks and data types.