Environmental sound classification (ESC) has become an important research direction. However, the task is complicated by the variety of environmental sounds, which cannot be characterized uniformly, and by their susceptibility to noise. To improve the recognition accuracy of
the ESC task, a classification method based on multi-channel features and a mixed attention model is proposed. First, the ESC signal is converted into a time-frequency representation, and spectral features are extracted with a variety of filters and reconstructed into a three-channel feature map. Multi-channel features
can exploit the complementarity between features to compensate for the limited information carried by any single feature. Second, a hybrid classification model consisting of channel and time-frequency attention modules is introduced. The channel attention module computes weights from the feature map and assigns them to the different channels. Channel features that carry more valid information and better resolution for a given type of sound are assigned larger weights.
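
As a rough illustration of the channel weighting described above, the following sketch stacks three spectral features into a three-channel map and reweights the channels. It assumes a squeeze-and-excitation style design (global average pooling followed by a small two-layer network); the abstract does not specify the module's architecture, the filters used, or any layer sizes, so the function name `channel_attention`, the hidden size, and the random placeholder features and weights are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, w1, w2):
    """Reweight the channels of a (C, F, T) feature map.

    w1 with shape (C, H) and w2 with shape (H, C) stand in for learned
    parameters; this layout is an assumption, not the paper's design.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = feature_map.mean(axis=(1, 2))      # shape (C,)
    # Excite: a small two-layer network yields one weight in (0, 1) per channel.
    hidden = np.maximum(squeezed @ w1, 0.0)       # ReLU, shape (H,)
    weights = sigmoid(hidden @ w2)                # shape (C,)
    # Scale every channel by its weight.
    return feature_map * weights[:, None, None], weights

rng = np.random.default_rng(0)

# Three placeholder spectral features of one clip, each (freq_bins, frames).
# In practice these would come from three different filter banks.
feat_a, feat_b, feat_c = (rng.standard_normal((64, 128)) for _ in range(3))
feature_map = np.stack([feat_a, feat_b, feat_c], axis=0)   # shape (3, 64, 128)

# Random stand-ins for trained weights (hidden size 8 is arbitrary).
w1 = rng.standard_normal((3, 8)) * 0.1
w2 = rng.standard_normal((8, 3)) * 0.1

reweighted, channel_weights = channel_attention(feature_map, w1, w2)
print(reweighted.shape, channel_weights.shape)
```

With trained weights, a channel whose feature discriminates a sound class well would receive a weight closer to 1, while less informative channels would be attenuated.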
The time-frequency attention module focuses on the more valid information in the time and frequency domains. The proposed method can suppress background noise, eliminate redundancy, and improve convergence speed and classification accuracy. Comparison experiments show that the recognition accuracy reaches 96.25% and 89.56% on the ESC-10 and ESC-50 datasets, respectively, and 98.40% on the UrbanSound8K dataset.
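
The time-frequency attention idea mentioned above can be sketched in a similar way. This is only one plausible reading, assuming that the module derives a weight for each frequency bin and each time frame by average pooling and combines them into a mask shared across channels; the abstract gives no architectural details, so `time_frequency_attention` and the toy input are hypothetical, and a real module would typically use learned layers rather than raw pooled values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_frequency_attention(feature_map):
    """Reweight a (C, F, T) feature map along frequency and time."""
    # Pool over channels and time: one score per frequency bin.
    freq_scores = feature_map.mean(axis=(0, 2))   # shape (F,)
    # Pool over channels and frequency: one score per time frame.
    time_scores = feature_map.mean(axis=(0, 1))   # shape (T,)
    freq_weights = sigmoid(freq_scores)
    time_weights = sigmoid(time_scores)
    # The outer product gives an (F, T) mask shared by all channels.
    mask = np.outer(freq_weights, time_weights)
    return feature_map * mask[None, :, :], mask

# Toy input: silence everywhere except one energetic frequency bin.
toy_map = np.zeros((3, 8, 10))
toy_map[:, 2, :] = 4.0

attended, mask = time_frequency_attention(toy_map)
print(mask.shape)                  # (8, 10)
print(mask[2, 0] > mask[0, 0])     # True: the energetic bin gets a larger weight
```

In this toy case the mask emphasizes the active frequency bin relative to the empty ones, which is the kind of behavior that would help suppress background regions of the spectrogram.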