This paper presents an HMM-based automatic system for the recognition of bird species from audio field recordings. It includes improved unsupervised modelling of individual bird syllables, as well as duration modelling. The acoustic signal is decomposed into isolated segments, each containing the temporal evolution of a detected sinusoidal component. Bird syllables are modelled using hidden Markov models (HMMs). A set of syllables of bird vocalisations is discovered in an unsupervised manner by employing dynamic time warping and agglomerative hierarchical clustering. A novel iterative maximum likelihood procedure is used to train individual HMMs for the syllables of each species. State duration modelling is employed in a post-recognition stage by combining the likelihoods of the acoustic and duration models. Experiments are performed on over 33 hours of field recordings containing 30 bird species. Evaluations demonstrate that the use of the proposed unsupervised iterative HMM training procedure and the duration modelling provides, on average, a 45% error rate reduction. The presented system recognises bird species with an accuracy of 97.8% using 3 seconds of the detected signal.
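The unsupervised syllable discovery step (dynamic time warping followed by agglomerative hierarchical clustering) can be illustrated with a minimal sketch. It assumes each segment has already been reduced to a frequency track (one frequency value per frame, here in kHz), uses a plain DTW with an absolute-difference cost normalised by the combined track length, and merges clusters by average linkage until the closest pair exceeds a distance threshold. The normalisation, the linkage criterion, the threshold value and the function names `dtw_distance` and `agglomerative_clusters` are illustrative assumptions, not the paper's exact configuration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance between two frequency tracks (assumed normalisation: len(a) + len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def agglomerative_clusters(segments: List[List[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering of segments using DTW distances (hypothetical helper)."""
    k = len(segments)
    dist = [[dtw_distance(segments[i], segments[j]) for j in range(k)] for i in range(k)]
    clusters = [[i] for i in range(k)]  # start with every segment in its own cluster
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # average linkage: mean pairwise DTW distance between cluster members
                d = sum(dist[i][j] for i in clusters[p] for j in clusters[q]) / (
                    len(clusters[p]) * len(clusters[q])
                )
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # the closest remaining clusters are too dissimilar to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical frequency tracks (kHz): two rising sweeps and two near-flat tones
segments = [
    [2.0, 2.5, 3.0, 3.5],
    [2.0, 2.2, 2.6, 3.0, 3.4, 3.5],
    [5.0, 5.0, 5.1, 5.0],
    [5.1, 5.0, 5.0],
]
print(agglomerative_clusters(segments, threshold=0.3))  # [[0, 1], [2, 3]]
```

Each resulting cluster would correspond to one discovered syllable type; DTW is used because syllables of the same type vary in duration, so a frame-by-frame comparison of unequal-length tracks would not be meaningful.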
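The post-recognition combination of acoustic and duration likelihoods can be sketched as a rescoring in the log domain. In this sketch each state's duration is assumed to follow a Gaussian distribution, the per-state durations (in frames) are assumed to come from a Viterbi alignment of the winning HMM path, and `weight` is a hypothetical tuning parameter; the paper's actual duration distribution and combination rule may differ.

```python
import math
from typing import Dict, List, Tuple


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a Gaussian (assumed duration distribution)."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def duration_log_likelihood(durations: List[float], params: List[Tuple[float, float]]) -> float:
    """Sum of per-state duration log-likelihoods; params holds (mean, std) per state."""
    return sum(gaussian_log_pdf(d, m, s) for d, (m, s) in zip(durations, params))


def rescore(hypotheses: Dict[str, Tuple[float, List[float], List[Tuple[float, float]]]],
            weight: float) -> Tuple[str, Dict[str, float]]:
    """Combine acoustic and weighted duration log-likelihoods; return best species and all scores."""
    scores = {}
    for species, (acoustic_ll, durations, params) in hypotheses.items():
        scores[species] = acoustic_ll + weight * duration_log_likelihood(durations, params)
    return max(scores, key=scores.get), scores


# Hypothetical two-species example: A fits the audio slightly better, B fits the durations
hypotheses = {
    "A": (-100.0, [10, 30], [(20.0, 2.0), (10.0, 2.0)]),
    "B": (-102.0, [10, 30], [(10.0, 2.0), (30.0, 2.0)]),
}
print(rescore(hypotheses, weight=0.0)[0])  # A (acoustic score only)
print(rescore(hypotheses, weight=1.0)[0])  # B (duration model overturns the decision)
```

With `weight=0.0` the decision reduces to the acoustic likelihood alone; increasing the weight lets a strong mismatch between observed and modelled state durations penalise an otherwise acoustically plausible species.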
|Number of pages||5|
|Publication status||Published - 20 Mar 2016|
|Event||IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 2016 - Shanghai, China|
|Duration||20 Mar 2016 → 25 Mar 2016|