Bird species recognition using HMM-based unsupervised modelling of individual syllables with incorporated duration modelling

Peter Jancovic, Munevver Kokuer, Masoud Zakeri, Martin Russell

Research output: Contribution to conference (unpublished) › Paper › Peer-review

7 Citations (Scopus)

Abstract

This paper presents an HMM-based automatic system for recognition of bird species from audio field recordings. It incorporates improved unsupervised modelling of individual bird syllables and duration modelling. The acoustic signal is decomposed into isolated segments, each containing the temporal evolution of a detected sinusoidal component. Bird syllables are modelled using hidden Markov models (HMMs). A set of syllables of bird vocalisations is discovered in an unsupervised manner by employing dynamic time warping (DTW) and agglomerative hierarchical clustering. A novel iterative maximum likelihood procedure is used to train individual HMMs for the syllables of each species. State duration modelling is employed in a post-recognition stage by combining the likelihoods of the acoustic and duration models. Experiments are performed on over 33 hours of field recordings containing 30 bird species. Evaluations demonstrate that the proposed unsupervised iterative HMM training procedure, together with the duration modelling, provides an average error rate reduction of 45%. The presented system recognises bird species with an accuracy of 97.8% using 3 seconds of the detected signal.
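To make the unsupervised syllable discovery and the duration rescoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes each segment is a plain list of frequency values (the sinusoidal track over time), uses an absolute-difference DTW cost normalised by the combined segment length, and merges clusters by average linkage until the closest pair exceeds a hypothetical `threshold`. The duration part assumes a Gaussian duration model per HMM state and a hypothetical weighting factor `weight`; the abstract does not specify either choice.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two frequency tracks (lists of floats)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Normalise so that long and short segments are comparable (assumption)
    return D[n][m] / (n + m)


def agglomerative_cluster(segments, threshold):
    """Average-linkage agglomerative clustering on pairwise DTW distances."""
    k = len(segments)
    dist = [[dtw_distance(segments[i], segments[j]) for j in range(k)]
            for i in range(k)]
    clusters = [[i] for i in range(k)]  # start with one cluster per segment
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:  # stop once the closest pair is too dissimilar
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


def combined_score(acoustic_loglik, state_durations, dur_means, dur_vars, weight):
    """Add a weighted Gaussian state-duration log-likelihood to the acoustic one."""
    dur_loglik = 0.0
    for d, mu, var in zip(state_durations, dur_means, dur_vars):
        dur_loglik += -0.5 * (math.log(2 * math.pi * var) + (d - mu) ** 2 / var)
    return acoustic_loglik + weight * dur_loglik


if __name__ == "__main__":
    # Two rising and two falling frequency sweeps (values in kHz, illustrative)
    segments = [[1.0, 2.0, 3.0], [1.0, 1.5, 2.0, 2.5, 3.0],
                [3.0, 2.0, 1.0], [3.0, 2.5, 2.0, 1.5, 1.0]]
    print(agglomerative_cluster(segments, threshold=0.2))  # [[0, 1], [2, 3]]
```

Each resulting cluster would then serve as one candidate syllable class from which an HMM can be trained; the threshold controls how many syllable classes are discovered.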
Original language: English
Number of pages: 5
Publication status: Published, 20 Mar 2016
Event: IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 2016, Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016
http://www.icassp2016.org

Conference

Conference: IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16
