Abstract
In this article, a new adaptive, data-driven strategy for voice activity detection based on empirical mode decomposition (EMD) is proposed. Speech data are decomposed in the time domain using EMD, an a posteriori, adaptive, data-driven method, to yield a set of physically meaningful intrinsic mode functions (IMFs). Each IMF preserves the nonlinear and nonstationary properties of the speech utterance. Among the IMFs, the one that predominantly contains source information, called the characteristic IMF (CIMF), is identified and extracted using a zero-frequency filter-assisted peaking resonator. The short-term energy of the detected CIMF is then computed frame by frame, and voiced regions in the speech utterance are detected by comparing the frame energy against a suitably chosen threshold. The proposed framework has been studied on both clean and noisy speech utterances. In the presence of white noise, the method shows encouraging results up to 0 dB.
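The final stage described above, short-term energy followed by thresholding, can be illustrated with a minimal sketch. It does not implement EMD or the zero-frequency filter-assisted peaking resonator: a toy signal stands in for the CIMF. The function names, the frame length, the hop size, and the relative threshold are all assumptions chosen for illustration, not values taken from the article.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Short-term energy: sum of squared samples in each full frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def detect_voiced(signal, frame_len=160, hop=80, rel_threshold=0.1):
    """Flag frames whose energy exceeds a fraction of the peak frame energy.

    frame_len, hop and rel_threshold are hypothetical defaults, not the
    article's settings.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = rel_threshold * max(energies)
    return [e > threshold for e in energies]


# Toy stand-in for a CIMF: silence, a 200 Hz tone, then silence (8 kHz rate).
fs = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
toy_cimf = silence + tone + silence

flags = detect_voiced(toy_cimf)
```

Each entry of `flags` marks one frame as voiced or unvoiced. A threshold relative to the peak frame energy is only one possible choice; the article states only that a proper threshold is chosen.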