Publication: CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry
dc.contributor.affiliation | DA-IICT, Gandhinagar | |
dc.contributor.author | Patil, Hemant | |
dc.contributor.author | Kachhi, Aastha | |
dc.contributor.author | Patil, Ankur T | |
dc.contributor.researcher | Patil, Ankur T (201621008) | |
dc.contributor.researcher | Kachhi, Aastha | |
dc.date.accessioned | 2025-08-01T13:09:02Z | |
dc.date.issued | 27-10-2023 | |
dc.description.abstract | Infant cry classification is an important area of research that involves distinguishing between normal and pathological cries. Traditional feature sets, such as Short-Time Fourier Transform (STFT) and Mel Frequency Cepstral Coefficients (MFCC), have shown limitations due to poor spectral resolution caused by quasi-periodic sampling in high pitch-source harmonics. To address this, we propose to use Constant-Q Cepstral Coefficients (CQCC), which leverage geometrically-spaced frequency bins for improved representation of the fundamental frequency (F0) and its harmonics for infant cry classification. Two datasets, Baby Chilanto and In-House DA-IICT, were employed to evaluate the proposed feature set. We compared CQCC against state-of-the-art feature sets, such as MFCC and Linear Frequency Cepstral Coefficients (LFCC), using Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) classifiers with 10-fold cross-validation. The CQCC-GMM architecture achieved comparatively higher accuracy: 99.8% on the Baby Chilanto dataset and 98.24% on the In-House DA-IICT dataset. This work demonstrates the effectiveness of CQCC's form-invariance over traditional STFT-based spectrograms. Additionally, it explores parameter tuning and the impact of feature vector dimensions. The study presents cross-database and combined dataset scenarios, yielding an overall performance improvement of 1.59%. CQCC's robustness was also evaluated under various signal degradation conditions, including additive babble noise at different Signal-to-Noise Ratios (SNR). The performance was further compared with other feature sets using statistical measures, including F1-score, J-statistics, and latency analysis for practical deployment. Lastly, CQCC's results were compared with existing studies on the Baby Chilanto dataset. | |
dc.format.extent | 4713 - 4726 | |
dc.identifier.citation | Patil, Hemant A., Aastha Kachhi, and Ankur T. Patil, "CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry," IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE, ISSN: 2329-9304, pp. 1-14, 27 Oct. 2023, doi: 10.1109/TASLP.2023.3325971 | |
dc.identifier.doi | 10.1109/TASLP.2023.3325971 | |
dc.identifier.issn | 2329-9304 | |
dc.identifier.scopus | 2-s2.0-85181556679 | |
dc.identifier.uri | https://ir.daiict.ac.in/handle/dau.ir/1561 | |
dc.identifier.wos | WOS:001346763000003 | |
dc.language.iso | en | |
dc.publisher | IEEE | |
dc.relation.ispartofseries | Vol. 32; No. | |
dc.source | IEEE/ACM Transactions on Audio, Speech, and Language Processing | |
dc.source.uri | https://ieeexplore.ieee.org/document/10298803/ | |
dc.title | CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | fdb7041b-280e-498b-b2ee-34f9bc351f4c | |
relation.isAuthorOfPublication.latestForDiscovery | fdb7041b-280e-498b-b2ee-34f9bc351f4c |