Publication:
CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry

dc.contributor.affiliationDA-IICT, Gandhinagar
dc.contributor.authorPatil, Hemant
dc.contributor.authorKachhi, Aastha
dc.contributor.authorPatil, Ankur T
dc.contributor.researcherPatil, Ankur T (201621008)
dc.contributor.researcherKachhi, Aastha
dc.date.accessioned2025-08-01T13:09:02Z
dc.date.issued27-10-2023
dc.description.abstractInfant cry classification is an important area of research that involves distinguishing between normal and pathological cries. Traditional feature sets, such as the Short-Time Fourier Transform (STFT) and Mel Frequency Cepstral Coefficients (MFCC), have shown limitations due to poor spectral resolution caused by quasi-periodic sampling in high pitch-source harmonics. To address this, we propose to use Constant-Q Cepstral Coefficients (CQCC), which leverage geometrically-spaced frequency bins for improved representation of the fundamental frequency (F0) and its harmonics for infant cry classification. Two datasets, Baby Chilanto and In-House DA-IICT, were employed to evaluate the proposed feature set. We compared CQCC against state-of-the-art feature sets, such as MFCC and Linear Frequency Cepstral Coefficients (LFCC), using Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) classifiers with 10-fold cross-validation. The CQCC-GMM architecture achieved a relatively better accuracy of 99.8% on the Baby Chilanto dataset and 98.24% on the In-House DA-IICT dataset. This work demonstrates the effectiveness of CQCC's form-invariance over traditional STFT-based spectrograms. Additionally, it explores parameter tuning and the impact of feature vector dimensions. The study presents cross-database and combined dataset scenarios, yielding an overall performance improvement of 1.59%. CQCC's robustness was also evaluated under various signal degradation conditions, including additive babble noise at different Signal-to-Noise Ratios (SNR). The performance was further compared with other feature sets using statistical measures, including F1-score and J-statistics, and latency analysis for practical deployment. Lastly, CQCC's results were compared with existing studies on the Baby Chilanto dataset.
dc.format.extent4713 - 4726
dc.identifier.citationPatil, Hemant A., Aastha Kachhi, and Ankur T. Patil, "CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry," IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE, ISSN: 2329-9304, pp. 1-14, 27 Oct. 2023, doi: 10.1109/TASLP.2023.3325971
dc.identifier.doi10.1109/TASLP.2023.3325971
dc.identifier.issn2329-9304
dc.identifier.scopus2-s2.0-85181556679
dc.identifier.urihttps://ir.daiict.ac.in/handle/dau.ir/1561
dc.identifier.wosWOS:001346763000003
dc.language.isoen
dc.publisherIEEE
dc.relation.ispartofseriesVol. 32; No.
dc.sourceIEEE/ACM Transactions on Audio, Speech, and Language Processing
dc.source.urihttps://ieeexplore.ieee.org/document/10298803/
dc.titleCQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry
dspace.entity.typePublication
relation.isAuthorOfPublicationfdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscoveryfdb7041b-280e-498b-b2ee-34f9bc351f4c
