Partial matching and search space reduction for QbE-STD

Madhavi, Maulik C; Patil, Hemant

Date

01-09-2017

Authors

Madhavi, Maulik C

Patil, Hemant

Publisher

Elsevier

Abstract

The Query-by-Example approach to spoken content retrieval has gained much attention because it is feasible in the absence of speech recognition and applicable in multilingual matching scenarios. This approach to retrieving spoken content is referred to as Query-by-Example Spoken Term Detection (QbE-STD). The state-of-the-art QbE-STD system matches the frame sequences of the query and the test utterance via the Dynamic Time Warping (DTW) algorithm. In realistic scenarios, there is a need to retrieve a query that does not appear exactly in the spoken document; the instance that does appear might have a different suffix, prefix, or word order. The DTW algorithm aligns the two sequences monotonically and hence is not suitable for partial matching between the frame sequences of the query and the test utterance. In this paper, we propose a novel partial matching approach between the spoken query and the utterance using a modified DTW algorithm in which multiple warping paths are constructed for each query and test utterance pair. Next, we address the search complexity of DTW and suggest two approaches, namely, a feature reduction approach and a Bag-of-Acoustic-Words (BoAW) model. In the feature reduction approach, the number of feature vectors is reduced by averaging across consecutive frames within phonetic boundaries. Fewer feature vectors require fewer comparisons, and hence the DTW search is faster. The search computation time is reduced by 46–49% with a slight degradation in performance compared to the case with no feature reduction. In the BoAW model, we construct term frequency-inverse document frequency (tf-idf) vectors at the segment level to retrieve audio documents. The proposed segment-level BoAW model matches a test utterance with a query using the tf-idf vectors, and the scores obtained are used to rank the test utterances. The BoAW model gave more than 80% recall on 70% top retrieval. To re-score the detections, we further employ the DTW search or the modified DTW search to retrieve the spoken query from the utterances selected by the BoAW model. QbE-STD experiments are conducted on different international benchmarks, namely, MediaEval Spoken Web Search (SWS) 2013 and MediaEval Query-by-Example Search on Speech (QUESST) 2014.
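
The feature reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `reduce_features`, the input format (a list of segment start indices), and the toy values are assumptions made for illustration only. It averages consecutive frames inside each segment, so that a later DTW search compares fewer vectors.

```python
from typing import List, Sequence


def reduce_features(
    frames: Sequence[Sequence[float]], boundaries: Sequence[int]
) -> List[List[float]]:
    """Average consecutive frames inside each segment.

    `boundaries` holds the start index of every segment, in increasing
    order and beginning with 0 (an assumed input format; the abstract
    only says that averaging happens within phonetic boundaries).
    """
    if not frames:
        return []
    if not boundaries or boundaries[0] != 0:
        raise ValueError("boundaries must start at frame 0")

    # Close the last segment at the end of the utterance.
    edges = list(boundaries) + [len(frames)]
    reduced = []
    for start, end in zip(edges[:-1], edges[1:]):
        if not 0 <= start < end <= len(frames):
            raise ValueError("boundaries must be increasing and in range")
        segment = frames[start:end]
        dim = len(segment[0])
        # One mean vector replaces all frames of the segment.
        mean = [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
        reduced.append(mean)
    return reduced


# Toy utterance: 5 two-dimensional frames, two hypothetical segments
# starting at frames 0 and 2.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
boundaries = [0, 2]
reduced = reduce_features(frames, boundaries)
print(reduced)  # [[2.0, 3.0], [7.0, 8.0]]
```

In this toy case five frames collapse to two vectors. Since DTW compares every query vector with every utterance vector, shortening both sequences shrinks the number of local distance computations roughly in proportion to the product of the two reduction factors.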

Citation

Maulik C. Madhavi and Hemant A. Patil, "Partial matching and search space reduction for QbE-STD," Computer Speech & Language, vol. 45, pp. 58-82, Sept. 2017. doi: 10.1016/j.csl.2017.03.004

URI

https://ir.daiict.ac.in/handle/dau.ir/1540

Collections

Journal Article
