CONCORD: Enhancing COVID-19 Research with Weak-Supervision based Numerical Claim Extraction

Shah, Dhwanil; Shah, Krish; Jagani, Manan; Shah, Agam; Chaudhury, Bhaskar

dc.contributor.affiliation	DA-IICT, Gandhinagar
dc.contributor.author	Shah, Dhwanil
dc.contributor.author	Shah, Krish
dc.contributor.author	Jagani, Manan
dc.contributor.author	Shah, Agam
dc.contributor.author	Chaudhury, Bhaskar
dc.contributor.researcher	Shah, Dhwanil (201901450)
dc.contributor.researcher	Shah, Krish (201901465)
dc.contributor.researcher	Jagani, Manan (201901295)
dc.date.accessioned	2025-08-01T13:09:36Z
dc.date.issued	18-03-2024
dc.description.abstract	The COVID-19 Numerical Claims Open Research Dataset (CONCORD) is a comprehensive, open-source dataset of numerical claims extracted from academic papers on COVID-19 research. To extract numerical claims, a weak-supervision based model is employed, leveraging its white-box, explainable nature and its advantages over transformer-based models in terms of computational and manual annotation costs. Labelling functions are used to programmatically generate labels, incorporating techniques like pattern matching, external knowledge bases, phrase matching, and third-party models. An aggregator function reconciles overlapping or contradictory labels. The weak-supervision model is evaluated against established baselines and transformer-based models, achieving a weighted F1-score of 0.932 and a micro F1-score of 0.930 in extracting numerical claims. While the weak-supervision model showcases superior performance compared to baseline models, it is observed that transformer-based models achieve comparable results. CONCORD, comprising around 200,000 numerical claims extracted from over 57,000 COVID-19 research articles, serves as a valuable tool for knowledge discovery and understanding the chronological developments in various research areas associated with COVID-19. In conclusion, CONCORD, alongside the weak-supervision methodology, offers researchers a valuable resource, enhancing advancements in COVID-19 research while highlighting the significant potential of weak-supervision models within the broader biomedical domain.
dc.identifier.citation	Dhwanil Shah, Krish Shah, Manan Jagani, Agam Shah, and Bhaskar Chaudhury, "CONCORD: Enhancing COVID-19 Research with Weak-Supervision based Numerical Claim Extraction," Research Square, ISSN: 2693-5015, 18 Mar. 2024, doi: 10.21203/rs.3.rs-4076902/v1. [Preprint]
dc.identifier.doi	10.21203/rs.3.rs-4076902/v1
dc.identifier.issn	2693-5015
dc.identifier.scopus	2-s2.0-85204292352
dc.identifier.uri	https://ir.daiict.ac.in/handle/dau.ir/2066
dc.identifier.wos	WOS:001314794600001
dc.language.iso	en
dc.publisher	Research Square
dc.source	Research Square
dc.source.uri	https://www.researchsquare.com/article/rs-4076902/v1
dc.title	CONCORD: Enhancing COVID-19 Research with Weak-Supervision based Numerical Claim Extraction
dspace.entity.type	Publication
relation.isAuthorOfPublication	d0ffe8b6-980b-4a74-bb54-7408522e6da7
relation.isAuthorOfPublication.latestForDiscovery	d0ffe8b6-980b-4a74-bb54-7408522e6da7

Collections

Journal Article
