Data blocking for partitioned data
Abstract
Over the last few years, the volume of data consumed and produced by applications has increased tremendously. This thesis aims to achieve faster query processing over such data. The work is divided into three phases: data partitioning, data blocking, and data skipping. Data partitioning identifies hot and cold partitions of the data and stores them as separate data blocks. The partitioned data is stored contiguously on disk, and this placement is verified. Data blocking arranges the data blocks on disk so that all hot data blocks are stored together and all cold data blocks are stored together. Data skipping is performed to reduce disk seek time when accessing data from disk. Data partitioning and blocking are implemented on a column-oriented database system. Data blocking resulted in a significant reduction in both the amount of data scanned and the query response time. Query execution time (QET) was measured for three categories of queries: range queries, nested queries, and aggregate queries. On average across these three categories, QET was 55 times faster on partitioned data. For the same query categories, data blocking and skipping reduced the amount of data scanned by 97% on average, thereby accelerating the queries.
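The three phases can be illustrated with a small sketch. This is not the thesis implementation. It assumes that "hot" means a value was accessed at least `hot_threshold` times, and that skipping relies on per-block minimum and maximum values; the abstract states neither. All names (`Block`, `partition`, `make_blocks`, `layout`, `range_query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    values: List[int]  # column values held in this block
    hot: bool          # True if the block belongs to the hot partition
    lo: int            # smallest value in the block (assumed skipping metadata)
    hi: int            # largest value in the block (assumed skipping metadata)


def partition(values: List[int], access_counts: Dict[int, int],
              hot_threshold: int) -> Tuple[List[int], List[int]]:
    """Phase 1: split values into hot and cold by an assumed access-count rule."""
    hot = [v for v in values if access_counts.get(v, 0) >= hot_threshold]
    cold = [v for v in values if access_counts.get(v, 0) < hot_threshold]
    return hot, cold


def make_blocks(values: List[int], hot: bool, block_size: int) -> List[Block]:
    """Cut one partition into fixed-size blocks and record min/max per block."""
    ordered = sorted(values)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(chunk, hot, chunk[0], chunk[-1]))
    return blocks


def layout(values: List[int], access_counts: Dict[int, int],
           hot_threshold: int, block_size: int) -> List[Block]:
    """Phase 2: place all hot blocks together, followed by all cold blocks."""
    hot, cold = partition(values, access_counts, hot_threshold)
    return make_blocks(hot, True, block_size) + make_blocks(cold, False, block_size)


def range_query(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Phase 3: skip blocks whose [lo, hi] range cannot overlap [low, high].

    Returns the matching values and the number of values actually scanned.
    """
    result: List[int] = []
    scanned = 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # block skipped without reading its values
        scanned += len(block.values)
        result.extend(v for v in block.values if low <= v <= high)
    return result, scanned


# Usage with made-up data: 100 values, of which 90..99 are frequently accessed.
values = list(range(100))
access_counts = {v: 10 for v in range(90, 100)}
blocks = layout(values, access_counts, hot_threshold=5, block_size=10)
matches, scanned = range_query(blocks, 92, 95)
```

In this toy setup the query touches only the single hot block, so 10 of the 100 values are scanned. The numbers are illustrative only and are unrelated to the measurements reported in the abstract.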