Comparative Performance Analysis of Column Family Databases : Cassandra and HBase

Files

202111032.pdf (1.85 MB)

Date

2023

Authors

Sheth, Vinay

Publisher

Dhirubhai Ambani Institute of Information and Communication Technology

Abstract

Up until now, relational databases have been unquestionably the most prevalent type of database used to handle data. The advent of cloud computing and big data has underlined the need for databases that are capable of managing and analyzing big data. By allowing storage and retrieval of structured as well as unstructured data, NoSQL databases circumvent the limitations of relational databases. Because of their support for schema flexibility, rapid data access, and potential to scale up quickly, they have emerged as the favored choice for big data processing. These systems have several properties/parameters that can be tuned to achieve specific performance goals based on business needs. Having well-defined performance objectives assists us in articulating the acceptable trade-offs for our application. This motivates us to evaluate the performance of one such frequently used NoSQL system: Cassandra. Apache Cassandra is an open-source, decentralized, distributed, fault-tolerant, highly available, elastically scalable, tunably consistent, row-oriented database. To carry out the performance evaluation, we use the Yahoo! Cloud Serving Benchmark (YCSB). Our findings highlight that increasing the thread count initially improves throughput and CPU utilization but later decreases both. Higher record count, consistency level, and dataset size lead to decreased throughput and increased latency. A stronger consistency level also increases CPU utilization. Increasing the operation count improves throughput but increases latency as well. These findings provide guidance for optimizing Cassandra's performance by adjusting these parameters. We also assess Apache HBase, another well-known NoSQL database, using YCSB. The relative performance of these databases under analytical as well as update-heavy workloads is the primary focus of our investigation. Our test results demonstrate that for both workloads, Cassandra outperforms HBase in read operations, whereas HBase excels in write operations. This research quantifies the performance traits of Cassandra and HBase, assisting developers and architects in choosing the best database system for their big data applications.

Keywords

Relational databases, SQL database, Apache Cassandra, Big data

Citation

Sheth, Vinay (2023). Comparative Performance Analysis of Column Family Databases : Cassandra and HBase. Dhirubhai Ambani Institute of Information and Communication Technology. viii, 56 p. (Acc. # T01118).

URI

http://ir.daiict.ac.in/handle/123456789/1177

Collections

M Tech Dissertations
