Theses and Dissertations
Permanent URI for this collection: http://ir.daiict.ac.in/handle/123456789/1
Item Open Access
Explanations by Counterfactual Argument in Recommendation Systems (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Pathak, Yash; Rana, Arpit
Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) rely on complex models. Owing to their complexity and approaches, these models have a black-box nature and raise questions about the trustworthiness of their decision process, especially in high-cost decision scenarios. To overcome this problem, users of these systems can ask for an explanation of a decision, which the system can provide in various ways. One way of generating these explanations is with the help of Counterfactual (CF) arguments. Although there is a debate on how AI can generate these explanations, whether by correlation or by causal inference, in Recommendation Systems (RecSys) the aim is to generate them with a minimum number of Oracle calls and near-optimal explanation length (e.g., in terms of interactions). In this study we analyze the nature of CFs and different methods of generating them (e.g., a model-agnostic approach and Genetic Algorithms (GA)), along with their quality measures. Extensive experiments show that CFs can be generated through multiple approaches and that selecting optimal CFs improves the explanations.

Item Open Access
Anomalies Detection in Radon Time Series for Earthquake Prediction Using Machine Learning Techniques (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Gorasiya, Raghav; Chaudhury, Bhaskar
Radioactive radon gas emission from soil and water is a significant precursor to earthquakes. Meteorological parameters such as temperature, pressure, humidity, rainfall, and wind speed influence radon gas emission from media such as soil and water. In this study, radioactive soil radon gas has been investigated for earthquake prediction. Before seismic events, radon gas emission is also affected by seismic energies.
These seismic energies are responsible for the changes inside the earth's crust that cause earthquakes. Our focus in this work is first to predict the radon gas concentration using machine learning algorithms and then to identify anomalies before and after seismic events using standard confidence-interval methods. We experimented with different machine learning models for a detailed comparative study of radon concentration predictions. The dataset is divided into different settings of training and testing data; the testing data includes only seismic samples. The models are trained on non-seismic-day samples and some of the seismic-day samples, and tested on seismic-day samples. Once the predictions are acceptable, anomaly detection is performed on the test data. A simple mean-plus-two-standard-deviations test has been used to identify measured radon values that fall outside this prediction confidence interval; these values are then considered anomalies.

Item Open Access
Performance and power prediction on disparate computer systems (2020) Amrutiya, Aditya
Performance and power prediction is an active area of research due to its applications in the advancement of hardware-software co-development. Several empirical machine-learning models, such as linear models, tree-based models, and neural networks, are used for performance prediction. Furthermore, a prediction model's accuracy may differ depending on the performance data collected for different software types (compute-bound, memory-bound) and different hardware (simulation-based or physical systems). Our results for performance prediction show that tree-based machine-learning models, including bagging and boosting models that help improve weak learners, outperform all other models with a median absolute percentage error (MedAPE) of less than 5%.
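The MedAPE metric reported above is straightforward to compute: it is the median of the per-sample absolute percentage errors, which makes it robust to a few badly mispredicted samples. A minimal sketch, using hypothetical runtime values rather than the thesis dataset:

```python
import statistics

def medape(actual, predicted):
    """Median absolute percentage error: median of |a - p| / |a|, in percent."""
    return 100 * statistics.median(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    )

# Hypothetical measured runtimes and model predictions (illustrative only).
runtimes = [10.0, 20.0, 40.0, 80.0]
preds = [10.5, 19.0, 41.0, 78.0]
print(medape(runtimes, preds))  # percentage errors 5, 5, 2.5, 2.5 -> median 3.75
```

Unlike the mean (MAPE), the median ignores outlier errors, which is why it is a common choice when a handful of benchmark runs behave erratically.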
We have also observed that in physical systems the prediction accuracy for memory-bound applications is higher than for compute-bound algorithms, due to manufacturer variability in processors. Moreover, prediction accuracy is higher on simulation-based hardware than on physical systems, due to its deterministic nature. We have used transfer learning to solve two problems: cross-platform prediction and cross-system prediction. Our results show a prediction error of 15% for cross-system prediction, and 17% for cross-platform prediction from a simulation-based X86 system to an ARM system, using the best-performing tree-based machine-learning model. For predicting power consumption along with performance, we have employed several univariate and multivariate machine-learning models in our experiments. Our results show that runtime and power prediction accuracies of more than 80% and 90% respectively are achieved by a multivariate deep neural network model in cross-platform prediction. Similarly, for cross-system prediction, a runtime accuracy of 90% and a power accuracy of 75% are achieved by the multivariate deep neural network.

Item Open Access
Machine learning in financial data EPS estimates (2020) Sharma, Rohan; Joshi, M.V.
The project "EPS Estimates" is, as the name suggests, a work on the Earnings Per Share figures released by companies annually and quarterly. The project is intended to develop a better consensus methodology for the EPS estimates given by different brokers and to give clients a better idea of what the EPS figures will be. Various statistical methods and machine learning models are used for this purpose, and a comparison between them is presented in this report.
Details of the intuition behind the models, their shortcomings, and some insights derived from them are also included in this report.

Item Open Access
ML-based clients prioritization and ranking algorithm (2020) Sharma, Rajat; Sasidhar, P S Kalyan
Kristal.AI is an AI-powered digital wealth management platform and one of the leading firms in the fintech industry. It provides its customers a platform for wealth investments, has a very experienced committee for handling customer queries, and offers an AI-driven advisory algorithm that recommends portfolios to customers according to their profiles. Having stepped into the AI-driven world, the company wants an AI-driven algorithm for client prioritization and ranking, so that its Relation Management team can focus on the platform's most promising users rather than on users who may not be worth the time; some users sign up merely out of curiosity and never enroll as authenticated clients of the company. Tackling this problem requires an AI-based automated algorithm that filters the most promising users from the data and ranks them by their likelihood of becoming the company's authenticated, KYC-approved registered clients. Together with the company's Data Science team, I tackled this problem by creating a machine-learning-based client prioritization and ranking algorithm that takes the company's raw data as input on a daily basis and generates a list of clients with the ranks in which they should be followed up. To this end, weeks of exploratory data analysis were carried out to select the crucial features, and a regression model (Gaussian Process Regression) was created and optimized to give the desired output.
This model achieved an accuracy of about 82% and a precision of about 84% on the test set.

Item Open Access
Performance and power modeling on disparate computer systems using machine learning (2020) Kumar, Rajat; Mankodi, Amit
Performance and power prediction is an active area of research due to its applications in the advancement of hardware-software co-development. We have performed experiments to evaluate the performance of several machine learning models. Our results for performance prediction show that tree-based machine-learning models outperform all other models with a median absolute percentage error (MedAPE) of less than 5%, followed by bagging and boosting models that help improve weak learners. We have collected performance data both from simulation-based hardware and from physical systems, and observed that prediction accuracy is higher on simulation-based hardware than on physical systems, due to its deterministic nature. Moreover, in physical systems, the prediction accuracy for memory-bound applications is higher than for compute-bound algorithms, due to manufacturer variability in processors. Furthermore, our results show a prediction error of 15% for cross-system prediction, whereas cross-platform prediction gives an error of 17% for simulation-based X86 to ARM and 23% for physical Intel Core to Intel Xeon systems, using the best-performing tree-based machine-learning model. We have employed several univariate and multivariate machine-learning models in our experiments. Our results show that runtime and power prediction accuracies of more than 80% and 90% respectively are achieved by a multivariate deep neural network model in cross-platform prediction.
Similarly, for cross-system prediction, a runtime accuracy of 90% and a power accuracy of 75% are achieved by the multivariate deep neural network.

Item Open Access
VIU content access layer intelligent & flexible content selection (2020) Marakana, Meet; Banerjee, Asim
For OTT media streaming products like VIU, it is important to increase the consumption of media content as much as possible: to get the highest benefit, users must stay on the platform and consume a large amount of content. In markets with many competitors, such as the Indian market, solving this problem is essential to survival. The problem is to increase the engagement time between the customer and the platform, which can be solved by improving content selection. To this end, the company should customize its homepage in favour of content that appeals to the user, and the system must behave dynamically since every user has different preferences. By executing this approach, we can improve users' engagement time and hence solve our problem. CAL (the content access layer) is our solution, and it manages the issues we had in the past: users now get preferred content from a combination of various content selectors, which select content based on user preference. Trending APIs, recommendation APIs, and BecauseYouHaveWatched APIs are the content selectors used to generate intelligent content selection for the user. We are building a system that provides intelligent and flexible content selection, aims at flexible consumption patterns, and supports plug-and-play models for additional content-selection algorithms, meaning the system need not be updated when a new content-selector service joins in the future. Providing the plug-and-play feature requires a discovery service, so I have developed the content selector registry, a discovery-service API.
It manages the availability of the content selectors that reside inside the Kubernetes cluster. I have also written a Google Cloud Function that stores the data in BigQuery by initiating a Dataflow job; the BigQuery data is later used to generate insights and KPI metrics.

Item Open Access
Distributed TDMA scheduling in tree based wireless sensor networks with multiple data attributes and multiple sinks (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Vasavada, Tejas Mukeshbhai; Srivastava, Sanjay
Data collection is an important application of wireless sensor networks. Sensors are deployed in a given region of interest and sense physical quantities such as temperature, pressure, solar radiation, and speed. One or more sinks are deployed in the network along with the sensor nodes, and the sensor nodes send sensed data to the sink(s); this operation is known as a convergecast operation. Once the nodes are deployed, a logical tree is formed: every node identifies its parent node for transmitting data towards the sink. As TDMA (Time Division Multiple Access) completely prevents collisions, it is preferred over CSMA (Carrier Sense Multiple Access). The next step after tree formation is to assign a time slot to every node of the tree; a node transmits only during its assigned slot. Once tree formation and scheduling are done, data transfer from sensors to sink takes place. Tree formation and scheduling algorithms may be implemented in a centralized manner, in which case the sink node executes the algorithms and informs every node of its parent and time slot. The alternative approach is to use distributed algorithms, in which every node decides its parent and slot on its own. Our focus is on distributed scheduling and tree formation. Most researchers treat scheduling and parent selection as two different problems, but the tree structure constrains the efficiency of scheduling, so it is better to treat scheduling and tree formation as a single problem.
One algorithm should address both jointly, and we use a single algorithm to perform both slot and parent selection. The main contributions of this thesis are explained in the subsequent paragraphs. In the first place, we have addressed scheduling and tree formation for single-sink heterogeneous sensor networks. In a homogeneous network, all nodes are of the same type; for example, only temperature sensors are deployed in the given region. Many applications require more than one type of node in the same region; for example, sensors are deployed on a bridge to monitor several parameters such as vibration, tilt, cracks, and shocks. A network having more than one type of node is known as a heterogeneous network. If all the nodes of the network are of the same type, parent selection is trivial: a node can select the neighbor nearest to the sink as its parent. In heterogeneous networks, a node may receive different types of packets from different children. To maximize aggregation, an appropriate parent should be selected for each outgoing packet such that the packet can be aggregated at the parent node. If aggregation is maximized, nodes need to forward fewer packets, so fewer slots are required and energy consumption is reduced. We have proposed the AAJST (Attribute Aware Joint Scheduling and Tree formation) algorithm for heterogeneous networks, whose objective is to maximize aggregation. The algorithm is evaluated using simulations. Compared to the traditional approach to parent selection, the proposed algorithm results in 5% to 10% smaller schedule length and 15% to 30% less energy consumption during the data transfer phase; energy consumption during the control phase is also reduced by 5%. When a large number of nodes are deployed in the network, it is better to use more than one sink rather than a single sink, as this provides fault tolerance and load balancing. Every sink becomes the root of one tree.
If finer observations are required from a region, more nodes are deployed there, i.e., the node deployment is dense, while deployment in other regions may be sparse because the application does not require more. When trees are formed, a tree passing through a dense region has a higher schedule length than one passing through a sparse region; thus the schedule lengths are not balanced. For example, let the trees be T1 and T2 with schedule lengths SH1 and SH2 respectively. Every node in tree Ti gets its turn to transmit after SHi time slots. If there is a large difference between SH1 and SH2, nodes of the tree with the larger SHi wait much longer for their turn to transmit than nodes of the other tree; but if SH1 and SH2 are balanced, the waiting time is almost the same for all nodes. Thus schedule lengths should be balanced. The overall schedule length SH of the network can be defined as max(SH1, SH2); if the schedule lengths are balanced, SH is also reduced. We have proposed an algorithm known as SLBMHM (Schedule Length Balancing for Multi-sink HoMogeneous networks), which guides every node to join a tree such that the schedule lengths of the resulting trees are balanced. Through simulations, it is found that SLBMHM results in a 13% to 74% reduction in schedule-length difference, and the overall schedule length is reduced by 9% to 24% compared to existing mechanisms. The algorithm incurs 3% to 20% more energy consumption during the control phase, which involves the transfer of control messages for schedule-length balancing and for slot and parent selection. The control phase does not take place frequently but at longer intervals, so the additional energy consumption may not affect the network lifetime much; no change in energy consumption during the data transmission phase is found. Schedule lengths may also be unbalanced due to differences in the heterogeneity levels of regions.
For example, in one region two different types of sensors are deployed, while in another region four different types are present. When heterogeneity is high, aggregation becomes difficult, so more packets flow through the network. Thus the tree passing through the region with two types of nodes will have a smaller schedule length than the tree passing through the region with four types of nodes. We have proposed an algorithm known as SLBMHT (Schedule Length Balancing for Multi-sink HeTerogeneous networks), an extension of SLBMHM. The proposed algorithm is capable of balancing schedule lengths whether the imbalance is caused by differences in density or differences in heterogeneity. It is also evaluated through simulations: the SLBMHT algorithm results in up to 56% reduction in schedule-length difference, up to 20% reduction in overall schedule length, and a 2% to 17% reduction in energy consumption per TDMA frame during the data transfer phase. It incurs at most 7% more energy consumption during the control phase; as the control phase does not take place frequently, this increase can be offset by the reduction in energy consumption during the data phase. As a result, the network lifetime increases.

Item Open Access
Crime information extraction from news articles (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Gohel, Prashant; Jat, P.M.
In the modern era, all news reporting is available in digital form; most news agencies put it on their websites, freely available. This motivates us to try extracting information from online news reporting. While understanding natural-language text for information extraction is a complex task, we hope that extracting information such as crime type, crime location, and some profile information of the accused and the victim should be feasible.
In this work we pulled about 1000 crime news articles from the NDTV and Indian Express websites. Hand tagging was done for the crime location and crime type of every article. Through this work we show that a combination of LSTM- and CNN-based solutions can be effectively used for extracting crime location; using this technique we get 95.58% precision and 94.54% recall. We found determination of the crime type relatively easier: through a simple keyword-based classification approach we get 95% precision. We also tried topic modeling for crime-type extraction but gained no improvement, obtaining 79% precision. Keywords: crime related named entities, deep learning, neural network, LSTM, CNN, NER, NLP

Item Open Access
Set labeling of graphs (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kumar, Lokesh; Muthu, Rahul
Given a universal set and its subsets, an intersection graph can be characterized as a graph with one distinct subset of the given universal set for each vertex, such that any two non-adjacent vertices have no element in common in their respective sets. This was first studied by Erdős. For the Kneser graph and the Petersen graph, adjacency is characterized by disjointness, which motivates us to look at disjointness instead of intersection. This report contains results on asymptotic bounds for valid labelings of some special classes of graphs, such as Harary graphs, split graphs, bipartite graphs, disjoint complete graphs, and complete multipartite graphs. Parameters relevant to the study of vertex labelings are the minimum label size possible (ILN), the minimum universe size possible (USN), and their uniform versions, UILN and UUSN. We have also proposed a framework for labeling disconnected graphs.
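The disjointness labeling mentioned above can be illustrated with a standard example (not taken from the thesis): the Petersen graph is the Kneser graph K(5,2), whose vertices are labeled by the 2-element subsets of a 5-element universe and joined exactly when their labels are disjoint. A minimal sketch that builds this labeling and checks the well-known counts:

```python
from itertools import combinations

# One distinct 2-element label per vertex, drawn from the universe {0,...,4}.
labels = list(combinations(range(5), 2))

# Adjacency characterized by disjointness: an edge iff the labels share no element.
edges = [(u, v) for u, v in combinations(labels, 2) if not set(u) & set(v)]

# Each 2-subset is disjoint from C(3,2) = 3 others, so the graph is 3-regular
# with 10 vertices and 10 * 3 / 2 = 15 edges, matching the Petersen graph.
degree = {v: sum(v in e for e in edges) for v in labels}
print(len(labels), len(edges), set(degree.values()))
```

Here the universe size is 5 and every label has size 2, so this labeling is "uniform" in the sense of the UILN/UUSN parameters listed above.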