Association rule mining is a well-known data mining technique used to find associations between items or itemsets. In today’s big data environment, association rule mining has to be extended to big data. The Apriori algorithm is one of the most commonly used algorithms for association rule mining. Using the Apriori algorithm, we find frequent patterns, that is, patterns that occur frequently in data.

The Apriori algorithm employs an iterative approach in which frequent k-itemsets are used to explore (k + 1)-itemsets. To find the frequent itemsets, the set of frequent 1-itemsets is first found by scanning the database and accumulating the count of each item. Itemsets that satisfy the minimum support threshold are kept. These are then used to find the frequent 2-itemsets. This process goes on until the newly generated itemset is an empty set, that is, until there are no more itemsets that meet the minimum support threshold. The frequent itemsets are then checked against a minimum confidence level to determine the association rules. Generating the frequent itemsets calls for repeated full scans of the database, and in this era of big data, this is a major challenge for the algorithm. Figure 1 presents a flow chart of how the Apriori algorithm works.

Traditional association rule mining algorithms, like Apriori, mostly mine positive association rules. Positive association rule mining finds items that are positively related to one another, that is, if one item goes up, the related item also goes up. Though the classic application of positive association rule mining is market basket analysis, its applications have been extended to a wide range of areas such as biological datasets, web-log data, fraud detection, and census data. Negative association rule mining finds items that are negatively correlated, that is, if one item goes up, the other item goes down. Negative association rule mining also has many applications, including building efficient decision support systems, crime data analysis, and the health care sector.

In this paper we present an architecture, based on frequent itemset mining, for mining positive as well as negative association rules in the big data environment using Hadoop’s MapReduce. Because the Apriori algorithm needs repeated scans of the dataset, the parallel and distributed structure of Hadoop should be used in an optimized way when mining positive and negative association rules in big data with Apriori.

The rest of the paper is organized as follows. The “Association rule mining” section presents the terminology used in association rule mining; the “Hadoop and MapReduce” section presents the Hadoop framework and the concept of MapReduce; the “Related works” section presents previous similar work in this area; and the “Experimental design” section presents the design used in Hadoop’s parallel and distributed environment. This is followed by the “Dataset and system parameters used” section, the “Results and discussion” section, and finally the “Conclusion”.
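The iterative Apriori procedure described in the introduction can be illustrated with a minimal single-machine sketch in Python. It is not the Hadoop implementation discussed in the paper. The function names `apriori` and `generate_rules`, the toy transactions, and the choice to express minimum support as an absolute count are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items to obtain the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Join frequent k-itemsets into (k + 1)-item candidates, keeping
        # only candidates whose k-item subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One more full scan of the data to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
        k += 1  # the loop stops once no candidate meets minimum support
    return all_frequent


def generate_rules(all_frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules that pass."""
    rules = []
    for itemset, count in all_frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = count / all_frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical market-basket data, used only to exercise the sketch.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(transactions, min_support=3)
rules = generate_rules(frequent_itemsets, min_confidence=0.8)
```

On this toy data, four single items and four pairs meet the support threshold, no 3-itemset does, and the only rule that reaches 0.8 confidence is beer → diapers. Each pass of the `while` loop rescans every transaction, which is the cost that motivates distributing the counting step.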