IIUM Repository

An efficient algorithm to discover large and frequent itemset in high dimensional data

Zulkurnain, Nurul Fariza (2019) An efficient algorithm to discover large and frequent itemset in high dimensional data. Project Report. UNSPECIFIED. (Unpublished)

PDF (research report)
Restricted to Registered users only

Abstract

Data collection increasingly yields high dimensional data: a small number of observations described by a very large number of variables. Mining such data produces an explosive number of small itemsets, which are less important than colossal (large) ones. As Frequent Itemset Mining moves towards mining colossal itemsets, it is important to understand the challenges in order to formulate a method that runs faster, scales better and produces useful and interesting knowledge. To this end, this research proposes two new algorithms, RARE and RARE II, which mine colossal closed itemsets. Both algorithms apply a minimum cardinality threshold to limit the search space and a closure computation method that does not require storing previously discovered itemsets for duplicate checking. These approaches reduce both the memory and the time the algorithms need to finish mining tasks. RARE searches the rowset lattice in a breadth-first manner, which results in fewer itemset intersections than the state-of-the-art algorithms CARPENTER and IsTa. RARE II further reduces itemset intersections by evaluating only the closed rowsets in order to mine the next closed itemsets. Although the different thresholds used in CARPENTER and IsTa make direct comparison difficult, RARE and RARE II performed better: they finished mining all closed itemsets in less time than CARPENTER and IsTa, which discovered only a fraction of the closed itemsets, and took much longer to do so, before running out of memory.
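The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the RARE or RARE II algorithm from the report, whose details are not given here. It only assumes a breadth-first walk over rowsets, pruning by a minimum cardinality threshold, and a canonical-order test (in the spirit of prefix-preserving closure extension) that rejects duplicates without keeping a lookup table of earlier itemsets. The function name `mine_closed_itemsets`, the parameter `min_card` and the toy data are hypothetical.

```python
from collections import deque


def mine_closed_itemsets(rows, min_card):
    """Illustrative sketch only (not RARE): enumerate closed itemsets with
    at least `min_card` items by intersecting rows breadth-first."""
    rows = [frozenset(r) for r in rows]
    n = len(rows)

    def support(itemset):
        # Closed rowset: every row index whose row contains the itemset.
        return frozenset(i for i, r in enumerate(rows) if itemset <= r)

    universe = frozenset().union(*rows)
    root_rowset = support(universe)
    found = []
    if root_rowset and len(universe) >= min_card:
        found.append((universe, root_rowset))

    # Each queue entry: (itemset, closed rowset, index of last row added).
    queue = deque([(universe, root_rowset, -1)])
    while queue:
        itemset, rowset, last = queue.popleft()
        for j in range(last + 1, n):
            if j in rowset:
                continue
            new_items = itemset & rows[j]
            # Intersections only shrink, so the whole branch can be pruned.
            if len(new_items) < min_card:
                continue
            new_rowset = support(new_items)
            # Canonical test: if the closure pulls in an earlier row that the
            # parent lacked, another branch generates this itemset instead.
            if any(i < j and i not in rowset for i in new_rowset):
                continue
            found.append((new_items, new_rowset))
            queue.append((new_items, new_rowset, j))
    return found


# Hypothetical toy data: four rows (observations) over five items.
toy_rows = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "d", "e"},
    {"c", "d", "e"},
]
result = mine_closed_itemsets(toy_rows, min_card=3)
```

On this toy data with `min_card=3` the sketch returns seven closed itemsets, each exactly once. The queue holds only the current frontier of the search; duplicate detection relies on the row ordering alone, which is one possible way to avoid storing previously discovered itemsets.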

Item Type: Monograph (Project Report)
Additional Information: 4123/70312
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices > TK7885 Computer engineering
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Engineering > Department of Electrical and Computer Engineering
Depositing User: DR Nurul Fariza Zulkurnain
Date Deposited: 01 Dec 2019 08:05
Last Modified: 01 Dec 2019 08:05
URI: http://irep.iium.edu.my/id/eprint/70312
