Apache Spark for Machine Learning
Several algorithms have been developed to discover frequent itemsets and association rules efficiently, even in large datasets. Apache Spark provides two of them, which we will discuss next.
FP-Growth, short for Frequent Pattern Growth, is a data mining technique for discovering frequent patterns, associations, and relationships within large datasets. Unlike the Apriori algorithm, which is also used for frequent pattern mining (FPM), FP-Growth requires only two scans of the transaction dataset: the first counts how often each item occurs, and the second builds a compact prefix tree (the FP-tree) from which the frequent itemsets are mined. Because it does not generate candidate itemsets during frequent itemset generation, it saves computing resources and time.
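To make the two scans and the absence of candidate generation concrete, here is a minimal single-machine sketch in plain Python. It is an illustration under stated assumptions, not Spark's distributed implementation: the names `Node`, `build_tree`, and `fp_growth` are hypothetical, and the support threshold is given as an absolute count rather than a fraction.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs using two passes."""
    # Scan 1: count how often each item occurs.
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending count so common prefixes share nodes.
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    result = {}

    def mine(weighted, suffix):
        header, frequent = build_tree(weighted, min_count)
        for item, support in frequent.items():
            itemset = suffix | {item}
            result[frozenset(itemset)] = support
            # Conditional pattern base: the prefix paths leading to `item`,
            # each weighted by the count of the node it ends at.
            conditional = []
            for node in header[item]:
                path, parent = [], node.parent
                while parent is not None and parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    conditional.append((path, node.count))
            if conditional:
                mine(conditional, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result


# Hypothetical toy baskets; an itemset is frequent if it appears in >= 2 of them.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
itemsets = fp_growth(baskets, min_count=2)
```

Larger itemsets are grown only from prefix paths that already exist in the tree, so the sketch never enumerates and tests candidate combinations the way Apriori does. In Spark itself, the equivalent work is done by the `FPGrowth` estimator in `pyspark.ml.fpm`, which takes the threshold as a fraction through its `minSupport` parameter.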
The FP-Growth algorithm in Apache Spark has several valuable use cases. Let’s explore the top three: