The following figure shows different data types that we encounter in data mining.
The following figure shows a taxonomy of data mining.
The following figure shows the data mining process.
The following figure shows the data preparation process.
The following figure shows the data mining (SEMMA) process.
Split the data into two mutually exclusive sets: training (~70%) and testing (~30%).
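As a rough illustration of this holdout split, the sketch below shuffles a copy of the records and cuts it at 70%. The helper name `holdout_split`, the fixed seed and the 100 stand-in records are assumptions made for the example, not part of any particular tool.

```python
import random

def holdout_split(records, train_fraction=0.7, seed=42):
    """Hypothetical helper: shuffle a copy of the records and cut it into two mutually exclusive sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))        # stand-in for 100 records
train, test = holdout_split(data)
print(len(train), len(test))   # prints: 70 30
```

A stratified split (not shown) would additionally keep the class proportions roughly equal in both sets.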
For artificial neural networks (ANNs), the data is split into three subsets: training (~60%), validation (~20%) and testing (~20%).
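The same idea extends to three subsets. In this minimal sketch (the helper name `three_way_split` and the seed are hypothetical), the validation set is the part typically used while building the network, for example to decide when to stop training, and the test set is kept aside for the final evaluation.

```python
import random

def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=42):
    """Hypothetical helper: split records into training, validation and testing subsets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to testing
    return train, valid, test

data = list(range(100))
train, valid, test = three_way_split(data)
print(len(train), len(valid), len(test))  # prints: 60 20 20
```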
The following figure shows the ROC (receiver operating characteristic) curve.
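A ROC curve plots the true positive rate against the false positive rate as the classification threshold is varied. The sketch below assumes binary labels (1 = positive, 0 = negative) and a hypothetical list of classifier scores; it sweeps the threshold over every distinct score and estimates the area under the curve (AUC) with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as the threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a threshold above every score predicts nothing positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]                 # hypothetical true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]    # hypothetical classifier scores
points = roc_points(labels, scores)
print(round(auc(points), 3))  # prints: 0.889
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.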
The Gini index (used, e.g., by CART) measures how pure the class distribution is in the subsets produced by branching on a particular attribute/value; a lower index means purer subsets.
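A minimal sketch of the Gini computation: the Gini index of a set of labels is 1 − Σ p_i², where p_i is the proportion of class i, and a candidate split is scored by the size-weighted Gini of its subsets. The helper names and the 14-record yes/no example are hypothetical; a CART-style splitter would prefer the split with the lowest weighted value.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum(p_i^2); 0 means the set contains a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(partitions):
    """Size-weighted Gini index of the subsets produced by a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

parent = ["yes"] * 9 + ["no"] * 5   # hypothetical node before splitting
left = ["yes"] * 6 + ["no"] * 1     # subset where the attribute test is true
right = ["yes"] * 3 + ["no"] * 4    # subset where it is false
print(round(gini(parent), 3), round(gini_of_split([left, right]), 3))  # prints: 0.459 0.367
```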
Information gain (used, e.g., by ID3, C4.5 and C5) uses entropy to measure the uncertainty or randomness that remains after splitting on a particular attribute/value; the split that reduces entropy the most is preferred.
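The sketch below computes entropy as H = −Σ p_i log₂ p_i and information gain as the parent's entropy minus the size-weighted entropy of the subsets. It reuses the same hypothetical 14-record example as a plain ID3-style criterion; C4.5 normalizes the gain into a gain ratio, which is not shown here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    total = len(parent)
    remainder = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

parent = ["yes"] * 9 + ["no"] * 5   # hypothetical node before splitting
left = ["yes"] * 6 + ["no"] * 1
right = ["yes"] * 3 + ["no"] * 4
print(round(entropy(parent), 3), round(information_gain(parent, [left, right]), 3))  # prints: 0.94 0.152
```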
Commonly used algorithms for generating association rules include:
- Apriori
- Eclat
- FP-Growth
- Derivatives and hybrids of the three
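To make the first of these concrete, here is a minimal, unoptimized sketch of Apriori's frequent-itemset phase, assuming transactions are given as sets of item names and support is an absolute count. It grows itemsets level by level, keeping only candidates whose smaller subsets are all frequent; association rules are then read off the frequent itemsets, as the final confidence calculation illustrates. The basket data is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:                      # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b                       # join step
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):                                  # prune step: all (k-1)-subsets must be frequent
                candidates.add(union)
        counts = {c: 0 for c in candidates}
        for t in transactions:                  # one pass over the data per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = apriori(baskets, min_support=3)
# Confidence of the rule {diaper} -> {beer} = support(diaper, beer) / support(diaper)
confidence = result[frozenset({"diaper", "beer"})] / result[frozenset({"diaper"})]
print(len(result), confidence)  # prints: 8 0.75
```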
The following figure shows biological versus artificial neural networks.
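In the usual analogy, the artificial neuron's inputs play the role of signals arriving at the dendrites, its weights the role of synapse strengths, and its output the role of the axon's signal. The sketch below is a single processing element, assuming a sigmoid activation (one common choice among several); the function name and numbers are hypothetical.

```python
import math

def neuron(inputs, weights, bias):
    """Single artificial neuron: weighted sum of inputs plus bias, squashed by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

print(round(neuron([1.0, 0.0], [0.5, -0.5], 0.0), 4))  # prints: 0.6225
```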
References
- FIT5158 Monash University Lecture Notes, 2012
- Andy Oppel (2011), Databases Demystified, 2nd Ed., McGraw-Hill.
- Efraim Turban et al. (2011), Business Intelligence: A Managerial Approach, 2nd Ed., Prentice Hall.