Knowledge seeker's blog: CRM and data mining Day 08

Tuesday, April 17, 2012

The following figure shows data mining at the intersection of many disciplines.

The following figure shows different data types that we encounter in data mining.

The following figure shows a taxonomy of data mining.

The following figure shows the data mining process.

The following figure shows the data preparation process.

The following figure shows the SEMMA (Sample, Explore, Modify, Model, Assess) data mining process.

Split the data into two mutually exclusive sets: training (~70%) and testing (~30%). The model is built on the training set and evaluated on the testing set.
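As a minimal sketch of this split, assuming the data is a plain list of records, the snippet below shuffles a copy and cuts it at the 30% mark. The function name `holdout_split`, the fixed seed, and the sample `records` are illustrative assumptions, not something taken from the lecture notes.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and cut it into two disjoint lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the testing set, the rest training.
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))          # hypothetical stand-in for real rows
train, test = holdout_split(records)
print(len(train), len(test))
```

Because every row lands in exactly one of the two lists, the sets are mutually exclusive by construction.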

For artificial neural networks (ANN), the data is split into three subsets: training (~60%), validation (~20%), and testing (~20%).
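The three-way split can be sketched the same way. This is only an illustration under the assumption that the records fit in a list; `three_way_split` and its default fractions are hypothetical names and values chosen to match the percentages above.

```python
import random

def three_way_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return disjoint (training, validation, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    # Whatever remains after training and validation is the testing set.
    testing = shuffled[n_train + n_val:]
    return training, validation, testing

rows = list(range(100))             # hypothetical stand-in for real rows
training, validation, testing = three_way_split(rows)
print(len(training), len(validation), len(testing))
```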

The following figure shows the ROC curve.
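To make the ROC curve concrete, here is a rough sketch that sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each step. The labels and scores are made-up values, and `roc_points` and `area_under_curve` are hypothetical helper names; it also assumes both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from lowering a threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all records tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]                  # 1 = positive class (made up)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]      # made-up classifier scores
points = roc_points(labels, scores)
print(points, area_under_curve(points))
```

An area close to 1 indicates a classifier that ranks positives above negatives; an area near 0.5 is no better than chance.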

The Gini index determines the purity of a specific class as a result of a decision to branch along a particular attribute/value (used in, e.g., CART).
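A small sketch of the Gini calculation may help. It uses the common Gini impurity form, 1 minus the sum of squared class proportions, and weights the impurity of each branch by its size; the function names and the yes/no labels are illustrative assumptions, and this is not CART itself.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_of_split(left, right):
    """Size-weighted Gini impurity of the two branches of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["yes"] * 5 + ["no"] * 5            # made-up class labels
left = ["yes"] * 4 + ["no"]                  # records sent down the left branch
right = ["yes"] + ["no"] * 4                 # records sent down the right branch
print(gini(parent), gini_of_split(left, right))
```

A split is preferred when it lowers the weighted impurity relative to the parent node.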

Information gain uses entropy to measure the extent of uncertainty or randomness of a particular attribute/value split (used in, e.g., ID3, C4.5, C5).
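The idea can be sketched as follows: entropy measures how mixed the class labels are, and information gain is the drop in entropy after a split. The function names and sample labels are assumptions for illustration only; this is the plain information gain measure, not a full ID3 or C4.5 implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    remainder = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder

labels = ["yes"] * 5 + ["no"] * 5            # made-up class labels
perfect = [["yes"] * 5, ["no"] * 5]          # split that separates the classes
useless = [["yes", "no"] * 2, ["yes", "no"] * 3]  # split that separates nothing
print(information_gain(labels, perfect), information_gain(labels, useless))
```

The attribute/value split with the highest gain removes the most uncertainty about the class.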

Several algorithms are available for generating association rules, for example Apriori, Eclat, and FP-Growth.
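As an illustration of how such rules can be generated, the sketch below does a simplified level-wise (Apriori-style) search for frequent itemsets and then keeps rules that meet a confidence threshold. The transactions, thresholds, and function names are made up for the example, and the code favors readability over efficiency.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset to its support (fraction of transactions)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    support = {}
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        level = {}
        for cand in candidates:
            s = sum(1 for b in baskets if cand <= b) / n
            if s >= min_support:
                level[cand] = s
        support.update(level)
        k += 1
        # Join frequent itemsets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-item subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return support

def association_rules(support, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting the threshold."""
    found = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = s / support[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, s, confidence))
    return found

transactions = [                    # made-up market baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(support, min_confidence=0.9)
print(rules)
```

With these baskets, every basket containing butter also contains bread, so butter → bread is the only rule that reaches 90% confidence.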

The following figure shows biological versus artificial neural networks.

References

FIT5158 Monash University Lecture Notes, 2012
Andy Oppel (2011), Databases Demystified, 2nd Ed., McGraw-Hill.
Efraim Turban et al. (2011), Business Intelligence: A Managerial Approach, 2nd Ed., Prentice Hall.