Tuesday, April 17, 2012

CRM and data mining Day 08

The following figure shows data mining at the intersection of many disciplines.

The following figure shows different data types that we encounter in data mining.




The following figure shows a taxonomy of data mining.



The following figure shows the data mining process.


The following figure shows the data preparation process.



The following figure shows the SEMMA (Sample, Explore, Modify, Model, Assess) data mining process.



Split the data into two mutually exclusive sets: training (~70%) and testing (~30%).
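A minimal sketch of this holdout split in plain Python, assuming the records fit in a list. The function name `holdout_split` and the fixed random seed are illustrative choices, not something from the lecture notes.

```python
import random

def holdout_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records and split them into two mutually exclusive sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)                # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]   # training set, testing set

train, test = holdout_split(range(100))
print(len(train), len(test))  # 70 30
```

Shuffling before cutting matters when the source data is ordered (e.g. by date or by class); otherwise the testing set would not be representative of the whole.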


For artificial neural networks (ANNs), the data is split into three subsets: training (~60%), validation (~20%) and testing (~20%).
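The same idea extended to three subsets, as a sketch. It assumes (as is common, though not stated in the notes) that the validation set is used while training the network, for example to decide when to stop, and that the testing set is held back for the final estimate. The name `three_way_split` and the defaults are hypothetical.

```python
import random

def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split records into disjoint training, validation and testing sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(round(len(shuffled) * fractions[0]))
    n_valid = int(round(len(shuffled) * fractions[1]))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to testing
    return train, valid, test

train, valid, test = three_way_split(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```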

The following figure shows the ROC curve.
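To make the curve concrete, here is a sketch that computes ROC points (false positive rate, true positive rate) by sweeping a decision threshold over a classifier's scores, then the area under the curve (AUC) with the trapezoidal rule. The labels and scores are made-up toy values, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold downwards."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [(y, s >= threshold) for y, s in zip(labels, scores)]
        tp = sum(1 for y, p in predicted if p and y == 1)
        fp = sum(1 for y, p in predicted if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]               # 1 = positive class, 0 = negative class
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]  # classifier's scores for the positive class
points = roc_points(labels, scores)
print(round(auc(points), 3))  # 0.889
```

An AUC of 0.5 corresponds to the diagonal (random guessing) and 1.0 to a perfect ranking of positives above negatives.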


The Gini index measures the purity of the classes that result from a decision to branch on a particular attribute/value; the lower the index, the purer the resulting subsets (used, e.g., in CART).
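A small sketch of how a CART-style tree could score a candidate split with the Gini index: a node's impurity is 1 minus the sum of its squared class proportions, and a split is scored by the size-weighted impurity of its subsets (lower is better). The yes/no labels and the attribute `A` are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(subsets):
    """Size-weighted Gini impurity of the subsets produced by a candidate split."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * gini(s) for s in subsets)

parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]          # e.g. records where hypothetical attribute A = a1
right = ["yes"] + ["no"] * 4         # records where A != a1
print(gini(parent))                          # 0.5
print(round(gini_split([left, right]), 2))   # 0.32
```

The split lowers the impurity from 0.5 to about 0.32; a tree builder would compare this reduction across all candidate attributes/values and branch on the best one.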



Information gain uses entropy to measure the uncertainty or randomness that remains after splitting on a particular attribute/value; the split with the highest gain is preferred (used, e.g., in ID3, C4.5 and C5).
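A companion sketch for information gain as used by ID3: entropy is -sum(p_i * log2(p_i)) over the class proportions, and the gain of a split is the parent's entropy minus the size-weighted entropy of the subsets (higher is better). The toy labels are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a set of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    remainder = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - remainder

parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
print(entropy(parent))                                     # 1.0
print(round(information_gain(parent, [left, right]), 3))   # 0.278
```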


Several algorithms are available for generating association rules:
  1. Apriori
  2. Eclat
  3. FP-Growth
  4. Derivatives and hybrids of the three
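A minimal sketch of the frequent-itemset step of Apriori (item 1 above) in plain Python; Eclat and FP-Growth are not shown. It relies on the Apriori property (every subset of a frequent itemset is also frequent) to prune candidates before counting them. The shopping-basket transactions and the 60% minimum support are invented for illustration; association rules would then be derived from these itemsets by their confidence.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    min_support is a fraction of all transactions (0.6 means 60%).
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    frequent = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            frequent[frozenset([item])] = s
    current = set(frequent)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Count step: keep the candidates that meet the minimum support.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        current = set(level)
        k += 1
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=0.6)
print(frequent[frozenset({"diapers", "beer"})])  # 0.6
```

Here {bread, milk, diapers} is generated as a candidate but appears in only 2 of 5 baskets, so it is discarded and the search stops at pairs.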

The following figure shows biological versus artificial neural networks.
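To connect the figure to code, a minimal sketch of one artificial neuron: the inputs stand in for signals arriving at the dendrites, the weights for synaptic strengths, and the transfer function for the decision to fire. The sigmoid transfer function and the numbers are illustrative assumptions.

```python
import math

def neuron(inputs, weights, bias):
    """Output of a single artificial neuron with a sigmoid transfer function."""
    # Weighted sum of the inputs: the weights play the role of synapse strengths.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid squashes the sum into (0, 1), loosely like a neuron's firing rate.
    return 1.0 / (1.0 + math.exp(-z))

print(round(neuron([1.0, 0.5], [0.4, -0.2], 0.1), 4))  # 0.5987
```

A network is built by feeding the outputs of such neurons into further layers; training adjusts the weights.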



References
  1. FIT5158 Monash University Lecture Notes, 2012.
  2. Andy Oppel (2011), Databases Demystified, 2nd Ed., McGraw-Hill.
  3. Efraim Turban et al. (2011), Business Intelligence: A Managerial Approach, 2nd Ed., Prentice Hall.

