- machine learning, data mining
- classification, pattern recognition
- interaction, dependence, dependency
- independence, independence assumption
- constructive induction, feature construction
- feature selection, attribute selection, myopic, information gain
- naive Bayes, simple Bayes
- naive Bayesian classifier, simple Bayesian classifier
- information theory, entropy, relative entropy, mutual information

A. Jakulin, I. Bratko,
**"Quantifying and Visualizing Attribute Interactions."**

*Working paper (Nov. 2003): [ARXIV] [PDF]*

A. Jakulin, G. Leban, **"Interactive Interaction Analysis."**
Proceedings A of the 6th Information Society Conference (IS 2003), Ljubljana, Slovenia, October 13-17, 2003.
*Paper (in Slovene): [PDF (Slovene)] Presentation: [PPT]*

A. Jakulin, I. Bratko, **"Analyzing Attribute Dependencies."** Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003), Cavtat-Dubrovnik, Croatia, September 22-26, 2003.
N. Lavrac, D. Gamberger, H. Blockeel, L. Todorovski (Eds.) Lecture Notes in Artificial Intelligence, Vol. 2838, Springer. Pp. 229-240.

*Preprint (Apr. 2003): [ps.gz] [pdf]
Final version: SpringerLink
Presentation: [PPT]*

A. Jakulin, I. Bratko, D. Smrke, J. Demsar and B. Zupan,
**"Attribute Interactions in Medical Data Analysis."**
Proceedings of the 9th Conference on Artificial Intelligence in Medicine in Europe (AIME 2003), Protaras, Cyprus, October 18-22, 2003. M. Dojat, E. Keravnou, P. Barahona (Eds.) Lecture Notes in Artificial Intelligence, Vol. 2780, Springer. Pp. 229-238.

*Preprint (Mar. 2003): [ps.gz] [pdf]
Final version: SpringerLink
Presentation: [PPT]*

A. Jakulin, **"Attribute Interactions in Machine Learning."** Master's thesis, University of Ljubljana, December 2002. [outdated]

- Full text: [ps.gz - 480k] [pdf - 1300k]
- Presentations:
  - English [pdf - 290k]
  - Slovene [pdf - 540k], video

The examples below illustrate the CMC dataset from the UCI repository, where the attributes describe the demographic and socioeconomic characteristics of a couple, while the label records the contraceptive method used.

**A positive interaction:** The *Wife age* attribute alone eliminates 3.33% of the uncertainty about the label, and the *Number of children* attribute alone eliminates 5.82%. If we assume these two attributes are dependent and treat them holistically (e.g., via a Cartesian product or a classification tree, but not, e.g., a naive Bayesian classifier, linear or logistic regression, or a linear SVM), we eliminate an additional 1.85% of the label uncertainty. We say that they interact positively, since they are synergistic.

**A negative interaction:** The *Husband education* attribute alone eliminates 2.60% of the uncertainty about the label, and the *Wife education* attribute alone eliminates 4.60%. These two attributes are partly redundant, as they share 1.15% of the label information. When estimating the uncertainty, we must be careful not to count this shared information or evidence twice. This problem is addressed in three ways: feature selection, feature weighting (e.g., linear SVM, logistic regression), and the assumption of dependence (tree-augmented naive Bayesian classifier, Bayesian networks).
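The quantity behind these percentages is the interaction information I(A;B;Y) = I(A,B;Y) − I(A;Y) − I(B;Y): positive values indicate synergy, negative values redundancy. A minimal sketch of the computation from raw counts (illustrative toy data, not the CMC dataset or the papers' implementation):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def interaction_info(a, b, y):
    """I(A;B;Y) = I(A,B;Y) - I(A;Y) - I(B;Y)."""
    ab = list(zip(a, b))
    return mutual_info(ab, y) - mutual_info(a, y) - mutual_info(b, y)

# XOR: each attribute alone tells nothing about the label,
# but together they determine it -> positive interaction (synergy).
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [u ^ v for u, v in zip(a, b)]
print(interaction_info(a, b, y))   # prints 1.0

# A duplicated attribute carries only shared information
# -> negative interaction (redundancy).
print(interaction_info(a, a, a))   # prints -1.0
```

Dividing the interaction information by the label entropy H(Y) yields the relative percentages quoted above.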