Abstract
When learning is based on noisy data, the induced rule sets have a tendency to overfit the training data, and this degrades the performance of the resulting classifier. Consequently, the ability to tolerate noise is a necessity for robust, practical learning methods. Pruning is a common way of handling noisy data. This paper presents a new pruning technique built on the sound foundation of the minimum description length principle. The proposed pruning technique has the advantage that it does not require the set of examples employed for pruning to be distinct from the set used to build the rule set. The new technique is designed to improve the performance of the RULe Extraction System (RULES) family of inductive learning algorithms, but can be used for pruning rule sets created by other learning algorithms. It was tested in RULES-6, the latest algorithm in the family, and showed significant performance improvements.
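The abstract does not give the paper's encoding scheme, so the sketch below is a generic illustration of MDL-based rule-set pruning, not the RULES-6 technique itself. The total description length is taken as the bits needed to encode the rules plus the bits needed to flag and correct the training examples they misclassify. Whole rules are then removed greedily while that total shrinks. Everything here is an assumption made for illustration: the names `Rule`, `theory_bits`, `exception_bits` and `mdl_prune`, the ordered first-match rule list with a default class, and the uniform per-condition cost. The sketch does reflect the property the abstract highlights: pruning is scored on the same examples used to build the rules, with no separate pruning set.

```python
import math
from dataclasses import dataclass

# Hypothetical illustration of MDL pruning; not the RULES-6 encoding.


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # (attribute_index, value) pairs, all of which must hold
    label: str

    def covers(self, example):
        return all(example[a] == v for a, v in self.conditions)


def classify(rules, default, example):
    """Label from the first rule that covers the example, else the default class."""
    for rule in rules:
        if rule.covers(example):
            return rule.label
    return default


def theory_bits(rules, n_values, n_classes):
    """Bits to encode the rules: each condition names an attribute and one of its values."""
    n_attrs = len(n_values)
    bits = 0.0
    for rule in rules:
        for attr, _ in rule.conditions:
            bits += math.log2(n_attrs) + math.log2(n_values[attr])
        bits += math.log2(n_classes)  # the rule's class label
    return bits


def exception_bits(rules, default, data, n_classes):
    """Bits to flag the misclassified training examples and give their true labels."""
    n = len(data)
    errors = sum(1 for x, y in data if classify(rules, default, x) != y)
    bits = math.log2(n + 1)                    # the number of errors
    bits += math.log2(math.comb(n, errors))    # which examples are errors
    bits += errors * math.log2(n_classes - 1)  # the correct label for each error
    return bits


def description_length(rules, default, data, n_values, n_classes):
    return theory_bits(rules, n_values, n_classes) + exception_bits(rules, default, data, n_classes)


def mdl_prune(rules, default, data, n_values, n_classes):
    """Greedily drop whole rules while that shortens the total description length."""
    rules = list(rules)
    best = description_length(rules, default, data, n_values, n_classes)
    improved = True
    while improved and rules:
        improved = False
        for i in range(len(rules)):
            candidate = rules[:i] + rules[i + 1:]
            dl = description_length(candidate, default, data, n_values, n_classes)
            if dl < best:
                best, best_candidate, improved = dl, candidate, True
        if improved:
            rules = best_candidate
    return rules, best


# Toy data: 3 binary attributes; one noisy "pos" example at ("b", "y", "q").
data = ([(("a", "x", "p"), "pos")] * 3 + [(("a", "y", "q"), "pos")] * 3
        + [(("b", "x", "p"), "neg")] * 3 + [(("b", "y", "p"), "neg")] * 3
        + [(("b", "x", "q"), "neg")] * 3 + [(("b", "y", "q"), "pos")])
general = Rule(((0, "a"),), "pos")
noise_fit = Rule(((0, "b"), (1, "y"), (2, "q")), "pos")  # covers only the noisy example
pruned, dl = mdl_prune([general, noise_fit], "neg", data, n_values=[2, 2, 2], n_classes=2)
print(pruned, round(dl, 2))
```

In this toy run the three-condition rule `noise_fit` is dropped even though it brings the training error to zero. Encoding it costs about 8.75 bits, while flagging the single noisy example as an exception costs only 4 extra bits, so the shorter description keeps just `general`.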
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1339-1352 |
Number of pages | 14 |
Journal | Institution of Mechanical Engineers. Proceedings. Part C: Journal of Mechanical Engineering Science |
Volume | 222 |
Issue number | 7 |
Publication status | Published - 1 Jul 2008 |
Keywords
- data mining
- knowledge discovery
- pruning
- inductive learning
- machine learning
- rule induction
- minimum description length principle
- noise handling