Abstract
We propose kernel density decision trees (KDDTs), a novel fuzzy decision tree (FDT) formalism based on kernel density estimation that improves the robustness of decision trees and ensembles and offers additional utility. FDTs mitigate the sensitivity of decision trees to uncertainty by representing that uncertainty through fuzzy partitions. However, compared to conventional, crisp decision trees, FDTs are generally complex to apply, sensitive to design choices, slow to fit and to predict with, and difficult to interpret. Moreover, finding the optimal threshold for a given fuzzy split is challenging, so existing methods discretize data, settle for near-optimal thresholds, or fuzzify crisp trees. Our KDDTs address these shortcomings by representing uncertainty intuitively through kernel distributions and by using a scalable generalization of the CART algorithm that finds optimal partitions for FDTs with piecewise-linear splitting functions, or for KDDTs with piecewise-constant fitting kernels. KDDTs can improve robustness to random or adversarial perturbation, reduce or eliminate the need for ensemble methods, enable smooth regression with trees, and give access to gradient-based feature learning methods that can improve prediction performance and reduce model complexity.
Committee:
Artur Dubrawski (Chair)
Barnabas Poczos
Mario Berges
Nick Gisolfi