Algebraic Aggregation of Random Forests

Julian Hatwell
Aug 10, 2023

In my paper, “CHIRPS: Explaining random forest classification”, I took an empirical approach to addressing model transparency by extracting rules that make Random Forest (RF) models more interpretable. Importantly, this was done without sacrificing the high accuracy that RF models achieve.

The recently published “Algebraic aggregation of random forests: towards explainability and rapid evaluation” by Gossen and Steffen provides a theoretical counterpart, offering essential proofs and a mathematical framework for achieving explainability with RF models.

While my paper focused on simplifying complex models by rule extraction on a per-instance basis, this subsequent work introduces Algebraic Decision Diagrams (ADDs) to aggregate Random Forests, optimizing their structure and enhancing interpretability at the model level. Both papers aim to improve model transparency, though by different means: my approach is empirical, using rule extraction to clarify black-box models, whereas the latter uses algebraic methods to combine decision trees into efficient, understandable diagrams.

The mathematical concepts in Gossen and Steffen’s paper, such as path reduction and algebraic operations, support model simplification. Importantly, the authors provide formal proofs that this aggregation retains the original model’s accuracy. This complements the practical focus in my paper, where the goal was also to maintain accuracy while increasing explainability.
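To make the idea of aggregation concrete, here is a toy sketch in Python. It is not Gossen and Steffen's implementation, and it is far simpler than a real ADD (no canonical variable ordering, no node sharing). The tuple encoding of trees, the helper names (`leaf`, `node`, `restrict`, `combine`, `aggregate`) and the three-tree forest are all assumptions made up for illustration. The sketch adds the trees' vote vectors leaf by leaf, drops tests that an earlier test on the same path has already decided, and then checks nothing more than that the single merged diagram agrees with ordinary majority voting.

```python
# A tree is ("leaf", value) or ("node", feature, threshold, low, high);
# the `low` branch is taken when x[feature] <= threshold.

def leaf(value):
    return ("leaf", value)

def node(feature, threshold, low, high):
    return ("node", feature, threshold, low, high)

def evaluate(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while tree[0] == "node":
        _, f, t, low, high = tree
        tree = low if x[f] <= t else high
    return tree[1]

def to_votes(tree, n_classes):
    """Replace each class-label leaf with a one-hot vote vector."""
    if tree[0] == "leaf":
        return leaf(tuple(int(c == tree[1]) for c in range(n_classes)))
    _, f, t, low, high = tree
    return node(f, t, to_votes(low, n_classes), to_votes(high, n_classes))

def restrict(tree, f, t, is_low):
    """Path reduction: simplify `tree` knowing whether x[f] <= t holds."""
    if tree[0] == "leaf":
        return tree
    _, f2, t2, low, high = tree
    if f2 == f and is_low and t2 >= t:       # x[f] <= t <= t2, so go low
        return restrict(low, f, t, is_low)
    if f2 == f and not is_low and t2 <= t:   # x[f] > t >= t2, so go high
        return restrict(high, f, t, is_low)
    return node(f2, t2, restrict(low, f, t, is_low), restrict(high, f, t, is_low))

def combine(a, b):
    """Build one diagram whose leaves hold the sum of a's and b's votes."""
    if a[0] == "leaf" and b[0] == "leaf":
        return leaf(tuple(p + q for p, q in zip(a[1], b[1])))
    if a[0] == "leaf":
        a, b = b, a                          # addition commutes, so swapping is safe
    _, f, t, low, high = a
    new_low = combine(restrict(low, f, t, True), restrict(b, f, t, True))
    new_high = combine(restrict(high, f, t, False), restrict(b, f, t, False))
    return new_low if new_low == new_high else node(f, t, new_low, new_high)

def to_labels(tree):
    """Replace vote vectors with the winning class, merging equal branches."""
    if tree[0] == "leaf":
        votes = tree[1]
        return leaf(max(range(len(votes)), key=votes.__getitem__))
    _, f, t, low, high = tree
    low, high = to_labels(low), to_labels(high)
    return low if low == high else node(f, t, low, high)

def aggregate(forest, n_classes):
    """Fold the whole forest into a single decision diagram."""
    diagram = to_votes(forest[0], n_classes)
    for tree in forest[1:]:
        diagram = combine(diagram, to_votes(tree, n_classes))
    return to_labels(diagram)

def majority_vote(forest, x, n_classes):
    """Reference behaviour: evaluate every tree and count the votes."""
    counts = [0] * n_classes
    for tree in forest:
        counts[evaluate(tree, x)] += 1
    return max(range(n_classes), key=counts.__getitem__)

# Hypothetical three-tree forest over two features and two classes.
forest = [
    node(0, 0.5, leaf(0), node(1, 0.3, leaf(0), leaf(1))),
    node(1, 0.6, node(0, 0.2, leaf(0), leaf(1)), leaf(1)),
    node(0, 0.7, leaf(0), leaf(1)),
]
aggregated = aggregate(forest, 2)
```

Calling `evaluate(aggregated, x)` follows one path through one structure instead of three separate trees. For any input `x`, it should return the same class as `majority_vote(forest, x, 2)`. That agreement is the property the formal proofs establish for the real construction.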

Ultimately, the two papers reach the same destination, improving the transparency of RF models, but by different routes. While my paper uses rule extraction to bring clarity to complex models, the subsequent work constructs a theoretical basis using algebraic tools, providing formal assurances for the outcomes I demonstrated empirically. Together, they offer complementary perspectives on making RF models more understandable and efficient.
