XAI Today

Explaining Random Forests with Boolean Satisfiability

Julian Hatwell

The paper “On Explaining Random Forests with SAT” uses Boolean satisfiability (SAT) methods to provide a formal framework for generating explanations of Random Forest (RF) predictions. A key result in the paper is that abductive explanations (AXp) and contrastive explanations (CXp) can be derived by encoding the RF’s decision paths into propositional logic.

Encoding a decision path as propositional logic is an entirely reasonable approach and quite straightforward, as I showed in my paper CHIRPS: Explaining random forest classification. The decision paths of an RF model can be transformed into a Boolean formula in Conjunctive Normal Form (CNF). In this framework, each decision tree in the forest is represented as a set of clauses. Following the paths for a single example prediction essentially carves out a region of the feature space with a set of step functions, yielding a sub-region that must return the target response. When the clauses in this set of step functions involve only a subset of the features, changing any of the remaining feature inputs has no effect on the model's prediction. That subset forms a prime implicant (PI) explanation.
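To make this concrete, here is a minimal sketch (my own illustration, not the paper's encoding) that reads off the step-function literals along each tree's decision path for a single instance of a scikit-learn Random Forest. The dataset, hyperparameters and variable names are illustrative choices only.

```python
# Minimal sketch, assuming a scikit-learn RandomForestClassifier trained on
# the iris data (illustrative choices, not the paper's setup). For one
# instance, each tree's decision path is read off as a conjunction of
# step-function literals: (feature <= threshold) or (feature > threshold).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)

x = X[0].reshape(1, -1)                       # the instance to explain

for t, tree in enumerate(rf.estimators_):
    node_ids = tree.decision_path(x).indices  # nodes visited by x in this tree
    literals = []
    for node_id in node_ids:
        feat = tree.tree_.feature[node_id]
        if feat < 0:                          # leaf node: no split condition
            continue
        thr = tree.tree_.threshold[node_id]
        if x[0, feat] <= thr:
            literals.append(f"x{feat} <= {thr:.2f}")
        else:
            literals.append(f"x{feat} > {thr:.2f}")
    print(f"tree {t}: " + " AND ".join(literals))
```

Each printed conjunction is one tree's contribution; the instance satisfies all of them simultaneously, and their intersection is the sub-region referred to above.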

A PI-explanation is a minimal subset of features that is sufficient to guarantee a particular prediction made by a machine learning model. It represents the smallest set of conditions that, if held constant, would lead to the same classification result regardless of the values of the other features. Essentially, it is the most concise explanation of why the model arrived at its decision, highlighting the critical features responsible for that prediction. In fact, my own research centred on finding soft PI-explanations, and on revealing the limits where they no longer hold for extreme outliers and unusual examples.
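As a toy illustration of the definition (my own sketch, not taken from the paper), consider a three-feature Boolean "model". A deletion pass keeps only those features that cannot be freed without some completion of the free features changing the prediction; what remains is a subset-minimal sufficient set, i.e. a PI-explanation.

```python
# Toy sketch: compute a PI-explanation for a tiny Boolean "model" by
# brute force. The model and instance are illustrative assumptions.
from itertools import product

def model(a, b, c):
    # predicts 1 iff a AND (b OR c)
    return int(a and (b or c))

def is_sufficient(instance, fixed, f):
    """True if fixing only the features in `fixed` at the instance's values
    forces the same prediction for every setting of the free features."""
    target = f(*instance)
    free = [i for i in range(len(instance)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        candidate = list(instance)
        for i, v in zip(free, values):
            candidate[i] = v
        if f(*candidate) != target:
            return False
    return True

def pi_explanation(instance, f):
    fixed = set(range(len(instance)))         # start with all features fixed
    for i in range(len(instance)):            # greedy deletion pass
        trial = fixed - {i}
        if is_sufficient(instance, trial, f):
            fixed = trial
    return fixed

instance = (1, 1, 0)                          # model(1, 1, 0) == 1
print(pi_explanation(instance, model))        # {0, 1}: a=1 and b=1 suffice
```

For a real RF the completions of the free features cannot be enumerated exhaustively, which is exactly why a logical encoding and a solver are needed.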

The authors of this paper show that finding AXp and CXp in this PI-explanation framework can be cast as a sequence of calls to a SAT solver. The underlying problem is NP-hard in general, but it becomes polynomial under specific conditions. This insight into the problem's complexity is significant because it establishes that generating explanations is feasible when those assumptions are met, opening the method up to practical applications with real-world data.
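The following is a hedged sketch of how such a SAT oracle might be used, written with the PySAT library (my choice, not necessarily the authors' implementation). Assume `cnf` encodes "the forest predicts something other than the target class" and `instance_lits` are the literals fixing the instance's feature values; a subset of those literals is sufficient for the prediction exactly when the formula plus the subset is unsatisfiable, and a deletion pass makes the subset minimal, i.e. an AXp. The tiny placeholder CNF stands in for a real forest encoding.

```python
# Hedged sketch of the SAT-oracle scheme, using the PySAT library.
# The placeholder CNF below says the prediction can only be flipped if
# literal 1 is false, or if literals 2 and 3 are both false.
from pysat.formula import CNF
from pysat.solvers import Solver

cnf = CNF(from_clauses=[[-1, -2], [-1, -3]])  # stand-in for a forest encoding
instance_lits = [1, 2, 3]                     # literals fixing the instance

def is_sufficient(lits):
    """UNSAT of cnf + lits means the prediction cannot be flipped."""
    with Solver(bootstrap_with=cnf.clauses) as solver:
        return not solver.solve(assumptions=lits)

def axp(lits):
    """Deletion-based minimisation: drop a literal whenever the rest
    still guarantees the prediction."""
    current = list(lits)
    for lit in lits:
        trial = [l for l in current if l != lit]
        if is_sufficient(trial):
            current = trial
    return current

print(axp(instance_lits))                     # [1, 3]: literal 2 can be dropped
```

A CXp can be found with the dual check: a minimal set of instance literals whose removal from the assumptions makes the formula satisfiable again.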

Overall, the SAT-based methodology provides a structured, efficient way to uncover the decision-making process of Random Forests, ensuring that their predictions are not just accurate but also explainable, which is crucial for domains that require transparency, such as healthcare and finance.
