Explainable AI for Improved Heart Disease Prediction
The paper “Optimized Ensemble Learning Approach with Explainable AI for Improved Heart Disease Prediction” focuses on explaining machine learning models in healthcare, similar to my original work in “Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences”. The newer paper combines a novel Bayesian method for optimally tuning the hyperparameters of ensemble models such as AdaBoost, XGBoost and Random Forest with the now well-established SHAP method, which assigns a Shapley value to each feature. The authors use their method to analyse three heart disease prediction datasets, including the well-known Cleveland dataset used as a benchmark in many ML research papers.
SHAP (Lundberg and Lee) came hot on the heels of the revolutionary LIME method (Ribeiro, Singh and Guestrin), and together they delivered a paradigm shift in the usefulness and feasibility of eXplainable Artificial Intelligence (XAI). In fact, LIME was published at exactly the time I was becoming interested in the topic of XAI and served as inspiration for my own Ph.D. journey. Both methods fall into the category of Additive Feature Attribution Methods (AFAM) and work by assigning a unitless value to each of the input features. The main benefits of AFAM become clear when viewing a beeswarm plot of these values across a larger dataset, such as the whole training data: patterns emerge showing which input variables affect the response variable most strongly, and in which direction. This usage is far more informative than classic variable importance plots, which lack both the directionality and the mathematical guarantees offered by SHAP.
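To make the AFAM workflow concrete, here is a minimal sketch of computing SHAP values for a tree ensemble and viewing the beeswarm plot. The model, data and parameters are illustrative stand-ins (I use scikit-learn's breast cancer data rather than any of the heart disease sets from the paper), not the authors' pipeline.

```python
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

# Stand-in tabular data; the paper's heart disease datasets are not reproduced here
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Any decomposable tree ensemble would do; XGBoost keeps the SHAP output simple
model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# TreeExplainer computes exact Shapley values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)          # one value per feature per instance

# Beeswarm plot: importance with magnitude, direction and feature value
shap.plots.beeswarm(shap_values)
```

Each dot is one instance and one feature, and the colour encodes the feature's value, so a clean left/right split in the swarm immediately shows the direction in which that variable pushes the prediction.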
In the clinical setting, these mathematical guarantees mean that the resulting variable sensitivity information could be used to build a broader diagnostic tool. However, while this approach provides a general understanding of which variables drive a model’s predictions, it lacks the fine-grained, instance-specific clarity offered by perfect-fidelity, decompositional methods.
On the other hand, my original method, Ada-WHIPS (firmly within the decompositional methods category), enhances interpretability in clinical settings by providing direct, case-specific explanations, making it a powerful tool for clinicians needing detailed transparency for patient-specific decision-making. Given the choice of an AdaBoost model (or a Gradient Boosted Model, or a Random Forest), it makes sense to use an XAI method that is highly targeted to these decomposable ensembles. Ada-WHIPS digs deep into the internal structure of AdaBoost models, redistributing the adaptive classifier weights generated during model training (and therefore a function of the training data distribution) to extract interpretable rules at the decision-node level.
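To give a flavour of the raw material Ada-WHIPS works with, the sketch below walks the weak learners of a fitted scikit-learn AdaBoost model and pairs each split condition on one instance's decision path with that learner's adaptive weight. It is deliberately simplified: the real Ada-WHIPS algorithm redistributes these weights across decision nodes and induces a pruned, interpretable rule, none of which is shown here, and the dataset is again only a stand-in.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and a shallow-tree AdaBoost model (not the paper's datasets)
data = load_breast_cancer()
X, y, feature_names = data.data, data.target, data.feature_names
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2), n_estimators=50, random_state=0
).fit(X, y)

def weighted_path_conditions(model, x):
    """Yield (estimator_weight, split condition) pairs along x's decision paths."""
    for tree, alpha in zip(model.estimators_, model.estimator_weights_):
        t = tree.tree_
        node = 0
        while t.children_left[node] != -1:        # walk down until a leaf
            f, thr = t.feature[node], t.threshold[node]
            if x[f] <= thr:
                yield alpha, f"{feature_names[f]} <= {thr:.3f}"
                node = t.children_left[node]
            else:
                yield alpha, f"{feature_names[f]} > {thr:.3f}"
                node = t.children_right[node]

# The weighted conditions for a single patient-like instance
for alpha, condition in weighted_path_conditions(model, X[0]):
    print(f"weight={alpha:.3f}  {condition}")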
One area where Ada-WHIPS could benefit from the techniques in the new paper is the use of Bayesian methods to tune hyperparameters. Their approach potentially leads to improved model accuracy, a crucial factor in high-stakes environments like healthcare, while also “juicing up” the model internals for greater accuracy in the generated decision nodes. However, the paper appears to omit any detail about how this tuning is actually deployed. This omission is a great pity because, as far as I understood, the Bayesian parameter selection was the authors’ novel contribution (the use of ensembles and SHAP on these particular datasets being nothing particularly new).
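Since the paper does not describe its tuning setup, the following is only a hedged sketch of how Bayesian hyperparameter optimisation could be applied to an AdaBoost model, here via scikit-optimize's BayesSearchCV; the search space, scoring metric and iteration budget are my own assumptions for illustration, not the authors' configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from skopt import BayesSearchCV
from skopt.space import Integer, Real

# Stand-in data; swap in the heart disease features in a real experiment
X, y = load_breast_cancer(return_X_y=True)

search = BayesSearchCV(
    estimator=AdaBoostClassifier(estimator=DecisionTreeClassifier()),
    search_spaces={
        "n_estimators": Integer(50, 500),
        "learning_rate": Real(1e-3, 1.0, prior="log-uniform"),
        "estimator__max_depth": Integer(1, 4),   # depth of the weak learners
    },
    n_iter=32,            # Bayesian optimisation steps (assumed budget)
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The tuned model that falls out of such a search could then be handed straight to Ada-WHIPS, since nothing about the weight-redistribution step depends on how the hyperparameters were chosen.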
In conclusion, the SHAP-based approach offers valuable insights at a macro level, the new paper boasts improvements in model accuracy through Bayesian tuning, and my Ada-WHIPS method’s per-instance clarity and actionable insights should prove practical in scenarios where clinicians require detailed explanations of specific cases. I would be delighted to see some confluence of the three ideas, so that the benefits from each can combine and reinforce the use of highly targeted explainability in clinical applications.