It is not enough to say that something is true just because ‘I know it’s true!’ – we need some evidence or argument that justifies our belief. Explanations, justifications, and, more broadly, epistemology have been a focus of philosophy for thousands of years. For Plato, being puzzled, and therefore wanting to be unpuzzled, is the origin of philosophy.
Theaetetus: Yes, Socrates, and I am amazed when I think of them; by the Gods I am! And I want to know what on earth they mean; and there are times when my head quite swims with the contemplation of them.
And here we are, 2,500 years later, longing for clarity in something we have created: artificial intelligence.
“Explanation” in AI aims to enable human users to understand the reasons behind a model’s predictions. Comprehending those reasons is fundamental if one plans to take action based on a prediction. There are three essential needs for explainability:
- Explain to Justify – For users to understand and trust. Why did the AI system do that? Why didn’t the AI system do something else?
- Explain to Control – To comply and sustain. In fields like medicine, defense, the judiciary, and education, models must be strictly answerable for their predictions. You need to be able to explain in order to comply with accountability and regulatory requirements. When you can control a model, you can sustain its performance and prevent things from going wrong. Bias is potentially present in any dataset, and explainability helps you identify and mitigate it. Explanations also support debugging and troubleshooting.
- Explain to Improve – To iterate and improve performance. When you understand the underlying mechanics of a technique, you know its potential pitfalls and how to improve on them. Explanations support continuous optimization for better decision making.
Explanations help make AI fair, robust, certifiable, ethical, privacy-preserving, and human-interpretable. Explainability enables human users to effectively manage AI systems.
However, what makes machine learning algorithms excellent predictors also makes them difficult to understand. They look like “black boxes” to most users.
XAI aims to “produce more explainable models while maintaining a high level of learning performance (prediction accuracy); and enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners” – DARPA. It aims to answer the following questions:
- Why was a particular prediction made, as opposed to another?
- Will the model always behave that way?
- When does it fail, and why? When should the model’s output be disregarded, and how can the error be fixed?
Before we go into how XAI is achieved, let’s discuss how it is delivered, in what form, and what the scope of explainability is.
Types of Explanations
The purpose of explanations is to allow affected parties, regulators, and other non-insiders to understand, discuss, and potentially contest decisions made by black-box algorithmic models. An explanation system can provide explanations in several forms:
- Why-type Explanations describe why a result was generated for a particular input. Such explanations aim to communicate what features in the input data, or what logic in the model, resulted in a given machine output.
- Contrastive Explanations highlight not only the pertinent positives but also the pertinent negatives. For example, in medicine, a patient showing symptoms of cough, cold, and fever, but no sputum or chills, will most likely be diagnosed as having flu rather than pneumonia.
- Explanations by Example explain a model’s decision by reporting other examples the model considers most similar to the instance in question.
- What-If Explanations are an interactive approach that helps the user explore what the system would do if a particular input were different. The user can simulate a few changes and observe how the output changes. For example, checking whether one’s mortgage request would still be declined if the applicant were male.
- How-To Explanations are another interactive approach, helping the user explore how to get the system to produce a chosen output value. The goal is to provide the user with hypothetical input conditions that would produce that output. For example, finding out what annual income would have gotten the mortgage request approved (a small sketch of both interactive styles follows below).
Why-type explanations, contrastive explanations, and explanations by example address evaluation (how the system works), while what-if and how-to explanations address curiosity (building a mental model of the system).
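To make the two interactive styles concrete, here is a minimal sketch of a what-if probe and a how-to search against a hypothetical mortgage model. The model, features, and data are synthetic stand-ins chosen for illustration, not any real lender’s system.

```python
# What-if / how-to probing of a hypothetical mortgage model.
# Everything here (features, thresholds, data) is synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Features: [annual_income (k$), debt_ratio, credit_history_years]
X = rng.normal(loc=[60, 0.35, 8], scale=[20, 0.1, 4], size=(500, 3))
# Synthetic label: approve when income is high and the debt ratio is low.
y = ((X[:, 0] - 100 * X[:, 1] + 2 * X[:, 2]) > 35).astype(int)

model = LogisticRegression().fit(X, y)

applicant = np.array([[45.0, 0.45, 5.0]])        # a declined applicant
print("original decision:", model.predict(applicant)[0])

# What-if: change one input and observe the new decision.
what_if = applicant.copy()
what_if[0, 1] = 0.30                             # lower the debt ratio
print("decision with lower debt ratio:", model.predict(what_if)[0])

# How-to: search for the smallest income that flips the decision to 'approved'.
for income in np.arange(45, 201, 5):
    probe = applicant.copy()
    probe[0, 0] = income
    if model.predict(probe)[0] == 1:
        print(f"approval reached at roughly ${income:.0f}k annual income")
        break
```

The same probing pattern works with any model that exposes a prediction function; only the choice of which inputs to vary is domain-specific.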
How Explanations are Delivered
An explanation system can deliver explanations to humans in various forms:
Verbal/Textual Explanations
Humans often justify decisions verbally. Verbal explanations describe the machine learning model and its reasoning in words, text, or natural language. Verbal explanations are popular in applications like question-answering systems, decision lists, and explanation interfaces.
This form of explanation has also been implemented in recommendation systems and robotics. One technique for building verbal explanations is to train one model to generate predictions and a separate model, such as a recurrent neural network language model, to generate the accompanying explanation.
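As a rough sketch of this two-model pattern, the snippet below pairs a classifier with a separate, template-based explanation generator (a deliberately simpler stand-in for the recurrent language model mentioned above). The dataset, function names, and wording of the generated sentence are illustrative assumptions.

```python
# Verbal/textual explanation sketch: one model predicts, a separate component
# (template-based here, rather than an RNN language model) turns the prediction
# and the strongest feature contributions into a sentence.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

def explain_in_words(x, top_k=3):
    """Return a one-sentence textual explanation for a single instance."""
    scaler = model.named_steps["standardscaler"]
    clf = model.named_steps["logisticregression"]
    z = scaler.transform(x.reshape(1, -1))[0]
    contrib = clf.coef_[0] * z                   # per-feature contribution to the logit
    top = np.argsort(np.abs(contrib))[::-1][:top_k]
    label = data.target_names[model.predict(x.reshape(1, -1))[0]]
    reasons = ", ".join(
        f"{data.feature_names[i]} ({'raises' if contrib[i] > 0 else 'lowers'} the score)"
        for i in top
    )
    return f"Predicted '{label}' mainly because of: {reasons}."

print(explain_in_words(data.data[0]))
```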
Visual Explanations
Visual explanations use visual elements to describe the reasoning behind the machine learning models. Visual explanations include visualizations of learned model parameters, evaluation metrics, computational graphs, data-flow graphs etc.
Visualizations can take the form of scatter plots, line charts, heat-maps, node-link diagrams, hierarchies (e.g., decision trees), and so on. Examples include line charts for temporal metrics; heat-maps overlaid on images to highlight the regions that contribute most to a classification and their sensitivity; synthetic images that represent what the model has learned about chosen features; visual back-propagation to show which parts of an image contributed to the classification; visualizations of CNN filters; and prediction difference analysis to highlight features in an image that provide evidence for or against a certain class.
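One concrete way to build such a heat-map is occlusion analysis, in the spirit of the prediction difference methods mentioned above: gray out one patch of the image at a time and record how much the predicted probability drops. The sketch below uses a toy stand-in classifier so it runs end to end; in practice you would pass a trained CNN’s prediction function instead.

```python
# Occlusion-sensitivity heat-map: occlude one patch at a time and measure
# the drop in the class probability. The 'model' is a toy stand-in so the
# sketch is self-contained; swap in a real CNN's predict function in practice.
import numpy as np

def toy_predict_proba(img):
    """Stand-in classifier: 'probability' rises with the brightness of the centre region."""
    centre = img[12:20, 12:20]
    return 1.0 / (1.0 + np.exp(-(centre.mean() - 0.5) * 10))

def occlusion_heatmap(img, predict_proba, patch=4, fill=0.0):
    base = predict_proba(img)
    h, w = img.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = fill
            # A large drop in probability means this region mattered for the prediction.
            heat[i // patch, j // patch] = base - predict_proba(occluded)
    return heat

rng = np.random.default_rng(1)
image = rng.random((32, 32))
image[12:20, 12:20] = 0.9            # bright centre that the toy model keys on

heat = occlusion_heatmap(image, toy_predict_proba)
print(np.round(heat, 2))             # the largest values cluster around the centre patches
```

Overlaying `heat` (upsampled back to the image size) on the original image gives the familiar saliency-style visualization.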
Scope of Explainability
Does the interpretation method explain the entire model behavior or an individual prediction? Or is the scope somewhere in between? An explanation can be either at a global level or local level.
Global level
This is about understanding “How does the model as a whole make predictions?” The goal is to explain the entire chain of reasoning leading to all the different possible outcomes.
Understanding the conditional interactions between the response variable(s) and the predictor features across the complete dataset provides a first measure for assessing trust in a model. Trust is established when the list of important variables is consistent with domain expectations and stays stable under slight variations of the data.
Even though a multitude of techniques (such as decision trees, rule lists, and feature importance) can be used to enable global interpretability, analyzing and explaining the feature interactions behind model decisions becomes quite difficult once there are more than three or four features.
This class of methods is helpful for explaining population-level decisions, such as alcohol consumption trends or ML course enrollment trends.
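As one concrete example of a global technique, the sketch below computes permutation feature importance on synthetic data: each feature is shuffled in turn and the resulting drop in the model’s score is taken as that feature’s global importance. The data, feature names, and model choice are assumptions made purely for illustration.

```python
# Global interpretability via permutation feature importance: shuffle one
# feature at a time and measure how much the model's held-out score degrades.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
# Only the first two features actually drive the label; the other two are noise.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, imp in zip(["signal_1", "signal_2", "noise_1", "noise_2"],
                     result.importances_mean):
    print(f"{name:>9}: {imp:.3f}")
# A trustworthy ranking should match domain expectations and stay stable
# across slight variations of the data, as discussed above.
```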
Local level
This is about providing justification for a single prediction: understanding “Why did the model make a specific decision for this instance?” and “Why did the model make specific decisions for this group of instances?”
Local explanations identify the specific variables that contributed to an individual decision, for example, when trying to explain why a machine learning algorithm declined an individual’s mortgage application.
Although global interpretability allows you to verify hypotheses and check whether the model is overfitting to noise, it is hard to use for diagnosing specific model predictions. Local interpretability, on the other hand, tries to answer: why was this prediction made, and which variables caused it?
Several techniques, such as LIME, LOCO, Anchors, saliency maps, and local explanation vectors, can be used to enable local interpretability.
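To give a flavor of how such local methods work, here is a simplified LIME-style sketch (hand-rolled, not the lime library itself): perturb the instance, weight the perturbations by their proximity to it, fit a weighted linear surrogate to the black-box output, and read the surrogate’s largest coefficients as the local explanation. The dataset and parameter values are illustrative assumptions.

```python
# Simplified LIME-style local surrogate around a single instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

data = load_breast_cancer()
black_box = RandomForestClassifier(n_estimators=200, random_state=0)
black_box.fit(data.data, data.target)

def local_explanation(x, n_samples=2000, kernel_width=1.0, top_k=5):
    """Explain one black-box prediction with a weighted linear surrogate fit around x."""
    rng = np.random.default_rng(0)
    scale = data.data.std(axis=0)
    noise = rng.normal(size=(n_samples, x.size))     # perturbations in standardized units
    Z = x + noise * scale
    target = black_box.predict_proba(Z)[:, 1]        # black-box output the surrogate mimics
    # Closer perturbations get larger weights (an exponential proximity kernel).
    weights = np.exp(-np.sum(noise ** 2, axis=1) / (2 * kernel_width ** 2 * x.size))
    surrogate = Ridge(alpha=1.0).fit(noise, target, sample_weight=weights)
    top = np.argsort(np.abs(surrogate.coef_))[::-1][:top_k]
    return [(data.feature_names[i], surrogate.coef_[i]) for i in top]

instance = data.data[0]
print("black-box prediction:", black_box.predict_proba(instance.reshape(1, -1))[0])
for name, weight in local_explanation(instance):
    print(f"{name:>25}: {weight:+.3f}")
```

The real LIME, Anchors, and related libraries add important refinements (sparse feature selection, interpretable input representations, rule extraction), but the core perturb, weight, and fit idea is the same.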
As you can see, XAI is not an AI that can explain itself; it is a design decision made during implementation. In the next posts, I will go over the approach to designing XAI, the model dependency of XAI, the inner workings of various techniques, and how explainability is evaluated.
Check out ‘Part 1 – XAI, the third wave of AI’ which covers the need and importance of explainable intelligence in case you haven’t read it yet!
And, ‘Part 2 – The Illusion of Free Will’, which contrasts our demand that machines explain why they did what they did with our own ability to explain our choices.