Machine Learning Interpretability: A Survey on Methods and Metrics
Figure 2. Google Trends comparison between the terms “machine learning interpretability” and “machine learning explainability” from January 2004 until May 2019. The vertical axis refers to the relative popularity, where 100 is the popularity peak for the term.
Figure. Explainable machine learning pipeline.
Abstract
1. Introduction
2. Context
2.1. Relevance
2.2. Field Emergence
2.3. Historical Context
2.4. Awareness
2.4.1. Public Awareness
2.4.2. Industry Awareness
2.5. Science and Research Emergence
- Data Science—ML algorithms are data hungry, as their predictive performance is directly proportional to the quality and quantity of the data they train on. More accurate predictions can lead to more accurate explanations. Moreover, one can argue that the backward path, which starts from the prediction results and aims to produce better explanations, is data dependent. Therefore, Data Science, which encompasses Machine Learning, is a key component of the interpretability process.
- Human Science—In order to reach human interpretability, one should first study and model how humans produce and understand explanations between each other and which properties make explanations perceivable to humans.
- Human Computer Interaction (HCI)—The user’s comprehension of and trust in the system depend on the process of interaction between the user and the system. Taking into account that HCI’s fundamental interest is to empower the user and to prioritize the user’s perception [17], knowledge from this field can help in developing interpretable systems, especially in aiming for more interpretable visualizations. Insights from cognitive science and psychology can improve these visualizations even further.
2.6. Terminology
3. Motivation and Challenges
3.1. Interpretability Requirement
3.2. High-Stakes Decisions Impact
3.3. Societal Concerns and Machine Learning Desiderata
- Fairness—Ensure that predictions are unbiased and do not implicitly or explicitly discriminate against protected groups. Explanations help humans to judge whether the decision is based on a learned demographic (e.g., racial) bias.
- Privacy—Ensure that sensitive information in the data is protected.
- Reliability/Robustness—Ensure that small changes in the input do not cause large changes in the prediction (a minimal robustness probe is sketched after this list).
- Causality—Ensure that only causal relationships are picked up. Viewed as the bridge that maps technical explainability onto human understanding, causality plays a crucial role in ensuring effective interactions between humans and ML systems. Furthermore, Holzinger et al. [82] propose the notion of causability as the degree to which an explanation to a human expert achieves a specified level of causal understanding with effectiveness, efficiency, and satisfaction in a specified context of use. In other words, causability is the property of the human to understand the system’s explanations [83]. Consequently, the key to effective human–AI interaction is an efficient mapping of explainability (in the sense of a technical explanation that explains the results of a system) onto causability, this being one of the most important end goals of (explainable) ML systems.
- Trust—It is easier for humans to trust a system that explains its decisions rather than a black box that just outputs the decision itself.
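To make the robustness desideratum concrete, the following minimal sketch probes how much a prediction shifts under small input perturbations. The model, synthetic data, and 5% noise level are all illustrative assumptions, not prescriptions from the survey.

```python
# A minimal robustness probe: measure how much a single prediction moves
# under small random perturbations of the input (all choices are illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(random_state=0).fit(X, y)

x = X[0]
base = model.predict(x.reshape(1, -1))[0]

# Perturb each feature by a small fraction of its standard deviation.
noise = 0.05 * X.std(axis=0)
perturbed = x + rng.normal(size=(200, 5)) * noise
max_shift = np.abs(model.predict(perturbed) - base).max()
print(f"max prediction shift under 5% noise: {max_shift:.3f}")
```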
3.4. Regulation
3.5. The Interpretability Problem
- Safety—because the system is never completely testable, as one cannot create a complete list of scenarios in which the system may fail.
- Ethics—because the human notion of, e.g., fairness can be too abstract to be entirely encoded into the system.
- Mismatched objectives—because the algorithm may be optimizing an incomplete objective, i.e., a proxy definition of the real ultimate goal.
- Multi-objective trade-offs—Two well-defined desiderata in ML systems may compete with each other, such as privacy and prediction quality [90].
3.6. Towards the Success of AI
4. Literature Review
4.1. Interpretability
4.2. Importance of Interpretability in Machine Learning
4.3. Taxonomy of Interpretability
4.3.1. Pre-Model vs. In-Model vs. Post-Model
4.3.2. Intrinsic vs. Post Hoc
4.3.3. Model-Specific vs. Model-Agnostic
4.3.4. Results of Explanation Methods
- Feature summary—Some explanation methods provide summary statistics for each feature. This can be, e.g., a single number per feature, such as a feature importance value. Most feature summary statistics can also be visualized. Some feature summaries are only meaningful when visualized and make little sense presented in other ways; e.g., partial dependence plots are not intuitive if presented in tabular format.
- Model internals—This is the explanation output of all intrinsically interpretable models. Some methods’ outputs are both model internals and summary statistics, such as the weights in linear models. Interpretability methods that output model internals are, by definition, model-specific.
- Data point—There are methods that return data points (already existent or not) to make a model interpretable. These are example-based methods. To be useful, explanation methods that output data points require that the data points themselves are meaningful and can be interpreted. This works well for images and texts but is less useful for, e.g., tabular data with hundreds of features.
- Surrogate intrinsically interpretable model—Another solution for interpreting black box models is to approximate them (either globally or locally) with an intrinsically interpretable model. The interpretation of the surrogate model then provides insights into the original model (a minimal sketch of this approach follows this list).
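As an illustration of the surrogate approach, the sketch below treats a random forest as the black box, fits a shallow decision tree to its predictions, and reports the agreement between the two as a simple fidelity score. It is a minimal example assuming scikit-learn and synthetic data, not code from the surveyed works.

```python
# Global surrogate sketch: fit an interpretable tree to the black box's outputs
# and measure fidelity as the R^2 between surrogate and black-box predictions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=1000, n_features=6, random_state=0)
black_box = RandomForestRegressor(random_state=0).fit(X, y)

# The surrogate is trained on the black box's predictions, not on the true labels.
y_bb = black_box.predict(X)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)

fidelity = r2_score(y_bb, surrogate.predict(X))
print(f"surrogate fidelity (R^2 vs. black box): {fidelity:.3f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))
```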
4.4. Scope of Interpretability
4.4.1. Algorithm Transparency
4.4.2. Global Model Interpretability
On a Holistic Level
On a Modular Level
4.4.3. Local Model Interpretability
For a Single Prediction
For a Group of Predictions
4.5. Explanations
- Non-pragmatic theory of explanation: The explanation should be the correct answer to the why-question.
- Pragmatic theory of explanation: The explanation should be a good answer for an explainer to give when answering the why-question to an audience.
- The cognitive dimension is related to knowledge acquisition and involves deriving the actual explanation by a process of abductive inference, meaning that, first, the causes of an event are identified and, then, a subset of these causes is selected as the explanation.
- The social dimension concerns the social interaction, in which knowledge is transferred from the explainer to the explainee (the person for which the explanation is intended and produced). The primary goal is that the explainee receives enough information from the explainer in order to understand the causes of some event or decision. The explainer can be either a human or a machine.
4.5.1. Properties of Explanation Methods
- Expressive power—It is the language or structure of the explanations the method is able to generate. These could be, e.g., rules, decision trees, and natural language.
- Translucency—It represents how much the explanation method relies on looking into the inner workings of the ML model, such as the model’s parameters. For example, model-specific explanation methods are highly translucent. Conversely, model-agnostic methods have zero translucency.
- Portability—It describes the range of ML models to which the explanation method can be applied. It is inversely proportional to translucency, meaning that highly translucent methods have low portability and vice-versa. Hence, model-agnostic methods are highly portable.
- Algorithmic complexity—It is related to the computational complexity of the explanation method. This property is very important to consider regarding feasibility, especially when computation time is a bottleneck in generating explanations.
4.5.2. Properties of Individual Explanations
- Accuracy—It is related to the predictive accuracy of the explanation regarding unseen data. In some cases where the goal is to explain what the black box model does, low accuracy might be fine if the accuracy of the machine learning model is also low.
- Fidelity—It is associated with how well the explanation approximates the prediction of the black box model. High fidelity is one of the most important properties of an explanation because an explanation with low fidelity is essentially useless. Accuracy and fidelity are closely related: if the black box model has high accuracy and the explanation has high fidelity, the explanation consequently has high accuracy. Moreover, some explanations only provide local fidelity, meaning that the explanation only approximates the model prediction well for a group of instances or a single instance.
- Consistency—Given two different models that have been trained on the same task and that output similar predictions, this property is related to how different the explanations for them are. If the explanations are very similar, they are highly consistent. However, it is noteworthy that this property is somewhat tricky [70], since the two models could use different features and still produce similar predictions, which is described by the “Rashomon Effect” [108]. In this specific case, high consistency is not desirable because the explanations should be very different, as the models use different relationships for their predictions. High consistency is desirable only if the models really rely on similar relationships; otherwise, explanations should reflect the different aspects of the data that the models rely on.
- Stability—It represents how similar the explanations for similar instances are. While consistency compares explanations between different models, stability compares explanations between similar instances for a fixed model. High stability means that slight variations in the feature values of an instance do not substantially change the explanation, unless these slight variations also strongly change the prediction. Nonetheless, a lack of stability can also be caused by non-deterministic components of the explanation method, such as a data sampling step (as noted at the end of the previous Section 4.5.1). Regardless of that, high stability is always desirable (an empirical stability check is sketched after this list).
- Comprehensibility—This property is one of the most important but also one of the most difficult to define and measure. It is related to how well humans understand the explanations. Since interpretability is a largely subjective concept, this property depends on the audience and the context. The comprehensibility of the features used in the explanation should also be considered, since a complex transformation of features might be less comprehensible than the original features [70].
- Certainty—It reflects the certainty of the ML model. Many ML models only provide prediction values, without any statement about the model’s confidence in the correctness of the prediction.
- Importance—It is associated with how well the explanation reflects the importance of features or of parts of the explanation. For example, if a decision rule is generated as an explanation, is it clear which of the conditions of the rule was the most important?
- Novelty—It describes whether the explanation reflects that an instance, whose prediction is to be explained, comes from a region in the feature space that is far away from the distribution of the training data. In such cases, the model may be inaccurate and the explanation may be useless. One way of providing this information is to locate the data instance to be explained in the distribution of the training data. Furthermore, the concept of novelty is related to the concept of certainty: the higher the novelty, the more likely it is that the model will have low certainty due to lack of data.
- Representativeness—It describes how many instances are covered by the explanation. Explanations can cover the entire model (e.g., interpretation of weights in a linear regression model) or represent only an individual prediction.
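The stability property can be probed empirically. The sketch below is illustrative only: it uses a hand-rolled, LIME-style local linear explanation (the sampling width, proximity kernel, and underlying model are all assumptions, not prescriptions from the survey) and compares the explanations obtained for an instance and for a slightly perturbed copy of it.

```python
# Stability check: explanations for similar instances should be similar.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
rng = np.random.default_rng(0)

def local_explanation(x, n_samples=500, width=0.5):
    """Fit a proximity-weighted linear model around x; its coefficients act as the explanation."""
    samples = x + rng.normal(scale=width, size=(n_samples, x.size))
    preds = model.predict(samples)
    weights = np.exp(-np.sum((samples - x) ** 2, axis=1) / (2 * width ** 2))
    return Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights).coef_

x = X[0]
e1 = local_explanation(x)
e2 = local_explanation(x + rng.normal(scale=0.01, size=x.size))  # similar instance

cosine = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))
print(f"explanation similarity for similar instances: {cosine:.3f}")
```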
4.5.3. Human-Friendly Explanations
- Contrastiveness [110]—Humans usually do not ask why a certain prediction was made but rather why this prediction was made instead of another one. In other words, people tend to think in counterfactual cases. This means that people are not specifically interested in all the factors that led to the prediction but instead in the factors that would need to change (in the input) so that the ML prediction/decision (output) would also change, implying a reference point: a hypothetical instance with the needed changes in the input and, consequently, with a different prediction (output).
- Selectivity—People do not expect explanations that cover the actual and complete list of causes of an event. Instead, they prefer selecting one or two main causes from a variety of possible causes as the explanation. As a result, explanation methods should be able to provide selected explanations or, at least, make explicit which ones are the main causes for a prediction. The “Rashomon Effect” describes this situation, in which different causes can explain an event [108]—since humans prefer to select some of the causes, the selected causes may vary from person to person.
- Social—Explanations are part of a social interaction between the explainer and the explainee. As seen in the beginning of Section 4.5, this means that the social context determines the content, the communication, and the nature of the explanations. Regarding ML interpretability, this implies that, when assessing the most appropriate explanation, one should take into consideration the social environment of the ML system and the target audience. This means that the best explanation varies according to the application domain and use case.
- Focus on the abnormal—People focus more on abnormal causes to explain events [111]. These are causes that had a small probability but, despite everything, happened. The elimination of these abnormal causes would have greatly changed the outcome (counterfactual faithfulness). In terms of ML interpretability, if one of the input feature values for a prediction was abnormal in any sense (e.g., a rare category) and the feature influenced the prediction outcome, it should be included in the explanation, even if other more frequent feature values have the same influence on the prediction as the abnormal one [70].
- Truthful—Good explanations are proven to be true in the real world. This does not mean that the explanation must contain the whole truth: that would conflict with selectivity, which is a more important characteristic than truthfulness. With respect to ML interpretability, this means that an explanation must make sense (be plausible) and also hold for predictions of other instances.
- Consistent with prior beliefs of the explainee—People have a tendency to ignore information that is inconsistent with their prior beliefs. This effect is called confirmation bias [112]. The set of beliefs varies subjectively from person to person, but there are also group-based prior beliefs, which are mainly of a cultural nature, such as political worldviews. Nevertheless, this is a trade-off with truthfulness, as prior knowledge is often not generally applicable and only valid in a specific knowledge domain. Honegger [79] argues that it would be counterproductive for an explanation to be simultaneously truthful and consistent with prior beliefs.
- General and probable—A cause that can explain a good number of events is very general and could, thus, be considered a good explanation. This seems to contradict the claim that abnormal causes make good explanations. However, abnormal causes are, by definition, rare in the given scenario, which means that, in the absence of an abnormal cause, a general explanation can be considered a good one. Regarding ML interpretability, generality can easily be measured by the feature’s support, which is the ratio between the number of instances to which the explanation applies and the total number of instances [70] (a toy support computation follows this list).
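A toy computation of this support measure is sketched below; the rule-style explanation and the synthetic feature matrix are purely illustrative assumptions.

```python
# Support of a rule-style explanation: the fraction of instances it covers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # three synthetic features

# Example rule-style explanation: "x0 > 1.0 and x2 < 0"
applies = (X[:, 0] > 1.0) & (X[:, 2] < 0)

support = applies.mean()  # instances covered / total instances
print(f"support of the rule: {support:.1%}")
```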
4.6. Interpretable Models and Explanation Methods
4.6.1. Interpretable Models
- Linearity—A model is linear if the association between feature values and target values is modelled linearly.
- Monotonicity—Enforcing monotonicity constraints on the model guarantees that the relationship between a specific input feature and the target outcome always goes in the same direction over the entire feature domain, i.e., when the feature value increases, it always leads to an increase or always to a decrease in the target outcome. Monotonicity is useful for interpretation because it makes it easier to understand the relationship between some features and the target (a short sketch of enforcing such a constraint follows this list).
- Interaction—Some ML models have the ability to naturally include interactions between features to predict the target outcome. These interactions can be incorporated in any type of model by manually creating interaction features through feature engineering. Interactions can improve predictive performance, but too many or too complex interactions will decrease interpretability.
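A monotonicity constraint can be imposed directly in some libraries. The sketch below assumes scikit-learn ≥ 1.0, whose HistGradientBoostingRegressor accepts per-feature monotonic constraints; the data and the constraint choice are illustrative.

```python
# Enforce a non-decreasing relationship between feature 0 and the prediction
# (+1 increasing, -1 decreasing, 0 unconstrained), then sanity-check it.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = 2 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(scale=0.1, size=2000)

model = HistGradientBoostingRegressor(monotonic_cst=[1, 0], random_state=0).fit(X, y)

# Predictions should not decrease as feature 0 grows (feature 1 held fixed).
grid = np.column_stack([np.linspace(0, 1, 50), np.full(50, 0.5)])
preds = model.predict(grid)
print("monotone in feature 0:", bool(np.all(np.diff(preds) >= -1e-9)))
```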
4.6.2. Model-Specific Explanation Methods
4.6.3. Model-Agnostic Explanation Methods
4.7. Evaluation of Interpretability
- Application-grounded evaluation (end task)—Requires conducting end-user experiments within a real application. This experiment is performed by using the explanation in a real-world application and having it tested and evaluated by the end user, who is also a domain expert. A good baseline for this is how good a human would be at explaining the same decision [70].
- Human-grounded evaluation (simple task)—Refers to conducting simpler human–subject experiments that maintain the essence of the target application. The difference is that these experiments are not carried out with the domain experts but with laypersons. Since no domain experts are required, experiments are cheaper and it is easier to find more testers.
- Functionally grounded evaluation (proxy task)—Requires no human experiments. In this type of evaluation, some formal definition of interpretability serves as a proxy to evaluate the explanation quality, e.g., the depth of a decision tree. Other proxies might be model sparsity or uncertainty [70]. This works best when the class of model being used has already been evaluated by someone else in a human-level evaluation (a tiny example of such proxies is sketched after this list).
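A minimal functionally grounded evaluation might look like the sketch below, where decision-tree depth and the number of leaves (rules) serve as interpretability proxies; the dataset and depth values are arbitrary choices for illustration.

```python
# Functionally grounded proxies: tree depth and number of leaves traded off
# against predictive performance (accuracy on training data, for brevity).
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (2, 4, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(f"max_depth={depth}: accuracy={tree.score(X, y):.3f}, "
          f"depth proxy={tree.get_depth()}, leaves (rules)={tree.get_n_leaves()}")
```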
4.7.1. Goals of Interpretability
- Accuracy—Refers to the actual connection between the given explanation by the explanation method and the prediction from the ML model [151]. Not achieving this goal would render the explanation useless, as it would not be faithful to the prediction it aims to explain. This goal is a similar concept to the fidelity property mentioned in Section 4.5.2.
- Understandability—It relates to how easily an explanation is comprehended by the observer. This goal is crucial because, however accurate an explanation may be, it is useless if it is not understandable [151]. This is similar to the comprehensibility property mentioned in Section 4.5.2.
- Efficiency—Reflects the time necessary for a user to grasp the explanation. Evidently, without this condition, it could be argued that almost any model is interpretable, given an infinite amount of time [151]. Thereby, an explanation should be understandable in a finite and preferably short amount of time. This goal is related to the previous one, understandability: in general, the more understandable an explanation is, the more efficiently it is grasped.
4.8. Literature Review Summary
5. Approaches on Interpretability Assessment
5.1. Qualitative Interpretability Indicators
- Form of cognitive chunks—This relates to the basic units of explanation, i.e., what are the explanations composed of? These could be, e.g., feature importance values, examples from the training set, or even rule lists. In certain domains, there are other possibilities, such as groups of pixels for the specific case of image recognition.
- Number of cognitive chunks that the explanation contains. How does the quantity interact with the form? In other words, taking into consideration that an example could contain a lot more information than a feature, can we handle both in similar quantities, in terms of ease of comprehension? If the explanation is composed of features, does it contain all features or only a few (selectivity)?
- Compositionality—This is related to the organization and structure of the cognitive chunks. Rules, hierarchies, and other abstractions may influence the human processing capacity. For example, an explanation may define a new unit (cognitive chunk) that is a function of raw units and provide an explanation in terms of that new unit. Other simple examples of compositionality are, e.g., the ordering of feature importance values or any threshold used to constrain the explanation.
- Monotonicity and other interactions between units. These interactions could be, e.g., linear, nonlinear, or monotone. Which type of relation between units is more intuitive for humans? Some relations may seem more natural to some people than to others.
- Uncertainty and stochasticity refer to whether the explanation returns some comprehensible uncertainty measure and whether any random processes are part of the explanation generation, e.g., sampling and random perturbation.
5.2. Quantitative Interpretability Indicators
- The sensitivity axiom is related to individual feature importance values and is composed of two parts:
  - First, if there are two different predictions for two inputs that differ only in a single feature value, then this feature should have a nonzero attribution (or importance, which in this case refers to the same thing). This is intuitive: the difference in the prediction must have been caused by the difference in the feature value, so this feature should have a nonzero importance.
  - Second, if the DNN never depends on some feature value for its predictions (which would mean that the feature is noise for the specified prediction task), then the importance value for that feature should always be zero. This follows the same intuitive logic presented above.
- The implementation invariance axiom argues that if two DNNs are equal, i.e., they were trained on the same prediction task and they return identical predictions for the same inputs, then the corresponding attributions for these networks must also be identical, even if they have different implementations. This means that if an explanation method does not satisfy this axiom, then the method is potentially sensitive to irrelevant properties of the models, e.g., their architecture, which is clearly undesirable. It can be argued that the two DNNs could be looking at different parts of the data; however, if that were the case, they would probably not return identical predictions for the same inputs.
- Identity—Identical objects must have identical explanations. This makes sense because two equal instances should have equal predictions and, consequently, equal feature importance values. If an explanation method is prompted several times to explain the same object, it is expected to always generate the same explanation. If the explanation varied inconsistently, it would be confusing for a human trying to understand it. Furthermore, if an explanation method does not always return the same explanation for the same object, the method is not accurate due to its random nature.
- Separability—Nonidentical objects cannot have identical explanations. This follows the same logic as the previous axiom. Two different objects have different feature values and, thus, should have different feature importance values. Even if a model predicts the same outcome for different instances, the explanation should be different due to the different feature values that generated the prediction. However, it is worth noting that this axiom only holds if the model does not have more degrees of freedom than needed to represent the prediction function [127].
- Stability—Similar objects must have similar explanations. This axiom was inspired by the concept of algorithmic stability: a prediction algorithm is said to be stable if slight perturbations in the input data only result in small changes in the predictions [153]. Similarly, Honegger defines an explanation method as stable if it returns similar explanations for slightly different (similar) objects, implying a directly proportional relationship between the similarity among objects and the similarity among the respective explanations. Notwithstanding, the implementation of this axiom in Honegger’s research lacks some useful information, namely the distance metric used in the pairwise distance matrices for the regression case. (A minimal check of the identity and stability axioms is sketched after this list.)
- Completeness—This is related to the audience verifying the validity of the explanation, i.e., the coverage of the explanation in terms of the number of instances it comprises.
- Correctness—The explanation should generate trust. In other words, it should be correct. This property is related to the label coherence of the instances covered by the explanation, i.e., the instances covered by a correct explanation should have the same label.
- Compactness—The explanation should be succinct, which can be verified by the number of conditions in the decision rule and the feature dimensionality of a neighbor-based explanation. This proxy is related to the “number of cognitive chunks” qualitative indicator [152] and, therefore, to the selectivity of the explanation (Section 4.5.3).
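The identity and stability axioms can be checked mechanically for a given explanation method. The sketch below is a stand-in, not the procedure used in the cited works: it relies on a fixed-seed local linear explanation (so the identity check is meaningful) and compares explanation distances for near and far instances.

```python
# Axiom-style checks: identity (same object -> same explanation) and an
# informal stability comparison between a near and a far instance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

def explain(x, seed=0, width=0.5, n=300):
    """Deterministic (fixed-seed) local linear explanation around instance x."""
    rng = np.random.default_rng(seed)
    samples = x + rng.normal(scale=width, size=(n, x.size))
    w = np.exp(-np.sum((samples - x) ** 2, axis=1) / (2 * width ** 2))
    return Ridge(alpha=1.0).fit(samples, model.predict(samples), sample_weight=w).coef_

x = X[0]
# Identity: identical objects must receive identical explanations.
print("identity holds:", np.allclose(explain(x), explain(x)))

# Stability: similar objects should receive more similar explanations than dissimilar ones.
e_base = explain(x)
e_near = explain(x + 0.01 * np.abs(x))
e_far = explain(X[1])
dist = lambda a, b: np.linalg.norm(a - b)
print("near-instance explanation distance:", round(dist(e_base, e_near), 3))
print("far-instance explanation distance:", round(dist(e_base, e_far), 3))
```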
6. Final Remarks
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
AI | Artificial Intelligence |
DNN | Deep Neural Network |
EM | Explanation Method |
HCI | Human Computer Interaction |
ML | Machine Learning |
XAI | Explainable Artificial Intelligence |
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
- International Data Corporation. Worldwide Spending on Cognitive and Artificial Intelligence Systems Forecast to Reach $77.6 Billion in 2022, According to New IDC Spending Guide. Available online: https://www.idc.com/getdoc.jsp?containerId=prUS44291818 (accessed on 22 January 2019).
- Tractica. Artificial Intelligence Software Market to Reach $105.8 Billion in Annual Worldwide Revenue by 2025. Available online: https://www.tractica.com/newsroom/press-releases/artificial-intelligence-software-market-to-reach-105-8-billion-in-annual-worldwide-revenue-by-2025/ (accessed on 22 January 2019).
- Gartner. Gartner Top 10 Strategic Technology Trends for 2019. Available online: https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2019/ (accessed on 22 January 2019).
- Du, M.; Liu, N.; Hu, X. Techniques for Interpretable Machine Learning. arXiv 2018, arXiv:1808.00033. [Google Scholar]
- Montavon, G.; Lapuschkin, S.; Binder, A.; Samek, W.; Müller, K.R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 2017, 65, 211–222. [Google Scholar] [CrossRef]
- Golovin, D.; Solnik, B.; Moitra, S.; Kochanski, G.; Karro, J.; Sculley, D. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1487–1495. [Google Scholar]
- Rudin, C. Please Stop Explaining Black Box Models for High Stakes Decisions. arXiv 2018, arXiv:1811.10154. [Google Scholar]
- Van Lent, M.; Fisher, W.; Mancuso, M. An explainable artificial intelligence system for small-unit tactical behavior. In Proceedings of the National Conference on Artificial Intelligence, San Jose, CA, USA, 25–29 July 2004; AAAI Press: Menlo Park, CA, USA; MIT Press: Cambridge, MA, USA, 2004; pp. 900–907. [Google Scholar]
- Swartout, W.R. Xplain: A System for Creating and Explaining Expert Consulting Programs; Technical Report; University of Southern California, Information Sciences Institute: Marina del Rey, CA, USA, 1983. [Google Scholar]
- Van Melle, W.; Shortliffe, E.H.; Buchanan, B.G. EMYCIN: A knowledge engineer’s tool for constructing rule-based expert systems. In Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project; Addison-Wesley Reading: Boston, MA, USA, 1984; pp. 302–313. [Google Scholar]
- Moore, J.D.; Swartout, W.R. Explanation in Expert Systems: A Survey; Technical Report; University of Southern California, Information Sciences Institute: Marina del Rey, CA, USA, 1988. [Google Scholar]
- Andrews, R.; Diederich, J.; Tickle, A.B. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl. Based Syst. 1995, 8, 373–389. [Google Scholar] [CrossRef]
- Cramer, H.; Evers, V.; Ramlal, S.; Van Someren, M.; Rutledge, L.; Stash, N.; Aroyo, L.; Wielinga, B. The effects of transparency on trust in and acceptance of a content-based art recommender. User Model. User Adapt. Interact. 2008, 18, 455. [Google Scholar] [CrossRef]
- Herlocker, J.L.; Konstan, J.A.; Riedl, J. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, Philadelphia, PA, USA, 2–6 December 2000; pp. 241–250. [Google Scholar]
- Abdul, A.; Vermeulen, J.; Wang, D.; Lim, B.Y.; Kankanhalli, M. Trends and trajectories for explainable, accountable and intelligible systems: An hci research agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; p. 582. [Google Scholar]
- Gunning, D. Explainable Artificial Intelligence (XAI); Defense Advanced Research Projects Agency: Arlington, VA, USA, 2017; Volume 2.
- Gunning, D. Explainable Artificial Intelligence (XAI). Available online: https://www.darpa.mil/program/explainable-artificial-intelligence (accessed on 22 January 2019).
- Committee on Technology National Science and Technology Council and Penny Hill Press. Preparing for the Future of Artificial Intelligence; CreateSpace Independent Publishing Platform: Scotts Valley, CA, USA, 2016. [Google Scholar]
- ACM US Public Council. Statement on Algorithmic Transparency and Accountability. 2017. Available online: https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf (accessed on 22 January 2019).
- IPN SIG AI. Dutch Artificial Intelligence Manifesto. 2018. Available online: http://ii.tudelft.nl/bnvki/wp-content/uploads/2018/09/Dutch-AI-Manifesto.pdf (accessed on 22 January 2019).
- Villani, C. AI for Humanity—French National Strategy for Artificial Intelligence. 2018. Available online: https://www.aiforhumanity.fr/en/ (accessed on 22 January 2019).
- Royal Society. Machine Learning: The Power and Promise of Computers that Learn by Example. 2017. Available online: https://royalsociety.org/topics-policy/projects/machine-learning/ (accessed on 3 May 2019).
- Portuguese National Initiative on Digital Skills. AI Portugal 2030. 2019. Available online: https://www.incode2030.gov.pt/sites/default/files/draft_ai_portugal_2030v_18mar2019.pdf (accessed on 3 May 2019).
- European Commission. Artificial Intelligence for Europe. 2018. Available online: https://ec.europa.eu/digital-single-market/en/news/communication-artificial-intelligence-europe (accessed on 3 May 2019).
- European Commission. Algorithmic Awareness-Building. 2018. Available online: https://ec.europa.eu/digital-single-market/en/algorithmic-awareness-building (accessed on 3 May 2019).
- Rao, A.S. Responsible AI & National AI Strategies. 2018. Available online: https://ec.europa.eu/growth/tools-databases/dem/monitor/sites/default/files/4%20International%20initiatives%20v3_0.pdf (accessed on 22 January 2019).
- High-Level Expert Group on Artificial Intelligence (AI HLEG). Ethics Guidelines for Trustworthy Artificial Intelligence. 2019. Available online: https://ec.europa.eu/futurium/en/ai-alliance-consultation/guidelines (accessed on 3 May 2019).
- Google. Responsible AI Practices—Interpretability. Available online: https://ai.google/education/responsible-ai-practices?category=interpretability (accessed on 18 January 2019).
- H2O.ai. H2O Driverless AI. Available online: https://www.h2o.ai/products/h2o-driverless-ai/ (accessed on 18 January 2019).
- DataRobot. Model Interpretability. Available online: https://www.datarobot.com/wiki/interpretability/ (accessed on 18 January 2019).
- IBM. Trust and Transparency in AI. Available online: https://www.ibm.com/watson/trust-transparency (accessed on 18 January 2019).
- Kyndi. Kyndi AI Platform. Available online: https://kyndi.com/products/ (accessed on 18 January 2019).
- Flint, A.; Nourian, A.; Koister, J. xAI Toolkit: Practical, Explainable Machine Learning. Available online: https://www.fico.com/en/latest-thinking/white-paper/xai-toolkit-practical-explainable-machine-learning (accessed on 18 January 2019).
- FICO. FICO Makes Artificial Intelligence Explainable. 2018. Available online: https://www.fico.com/en/newsroom/fico-makes-artificial-intelligence-explainable-with-latest-release-of-its-analytics-workbench (accessed on 18 January 2019).
- Fahner, G. Developing Transparent Credit Risk Scorecards More Effectively: An Explainable Artificial Intelligence Approach. Data Anal. 2018, 2018, 17. [Google Scholar]
- FICO. FICO Score Research: Explainable AI for Credit Scoring. 2019. Available online: https://www.fico.com/blogs/analytics-optimization/fico-score-research-explainable-ai-and-machine-learning-for-credit-scoring/ (accessed on 5 February 2019).
- Kahng, M.; Andrews, P.Y.; Kalro, A.; Chau, D.H.P. ActiVis: Visual exploration of industry-scale deep neural network models. IEEE Trans. Vis. Comput. Gr. 2018, 24, 88–97. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Wang, Y.; Molino, P.; Li, L.; Ebert, D.S. Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. IEEE Trans. Vis. Comput. Gr. 2019, 25, 364–373. [Google Scholar] [CrossRef] [PubMed]
- Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608. [Google Scholar]
- FAT/ML. Fairness, Accountability, and Transparency in Machine Learning. Available online: http://www.fatml.org/ (accessed on 22 January 2019).
- Kim, B.; Malioutov, D.M.; Varshney, K.R. Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016). arXiv 2016, arXiv:1607.02531. [Google Scholar]
- Kim, B.; Malioutov, D.M.; Varshney, K.R.; Weller, A. Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017). arXiv 2017, arXiv:1708.02666. [Google Scholar]
- Kim, B.; Varshney, K.R.; Weller, A. Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018). arXiv 2018, arXiv:1807.01308. [Google Scholar]
- Wilson, A.G.; Kim, B.; Herlands, W. Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems. arXiv 2016, arXiv:1611.09139. [Google Scholar]
- Caruana, R.; Herlands, W.; Simard, P.; Wilson, A.G.; Yosinski, J. Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning. arXiv 2017, arXiv:1711.09889. [Google Scholar]
- Pereira-Fariña, M.; Reed, C. Proceedings of the 1st Workshop on Explainable Computational Intelligence (XCI 2017); Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2017. [Google Scholar]
- IJCNN. IJCNN 2017 Explainability of Learning Machines. Available online: http://gesture.chalearn.org/ijcnn17_explainability_of_learning_machines (accessed on 22 January 2019).
- IJCAI. IJCAI 2017—Workshop on Explainable Artificial Intelligence (XAI). Available online: http://home.earthlink.net/~dwaha/research/meetings/ijcai17-xai/ (accessed on 12 July 2019).
- IJCAI. IJCAI 2018—Workshop on Explainable Artificial Intelligence (XAI). Available online: http://home.earthlink.net/~dwaha/research/meetings/faim18-xai/ (accessed on 12 July 2019).
- Stoyanov, D.; Taylor, Z.; Kia, S.M.; Oguz, I.; Reyes, M.; Martel, A.; Maier-Hein, L.; Marquand, A.F.; Duchesnay, E.; Löfstedt, T.; et al. Understanding and Interpreting Machine Learning in Medical Image Computing Applications. In First International Workshops, MLCN 2018, DLF 2018, and iMIMIC 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16–20, 2018; Springer: Berlin, Germany, 2018; Volume 11038. [Google Scholar]
- IPMU. IPMU 2018—Advances on Explainable Artificial Intelligence. Available online: http://ipmu2018.uca.es/submission/cfspecial-sessions/special-sessions/#explainable (accessed on 12 July 2019).
- Holzinger, A.; Kieseberg, P.; Tjoa, A.M.; Weippl, E. (Eds.) Machine Learning and Knowledge Extraction: Second IFIP TC 5, TC 8/WG 8.4, 8.9, TC 12/WG 12.9 International Cross-Domain Conference, CD-MAKE 2018, Hamburg, Germany, August 27–30, 2018, Proceedings; Springer: Berlin, Germany, 2018; Volume 11015. [Google Scholar]
- CD-MAKE. CD-MAKE 2019 Workshop on explainable Artificial Intelligence. Available online: https://cd-make.net/special-sessions/make-explainable-ai/ (accessed on 12 July 2019).
- Lim, B.; Smith, A.; Stumpf, S. ExSS 2018: Workshop on Explainable Smart Systems. In CEUR Workshop Proceedings; City, University of London Institutional Repository: London, UK, 2018; Volume 2068, Available online: http://openaccess.city.ac.uk/20037/ (accessed on 12 July 2019).
- Lim, B.; Sarkar, A.; Smith-Renner, A.; Stumpf, S. ExSS: Explainable smart systems 2019. In Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion, Marina del Ray, CA, USA, 16–20 March 2019; pp. 125–126. [Google Scholar]
- ICAPS. ICAPS 2018—Workshop on Explainable AI Planning (XAIP). Available online: http://icaps18.icaps-conference.org/xaip/ (accessed on 12 July 2019).
- ICAPS. ICAPS 2019—Workshop on Explainable AI Planning (XAIP). Available online: https://kcl-planning.github.io/XAIP-Workshops/ICAPS_2019 (accessed on 12 July 2019).
- Zhang, Q.; Fan, L.; Zhou, B. Network Interpretability for Deep Learning. Available online: http://networkinterpretability.org/ (accessed on 22 January 2019).
- CVPR. CVPR 19—Workshop on Explainable AI. Available online: https://explainai.net/ (accessed on 12 July 2019).
- FICO. Explainable Machine Learning Challenge. 2018. Available online: https://community.fico.com/s/explainable-machine-learning-challenge (accessed on 18 January 2019).
- Institute for Ethical AI & Machine Learning. The Responsible Machine Learning Principles. 2019. Available online: https://ethical.institute/principles.html#commitment-3 (accessed on 5 February 2019).
- Lipton, Z.C. The mythos of model interpretability. arXiv 2016, arXiv:1606.03490. [Google Scholar] [CrossRef]
- Silva, W.; Fernandes, K.; Cardoso, M.J.; Cardoso, J.S. Towards Complementary Explanations Using Deep Neural Networks. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications; Springer: Berlin, Germany, 2018; pp. 133–140. [Google Scholar]
- Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning. arXiv 2018, arXiv:1806.00069. [Google Scholar]
- Doran, D.; Schulz, S.; Besold, T.R. What does explainable AI really mean? A new conceptualization of perspectives. arXiv 2017, arXiv:1710.00794. [Google Scholar]
- UK Government House of Lords. AI in the UK: Ready, Willing and Able? 2017. Available online: https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/10007.htm (accessed on 18 January 2019).
- Kirsch, A. Explain to whom? Putting the user in the center of explainable AI. In Proceedings of the First International Workshop on Comprehensibility and Explanation in AI and ML 2017 Co-Located with 16th International Conference of the Italian Association for Artificial Intelligence (AI* IA 2017), Bari, Italy, 16–17 November 2017. [Google Scholar]
- Molnar, C. Interpretable Machine Learning. 2019. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 22 January 2019).
- Temizer, S.; Kochenderfer, M.; Kaelbling, L.; Lozano-Pérez, T.; Kuchar, J. Collision avoidance for unmanned aircraft using Markov decision processes. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, Toronto, ON, Canada, 2–5 August 2010; p. 8040. [Google Scholar]
- Wexler, R. When a computer program keeps you in jail: How computers are harming criminal justice. New York Times, 13 June 2017. [Google Scholar]
- McGough, M. How Bad Is Sacramento’s Air, Exactly? Google Results Appear at Odds with Reality, Some Say. 2018. Available online: https://www.sacbee.com/news/state/california/fires/article216227775.html (accessed on 18 January 2019).
- Varshney, K.R.; Alemzadeh, H. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data 2017, 5, 246–255. [Google Scholar] [CrossRef]
- Donnelly, C.; Embrechts, P. The devil is in the tails: Actuarial mathematics and the subprime mortgage crisis. ASTIN Bull. J. IAA 2010, 40, 1–33. [Google Scholar] [CrossRef]
- Angwin, J.; Larson, J.; Mattu, S.; Kirchner, L. Machine Bias. 2016. Available online: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (accessed on 18 January 2019).
- Tan, S.; Caruana, R.; Hooker, G.; Lou, Y. Detecting bias in black-box models using transparent model distillation. arXiv 2017, arXiv:1710.06169. [Google Scholar]
- Doshi-Velez, F.; Kortz, M.; Budish, R.; Bavitz, C.; Gershman, S.; O’Brien, D.; Schieber, S.; Waldo, J.; Weinberger, D.; Wood, A. Accountability of AI under the law: The role of explanation. arXiv 2017, arXiv:1711.01134. [Google Scholar] [CrossRef]
- Honegger, M. Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions. arXiv 2018, arXiv:1808.05054. [Google Scholar]
- O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Broadway Books: Portland, OR, USA, 2017. [Google Scholar]
- Keil, F.; Rozenblit, L.; Mills, C. What lies beneath? Understanding the limits of understanding. Thinking and Seeing: Visual Metacognition in Adults and Children; MIT Press: Cambridge, MA, USA, 2004; pp. 227–249. [Google Scholar]
- Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery; Wiley: Hoboken, NJ, USA, 2019; p. e1312. [Google Scholar]
- Mueller, H.; Holzinger, A. Kandinsky Patterns. arXiv 2019, arXiv:1906.00657. [Google Scholar]
- European Commission. General Data Protection Regulation. 2016. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679 (accessed on 18 January 2019).
- Weller, A. Challenges for transparency. arXiv 2017, arXiv:1708.01870. [Google Scholar]
- Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923. [Google Scholar]
- Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841. [Google Scholar]
- Goodman, B.; Flaxman, S. EU regulations on algorithmic decision-making and a “right to explanation”. arXiv 2016, arXiv:1606.08813. [Google Scholar]
- Wachter, S.; Mittelstadt, B.; Floridi, L. Why a right to explanation of automated decision-making does not exist in the general data protection regulation. Int. Data Priv. Law 2017, 7, 76–99. [Google Scholar] [CrossRef]
- Hardt, M.; Price, E.; Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; pp. 3315–3323. [Google Scholar]
- Rüping, S. Learning Interpretable Models. Ph.D. Thesis, University of Dortmund, Dortmund, Germany, 2006. [Google Scholar]
- Freitas, A.A. Comprehensible classification models: a position paper. ACM SIGKDD Explor. Newslett. 2014, 15, 1–10. [Google Scholar] [CrossRef]
- Case, N. How To Become A Centaur. J. Design Sci. 2018. [Google Scholar] [CrossRef]
- Varshney, K.R.; Khanduri, P.; Sharma, P.; Zhang, S.; Varshney, P.K. Why Interpretability in Machine Learning? An Answer Using Distributed Detection and Data Fusion Theory. arXiv 2018, arXiv:1806.09710. [Google Scholar]
- Miller, T. Explanation in Artificial Intelligence: Insights from the social sciences. Artif. Intell. 2018, 267, 1–38. [Google Scholar] [CrossRef]
- Kim, B.; Khanna, R.; Koyejo, O.O. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; pp. 2280–2288. [Google Scholar]
- Heider, F.; Simmel, M. An experimental study of apparent behavior. Am. J. Psychol. 1944, 57, 243–259. [Google Scholar] [CrossRef]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
- Kim, B.; Doshi-Velez, F. Introduction to Interpretable Machine Learning. In Proceedings of the CVPR 2018 Tutorial on Interpretable Machine Learning for Computer Vision, Salt Lake City, UT, USA, 18 June 2018. [Google Scholar]
- Tukey, J.W. Exploratory Data Analysis; Pearson: London, UK, 1977; Volume 2. [Google Scholar]
- Jolliffe, I. Principal component analysis. In International Encyclopedia of Statistical Science; Springer: Berlin, Germany, 2011; pp. 1094–1096. [Google Scholar]
- Maaten, L.v.d.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108. [Google Scholar] [CrossRef]
- Google People + AI Research (PAIR). Facets—Visualization for ML Datasets. Available online: https://pair-code.github.io/facets/ (accessed on 12 July 2019).
- Cowan, N. The magical mystery four: How is working memory capacity limited, and why? Curr. Dir. Psychol. Sci. 2010, 19, 51–57. [Google Scholar] [CrossRef]
- Lou, Y.; Caruana, R.; Gehrke, J. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 150–158. [Google Scholar]
- Kim, T.W. Explainable Artificial Intelligence (XAI), the goodness criteria and the grasp-ability test. arXiv 2018, arXiv:1810.09598. [Google Scholar]
- Breiman, L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Stat. Sci. 2001, 16, 199–231. [Google Scholar] [CrossRef]
- Robnik-Šikonja, M.; Bohanec, M. Perturbation-Based Explanations of Prediction Models. In Human and Machine Learning; Springer: Berlin, Germany, 2018; pp. 159–175. [Google Scholar]
- Lipton, P. Contrastive explanation. R. Inst. Philos. Suppl. 1990, 27, 247–266. [Google Scholar] [CrossRef]
- Kahneman, D.; Tversky, A. The Simulation Heuristic; Technical Report; Department of Psychology, Stanford University: Stanford, CA, USA, 1981. [Google Scholar]
- Nickerson, R.S. Confirmation bias: A ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 1998, 2, 175. [Google Scholar] [CrossRef]
- Lakkaraju, H.; Bach, S.H.; Leskovec, J. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1675–1684. [Google Scholar]
- Rudziński, F. A multi-objective genetic optimization of interpretability-oriented fuzzy rule-based classifiers. Appl. Soft Comput. 2016, 38, 118–133. [Google Scholar] [CrossRef]
- Angelino, E.; Larus-Stone, N.; Alabi, D.; Seltzer, M.; Rudin, C. Learning certifiably optimal rule lists. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 35–44. [Google Scholar]
- Dash, S.; Günlük, O.; Wei, D. Boolean Decision Rules via Column Generation. arXiv 2018, arXiv:1805.09901. [Google Scholar]
- Yang, H.; Rudin, C.; Seltzer, M. Scalable Bayesian rule lists. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 3921–3930. [Google Scholar]
- Rudin, C.; Ustun, B. Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice. Interfaces 2018, 48, 449–466. [Google Scholar] [CrossRef]
- Wang, T.; Rudin, C.; Doshi-Velez, F.; Liu, Y.; Klampfl, E.; MacNeille, P. A bayesian framework for learning rule sets for interpretable classification. J. Mach. Learn. Res. 2017, 18, 2357–2393. [Google Scholar]
- Kim, B.; Rudin, C.; Shah, J.A. The bayesian case model: A generative approach for case-based reasoning and prototype classification. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014; pp. 1952–1960. [Google Scholar]
- Ross, A.; Lage, I.; Doshi-Velez, F. The neural lasso: Local linear sparsity for interpretable explanations. In Proceedings of the Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Lage, I.; Ross, A.S.; Kim, B.; Gershman, S.J.; Doshi-Velez, F. Human-in-the-Loop Interpretability Prior. arXiv 2018, arXiv:1805.11571. [Google Scholar]
- Lee, M.; He, X.; Yih, W.t.; Gao, J.; Deng, L.; Smolensky, P. Reasoning in vector space: An exploratory study of question answering. arXiv 2015, arXiv:1511.06426. [Google Scholar]
- Palangi, H.; Smolensky, P.; He, X.; Deng, L. Question-answering with grammatically-interpretable representations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Kindermans, P.J.; Schütt, K.T.; Alber, M.; Müller, K.R.; Erhan, D.; Kim, B.; Dähne, S. Learning how to explain neural networks: PatternNet and PatternAttribution. arXiv 2017, arXiv:1705.05598. [Google Scholar]
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv 2014, arXiv:1412.6806. [Google Scholar]
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. arXiv 2017, arXiv:1703.01365. [Google Scholar]
- Smilkov, D.; Thorat, N.; Kim, B.; Viégas, F.; Wattenberg, M. Smoothgrad: removing noise by adding noise. arXiv 2017, arXiv:1706.03825. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2673–2682. [Google Scholar]
- Polino, A.; Pascanu, R.; Alistarh, D. Model compression via distillation and quantization. arXiv 2018, arXiv:1802.05668. [Google Scholar]
- Wu, M.; Hughes, M.C.; Parbhoo, S.; Zazzi, M.; Roth, V.; Doshi-Velez, F. Beyond sparsity: Tree regularization of deep models for interpretability. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Xu, K.; Park, D.H.; Yi, C.; Sutton, C. Interpreting Deep Classifier by Visual Distillation of Dark Knowledge. arXiv 2018, arXiv:1803.04042. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Murdoch, W.J.; Szlam, A. Automatic rule extraction from long short term memory networks. arXiv 2017, arXiv:1702.02540. [Google Scholar]
- Frosst, N.; Hinton, G. Distilling a neural network into a soft decision tree. arXiv 2017, arXiv:1711.09784. [Google Scholar]
- Bastani, O.; Kim, C.; Bastani, H. Interpreting blackbox models via model extraction. arXiv 2017, arXiv:1705.08504. [Google Scholar]
- Pope, P.E.; Kolouri, S.; Rostami, M.; Martin, C.E.; Hoffmann, H. Explainability Methods for Graph Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Wagner, J.; Kohler, J.M.; Gindele, T.; Hetzel, L.; Wiedemer, J.T.; Behnke, S. Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Zhang, Q.; Yang, Y.; Ma, H.; Wu, Y.N. Interpreting CNNs via Decision Trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Kanehira, A.; Harada, T. Learning to Explain With Complemental Examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Goldstein, A.; Kapelner, A.; Bleich, J.; Pitkin, E. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. J. Comput. Gr. Stat. 2015, 24, 44–65. [Google Scholar] [CrossRef]
- Apley, D.W. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. arXiv 2016, arXiv:1612.08468. [Google Scholar]
- Friedman, J.H.; Popescu, B.E. Predictive learning via rule ensembles. Ann. Appl. Stat. 2008, 2, 916–954. [Google Scholar] [CrossRef]
- Fisher, A.; Rudin, C.; Dominici, F. Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the “Rashomon” Perspective. arXiv 2018, arXiv:1801.01489. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; pp. 4765–4774. [Google Scholar]
- Staniak, M.; Biecek, P. Explanations of model predictions with live and breakDown packages. arXiv 2018, arXiv:1804.01955. [Google Scholar] [CrossRef]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-Precision Model-Agnostic Explanations. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Koh, P.W.; Liang, P. Understanding black-box predictions via influence functions. arXiv 2017, arXiv:1703.04730. [Google Scholar]
- Bibal, A.; Frénay, B. Interpretability of machine learning models and representations: An introduction. In Proceedings of the 24th European Symposium on Artificial Neural Networks ESANN, Bruges, Belgium, 27–29 April 2016; pp. 77–82. [Google Scholar]
- Doshi-Velez, F.; Kim, B. Considerations for Evaluation and Generalization in Interpretable Machine Learning. In Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Berlin, Germany, 2018; pp. 3–17. [Google Scholar]
- Bonnans, J.F.; Shapiro, A. Perturbation Analysis of Optimization Problems; Springer Science & Business Media: Berlin, Germany, 2013. [Google Scholar]
Name | Year |
---|---|
Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) (NIPS, ICML, DTL, KDD) [42] | 2014–2018 |
ICML Workshop on Human Interpretability in Machine Learning (WHI) [43,44,45] | 2016–2018 |
NIPS Workshop on Interpretable Machine Learning for Complex Systems [46] | 2016 |
NIPS Symposium on Interpretable Machine Learning [47] | 2017 |
XCI: Explainable Computational Intelligence Workshop [48] | 2017 |
IJCNN Explainability of Learning Machines [49] | 2017 |
IJCAI Workshop on Explainable Artificial Intelligence (XAI) [50,51] | 2017–2018 |
“Understanding and Interpreting Machine Learning in Medical Image Computing Applications” (MLCN, DLF, and iMIMIC) workshops [52] | 2018 |
IPMU 2018—Advances on Explainable Artificial Intelligence [53] | 2018 |
CD-MAKE Workshop on explainable Artificial Intelligence [54,55] | 2018–2019 |
Workshop on Explainable Smart Systems (ExSS) [56,57] | 2018–2019 |
ICAPS—Workshop on Explainable AI Planning (XAIP) [58,59] | 2018–2019 |
AAAI-19 Workshop on Network Interpretability for Deep Learning [60] | 2019 |
CVPR—Workshop on Explainable AI [61] | 2019 |
Taxonomy | Intrinsic vs. Post Hoc | Model-Specific vs. Model-Agnostic |
---|---|---|
Pre-model | N.A. | N.A. |
In-model | Intrinsic | Model-specific |
Post-model | Post hoc | Model-agnostic |
Algorithm | Linear | Monotone | Interaction | Task |
---|---|---|---|---|
Linear regression | Yes | Yes | No | Regression |
Logistic regression | No | Yes | No | Classification |
Decision Trees | No | Some | Yes | Classification, Regression |
RuleFit | Yes | No | Yes | Classification, Regression |
Naive Bayes | No | Yes | No | Classification |
Explanation Method | Scope | Result |
---|---|---|
Partial Dependence Plot [142] | Global | Feature summary |
Individual Conditional Expectation [143] | Global/Local | Feature summary |
Accumulated Local Effects Plot [144] | Global | Feature summary |
Feature Interaction [145] | Global | Feature summary |
Feature Importance [146] | Global/Local | Feature summary |
Local Surrogate Model [98] | Local | Surrogate interpretable model |
Shapley Values [147] | Local | Feature summary |
BreakDown [148] | Local | Feature summary |
Anchors [149] | Local | Feature summary |
Counterfactual Explanations [87] | Local | (new) Data point |
Prototypes and Criticisms [96] | Global | (existent) Data point |
Influence Functions [150] | Global/Local | (existent) Data point |
Section | Content |
---|---|
Interpretability importance | Satisfy human curiosity |
 | Scientific findings |
 | Find meaning |
 | Regulation requirements |
 | Social acceptance and trust |
 | Safety |
 | Acquire new knowledge |
Taxonomy of interpretability | Pre-model vs. In-model vs. Post-model |
 | Intrinsic vs. Post-hoc |
 | Model-specific vs. Model-agnostic |
 | Results of explanation methods |
Scope of interpretability | Algorithm transparency |
 | Global model interpretability (holistic vs. modular) |
 | Local model interpretability (single prediction vs. group of predictions) |
Properties of explanation methods | Expressive power; Translucency; Portability; Algorithmic complexity |
Properties of explanations | Accuracy; Fidelity; Consistency; Stability; Comprehensibility; Certainty; Importance; Novelty; Representativeness |
Human-friendly explanations | Contrastiveness; Selectivity; Social; Focus on the abnormal; Truthful; Consistent with prior beliefs; General and probable |
Interpretability evaluation | Application-level |
 | Human-level |
 | Functional-level |
Interpretability goals | Accuracy |
 | Understandability |
 | Efficiency |
Property (Section 4.5.2) [109] | Sundararajan et al. [127] | Honegger [79] | Wilson et al. [65] |
---|---|---|---|
Accuracy | N.A. | N.A. | Correctness |
Fidelity | Sensitivity | Identity, Separability | Correctness |
Consistency | Implementation invariance | N.A. | Yes |
Stability | N.A. | Stability | N.A. |
Comprehensibility | N.A. | N.A. | Compactness |
Certainty | N.A. | N.A. | N.A. |
Importance | Sensitivity | N.A. | N.A. |
Novelty | N.A. | N.A. | N.A. |
Representativeness | N.A. | N.A. | Completeness |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics 2019, 8, 832. https://doi.org/10.3390/electronics8080832