Despite the widespread application of deep neural networks in finance, medical treatment, and autonomous driving, these networks face multiple security threats, such as maliciously constructed adversarial samples that can easily mislead deep neural network model classification, causing errors. Therefore, creating an interpretable model or designing an interpretation method is necessary to improve its security. This paper presents an interpretation scheme, named Convergent Interpretation for Deep Neural Networks (CIDNN), to obtain a provably convergent and consistent interpretation for deep neural networks. The main idea of CIDNN is to first convert the deep neural networks into a set of mathematically convergent Piecewise Linear Neural Networks (PLNN), then convert the PLNN into a set of equivalent linear classifiers. In this way, each linear classifier can be interpreted by its decision features. By analyzing the convergence of the local approximation interpretation scheme, we prove that this interpretable model can be sufficiently close to the deep neural network with certain conditions. Experiments show the convergence of CIDNN's interpretation, and the interpretation conforms with similar samples in the synthetic dataset. Besides, we demonstrate the semantical meaning of CIDNN in the Fashion-MNIST dataset.

High-precision linearized interpretation for fully connected neural network

Castiglione A.;
2021-01-01

Abstract

Despite the widespread application of deep neural networks in finance, medical treatment, and autonomous driving, these networks face multiple security threats, such as maliciously constructed adversarial samples that can easily mislead deep neural network model classification, causing errors. Therefore, creating an interpretable model or designing an interpretation method is necessary to improve its security. This paper presents an interpretation scheme, named Convergent Interpretation for Deep Neural Networks (CIDNN), to obtain a provably convergent and consistent interpretation for deep neural networks. The main idea of CIDNN is to first convert the deep neural networks into a set of mathematically convergent Piecewise Linear Neural Networks (PLNN), then convert the PLNN into a set of equivalent linear classifiers. In this way, each linear classifier can be interpreted by its decision features. By analyzing the convergence of the local approximation interpretation scheme, we prove that this interpretable model can be sufficiently close to the deep neural network with certain conditions. Experiments show the convergence of CIDNN's interpretation, and the interpretation conforms with similar samples in the synthetic dataset. Besides, we demonstrate the semantical meaning of CIDNN in the Fashion-MNIST dataset.
2021
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/4774525
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 9
social impact