Making Decisions by Unlabeled Bits

Marano S.;
2020-01-01

Abstract

In a statistical decision problem, one observes independent samples and is asked to decide between two mutually exclusive states of nature. With knowledge of the data distributions under the two hypotheses, it is known that the asymptotically optimal error probabilities converge to zero at a rate given by the error exponent function, which quantifies the information for detection contained in the data. As a new element, suppose that the decision problem must be solved without data labels, i.e., with no knowledge of which observation comes from which distribution. What is the error exponent in this case? How much information for detection is contained in the samples, and how much is lost along with the labels? This problem, unlabeled detection, is of theoretical relevance per se, but it is also attracting growing practical interest: loss of labels often occurs naturally in big-data settings (e.g., user profiling in social networks) and may even be deliberate, for example to save bandwidth in inference over large sensor networks. For binary unlabeled observations, and focusing on the low-detectability regime, we derive simple closed-form expressions for the error exponent and related quantities (Chernoff-Stein exponent, Chernoff information), providing new insights. Practical algorithms are then discussed, showing that many decision algorithms proposed in the literature reduce, for binary data, to simple forms that highlight their properties and relative merits. A detector with close-to-optimum performance for a wide class of detection problems is proposed.
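
For context, the baseline quantities named in the abstract are the classical labeled-data exponents. The sketch below is a minimal illustration, assuming Bernoulli (binary) observations as in the paper's setting: it computes the Chernoff-Stein exponent (the Kullback-Leibler divergence governing the miss-probability decay under a false-alarm constraint) and the Chernoff information (governing the Bayesian total-error decay) for the labeled case. Function names and parameter values are illustrative choices; the unlabeled-case expressions derived in the paper are not reproduced here.

import numpy as np
from scipy.optimize import minimize_scalar

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Ber(p) || Ber(q)) in nats.

    By the Chernoff-Stein lemma, this is the best achievable exponent
    of the miss probability under a false-alarm constraint, in the
    classical setting with labeled samples.
    """
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def chernoff_information(p, q):
    """Chernoff information C(Ber(p), Ber(q)) in nats.

    C = -min_{0 <= s <= 1} log( sum_x P0(x)^s * P1(x)^(1-s) );
    it is the exponent of the total error probability in Bayesian
    binary hypothesis testing with labeled samples.
    """
    def neg_exponent(s):
        return np.log(p**s * q**(1 - s) + (1 - p)**s * (1 - q)**(1 - s))
    res = minimize_scalar(neg_exponent, bounds=(0.0, 1.0), method="bounded")
    return -res.fun

# Illustrative low-detectability regime: Ber(0.5) vs Ber(0.5 + eps).
# As eps shrinks, the distributions get closer and both exponents vanish.
for eps in (0.2, 0.05, 0.01):
    p, q = 0.5, 0.5 + eps
    print(f"eps={eps:5.2f}  D(P0||P1)={kl_bernoulli(p, q):.6f}  "
          f"C(P0,P1)={chernoff_information(p, q):.6f}")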
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4771000
Warning: the displayed data have not been validated by the university.

Citations
  • PubMed Central: ND
  • Scopus: 7
  • Web of Science (ISI): 7