Given a finite ordered set of items and an unknown distinguished subset P of up to p positive elements, identify the items in P by asking the least number of queries of the type ‘‘does the subset Q intersect P?”, where Q is a subset of consecutive elements of {1, 2, . . . , n}. This problem arises, e.g., in computational biology, in a particular method for determining splice sites in genes. We consider time-efficient algorithms where queries are arranged in a fixed number s of stages: In each stage, queries are performed in parallel. In a recent bioinformatics paper, we proved optimality (subject to lower-order terms) with respect to the number of queries, of some strategies for the special cases p = 1 or s = 2. Exploiting new ideas, we are now able to provide improved lower bounds for any p >= 2 and s >= 3 and improved upper bounds for larger s. Most notably, our new bounds converge as s grows. Our new query scheme uses overlapping query intervals within a stage, which is effective for large enough s. This contrasts with our previous results for s <= 2 where optimal strategies were implemented by disjoint queries. The main open problem is whether overlaps help already in the case of small s >= 3. Anyway, the remaining gaps between the current upper and lower bounds for any fixed s >= 3 amount to small constant factors in the main term. The paper ends with a discussion of practical implications in the case that the positive elements are well separated.

Overlaps Help: Improved Bounds for Group Testing with Interval Queries

CICALESE, Ferdinando;
2007-01-01

Abstract

Given a finite ordered set of items and an unknown distinguished subset P of up to p positive elements, identify the items in P by asking the least number of queries of the type ‘‘does the subset Q intersect P?”, where Q is a subset of consecutive elements of {1, 2, . . . , n}. This problem arises, e.g., in computational biology, in a particular method for determining splice sites in genes. We consider time-efficient algorithms where queries are arranged in a fixed number s of stages: In each stage, queries are performed in parallel. In a recent bioinformatics paper, we proved optimality (subject to lower-order terms) with respect to the number of queries, of some strategies for the special cases p = 1 or s = 2. Exploiting new ideas, we are now able to provide improved lower bounds for any p >= 2 and s >= 3 and improved upper bounds for larger s. Most notably, our new bounds converge as s grows. Our new query scheme uses overlapping query intervals within a stage, which is effective for large enough s. This contrasts with our previous results for s <= 2 where optimal strategies were implemented by disjoint queries. The main open problem is whether overlaps help already in the case of small s >= 3. Anyway, the remaining gaps between the current upper and lower bounds for any fixed s >= 3 amount to small constant factors in the main term. The paper ends with a discussion of practical implications in the case that the positive elements are well separated.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/1869310
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 8
social impact