Automatic generation of Web mining environments

Cibelli, Maurizio; Costagliola, Gennaro
1999-01-01

Abstract

The main problem in retrieving information from the World Wide Web is the enormous number of unstructured documents and resources, which makes it difficult to locate and track appropriate sources. This paper presents a Web Mining Environment (WME) that is capable of finding, extracting, and structuring information related to a particular domain from web documents, using general-purpose indices. The WME architecture includes a Web Engine Filter (WEF), which sorts and reduces the answer set returned by a web engine; a Data Source Pre-processor (DSP), which processes HTML layout cues in order to collect and qualify page segments; and a Heuristic-based Information Extraction System (HIES), which finally retrieves the required data. Furthermore, we present a Web Mining Environment Generator (WMEG) that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
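
The abstract names the three WME stages but does not describe their internals. As a rough illustration only, the Python sketch below chains a filter, a page segmenter, and a heuristic extractor in the same order. The keyword-count ranking, the use of block-level HTML tags as layout cues, the regex rules, and every name (`Hit`, `Segment`, `filter_hits`, `segment_page`, `extract`, `RULES`) are hypothetical assumptions, not the authors' WEF, DSP, or HIES.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Hit:
    """One entry of a web engine's answer set (hypothetical shape)."""
    url: str
    snippet: str


@dataclass
class Segment:
    """A piece of page text labelled with the layout tag that enclosed it."""
    tag: str
    text: str


def filter_hits(hits, keywords, top_k=10):
    """WEF-like stage (assumed scoring): count keyword occurrences in each
    snippet, drop hits that match nothing, and keep the top_k best-first."""
    def score(hit):
        text = hit.snippet.lower()
        return sum(text.count(k.lower()) for k in keywords)

    scored = [(score(h), h) for h in hits]
    kept = [pair for pair in scored if pair[0] > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep engine order
    return [h for _, h in kept[:top_k]]


# Block-level tags treated as layout cues (an assumption of this sketch).
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td", "th"}


class SegmentCollector(HTMLParser):
    """DSP-like stage (assumed): split a page into segments at block-level
    tags and label each segment with its innermost enclosing block tag."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._stack = []  # currently open block tags
        self._buf = []    # text collected since the last flush

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # text so far belongs to the enclosing block
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._stack and self._stack[-1] == tag:
            self.flush()
            self._stack.pop()

    def handle_data(self, data):
        if self._stack:  # ignore text outside any block tag
            self._buf.append(data)

    def flush(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text and self._stack:
            self.segments.append(Segment(self._stack[-1], text))
        self._buf = []


def segment_page(html):
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    collector.flush()  # emit text left in an unclosed block
    return collector.segments


# Hypothetical heuristic rules: field -> (trusted segment tags, pattern).
RULES = {
    "email": ({"p", "li", "td"}, re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    "year": ({"h2", "p", "li", "td"}, re.compile(r"\b(?:19|20)\d{2}\b")),
}


def extract(segments, rules):
    """HIES-like stage (assumed): for each field, return the first match found
    in a segment whose layout tag the rule trusts."""
    record = {}
    for field, (tags, pattern) in rules.items():
        for seg in segments:
            if seg.tag in tags:
                match = pattern.search(seg.text)
                if match:
                    record[field] = match.group(0)
                    break
    return record
```

For instance, on the page `<h2>Deadlines</h2><p>Submit by March 1999 to chair@wm.example.org</p>`, `segment_page` yields one `h2` and one `p` segment, and `extract(segments, RULES)` returns `{"email": "chair@wm.example.org", "year": "1999"}`. Restricting each rule to the tags it trusts is one simple way to let layout cues qualify segments before extraction; the actual heuristics in the paper may differ.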

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4865946

Citations
  • Scopus: 1