Semantic Web Content Analysis: A Study in Proximity-Based Collaborative Clustering

Loia, Vincenzo; Pedrycz, W.; Senatore, Sabrina

doi:10.1109/TFUZZ.2006.889970

The semantic vision of the Web involves the processing of data by automated tools as well as by people, where associating meaning with content facilitates search, interoperability, and the composition of services. The Semantic Web forms a new scenario in which advanced methods and techniques are developed for the description, retrieval, and filtering of Web-based content. In light of the existing challenges and open issues concerning the current cyberspace, this study proposes an approach for binding the “semantic” facet of a typical web page (or, more specifically, a semantic web document) with its usual textual facet. Through unsupervised learning, we offer a new way of organizing web documents that emphasizes a direct separation between the syntactic and semantic facets of web information. We discuss a collaborative proximity-based fuzzy clustering and show how this type of clustering discovers the structure of web information by prudently relying on the structures present in the semantic and data spaces. The method focuses on reconciling the two separated facets of web information and combining the results into a comprehensive data organization. Information arranged in this manner can provide an integral description of web resources, and in this way the approach can become an essential technique for the next generation of Web search engines.
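To make the notion of "proximity" more concrete, the sketch below computes a proximity matrix induced by a fuzzy partition and compares the matrices obtained from two views of the same documents (a textual one and a semantic one). It assumes the induced-proximity definition used in proximity-based fuzzy clustering, prox[k][l] = sum over clusters i of min(u_ik, u_il), where each column of the partition matrix sums to 1. The function names, the squared-difference disagreement measure, and the membership values are hypothetical and for illustration only; this is not the paper's full collaborative algorithm, which also iteratively updates the partitions in both spaces.

```python
from typing import List

Matrix = List[List[float]]  # partition matrix: c rows (clusters) x N columns (documents)


def induced_proximity(U: Matrix) -> Matrix:
    """Hypothetical helper: proximity between documents k and l induced by a
    fuzzy partition U, assumed to be prox[k][l] = sum_i min(U[i][k], U[i][l])."""
    c = len(U)
    n = len(U[0])
    return [
        [sum(min(U[i][k], U[i][l]) for i in range(c)) for l in range(n)]
        for k in range(n)
    ]


def proximity_disagreement(P1: Matrix, P2: Matrix) -> float:
    """Hypothetical measure of how much two views disagree: the sum of squared
    differences between their induced proximity matrices."""
    return sum((a - b) ** 2 for r1, r2 in zip(P1, P2) for a, b in zip(r1, r2))


if __name__ == "__main__":
    # Illustrative memberships of 3 documents in 2 clusters, one partition per view.
    U_text = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
    U_sem = [[0.7, 0.3, 0.2], [0.3, 0.7, 0.8]]
    P_text = induced_proximity(U_text)
    P_sem = induced_proximity(U_sem)
    print("textual-view proximity:", P_text)
    print("semantic-view proximity:", P_sem)
    print("disagreement:", proximity_disagreement(P_text, P_sem))
```

Because each column of a partition matrix sums to 1, every document has proximity 1 with itself, and the matrix is symmetric. A collaborative scheme would aim to reduce the disagreement between the two views while still fitting the data in each space.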