Abstract

The global growth in popularity of the World Wide Web has been enabled in part by the availability of browser-based search tools, which in turn have led to an increased demand for indexing techniques and technologies. As the amount of globally accessible information grows, it is no longer cost-effective for quality indexes to be produced by professional indexers. The era of amateur indexers is thus upon us, and the information infrastructure needs to provide support for such indexing if searching the Net is to produce useful results.
In this paper, we propose the Concept Assigner, an automatic subject indexing system based on a variant of a Hopfield network [13]. In the application discussed herein, a collection of documents is used to automatically create a subset of a thesaurus termed a Concept Space [4]. A Concept Space is a semantic network consisting of concepts (noun phrases in the textual domain) and related concepts, and is computed based on co-occurrence relationships. The concepts serve as the nodes of the Hopfield network, and their weighted associations serve as its connection weights.
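As a minimal sketch of how co-occurrence relationships might be turned into weighted links, the following example builds a toy Concept Space from per-document concept sets. The abstract does not state the weighting function, so the asymmetric ratio used here (the fraction of documents containing concept `a` that also contain concept `b`) is an assumption, as are the function name `build_concept_space` and the sample data.

```python
from collections import Counter
from itertools import permutations


def build_concept_space(docs):
    """Build a toy Concept Space from per-document concept sets.

    docs: list of sets of concepts (noun phrases), one set per document.
    Returns a dict mapping an ordered pair (a, b) to a weight in (0, 1].

    ASSUMPTION: the weight is the fraction of documents containing `a`
    that also contain `b`. The source only says that weights are based
    on co-occurrence; this particular formula is illustrative.
    """
    doc_freq = Counter()   # number of documents containing each concept
    pair_freq = Counter()  # number of documents containing both a and b
    for concepts in docs:
        unique = set(concepts)
        doc_freq.update(unique)
        # Every ordered pair of distinct concepts in the same document.
        pair_freq.update(permutations(sorted(unique), 2))
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}


# Hypothetical three-document collection.
docs = [
    {"information retrieval", "indexing", "thesaurus"},
    {"information retrieval", "indexing"},
    {"neural network", "indexing"},
]
space = build_concept_space(docs)
```

Under this assumed weighting the links are directional: a concept that appears in many documents is only weakly tied to any single neighbor, while a rarer concept can be strongly tied to a common one.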
To automatically index an individual document, concepts extracted from the document become the input pattern to the network. Once the Hopfield net's parallel spreading activation process completes, the network outputs another set of concepts that are strongly related to the concepts of the input document. Since the initial Concept Space contains knowledge obtained from the entire collection, the system is able to find a set of global concepts drawn from the entire collection without restriction. These concepts are analogous to concept descriptors (i.e., keywords) of the document. The system can be further used as a computer-aided subject indexing tool in an interactive environment in which system-suggested concepts can be reviewed by a human indexer to generate the index. The system can thus assist an indexer by both speeding the indexing process and improving vocabulary consistency.
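The spreading activation step can be illustrated with a small sketch. It assumes a common Hopfield-style formulation: input concepts are clamped to activation 1, every other node repeatedly takes a sigmoid of the weighted sum of its neighbors' activations, and iteration stops once the activations change by less than a tolerance. The update rule, the parameters `theta`, `slope`, and `tol`, the function names, and the toy weights are all assumptions for illustration, not the variant defined in the paper.

```python
import math


def spread_activation(weights, input_concepts, theta=0.5, slope=0.1,
                      tol=1e-4, max_iter=50):
    """Run a Hopfield-style spreading activation over a weighted graph.

    weights: dict mapping an ordered pair (i, j) to the link weight i -> j.
    input_concepts: set of concepts extracted from the document.
    Returns a dict mapping each concept to its final activation in [0, 1].

    ASSUMPTION: sigmoid transfer function with threshold `theta` and
    steepness `slope`, and clamped input nodes; the source does not
    give these details.
    """
    nodes = sorted({c for pair in weights for c in pair} | set(input_concepts))
    act = {c: 1.0 if c in input_concepts else 0.0 for c in nodes}
    for _ in range(max_iter):
        new = {}
        for j in nodes:
            if j in input_concepts:
                new[j] = 1.0  # input pattern stays clamped
                continue
            net = sum(weights.get((i, j), 0.0) * act[i] for i in nodes)
            new[j] = 1.0 / (1.0 + math.exp(-(net - theta) / slope))
        delta = sum(abs(new[c] - act[c]) for c in nodes)
        act = new
        if delta < tol:  # network has settled
            break
    return act


def suggest_concepts(weights, input_concepts, top_k=3):
    """Return the top_k most activated concepts not already in the input."""
    inputs = set(input_concepts)
    act = spread_activation(weights, inputs)
    ranked = sorted((c for c in act if c not in inputs),
                    key=lambda c: -act[c])
    return ranked[:top_k]


# Hypothetical Concept Space fragment.
toy_weights = {
    ("indexing", "thesaurus"): 0.9,
    ("thesaurus", "indexing"): 0.8,
    ("thesaurus", "vocabulary control"): 0.7,
    ("indexing", "football"): 0.05,
}
activations = spread_activation(toy_weights, {"indexing"})
suggested = suggest_concepts(toy_weights, ["indexing"], top_k=2)
```

In this toy run, "vocabulary control" is never linked directly to the input concept but becomes active through "thesaurus", which illustrates how the output can include concepts drawn from the wider collection rather than only those present in the document. In an interactive setting, a ranked list like `suggested` would be what a human indexer reviews.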
A prototype of our automatic subject indexing system has been implemented as part of the Interspace, a semantic indexing and retrieval environment which supports data persistence and remote execution across the Internet.