The automatic generation of thesauri is an area of growing importance in computational science. Developed under the auspices of the federal NII Digital Library Initiative (DLI) at the Universities of Illinois and Arizona, Concept Space thesauri are based on a hybrid symbolic/numeric computation that determines relationships between concepts in a collection of source units. The resulting map between concepts is called a Concept Space and is useful for refining queries presented to the collection. Concept Spaces are used, for example, in interactive query sessions as part of the DLI testbed at the University of Illinois, Urbana-Champaign, and in algorithms for iterative search refinement that incorporate the computation of Concept Spaces.
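To make the query-refinement use concrete, here is a minimal sketch of how a Concept Space might be consulted during an interactive session. It assumes the Concept Space is available as a plain mapping from each concept to its related concepts with a similarity weight; the function name `suggest_refinements`, the `top_k` parameter, and the toy weights are all hypothetical and are not taken from the DLI testbed.

```python
from typing import Dict, List, Tuple

# Assumed representation: concept -> {related concept: similarity weight}.
ConceptSpace = Dict[str, Dict[str, float]]


def suggest_refinements(
    space: ConceptSpace, query_terms: List[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank concepts related to the query terms by summed similarity."""
    scores: Dict[str, float] = {}
    for term in query_terms:
        for related, weight in space.get(term, {}).items():
            if related in query_terms:
                continue  # do not suggest a term the user already entered
            scores[related] = scores.get(related, 0.0) + weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy Concept Space with made-up weights, for illustration only.
toy_space: ConceptSpace = {
    "thesaurus": {"indexing": 0.5, "query": 0.5},
    "query": {"retrieval": 0.5, "thesaurus": 0.25, "indexing": 0.25},
}

suggestions = suggest_refinements(toy_space, ["thesaurus", "query"])
print(suggestions)
```

A user who started with "thesaurus" and "query" would be offered "indexing" ahead of "retrieval", because "indexing" is related to both query terms. Summing weights across query terms is only one plausible ranking rule.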
In the conceptspace generation process called by the domain manager, a ConceptInCS object is created for each ConceptInDomain object that passes the domain threshold functions. Two threshold functions are used in this step: a sourceunit-level threshold function and a collection-level threshold function. The generation process uses a similarity function and updates only the new ConceptInCS objects. A detailed discussion of the algorithm can be found in our technical report. The threshold functions are needed to eliminate those ConceptInCS objects that are relatively unimportant in the conceptspace computation. We do this because computing every ConceptInCS object is computationally infeasible at this time, although this may change as hardware becomes more powerful. The cooccurrence list of each of these ConceptInCS objects is updated if the old one was saved, or recreated if none exists. Because cooccurrence list objects take up a large amount of space, there is an option to either save or discard them when the computation is done. After the cooccurrence list is computed, the similarity list is generated from it. This list represents the similarity matrix for each ConceptInCS object. Together, these matrices form the conceptspace.

(1) Chen, H. and Lynch, K. J., "Automatic Construction of Networks of Concepts Characterizing Document Databases", IEEE Transactions on Systems, Man and Cybernetics, 22(5), 885-902, Sept./Oct. 1992.
(2) Schatz, B. R. and Chen, H., "Building Large-Scale Digital Libraries", IEEE Computer, Special Issue on Building Large-Scale Digital Libraries, 29(5), 22-27, May 1996.
(3) Schatz, B. R., "Information Retrieval in Digital Libraries: Bringing Search to the Net", Science, 275(5298), 327-334, cover story and lead article, January 17, 1997.
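The generation steps described above (thresholding, cooccurrence counting, and similarity computation) can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from the technical report: source units are modeled as lists of concept strings, the two thresholds are simple count cut-offs with hypothetical names, and the similarity function is a basic asymmetric ratio (units containing both concepts divided by units containing the first), chosen only because it is easy to follow.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def build_concept_space(
    source_units: List[List[str]],
    min_unit_count: int = 1,        # hypothetical sourceunit-level threshold
    min_collection_units: int = 2,  # hypothetical collection-level threshold
) -> Dict[str, Dict[str, float]]:
    # Sourceunit-level threshold: within one unit, keep a concept only if
    # it occurs at least min_unit_count times in that unit.
    unit_concepts: List[Set[str]] = []
    for unit in source_units:
        counts = Counter(unit)
        unit_concepts.append(
            {c for c, n in counts.items() if n >= min_unit_count}
        )

    # Collection-level threshold: keep concepts found in enough units.
    unit_freq = Counter(c for concepts in unit_concepts for c in concepts)
    kept = {c for c, n in unit_freq.items() if n >= min_collection_units}

    # Cooccurrence lists: for each kept concept, count the units in which
    # it appears together with each other kept concept.
    cooc: Dict[str, Counter] = defaultdict(Counter)
    for concepts in unit_concepts:
        for a, b in combinations(sorted(concepts & kept), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    # Similarity lists, derived from the cooccurrence lists. The measure is
    # asymmetric: sim(a -> b) = units with both a and b / units with a.
    space: Dict[str, Dict[str, float]] = {}
    for a in sorted(kept):
        space[a] = {b: n / unit_freq[a] for b, n in sorted(cooc[a].items())}
    return space


# Toy collection of four source units, for illustration only.
units = [
    ["gene", "protein", "gene"],
    ["gene", "protein", "cell"],
    ["protein", "cell"],
    ["gene", "enzyme"],
]
space = build_concept_space(units)
for concept, similar in space.items():
    print(concept, similar)
```

In this toy run "enzyme" appears in only one unit, so it fails the collection-level threshold and never receives a similarity list. The sketch keeps all cooccurrence lists in memory and discards them on return; a system offering the save-or-discard option mentioned above would persist `cooc` instead.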