Bruce R. Schatz
CANIS -- Community Architectures for Network Information Systems
National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign, Champaign, IL 61820 USA
schatz@uiuc.edu
http://www.canis.uiuc.edu

Abstract

The Net of the 21st Century will radically transform interaction with knowledge. Users will navigate in the Interspace, across logical spaces of semantic indexes, rather than in the Internet, across physical networks of computer servers. Correlation across indexed collections is the most important feature of this infrastructure. Over ten years of research, the author has developed scalable technology for generating the necessary semantic indexes.

Construction of large-scale models of the Interspace is feasible now under controlled laboratory conditions. Community repositories for entire scientific disciplines have been constructed using supercomputer simulations on millions of documents. A model Interspace is a set of community repositories, interconnected by concept switching networks to support information analysis across subject domains.

CANIS has constructed several model testbeds with increasingly better infrastructure technology. We propose a PACI Interspace for the NSF flagship efforts of the HPDC community. The Interspace would provide concept switching for the users while the Grid would provide object switching for the sources.

Information Analysis in the Interspace

The Net of the 21st Century will radically transform the interaction with knowledge. Traditionally, online information has been dominated by data centers with large collections indexed by trained professionals. The rise of the Web and the information infrastructure of distributed personal computing have rapidly developed the technologies of collections for independent communities. In the future, online information will be dominated by small collections maintained and indexed by the community members themselves.

The information infrastructure must accordingly be radically different to support indexing of community collections and searching across such small collections. The base infrastructure will be knowledge networks rather than data networks. Users will consider themselves to be navigating in the Interspace, across logical spaces of semantic indexes, rather than in the Internet, across physical networks of computer servers [17].

Future knowledge networks will rely on scalable semantics, on indexing the small collections so that they can be effectively searched within the Net of a billion repositories. The most important feature of the infrastructure is the ability to correlate across the indexed collections. Just as the data networks of the Internet are connected via switching machines that switch packets, the knowledge networks of the Interspace will be connected via switching machines that switch concepts.

Building the Interspace requires generating semantic indexes for community repositories with interactive support adequate for amateur classifiers, then correlating these indexes across multiple sources with interactive support adequate for amateur navigators. Since there will be so many sources indexed by non-professional indexers, the infrastructure itself must provide substantial support for semantic indexing. Since the sources in the Net will be dominated by small community repositories, the typical interaction will be navigating through many sources, retrieving and correlating objects relevant to the particular session. The infrastructure itself must accordingly provide substantial support for information analysis.

The protocols of the Net have progressed towards more sophisticated information retrieval in digital libraries [17]. As Figure 1 below shows, the ARPANET concentrated on access, transparent fetch to provide distributed files. The Internet is moving towards providing organization, transparent search to provide distributed repositories. The core Internet protocols are still dominated by concerns of fetching, so that even the Web protocols for distributed hyperdocuments are still largely transparent fetching, although of multimedia documents rather than file packets. The rapid replacement of web browsers by web searchers shows, however, that there will be rapid evolution in the Internet towards organization, with repositories (organized collections) replacing objects (unorganized files). Thus, digital libraries will increasingly become part of the core information infrastructure [15].

The Interspace will be the first generation of the Net to support analysis, to directly enable problem solving. The Internet supports search of objects, e.g. matching phrases within documents. The Interspace, in contrast, supports correlation of concepts, e.g. comparing related terms in one repository to related terms in another repository. That is, there will be a quantum jump in functionality across the waves, from syntactic search to semantic search. Users will navigate within spaces of concepts, to identify relevant materials before they navigate within networks of objects as at present.

The Third Wave of Net Evolution

The meta-indexes, which contain different semantic representations of the actual objects, will dominate the Net in the coming era of the Interspace. During ten years of research, the author and colleagues have developed scalable technology for generating semantic indexes for concept spaces and category maps. Concept spaces are automatically generated indexes that support interactive term suggestion, to boost document search. Category maps are automatic indexes that support document clustering, to guide repository navigation.

Construction of large-scale models of the Interspace is possible today under controlled laboratory conditions using high-performance distributed computation. Each component of the new wave systems has unique new functionality. The section below on New Servers describes on-going generation of semantic indexes on supercomputers for collections covering entire disciplines with tens of millions of objects. The section on New Clients describes on-going development of interactive interfaces for 3D graphics workstations providing information spaceflight across multiple levels of abstraction. The section on New Systems describes coupling the new clients with the new servers by constructing concept switching networks.

A model Interspace is thus a set of community repositories, which are interconnected by a concept switching network to support information analysis across subject domains. The section on Building the Interspace discusses the construction of community repositories and their interconnection to form the Interspace using the concept switching network.

There is a natural mapping of these new system functionalities onto the research facilities of the premier NSF HPDC efforts, the PACI program (Partners in Advanced Computational Infrastructure). The new servers are computed on Origin supercomputers while the new clients are run on CAVEs. The NPACI consortium is built around digital library infrastructure, while the NCSA consortium is built around high-performance network infrastructure. Both consortiums are partnerships of a broad spectrum of scientist users and engineer developers.

It will be possible to build the PACI Interspace on the PACI Grid. The Interspace provides concept switching for the users (scientists, engineers) while the Grid provides object switching for the sources (simulations, databases). The last section outlines the stages required to construct a PACI Interspace, for consideration within this community.

New Servers: Semantic Indexes via Supercomputer Simulations

The Interspace is a vision of the Future Net where each community maintains its own repository of its own knowledge. For the amateur classifiers to be comparable to today's professionals, the information infrastructure must provide substantial support for semantic indexing and semantic retrieval.

The focus of the Interspace is on statistical technologies for semantic indexing that work generically across all subject domains [16, 17]. Analogues of concepts and categories are automatically generated. Concept spaces can be used to boost search by interactively suggesting alternative terms [1, 14]. Category maps can be used to boost navigation by interactively navigating clusters of related documents [4, 11]. Collectively we refer to these techniques as semantic indexing.
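As a rough illustration of how a concept space can suggest alternative terms, the sketch below counts how often pairs of phrases occur in the same document and ranks the co-occurring phrases for a query term. It assumes that each document has already been reduced to a set of noun phrases. The function names (`build_concept_space`, `suggest_terms`), the raw-count weighting, and the toy data are illustrative assumptions, not the weighting used in the systems cited above.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_space(documents):
    """Count, for every pair of phrases, the number of documents containing both."""
    cooccur = defaultdict(lambda: defaultdict(int))
    for phrases in documents:
        # Each unordered pair of distinct phrases is counted once per document.
        for a, b in combinations(sorted(set(phrases)), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return cooccur


def suggest_terms(cooccur, term, k=3):
    """Return up to k phrases that co-occur most often with `term`."""
    related = cooccur.get(term, {})
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:k]]


# Toy collection (hypothetical): each document is its set of noun phrases.
docs = [
    {"complex object", "configuration management", "revision control system"},
    {"complex object", "configuration management"},
    {"complex object", "object oriented database"},
]
space = build_concept_space(docs)
print(suggest_terms(space, "complex object"))
```

A production index would likely normalize the counts so that very frequent phrases do not dominate every suggestion list; raw counts are used here only to keep the sketch short.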

The scalable semantics algorithms rely on statistical techniques, which correlate the context of phrases within the documents. Over the past several years, we have used NCSA supercomputers to compute progressively larger collections, until the scale of entire disciplines, such as engineering or medicine, has been reached. We use supercomputers as time-machines to simulate the world of a billion repositories, by partitioning a large existing collection into discipline sub-collections, which are the equivalent of community repositories.

Concept spaces were generated in 1995 for 400K abstracts from INSPEC (electrical engineering and computer science) and in 1996 for 4M abstracts from COMPENDEX (all of engineering, some 38 broad subjects). The first computation took 1 day of supercomputer time [2] and the second took 10 days of high-end time on the HP Convex Exemplar [17]. The second computation provided a comprehensive simulation of community repositories, for 1000 collections across all of engineering, generated by partitioning the abstracts along the subject classification hierarchy [5]. (See examples below.) In 1998, we are going back to our roots in biomedicine and computing 1000 collections comprising some 10M abstracts from MEDLINE, taking approximately 1 month of dedicated time on the SGI 128-node Origin 2000. For example, we were the largest user on the largest Alliance machine during the month of the first Alliance Partners meeting.
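The partitioning step can be pictured as grouping abstracts by their subject classification codes, so that each code yields one simulated community repository. The sketch below assumes that every abstract record carries a list of codes and that an abstract with several codes belongs to several sub-collections; the record layout, the function name, and the code values are hypothetical.

```python
from collections import defaultdict


def partition_by_classification(abstracts):
    """Group abstract ids into sub-collections keyed by classification code."""
    collections = defaultdict(list)
    for abstract in abstracts:
        # An abstract assigned several codes lands in several sub-collections.
        for code in abstract["codes"]:
            collections[code].append(abstract["id"])
    return dict(collections)


# Hypothetical records; the classification codes are made up for illustration.
abstracts = [
    {"id": "a1", "codes": ["723.1", "912.2"]},
    {"id": "a2", "codes": ["723.1"]},
    {"id": "a3", "codes": ["461.9"]},
]
repositories = partition_by_classification(abstracts)
print(repositories)
```

Each resulting sub-collection could then be indexed independently, which is what makes the computation easy to spread across the nodes of a parallel machine.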

Concept Spaces are collections of abstract concepts, which are generated from concrete objects. Traditionally, the objects have been text documents and the concepts have been all the canonical noun phrases occurring within the documents of a collection. Figure 2 depicts an example of the use of a concept space for engineering literature [17, 18]. The upper window displays abstract indexes for categories and concepts, while the lower window displays concrete indexes for document collections. The pane in the upper left of the figure shows an integrated list of abstract indexes over the INSPEC, COMPENDEX and Patterns collections. INSPEC and COMPENDEX are standard commercial bibliographic databases indexing collections of the same names, and Patterns is a Software Engineering community repository.

Figure 2. Concept Switching across Subject Domains using Concept Spaces
The upper left-hand pane is a snapshot of a query session in Software Engineering incorporating all three indexes: Computers and Data Processing from the COMPENDEX index, Software engineering techniques from the INSPEC index, and Design Patterns from the Patterns index. The two panes to the right of the upper window show portions of an automatically generated concept space for the INSPEC categories "Software Engineering Techniques" and "Object Oriented Programming". The concept space allows the user to interactively refine a search by selecting concepts, which have been automatically generated and presented to the user.

For example, below the user has specified "complex object" and the system returned a list of related concepts such as "configuration management". If the concept space is further navigated (not shown), additional related concepts such as "revision control system" can be found. Such concepts may be used in conducting a full-text search as portrayed in the "Full Text Search" pane below the concept space display. This allows the user to descend to the level of actual objects in a collection at any time. In the lower left pane below, the user has performed a full-text search on the concept space term "revision control system" and the system has identified several abstracts containing this term; the one selected by the user is displayed in the lower right pane.

The Interspace consists of multiple spaces at the category, the concept, and the object levels. Within the course of an interaction session, a user will move across different spaces at different levels of abstraction and across different subject domains. For example, the system enables users to locate desired terms in the concept space by starting from broad terms then traversing into narrow terms specific to that document collection. They can then move across into document space to perform full-text search by dragging the concept term into the document space search window. This sequence is shown for "revision control system".

Finally, to search a subject domain they are less familiar with, users can begin within the concept space for a familiar subject domain, then choose another concept space for the unfamiliar domain and navigate across spaces based on common terms. This has been depicted as follows: first, the user has identified "complex object" as a desirable search term, and refined the search by locating the related term "revision control system". Next, the user has determined to pursue the object-oriented theme in greater detail, and wishes to switch from the "Software Engineering Techniques" subject domain into the "Object Oriented Programming" domain.

The result of this concept switching is depicted in Figure 2, where the portion of the corresponding concept space for "Object Oriented Programming" has been displayed in the upper right pane. The user can now deepen the search by browsing related terms of "complex object" in the "Object Oriented Programming" subject domain. Such a fluid flow across levels and subjects supports semantic interoperability. This form of interactive concept switching by space navigation is a key reason for naming the system the Interspace.

New Clients: Information Spaceflight on Visual Supercomputers

A Category Map is a dynamic classification of a collection of objects. Whereas Concept Spaces utilize co-occurrence frequency of concepts within objects, Category Maps utilize co-occurrence frequency of objects within repositories. For text collections, category maps are clusters of documents with similar terms that are the automatically generated analogue of the subject classification hierarchies occurring, for example, in the upper left pane of Figure 2.

While concept spaces are used primarily for boosting search, category maps are used primarily for boosting navigation. The user interactively examines a visualization of the map, which lays out the major category clusters, then zooms into a selected cluster where concept spaces can be used to search for specific documents of interest. Standard visualizations of category maps are based upon Kohonen maps, where each region of a 2D rectangular grid contains some of the statistical clusters.
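To make the idea of a Kohonen map concrete, the sketch below trains a very small self-organizing map in plain Python and assigns each document vector to its closest grid cell. It is a simplification under several assumptions: documents are already represented as equal-length numeric vectors, the neighbourhood is a simple step function using grid (Manhattan) distance, and the learning rate and radius decay linearly. The function names and the toy vectors are hypothetical and are not taken from the category map software described here.

```python
import random


def best_cell(grid, vec):
    """Return the grid cell whose weight vector is closest to `vec`."""
    return min(grid, key=lambda cell: sum((w - v) ** 2
                                          for w, v in zip(grid[cell], vec)))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small rectangular Kohonen map on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid cell, initialised at random.
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        radius = max(rows, cols) / 2.0 * frac
        for vec in vectors:
            winner = best_cell(grid, vec)
            for cell, weights in grid.items():
                # Cells within the radius of the winner move toward the input.
                if abs(cell[0] - winner[0]) + abs(cell[1] - winner[1]) <= radius:
                    for i in range(dim):
                        weights[i] += rate * (vec[i] - weights[i])
    return grid


# Toy term-weight vectors for four documents (hypothetical data).
doc_vectors = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
som = train_som(doc_vectors, rows=2, cols=2)
cells = [best_cell(som, vec) for vec in doc_vectors]
print(cells)
```

Documents that land in the same or adjacent cells form the clusters a user would see as regions of the map; a real category map would use a much larger grid and vectors with several hundred features.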

We are also experimenting with 3D visualizations, where artificial landscapes of hills and valleys represent the semantic space [8]. Higher-altitude hills surrounding lower-altitude valleys form barriers between categories, and give an indication of the distance in N-space separating clusters of documents. As the user descends into the terrain of an automatically generated category map, clusters of documents in valleys of semantic locality can be navigated and perused. Such an interactive information retrieval session is called Information Spaceflight.


Figure 3. Information Spaceflight for categorization

Figure 3 depicts a 3D visualization of a Category Map for a real collection consisting of thousands of abstracts in biological science from BIOSIS. The view is from a fly-through position navigating a close-up of the semantic space of a portion of the collection. The view represents the highest map level, which can be zoomed into to examine finer-grained categories. The region marked "adolescent" occurs in a "valley" of the semantic space, and represents a cluster of documents, which deal with the concept of adolescence. Near-by regions are on related concepts, such as pre-adolescence or middle-age.


Within information retrieval systems, the user needs to be able to dynamically generate and navigate the semantic space inherent in a collection and to interactively select the topics of greatest interest. We envision this as interactive, dynamic information spaceflight through semantic space. For example, as the user descends into the terrain of an automatically generated Category Map, clusters of documents in valleys of semantic locality can be navigated and perused. Once a relevant category is encountered, the user descends into semantic space to browse the category. Upon entry to the category, a new Category Map and Concept Space are dynamically generated based on the portion of the collection represented by the category. Alternatively, the user can mark several categories of interest, then descend into a conjunction of categories with Category Map and Concept Space indexes computed dynamically for each such new collection that is specified by the marking session.

Dynamic indexing is possible today on machines that are large relative to the collections. For collections of the scale found in scientific communities (tens of thousands), real-time semantic indexing is possible on high-end supercomputers. That is, it will be possible to compute on-the-fly concept spaces and category maps for selected subsets of collections numbering in the thousands of documents. The subsets would be chosen by interactive selection in a graphical user interface, such as the artificial landscapes we have been generating from category maps for information spaceflight. Dynamic indexing leads to a paradigm shift for interaction with knowledge networks, where the user is reorganizing the knowledge of the world during the course of an interactive session.

A true spaceflight interface would move through progressive levels of abstraction: an airplane fly-over of the gross characteristics to a helicopter hover of the major regions to an automobile drive-through over the hills and valleys to a personal walk-through of the detailed landscape, before diving into the concepts and objects at a detailed interactive level. We will be experimenting with these complete interactions in the virtual reality surround environments in the CAVE, with the hope that the immersiveness of the surrounds will make up for the inaccuracies of the clusters.

New Systems: Concept Switching in Spaces

The development of the Interspace Prototype involves research to develop scalable technology for correlating semantic indexes across information sources. Scalable implies that the correlating works automatically, on collections across different subject domains, on collections at different levels of abstraction. The correlation across sources will form the fundamental technology for the switching machines of the Interspace. Concept switching across knowledge networks is the fundamental operation in navigating the billion repositories of the global Interspace.

The fundamental infrastructure of the Interspace will thus be technologies for correlations across repositories, by switching concepts across spaces. The switching machines of the global knowledge network will switch concepts instead of packets (as with the current global data network). As discussed in the previous sections, the Interspace provides abstractions of the objects in the collections, which are used to provide semantic indexes for searching the collections. These indexes, called concept spaces and category maps, thus provide the available scalable semantics for switching across collections.

The Interspace supports knowledge networking by switching across repositories, by intersection of concepts and categories from the indexes and collections. We have developed generic syntactic techniques for switching [3], which involve term matching across spaces as illustrated in Figure 2. We are beginning to develop generic semantic techniques for switching, which involve conceptual clusters across spaces.
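Syntactic switching by term matching can be sketched as a lookup of the same string in two concept spaces followed by a comparison of the related terms. The example below assumes that a concept space is available as a mapping from a term to its list of related terms. The function name, the result layout, and the related terms listed for the second space are hypothetical.

```python
def switch_concept(term, source_space, target_space):
    """Carry `term` from one concept space to another by exact term matching."""
    if term not in target_space:
        # Syntactic switching fails when the target domain uses other wording.
        return None
    source_related = set(source_space.get(term, ()))
    target_related = set(target_space[term])
    return {
        "shared": sorted(source_related & target_related),
        "new_in_target": sorted(target_related - source_related),
    }


# Toy concept spaces; the related terms in the second space are made up.
software_engineering = {
    "complex object": ["configuration management", "revision control system"],
}
object_oriented_programming = {
    "complex object": ["class hierarchy", "configuration management"],
}
result = switch_concept("complex object",
                        software_engineering,
                        object_oriented_programming)
print(result)
```

The `None` branch marks the limitation of the purely syntactic approach: a term that is spelled differently in the target domain cannot be switched by string matching alone.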

Vocabulary switching describes the problem within information science of semantic interoperability across subject domains. A user wishes to specify objects (phrases within documents) using their vocabulary but search a repository (documents within collections) of another subject with another vocabulary. The different domains contain similar concepts with different terminology. A system for vocabulary switching would automatically translate terms across domains. This is necessary to enable scientists to find information effectively outside their own specialties.

Concept switching is the generalization of vocabulary switching, to objects beyond terms, to types beyond text. Concept switching in the Interspace is the concrete realization of what the "official" report on the Digital Library Research Agenda called the Grand Challenge of Digital Libraries, semantic interoperability [9].

Concept spaces have been shown to be effective for interactive term suggestion in several subject domains, where full-text search can be boosted by enabling the user to interactively select terms to be searched for from an automatically generated co-occurrence list. In molecular biology, concept spaces were often helpful for vocabulary switching as well by simply doing syntactic mapping of terms across spaces, e.g. for a fly biologist to search the worm community space [3]. To perform more general concept switching, however, we discovered that more sophisticated techniques were necessary to map terms across spaces which were not syntactically but were semantically equivalent (different strings but similar meaning).

We are investigating techniques from various disciplines to support semantic switching of concepts across spaces. The key issue is to try to map terms or documents into those semantically related in some manner effective with interactive correlation. The different disciplines have different emphases for investigating concept switching. They vary in spectrums on degree of knowledge about the subjects and degree of interaction with the users.

Library Science concentrates on expert human indexer assignments of terms across subject domains by term-term mapping across subject thesauri. Computer Science concentrates on expert machine indexer assignments of documents across subject domains by document-document mapping across subject axes. Information Science concentrates on automatic document-document mapping by computing feature vectors based on term frequency. Computational Science concentrates on automatic dynamic indexing to recompute the document mapping based on interactive term navigations.

As a more detailed example, Information Science concentrates on automatically computing semantic similarity between documents, such as automatic classification and categorization. The similarity computations are statistical in nature, focusing on the contextual redundancy. For example, concept spaces compute the co-occurrence frequency of noun phrases. Thus they are generic across subject domains, requiring little or no custom tuning from subject to subject. The deficiency with these generic classifications is that the level of semantics is lower (more superficial) than the manual (library science) mappings or the manually-boosted (computer science) parsings.
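A minimal sketch of a statistical similarity computation of this kind follows. It builds a term-frequency vector for each document and compares two documents by cosine similarity. Whitespace tokenization, raw term counts, and the cosine measure are assumptions chosen for brevity; the source does not specify the exact features or measure used.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents (hypothetical text).
doc_a = term_vector("revision control system for complex object")
doc_b = term_vector("complex object revision")
doc_c = term_vector("neural network")
print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Nothing in the computation depends on the subject matter of the documents, which is why such techniques carry over from one domain to another without custom tuning.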

Feature vectors of documents would be an appropriate level of semantics for concept switching. Typically, the feature vectors have no obvious semantic validity. For example, category maps have a vector for each document with several hundred features, which are numeric values of abstract dimensions. We have developed concept assignment technology, which approximates trained indexing professionals by automatically assigning semantically related terms to documents [6, 7]. A variant of this technology can assign multiple-concept labels to documents, where the labels are related terms classifying that document. We can refine the quality of the labels assigned in most scientific disciplines by using the existing controlled vocabulary terms assigned by human indexers. These label sets can then be intersected to locate similar documents and provide concept switching.
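The intersection of label sets can be sketched as follows. Given the concept labels of a document of interest, the example ranks the documents of another repository by the number of labels they share with it. The function name, the `min_shared` threshold, and the labels themselves are hypothetical; the source does not describe how the intersections are scored.

```python
def similar_documents(query_labels, labelled_docs, min_shared=1):
    """Rank documents by how many concept labels they share with the query."""
    query = set(query_labels)
    matches = []
    for doc_id, labels in labelled_docs.items():
        shared = query & set(labels)
        if len(shared) >= min_shared:
            matches.append((doc_id, sorted(shared)))
    # Most shared labels first; ties broken by document id.
    matches.sort(key=lambda match: (-len(match[1]), match[0]))
    return matches


# Hypothetical label sets assigned to documents in another repository.
target_docs = {
    "d1": {"version control", "software configuration"},
    "d2": {"object database", "version control", "schema evolution"},
    "d3": {"neural network"},
}
query = {"version control", "schema evolution"}
results = similar_documents(query, target_docs)
print(results)
```

Because the labels are readable terms, unlike the abstract dimensions of a feature vector, the user can inspect which shared concepts caused a document to be retrieved.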

Building the Interspace: Model Testbeds for Community Repositories

Constructing an Interspace requires constructing an appropriate set of community repositories, indexing them to provide concept spaces and category maps, then interconnecting these spaces with appropriate concept switching. Such semantic interoperability is a general fundamental technology for knowledge networks, which will enable members of one community to effectively search repositories of other communities.

Over the years, we have built a range of model testbeds with increasingly better infrastructure technology. These are described in more detail at http://www.canis.uiuc.edu .

The Worm Community System (WCS) from 1989 to 1993 [13, 19] manually constructed repositories for a community of molecular biologists, as part of a flagship project for the NSF National Collaboratory program. These included formal and informal literature and databases, contained within a tightly interconnected space. An interactive system enabled users to browse the repositories by searching across sources or navigating across links, using custom software over the Internet in the pre-Web period.

The Digital Libraries Initiative (DLI) project from 1994 to 1998 [16, 18] automatically constructed semantic indexes for a community of engineering faculty and students. The supercomputer computations provided large simulations with somewhat artificial partitions between the community boundaries. We are now doing the first major project to develop and deploy an Interspace, in the subject domain of clinical medicine with medical literature for practicing doctors.

To facilitate the development of the Interspace, we have established CANIS (Community Architectures for Network Information Systems), a new laboratory at the University of Illinois at Urbana-Champaign, for which the author serves as founding Director. The goal of CANIS is to develop new architectures and new environments for the Net, then deploy these experimentally in diverse applications relevant to the public good. This means new algorithms, new systems, new indexes, and new interfaces, tested on real sources and real users.

The current funding for CANIS includes a flagship grant for 1997 to 2000 in the DARPA Information Management Program specifically to develop an Interspace Prototype, with software infrastructure adequate to enable members of the DARPA community to create their own repositories and connect these into a larger Interspace. The existence of such technology leads to the possibility of constructing an Interspace for the High-Performance Distributed Computing community in whose conference this paper appears.

Building the Interspace on the Grid: The PACI Community as a Model Testbed

There is a natural mapping of these new system functionalities onto the research facilities of the premier NSF HPDC efforts, the PACI (Partners in Advanced Computational Infrastructure) program. The PACI program has two leading edge sites, each with many scientific and computational partners. The NPACI (National Partnership for Advanced Computational Infrastructure) is based at the SDSC (San Diego Supercomputer Center) while the NCSA (National Computational Science Alliance) is based at the NCSA (National Center for Supercomputing Applications). NPACI is concentrating on developing digital library infrastructure while NCSA is concentrating on developing the Grid high-performance network.

The author is the NCSA Partner for Digital Libraries and collaborates closely with many of the Digital Library partners in NPACI, through his participation in the NSF/DARPA/NASA Digital Libraries Initiative. In some sense, the Interspace is the new level of network information system possible within the new level of PACI information infrastructure. The new servers run on Origin supercomputers, while the new clients run on CAVE displays. The Interspace provides concept switching for the users (scientists, engineers) under which the Grid provides object switching for the sources (simulations, databases).

Building the Interspace on the Grid would involve building a model testbed on the experimental network for the PACI scientist users and engineer developers. Each partner site would index their own repositories of their own knowledge. The leading edge sites would provide central software and coordination.

The scale of this project is not much different from that of projects already carried out by CANIS. The Worm Community System had 50 sites at major university laboratories. The Digital Libraries Initiative project had 1000 users at the University of Illinois and elsewhere. The Interspace Prototype is processing 100M objects to handle all literature in medicine.

The heterogeneity of the PACI Interspace is significantly harder, however. The range of disciplines in PACI is much broader than engineering or medicine. They include activities across all of science and engineering plus activities across education and industry. This places great stress on the "generic" concept switching techniques still under development.

The objects placed into the repositories are much broader than literature abstracts. They include images, where generic parsing techniques are poorly established, and simulations, where essentially nothing is known about automatic categorization. They must be interacted with in a wide variety of environments from laptops with flat displays over telephone lines to supercomputers with immersive displays over gigabit networks.

The most sensible staged approach to Building the PACI Interspace is to focus on text documents, which contain the bulk of the materials and which are automatically indexable. We have extensive experience with indexing large collections of scientific literature, particularly the comprehensive bibliographic databases of abstracts. We have also shown for some informal documents, such as web pages, that automatic categorization is as effective as manual categorization for navigation purposes. (For example, category maps were compared against Yahoo classifications for answering questions using entertainment home pages [4].)

Text descriptions of non-text materials can be indexed as metadata and the actual materials then interacted with subsequently. An image, for example, could be correlated using its metadata but not generically content searched. The members of the PACI community are early adopters who are willing to spend significant amounts of time classifying their materials, particularly if interactive automatic techniques for suggesting classifications are available, such as the concept assigners in the Interspace Prototype [7], which suggest consistent indexing terms from the actual collections.

It may also be possible to develop generic parsing for some types of non-textual materials. For example, it appears possible to automatically parse aerial photographs using texture extraction, an important datatype for environmental GIS applications, for use in semantic indexing. This technology was developed as a collaboration between the University of Illinois DLI project associated with NCSA and the University of California Santa Barbara DLI project associated with NPACI [12, 10, 18], and is being integrated into the Interspace Prototype as part of the on-going CANIS DARPA project.

Thus the general outline of the stages to build the Interspace for the Grid (the PACI Interspace) is as follows. The sources would be semantically indexed at the individual partner sites, which would then also run the analysis environment software to interactively correlate across sources via concept switching.

  • index the formal literature in relevant areas (scientific literature via abstract databases).
  • index the informal literature from specific laboratories (internal documents on web sites).
  • index the pictures (images and videos) by text descriptions (metadata on displayed objects) or type-specific parsers when available.
  • index the programs (databases and simulations) by text descriptions (similar to pictures) and execute them separately after retrieval using distributed object invocation.
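The common thread in these stages is that every source, whatever its type, is indexed through some text: an abstract, page content, or a metadata description. A minimal sketch of that idea follows, assuming a simple word-level inverted index in place of true semantic indexing. The `Source` record, the function names, and the sample data are all hypothetical, and the separate execution of programs after retrieval is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    source_id: str
    kind: str         # "formal", "informal", "picture", or "program"
    description: str  # abstract, page text, or metadata description


def build_index(sources):
    """Map each lowercased word to the ids of the sources whose text contains it."""
    index = defaultdict(set)
    for source in sources:
        for raw in source.description.lower().split():
            word = raw.strip(".,;:()")
            if word:
                index[word].add(source.source_id)
    return index


def lookup(index, sources, term):
    """Return (source_id, kind) pairs for sources matching a term, sorted by id."""
    kinds = {s.source_id: s.kind for s in sources}
    return [(sid, kinds[sid]) for sid in sorted(index.get(term.lower(), set()))]


# Hypothetical sources, one per stage in the outline above.
sources = [
    Source("abs-1", "formal", "Texture analysis of wetland imagery."),
    Source("lab-7", "informal", "Lab notes on wetland sampling"),
    Source("img-3", "picture", "Aerial photograph of a wetland (1997)"),
    Source("sim-2", "program", "Simulation of groundwater flow"),
]
index = build_index(sources)
print(lookup(index, sources, "wetland"))
# prints [('abs-1', 'formal'), ('img-3', 'picture'), ('lab-7', 'informal')]
```

A single query term here reaches an abstract, an internal document, and an image description alike, which is the property the staged plan relies on before concept switching is layered on top.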

All these activities together could, within a period of several years, provide a unique facility to the PACI community that could enable team members to communicate across their differing disciplines at a completely new level. Building the Interspace for the Grid would be a large-scale realization of Concept Switching across Community Repositories and would enable the High-Performance Distributed Computing community to once again prototype information infrastructure that will become the everyday Net of the 21st Century.

Acknowledgements

The author wishes to thank the members of the Interspace Prototype team at CANIS, particularly Kevin Powell and William Pottenger, as well as the NCSA Alliance Partner for Information Science, Hsinchun Chen at the University of Arizona. This research was supported by the NSF/DARPA/NASA Digital Libraries Initiative under cooperative agreement IRI-94-11318COOP and by the DARPA Information Management program under contract N660001-97-C-855. NCSA kindly provided special allocations for the indexing simulations during debugging periods of the HP Convex Exemplar and the SGI Origin 2000.

References

1. Chen, H., B. Schatz, T. Yim, D. Fye (1995) Automatic Thesaurus Construction for an Electronic Community System, Journal American Society Information Science 46(3):175-193.

2. Chen, H., B. Schatz, D. Ng, J. Martinez, A. Kirchhoff, C. Lin (1996) A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Project, IEEE Transactions Pattern Analysis Machine Intelligence, 18(8):771-772, August.

3. Chen, H., J. Martinez, D. Ng, B. Schatz (1997) A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System, Journal American Society Information Science 48(1): 17-31, January.

4. Chen, H., A. Houston, R. Sewell, B. Schatz (1998a) Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques, Journal American Society Information Science 49(7):582-603.

5. Chen, H., J. Martinez, A. Kirchhoff, D. Ng, B. Schatz (1998b) Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-occurrence Analysis, and Parallel Computing. Journal American Society Information Science.

6. Chen, H., Y. Chung, A. Houston, P. Li, B. Schatz (1999) Using Neural Networks for Vocabulary Switching. Special Issue on Intelligent Information Retrieval, IEEE Expert, to appear.

7. Chung, Y., W. M. Pottenger, B. R. Schatz (1998) Automatic Subject Indexing Using an Associative Neural Network, 3rd Int. ACM Conference on Digital Libraries, Pittsburgh, June.

8. Interspace Prototype. Community Architectures for Network Information Systems (CANIS), GSLIS/NCSA University of Illinois at Urbana-Champaign, http://www.canis.uiuc.edu/interspace

9. Lynch, C., Garcia-Molina, H. (eds) (1995) Interoperability, Scaling, and the Digital Libraries Research Agenda: A Report on the IITA Digital Libraries Workshop, August.

10. Ma, W. Y. and B. S. Manjunath (1998) A Texture Thesaurus for Browsing Large Aerial Photographs, Journal American Society Information Science, 49(7): 633-648.

11. Orwig, R., H. Chen, J. F. Nunamaker (1997) A Graphical, Self-Organizing Approach to Classifying Electronic Meeting Output, Journal American Society Information Science, 48(2):157-170, February.

12. Ramsey, M., H. Chen, B. Zhu, B. Schatz (1998) A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images, Journal American Society Information Science.

13. Schatz, B. R. (1991) Building an Electronic Community System. Journal Management Information Systems 8: 87-107 (Winter 1991-92).

14. Schatz, B., E. Johnson, P. Cochrane, H. Chen (1996a) Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-occurrence Lists for Information Retrieval. 1st Int. ACM Conference on Digital Libraries, Bethesda, MD, March, 126-133.

15. Schatz, B. R. and H. Chen (1996b) Building Large-Scale Digital Libraries, IEEE Computer, 29(5):22-26, May. Introduction to special issue on DLI.

16. Schatz, B. R., W. H. Mischo, T. W. Cole, J. B. Hardin, A. P. Bishop, H. Chen (1996c) Federating Diverse Collections of Scientific Literature, IEEE Computer, 29(5):28-36, May.

17. Schatz, B. R. (1997) Information Retrieval in Digital Libraries: Bringing Search to the Net. Science, 275(5298):327-334, January.

18. Schatz, B. R., W. H. Mischo, T. W. Cole, A. P. Bishop, E. H. Johnson, D. T. Ng, H. Chen (1999) Federating Repositories of Scientific Literature: The University of Illinois Digital Libraries Initiative (DLI) Project, IEEE Computer, February. Special issue on Digital Libraries.

19. Shoman, L., E. Grossman, K. Powell, C. Jamison, B. Schatz (1995) The Worm Community System (WCS), in H. Epstein & D. Shakes (eds), C. elegans: Modern Biological Analysis of an Organism, Methods in Cell Biology, vol 48, chap 26, pp 607-625.
