
The few research systems that have tried to support correlation and collaboration across multiple information sources discovered that sharing of objects across subject domains is not sufficient to enable sharing of knowledge across community boundaries. These research systems were able to support syntactic interoperability that enables objects to be shared (linked, grouped) across physical boundaries (hardware, networks). But they discovered the real problem was semantic interoperability that enables objects to be correlated across logical boundaries (repository, community, subject) with different representations (terminology, media, modality).

The Worm Community System (WCS), developed by Schatz as the Principal Investigator of one of the flagship NSF National Collaboratory projects from 1989 to 1993, demonstrated this difficulty by supporting browsing and sharing across multiple sources of data and literature for a national community of molecular biologists. It was quite possible to support syntactic interoperability, searching objects and making links across multiple sources distributed across wide-area networks and other physical boundaries; this is why WCS was frequently cited as the national model of future science information systems [Science 1993]. But semantic relationships had to be hand-coded, or derived with special-purpose heuristics, to be correct, since the base representation was raw objects. It was difficult, for example, to match the same gene across a literature abstract, gene description, physical map, sequence coding, lineage tree, developmental micrograph, and so on. The same semantic concept occurs in text, data, graphics, and images with different representations.

The Worm Community System showed that analysis environments are possible on a large scale across the Internet with specialized software and data manipulation. The Interspace environment is a generic version of WCS that provides generalized software and domain-independent knowledge manipulation, based on concept spaces. Our previous research has shown the technology to be effective for augmentation of semantic retrieval across subject domains on a small scale. The research proposed here will evaluate its effectiveness across many domains on a large scale, as well as the embedding of this technology into a complete analysis environment.

The style of multimedia browsing combined with distributed search is probably the main theme of Internet information services today. Its historical antecedents are in the research systems of the previous decade. A premier example is the Telesophy system. This was a research prototype, designed and built by the author while at Bellcore in the mid-1980s [Schatz, 1987]. Within the then-small community of Internet insiders, Telesophy was regarded as the forerunner of the future Net of world-wide information spaces [Schatz, 1989]. That future seemed far off at the end of the project in 1989, even though its mass realization proved to be only five years away!

Telesophy literally means "wisdom at a distance". The goal is to make manipulation of knowledge as easy and transparent as telephony made transmission of sound [Schatz, 1984]. The prototype was built (1985-1986) using the relatively new personal bit-mapped workstations, local area networks, and custom full-text search software [Schatz, 1989a]. It enabled many sources of information to be searched across the network and then grouped into new units of knowledge. From the retrieval standpoint, multimedia multisource information retrieval was supported. That is, a user could issue a query from their workstation; this query would be propagated to all the sources on the network, and the results propagated back to the workstation for merged display. In the prototype, there were some 30 information sources with different search servers on different machines in the network, with a range spanning bibliographic databases, full-text databases, book catalogs, still images, line graphics, and video clips. For creating knowledge, edited versions of query results could be stored for later retrieval, or links between items could be added on the fly.
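The query propagation and merged display described above can be sketched in a few lines of Python. This is only an illustration of the general federated-search pattern, not the Telesophy implementation: the `Source`, `Hit`, and `federated_search` names, the term-count scoring, and the sample titles are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # name of the source that produced the hit
    title: str    # one-line summary shown in the merged display
    score: int    # number of query-term occurrences (assumed scoring)


class Source:
    """Hypothetical stand-in for one search server on the network."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def search(self, query):
        terms = query.lower().split()
        hits = []
        for title in self.documents:
            words = title.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append(Hit(self.name, title, score))
        return hits


def federated_search(query, sources):
    """Send the query to every source concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = list(pool.map(lambda s: s.search(query), sources))
    merged = [hit for hits in per_source for hit in hits]
    # Best score first; ties are ordered by source name, then title.
    merged.sort(key=lambda h: (-h.score, h.source, h.title))
    return merged


sources = [
    Source("bibliography", ["Gene mapping in nematodes", "Compiler optimization"]),
    Source("images", ["Micrograph of gene expression"]),
]
results = federated_search("gene", sources)
for hit in results:
    print(f"{hit.source}: {hit.title}")
```

Each source is searched independently, so a slow or empty source does not change how the others are queried; only the final merge step sees all results together.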

The interaction style and system architecture pioneered in the Telesophy system are illustrative of those available in the Internet today. The interaction style combined searching and browsing, joining the full-text search of the previous generation computer model with the interactive navigation now possible with the present generation. The system architecture supported transparent access to distributed sources, for federated search and for interactive navigation, with the prototype implementation providing careful optimization for fast response across wide-area networks.

The interaction style used search to retrieve a broad selection of relevant items, then browsing to navigate from retrieved items to new related ones. The analogy was to go to a section of a library which contained relevant materials related to each other, scan your eye along the titles on the spines of the books to locate desired ones, then open a few books to get pointers for which sections to search next. The digital library, however, contained distributed multimedia documents, so the system provided transparent network information retrieval. Type transparency was supported by uniform commands on all objects, with location transparency supported by fast access on all sources. The search used full-text matching similar to that of the existing on-line bibliographic systems. The distributed model, however, enabled the search to retrieve a selection of related items across all the information sources. Typically, the results were displayed as one-line summaries, and selected items could be zoomed to display the full object. Component-style type switching enabled different displayers to be invoked for different objects. Thus, different media types such as text, images, graphics, and video could all be displayed as appropriate for that type. The speed of the network and the workstation enabled broad queries to be issued, since several hundred items could be quickly fetched and displayed at the summary level.

The browsing was modeled on the few hypermedia systems which had been built by that time, notably NLS [Overhage, Harman 1965] and SDMS [Herot 1980]. All items in all sources were contained within a single logical information space, which consisted of interlinked information units. An information unit was an object with a standard set of headers, which contained structural information such as author and title, as well as link information to related objects. Thus any document could be linked to any other document, and these links could be followed on the fly.
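The information unit and type-switching ideas above can be illustrated with a small Python sketch. It is a hypothetical model written for this example only: the field names, the `DISPLAYERS` registry, and the sample units are assumptions, not the data structures of the original prototype.

```python
from dataclasses import dataclass, field


@dataclass
class InformationUnit:
    """An object with standard headers plus links to related units."""
    unit_id: str
    media_type: str                              # e.g. "text", "image"
    author: str
    title: str
    body: str
    links: list = field(default_factory=list)    # ids of related units


# Assumed registry mapping a media type to the displayer invoked for it.
DISPLAYERS = {
    "text": lambda u: f"[text] {u.title}: {u.body}",
    "image": lambda u: f"[image] {u.title} ({u.body})",
}


def summary(unit):
    """One-line summary: the same command works on every media type."""
    return f"{unit.title} / {unit.author} ({unit.media_type})"


def display(unit):
    """Zoom into a unit by dispatching on its media type."""
    displayer = DISPLAYERS.get(unit.media_type)
    if displayer is None:
        # Fallback for media types with no registered displayer.
        return f"[{unit.media_type}] {unit.title}"
    return displayer(unit)


def follow_links(unit, space):
    """Return the units this unit links to, skipping dangling links."""
    return [space[i] for i in unit.links if i in space]


abstract = InformationUnit("u1", "text", "A. Author", "Gene abstract",
                           "Summary of a gene study", links=["u2", "u9"])
micrograph = InformationUnit("u2", "image", "B. Author", "Micrograph",
                             "micrograph.img")
space = {u.unit_id: u for u in (abstract, micrograph)}

print(summary(abstract))
print(display(micrograph))
print([u.title for u in follow_links(abstract, space)])
```

Keeping the headers uniform is what lets `summary` and `follow_links` ignore the media type, while `display` is the only place where type-specific behavior is selected.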

Bruce Schatz has been an architect of large-scale network information systems for many years, in a variety of settings for both research and development, in both university and industry. In the early years, his concentration was on building commercial and research systems with streamlined features that would be widely propagated. Later, he began to specialize in a unique niche: building revolutionary systems with fundamentally new functionality, but with implementations that worked well enough to deploy on a prototype basis to real user populations.

In particular, the Telesophy system was perhaps the only research prototype of its era to actually do multimedia information retrieval transparently across wide-area networks. Because of this work, Schatz gave an invited talk in 1989 at the 20th Anniversary Symposium on the DARPANET on World-Wide Information Spaces, giving Telesophy as an example of what was to come. At that time he also became the scientific advisor for information systems at NCSA, where a number of attempts were made to reproduce Telesophy with different underlying infrastructure. These finally resulted in 1993 in NCSA Mosaic, which showed the widespread feasibility of worldwide information spaces to millions of users.

The WCS (Worm Community System) was an attempt to build a practical Telesophy system and deploy it to a national community. The WCS project received one of the two large NSF National Collaboratory grants, with Schatz as PI, and was widely discussed in national forums (National Academy of Sciences studies and Science magazine news articles) as the example of future science information systems for remote retrieval and collaboration. WCS supported search and navigation across multiple distributed sources of both literature and data spread across the Internet. It also supported publication and sharing of any type, as part of an asynchronous collaboration system with sophisticated editorial and annotation facilities.

The Interspace project discussed in this proposal is the research arm of the NSF/DARPA/NASA Digital Library Initiative (DLI) project at the University of Illinois, with Schatz as PI. It is attempting to make a concrete breakthrough in the Grand Challenge of Digital Libraries, semantic interoperability, by use of large-scale high-performance simulations and prototype software environments. The facilities in the Illinois DLI and the other DLI projects, plus the leverage from being the scientific advisor at NCSA for information systems, give Schatz a unique perspective on the trends in information infrastructure and unique access to the high-performance technology for developing revolutionary advances.

Systems for which Schatz served as lead architect include:

  • CS 600 (code named Gazelle): a mid-size PBX (telephone switch for a company with both voice and data services) sold by Bell Labs. 1979-1982.
  • PCS (Personal Communication System): electronic mail system propagated to several thousand sites inside Bell Labs. 1979-1983.
  • Telesophy ("knowledge at a distance"): multimedia information retrieval across networks (Bellcore). This pioneering research project showed the feasibility of what became the World-Wide Web 10 years later. 1984-1989.
  • WCS (Worm Community System): supporting all the knowledge (literature and database) of a community in biology and enabling retrieval and collaboration across the Internet (University of Arizona). Done as PI of one of the NSF National Collaboratory projects. Often cited in national forums as the model for future science information systems. 1989-1994.
  • DLI (Digital Library Initiative): retrieving structured documents across the Internet from a large-scale scientific collection (University of Illinois). Bringing search to the Net. Being done as PI of one of the NSF/DARPA/NASA DLI projects. A flagship project of the Federal NII effort. 1994-present.
  • Interspace: semantic retrieval and information analysis for community repositories. The generation of systems beyond the current Internet. The long-term information systems research of the DLI project. 1994-present.

In his student days, Schatz worked on optimizing compiler technology at Rice University (undergraduate under Ken Kennedy), at Carnegie-Mellon University (graduate under Bill Wulf), and at IBM Watson Research Center (summer job under Fran Allen). He was also a graduate student at the MIT Artificial Intelligence Lab in the mid-1970s (computational vision under David Marr).

He spent 10 years in industrial R&D as a member of technical staff at Bell Labs, Bellcore, and IBM Research. Prior to his current position, he also spent 5 years at the University of Arizona as a faculty member in the Department of Management Information Systems, working on organizational memory and collaboration systems.

Hsinchun Chen received the Ph.D. degree in Information Systems from New York University in 1989. He is Associate Professor (with tenure) of Management Information Systems at the University of Arizona and director of the Artificial Intelligence Group. He received an NSF Research Initiation Award in 1992, the Hawaii International Conference on System Sciences (HICSS) Best Paper Award in 1994, and the AT&T Foundation Award in Science and Engineering in 1994 and 1995.

He was awarded a major Digital Library Initiative grant by NSF/NASA/DARPA for a joint project with the University of Illinois, 1994-1998, and an intelligent Internet categorization and search project from NSF/CISE, 1995-1998. The proposed research, which is grounded in automatic textual analysis of digital documents, attempts to create concept spaces to assist in cross-domain, semantic retrieval of textual and multimedia documents. Selected machine learning techniques and general AI-based search and classification methods have been adopted.

Chen is also the PI of several National Center for Supercomputing Applications (NCSA) High-performance Computing Resources Grants. Thinking Machines' CM-5 (512 nodes), SGI's Power Challenge Array (48 nodes), Cray's CS6400 (8 nodes), and Convex's Exemplar (64 nodes) have been used for large-scale data analysis. His recent work on parallel computing for large-scale digital library analysis was featured in NCSA Access High-performance Computing Magazine (Summer 1995) and in the journal Science ("Off-the-Shelf Chips Conquer The Heights of Computing," September 8, 1995).

Chen has been active in adopting "intelligent agents" in the design of collaborative systems. He was a co-PI of an NSF-funded "National Collaboratory" project (PI: B. Schatz), the Worm Community System, and has been the technical lead for an Army and AT&T funded next-generation electronic meeting system project (PI: J. Nunamaker), 1994-present. He has successfully developed and integrated selected machine-learning-based information categorization and visualization agents for information sharing, electronic meeting facilitation, and idea convergence.

Chen has published more than 30 articles in major journals such as Communications of the ACM, IEEE COMPUTER, Journal of the American Society for Information Science, IEEE Transactions on Systems, Man, and Cybernetics, IEEE EXPERT, and Advances in Computers. He is the guest editor for the IEEE Computer special issue on "Building Large-scale Digital Libraries" (May 1995) and the Journal of the American Society for Information Science special issue on "Artificial Intelligence Techniques for Emerging Information Systems Applications" (1997).

 

 
