We propose to develop a generation-after-next information analysis environment. Based on our prior research, we have developed a technology strategy and a realistic implementation plan that will permit us to accomplish this within three years. Our approach necessarily anticipates the next two generations of software, hardware, and network developments.
First, we plan to leapfrog the coming software consolidation phase of distributed objects -- the next generation. We will do this by developing our analysis environment, the Interspace, on a commercial software suite that approximates the resulting Internet-wide distributed object operating system -- the generation after next. Second, the foundation of our analysis environment will be a "semantic middleware" of scalable distributed services. These semantic services consist of a hierarchy of statistical algorithms that we claim are valid across the range of object types and subject domains. We have established collaborations with several major information providers, who will supply the text and image collections needed to build realistic large collections and demonstrate a working prototype of our analysis environment. Finally, we recognize that these algorithms pose significant performance problems on current desktop hardware when coupled with the storage requirements of meaningful community repositories. Our approach is to use NCSA-unique computing, network, and visualization resources as a "time machine" to simulate a generation-after-next hardware environment.
We have done the preliminary research needed to make building a scalable analysis environment technically feasible and cost-effective. In past work on collaboration infrastructure in a flagship National Collaboratory project and on digital libraries in a flagship National Information Infrastructure project (the Digital Library Initiative, partially funded by DARPA ITO), we developed the fundamental technology for scalable semantics.
Scalable semantics is a general term for algorithmic techniques that support information management of concepts for large collections of different types. The most promising techniques are statistical clustering algorithms from information retrieval and image processing, which compute similarity of contexts across different items in different collections. Using supercomputers, we have built increasingly large concept spaces, with a current prototype encompassing 1000 community repositories from 10,000,000 journal abstracts across all of engineering.
Our style is to build large-scale research prototypes of network information systems with real collections for real users. The reality of the prototypes enables us to occupy a unique niche for technology transfer between basic techniques and commercial applications. For example, our Telesophy System (1984-1989) inspired NCSA Mosaic and the Web, and the Worm Community System (WCS) (1990-1994) is inspiring collaboration infrastructure in the Web. We build research prototypes rather than feasibility demonstrations. For example, our algorithms for concept spaces went through five years of evolution and testing before it was possible to generate statistical indexes for collections of millions of items. Our multi-year efforts typically involve several students for algorithm development, then several programmers for systems development. The scale of our efforts and the scalability of the results distinguish us from more traditional academic computing research.
We propose to take our statistical techniques for concepts and categories and evaluate their utility across object types and subject domains. A complete analysis environment prototype will be constructed that supports search of categories, concepts, and objects using indexes produced by automatic, semi-automatic, and manual techniques.
This software will be used to construct a prototype Interspace with large collections across text (abstracts in engineering) and image (maps in geography), then across subjects (medical thesaurus relationships) and classifications (personal document webs). This will produce a large-scale prototype of abstract spaces across many dimensions, with system support for easy federation across many indexes. It will then be possible for the developers and their colleagues to live experimentally with an information infrastructure that routinely supports correlation and analysis.
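To make the idea of a concept space more concrete, the following is a minimal sketch of a term co-occurrence index, assuming each abstract has already been reduced to a list of terms. It is an illustration only and not the algorithm used in our prototypes: the function names (`build_concept_space`, `related_terms`), the toy abstracts, and the choice of Jaccard similarity are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(documents):
    """Map each term to its co-occurring terms, scored by Jaccard similarity.

    `documents` is a list of term lists (hypothetical, pre-extracted terms).
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for terms in documents:
        unique = sorted(set(terms))  # count each term once per document
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    space = {term: {} for term in doc_freq}
    for (a, b), both in pair_freq.items():
        # Jaccard: documents with both terms / documents with either term.
        score = both / (doc_freq[a] + doc_freq[b] - both)
        space[a][b] = score
        space[b][a] = score
    return space


def related_terms(space, term, top_n=3):
    """Return the top_n terms most similar to `term`, best first."""
    neighbours = space.get(term, {})
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy collection standing in for journal abstracts (invented for illustration).
abstracts = [
    ["bridge", "load", "steel"],
    ["bridge", "steel", "corrosion"],
    ["circuit", "load", "voltage"],
]
space = build_concept_space(abstracts)
print(related_terms(space, "bridge"))
```

In this toy collection, "steel" ranks first for "bridge" because the two terms appear in exactly the same abstracts. The pairwise counting grows quickly with vocabulary size and collection size, which is one way to see why building such indexes over millions of items poses the performance problems noted above.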
4.
DESCRIPTION OF RESULTS, 7. TECHNICAL RATIONALE, APPROACH, AND PLAN 8. COMPARISONS TO OTHER RESEARCH