...
.....

OBJECTIVE

We plan to build a complete prototype environment for semantic indexing of multimedia information and to evaluate its utility on real collections. The semantic indexing relies on statistical clustering to derive concepts and categories. Interactive navigation based on this indexing enables information retrieval at a deeper semantic level than has previously been possible for large, diverse collections. We plan to develop the algorithms for concept spaces and category maps, then test their utility on engineering literature, map images, and medical literature. The Interspace thus created will enable interactive semantic interoperability across subject domains and media types.

APPROACH

Our research concentrates on building information analysis environments for distributed object repositories. In particular, the new infrastructure will support semantic retrieval and semantic mapping across object types and subject domains. This is a concrete approach to what the Information Infrastructure Technology and Applications (IITA) committee has identified as the Grand Challenge of Digital Library Research: "interoperability at a deep semantic level, providing digital library users with a coherent view of heterogeneous autonomously managed resources".

The Interspace framework supports integrated interoperability of semantic services based on statistical algorithms for information management. Our suite of statistical algorithms will be not only complete (in the sense of mimicking all the standard manual classification techniques with automatic procedures) but also computationally feasible. We have done the foundational research identifying suitable algorithms for: concept spaces (co-occurrence matrices), category spaces (Kohonen maps), meta-data generation (Hopfield nets), and meta-map generation (spreading activation). The research in this proposal will evaluate each algorithm on a suitable large collection and refine the implementation until it reaches adequate speed and reliability.
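To make the first of these algorithms concrete, the sketch below builds a small concept space from term co-occurrence. It assumes each document has already been reduced to a list of terms, and it uses a simplified asymmetric weight: the fraction of documents containing term a that also contain term b. The weighting used in the foundational research may differ (for example, by also using term frequency and inverse document frequency). The function names and the toy abstracts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(documents):
    """Asymmetric co-occurrence weights: w[(a, b)] = df(a, b) / df(a).

    `documents` is an iterable of term lists (assumed already extracted,
    e.g. noun phrases from abstracts).
    """
    df = Counter()     # number of documents containing each term
    co_df = Counter()  # number of documents containing both terms of a pair
    for doc in documents:
        terms = sorted(set(doc))  # count each term at most once per document
        df.update(terms)
        for a, b in combinations(terms, 2):
            co_df[(a, b)] += 1
            co_df[(b, a)] += 1
    return {(a, b): n / df[a] for (a, b), n in co_df.items()}


def related_terms(space, term, k=3):
    """The k terms most strongly associated with `term`, strongest first."""
    scored = [(b, w) for (a, b), w in space.items() if a == term]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


# Hypothetical toy collection of three "abstracts".
docs = [
    ["kohonen map", "neural network", "clustering"],
    ["neural network", "hopfield net"],
    ["clustering", "kohonen map"],
]
space = build_concept_space(docs)
print(related_terms(space, "kohonen map"))
# → [('clustering', 1.0), ('neural network', 0.5)]
```

The weights are deliberately asymmetric: in this toy collection "hopfield net" always co-occurs with "neural network" (weight 1.0), but "neural network" co-occurs with "hopfield net" only half the time (weight 0.5), which can hint at narrower versus broader terms.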

We propose a set of major experiments to test the scalability of the Interspace prototype. Each experiment will test a part of the information infrastructure using today's high-performance technology, which will be everyday technology ten years hence. These experiments will be performed using the unique advanced computational infrastructure at NCSA (National Center for Supercomputing Applications), which is partially funded by ARPA ITO.

On the repository end, large collections will be used to simulate a world of a billion community repositories. For text objects (technical documents), we have already assembled 1,000 repositories across all of science and engineering from 10,000,000 journal abstracts. For image objects (spatial maps), we are assembling 100 repositories across Southern California from 10,000 aerial photographs and satellite images. Experiments with large collections of both text and images will test whether concept spaces and category maps are effective for interactive retrieval across object types.
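Category maps are built on Kohonen maps, as noted in the Approach. As an illustration only, the following is a minimal textbook self-organizing map in plain Python: each document vector is matched to its closest grid cell (the best-matching unit), and that cell and its grid neighbors are nudged toward the vector, with a learning rate and neighborhood radius that shrink over training. The grid size, the linear schedules, the Gaussian neighborhood, and the four toy term vectors are assumptions chosen for brevity, not parameters from this project.

```python
import math
import random


def best_matching_unit(grid, x):
    """Grid cell whose weight vector is closest (Euclidean) to x."""
    return min(grid, key=lambda cell: math.dist(grid[cell], x))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map; returns {(r, c): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = max(rows, cols) / 2
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
            bmu = best_matching_unit(grid, x)
            for cell, w in grid.items():
                d = math.dist(cell, bmu)             # distance measured on the map grid
                if d <= radius:
                    h = math.exp(-(d * d) / (2 * radius * radius))  # Gaussian falloff
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return grid


# Hypothetical term-weight vectors over four index terms.
docs = {
    "d1": [1.0, 1.0, 0.0, 0.0],
    "d2": [1.0, 0.9, 0.0, 0.0],
    "d3": [0.0, 0.0, 1.0, 1.0],
    "d4": [0.0, 0.1, 1.0, 0.9],
}
grid = train_som(list(docs.values()), rows=3, cols=3)
cells = {name: best_matching_unit(grid, v) for name, v in docs.items()}
print(cells)
```

Documents with similar term profiles tend to land in the same or adjacent cells, while dissimilar ones are pushed apart. In this sketch, labelling each cell by its dominant terms would be the step that turns the trained grid into a browsable category map.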

The integration provided by the environment will enable users to issue composite queries that can only be answered by correlating information from multiple sources. Once the geography databases are incorporated, we expect the Interspace prototype to be able to answer queries such as the following:

"Find me an upscale residential area in Santa Barbara County that was not flooded in 1994."

"Find me information about orchards along the Santa Cruz River in Arizona."

"I plan to move my operation to Los Angeles County. Find me a site close to major highways but with a lot of green. It should have existing parking lots nearby and be close to city buildings. Hopefully, it will also be close to residential areas and schools."
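One way the terms of such a query might be broadened before they are matched against several sources is spreading activation, the meta-map technique listed in the Approach. The sketch below assumes a concept space already exported as a weighted term graph. Activation starts at the query terms, is multiplied by each edge weight and a decay factor as it crosses a link, and stops once pulses fall below a threshold. This is one simple variant (each term keeps its strongest incoming activation), and the graph, weights, and parameter values are hypothetical.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=10):
    """Spread activation from seed terms over a weighted term graph.

    graph: {term: {neighbor: association weight in [0, 1]}}
    seeds: {term: initial activation}
    Returns {term: activation} for every term at or above `threshold`.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for term, act in frontier.items():
            for nbr, weight in graph.get(term, {}).items():
                pulse = act * weight * decay  # activation weakens with every hop
                # Keep only pulses that are strong enough and that improve on what
                # the neighbor already holds; this also stops cycles from looping.
                if pulse >= threshold and pulse > activation.get(nbr, 0.0):
                    next_frontier[nbr] = max(next_frontier.get(nbr, 0.0), pulse)
        if not next_frontier:
            break
        activation.update(next_frontier)
        frontier = next_frontier
    return {t: a for t, a in activation.items() if a >= threshold}


# Hypothetical fragment of a concept space, seeded with terms from the
# Santa Cruz River query.
graph = {
    "orchard": {"citrus grove": 0.8, "agriculture": 0.6},
    "agriculture": {"irrigation": 0.7, "orchard": 0.5},
    "citrus grove": {"irrigation": 0.4},
    "river": {"floodplain": 0.9, "irrigation": 0.5},
}
expanded = spread_activation(graph, {"orchard": 1.0, "river": 1.0})
for term, act in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {act:.2f}")
```

Here "irrigation" is reached directly from "river" with activation 0.25; the weaker two-hop path through "agriculture" (0.105) does not raise it, so the expanded query keeps the strongest association for each term.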

BACKGROUND

The Net will be radically different ten years from now. There will be a billion repositories spread around the world. These repositories will contain the knowledge of communities ranging in size from a few co-located individuals to giant distributed organizations. Collaborative problem solving using the Net will be an everyday occurrence. There will be analysis environments enabling cross-correlation of information from a wide variety of sources across different subject domains. Collaboration will become the fundamental operation within the Net, supported by powerful facilities for dynamically grouping people and information. These advanced capabilities require a new information infrastructure supporting collaboration and user-centered systems. This proposal aims to bring to fruition ten years of research on developing such an infrastructure.

 

 

1. INNOVATIVE CLAIMS

2. DELIVERABLES

3. STATEMENT OF WORK

4. DESCRIPTION OF RESULTS, PRODUCTS, TRANSFERABLE TECHNOLOGY, AND TECHNOLOGY TRANSFER PATH

5. COLLABORATIONS

6. SCHEDULE AND MILESTONES

7. TECHNICAL RATIONALE, APPROACH, AND PLAN

8. COMPARISONS TO OTHER RESEARCH

9. KEY PERSONNEL

10. PREVIOUS ACCOMPLISHMENTS

11. BIBLIOGRAPHY

 

  ........