OBJECTIVE
We plan to build a complete prototype environment for semantic indexing
of multimedia information and to evaluate its utility on real collections.
The semantic indexing relies on statistical clustering for concepts and
categories. Interactive navigation using the semantic indexing enables
information retrieval at a deeper level than previously possible for diverse
large collections. We plan to develop the algorithms for concept spaces
and category maps, then test their utility on engineering literature,
on map images, and on medical literature. The Interspace thus created
will enable interactive semantic interoperability across subject domains
and media types.
APPROACH
Our research concentrates on building information analysis environments
for distributed object repositories. In particular, the new infrastructure
will support semantic retrieval and semantic mapping across object types
and subject domains. This is a concrete approach to the Grand Challenge
of Digital Library Research, which is "interoperability at a deep
semantic level, providing digital library users with a coherent view of
heterogeneous autonomously managed resources" according to the Information
Infrastructure Technology and Applications (IITA) committee.
The Interspace framework supports integrated interoperability of semantic
services based on statistical algorithms for information management. Our
suite of statistical algorithms will not only be complete (in the sense
of mimicking all the standard manual classification techniques with automatic
procedures) but also computationally feasible. We have done the foundational
research identifying suitable algorithms for: concept spaces (co-occurrence
matrices), category spaces (Kohonen maps), meta-data generation (Hopfield
nets), and meta-map generation (spreading activation). The research in
this proposal will evaluate each algorithm on a suitable large collection
and evolve the implementation until adequate speed and reliability are
reached.
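
To make the concept-space idea concrete, the following is a minimal sketch of a co-occurrence computation, not the proposal's actual implementation. It assumes each document has already been reduced to a list of terms, and it assumes a simple asymmetric weight (documents containing both terms divided by documents containing the query term); the proposal does not specify a weighting. The names `build_concept_space` and `related_terms` and the sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(documents):
    """Count document frequency per term and per unordered term pair."""
    term_freq = Counter()
    pair_freq = Counter()
    for terms in documents:
        unique = sorted(set(terms))  # each term counts once per document
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return term_freq, pair_freq


def related_terms(term, term_freq, pair_freq):
    """Rank co-occurring terms by an assumed weight:
    (documents containing both terms) / (documents containing `term`)."""
    scores = {}
    for (a, b), count in pair_freq.items():
        if a == term:
            scores[b] = count / term_freq[term]
        elif b == term:
            scores[a] = count / term_freq[term]
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy collection standing in for indexed abstracts.
docs = [
    ["neural", "network", "retrieval"],
    ["neural", "network", "clustering"],
    ["retrieval", "indexing"],
]
tf, pf = build_concept_space(docs)
ranked = related_terms("neural", tf, pf)
print(ranked)
```

In this toy collection, "network" ranks first for "neural" because the two terms appear together in every document that contains "neural". The pair counts grow quadratically with the number of distinct terms per document, which is one reason computational feasibility matters for large collections.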
We propose a set of major experiments to test the scalability of the Interspace
prototype. Each experiment will test a part of the information infrastructure
using today's high-performance technology, which will be the everyday technology
ten years hence. These information infrastructure experiments will be
performed using the unique advanced computational infrastructure at NCSA
(National Center for Supercomputing Applications), partially funded by
ARPA ITO.
On the repository end, large collections will be used to simulate the
world of a billion community repositories. For text objects (technical
documents), we have already assembled 1000 repositories across all of
science and engineering from 10,000,000 journal abstracts. For image objects
(spatial maps), we are assembling 100 repositories across Southern California
from 10,000 aerial photographs and satellite images. Experiments with
large collections of both text and image will test whether concept spaces
and category maps are effective for interactive retrieval across object
types.
The integration provided by the environment will enable users to issue
composite queries which require multiple correlations between multiple
sources to answer. Examples of queries that we expect the Interspace prototype
to be able to answer after the geography databases are incorporated
include the following: "Find me an upscale residential area in Santa
Barbara county which was not flooded in 1994." "Find me
information about orchards along the Santa Cruz river in Arizona."
"I plan to move my operation to Los Angeles county. Find me a site
close to major highways but with a lot of green. It should have existing
parking lots nearby and be close to city buildings. Hopefully, it will
also be close to residential areas and schools."
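
As a rough illustration of what "multiple correlations between multiple sources" could mean operationally, the sketch below answers a composite query by intersecting the region sets returned by several sources and subtracting those returned by an exclusion source. This is only an assumed simplification: the function `answer_composite_query`, the three sources, and the region identifiers are all hypothetical, and the proposal does not describe its query mechanism at this level.

```python
def answer_composite_query(required, excluded):
    """Keep region ids found in every `required` set and in no `excluded` set."""
    if not required:
        return set()
    candidates = set.intersection(*map(set, required))
    for source in excluded:
        candidates -= set(source)
    return candidates


# Hypothetical results from three independent sources (invented ids).
upscale_residential = {"region-1", "region-2", "region-3"}  # land-use source
in_santa_barbara = {"region-2", "region-3", "region-4"}     # county-boundary source
flooded_1994 = {"region-3"}                                 # flood-record source

matches = answer_composite_query(
    required=[upscale_residential, in_santa_barbara],
    excluded=[flooded_1994],
)
print(matches)
```

Each set here stands in for the answer a single repository would give on its own; the composite answer only emerges once the results are correlated across sources.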
BACKGROUND
The Net will be radically different ten years from now. There will be
a billion repositories spread around the world. These repositories will
contain the knowledge of communities ranging in size from a few co-located
individuals to giant distributed organizations. Collaborative problem
solving using the Net will be an everyday occurrence. There will be analysis
environments enabling cross-correlation of information from a wide variety
of sources across different subject domains. Collaboration will become
the fundamental operation within the Net. There will be powerful facilities
for dynamically grouping people and information. These advanced capabilities
require a new information infrastructure supporting collaboration and
user-centered systems. This proposal is about bringing to fruition 10
years of research on developing such an infrastructure.
3. STATEMENT OF WORK
4. DESCRIPTION OF RESULTS, PRODUCTS, TRANSFERABLE TECHNOLOGY, AND TECHNOLOGY TRANSFER PATH
5. COLLABORATIONS
6. SCHEDULE AND MILESTONES
7. TECHNICAL RATIONALE, APPROACH, AND PLAN
8. COMPARISONS TO OTHER RESEARCH
9. KEY PERSONNEL
10. PREVIOUS ACCOMPLISHMENTS
11. BIBLIOGRAPHY