System Architecture
The Interspace system uses a layered design. The kernel contains
the generic application framework needed to build user information
applications. Beneath the kernel is the service layer, whose modules
provide the core functionality the kernel requires, including semantic
indexing and retrieval, full-text searching, and vocabulary switching.
The domain management module works closely with the kernel and has
many functions. Its primary function is to manage each individual
domain and the relationships among domains.
Each domain represents a collection of source units (SUs), or documents.
New information to be processed is added either to an existing domain
or to a new one. When information comes in, the domain creates a
corresponding set of SUs. Each source unit has an associated policy and
setting. The policy specifies the media parser to use and which parts of
the source unit to use. The setting gives additional information on how
the media parsing should be done, such as the trade-off between output
quality and parsing time. After each source unit is created, the MCE
extracts concepts from it. These concepts are then added to a list in
the domain, so the domain contains all concepts in its collection of SUs.
To support incremental computation,
the domain also keeps a list of all current concepts that have not yet
been processed. For example, suppose we add five SUs to a particular
domain and the domain performs the computation. If we later add another
three SUs, the domain does not repeat the whole computation but performs
an incremental computation instead. In addition to creating concepts,
the MCE also collects source-unit-level statistics for them, including
frequency and field type. The domain uses this information to compute
domain-level statistics such as frequency across SUs and average SU size.
All this statistical information is stored in a ConceptInDomain object,
which is created for each concept in the domain. These ConceptInDomain
objects are used in computing the category map and conceptspace
for this domain.
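
The incremental update described above can be sketched as follows. This
is only an illustration under stated assumptions, not the Interspace
implementation: the field names on ConceptInDomain, the Domain class, and
the idea of queuing per-SU concept frequencies in a pending list are all
hypothetical, and only two statistics are tracked.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptInDomain:
    """Domain-level statistics for one concept (field names are assumptions)."""
    name: str
    total_frequency: int = 0  # occurrences summed over all processed SUs
    su_count: int = 0         # number of processed SUs containing the concept


@dataclass
class Domain:
    """Hypothetical domain that folds in only the SUs not yet processed."""
    concepts: dict = field(default_factory=dict)  # name -> ConceptInDomain
    pending: list = field(default_factory=list)   # per-SU stats awaiting processing
    processed_sus: int = 0

    def add_source_unit(self, su_concept_freqs):
        """Queue one SU's concept frequencies, as an extractor might report them."""
        self.pending.append(dict(su_concept_freqs))

    def compute(self):
        """Update domain statistics from pending SUs only; return how many were handled."""
        handled = len(self.pending)
        for freqs in self.pending:
            for name, freq in freqs.items():
                cid = self.concepts.setdefault(name, ConceptInDomain(name))
                cid.total_frequency += freq
                cid.su_count += 1
        self.processed_sus += handled
        self.pending.clear()
        return handled
```

With this sketch, adding five SUs and calling compute() processes five;
adding three more and calling compute() again touches only those three,
while the statistics from the first five are kept as they were.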
Domains can also form relationships with each other, such as parent
and child relationships. The relationship information is obtained either
from a human source or from an automatic computation process. Our
system includes a module based on the SOM algorithms that generates
the domain relationships automatically. These relationships among domains
are very important because they form the basis for categorizing the
domains. Without them, the sheer number of domains that exist would make
them unusable.
The domain manager also acts as the link between the upper Interspace
kernel and the lower service layer modules. It translates requests from
the kernel into calls to the service layer modules and then passes
the results back to the kernel. The kernel and the domain management
module can be on different machines, and the communication protocol
between them is designed to handle that. The system is also designed
for many-to-many communication, so many kernels can talk to many domain
management modules. Currently, we use a polling protocol on several
work queues stored in a persistent data store for transferring data
between the kernel and the domain management module.
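
A polling protocol over persistent work queues might look roughly like
the sketch below. It is an assumption-laden illustration rather than the
actual protocol: SQLite stands in for the persistent data store, and the
table name work_queue, the queue names, and the functions open_store,
enqueue, and poll are all hypothetical. It also assumes a single poller
per queue; several concurrent pollers would need stricter locking.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the data store and create the (hypothetical) work queue table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " queue TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " taken INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, queue, payload):
    """Append a message to the named queue."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO work_queue (queue, payload) VALUES (?, ?)",
            (queue, payload),
        )


def poll(conn, queue):
    """Return the oldest untaken payload on `queue`, or None if it is empty."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM work_queue"
            " WHERE queue = ? AND taken = 0 ORDER BY id LIMIT 1",
            (queue,),
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE work_queue SET taken = 1 WHERE id = ?", (row[0],))
        return row[1]
```

In this arrangement a kernel would enqueue requests on one queue and
poll another for results, while a domain manager does the reverse. Since
both sides only read and write the shared store, they need not run on the
same machine or know how many peers exist.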
In addition to the functions discussed above, the domain manager also
performs other system support functions. Currently these include a data
naming service that provides transparent access to data regardless of
where it is physically stored. In the future, additional functions such
as performance statistics collection, domain backup and restore, and a
domain-naming service may be added.
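
The idea behind the data naming service, location-transparent access,
can be illustrated with a very small sketch. The class and method names
here are hypothetical, as are the example locations; the real service is
not described in enough detail to reproduce.

```python
class DataNamingService:
    """Hypothetical registry mapping logical data names to physical locations."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, physical_location):
        """Record (or update) where the named data currently lives."""
        self._locations[logical_name] = physical_location

    def resolve(self, logical_name):
        """Return the current physical location for a logical name."""
        try:
            return self._locations[logical_name]
        except KeyError:
            raise LookupError(f"no data registered under {logical_name!r}") from None
```

Callers refer to data only by its logical name, so the data can be moved
by re-registering it without changing any caller.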
Conrad Chang
CANIS - Community Architecture for Network Information Systems
University of Illinois at Urbana-Champaign, Champaign, IL 61820
t-chang2@canis.uiuc.edu