
Concept spaces are our main knowledge repositories: knowledge obtained from the various tools is stored in them. Our main concern is to have a model that is both extensible and efficient. Because unanticipated needs will surely arise in the future, we want an object model that is general enough to accommodate them easily. Because we are dealing with gigabytes of data and a large amount of knowledge extracted from that data, efficient storage and retrieval are important.

Here we describe the object model that we developed. We show the class diagram in UML notation.

We have 20 classes in our model. Their descriptions and their relationships with other classes are listed below:

SourceUnit: This is the abstract class for a basic unit of information source in the model. In general, object instances are created from subclasses of this class. Currently, there are four subclasses: TextSU, MapSU, AudioSU, and VideoSU. Each SourceUnit has a specific SUPolicy and an SUSetting. SUPolicy specifies the type of SourceUnit. SUSetting specifies the quality of service that should be used in the multimedia concept extraction phase. A SourceUnit also contains a list of the concepts associated with it.

TextSU: This is a subclass of SourceUnit. It specifies that the information source is of textual type.

MapSU: This is a subclass of SourceUnit. It specifies that the information source is of image type.

AudioSU: This is a subclass of SourceUnit. It specifies that the information source is of audio type.

VideoSU: This is a subclass of SourceUnit. It specifies that the information source is of video type.
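The SourceUnit hierarchy described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the attribute names (name, policy, setting, concepts) and the media_type method are hypothetical and do not come from the model itself.

```python
from abc import ABC, abstractmethod


class SourceUnit(ABC):
    """Abstract basic unit of information source (sketch only)."""

    def __init__(self, name, policy=None, setting=None):
        self.name = name          # hypothetical identifier for the source
        self.policy = policy      # stands in for an SUPolicy object
        self.setting = setting    # stands in for an SUSetting object
        self.concepts = []        # concepts associated with this unit

    @abstractmethod
    def media_type(self):
        """Return the kind of media this source unit holds."""


class TextSU(SourceUnit):
    def media_type(self):
        return "text"


class MapSU(SourceUnit):
    def media_type(self):
        return "image"


class AudioSU(SourceUnit):
    def media_type(self):
        return "audio"


class VideoSU(SourceUnit):
    def media_type(self):
        return "video"
```

Because SourceUnit is abstract, only its subclasses can be instantiated, which matches the statement that object instances are created from subclasses.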

SUPolicy: SUPolicy specifies the media parser for a source unit. Depending on the type of the SourceUnit, a different parser could be used. For instance, an HTML SourceUnit would need an HTML media parser. In addition, the SUPolicy specifies the parts of the SourceUnit that are useful for concept extraction and computation.

HTMLPolicy: This is a subclass of SUPolicy. It specifies settings for HTML SourceUnits.

RawTextPolicy: This is a subclass of SUPolicy. It specifies settings for raw text SourceUnits. Raw text is basically any SourceUnit without a type.

SUSetting: SUSetting specifies the quality of service for concept extraction. Concept extraction requires intensive computation. This setting provides a means to choose an appropriate tradeoff between computation time and quality of results.
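To make the roles of SUPolicy and SUSetting concrete, the following is a small sketch, assuming that a policy exposes a single method that returns the text useful for concept extraction. The method name extract_text, the helper class _TextCollector, and the quality field are hypothetical; the HTML handling uses Python's standard html.parser module and is only a stand-in for a real media parser.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects the text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


class SUPolicy(ABC):
    @abstractmethod
    def extract_text(self, raw):
        """Return the part of the source useful for concept extraction."""


class HTMLPolicy(SUPolicy):
    def extract_text(self, raw):
        collector = _TextCollector()
        collector.feed(raw)
        collector.close()
        return " ".join(collector.chunks)


class RawTextPolicy(SUPolicy):
    def extract_text(self, raw):
        return raw  # untyped sources are passed through unchanged


@dataclass(frozen=True)
class SUSetting:
    # Hypothetical knob: higher values trade computation time for quality.
    quality: int = 1
```

In this sketch a SourceUnit would hold one policy object and one setting object, so supporting a new media format only requires a new SUPolicy subclass.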

Domain: Domain is the connection point between the various servers in the service layer and the kernel. Its function is to group SourceUnits into similar domain areas. For example, we might have a domain for Cancer, which would contain a collection of SourceUnits about cancer. In addition, the domain contains domain-specific information about the concepts extracted from each SourceUnit. This information is stored in ConceptInDomain. An example is statistics, such as the co-occurrence of concepts in SourceUnits, which are needed for concept space generation. Each domain is also associated with a single concept space. In addition, a domain can contain multiple subdomains. For this reason, domain objects can be used to form multiple hierarchies for categorization purposes.
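The way domains group source units and nest into hierarchies can be illustrated with a short sketch. The method names (add_subdomain, walk, all_source_units) and the "Lung cancer" subdomain are hypothetical, and source units are represented by plain strings for brevity.

```python
class Domain:
    """Sketch of a domain that groups source units and nests subdomains."""

    def __init__(self, name):
        self.name = name
        self.source_units = []   # source units assigned directly to this domain
        self.subdomains = []     # child domains
        self.concept_space = None  # each domain has a single concept space

    def add_subdomain(self, child):
        self.subdomains.append(child)
        return child

    def walk(self):
        """Yield this domain, then every domain below it, depth first."""
        yield self
        for child in self.subdomains:
            yield from child.walk()

    def all_source_units(self):
        """Collect source units from this domain and all its subdomains."""
        return [su for domain in self.walk() for su in domain.source_units]


cancer = Domain("Cancer")
lung = cancer.add_subdomain(Domain("Lung cancer"))
cancer.source_units.append("overview.html")
lung.source_units.append("lung-study.txt")
```

Here cancer.all_source_units() gathers the units of the Cancer domain together with those of its subdomain, which is the kind of traversal a categorization hierarchy needs.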

Concept: Concepts represent semantic meaning. Each concept is unique and can appear in many domains and concept spaces. Concepts do not contain any computational information, which is held in the ConceptInCS and ConceptInDomain classes.

Representation: This is an abstract class that represents the appearance of a concept. It has many subclasses, each of which is a different type of representation. A concept can have multiple representations. For example, the concept "dog" could be represented by the string "dog" or by a picture of a dog. Both are representations of the same idea.

StringRep: This is a subclass of Representation. It specifies a textual representation.

MapRep: This is a subclass of Representation. It specifies an image map representation.
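The relationship between a concept and its representations, using the "dog" example above, could look like the following sketch. The describe method, the concept_id attribute, and the file name dog.png are hypothetical and serve only to illustrate the structure.

```python
from abc import ABC, abstractmethod


class Representation(ABC):
    @abstractmethod
    def describe(self):
        """Return a short description of this representation."""


class StringRep(Representation):
    def __init__(self, text):
        self.text = text

    def describe(self):
        return f"string:{self.text}"


class MapRep(Representation):
    def __init__(self, image_path):
        self.image_path = image_path  # hypothetical reference to an image

    def describe(self):
        return f"image:{self.image_path}"


class Concept:
    """A unique unit of meaning that may have several representations."""

    def __init__(self, concept_id):
        self.concept_id = concept_id
        self.representations = []


dog = Concept("dog")
dog.representations.append(StringRep("dog"))
dog.representations.append(MapRep("dog.png"))
```

Both entries in dog.representations refer to the same idea, so the concept itself stays unique while its appearances vary.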

ConceptSpace: This class contains a collection of concepts extracted from a list of SourceUnits. It is the basic unit of analysis and computation when the concept space algorithm is applied. We discuss the algorithm in Section 5.1.

ConceptInCS: This is the computational part of a concept in a ConceptSpace. Since each concept can appear in multiple concept spaces, information related to a concept with respect to a particular concept space is stored in this class. This information includes relationships with other concepts in the same concept space.

ConceptInDomain: This is the computational part of a concept in a Domain. Since each concept can appear in multiple domains, information related to a concept with respect to a particular domain is stored in this class. This information includes relationships with other concepts in the same domain.
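The separation between a shared Concept and its context-specific data can be sketched as follows for the concept space case; ConceptInDomain would follow the same pattern. The method names entry_for and relate, the weights, and the concept space names are hypothetical.

```python
class Concept:
    def __init__(self, name):
        self.name = name


class ConceptInCS:
    """Data about one shared Concept within one particular concept space."""

    def __init__(self, concept):
        self.concept = concept
        self.related = {}  # other concept name -> relationship weight

    def relate(self, other, weight):
        self.related[other.name] = weight


class ConceptSpace:
    def __init__(self, name):
        self.name = name
        self.entries = {}  # concept name -> ConceptInCS

    def entry_for(self, concept):
        """Return the per-space record for a concept, creating it if needed."""
        if concept.name not in self.entries:
            self.entries[concept.name] = ConceptInCS(concept)
        return self.entries[concept.name]


tumor = Concept("tumor")
therapy = Concept("therapy")
oncology = ConceptSpace("oncology")
radiology = ConceptSpace("radiology")
oncology.entry_for(tumor).relate(therapy, 0.8)
radiology.entry_for(tumor).relate(therapy, 0.2)
```

The same tumor object is referenced from both concept spaces, while the relationship weights differ per space. This is the reason the computational information is kept out of Concept itself.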

Cooccurrence: This class contains information about one specific type of relationship that concepts might have with each other, namely the cooccurrence matrix. The cooccurrence matrix stores information about whether concepts co-appear with each other in a particular sourceunit.
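As an illustration of the kind of information such a matrix holds, the sketch below counts, for each pair of concepts, the number of source units in which both appear. It is not the concept space algorithm itself; the function name cooccurrence_counts and the sample data are hypothetical, and the matrix is stored sparsely as a counter keyed by concept pairs.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(concepts_by_unit):
    """Count in how many source units each pair of concepts co-appears.

    concepts_by_unit maps a source unit name to the concepts found in it.
    Pairs are stored as alphabetically ordered tuples.
    """
    counts = Counter()
    for concepts in concepts_by_unit.values():
        # Deduplicate so a pair is counted at most once per source unit.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


units = {
    "su1": ["cancer", "tumor", "therapy"],
    "su2": ["cancer", "tumor"],
}
counts = cooccurrence_counts(units)
```

With this sample data, the pair ("cancer", "tumor") co-appears in two source units and every other pair in one.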

 

 

 
