"The uncertainty about how to search text, and how to interpret it if we do search it, represents the principal difference between database management systems, such as dBase IV or SQL, and information retrieval systems, such as DIALOG or MEDLARS."
- Charles T. Meadow, "Text Information Retrieval Systems"
The Interspace Analysis Environment will provide users with the means to directly manipulate and combine arbitrary information retrieval capabilities, whether they be traditional techniques such as full text search or advanced techniques such as the exploration of the term cooccurrence relationships provided in the Interspace Services. The emphasis on "environment" indicates that we wish to give users a level of control and power over their work comparable to that enjoyed by the programmer, as opposed to more typical application-oriented user interfaces, which give the user only those abilities that the system programmer deemed necessary.
In order to give the user this level of control, we are taking an object-oriented approach to the construction of the Analysis Environment's direct-manipulation user interface. In this approach, the objects presented to the user are meant to embody as closely as possible the underlying concepts found in the user semantic model, that is, the model which we wish to present to the user as if it were the actual system, regardless of the actual implementation of the underlying system. This is a particularly good match for the Interspace model, as it is being implemented using object-oriented techniques and therefore provides a close match between the actual system model and our user semantic model.
In our object-oriented direct manipulation user interface, we emphasize the inherent capabilities of the presented objects. Applications are defined more by the objects found within them than by the windows in which they appear. This approach has the advantage that difficult-to-share application-specific capabilities are minimized, while object-specific capabilities are more readily identified and reused throughout the environment, independent of any one particular "application". For instance, within the Interspace model a term which has been extracted from the documents of a given domain has a number of inherent relationships with other objects, such as its appearance in a number of documents, its cooccurrence with other terms found in the documents of the domain, and its use by particular authors. No matter where a term is presented within the analysis environment, these remain its inherent capabilities and they are always available to the user.
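As a rough illustration of what "inherent capabilities" might mean for a term, the following minimal sketch gives a term object its three relationships (documents, cooccurring terms, authors) independently of any window or application. The class names, the `index` helper, and the representation of authors as plain strings are assumptions made for this example; they are not taken from the Interspace implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through the cyclic links
class Document:
    title: str
    authors: list = field(default_factory=list)  # author names (strings, for brevity)
    terms: list = field(default_factory=list)    # Term objects extracted from this document


@dataclass(eq=False)
class Term:
    text: str
    documents: list = field(default_factory=list)  # documents in which the term appears

    def cooccurring_terms(self):
        """Terms that share at least one document with this term."""
        found = []
        for doc in self.documents:
            for other in doc.terms:
                if other is not self and other not in found:
                    found.append(other)
        return found

    def authors(self):
        """Names of authors who have used this term."""
        names = []
        for doc in self.documents:
            for name in doc.authors:
                if name not in names:
                    names.append(name)
        return names


def index(doc, *terms):
    """Hypothetical helper: record that each term appears in doc."""
    for term in terms:
        doc.terms.append(term)
        term.documents.append(doc)
```

Because the relationships live on the term itself, any part of the environment that holds a `Term` can ask it for its documents, cooccurring terms, or authors without application-specific code.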
To be convincing in our illusion of the user semantic model while simultaneously giving the user great control over that illusion, we follow these three guidelines:
Be sure that visually distinctive interface objects behave the same in all presentation contexts. Challenge: in the near term, we will be presenting textual phrases. But some of those phrases are extracted from the corpus, some are from concept spaces, some are from human-generated thesauri, and some might even be provided by the user. Each of these kinds of phrase has special characteristics which make it distinct from the others. But they all LOOK LIKE phrases. This is a point of possible confusion to the user. Our current strategy is to eliminate these distinctions and provide users with one kind of phrase only, but with varying capabilities. This is similar to the use of "greyed-out" menu items, which present a consistent set of capabilities even though some of them are, for one reason or another, unavailable. There will be additional visual cueing to indicate, without detailed inspection, the nature of the capabilities any given phrase might have. While we cannot provide all phrases with a uniform set of capabilities, we can provide a uniform set of potential capabilities.
Anytime, anywhere, the user is allowed to do anything that is within the capabilities of the presented objects and tools. This is a particularly challenging goal. For example, in phrase relationships alone we have (as mentioned above) numerous classes of inter-phrase relationship, each orthogonal to the others. A user may wish to explore phrase cooccurrence in one moment, then shift into exploring narrower/broader subject relationships in the next, only to switch to author or document relationships after that. This must all be allowed, because that is what the user will want to do.
The third guideline concerns the establishment of context and the cogent presentation of its evolution. This conflicts with the previous goal, since establishing context is at odds with the notion that the user may at any moment decide to switch contexts entirely. Context is built as the user navigates the information space via a series of queries, some of which depend upon previous queries to bound the space over which new queries are issued. Queries that depend on earlier ones continue to establish a user-oriented context; those that do not potentially break it. Anything could happen, anytime. Devising a scheme by which context is built and broken in a fluid, coherent manner is among the more challenging aspects of the analysis environment's design.
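The tension between building and breaking context can be made concrete with a small sketch. It assumes, purely for illustration, that a context is a chain of query results in which each dependent query searches only within the result of the previous one, while an independent query discards the chain. The `Context` class and its method names are hypothetical and do not describe the scheme actually used in the analysis environment.

```python
class Context:
    """The chain of queries that bounds the space searched by the next query."""

    def __init__(self):
        self.history = []  # (label, result list) pairs, oldest first

    def current_bounds(self):
        """Result of the most recent query, or None if no context exists yet."""
        return self.history[-1][1] if self.history else None

    def refine(self, label, candidates, predicate):
        """Dependent query: search only within the current bounds."""
        bounds = self.current_bounds()
        space = candidates if bounds is None else bounds
        result = [item for item in space if predicate(item)]
        self.history.append((label, result))
        return result

    def break_context(self, label, candidates, predicate):
        """Independent query: discard the accumulated context and start afresh."""
        self.history = []
        return self.refine(label, candidates, predicate)

    def trail(self):
        """Labels of the queries that make up the current context."""
        return [label for label, _ in self.history]
```

The `trail` method hints at the "cogent presentation" half of the guideline: whatever scheme is chosen, the user should be able to see which queries currently bound their search.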
Components of the Analysis Environment are built according to a layered strategy. Each layer corresponds to a particular role or task required by the analysis environment, and objects are developed to implement one or more of these roles. Beginning at the lowest level, these layers are:
The system model is the way the underlying system is actually implemented.
The user semantic model layer assures that, in the model we present to the user, the inter-object relationships have actual semantic meaning. That is, if a user should ask an object for a list of categories in which that object appears, they will actually receive a list of category objects and not just a series of strings with the names of the categories. The main difference is that the category objects are self-sufficient and the user could then query these directly, whereas a list of strings is entirely dependent upon application-specific interpretation in order to have any meaning at all. Outside the context of an application, the strings would be meaningless, which would prevent the user from combining these categories with other objects to construct their own analysis capabilities.
The user semantic model assures us that semantically-meaningful object relationships are presented to the user. However, additional capabilities which are dependent upon the display technologies used also need to be provided to the user. Since these are not an inherent capability of either the system or user semantic models, we provide a layer of wrappers which provide a means of presenting the semantic objects in a manner consistent with the display technology, and a means of giving the user access to the inherent capabilities of the underlying semantic objects as well as the presentational aspects of the interaction/presentation wrappers.
The Interaction/Presentation wrappers require some form of medium in order to present themselves, and are therefore tightly coupled with a set of objects which define various simple generic media, such as List, Text, or Graphics. Corresponding to each Media there exists a set of objects in the role of an I/P wrapper which may be presented on that medium. Media are not specific to any particular application.
Although we wish to emphasize the primacy of the user semantic model, there remains a use for application-specific browsers. These are intended to provide a place for application-specific code to be written (preferably very little of it) while making use of generic Media components. An example would be a Browser having both List and Text media, with the List used to present search results and the Text to present the text of a document selected from the List.
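The layering described above might be sketched as follows: semantic objects that return other objects rather than strings, a wrapper that adapts them to a medium, generic List and Text media, and a Browser holding the small amount of application-specific glue. All class and method names here are illustrative assumptions, and the system model layer is omitted; the actual environment is written in Smalltalk and its classes are not shown in this document.

```python
class Category:
    """Semantic object: a category that can be queried directly."""

    def __init__(self, name):
        self.name = name
        self.documents = []


class Document:
    """Semantic object: a document that knows the categories it appears in."""

    def __init__(self, title, text):
        self.name = title
        self.text = text
        self._categories = []

    def file_under(self, category):
        self._categories.append(category)
        category.documents.append(self)

    def categories(self):
        # Category objects are returned, not just their names as strings.
        return list(self._categories)


class ListEntry:
    """Interaction/Presentation wrapper: presents any named object on a List."""

    def __init__(self, subject):
        self.subject = subject  # the wrapped semantic object stays reachable

    def label(self):
        return self.subject.name


class ListMedium:
    """Generic medium holding wrapped objects; knows nothing about applications."""

    def __init__(self):
        self.entries = []

    def show(self, objects):
        self.entries = [ListEntry(obj) for obj in objects]

    def labels(self):
        return [entry.label() for entry in self.entries]

    def pick(self, position):
        return self.entries[position].subject


class TextMedium:
    """Generic medium holding a block of text."""

    def __init__(self):
        self.content = ""

    def show(self, text):
        self.content = text


class Browser:
    """Application-specific glue: a result List plus a Text view."""

    def __init__(self):
        self.results = ListMedium()
        self.reader = TextMedium()

    def present(self, documents):
        self.results.show(documents)

    def select(self, position):
        document = self.results.pick(position)
        self.reader.show(document.text)
        return document
```

Note that the object returned by `select` is still a full semantic object, so its categories can be handed to another `ListMedium` and explored in turn; only `Browser` contains anything specific to this one application.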
The layered architecture we are using in development of the Analysis Environment has allowed us to focus on developing particular layers while being somewhat independent of the maturity of the components in the other layers. We've used this feature to emphasize the development of the lower layers, from the system model up through the Interaction/Presentation layers, while relying on a simple textual implementation of the Presentation half of the I/P layer through to the Browser layer. This initial phase of development also has a name: the "Interspace Services Technology Demonstrator, Mk. I". The second phase of development (now underway) will focus on developing the upper layers of the architecture. This will employ an entirely different form of display technology, based fully upon a graphical direct manipulation user interface toolkit known as "Morphic", originally developed in the Self language but now ported into the dialect of Smalltalk used to build the analysis environment. This phase is distinguished with the Mk. II designation for ISTD. Both of these phases are considered to be leading up to the final realization of the Analysis Environment, which might also be thought of as ISTD Mk. III.
The goals of ISTD Mk. I were to identify and develop the roles and interaction patterns of the lower level infrastructure needed by the analysis environment. This is sufficiently complete that we have now shifted emphasis to the upper levels, which are the province of ISTD Mk. II. Lower level work will continue as needed, in response to needs identified by changes in the upper layers of the analysis environment architecture.
Presently ISTD Mk. I is working with real information held in object databases built according to the system model for the Interspace Services. An example session using ISTD Mk. I is shown in the following screendump, which contains, from left to right:
Domains is a window listing select root domains and their subdomains, as exposed by the user. From this list, two were chosen, one to be the source and the other the target for this switching demonstration.
Colorectal Neoplasms, Hereditary Nonpolyposis and its accompanying Search box are two windows for browsing this domain, which contains documents on a form of hereditary cancer. In the search box, we have used the term hereditary cancer as a starting point. The first level of terms shown beneath the title are terms matching this search query. The second level consists of terms which are related via cooccurrence to the phrase Hereditary nonpolyposis colorectal cancer; in that list, the phrase mismatch repair genes appeared to be interesting. Terms related to mismatch repair genes are further exposed as the third level of terms.
Genes, Regulator and its accompanying search and switching boxes are windows corresponding to another domain of knowledge in which the user is interested in possible connections between regulating genes and hereditary cancer. The user has already tried a search for mismatch repair genes, but this had no result. Instead, they have made use of a rudimentary vocabulary switch based on a permuted lexical exploded search for the phrase mismatch repair genes and all phrases related via cooccurrence within the domain "Colorectal Neoplasms, Hereditary Nonpolyposis". Results of this switch form the first level of phrases in the "Genes, Regulator" window.
The term polymerase chain reaction technique appeared to be an interesting topic, one that is in both domains, and perhaps somehow related to mismatch repair genes. The second level of items in the list (the underlined items) represents the documents in which the term polymerase chain reaction technique was found. From that list, the document titled "Regulation by CDF/LIF and retinoic acid of multiple ChAT mRNAs produced from distinct promoters" was selected in order to view its abstract.
Within the abstract, a modest hint about an associative hypertext ability exists in the form of the buttons upon which the author names are given. These could launch other views onto the selected author, and similar abilities are planned for the text within the documents so that the user would be informed of the phrases within the document which have become part of the domain and/or the domain's concept space.
As it turns out, this was probably not a productive line of search, but then that is also part of the search process. The environment allows the user to pursue differing lines of attack without getting locked into any particular path: at any point, any object reachable by the user can serve as a context-breaking point via that object's inherent relationships with other objects in the environment. Alternatively, the path taken by the user in exposing the various (and self-describing) relationships can be used to form a context enabling the user to narrowly specify a particular query as a natural side-effect of browsing through the space of concepts. This context-forming ability is not demonstrated explicitly here, but is an existing ability within ISTD.
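The rudimentary vocabulary switch used in the session above is not specified in detail here, so the following sketch is only one possible reading of it. It assumes that "exploded" means expanding the starting phrase with the phrases that cooccur with it in the source domain, and that "permuted lexical" means matching phrases in the target domain while ignoring word order and case. The function names and the sample data in the usage note are invented for illustration and are not results from ISTD.

```python
def normalize(phrase):
    """Word-order-insensitive, case-insensitive key for a phrase (assumed reading of 'permuted')."""
    return frozenset(phrase.lower().split())


def vocabulary_switch(phrase, source_cooccurrence, target_phrases):
    """Return target-domain phrases that lexically match the exploded phrase set.

    source_cooccurrence maps a phrase to the phrases that cooccur with it
    in the source domain; target_phrases lists the phrases of the target domain.
    """
    # Explode: the starting phrase plus everything related to it by cooccurrence.
    exploded = [phrase] + list(source_cooccurrence.get(phrase, []))
    keys = {normalize(p) for p in exploded}
    # Switch: keep target phrases whose words match some exploded phrase.
    return [p for p in target_phrases if normalize(p) in keys]
```

For example, with a source domain in which "mismatch repair genes" cooccurs with "polymerase chain reaction technique", and a target domain containing "Polymerase Chain Reaction Technique" but not "mismatch repair genes", the direct phrase finds nothing while the switch still returns the target domain's spelling of the cooccurring phrase.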