
TELESOPHY

Bruce R. Schatz
August 1985


ABSTRACT

The vision of universal and uniform access to all the world's knowledge has intrigued people for a long time. Recent advances in hardware and software technology make it possible to consider constructing an information retrieval and storage system for the WorldNet, the communications network comprising "all" the world's computers. Such advances include: fiber optics, graphics workstations, object systems, electronic mail, and bibliographic databases.

This report is a feasibility study for a system providing transparent knowledge manipulation for different types of data stored in different physical locations. It outlines, from abstract philosophy to concrete technology, an approach to building an environment to support "telesophy", or "wisdom at a distance". The conclusion is that, in the foreseeable future, it will be technically and economically feasible to build a system to support browsing and sharing inside a worldwide information space.


AUTHOR'S NOTE

This report was originally written to order my thoughts on the philosophy and feasibility of a system that provides transparent universal manipulation of knowledge. A draft dated August 1984 was polished and informally circulated among a few co-workers. The demand for this document grew far beyond my original intentions and, eventually, several hundred copies were distributed.

In the meantime, the Telesophy Project has become an official project at Bell Communications Research. The staff has grown to five people, and a series of prototypes is being implemented on a network of bit-mapped workstations with a sample spectrum of databases.

This report has now been packaged for external distribution. Only smoothing and minor corrections of the original draft have been made, with little attempt to reflect progress since then. I hope that this report will be only the first of many describing such progress towards understanding and propagating the ideas embodied by telesophy.

Bruce R. Schatz
August 1985


PREFACE

This is a report on the feasibility of a system. It describes the basic philosophy and technology behind a new type of system for information management. The focus is on what the basic model of the system is and how an architecture can be laid out to implement this. Why this model accurately reflects real-world problems and why it will be possible to build the system are the underlying currents throughout the report.

The tone of the report is accordingly meant to convey the flavor of this system. The arguments, while solidly based, are made with a broad stroke. The intent is to show what this new kind of system could be and why advancing technology will soon make its implementation completely feasible. Thus the argument moves through each topic that must be considered in designing such a system and presents brief discussions of the major points.

The merging of computing and communications is a topic that has received much publicity in recent years. But little has appeared that describes a single system combining the advantages of both styles of operating. The system described here is a serious attempt at exactly that. It represents a combination of existing techniques into a new style of thinking: Telesophy is a State of Being.

The main theme in this report is that achieving telesophy involves no technological breakthroughs, merely the appropriate combination of existing and soon-to-exist technology. The functionality of an operating environment depends on its underlying technology assumptions. Telesophy is an instance of the type of support for information usage that can be provided with the availability of fiber optics, graphics workstations, object systems, electronic mail, and bibliographic databases. An attempt has been made to scale the architectural assumptions to begin to make such a system economically feasible for businesses in, say, 5-10 years and for residences in 15-20.

When the wideband networks extend worldwide, the telesophy operating environment will make them into a transparent WorldNet. Constructing a portal to navigate the WorldNet is thus almost certainly possible. In a free market economy in nations moving towards becoming information societies, this means that it almost certainly will appear in the near future. Telesophy is intended to be a system for the masses.

The undercurrents in this report reflect a philosophy of the design of successful systems. Systems are tools for people - to be widely successful, they must satisfy some fundamental human need significantly better than existing solutions. An explicit attempt has been made to chart this satisfying mapping from properties of human needs to properties of the system model to properties of the system architecture. This sort of attempt is particularly important in the design of computing systems, which, as evidenced by the telesophy environment, are perhaps the first type of human tools providing significant intellectual support. Thus what may appear to be an overemphasis on philosophical issues in this report is instead indicative of the coming approach to the design of systems: the architect must be as much a philosopher or artist as an engineer.

This belief also explains the timing of this report, coming as it does, before experience with building or using the system. The semantic basis of a system, its model, is the major determining factor behind widespread adoption. Systems with inadequate models are doomed to failure. Thus carefully laying out the underlying philosophy is asserted to be a necessary component of successful systems. One analogy is that of motion pictures, where good production values can at best jazz up a bad script. This report could be viewed as the script for a system - its realization remains in the future.

An attempt is made here to introduce aesthetics into system design. The point of view is that anything understood can be implemented with suitable effort. In computing systems, there is no longer a question of whether a system can be built and made to work. Instead the questions deal with efficiency, economy, elegance, and utility. Thus the goal of system design should be determining what problems should be solved and giving descriptions of how to go about solving them. This is similar to the situation in movies where good films are made by artists interested in using, say, special effects as a tool to tell a certain story rather than by technicians interested in using the effects per se. However, unlike a movie, which is a facade that will be seen only a few times in an uncritical situation, a system must wear well over years of daily use. The need for careful understanding of the model presented and how it maps onto the user's problems is thus paramount.

The undercurrents then are really based upon a metatheory of system design. This report is most similar to philosophical books on architecture, e.g. [PhilArch]. These discuss new design strategies now feasible with new materials. For instance, the introduction of iron permitted an entirely different class of bridges to be built than were possible with stone. The construction of physical structures is almost the only other example of a field in which environments are designed that must be both reliable and long-lasting with strong influences from both engineering and artistic considerations.

As the above indicates, the flavor here will be different from that found in usual technical writing on computing or communications. Discussions of components focus on how to perform the same function better: faster, cheaper, more reliably. Even discussions of systems tend to concentrate on better methods for implementing the same model, e.g. electronic versus mechanical switching machines. Accordingly, current systems typically provide only good implementations of syntactic speedup - leaving the ocean of semantic truth untouched. The structure of a telesophy environment attempts to provide direct support for information manipulation by directly modeling empirical observations about human needs and uses for it, rather than by providing a higher level of automation for the current solutions. Readers are encouraged to bear with the philosophy since it is just as essential a component of the system as the fibers, the semiconductors, and the programs.


CONTENTS 

1. INTRODUCTION

1.1 The Vision
1.2 System Model
1.2.1 Sentimutation
1.2.2 Nexogenesis
1.3 System Architecture
1.3.1 Tele
1.3.2 Sophy
1.3.3 Sophport
1.4 This Report

2. HISTORY

2.1 Communications Systems
2.2 Computing Systems
2.3 Information Retrieval Systems
2.3.1 The Twenty Year Cycle
2.3.2 Existing Limitations
2.4 Browsing Systems

3. APPROACH

3.1 What is Communication
3.2 The Understanding Taxonomy
3.3 System Mapping
3.3.1 The Levels
3.3.2 The Dimensions
3.4 Subsumption

4. INFORMATION

4.1 The Spectrums
4.2 The Types
4.3 Electronic Information
4.3.1 Formal
4.3.2 Informal
4.3.3 Future

5. MODEL

5.1 WorldNet
5.2 Sentimutation
5.3 Nexogenesis

6. TELE I - Hardware

6.1 The Single Computer Model
6.1.1 Beyond Local Area Networks
6.2 Execution Distribution
6.3 Network Technology
6.3.1 Capacity
6.3.2 Switching
6.4 Terminal Technology
6.4.1 Memories
6.4.2 Processors
6.4.3 Workstations
6.5 Resource Technology
6.5.1 Magnetic Disks
6.5.2 Optical Disks
6.5.3 Layout
6.6 Traffic Considerations
6.6.1 Downloading
6.6.2 Media

7. TELE II - Software

7.1 The Fetching Process
7.2 Addressing
7.2.1 Servers
7.2.2 Levels
7.3 Propagation
7.3.1 Spreading Activation
7.4 Caching
7.4.1 Binding
7.5 Canonicalization
7.5.1 Standards

8. SOPHY I - Hardware

8.1 Terminal Types
8.2 Input
8.3 Output

9. SOPHY II - Sentimutation

9.1 Region Traversal
9.1.1 Examination
9.2 Information Units
9.2.1 Standard Fields
9.3 Region Formation
9.3.1 Query Generation
9.4 Matching
9.4.1 Equivalence
9.4.2 Closest
9.4.3 Inference
9.4.4 Inheritance
9.5 Feedback
9.5.1 Windows

10. SOPHY III - Nexogenesis

10.1 Knowledge Generation
10.1.1 Direct Manipulation
10.2 Data Items
10.2.1 Types
10.2.2 Values
10.3 Generic Manipulation
10.3.1 Operations
10.4 History
10.4.1 Learning

11. SOPHPORT

11.1 Underlying Support
11.2 Control
11.3 Environment Variables
11.4 Input/Output
11.5 Language Levels
11.5.1 Programming Environments
11.5.2 Customization

12. IMPLEMENTATION

12.1 Efficiency
12.2 Economics
12.3 Evolution

13. SOCIOLOGY

13.1 Homo Teleans
13.2 (De) Centralization
13.3 Business Opportunities
13.4 One World

14. ACKNOWLEDGEMENTS

15. BIBLIOGRAPHY

15.1 Classics
15.2 References
15.3 Texts


1. INTRODUCTION

The Use of Information is the hallmark of humankind. This is becoming increasingly clear as the majority of employees in industrial societies become labeled as "knowledge workers" and the times are widely called the "Information Age". The use of information consists of gathering it, filtering it, then recombining it into new information. Historians call this generation of new knowledge from old the "cultural transmission of knowledge" and mark it as the distinguishing characteristic of civilization.

The current tools to support this cultural transmission are surprisingly primitive. Knowledge workers scan linearly through pieces of paper and do recombinations manually by cut and paste. Ordinary people see no way to find the information they need to solve their problems and have a constant feeling of being trapped. One of the major desires of humankind is thus a "sophport", a gateway into wisdom, an omniscient object that can provide the information necessary to solve any problem.

The rationale behind Telesophy is that technology in communications and computing has advanced sufficiently to actually construct a fair approximation of such a sophport. Its physical instantiation would be a ubiquitous terminal capable of displaying "anything" and transparently linked into the WorldNet. Its logical functionality would provide concrete manipulation of the abstract notions of information usage. This report asserts that "telesophy", wisdom at a distance, will be possible in the near future and outlines how this goal might be achieved.

The report is laid out in a logical progression, from abstract philosophy to concrete technology. Some attempt has been made to keep the chapters self-contained. Thus some readers may wish to skip to the level of abstraction with which they feel most comfortable, although this will cause them to miss the complete flow of the argument.

This chapter introduces the terminology, lays out the major points, and outlines the discussion to follow.

1.1 The Vision

The universal library is an old dream of humankind: a central repository of all knowledge, carefully indexed for future retrieval and study. However, this ideal lacks a fundamental property of real-world information: its dynamic nature.

Information is generated in people's heads after study of previous knowledge. It is refined into knowledge through a spectrum of filtering and combination: conversation, notes, working documents, internal memoranda, conference papers, journal articles, research monographs, text books (to illustrate only the scholarly spectrum). As time passes, the classification changes, not only for "new" knowledge but for old knowledge as well. All the knowledge is interlinked, each piece dependent upon others from its moment of generation onward.

A superior model for all the knowledge of humankind is thus the WorldNet. This is the single network containing "everything", capturing both the inter-dependent relationships and the dynamic changeability. All of the world's information is plugged into this network, including all stages of formulation and all types of media, from conversation with people to viewing of books or films. The use of information is like navigating through an abstract information space: traversing the links, examining the nodes and gathering them together to form new nodes and links.

To provide transparent access to "anything", the WorldNet must provide access "anywhere". This is necessary to permit the knowledge sources to be plugged in regardless of location. Conversely, any user anywhere must be able to access any particular source. Thus the WorldNet must provide "information access independent of physical location".

The significant point of Telesophy can now be appreciated. The word telesophy is formed from the Greek roots "tele", meaning "at a distance", and "sophy", meaning "wisdom" [Dictionary]. Compare the words, "telephone", meaning "sound at a distance", and "philosophy", meaning "love of wisdom". Just as telegraphy is a system for transparently providing writing at a distance and telephony is a system for transparently providing sound at a distance, telesophy is a system for transparently providing wisdom at a distance.

"Wisdom" here is used in the sense of "common sense" rather than "lasting truth". That is, "wisdom involves sound judgement and the ability to apply what has been acquired mentally to the conduct of one's affairs". The Greek "sophos" literally means "skilled". Thus a telesophy system facilitates drawing on the information resources of the WorldNet to solve your problems.

The "tele" is the major goal of the internals of the system. As users navigate the single abstract information space that represents the WorldNet databases, they must be kept unaware of the multiple physical boundaries necessary to cross to fetch knowledge from anywhere. Providing this location transparency requires an efficient scheme of addressing and caching to maintain the illusion of a single space.

The "sophy" is the major goal of the externals of the system. As users navigate the single abstract information space that represents the WorldNet databases, they must be kept unaware of the multiple logical routines necessary to locate and generate knowledge of anything. Providing this transparency requires an efficient scheme of matching and typing to maintain the illusion of uniform manipulation. In addition, since locating and generating knowledge are directly user-visible, a highly flexible and interactive scheme for browsing support is necessary.

1.2 System Model

The primary system model is thus that of navigating through the worldwide information space. This may seem like a very abstract view of a knowledge worker's activities, compared to a more physical view such as the currently popular "desktop" model [Star]. But, paradoxically, the more abstract model is the more concrete. This is precisely because the use of information is an abstract concept, so any physical model must lose some of its essential properties.

From the user's point of view, the "tele" is completely hidden. This is the point of location-independent access. As information is located, it is fetched from the network. Similarly, as it is generated, it is written back into the network. The metaphor of traversing a single information space is maintained, making the memory hierarchy transparent for both reading and writing.
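As a minimal sketch of the transparency just described, the hypothetical `TransparentSpace` class below hides a two-level memory hierarchy behind a single read/write interface. It assumes plain dictionaries stand in for the network and for local terminal memory; none of these names come from the report. Reads fetch on a local miss, and writes go both to the local copy and back into the network.

```python
class TransparentSpace:
    """Hypothetical single information space over a two-level memory hierarchy."""

    def __init__(self, network):
        self.network = network  # assumed stand-in for remote WorldNet storage
        self.local = {}         # assumed stand-in for local terminal memory
        self.fetches = 0        # how many times the network had to be consulted

    def read(self, name):
        # Read-through: on a local miss, fetch from the network and keep a copy.
        if name not in self.local:
            self.local[name] = self.network[name]
            self.fetches += 1
        return self.local[name]

    def write(self, name, value):
        # Write-through: newly generated information also goes back into the network.
        self.local[name] = value
        self.network[name] = value


net = {"memo-17": "draft text"}
space = TransparentSpace(net)
space.read("memo-17")
space.read("memo-17")  # served locally, so no second fetch
space.write("memo-18", "new note")
print(space.fetches, net["memo-18"])  # 1 new note
```

The user code never mentions where an item lives; only the fetch counter reveals that the hierarchy exists at all.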

The "sophy", then, comprises the bulk of the system model. The constituent actions of locating and generating each constitute a fundamental concept, and each merits its own new word.

1.2.1 Sentimutation

The problem of locating knowledge is that of communication, mapping the categorization in the user's head to that in the system's databases. Most real-world problems do not have "right" solutions but merely currently adequate solutions. Good problem solvers try something for a while and see what its limitations are. In the process they understand more of what the problem really is and can see what to try next. Unless one is looking for a standard piece of information from a known source, the process of search is that of interactive browsing: following selected pointers from datum to datum until some currently acceptable set has been found. The selection requires subtle consideration of such things as applicability and believability - it is unlikely to be automated anytime soon.

The system described below accordingly provides a set of tools for interactive browsing, for pointer following. The portion of the telesophy environment dealing with the location of information thus supports sentimutation. The word "sentimutation" is formed from the Latin roots "senti", meaning "point of view", and "mutation", meaning "change". For the first root, compare the word "sentiment", meaning "a view on a particular matter or general mental disposition especially when determined by emotion as well as reason" (from the Latin "sentire", to feel, and "sententia", way of thinking). The second "root" is already an English word, meaning "the process of being changed" (from the Latin "mutare", to change). The word "sentimutation" captures the essence of the location process: jumping from view to view of the WorldNet with the jump criteria being determined by such complicated factors as to defy logical explanation.

1.2.2 Nexogenesis

The problem of generating new knowledge is basically the process of filtering and combining old knowledge. "There is no new thing under the sun" says Ecclesiastes (1:9). The Zeitgeist Theory of History states that revolutionary ideas bubble out of the environment - when the spirit of the times is ripe, they will appear. The great thinkers and doers are not those who change the Zeitgeist but those who realize early where the crest will be and state its position in a manner that sticks eponymically. "New" knowledge is created when careful consideration of old produces some modification that constitutes an improvement. Typically the major advances come from combining ideas from several sources previously thought to be unrelated. Experimentation, both physical and mental, is necessary to determine how to do the filtering: which pieces to keep, which to modify, which to discard.

The system described below accordingly provides a set of tools for this abstract recombination. The portion of the telesophy environment dealing with the generation of information thus supports nexogenesis. The word "nexogenesis" is formed from the Latin roots "nexo", meaning "binding", and "genesis", meaning "produce". For the first root, compare the word "nexus", meaning "the bond existing between members of a group" (from the Latin "nectere", to bind) or the word "connect", meaning "to bind together" (the British spelling is "connexion"). The second "root" is already an English word, meaning "the coming into being of anything; creation" (from the Latin "genesis", birth). Compare "genus", meaning "a group with common attributes", to "nexus". The word "nexogenesis" captures the essence of the generation process: a bond is produced creating a new group with previously unrecognized common properties whose members are found by filtering selection through a large collection of disparate things.

1.3 System Architecture

The architecture of a telesophy system should follow directly from the system model just presented. The user must be able to navigate through the worldwide information space, examining and manipulating whatever portions are desired while the system transparently handles how the information got "here" and how it is actually operated upon. The physical hardware is a terminal connected by a fast communications link to the world of resources. But the logical picture that users should have is that the terminal is a portal into the information space. They should literally consider themselves "sophonauts", sailors in the sea of wisdom (from the Greek roots "sopho", meaning "wisdom", and "naut", meaning "sailor" from "naus", ship, as in "astronaut").

The software environment that allows a user to transparently browse the WorldNet will thus be termed the sophport. The word "sophport" is formed from the Greek root "soph", meaning "wisdom" as before, and the Latin root "port", meaning "gateway" (from the Latin "porta", gate). The system is both in name and operation a gateway into wisdom, a portal into the worldwide information space. It must accordingly be an operating environment, directly supporting each of the fundamental concepts of the system model. That is, the user performs communication with appropriate experts via interactive navigation through the universal information space. This is modeled by the system as sentimutation and nexogenesis via transparent universal caching.

1.3.1 Tele

The support for tele provides the transparent fetching. That is, the widely dispersed "databases" are made to appear as a single space of named pieces of information.

The fetching process is basically formulation of a query locally, propagating this out into the network, and receiving the results. The formulation involves specifying properties to be matched in some canonical protocol. The propagation involves addressing some appropriate resource. The receiving involves the actual caching as well as transformation of the results back into canonical format.
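The three steps can be sketched as follows, assuming each resource keeps records under its own field names and publishes a mapping to a shared canonical vocabulary. `Resource`, `formulate`, and `fetch` are hypothetical names, and addressing is deliberately left out: every resource is simply asked.

```python
class Resource:
    """Hypothetical remote database that stores records under its own field names."""

    def __init__(self, records, to_native):
        self.records = records      # list of dicts keyed by native field names
        self.to_native = to_native  # canonical property name -> native field name

    def match(self, query):
        # Interpret a canonical query in this resource's native vocabulary.
        native = {self.to_native[k]: v for k, v in query.items()}
        return [r for r in self.records
                if all(r.get(k) == v for k, v in native.items())]

    def to_canonical(self, record):
        back = {native: canon for canon, native in self.to_native.items()}
        return {back.get(k, k): v for k, v in record.items()}


def formulate(**properties):
    # Formulation: the query is the set of properties to match, in canonical names.
    return dict(properties)


def fetch(query, resources, cache):
    # Propagation: offer the query to each resource (addressing not modeled here).
    for resource in resources:
        for record in resource.match(query):
            # Receiving: transform back into canonical format, then cache locally.
            item = resource.to_canonical(record)
            cache[item["id"]] = item
    return cache


lib_a = Resource([{"ID": "a1", "AU": "Smith", "YR": 1984}],
                 {"id": "ID", "author": "AU", "year": "YR"})
lib_b = Resource([{"key": "b7", "writer": "Smith", "date": 1983},
                  {"key": "b8", "writer": "Jones", "date": 1985}],
                 {"id": "key", "author": "writer", "year": "date"})
cache = fetch(formulate(author="Smith"), [lib_a, lib_b], {})
print(sorted(cache))  # ['a1', 'b7']
```

Both libraries answer the same canonical query even though neither uses the canonical field names internally.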

Since the WorldNet has billions of potential nodes, the addressing must be accomplished via hierarchical levels of gateways. At each level, the gateway is a switching machine with a pointer directory that specifies onto which nodes it is appropriate to pass requests for which queries. This pointer directory gives more than merely connections since the addressing is symbolic and requires property matching to determine the exact network address.
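A minimal sketch of hierarchical symbolic addressing follows, assuming a pointer directory is a list of property patterns, each pointing either at a lower-level gateway or at a leaf network address. The `Gateway` class and the address strings are hypothetical; a real directory would also have to handle replication, failures, and approximate matches, none of which are modeled here.

```python
class Gateway:
    """Hypothetical gateway whose pointer directory maps property patterns to nodes."""

    def __init__(self):
        self.directory = []  # (pattern dict, child Gateway or leaf network address)

    def add(self, pattern, child):
        self.directory.append((pattern, child))

    def route(self, query):
        """Resolve a symbolic query into the network addresses of matching resources."""
        addresses = []
        for pattern, child in self.directory:
            # Forward unless the query contradicts the pattern, so general queries
            # spread widely while specific ones are pruned early in the hierarchy.
            if all(k not in query or query[k] == v for k, v in pattern.items()):
                if isinstance(child, Gateway):
                    addresses.extend(child.route(query))
                else:
                    addresses.append(child)
        return addresses


world, us = Gateway(), Gateway()
world.add({"country": "us"}, us)
world.add({"country": "uk"}, "uk.library.1")
us.add({"subject": "physics"}, "us.physics.db")
us.add({"subject": "medicine"}, "us.medical.db")

print(world.route({"country": "us", "subject": "medicine"}))  # ['us.medical.db']
print(world.route({"subject": "physics"}))  # ['us.physics.db', 'uk.library.1']
```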

The supporting hardware consists of user terminals connected to central resources via a wideband network. The terminals execute the interactive browsing of the retrieved information, both viewing and generating. The resources execute the matching of queries against the stored information. The network executes the routing to download requested information to requesting terminals. The network speed must be sufficient to "instantaneously" transmit the information, whether it be a few pieces or entire databases; the new optical fiber technology will make this possible. The existing telephone network is already ubiquitous worldwide, but its speed of transmission is much too slow to provide location transparency.
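To make the speed argument concrete, the arithmetic below compares raw transfer times. The report gives no figures, so the 10-megabyte database, the 1200 bit/s voice-grade modem, and the 100 Mbit/s fiber link are all illustrative assumptions, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Raw time to move size_bytes over a link (ignores protocol overhead)."""
    return size_bytes * 8 / bits_per_second


DATABASE = 10 * 10**6  # assumed 10-megabyte database
MODEM = 1200           # assumed voice-grade telephone modem, bits per second
FIBER = 100 * 10**6    # assumed 100 Mbit/s optical fiber link

print(round(transfer_seconds(DATABASE, MODEM) / 3600, 1))  # 18.5 (hours)
print(transfer_seconds(DATABASE, FIBER))                   # 0.8 (seconds)
```

Under these assumptions the same database takes most of a day over the telephone network and under a second over fiber, which illustrates why the report ties location transparency to optical transmission.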

The fetching of information into local terminal memory completes the caching process. The caching is the time of binding. As new objects are fetched, their data and corresponding operations are changed into the canonical format and "compiled" into some internal representation. Achieving this canonical transformation requires standardization to be universal. The same is true of the fetching protocols. This re-emphasizes that providing location transparency in a "world-wide" network crucially depends on standards.
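Binding at caching time might look like the sketch below. It assumes a shared standard consisting of canonical field converters and a registry of named operations; `STANDARD_FIELDS`, `STANDARD_OPERATIONS`, and `bind` are all hypothetical. A fetched object names its operations symbolically, and binding resolves those names to executable code while converting the data to canonical form.

```python
from datetime import date

# Hypothetical shared standard: canonical field converters and named operations.
STANDARD_FIELDS = {
    "title": str.strip,
    # Assumed source format "YYYY-MM", canonicalized to a date on the 1st.
    "date": lambda s: date(int(s[:4]), int(s[5:7]), 1),
}
STANDARD_OPERATIONS = {
    "word_count": lambda item: len(item["title"].split()),
    "year": lambda item: item["date"].year,
}


def bind(fetched):
    """Canonicalize data and resolve named operations into callables at cache time."""
    data = {k: STANDARD_FIELDS[k](v) for k, v in fetched["data"].items()}
    ops = {name: STANDARD_OPERATIONS[name] for name in fetched["operations"]}
    return {"data": data, "ops": ops}


cached = bind({"data": {"title": "  Telesophy feasibility  ", "date": "1985-08"},
               "operations": ["word_count", "year"]})
print(cached["ops"]["year"](cached["data"]))        # 1985
print(cached["ops"]["word_count"](cached["data"]))  # 2
```

A field or operation missing from the shared standard raises `KeyError` at binding time, which mirrors the report's point that location transparency depends on universal standards.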

1.3.2 Sophy

The support for sophy provides the transparent examination and manipulation. That is, the wide variety of pieces of information available "locally" are made to appear as uniform objects for traversal and modification. These "knowledge" objects can be both viewed and generated by a set of generic operations.

The basic searching strategy is to interactively browse the single information space transparently provided by the "tele" support just discussed. To gather desired information, a series of pointers is followed, switching views (sentimutation) as necessary. The gathered information is then filtered and combined (nexogenesis) to generate new knowledge satisfactory for the current need. One search strategy is "spread and watch": issuing a very general query and interactively examining the results as they come back to determine potential regions of the WorldNet for more careful examination. The other end of the search spectrum is "restrict and query": downloading a particular database, chosen on the basis of prior categorization knowledge, to examine locally. The more exact the categorization, of course, the faster the search converges.
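The two ends of the search spectrum can be contrasted in a small sketch. It assumes the WorldNet is a dictionary of named regions and that matching is exact equality on properties; in a real network regions would answer concurrently, whereas here they answer in order. `spread`, `watch`, and `restrict` are hypothetical names.

```python
def matches(item, query):
    # Assumed matching rule: every queried property must be equal.
    return all(item.get(k) == v for k, v in query.items())


def spread(query, regions):
    """'Spread and watch': yield (region, item) pairs as each region answers."""
    for region, items in regions.items():
        for item in items:
            if matches(item, query):
                yield region, item


def watch(stream, threshold):
    """Tally hits per region; stop once one region looks promising enough."""
    hits = {}
    for region, _item in stream:
        hits[region] = hits.get(region, 0) + 1
        if hits[region] >= threshold:
            return region, hits
    return None, hits


def restrict(items, query):
    """'Restrict and query': search one already-downloaded region locally."""
    return [item for item in items if matches(item, query)]


regions = {
    "law": [{"topic": "privacy"}, {"topic": "contracts"}],
    "computing": [{"topic": "privacy"}, {"topic": "privacy"}, {"topic": "networks"}],
}
region, hits = watch(spread({"topic": "privacy"}, regions), threshold=2)
print(region, hits)  # computing {'law': 1, 'computing': 2}
print(len(restrict(regions[region], {"topic": "privacy"})))  # 2
```

Because `spread` is a generator, watching can stop as soon as a promising region appears, without waiting for every region to answer.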

Sentimutation proceeds via formation of a local region of the information space from query results and traversal of this region. As described above, the local sophport generates queries that result in local caching of new information. The matching of these queries can be across many properties referencing a wide variety of information, some inherited from other information in the space. The traversal is possible because of a uniform package for the information in the space, which enables a generic program to traverse the connection links. This package, termed an Information Unit or IU, contains the data, descriptive classification to be matched against, and the connection pointers that form the region. Thus it also permits implementation of a generic search capability so the retrieval is independent of the data. The sophport tries to simulate a ship's control panel: the main data window is surrounded by a variety of smaller browse windows giving contextual information.
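A minimal sketch of the Information Unit and a generic traversal follows, assuming links are stored as names of other units and the local region is a dictionary. The `IU` dataclass and `traverse` function are illustrative, not the report's design. The traversal inspects only classification and links, never `data`, so it behaves the same for text, images, or tables.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IU:
    """Hypothetical Information Unit: data, classification, and connection pointers."""
    name: str
    data: object
    classification: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # names of connected IUs


def traverse(space, start, wanted):
    """Generic breadth-first traversal; matches on classification, ignores data."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        unit = space[queue.popleft()]
        if all(unit.classification.get(k) == v for k, v in wanted.items()):
            found.append(unit.name)
        for nxt in unit.links:
            if nxt in space and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


space = {
    "paper": IU("paper", "text...", {"kind": "article"}, ["fig", "memo"]),
    "fig": IU("fig", b"\x89PNG", {"kind": "image"}, ["paper"]),
    "memo": IU("memo", "notes", {"kind": "article"}, ["data"]),
    "data": IU("data", [1, 2, 3], {"kind": "table"}),
}
print(traverse(space, "paper", {"kind": "article"}))  # ['paper', 'memo']
```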

Nexogenesis proceeds via creation of information units from existing ones and manipulation of these new units. Regions are formed by loading the results of a query into a new IU. Since the caching is transparent, this new information exists permanently in the complete space. It is named by the IU packaging, and history mechanisms must be used to undo changes when desired. Filtering of the gathered pieces of the IU can then be done. The manipulation of each piece is type-dependent. That is, the code actually used for the operations was cached and canonicalized into the local "type library" when the piece was fetched. The execution of the code also is dependent upon the current context and environment. This uniform typing of the data items contained within the information units is necessary for universal manipulation just as the uniform packaging of the units was necessary for generic formation and traversal.
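Region formation, filtering, and undo can be sketched with a simple history stack, assuming each unit's gathered pieces are a list and that every change records the prior state. `Workspace` and its methods are hypothetical names chosen for illustration.

```python
class Workspace:
    """Hypothetical nexogenesis sketch: new units built from query results, with undo."""

    def __init__(self):
        self.units = {}    # unit name -> list of gathered pieces
        self.history = []  # stack of (unit name, previous pieces or None)

    def form_region(self, name, results):
        # Load query results into a new unit, remembering what was there before.
        self.history.append((name, self.units.get(name)))
        self.units[name] = list(results)

    def filter(self, name, keep):
        # Filter the gathered pieces; the old list is kept intact for undo.
        self.history.append((name, self.units[name]))
        self.units[name] = [piece for piece in self.units[name] if keep(piece)]

    def undo(self):
        name, previous = self.history.pop()
        if previous is None:
            del self.units[name]  # the unit did not exist before this change
        else:
            self.units[name] = previous


ws = Workspace()
ws.form_region("survey", ["fiber 1984", "switching 1983", "fiber 1985"])
ws.filter("survey", lambda piece: "fiber" in piece)
print(ws.units["survey"])  # ['fiber 1984', 'fiber 1985']
ws.undo()
print(len(ws.units["survey"]))  # 3
```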

1.3.3 Sophport

The support for sophport provides the underlying support environment for execution of telesophy. A transparent soft machine is implemented. That is, as data objects are cached into the local buffer, operations for their manipulation are also cached. An interpreter executes the instructions of the resultant machine.

The control of this "machine" must typically encompass several contexts. Each context is a complete browsing environment. A buffer contains the current information units. Some portion of this is currently in view on the terminal display. The available operations for each data type are kept in a library. The interpreter processes user commands on the information in the buffer and invokes the appropriate type-dependent code from the library. Other commands affect the buffer by altering the context or switching contexts.
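The soft machine described in the last two paragraphs can be sketched as a command interpreter over contexts. The type library, `Context`, and `interpret` are hypothetical, and the command syntax (`summarize`, `scroll N`, `switch NAME`) is an assumption made only for this example.

```python
# Hypothetical type library: operations cached alongside each data type.
TYPE_LIBRARY = {
    "text": {"summarize": lambda v: v.split(".")[0] + "."},
    "table": {"summarize": lambda v: f"{len(v)} rows"},
}


class Context:
    """One complete browsing environment: a buffer plus the portion in view."""

    def __init__(self, units, view_size=2):
        self.buffer = units  # list of (type_name, value) pairs
        self.view_start = 0
        self.view_size = view_size

    def view(self):
        return self.buffer[self.view_start:self.view_start + self.view_size]


def interpret(contexts, current, command):
    """Run one command; return (current context name, results)."""
    verb, _, arg = command.partition(" ")
    ctx = contexts[current]
    if verb == "switch":  # switch to another context
        return arg, []
    if verb == "scroll":  # alter which part of the buffer is in view
        last = max(len(ctx.buffer) - 1, 0)
        ctx.view_start = max(0, min(ctx.view_start + int(arg), last))
        return current, []
    # Any other verb names type-dependent code, looked up per visible unit.
    return current, [TYPE_LIBRARY[t][verb](v) for t, v in ctx.view()]


contexts = {
    "work": Context([("text", "Fibers are fast. Very fast."), ("table", [1, 2, 3])]),
    "home": Context([("text", "Recipes. Many recipes.")]),
}
cur, out = interpret(contexts, "work", "summarize")
print(out)  # ['Fibers are fast.', '3 rows']
cur, _ = interpret(contexts, cur, "switch home")
print(interpret(contexts, cur, "summarize")[1])  # ['Recipes.']
```

The interpreter itself knows nothing about text or tables; all type-specific behavior lives in the library, which is what lets newly cached types be manipulated without changing the interpreter.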

Input and output are particularly important operations. These are both type-dependent and environment (e.g. terminal) dependent. To enable effective browsing, different levels of formatting are available. Thus information can be rapidly scanned at a coarse level, then "zoomed" into to be examined in more detail. Maintaining formatting information with the type library makes the sophport a universal viewing device. That is, generic routines can display any information (given appropriate formatting code and an appropriate terminal).
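Levels of formatting can be sketched as a list of formatters per type, kept with the type library, with a generic `display` routine choosing one by zoom level. `FORMATS`, `display`, and the `article` type are hypothetical, and terminal dependence is not modeled.

```python
# Hypothetical formatting code kept with the type library, coarsest level first.
FORMATS = {
    "article": [
        lambda a: a["title"],                                    # scan
        lambda a: f'{a["title"]} ({a["author"]})',               # citation
        lambda a: f'{a["title"]} ({a["author"]})\n{a["body"]}',  # full text
    ],
}


def display(type_name, value, level):
    """Generic display routine: any type with formatting code can be shown."""
    levels = FORMATS[type_name]
    # Clamp the zoom level to the range this type actually supports.
    return levels[max(0, min(level, len(levels) - 1))](value)


article = {"title": "Telesophy", "author": "Schatz", "body": "Wisdom at a distance."}
print(display("article", article, 0))  # Telesophy
print(display("article", article, 1))  # Telesophy (Schatz)
```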

Besides basic support, customization is the primary facility provided by the sophport. The system is intended for the masses so it must serve the needs of a wide variety of people. But the "programming" support it provides is at a far higher level than that of current software. Modifications supported by the model are easy to make. The most notable example is nexogenesis. The generation of new knowledge from old is distinctly customization but so fundamental to the system that users will be unlikely to realize it constitutes programming. (Note in particular that new data types can be generated from old.) Modifications underlying the model, on the other hand, such as adding new manipulation or matching operations, require a different level of programming, which is lower and more efficient. Each level deeper into the system requires more knowledge for modification and will have a smaller set of people capable of performing such modification. The focus of this discussion is on the "universal" user level with an operating environment embodying an explicit system model.

1.4 This Report

A note on style. The reader has probably noticed that a number of new words have been coined to describe this system: telesophy, sentimutation, nexogenesis, sophport. These words are intended to be not only descriptive but also grown from existing roots (in the tradition of nexogenesis). Thus they do not describe new concepts as much as explicitly recognize existing fundamental properties of the use of information. Since the system is intended to provide explicit concrete manipulation of these abstract concepts, a deliberate attempt was made to coin a word for every major concept necessary to understand the system model. When a user understands the meanings and implications of these words, he or she will understand the system model. Purely internal concepts, such as addressing and other pieces of "tele", which are hidden from the user, have been described using standard English.

 

The organization of this report is as follows:

1. Introduction. The basic terminology and overview.
2. History. Communications, computing, and information retrieval.
3. Approach. From properties of humans to the model to the architecture.
4. Information. Its types, both present and future.
5. Model. The major themes and concepts.
6. Tele I. The hardware behind the WorldNet: fibers and switching.
7. Tele II. The software behind the WorldNet: addressing and caching.
8. Sophy I. The hardware behind browsing systems: input and output.
9. Sophy II. Sentimutation software: region traversal and formation.
10. Sophy III. Nexogenesis software: knowledge generation and manipulation.
11. Sophport. Support: interpreters, types, input/output, languages.
12. Implementation. Efficiency, economics, and evolution issues.
13. Sociology. Potential effects: Homo Teleans and One World.

Basically,

Chapters 1-5 describe the philosophy behind the model of a telesophy system;

Chapters 6-11 describe the technology that will make the suggested architectural structure feasible to implement; and

Chapters 12-13 describe a few practical implications.

2. HISTORY

This chapter places telesophy in perspective by giving cursory surveys of the evolution of systems in communications, in computing, and in information retrieval. Telesophy is proposed to be the culmination of clear historical trends.

2.1 Communications Systems

There are two main dimensions of the evolution of communications systems. "Media" deals with the types of information that can be transmitted and how they must be generated. "Distribution" deals with the means of the transmission and how the information can be spread.

The development of nationwide communications in the United States can be divided into somewhat arbitrary fifty-year periods: 1880-1930, 1930-1980, 1980-2030. This discussion is concerned only with "tele" systems, where communication happens at a distance, preferably unaffected by that distance.

The first period saw the rise of telecommunications. These are communications systems with "immediate" distribution, where the information is generated on one end and continuously sent to the other end. The telegraph matured early in this period; it required centralized operators conveying the medium of text. The telephone matured later in this period and began to displace the telegraph; it permitted personal communications in the medium of voice. Both of these systems were single-medium, immediate communication with point-to-point distribution whose network spread nationwide.

Telecommunications continued to evolve in the second period with the rise of broadcast systems. Voice came first with radio. It was later largely displaced by television, which permitted video pictures as well. The broadcast systems, particularly television, permitted a multiplicity of information media as well as wide distribution from a single resource point. But the expense of the distribution effectively limited the number of sources. Even cable television, which matured near the end of the period, permitted only a selection from fewer than 100 channels. And only limited immediate interaction was possible.

The second period saw the rise of telecomputing, a new type of telesystem. These are communications systems with "deferred" distribution, where the information is generated on one end and stored, for later transmission to the other end. Coming with the rise of computing systems, they formed the initial merging of computing and communications. Electronic mail systems matured through the middle of this period. These permitted point-to-point deferred communication, including the shipping of "files" containing previously generated material. The initial implementations had specialized networks, such as Telex, handled by centralized operators. As with the immediate telecommunications they had grown out of, these were gradually displaced by personal computer-based message services able to transmit messages directly from a user's terminal across a variety of networks.

Telecomputing continued to evolve during the second period with the rise of remote databases. Although the media tended to be limited to text, these permitted selective mass distribution for the first time. That is, unlike the broadcast systems, point-to-point selection from many sources was possible. Initially, the databases were formal and specialized for bibliographic use. But, as an outgrowth of personal electronic mail, community databases came into being. These were implemented as informal electronic bulletin boards.

Towards the end of the second period, a person could thus expect a variety of communications techniques to be available. These included: interactive voice, broadcast video, deferred text, and some database access.

The third and current period will see the rise of telesophy. These are communications systems with "synthesized" distribution, where the information is selectively generated on one end and stored, for later transmission to the other end. These provide more functionality than the electronic mail systems, because the information sent is selectively generated based on specifications provided by the receiving end. These also provide more functionality than the remote database systems, because the synthesized information can be stored for later selective use.

Telesophy systems will gradually displace telecomputing systems since the user references to information are transparent, independent of the stored physical location. "All" information available electronically is available locally. The distribution is handled automatically by the (single) network so that plugging a source into the network makes it selectively available to all. Any information from any source is available from any terminal. Terminals will be developed that are capable of handling all of the common media simultaneously. Sources will include immediate (interactive) as well as deferred (stored) communications. A flexible method of locating and generating information will be provided.

A telesophy system is the final merging of communications and computing systems. It stores "all the world's information" in a "single computer" making it transparently available for retrieval, manipulation, and storage. The implementation of telesophy will be the fruition of the Information Age.

To speculate, the next period will see the rise of teleportation. These would be transparent physical systems in the same sense that telesophy systems will be transparent electrical (or optical) ones. That is, the complete state of any physical object can be made "instantaneously" available anywhere. To evolve out of techniques known thus far, such a system would actually be tele-copying rather than portation per se. That is, it would scan a physical object into digital electrical form, transmit the bits remotely, and reassemble the object from a source of raw materials using the transmitted description. The current technology for scanning is quite primitive. Such techniques as facsimile, television, and tomography give only a surface description of objects. Blueprint descriptions of the internal mechanisms of machines are currently transmitted remotely but the scanning decomposition is hardly done automatically or in real-time. An alternative approach, evolving from a telesophy system, would be to have search propagations take place as though the user were actually in a different physical location, so that the items returned first would be the ones "closest" to the place the user had been "teleported" to.

2.2 Computing Systems

Computing systems have also evolved towards the same ends albeit on a more compressed timescale. There are two main dimensions of such systems. "Sharing" deals with the facility for making information generated by one user available to others. "Interaction" deals with the facility for giving contextual feedback as to the current state.

A computer model consists of a set of components and their arrangement. The state of technology and economics determines both the power of each component and the effective arrangements. The basic components have not changed much over the years. They mediate communication between human and information. Thus they consist of the human interface, the user-visible terminal, and the information interface, the machine-visible resources, as well as the connections among these. The terminal handles the user input and output. The resources include those for storage (and archiving), printing, outside communication, and special purposes (computation, generation). The connection serves as the system bus.

Since general-purpose computing engines first became available, the computer model has changed fundamentally every decade.

In the fifties, computers were expensive and rare. Users had to reserve a time slot and come directly into the machine room to use one. In return, they had complete control of the machine and could get interactive feedback whenever they chose. This provided a part-time computer for those with enough technical expertise to operate it.

In the sixties, time-sharing began to be feasible on a wide scale. Users could access a mainframe computer remotely from a relatively inexpensive terminal. Terminals could be placed in offices and (occasionally) homes, allowing access at any time. The shared central resource facilitated the growth of information communities. But the mainframe provided poor interaction due to slow communication speed and system overloading. Computer time was expensive compared to people time so the mainframe was used primarily as a remote batch machine. Even the minicomputer time-sharing systems developed near the end of the decade typically only provided fast batch (e.g. Unix).

In the seventies, the economics shifted and personal computers emerged. People time had become so expensive compared to computer time that it became widely possible to purchase a complete computer dedicated to an individual. Hardware technology had improved sufficiently to permit this to be roughly the same size as the remote time-sharing terminal. And software technology had improved sufficiently to permit non-technical people to begin to use computers. This latter technology relied crucially on the interactive nature of the dedicated machine (e.g. VisiCalc). But the requirement of providing a standalone desktop computing environment placed severe limitations on the facilities available. The sharing of information inside a user community, which had been so easy with mainframes, proved cumbersome with personal computers. And the system capabilities lagged far behind those for mainframes.

In the eighties, another significant model has appeared. Networks and terminals have improved sufficiently to provide computing resources with the interactive nature of the personal computer and the shared nature of the mainframe computer. This new system model is essentially a "single" computer whose components are geographically distributed. The public resources such as mass storage, which are expensive, large, and noisy, are housed in the machine room as in the mainframe model. The private resources such as terminal displays, which are inexpensive, small, and quiet, are housed in the user's room as in the personal model. Since the new networks run faster than disk speed, permanent data can be stored remotely and transparently downloaded when needed. Since the new terminals contain large displays with significant local memory and processing, considerable feedback can be provided. This gives each user the interactivity of a personal computer without the expense and physical drawbacks of local mass storage, as well as full access to mainframe-quality facilities and information sharing. The network may also contain special-purpose computation machines to permit the processing as well as the storage to be distributed.

In the nineties, the "computer" as a distinct object will begin to disappear. The bus-speed network links will expand from the currently possible local networks to approach a global WorldNet. This network will have a very different character from the current global telephone network. A user of any appropriate terminal plugged into the WorldNet will have complete access to any permissible information source just as though he or she were directly connected to it. The physical location of the person or of the source will be irrelevant. The nature of the source will be irrelevant. (It could be a database in any medium from text to video or another person in any medium from voice to video.) This is only practical today in limited circumstances with separate terminals to display each source type (which effectively makes it unavailable to the ordinary person).

When the technology in accessing and browsing catches up, the shared information community will become world-wide. Users will not perceive computers or networks or terminals but merely portals into information space. These information portals will provide fast transparent worldwide access to information. This will be the fruition of telesophy, as communications and computing technology make possible the Information Age.

2.3 Information Retrieval Systems

2.3.1 The Twenty Year Cycle

There appears to be a twenty-year cycle in fundamental ideas in information retrieval. Each new wave represents new technology that makes it possible to come closer to modeling the real world than previously possible.

The first wave was around 1945. The technology focused around analog computers and microfiche readers. A notable proposal was the Memex by Vannevar Bush [Bush], who mentioned the ideas of browsing by associative search and generation of knowledge by search trails. These ideas are early pieces of what this report formalizes as sentimutation and nexogenesis.

The second wave was around 1965. The technology focused around minicomputers and disk stores. A notable proposal was Libraries of the Future by J.C.R. Licklider [Licklider], who analyzed sizing and searching the Universal Library. Computing technology was advanced enough for there to also be a notable experimental system, NLS by Douglas Engelbart [Engelbart]. This pioneered ideas of interactive intellectual support, which even today have barely appeared in commercial systems.

The third wave is now, around 1985. The technology focuses around wideband networks, graphics workstations, and typed object environments. This report testifies that the notable proposal is Telesophy and that the technology is ripe to implement it on a wide scale. Thus the next twenty year period may well see information retrieval across large networks of information at last gain mass usage and recognition.

2.3.2 Existing Limitations

A variety of information retrieval systems have evolved to the point of wide-spread implementation. Each of these grew up around its own specific application. A few of the most prominent are briefly discussed below with comments as to why their model is inappropriate for telesophy.

Database systems grew up around accounting applications, notably payroll and other employee records. Thus the databases tend to be large with a few short alphabetic/numeric fields for each record. The matching is exact since complete accuracy is important. This differs from a browsing system where the field contents are many and varied and the matching need only be close. (It is satisfactory in a browsing system to let the user separate wheat from chaff by rapid scanning.)
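The contrast between exact and close matching can be made concrete with a small sketch. The record layout, the function names, and the use of Python's `difflib` similarity ratio with a 0.6 threshold are assumptions chosen for illustration, not part of any system described here; any approximate similarity measure would serve.

```python
from difflib import SequenceMatcher


def exact_match(records, field, value):
    """Database-style retrieval: keep only records whose field equals value exactly."""
    return [r for r in records if r.get(field) == value]


def close_match(records, field, value, threshold=0.6):
    """Browsing-style retrieval: keep records whose field merely resembles value,
    ranked best first so the user can scan quickly and discard the chaff."""
    scored = []
    for r in records:
        text = r.get(field, "")
        score = SequenceMatcher(None, value.lower(), text.lower()).ratio()
        if score >= threshold:
            scored.append((score, r))
    # Sort on the score only; the records themselves are dicts and not comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


records = [
    {"title": "Telecommunications"},
    {"title": "Telecomunications"},  # misspelled entry, as often found in real data
    {"title": "Payroll"},
]
print(exact_match(records, "title", "telecommunication"))  # []
print([r["title"] for r in close_match(records, "title", "telecommunication")])
# ['Telecommunications', 'Telecomunications']
```

The exact query finds nothing because of a difference in case and plural, while the close query returns both relevant entries, including the misspelled one, and leaves the final judgment to the user.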

Videotex systems grew up around news and shopping, as a more current electronic version of existing services. The databases tend to be small with shallow menu traversal and limited search capability. The pictorial display is restricted by the slow current network transmission to low-quality block graphics and line drawings. Unlike traditional database systems, which are essential for modern financial transactions, videotex systems have yet to find a niche. They are too slow and costly to compete effectively with newspapers and catalogs. A browsing system needs to search larger amounts of information much more quickly.

Bibliographic database systems grew up around reference information for professional libraries. Thus the databases are large with short rigid fields and fast associative search. They utilize the financial database technology for more general-purpose information. They are a good start towards a browsing system in content available but are only effective for exact matches. When an appropriate set of keywords is known, bibliographic searching is useful [Lancaster]. But neither the query matching nor the view switching is flexible enough to help much in the browsing case, which often involves vaguely understood categorization.

Community database systems grew up around electronic bulletin boards. They are an electronic version of a club newsletter, one in which the submissions are from members who may be geographically distributed over a wide area. Accordingly, they consist of small databases containing messages on a specific topic that can be read using standard electronic mail technology (i.e., bulletin boards are specialized mailboxes). The search tends to be minimal since only those interested in the topic belong to the club. Such databases form an essential portion of the information base for a telesophy system, the informal complement to the formal bibliographic information. But a browsing system must also provide support for the higher-level problem of locating an appropriate community for the user's current topic of interest.

2.4 Browsing Systems

What is browsing like in the real world? Typically, people have some reasonable idea of what information they need to solve their problem, but only a vague idea of how the poorly-formed categorization in their head matches the precise categorization in the world's databases. A few systems have been developed to deal with the type of flexible searching necessary to perform this matching, albeit on a limited scale.

The NLS system [Engelbart], developed at Stanford Research Institute in the middle sixties and later commercially sold by Tymshare as Augment, provides a general framework for structured interaction with information. Information is represented in hierarchical form, with arbitrary cross-references permitted between the units, which tended to be of paragraph length. NLS thus directly implemented the concept of an information space. As part of a more general scheme of augmenting human intellect via interaction with computers, this project also did seminal work on interface devices, developing the mouse and the chord keyboard. Done in an era when time-sharing was just beginning to make interactive computing possible, the system itself was used only by a few researchers on relatively small amounts of data. But it inspired a generation of subsequent work.

The Smalltalk programming environment, developed at Xerox PARC in the early seventies, uses browsing as a fundamental metaphor to provide a self-explaining system [Smalltalk]. Programs are a particularly good example of interconnected but precisely linked pieces of information. So a small number of properties and hierarchies can provide effective location capability. View switching was provided among four standard levels of views. Browsing also showed the value of other pioneering features of the environment, both syntactic ones, such as the multiple windows used to display contextual information, and semantic ones, such as the uniform object format enabling the generic search to access all information. Smalltalk is in everyday use by a few groups in the research community and its graphical interface has made a strong impact on the commercial personal computing world. A few attempts were made to adapt this particular strategy for browsing to more general information domains, most notably for answering questions from a history textbook [Weyer].

The Spatial Data Management System, developed at the MIT Architecture Machine Group in the late seventies and later at Computer Corporation of America, provides a concrete graphical representation of an abstract information space [SDMS]. It demonstrated a true local multimedia browsing system with significant traversing and zooming capability. The information space is shown using a multi-media display for the various data objects. The user can browse the space by pointing to an object then zooming in on it to display different levels of detail and representations of its parts. Since the spaces were small and local, use of a relational database for region formation was adequate. A demonstration version of SDMS was installed in the U.S.S. Carl Vinson, an aircraft carrier, to provide a real-time spatial display of ships and their location on a sequence of maps.

The Xanadu system [Nelson] is a vision of publishing in the future. In this view, "books" are actually data structures of pointers to other objects and new books consist of commentaries and revision of references to old ones. Extensive support for recording the versions of a work should be available. Unlike the above projects, which produced working systems with some local usage and widely shown demonstrations, the Xanadu project is a global vision with implementations promised but yet to appear.

Reference books provide an excellent source of inspiration for searching systems, particularly as they have evolved over a much longer period than computing systems. A thesaurus is perhaps the best example since it illustrates all three of the major types of search: associative, hierarchical, and relational (see [Soergel]). A thesaurus is laid out in a hierarchy that represents a semantic taxonomy of all possible concepts. Its index is a set of pointers that, given a word, permit associative access into the concept tree. There are also sideways links within the taxonomy suggesting related concepts ("see also", the q.v. of an encyclopedia, which makes the taxonomy a forest of linked trees).
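The three kinds of thesaurus search can be sketched as three small structures over one set of concepts: parent pointers for the hierarchy, a word index for associative entry, and symmetric "see also" links for relational moves. The class and method names and the sample concepts below are hypothetical, chosen only to illustrate the layout just described.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None                  # hierarchical link in the taxonomy
    see_also: list = field(default_factory=list)  # relational links ("q.v.")


class Thesaurus:
    def __init__(self):
        self.concepts = {}  # concept name -> Concept
        self.index = {}     # word -> list of concept names

    def add(self, name, parent=None, words=()):
        self.concepts[name] = Concept(name, parent)
        for w in words:
            self.index.setdefault(w, []).append(name)

    def link(self, a, b):
        """Record a symmetric 'see also' link, turning the trees into a linked forest."""
        self.concepts[a].see_also.append(b)
        self.concepts[b].see_also.append(a)

    def associative(self, word):
        """Index lookup: jump from a word straight into the concept tree."""
        return self.index.get(word, [])

    def hierarchical(self, name):
        """Walk up the taxonomy from a concept to its root."""
        path = []
        parent = self.concepts[name].parent
        while parent is not None:
            path.append(parent)
            parent = self.concepts[parent].parent
        return path

    def relational(self, name):
        """Follow sideways links to related concepts."""
        return list(self.concepts[name].see_also)


t = Thesaurus()
t.add("communication")
t.add("telecommunication", parent="communication")
t.add("telegraph", parent="telecommunication", words=["wire", "telegram"])
t.add("mail", parent="communication", words=["letter", "post"])
t.link("telegraph", "mail")
print(t.associative("wire"))         # ['telegraph']
print(t.hierarchical("telegraph"))   # ['telecommunication', 'communication']
print(t.relational("telegraph"))     # ['mail']
```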

As will be discussed in more detail below, sentimutation in a telesophy environment facilitates all of these types of searching. The problem is much harder than for previous browsing systems since the available information is so much more extensive. The basic solution is extensive interaction: the user switches views until a sufficiently restricted domain on which to do actual examination has been located. That is, traversal and formation are done initially on more abstract concepts, and the domain narrowing is done by the user rather than by the system.

3. APPROACH

This chapter discusses the point of view adopted by the system. It gives an overview of the concepts discussed in detail in the rest of the report. It ranges from the fundamental underlying theory of communication to the model of information usage to an architecture for an environment embodying this. This template for system design, from observation to philosophy to architecture, is an important underlying facet of this report.

3.1 What is Communication

In the final analysis, all information involves communication between humans. That is, information is generated by someone from direct observation and examination of previous information. At some later date, this information is then communicated in some form to some other person. This information may be freshly generated direct from a person's head via conversation or it may be refined through many generations of workers via print. In either case, communication can be viewed as transfer of information between databases. Effective communication is understanding understanding.

The major problem in communication is thus the protocol conversion involved in the transmission. This is the major effort involved in constructing communication systems today where the concentration is on syntactic issues such as bits and packets. But a complete theory of communication must include semantic issues as well. The differing organization of the databases between which information is being transferred is a much greater impediment to effective communication. In a foreign country, for example, effective communication is possible with a few words and gestures as long as the semantics of the situation is understood. The transfer of information is not passing bits over a wire but causing a desired state change in the receiver. (In conversation this is producing understanding in the listener's mind rather than phonemes in his or her ear.)

This state change is the important function of communication. This is true whether the source is human or machine, voice or text. A source appropriate to the nature of the interaction must be located. It must contain appropriate information, of course. But just as important, the information must be conveyed in some appropriate style and level. To some extent, the source is tunable as to exposition (more so for humans than for machines). However, the most effective method of communication for a human receiver is to first locate an "appropriate expert".

Communication is thus a problem of network information retrieval, where the transmission is not complete until the information has been retrieved and incorporated into the local database. Of course, the converse is equally true and that is really the point of view being taken here, namely that the problem of the Information Age is location of and communication with an appropriate expert. The notion of a sophport implicitly assumes that the receiver (requester) of the information is a human (rather than a machine). Thus the system must be inherently interactive. This is because the goal is not transfer or matching of information but causing a state change in the mind of the requester. The queries do not produce "right" answers but merely satisfactory ones based on problem and context-dependent constraints. The state changes "are" reality. Hence, for all practical purposes, Ecology is more important than Epistemology.

A system embodying this theory of communication tends to be "dumb interactive" rather than "smart batch". As the technology for representation and inferencing improves, the level of exactness required for specification of a query to be matched will become less. But an automatic solution to matching the categorization in the user's head to the categorization in the system's database does not seem likely to appear soon.

A successful environment for searching must thus provide fast interactive browsing, with point-of-view switching and pointer following. Such an environment is an assistant, who transparently handles the location changes, rather than a consultant, who understands the problem under consideration. Such an assistant falls midway in both price and performance between a book, which provides static mass standard information, and a consultant, who provides dynamic personal custom information. Similarly, the resulting system is a guide, leading the user to potentially appropriate domains, rather than an oracle, answering directly the user's domain-independent questions.

3.2 The Understanding Taxonomy

An attempt has been made throughout this report to use a consistent set of terminology for the different levels of understanding. The levels are laid out below.

Data is the lowest level. This is "raw" bits, i.e. what is actually out there in the world.

Information is the next level. This is "categorized" data, i.e. what identification is on the package.

Knowledge is the next level. This is "filtered" information, i.e. what results you get when you specifically search for something.

Wisdom is the highest level. This is "understood" knowledge, i.e. what you really need to know to solve your problem.

Since wisdom is the goal, the function of a telesophy environment can now be understood. Telesophy is "browsing through knowledge (point-of-view information packaging data) seeking wisdom". Recall that the word "telesophy" in fact means "wisdom at a distance" to indicate the seeking of wisdom across the network by interactive navigation through information space.

3.3 System Mapping

The telesophy environment attempts to directly model the theory of communication (theory of knowledge) briefly mentioned above. There is a mapping from properties of human information use to the underlying system model to the system architecture. This mapping is laid out below. The pieces are then discussed in more detail throughout the rest of the report.

It should be noted that all technology assumptions here are conservative. No breakthroughs are required to implement the architecture on a wide scale in the relatively near future (5-10 years for initial business use).

The mapping has been divided into three levels each with three dimensions. These divisions are somewhat arbitrary: they correspond to the exposition divisions but the components are intermixed in the actual architecture and implementation. The levels deal with: observation, philosophy, and architecture. The dimensions deal with: representation, location, and generation. All of these have been mentioned briefly in the Introduction.

3.3.1 The Levels

The observation level is concerned with basic properties of the use of information. These could be considered as basic properties of humans that are part of the folklore of history and sociology but are rarely embodied in systems or tools. These observational foundations should directly reflect empirical properties of the world.

The philosophy level is concerned with the basic tenets of the system model. These constitute the semantic base for the functionality of the system and what a user should understand in order to use it. The philosophical foundations should directly model the basic properties of the use of information.

The architecture level is concerned with the basic operations of the system itself. These constitute the basic concepts and external behavior of any particular implementation. The architectural foundations should directly reflect the basic tenets of the system model.

3.3.2 The Dimensions

The representation dimension is concerned with the storage and relationships of the information itself. At the observation level, the inferred properties are that information is characterized by dynamic interconnections with each piece depending upon others. At the philosophy level, the inferred tenets are that all available information resides in a single space where any subpiece may be a pointer to some other piece of information. At the architecture level, the inferred operations are that addressing and caching are transparent. That is, the entire WorldNet is viewed as a single computer and the implementation ensures that referenced information is fetched into local memory in manipulable form regardless of its current location or current representation.
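A minimal sketch of transparent addressing and caching, assuming a toy network in which a directory maps each symbolic name to the node that stores it. `WorldNet`, `LocalPortal`, and the format converters are hypothetical names; a real implementation would also need cache invalidation, failure handling, and a distributed directory, none of which is attempted here.

```python
class WorldNet:
    """Hypothetical stand-in for the network: each node is a dict of stored items,
    and a directory maps every symbolic name to the node that holds it."""

    def __init__(self):
        self.nodes = {}
        self.directory = {}

    def publish(self, node, name, value, fmt):
        self.nodes.setdefault(node, {})[name] = (fmt, value)
        self.directory[name] = node

    def fetch(self, name):
        node = self.directory[name]  # addressing: resolve the name to a location
        return self.nodes[node][name]


class LocalPortal:
    """Read-through cache: callers use names only, never locations or formats."""

    def __init__(self, net, converters):
        self.net = net
        self.converters = converters  # stored format -> local manipulable form
        self.cache = {}
        self.remote_fetches = 0

    def get(self, name):
        if name not in self.cache:
            fmt, value = self.net.fetch(name)
            self.remote_fetches += 1
            convert = self.converters.get(fmt, lambda v: v)
            self.cache[name] = convert(value)
        return self.cache[name]


net = WorldNet()
net.publish("node-a", "memo", "Budget due Friday", "text")
net.publish("node-b", "census", "1,2,3", "csv-ints")
portal = LocalPortal(net, {"csv-ints": lambda s: [int(x) for x in s.split(",")]})
print(portal.get("census"))   # [1, 2, 3]
print(portal.get("census"))   # served from the local cache
print(portal.remote_fetches)  # 1
```

The caller never mentions `node-b` or the stored format; the second reference is satisfied locally, which is the sense in which the network appears to be part of local memory.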

The location dimension is concerned with finding some information satisfying a particular need. At the observation level, the inferred properties are that searching is done by following pointers, by repeated trials of examination and new search. At the philosophy level, the inferred tenets are that sentimutation is supported so new queries may be issued to augment the local region and so selected current units can be followed into their surrounds in the single space. At the architecture level, the inferred operations are traversal, formation, and examination. That is, currently fetched information can be selected and displayed, and new queries formulated to send out into the network for matching. The implementation requires a uniform facility for packaging information and linking pieces together.
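The three operations can be illustrated on a toy information space. In this hypothetical sketch a region is just a set of unit names: formation adds the units matching a query, traversal adds the units that a selected member points to, and examination displays a member's contents. Simple substring matching stands in for whatever query matching a real system would use, and the class name and sample units are assumptions.

```python
class Region:
    """A local region of a hypothetical information space.

    units maps a unit name to its text; links maps a unit name to the names
    it points to. Both stand in for the single WorldNet-wide space."""

    def __init__(self, units, links):
        self.units = units
        self.links = links
        self.members = set()

    def form(self, keyword):
        """Formation: issue a query and add every matching unit to the region."""
        hits = {n for n, text in self.units.items() if keyword.lower() in text.lower()}
        self.members |= hits
        return hits

    def traverse(self, name):
        """Traversal: follow a selected unit's pointers into its surroundings."""
        if name not in self.members:
            raise KeyError(f"{name!r} is not in the current region")
        neighbors = set(self.links.get(name, ()))
        self.members |= neighbors
        return neighbors

    def examine(self, name):
        """Examination: display the contents of a unit already in the region."""
        if name not in self.members:
            raise KeyError(f"{name!r} is not in the current region")
        return self.units[name]


units = {
    "memex": "Bush proposes associative trails",
    "nls": "Engelbart builds hierarchical structures",
    "mouse": "Pointing device developed for NLS",
}
links = {"memex": ["nls"], "nls": ["mouse"]}
r = Region(units, links)
print(r.form("associative"))  # {'memex'}
print(r.traverse("memex"))    # {'nls'}
print(r.examine("nls"))       # Engelbart builds hierarchical structures
```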

The generation dimension is concerned with creating new knowledge. At the observation level, the inferred properties are that knowledge is generated by gathering existing information, manipulating it, and storing the result. At the philosophy level, the inferred tenets are that nexogenesis is supported so local regions may be named and modified in addition to being transparently stored. At the architecture level, the inferred operations are that associated operations for each type of information are available for manipulation purposes and that the local cache is written through back into the network. The implementation requires a uniform facility for typed data.
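Nexogenesis over typed data with a write-through cache might be sketched as follows. The operation table, the `WriteThroughCache` class, and the `nexogenesis` function are hypothetical; the point is only that the new piece is produced by an operation registered for its type and reaches the network at the moment it is stored locally.

```python
from functools import reduce

# Hypothetical operation table: each datatype lists the manipulations it supports.
OPERATIONS = {
    "text": {"join": lambda a, b: a + "\n" + b},
    "series": {"add": lambda a, b: [x + y for x, y in zip(a, b)]},
}


class WriteThroughCache:
    """Local cache over a network store (name -> (dtype, value)); every write
    goes to the local copy and, immediately, to the network as well."""

    def __init__(self, network):
        self.network = network
        self.local = {}

    def read(self, name):
        if name not in self.local:
            self.local[name] = self.network[name]
        return self.local[name]

    def write(self, name, dtype, value):
        self.local[name] = (dtype, value)
        self.network[name] = (dtype, value)  # write-through


def nexogenesis(cache, sources, op, new_name):
    """Gather existing pieces, combine them with an operation defined for
    their common type, and store the result as a new named piece."""
    typed = [cache.read(s) for s in sources]
    dtypes = {dtype for dtype, _ in typed}
    if len(dtypes) != 1:
        raise TypeError(f"sources have mixed types: {sorted(dtypes)}")
    dtype = dtypes.pop()
    result = reduce(OPERATIONS[dtype][op], [value for _, value in typed])
    cache.write(new_name, dtype, result)
    return result


network = {
    "q1-sales": ("series", [10, 20]),
    "q2-sales": ("series", [5, 7]),
    "note-a": ("text", "Sales rose."),
    "note-b": ("text", "Costs fell."),
}
cache = WriteThroughCache(network)
print(nexogenesis(cache, ["q1-sales", "q2-sales"], "add", "half-year"))  # [15, 27]
print(network["half-year"])  # ('series', [15, 27])
```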

3.4 Subsumption

The system proposed here is a very general one precisely because information is at the heart of nearly everything. Accordingly, a good implementation should replace many existing more specialized systems whose functionality a telesophy environment contains as a special case.

Telesophy combines in a fundamental fashion the ideas of computing and of communications. Operations are issued on local information transparently, regardless of its type. In this sense, a sophport is a complete personal programming environment, such as that available for Lisp or Smalltalk, which contains everything necessary for dealing with the system. A second form of transparency in the sophport is that data is internally fetched from the network so that the entire WorldNet appears to be part of the local buffer. Thus there is a complete communications subsystem underlying the system. A user of a sophport perceives only this portal into an abstract information space, the underlying computers and networks being hidden. Thus as Telesophy becomes the prevailing style of information handling, computing and communications will disappear as independent entities, both hardware and software. Some specific examples are given below.

A telesophy environment is basically a structure editor on datatypes that caches its data in from a network filing system. This description merely demonstrates that editing has become the primary paradigm for interactive computing systems. But generalization from other popular computing paradigms is an equally valid way of viewing the sophport. In each case, the generalization consists primarily in permitting arbitrarily typed data and in support from a full-fledged network programming environment. A sophport could be considered a spreadsheet program; these already have pointer links, though limited to numerical data. A sophport could be considered a database program; these already have a search capability, though limited to exact matching. A sophport could be considered a forms program; these already have data definition and validation, though limited to unlinked pieces.

A telesophy environment is basically a network supporting a single symbolic namespace. This description merely demonstrates that network provision for distributed databases is becoming a primary paradigm. The old paradigm of explicit naming and explicit file transfer is becoming obsolete although still by far the most common case. Strictly speaking, communications services such as electronic mail are redundant in a sophport. Since users are always connected to the single network, they need merely create a piece of information and name it as being owned by the desired recipients (e.g. via locating them in a directory). The saving of the message and its subsequent routing to some (remote) database are handled automatically by the underlying system.
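To make the claim about electronic mail concrete, here is a minimal sketch, assuming a hypothetical in-memory `Namespace` that stands in for the single symbolic namespace. "Sending" mail is nothing more than creating an information unit whose owners are the recipients, and "reading" mail is a query on ownership. The class and method names are illustrative only and do not come from the report.

```python
from dataclasses import dataclass, field


@dataclass
class InfoUnit:
    """Hypothetical information unit: a name, some content, and its owners."""
    name: str
    content: str
    owners: set[str] = field(default_factory=set)


class Namespace:
    """Stand-in for the single WorldNet namespace (all storage looks local)."""

    def __init__(self) -> None:
        self._units: dict[str, InfoUnit] = {}

    def create(self, name: str, content: str, owners: set[str]) -> InfoUnit:
        # Storing the unit is all the sender does; physical placement and
        # routing would be left to the underlying system.
        unit = InfoUnit(name, content, set(owners))
        self._units[name] = unit
        return unit

    def owned_by(self, person: str) -> list[InfoUnit]:
        # A recipient's "mailbox" is simply a query on ownership.
        return [u for u in self._units.values() if person in u.owners]


net = Namespace()
net.create("memo-1", "Meeting moved to 3pm.", owners={"alice", "bob"})
net.create("memo-2", "Draft attached.", owners={"bob"})
print([u.name for u in net.owned_by("bob")])    # ['memo-1', 'memo-2']
print([u.name for u in net.owned_by("alice")])  # ['memo-1']
```

No separate delivery step appears anywhere: once the unit exists in the shared space, each recipient finds it by asking what they own.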

Manipulation of specific types also is handled by the telesophy framework. If new video editing or document production programs are written, they provide the designated operations to data of the appropriate type. Indeed, providing support to modify and display "any" medium requires a considerable library of support programs.

4. INFORMATION

This chapter tries to give an impression as to the enormous array of different types of information. Note that the uniform packaging with protocol translation for retrieval and display permits all media amenable to the user terminal to be handled.

4.1 The Spectrums

There are a variety of dimensions that can be used to characterize information. Each of these has values ranging across a spectrum. A few of the most important spectrums are mentioned here.

Currency is how frequently the information is updated. This ranges from dynamic databases like wire services that are continuously changing to static databases like library books that are rarely changed.

Categorization is how frequently the classification of the information is updated. This ranges from random databases whose classifications change continuously such as community bulletin boards to thesaurus databases whose classifications have a published archival hierarchy such as bibliographic citations. Categorization is independent of currency. For example, weather reports are dynamic in currency but static in categorization.

Level is how self-contained the information is. This ranges from technical literature requiring an expert to understand to popular literature accessible to the general public.

Refinement is how carefully the material has been prepared and how many iterations of thought the underlying concepts have undergone. This ranges from spur-of-the-moment notes quickly jotted down in a few minutes to monumental books painstakingly reworked over many years.

Editing is how carefully the material has been checked for conformity with standards of quality, content, and style. This ranges from community bulletin boards with only general sociological approval as a filter to commercially published literature with expert reviewers requiring specified revisions.

4.2 The Types

There are many different types of information currently available. As a suggestive illustration, some of the major classes are listed here. Clearly there are many other categories and spectrums, and the divisions given here are somewhat arbitrary. Note these do not include the cross-products that would be formed by using different media.

Sensors are databases that are updated almost continuously. These include: wire services, stock quotes, and weather reports. The information decays quickly and has a standard classification for rapid retrieval.

Notes are databases that contain short personal comments. They are useful not only to the generator but also to colleagues as a personalized categorization of information. These support searching by indirection through someone else's pointer region, i.e. "who would know what was relevant on this topic?".

Bulletins are databases that contain information of interest to a specific community. As electronic clubs, they are a more centralized version of notes. As general postings, they are a form of localized broadcast.

Announcements are databases that contain notifications of events. They include not only future events such as calendars listing upcoming lectures, but also past events such as publication of a paper in some topic of interest. The latter case of SDI (Selective Dissemination of Information) illustrates what the most general feature would be: a facility that notifies people of changes in information anywhere in the WorldNet that fit their interest profiles.

Magazines are databases that contain edited articles. These articles are typically longer and more carefully prepared than bulletins or notes and must fit the style and content of the magazine. There is a spectrum of currency and permanency that runs from moderated digests on a bulletin board to electronic newsletters to physical newsletters to commercial magazines to technical journals.

Directories are databases that contain pointer information. The objects pointed to include: people, organizations, and services. The pointers include a variety of addressing information such as telephone numbers (voice), room numbers (physical), and machine logins (electronic). Most of the browsing activity takes place in a directory of some sort, such as a directory of databases, until the "actual data" is located.

Reviews are databases that contain evaluative pointers. They differ from directories per se by including comparative or subjective information about the objects rather than primarily objective properties. A review is often consulted prior to a directory; colleagues' notes are merely a less formal version of these. Reviews include: consumer reports, meeting notes, and technical evaluations.

Archives are databases that contain "permanent" information. Items residing in these have been through a process of editing and refinement. Such archives include the contents of libraries and of bibliographic databases, which contain pointers to library material. Libraries contain not only physical material such as books but also electronic material such as programs.

4.3 Electronic Information

It is not commonly realized how widespread electronic information already is. A few samples are given here to convey some appreciation of how much information would be available in the WorldNet.

4.3.1 Formal

Current databases are mostly formal in nature. These include the bibliographic databases in specialized areas alluded to earlier.

A typical such "database" is INSPEC. This is maintained by the IEE (the British analogue of the IEEE) and contains one-page summaries of articles from 2500 journals in electronics and computer science. It contains roughly one million entries covering the past ten years. The articles are summarized by professional reviewers who also classify each entry by several keywords chosen from a specialized thesaurus for INSPEC.

A bibliographic database program typically is available on-line from a "service". This is a company providing a computer center and a set of accessing programs. They obtain databases from information providers such as the IEE and make them remotely available over packet networks such as GTE Telenet. A typical such service is the Dialog Information Retrieval Service, which provides remote access to some 250 different databases in distinct areas ranging from medicine to engineering to business. Their retrieval program can do exact string matching on several fields, including keywords and authors, and takes less than a second to return all matching items.

There are several books listing services. For example, "The Encyclopedia of Information Systems and Services" [ServiceDir] is 1000 pages with each page listing 2 services. At an even higher level, there is also a "Directory of Directories" [MetaDir], which lists 2000 such service directories. Although some overlap exists, the specialized databases tend to be disjoint so the computed total here of a billion databases each with a million entries is probably a considerable underestimate of what is currently available electronically. Note that it is still relatively uncommon for the entries to contain the referenced item itself rather than a text summary.
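The "computed total" can be reconstructed roughly as follows. This is a back-of-the-envelope sketch under two assumptions the text leaves implicit: that each of the 2000 listed directories is about the size of the Encyclopedia (1000 pages at 2 services per page), and that each service carries about as many databases as Dialog (some 250).

```python
# Rough estimate of electronic databases circa 1985.
# Assumptions: every directory is Encyclopedia-sized and every service
# is Dialog-sized; entries per database follow the INSPEC figure.
pages_per_directory = 1000
services_per_page = 2
directories = 2000              # from the "Directory of Directories"
databases_per_service = 250     # Dialog's figure
entries_per_database = 10**6    # INSPEC-sized databases

services_per_directory = pages_per_directory * services_per_page  # 2000
databases = directories * services_per_directory * databases_per_service
entries = databases * entries_per_database

print(f"{databases:.0e} databases, {entries:.0e} entries")  # 1e+09 databases, 1e+15 entries
```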

4.3.2 Informal

A major factor in the coming enormous growth in electronic information will be the mass penetration and maturity of informal databases.

These are the community databases alluded to earlier. They are newsletters containing messages by members of a special interest group. Each contains discussions on a specific topic by a geographically distributed community. Sometimes the discussion is moderated, but more frequently group approval and interest determines the content. It is similar to a club whose meetings are detached in both time and space.

Community databases grew out of the facility of electronic mail systems to make widespread distribution easy. That is, distributing to a group required no extra work for the sender since it was handled automatically by the system. Such databases have proven to be the most useful and popular service provided by all major packet and data networks. This was true for the ARPAnet, originally intended for remote computing, for the U.S. videotex trials, originally intended for news and shopping, and for the personal computer networks, such as The Source and Compuserve originally intended for videotex. In addition to these public facilities, numerous personal computer owners have set up private bulletin board services. (These private administrators are referred to as "sysops", short for "system operators".)

Such activities show that sharing information inside a community of interest satisfies a basic human need. The ability of electronic communications systems to make information sharing independent of time, space, and previous acquaintance shows such systems to be an area of enormous potential [Turoff]. The existence of a transparent WorldNet is certain to greatly accelerate the spread of such information communities. In many respects, the major goal of a telesophy environment is to provide direct support for sharing within an information community.

The technology for publishing will also fuel the rise of community databases. "Publishing boxes", optical-disk-based retrieval systems with built-in search capability, are already starting to appear. By the time the bus-speed wideband network reaches the general public, such boxes will be inexpensive enough to allow mass distribution of magazines for low initial investment. That is, a potential publisher need only buy a publishing box and plug it into the network. This is a more extravagant example of private decentralized enterprise currently serviced by bulletin board sysops, just as the future resource centers serviced by the central office will be more extravagant examples of the public centralized enterprise currently supported by Compuserve and Dialog.

4.3.3 Future

In the future, the number of electronic information databases will be even more staggering as new generations of producers emerge to meet increased demand.

Part of this will result from widespread availability, that is, from the growing size of the community attached to the WorldNet. Part of this will result from extensions to new media, such as videodiscs permitting interactive novels, and combinations of old media, such as graphical foregrounds and video backgrounds permitting teletrips with realistic point-of-view simulation.

And part of this will just be that navigating in information space is better than the real thing. It is far more flexible. Browsing the WorldNet will be like playing an Adventure game with an inexhaustible domain!

5. MODEL

This chapter lays out the fundamental tenets underlying the telesophy system. These form the foundational concepts that must be supported by the architecture. They are what the system embodies and what the user must understand. The rest of the report then lays out the architecture.

As mentioned in the section on System Mapping, there are three major tenets: WorldNet, Sentimutation, and Nexogenesis. These correspond to the single information space, a style for locating information in it, and a style for generating new information from old. These tenets are the primary dimensions at the philosophy level of the system description. Each is discussed in its section below. The observation level for each dimension is given as motivation at the start of its section. These tenets, which determine the functionality of the system, should accurately reflect real-world properties of the use of information.

5.1 WorldNet

The first observation to make about information is interconnection. That is, no piece of information stands on its own: it is dependent upon and related to others [Burke]. For example, the price of cereal in the grocery store depends directly on the amount of rain in the Midwest and the health benefits of the Teamsters. Often, the value of the information depends on interrelating the values it is linked to and this "recomputation" must take place dynamically. For example, choosing the route to drive to work depends on the time of day and the current weather conditions.

The first tenet of the model is thus WorldNet. That is, there is a logically single space containing "all the world's information". In actual implementation, there are numerous physical networks each connecting numerous information resources together. What the telesophy environment does is to provide location transparency so that information can be stored and retrieved from these subnets as though they were a single network.

Linked Information Units form the information space. These are partitioned throughout the physical network topology. The space is a graph structured like a forest of linked trees. Single categorizations of information do not reflect real-world usage, either for location or generation. Each information unit exists in many trees as the child of many parents and the parent of many children.

Considering the sophport as a portal into information space, the user's commands select regions to be viewed then modify this local space. The process of selecting a region involves issuing a query locally, which progressively propagates into further subnets. Resources with matching information return those results. As these are fetched into the local cache, the information must be converted into the canonical format. Note that since the information is fetched interactively in closest-first order, query results can depend on physical location in the network.
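A minimal sketch of closest-first query propagation, assuming the physical network can be modelled as a weighted graph of hypothetical subnets, each holding a few raw records. Dijkstra's algorithm (a standard choice, not one named in the report) visits subnets in order of distance from the sophport, and `to_canonical` is a placeholder for protocol translation into the canonical format. All names and data are illustrative.

```python
import heapq


def propagate_query(links, holdings, origin, matches, to_canonical=str.lower):
    """Yield (distance, subnet, record) for matching records, closest subnets first.

    links:    dict mapping subnet -> list of (neighbor, link_cost)
    holdings: dict mapping subnet -> list of raw records stored there
    matches:  predicate applied to each record in canonical form
    """
    frontier = [(0, origin)]
    settled = set()
    while frontier:
        dist, subnet = heapq.heappop(frontier)
        if subnet in settled:
            continue
        settled.add(subnet)
        # Each resource answers with its matches, converted to canonical form.
        for record in holdings.get(subnet, []):
            canonical = to_canonical(record)
            if matches(canonical):
                yield dist, subnet, canonical
        # The query then spreads outward into further subnets.
        for neighbor, cost in links.get(subnet, []):
            if neighbor not in settled:
                heapq.heappush(frontier, (dist + cost, neighbor))


links = {
    "local": [("campus", 1), ("national", 5)],
    "campus": [("national", 2)],
    "national": [],
}
holdings = {
    "local": ["Notes on COCKROACHES"],
    "campus": ["Cockroach neurophysiology"],
    "national": ["Cockroach pest eradication", "Weather report"],
}
for hit in propagate_query(links, holdings, "local", lambda r: "cockroach" in r):
    print(hit)
```

Issuing the same query from a different origin would return the hits in a different order, which is exactly the location dependence noted above.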

5.2 Sentimutation

The second observation about information is browsing. Once it is realized that all data is interconnected, desired information can be found by following pointers until something satisfactory has been located. This browsing paradigm for search terminates when appropriate sources have been found. The pointer following consists of repeated switching of views, repeated jumping to parents of different category trees. For example, locating an article on cockroaches as neurophysiological specimens can lead to the articles on pest eradication by jumping to the region of the space on cockroaches then over to the region on pests.

The second tenet of the model is accordingly sentimutation. This is a flexible form of view switching. There are potentially many viewing contexts, each with its own buffer and feedback.

Each buffer forms a local region of information space that the user is currently manipulating.

Buffer contents are generated by region formation, i.e. the results of a query spreading out into the network or a new subpartition of an existing query. These contents are then examined by region traversal, i.e. selection and display of information in the buffer. The view switching is accomplished by zooming in on a selected item to reveal its subpieces then following the links to other items that point to those subpieces. This allows examination of the surround for a different categorization for the item. Both associative search (via queries) and hierarchical search (via following) are supported. Since the space is a graph rather than a tree, the following actually supports a general relationship search.
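Region formation, traversal, and view switching can be sketched over a toy information space, assuming a hypothetical `IUGraph` in which each unit records its subpieces and any unit may have many parents. `form_region` plays the role of a query, `zoom` reveals subpieces, and `surround` follows links to the other items that point to those subpieces, yielding alternative categorizations. The graph mirrors the cockroach example given earlier; the names are illustrative only.

```python
from collections import defaultdict


class IUGraph:
    """Toy information space: units linked parent -> child, many parents allowed."""

    def __init__(self):
        self.children = defaultdict(list)
        self.parents = defaultdict(list)
        self.text = {}

    def add(self, name, text, parents=()):
        self.text[name] = text
        for p in parents:
            self.children[p].append(name)
            self.parents[name].append(p)

    def form_region(self, term):
        # Region formation: an associative query producing a buffer of units.
        return [n for n, t in self.text.items() if term in t.lower()]

    def zoom(self, name):
        # Region traversal: reveal the subpieces of a selected unit.
        return list(self.children[name])

    def surround(self, name):
        # View switching: other categories that also contain this unit's subpieces.
        others = set()
        for child in self.children[name]:
            others.update(p for p in self.parents[child] if p != name)
        return sorted(others)


space = IUGraph()
space.add("neurophysiology", "Neurophysiology specimens")
space.add("pests", "Household pests")
space.add("cockroaches", "Cockroaches", parents=["neurophysiology", "pests"])
space.add("roach-nerves", "Giant axons of cockroaches", parents=["cockroaches"])
space.add("roach-control", "Cockroach eradication", parents=["cockroaches", "pests"])

buffer = space.form_region("axon")        # ['roach-nerves']
region = space.parents[buffer[0]]         # ['cockroaches']
print(space.zoom(region[0]))              # ['roach-nerves', 'roach-control']
print(space.surround(region[0]))          # ['pests']
```

Because `parents` holds a list rather than a single entry, the structure is a graph rather than a tree, so following links amounts to a general relationship search.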

The ability to generically follow links and examine items relies on a uniform packaging format for the information. These information units (IUs) provide the classification and connection for the information space. The size of a unit can vary from a whole database (or larger) to a single paragraph (or smaller). A number of dimensions are used for the classification, including: topic, level, length, refinement, and currency.

The basic rationale for sentimutation is that single categorization is unsatisfactory. Each piece of data is "stored" simultaneously in many IUs so the traversal must facilitate switching between these different points of view.

5.3 Nexogenesis

The third observation about information is filtering. As a user switches views during a search, previously unrelated information is gathered along the way. This must be filtered and stored to generate new knowledge after modification to customize it. Expertise can thus be seen to be knowledge, or filtered cached information.

The third tenet of the model is accordingly nexogenesis. This is automatic storage of gathered information. Transparent write-out of cached IUs is done to support nexogenesis just as transparent read-in of cached IUs is done to support sentimutation. The newly created IUs can be named for later retrieval by specifying their properties.

Creation is more than merely gathering. Filtering and manipulation are also required. To perform these, there is a set of generic operations on all types plus specific operations for each type. When an information unit is cached into the local buffer, the operations associated with the types of its contained data are also cached into the local library (including transformation into canonical form).
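One way to picture this caching of operations, assuming a hypothetical resource-side registry of operations keyed by type name: when a unit is cached, the operations for each type it contains are copied into the local library alongside it, and the type's own canonicalizing operation is applied on the way in. The type names, operations, and `Sophport` class are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    type_name: str
    data: object


# Generic operations available on every type.
GENERIC_OPS = {
    "describe": lambda piece: f"{piece.type_name}: {piece.data!r}",
}

# Hypothetical type-specific operations held at the resources.
REMOTE_TYPE_OPS = {
    "text": {"to_canonical": str.strip,
             "word_count": lambda s: len(s.split())},
    "table": {"to_canonical": lambda rows: [list(r) for r in rows],
              "row_count": len},
}


class Sophport:
    def __init__(self):
        self.buffer = []    # locally cached pieces
        self.library = {}   # locally cached operations, keyed by type

    def cache(self, unit):
        for piece in unit:
            # Fetch the operations for any type not yet in the local library.
            if piece.type_name not in self.library:
                self.library[piece.type_name] = dict(REMOTE_TYPE_OPS[piece.type_name])
            ops = self.library[piece.type_name]
            # Transform into canonical form using the type's own operation.
            self.buffer.append(Piece(piece.type_name, ops["to_canonical"](piece.data)))

    def apply(self, op_name, piece):
        if op_name in GENERIC_OPS:
            return GENERIC_OPS[op_name](piece)
        return self.library[piece.type_name][op_name](piece.data)


port = Sophport()
port.cache([Piece("text", "  Cockroach eradication methods  "),
            Piece("table", [("city", "count"), ("Newark", 12)])])
print(sorted(port.library))                      # ['table', 'text']
print(port.apply("word_count", port.buffer[0]))  # 3
print(port.apply("row_count", port.buffer[1]))   # 2
print(port.apply("describe", port.buffer[0]))    # text: 'Cockroach eradication methods'
```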

Facilities for history must be supported. Since gathered information is always kept as a permanent copy, rollback is permitted to undo changes that are no longer desired. Future queries may be stored to provide a kind of SDI. The system also exhibits a form of learning: local caching modifies the nearby regions of space to contain information more of interest to the user.
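A minimal sketch of these history facilities, assuming a hypothetical local store that keeps earlier versions of each unit (so rollback is just stepping back one version) and a list of standing queries checked whenever a unit is saved, in the spirit of SDI. The class and its methods are illustrative only.

```python
class HistoryStore:
    """Hypothetical local store that keeps earlier versions so changes can be undone."""

    def __init__(self):
        self.versions = {}          # name -> list of successive contents
        self.standing_queries = []  # (owner, predicate) pairs for SDI
        self.notifications = []     # (owner, name) pairs awaiting delivery

    def save(self, name, content):
        self.versions.setdefault(name, []).append(content)
        # SDI: notify anyone whose stored query matches the new content.
        for owner, predicate in self.standing_queries:
            if predicate(content):
                self.notifications.append((owner, name))

    def current(self, name):
        return self.versions[name][-1]

    def rollback(self, name):
        # Undo the latest change; the first version is never discarded.
        if len(self.versions[name]) > 1:
            self.versions[name].pop()
        return self.current(name)


store = HistoryStore()
store.standing_queries.append(("alice", lambda text: "cockroach" in text))
store.save("notes", "Pest control survey")
store.save("notes", "Pest control survey: cockroach results")
print(store.notifications)       # [('alice', 'notes')]
print(store.rollback("notes"))   # Pest control survey
```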

The basic rationale for nexogenesis is that new information is formed from filtered snippets of old. Jumping from "book" to book gathering units of "paragraph" size, the user forms a dynamic virtual book containing his or her experiential knowledge gained from searching the information space.

6. TELE I - Hardware

The remainder of this report describes the architecture of a telesophy environment. The goal is to demonstrate the feasibility of constructing such an environment and to outline its construction. This chapter and the next discuss "tele", providing transparent caching over the network. The three after that discuss "sophy", providing knowledge location and generation. Finally, the "sophport", the underlying operating environment, is discussed.

This chapter concentrates on the hardware support for "tele": the single computer model and the network, terminal, and resource technology. The next chapter concentrates on the software support: fetching, addressing, propagating, and caching. It should be noted that the technology, particularly the hardware, changes very fast. Any data should be considered a representative sample circa mid-1984.

6.1 The Single Computer Model

The fundamental idea behind "tele" is that of location independence. That is, access to information should be independent of both its location and the location of the requester. This casts the notion of a WorldNet into the notion of a single computer consisting of "all" the world's computing and communications systems. The local workstations are then merely remote terminals connected to this computer for retrieval purposes. The browsing environment provides yet a higher level of abstraction and makes the terminal viewable as a portal into information space.

The primary components of a "computer" have not changed much over the years. They consist of the user-visible terminals for interaction and the system-visible resources for storage, computation, and printing. There is also the system bus, the connection between these. Although the current telephone network permits connection worldwide, its speed does not allow such connections to be transparent to the user. This will change since the coming wideband telephone network transmits far faster than any available components. That is, the network itself will serve as the system bus.

As mentioned in the History chapter, the single computer model attempts to combine the advantages of the mainframe computer and personal computer models. Community sharing is facilitated since information is stored in the single (geographically distributed) central resource as in the mainframe model. Interactive feedback is facilitated since the local context is stored and local manipulation is performed in the terminal as in the personal model. Thus the terminal is used for small manipulation and good display while the resource is used for large manipulation and good storage. Each is used to its best advantage without the straining necessary for an isolated personal computer to provide extensive computation and storage or an overloaded mainframe computer to provide extensive feedback. The single computer model also has the mainframe advantages of common facilities and support.

With the current media used in computing systems, text and graphics, the network must run at disk speeds in order to be transparent. The maximum disk transfer rate might conveniently be thought of as 10 megabits/second. Such speeds have existed in local area networks for over a decade, e.g. Ethernet. These have typically been used for file transfer rather than data caching. Commercial networks are now beginning to appear that implement the single computer model on a local scale. One notable example is the Apollo Domain.

In an Apollo ring, a workstation node may access information on any disk connected anywhere in the network [Apollo]. Each disk is controlled by a server node that handles requests from other nodes. A user requests information by giving a network address, consisting of the server name and the file name. The file is then downloaded over the network to the local memory of the requesting node, where execution takes place. A single namespace can be provided by specifying links on each server to file hierarchies that are available remotely.

6.1.1 Beyond Local Area Networks

As the size of the network grows beyond a building or campus, a change in strategy is necessary to maintain transparency. Although network transmission is still faster than disk speed, the propagation delay over long distances is significant on the time scale of computer memory accesses. For example, it is on the order of a millisecond for a 100 mile roundtrip acknowledgement, since light covers only about 186 miles per millisecond in a vacuum and travels more slowly in fiber. Maintaining access speed transparency with this degree of protocol overhead requires downloading larger amounts of information, megabyte chunks at a time. This transmission pattern, which has more the flavor of downloading than demand paging, also helps reduce the possibility of network blockage.
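The case for large chunks can be checked with a simple model, assuming that each chunk costs one round trip plus its transmission time and ignoring protocol overhead. The 100 megabits/second channel and 1 millisecond round trip below are illustrative values, not measurements.

```python
def effective_throughput(chunk_bytes, channel_bps, round_trip_s):
    """Bits per second actually delivered when each chunk costs one round trip."""
    bits = chunk_bytes * 8
    return bits / (round_trip_s + bits / channel_bps)


CHANNEL = 100e6     # 100 megabits/second (illustrative)
ROUND_TRIP = 1e-3   # 1 millisecond (illustrative)
for size in (4_096, 65_536, 1_048_576):
    rate = effective_throughput(size, CHANNEL, ROUND_TRIP)
    print(f"{size:>9} bytes: {rate / CHANNEL:.0%} of channel speed")
```

Under these assumptions a page-sized 4-kilobyte fetch uses only about a quarter of the channel, while a megabyte chunk uses about 99% of it, which is why the strategy shifts from demand paging toward bulk downloading.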

The underlying assumption for access transparency is point-to-point connection. The network must have adequate switching topology to provide complete routing (and must provide ultra-reliable transmission). This routing can be implemented with a hierarchy of pointer directories. That is, given a network address, the local switching machine or gateway must either have a direct connection or a pointer to another machine to handle the request. The number and levels of machines necessary depends on their capacity compared with the number of terminals and the corresponding traffic.
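The pointer-directory routing can be sketched as follows, assuming hierarchical addresses written as tuples (country, exchange, line) and switching machines that either have a direct connection for the next address component or pass the request up to their parent. The address format and machine names are hypothetical.

```python
class Switch:
    """A switching machine with direct connections below and a pointer upward."""

    def __init__(self, name, parent=None, component=None):
        self.name = name
        self.parent = parent
        self.children = {}   # next address component -> child Switch
        if parent is None:
            self.prefix = ()   # the root serves every address
        else:
            self.prefix = parent.prefix + (component,)
            parent.children[component] = self

    def route(self, address):
        """Return the machines a request visits on its way to `address`."""
        path = [self.name]
        node = self
        # Climb until this machine's prefix covers the destination...
        while address[:len(node.prefix)] != node.prefix:
            node = node.parent
            path.append(node.name)
        # ...then descend through direct connections.
        while len(node.prefix) < len(address):
            nxt = address[len(node.prefix)]
            if nxt not in node.children:
                break   # the remaining component is a line on this machine
            node = node.children[nxt]
            path.append(node.name)
        return path


world = Switch("world")
us = Switch("us-toll", world, "us")
uk = Switch("uk-toll", world, "uk")
nj = Switch("nj-office", us, "nj")
ny = Switch("ny-office", us, "ny")
london = Switch("london-office", uk, "london")

print(nj.route(("us", "ny", "5551234")))   # ['nj-office', 'us-toll', 'ny-office']
print(nj.route(("uk", "london", "7946")))  # ['nj-office', 'us-toll', 'world', 'uk-toll', 'london-office']
```

Each machine only needs pointers to its parent and its direct connections, so the tables stay small even as the number of terminals grows.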

As the current telephone network demonstrates, a hierarchy of switching machines would suffice to handle a network the size of the WorldNet. For example, in the United States there are about 10,000 switching machines averaging about 10,000 lines in a 5-level hierarchy. Local PBXs extend this hierarchy by a level or two and increase the number of telephones well past the 100 million mark.

6.2 Execution Distribution

With the coming wideband technology, optical fibers, the network transmits enormously faster than any device that can be connected to it. (The speed is limited by the interfaces rather than the fiber itself.) So, theoretically, the different functions performed by the computer could be placed at any point in the network. In practice, the distribution might be as follows.

The terminal requests information and performs local processing on it. Requests are sent to a server that routes them to an appropriate resource. The resource gathers the specified information and sends it back to the server to be passed to the requesting terminal. The key components are thus: the terminal where local caching and execution takes place, the network over which cached information is transmitted, and the resource where the information is permanently stored. Current and future technology for each of these is discussed in more detail below.

To be effective, the actual distribution of resources must ensure that most of the time most of the information being viewed and manipulated is already available locally. This is necessary to maintain "transparency" of access because the speed of transmission is limited primarily by the speed of the resource sending engine. In addition, the propagation delay is significant for non-local requests. So the local storage must be sufficient to hold a working set of information, to avoid network fetching overhead. If the resource is being used frequently enough, it should be moved "closer" in the network to the requester in order to prevent delays. (In addition to providing faster service, it may be more economical to buy rather than rent certain resources.)
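The working-set requirement can be illustrated with a least-recently-used cache, a standard policy chosen here for simplicity (the report does not prescribe one). The `fetch` callable stands in for a network request, and the hit and miss counts show how much network traffic a cache that holds the working set avoids. All names are hypothetical.

```python
from collections import OrderedDict


class LocalCache:
    """Least-recently-used cache of IUs in front of a slow network fetch."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch
        self.units = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self.units:
            self.hits += 1
            self.units.move_to_end(name)       # mark as recently used
            return self.units[name]
        self.misses += 1
        unit = self.fetch(name)                # go out over the network
        self.units[name] = unit
        if len(self.units) > self.capacity:
            self.units.popitem(last=False)     # evict least recently used
        return unit


cache = LocalCache(capacity=3, fetch=lambda name: f"<contents of {name}>")
for name in ["a", "b", "c", "a", "b", "c", "a", "d", "a"]:
    cache.get(name)
print(cache.hits, cache.misses)   # 5 4
```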

6.3 Network Technology

Optical fibers are the primary technology that will make the extension of bus-speed networks beyond local areas possible. This is due to the fiber's extremely high carrying capacity (high bandwidth, small size) and extremely reliable transmission (low noise, small attenuation).

A sample terminal in the Single Computer Model might contain 10 megabytes of memory; fast downloading into this in 1 second thus requires a 100 megabits/second channel. Optical fibers are already commercially available that run at this rate and it is conservative to predict them being economic in the near future. Note that, without local caching, the sending engine is the bottleneck so the desired effective bandwidth rate is really the disk-speed number, approximately 10 megabits/second. This is as true for internal computer buses as for local area networks. Fiber has the ability to extend this rate over non-local distances. (Protocol overhead typically reduces the throughput to 10% of the raw channel speed. The full gigabit/second transmission rate actually needed to effectively download 10 megabytes in a second is not quite yet in the commercial domain.)
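The channel figures in this paragraph follow from a few lines of arithmetic; the 10% protocol efficiency is the paragraph's own rule of thumb.

```python
terminal_memory_bytes = 10 * 10**6   # 10 megabytes
download_time_s = 1.0
protocol_efficiency = 0.10           # throughput is ~10% of raw channel speed

payload_bps = terminal_memory_bytes * 8 / download_time_s
raw_channel_bps = payload_bps / protocol_efficiency

print(f"payload rate: {payload_bps / 1e6:.0f} Mbit/s")      # 80, i.e. roughly 100
print(f"raw channel:  {raw_channel_bps / 1e9:.1f} Gbit/s")  # 0.8, i.e. roughly a gigabit
```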

A brief discussion of some of the most important features of this technology ... considerable ongoing research, current commercial components are little more than mechanical butting or beam redirecting devices. This provides mechanical switching ...

6.4 Terminal Technology

This section deals with the memory and processing power for the terminal serving as the local cache. This power must be adequate to handle most of the information processing required.

A sample terminal for the Single Computer Model might have 10 megabytes of local memory and a processor equivalent to a current mid-range mainframe. As the cursory discussion below shows, such a terminal will be economically feasible for businesses in the near future.

The memory must hold the local working set and might be partitioned as follows: 2 megabytes for the system code (telesophy environment), 3 megabytes for the screen (1000*1000 resolution with 24 color planes), and 5 megabytes for the browsed information (10,000 IUs at 500 bytes each).
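A quick check that the partition adds up, taking a megabyte as 10^6 bytes to match the report's round numbers:

```python
MB = 10**6

system_code = 2 * MB
screen = 1000 * 1000 * 24 // 8   # pixels x color planes (bits) -> bytes
browsed = 10_000 * 500           # 10,000 IUs at 500 bytes each

total = system_code + screen + browsed
print(screen // MB, browsed // MB, total // MB)  # 3 5 10
```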

Typically, a terminal has trouble keeping up with data being fed it by a network. Current 24*80 character terminals often lose or garble characters at 9600 baud. In the Single Computer Model case, the network might potentially run two orders of magnitude faster than either end. Neither the disk nor the terminal processor can keep up with gigabit/second speeds with current silicon-based technology, forcing parallel i/o hardware to handle these rates. The terminal's main function is to serve as a memory-mapped i/o buffer for the network-wide system bus.

6.4.1 Memories

The capacity of memory of all kinds continues to increase rapidly. One-megabit chips exist, both RAMs (read/write) and ROMs (read only). Development is already proceeding on four-megabit RAMs. CCDs of 300-kilobit capacity have been developed by Philips for use in digital TVs. Japanese firms are reputed to have already developed 1125-line digital high-definition television systems that incorporate megabyte-size memories, and they express the belief that memories of this size are on the verge of being cost-effective [HighTech I]. In fact, megabyte memories are expected to cost $100 within 5 years [Elec I]. Memory in workstations will thus begin to approach that in current large mainframes, which typically have at least 16 megabytes.

6.4.2 Processors

The recently announced Motorola 68020 32-bit microprocessor, together with its associated floating point math chip, the 68881, has a processing capability roughly equivalent to a DEC Vax 11/780 [Elec II]. When combined with a memory management chip, a system would be able to support most of the features presently familiar to a user of a Vax 780 running the Unix operating system for about $500 in parts cost. Such processors are already starting to appear in workstations.

6.4.3 Workstations

The Apollo workstation nodes also give an indication of the hardware power currently commercially available. For approximately $20,000, one can purchase a Domain workstation with a monochrome display of 800*1000 resolution and 3 megabyte RAM. The processor for this personal machine is roughly the "speed" of a DEC Vax 11/750. For approximately $40,000, one can purchase the same workstation with an 8-plane color display and 3 megabyte RAM. For approximately $60,000, one can purchase a Domain workstation with a 24-plane color display of 1000*1000 resolution and 4 megabyte memory. The processor for this personal machine is roughly the "speed" of a DEC Vax 11/780. Apollo's strategy is to double the price/performance ratio every generation period. Empirically, the generations have been a year and a half apart. Since the cost range for a business personal computer is $5,000-10,000, extrapolation predicts 5-10 years for commercial terminals of the power needed to effectively implement the Single Computer Model.
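The extrapolation can be made explicit, assuming price/performance doubles every 1.5-year generation, so that the price of a fixed configuration halves each generation. Starting from the $60,000 workstation with the power of a Vax 11/780:

```python
import math

GENERATION_YEARS = 1.5


def years_to_reach(start_price, target_price, generation=GENERATION_YEARS):
    """Years until a fixed configuration falls from start_price to target_price."""
    halvings = math.log2(start_price / target_price)
    return halvings * generation


for target in (10_000, 5_000):
    print(f"${target:,}: {years_to_reach(60_000, target):.1f} years")
```

This simple model gives roughly 4 to 5.5 years for the $60,000 configuration; the report's longer 5-10 year estimate leaves room for the larger memory and faster processor of the 10-megabyte sample terminal described earlier in this chapter.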

6.5 Resource Technology

This section deals with the central re ... exception of a few phone number databases. Note that the laser technology for photonic switching and optical storage could enable the ...

6.6 Trl

The function of a switched network is to provide effectively non-blocking point-to-point connections. Typically the total number of available connections is much smaller than the number of terminals plugged into the network, and the switch size allocated depends on the traffic patterns. For example, in the current voice telephone network the average call lasts two minutes, and the switches can handle only about 10% of the subscribers simultaneously [Bell]. Keeping terminals strictly within the mainframe model would require a completely connected network, since each terminal would need to be fed a display bitmap continuously. Building switches that are actually non-blocking is orders of magnitude more expensive than building effectively non-blocking ones. Thus downloading, which lets terminals operate independently with short holding times on the network, is critical for switch economics.
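The effect of holding time on switch sizing can be illustrated with the Erlang B formula, the standard blocking model for a switch with a fixed number of circuits where blocked requests are lost. This is a swapped-in illustration, not the analysis behind the [Bell] figure; the terminal count, request rates, and holding times below are hypothetical.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Probability that a new request finds all circuits busy (Erlang B).

    Uses the standard recursion B(0) = 1,
    B(k) = A * B(k-1) / (k + A * B(k-1)), which avoids large factorials.
    """
    blocking = 1.0
    for k in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (k + offered_erlangs * blocking)
    return blocking


def offered_load(terminals: int, requests_per_hour: float,
                 holding_minutes: float) -> float:
    """Offered traffic in erlangs: arrival rate times mean holding time."""
    return terminals * requests_per_hour * holding_minutes / 60.0


TERMINALS = 10_000
CIRCUITS = 1_000  # hypothetical switch sized for 10% of terminals at once

# Downloading: two short (2-minute) connections per terminal per hour.
short = offered_load(TERMINALS, requests_per_hour=2, holding_minutes=2)
# Mainframe-style: every terminal holds a circuit for the whole hour.
continuous = offered_load(TERMINALS, requests_per_hour=1, holding_minutes=60)

print(f"short holding:      {erlang_b(short, CIRCUITS):.2e}")
print(f"continuous holding: {erlang_b(continuous, CIRCUITS):.2f}")
```

With short holding times the 10% switch is effectively non-blocking; with continuous holding at least 90% of requests are blocked, since the switch can carry at most 1,000 of the 10,000 erlangs offered.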

Location-transparent access makes potential contention at the resources another critical issue. This is the major failing of the mainframe model: overloading at the central resources causes inadequate response as the number of users increases. The resource center must have enough disks and query processing to support the local traffic without excessive blocking or thrashing, and supporting large numbers of users requires that considerable independent processing take place in the terminals.
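A simple queueing model shows why overload at a central resource degrades response so sharply. The sketch below treats the resource as a single M/M/1 queue (Poisson arrivals, exponential service), an assumption made here for illustration only. The two hypothetical service times stand for a resource that does all the processing versus one that only answers queries while the terminals do the rest.

```python
import math


def response_time_s(users: int, seconds_between_requests: float,
                    service_time_s: float) -> float:
    """Mean M/M/1 response time at a central resource.

    Utilization is arrival rate times service time; response time is
    service_time / (1 - utilization) and is unbounded at or above 100%.
    """
    arrival_rate = users / seconds_between_requests
    utilization = arrival_rate * service_time_s
    if utilization >= 1.0:
        return math.inf  # queue grows without limit
    return service_time_s / (1.0 - utilization)


for users in (100, 150, 190, 250):
    central = response_time_s(users, 10.0, service_time_s=0.050)
    offloaded = response_time_s(users, 10.0, service_time_s=0.010)
    print(f"{users:4d} users: all-central {central:8.3f} s, "
          f"terminal-assisted {offloaded:6.3f} s")
```

Response time stays flat until utilization nears 100% and then climbs without bound, while moving work into the terminals pushes that knee out to far more users.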

A browsing network essentially permits the renting of information. Where a piece of information is actually stored depends on the economics and speed required for its access. Within the resources themselves, frequently used information will often be duplicated at many distributed points to provide faster local access. A person's local information space, built by nexogenic caching, speeds up access to frequently referenced pieces, since they will be stored "closer" in the network as measured by the costs and times of physical links. Finally, it may often be more cost-effective to buy heavily used information in permanent form than to rent it over the network. For example, many people would buy one general encyclopedia on CD-ROM or videodisc but rent access to all the others that they consult only occasionally.
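The rent-versus-buy choice reduces to a break-even count of accesses. The Python sketch below uses made-up prices (a $250 encyclopedia disc and a $0.50 charge per rented access); neither figure comes from this report.

```python
import math


def break_even_accesses(purchase_price: float, fee_per_access: float) -> int:
    """Smallest number of accesses at which buying costs no more than renting."""
    return math.ceil(purchase_price / fee_per_access)


def cheaper_to_buy(expected_accesses: int, purchase_price: float,
                   fee_per_access: float) -> bool:
    """True if renting the expected accesses would cost at least the purchase price."""
    return expected_accesses * fee_per_access >= purchase_price


# Hypothetical prices: $250 disc versus $0.50 per rented access.
print(break_even_accesses(250.0, 0.50))    # 500
print(cheaper_to_buy(1_000, 250.0, 0.50))  # True: heavily used reference
print(cheaper_to_buy(20, 250.0, 0.50))     # False: occasionally used reference
```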

Browsing interactively, database by database, lends itself to downloading. After a query is processed in a resource, a packet consisting of all matches from the accessed database plus surrounding context can be transmitted. The propagation delay for accesses from non-local areas also makes it necessary to use the powerful workstation in independent mode. To support this downloading, local memory must be large enough to hold a working set; that is, the time required to fetch information must be small compared to the time it will be processed locally. Thus downloading must be "infrequent". With network transmission speeds in the gigabit/second range permitting megabyte packets to be demand paged, this becomes primarily a question of information type versus local memory.
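The "infrequent downloading" condition can be written as a comparison between fetch time (propagation delay plus transmission time) and local processing time. In the sketch below, the 20 ms propagation delay and the factor of 10 used as the meaning of "small compared to" are assumptions chosen for illustration, and a megabyte is taken as 10^6 bytes.

```python
def fetch_time_s(packet_bytes: int, link_bits_per_s: float,
                 propagation_s: float) -> float:
    """Time to receive a downloaded packet: propagation plus transmission."""
    return propagation_s + packet_bytes * 8 / link_bits_per_s


def downloading_is_infrequent(packet_bytes: int, link_bits_per_s: float,
                              propagation_s: float, local_use_s: float,
                              ratio: float = 10.0) -> bool:
    """True if local use outlasts the fetch by at least `ratio` (assumed threshold)."""
    return local_use_s >= ratio * fetch_time_s(packet_bytes, link_bits_per_s,
                                               propagation_s)


MEGABYTE = 1_000_000
GIGABIT_PER_S = 1e9
PROPAGATION_S = 0.020  # assumed delay to a distant resource

t = fetch_time_s(MEGABYTE, GIGABIT_PER_S, PROPAGATION_S)
print(f"1 MB packet arrives in {t * 1000:.0f} ms")
print(downloading_is_infrequent(MEGABYTE, GIGABIT_PER_S, PROPAGATION_S, 30.0))
```

At these speeds a megabyte packet costs only tens of milliseconds to fetch, so whether downloading is infrequent depends mainly on how much of a given information type fits in local memory.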

6.6.2 Media

For the types of stored information currently handled by computing systems, the 5 megabytes of RAM available in the terminal is more than adequate. This includes: text (alphabetic or numeric), voice (stored sequences or canned messages), and graphics (static displays or video frames). Limited forms of animation could also be handled. But full-motion video (and animated graphics that are arbitrarily changing bitmaps) requires a far greater amount of storage.

A Single Computer Model terminal should be capable of displaying moving digital high-definition television pictures. Accordingly, part of its RAM is a 3 megabyte buffer holding a single frame at a time (1K x 1K pixels with 24 bits per pixel, namely 8 bits for each of 3 colors). Full-motion display requires 30 frames per second, which is 90 megabytes/second uncompressed. Storing a reasonable segment of full-motion video, such as 10 seconds, would thus need nearly a gigabyte of local (RAM) storage. This far exceeds any foreseeably economic working-set memory, so downloading a complete movie, as in a "pay per view" service, is not feasible.
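The storage figures follow directly from the frame format. The short Python calculation below reproduces them, taking "1K" and "megabyte" in the binary sense (1,024 and 1,048,576), and extends the same arithmetic to a two-hour movie, an illustrative length not given in the text.

```python
MIB = 1024 * 1024  # one binary megabyte

WIDTH = HEIGHT = 1024          # 1K x 1K pixels
BITS_PER_PIXEL = 24            # 8 bits for each of 3 colors
FRAMES_PER_SECOND = 30


def frame_bytes() -> int:
    """Size of one uncompressed frame buffer."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL // 8


def video_bytes(seconds: int) -> int:
    """Uncompressed storage for `seconds` of full-motion video."""
    return frame_bytes() * FRAMES_PER_SECOND * seconds


print(frame_bytes() // MIB, "MB per frame")                     # 3
print(video_bytes(1) // MIB, "MB per second")                   # 90
print(video_bytes(10) // MIB, "MB for 10 seconds")              # 900
print(video_bytes(2 * 3600) // MIB, "MB for a two-hour movie")  # 648000
```

The two-hour total, about 633 binary gigabytes, is over 100,000 times the terminal's 5 megabytes of RAM.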

Until such memory is available, the Model is feasible only for "playing" video information. That is, a circuit switch is created to connect the terminal to an appropriate resource that will continuously feed it. Since the resource doing the playing is completely allocated for as long as it is circuit switched, there is considerable potential for blocking. For example, if 10,000 different video sources in a LATAnet are requested at any one time, the central resource facility must have at least 10,000 videodisc players to avoid blocking. This apparently rules out point-to-point full-motion digital video as an economic solution for the masses, although in special cases of concentrated need, such as office buildings or apartment complexes, it may still be economic.

Various forms of broadcast video are feasible. Examples include allocating a channel to display a program selected by weighted voting of the viewers, or allocating a channel that carries still frames interactively selected by a number of users (the channel carries an intermixed parade of many frames, but each user is shown only his or her own).
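The shared still-frame channel can be sketched as a multiplexer: frames requested by different users are interleaved on one channel, each tagged with its requester, and each terminal displays only frames carrying its own tag. The round-robin policy, the tagging scheme, and the 30-frame-per-second channel rate are assumptions for illustration; the report does not specify a scheduling discipline.

```python
from collections import deque


def interleave(requests: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Round-robin pending frames onto one channel as (user_id, frame_id) pairs."""
    queues = deque((user, deque(frames))
                   for user, frames in requests.items() if frames)
    channel = []
    while queues:
        user, frames = queues.popleft()
        channel.append((user, frames.popleft()))
        if frames:  # user still has frames waiting: back of the line
            queues.append((user, frames))
    return channel


def frames_for(user: str, channel: list[tuple[str, int]]) -> list[int]:
    """What one terminal displays: only the frames tagged for its user."""
    return [frame for owner, frame in channel if owner == user]


def worst_case_wait_s(active_users: int, frames_per_second: int = 30) -> float:
    """Longest wait for a user's next frame when every active user has one queued."""
    return active_users / frames_per_second


channel = interleave({"a": [1, 2], "b": [10], "c": [20, 21, 22]})
print(channel)
print(frames_for("c", channel))
print(worst_case_wait_s(300))
```

Under these assumptions, 300 users sharing one channel would each see a new still frame at most every 10 seconds, which bounds how many users such a channel could serve interactively.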

There are other possibilities for displaying video within the Model. Video playing intermixed with data downloading on the same line may be tolerable if the time between requests greatly exceeds the length of the displayed segment. This may well be true of the local decision or processing time in interactive novels or games whose resul