Chapter 7

 

InfoCrystal Software

7.1 Introduction

This chapter provides a brief overview of the key features of the InfoCrystal that have been implemented and that have not been discussed in great detail elsewhere in this thesis. The InfoCrystal has been implemented using the object-oriented MacLISP programming language for the Macintosh computer.

We will demonstrate some of the key features of the InfoCrystal by describing how it can be used to retrieve information. We begin by creating a structured-list or outline of our information need with the query outliner tool, which functions like the familiar outlining tool available in word-processing packages (see Figure 7.1). Once the query outline has been generated, we can issue the command to have it evaluated and visualized.

What does it mean to have an InfoCrystal query evaluated? The atom or "leaf" nodes of the query structure represent the criteria that the user has decided not to break down any further. Each atom specifies the query statement that a retrieval engine will use to search the selected database(s). The choice of the query statement and its corresponding retrieval engine is left open. The query statement could be a reference document, in which case we specify that all documents with a certain degree of similarity to it should be retrieved. It could be a concept from a thesaurus, in which case we could use the explode feature to retrieve all documents that have this concept or any of its child concepts. It could also be a simple keyword or a complex Boolean statement. In short, the InfoCrystal works with any retrieval method that returns a set of data objects. Because the InfoCrystal uses an object-oriented design, it is easy to support new kinds of data objects and their retrieval methods.
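The evaluation step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation, which is written in Lisp; the names `QueryNode`, `evaluate`, `keyword_engine` and the toy document collection are all assumptions made for this example. Each atom carries a query statement and a pluggable retrieval engine, and interior nodes combine their inputs with the default OR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# All names here are hypothetical; the thesis implementation is in Lisp.
RetrievalEngine = Callable[[str], Set[int]]  # query statement -> document ids


@dataclass
class QueryNode:
    label: str
    statement: str = ""                       # query statement (atoms only)
    engine: Optional[RetrievalEngine] = None  # retrieval engine (atoms only)
    inputs: List["QueryNode"] = field(default_factory=list)

    def is_atom(self) -> bool:
        return not self.inputs


def evaluate(node: QueryNode) -> Set[int]:
    """Return the ids retrieved at a node, ORing the inputs by default."""
    if node.is_atom():
        if node.engine is None:
            raise ValueError(f"atom {node.label!r} has no retrieval engine")
        return set(node.engine(node.statement))
    retrieved: Set[int] = set()
    for child in node.inputs:
        retrieved |= evaluate(child)
    return retrieved


# Toy collection and keyword engine, invented for this example.
DOCUMENTS: Dict[int, str] = {
    1: "visualization for information retrieval",
    2: "human factors in interface design",
    3: "visualization and human factors",
}


def keyword_engine(statement: str) -> Set[int]:
    """Retrieve every document whose text contains the statement."""
    return {doc_id for doc_id, text in DOCUMENTS.items() if statement in text}


outline = QueryNode("root", inputs=[
    QueryNode("visualization", "visualization", keyword_engine),
    QueryNode("human factors", "human factors", keyword_engine),
])
print(sorted(evaluate(outline)))
```

Any function that maps a query statement to a set of ids could stand in for `keyword_engine`, for instance a similarity search or a thesaurus lookup, which is the flexibility the paragraph above describes.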

For the purpose of this thesis, we primarily used synthetic data sets created with a random generator that specifies the database id of each data object. For each of the atom inputs, this generator selects ids at random from a range of possible id values. Generating the input data streams in this way highlights that the InfoCrystal works for arbitrary data objects. We also developed and experimented with a database driver to retrieve book and technical report abstracts stored in the on-line library of the Laboratory of Computer Science at MIT. However, we were not satisfied with its slow retrieval performance, and we decided not to invest more effort at this stage in improving it. We will design a diverse set of fast database drivers in the future when we have migrated the InfoCrystal to a more powerful platform.
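A generator of this kind could look roughly like the following Python sketch. The function name `synthetic_inputs` and its parameters (`id_range`, `ids_per_atom`, `seed`) are assumptions for illustration; the thesis does not specify how many ids are drawn per atom or from which range.

```python
import random
from typing import Dict, List, Set


def synthetic_inputs(atom_labels: List[str], id_range: int, ids_per_atom: int,
                     seed: int = 0) -> Dict[str, Set[int]]:
    """Draw a random set of database ids for every atom input."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return {
        label: set(rng.sample(range(id_range), ids_per_atom))
        for label in atom_labels
    }


inputs = synthetic_inputs(["visualization", "human factors"],
                          id_range=100, ids_per_atom=10)
for label, ids in inputs.items():
    print(label, sorted(ids))
```

Because the ids carry no meaning of their own, the same input sets can be fed to any InfoCrystal regardless of what kind of data object the ids would refer to.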

Once the input sets at the atom nodes have been computed, the retrieved data items are propagated through the query structure according to how the different InfoCrystals have been programmed, i.e., which interior icons have been selected. When we create a structured-list, we do not have to specify any operators. The current default is to select all the interior icons in an InfoCrystal, which is equivalent to the Boolean OR. Hence, we can observe at the root or top-level InfoCrystal the results of performing the broadest possible query. As mentioned earlier, retrieval specialists often advise searchers to formulate queries in which quasi-synonymous words for each conceptual factor are ORed and the resulting synonym lists are then ANDed [Cooper 1988, Marcus 1991]. Our default selection of the interior icons generates a query that is equivalent to the one suggested by retrieval specialists. The Boolean AND operator is only of relevance at the top-level InfoCrystal, because its inputs are the ORed synonyms, and the center interior icon reflects the effect of applying the AND to these inputs. A key advantage of the InfoCrystal is that it shows not only the effects of the AND operation but also all the other possible Boolean operations involving the inputs. Similarly, our default selection of the interior icons retrieves the same documents as would be retrieved by a vector space query. In contrast to the ranked list generated by a vector space query, the InfoCrystal not only presents the documents in a ranked order, but also presents them in a structured way that reveals how the documents are related to the specified interests. The InfoCrystal emphasizes relationships and ranks them based on their relevance, whereas the ranked-list displays the individual documents based on their relevance.

The current implementation also provides a ranked-list interface that displays the individual items retrieved by a single interior icon or by all the selected icons of an InfoCrystal (see Figure 7.3). As we mentioned in chapter 2, users can provide relevance feedback by selecting the items in the ranked-list that they consider to satisfy their information need. We could use this relevance feedback to determine which of the selected interior icons in the query structure should remain selected.
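The propagation through a single InfoCrystal can be sketched in Python as follows. This is an illustrative reading of the text, not the Lisp implementation: each interior icon is modeled as the set of items retrieved by exactly one non-empty combination of inputs, and the output is the union of the selected icons. The function names and the sample ids are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set

Icon = FrozenSet[str]  # the inputs whose criteria an interior icon satisfies


def interior_icons(inputs: Dict[str, Set[int]]) -> Dict[Icon, Set[int]]:
    """Assign every retrieved item to the icon for exactly the inputs it satisfies."""
    labels = sorted(inputs)
    icons: Dict[Icon, Set[int]] = {
        frozenset(combo): set()
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    }
    for item in set().union(*inputs.values()):
        icons[frozenset(l for l in labels if item in inputs[l])].add(item)
    return icons


def crystal_output(icons: Dict[Icon, Set[int]],
                   selected: Optional[Iterable[Icon]] = None) -> Set[int]:
    """Union of the selected icons; selecting all icons is the default (OR)."""
    chosen = icons.keys() if selected is None else selected
    result: Set[int] = set()
    for icon in chosen:
        result |= icons[icon]
    return result


# Hypothetical input sets for three concepts.
inputs = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
icons = interior_icons(inputs)
center = frozenset(inputs)  # icon satisfying all criteria (Boolean AND)

print(sorted(crystal_output(icons)))            # default: all icons selected
print(sorted(crystal_output(icons, [center])))  # only the center icon
# Relationships ordered by rank, i.e. by the number of criteria satisfied.
for icon in sorted(icons, key=lambda i: (-len(i), sorted(i))):
    print(sorted(icon), sorted(icons[icon]))
```

Selecting every icon reproduces the OR of the inputs, while selecting only the center icon reproduces the AND; any other selection corresponds to one of the remaining Boolean combinations.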

Once the InfoCrystal query structure has been visualized and its contents initialized, we can see how the retrieved data items distribute across the different relationships, provided we display the selected icons in the pie-chart or number mode (see Figure 7.3). An interactive state-sheet is associated with each InfoCrystal. It has a set of radio-buttons that indicate the visual style of the selected and the unselected interior icons, respectively (see Figure 7.2). The following styles can be selected: icon&border, icon, point, number, and pie-chart. A state-sheet also contains buttons to perform the following actions (from top to bottom): to change the size of an InfoCrystal; to change the scale at which interior icons using the pie-chart style are displayed; to reveal or hide the interior icons; to display the interior icons using either the rank or the bull's-eye layout; and to descend in the query structure and make the selected child the new top node from which to visualize the query structure, or to ascend and make the parent InfoCrystal the new top node. The descend and ascend operations do not modify the query; they are equivalent to a zoom operation.

We can use the standard copy, cut, and paste operations to modify the InfoCrystal query structure. If we want to add a new input to an InfoCrystal, then we first need to specify the new input InfoCrystal by selecting it and using the copy or cut operation. Next, we need to select the recipient InfoCrystal and apply the paste operation. Hence, it is very easy to modify existing queries or to create new queries by combining and integrating existing InfoCrystal queries. We can also reorganize the query structure using click-drag-drop operations (see Figures 7.8, 7.10 and 7.11).

7.2 InfoCrystal Software in Pictures

We will now provide visual examples of the major InfoCrystal operations that have been implemented. In particular, we will show how we can create an InfoCrystal query structure and change its appearance at the structural as well as at the individual InfoCrystal level. We will show how we can navigate the InfoCrystal query structure and descend or zoom in to examine an input InfoCrystal in more detail. We will perform a what-if analysis by changing how the retrieved data is propagated through the query structure. We will show how we can modify the query structure by selecting an InfoCrystal and dragging and dropping it in the desired new location, whereupon the structure is automatically updated and the content assignments are recomputed.


 

How to get started? Create an outline.

Figure 7.1: shows the query outline that we need to generate to begin the process of retrieving information.


 


State-Sheet of an InfoCrystal


 


The query outline has been visualized as an InfoCrystal

Figure 7.3: shows what users will see when they execute and visualize the query outline shown in Figure 7.1. Only the root-node InfoCrystal is displayed. Its interior icons use the pie-chart style to show how the retrieved documents distribute across the different relationships. The button of the state-sheet that lets users change the appearance of the InfoCrystal is inhibited, because the interior of the root-node InfoCrystal is always visible, provided the crystal is visible. The button that lets users navigate up/down in the query structure is also inhibited, because by definition the root node has no parent, hence there is nowhere to ascend to. The ranked-list window displays a ranked list of all the retrieved documents, where the left column contains the weights and the right column the document ids. We can double-click on a list item to see its contents.


 


Displaying the InfoCrystal query structure one level deep

Figure 7.4: displays the InfoCrystal query structure one level deep by showing the root node and its children. Users can show or hide a child InfoCrystal by double-clicking on the criterion icon representing it in the parent InfoCrystal. Users can select a particular InfoCrystal by clicking on it, and the state-sheet will automatically be updated to reflect the states of the newly selected InfoCrystal. If users want to explore the four-concept InfoCrystal (shown at the very top) in more detail and promote it to be the new visible top node, then they need to select it and click on the "Descend" button (which is at the very bottom of the state-sheet, but not shown here).


 


Descending in the query structure

Figure 7.5: shows the result of selecting a child InfoCrystal and promoting it to be the current visible top node. The navigation button in the state-sheet has changed to say "Ascend", because this four-concept crystal has a parent. If we were to click this button, we would return to the state of affairs shown in the previous figure.


 


What-if Analysis (before)

Figure 7.6: displays how the retrieved data distributes across the different possible relationships when the data elements associated with one particular leaf node are propagated through the query structure. The circular InfoCrystal of this leaf node is displayed in full detail, and we can see that its single interior icon is selected (darkly shaded).


 


What-if Analysis (after)

Figure 7.7: shows the effects of suppressing the propagation of the data elements that are associated with the circular InfoCrystal shown in full detail (its single interior icon is shown in solid white to indicate that it is not selected). One clearly visible consequence of this action is that there are no longer any data elements that are related to the concepts "visualization" and "human factors" but not "information retrieval". The change in the distribution of the data elements is readily perceptible, because the size of the pie-charts changes and hence creates a motion or animation effect. The suppression of this circular crystal is equivalent to dropping an alternate term and in effect reducing its parent to a two-factor crystal.
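The what-if analysis of Figures 7.6 and 7.7 can be mimicked with a small Python sketch. The document ids and the grouping of alternate terms below are invented for illustration and do not reproduce the data behind the figures; `concept_output` and `distribution` are hypothetical helper names.

```python
from typing import Dict, FrozenSet, Iterable, Set


def concept_output(alternates: Dict[str, Set[int]],
                   suppressed: Iterable[str] = ()) -> Set[int]:
    """OR together the alternate terms of a concept, skipping suppressed ones."""
    skip = set(suppressed)
    result: Set[int] = set()
    for term, ids in alternates.items():
        if term not in skip:
            result |= ids
    return result


def distribution(inputs: Dict[str, Set[int]]) -> Dict[FrozenSet[str], int]:
    """Count the items that fall into each non-empty relationship."""
    counts: Dict[FrozenSet[str], int] = {}
    for item in set().union(*inputs.values()):
        key = frozenset(l for l, ids in inputs.items() if item in ids)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Invented data: "visualization" is defined by two alternate terms.
visualization_terms = {"visualization": {1, 2}, "graphical interfaces": {3, 4}}
others = {"human factors": {3, 5}, "information retrieval": {1, 5, 6}}

before = distribution({"visualization": concept_output(visualization_terms), **others})
after = distribution({
    "visualization": concept_output(visualization_terms, ["graphical interfaces"]),
    **others,
})

both = frozenset({"visualization", "human factors"})
print(before.get(both, 0), after.get(both, 0))
```

In this toy data the relationship "visualization and human factors but not information retrieval" is populated only through the suppressed alternate term, so it empties out once that term is no longer propagated.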


 


Reorganizing the query structure using click-drag-drop (before)

Figure 7.8: shows the same query structure as in Figure 7.3, with none of the circular inputs suppressed. In the next figure we show how the distribution of the documents changes if we promote the circular InfoCrystal that represents the concept "graphical interfaces", whose parent is the triangular crystal shown in the bottom right, and add it as a new concept to the top-level InfoCrystal. We perform this change by selecting the circular InfoCrystal icon, dragging it, and dropping it in the border area of the root-node InfoCrystal.
We elect to perform this change of the query structure because we want to find out how many of the retrieved documents are only retrieved by the concept "graphical interfaces". We also want to see how the distribution of the documents changes if we consider an additional concept at the root level.


 


Reorganizing the query structure using click-drag-drop (1st move)

Figure 7.9: shows how the distribution of the documents has changed after we have promoted the concept "graphical interfaces" and have added it as a new input to the root-node InfoCrystal. We can observe that there are no documents that are only retrieved by the concept "graphical interfaces". Hence, we will drop this concept by selecting it and applying the cut operation.


 


Reorganizing the query structure using click-drag-drop (2nd move)

Figure 7.10: shows how the distribution of the documents has consolidated after we have dropped the concept "graphical interfaces" as an input concept. Next we will promote the alternate term "filtering" and add it as a new concept to the top level InfoCrystal. Again we perform this change because we want to find out how many of the retrieved documents are only retrieved by the concept "filtering". We also want to see how the distribution changes if we consider an additional concept at the root level.


 


Reorganizing the query structure using click-drag-drop (3rd move)

Figure 7.11: shows how the distribution of the documents has changed after we have promoted and added the concept "filtering" to the root node InfoCrystal. We can observe that the concept "filtering" retrieves documents that are not retrieved by any of the other input concepts. Finally we will promote the alternate term "database access" and add it as a new concept to the top level InfoCrystal.


 


Reorganizing the query structure using click-drag-drop (4th move)

Figure 7.12: shows how the distribution of the documents has changed after we have promoted and added the concept "database access" to the root node InfoCrystal. We can observe that there are no documents that are only retrieved by this concept, and therefore we could drop it without losing any information.


 


Complex Query Structure

Figure 7.13: displays the outline for a deeply nested InfoCrystal query structure. We present this example to demonstrate that the InfoCrystal software can be used to formulate arbitrarily complex queries. In the subsequent figure we show one way that this basic outline can be visualized, where not all the InfoCrystals are shown in full detail.




Figure 7.14: visualizes the query structure shown in Figure 7.13. Users can easily view the interior of a crystal that is shown only as an outline by double-clicking on it; the crystal and its children will then be redrawn appropriately. This figure demonstrates the spreadsheet quality of the InfoCrystal, where users can make changes deep in the query structure and observe how these changes are propagated and affect the higher levels.


7.3 How to Drop an Input from an InfoCrystal?

It is common that we initially have to experiment to find the most appropriate search concepts. There will be concepts that we have to drop, because they retrieve no, or very few, documents that are not also retrieved by the other input concepts. There is also the possibility that too much information is retrieved, and users have to decide which concept(s) to drop. The InfoCrystal can be used to anticipate the effect of dropping a concept from a query. Each icon of rank one, i.e., each circular interior icon, represents the information that is retrieved by its concept but not by any of the other concepts used to define the InfoCrystal. Depending on their specific needs, users can choose to drop the concept whose circular interior icon has a specific value, or the one that has the highest or lowest value of all the icons of rank one.

Users can drop a concept in the following two ways. First, they can simply deselect the circular icon associated with the concept in question. This leaves open the possibility that users change their mind at a later stage and select the circular icon again to include the information associated with it in the output of the InfoCrystal. There is also the possibility that they wish to suppress a concept that is in turn defined by several other concepts. In this case they deselect all the interior icons of the InfoCrystal that represents the concept in question. Second, users can drop a concept by eliminating it as one of the inputs to the InfoCrystal. This is an irreversible action, but it has the advantage that it reduces the complexity of the InfoCrystal, because it decreases its dimensionality and therefore reduces the number of relationships made explicit simultaneously. In the current implementation users can eliminate a concept by clicking on the InfoCrystal that represents it and applying the familiar Macintosh cut command, Command-X.
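The values of the rank-one icons that guide this decision can be computed as in the following Python sketch. The helper name `rank_one_counts` and the sample ids are assumptions for illustration; the sketch simply counts, for each concept, the items that no other input retrieves.

```python
from typing import Dict, Set


def rank_one_counts(inputs: Dict[str, Set[int]]) -> Dict[str, int]:
    """For each concept, count the items retrieved by that concept alone."""
    counts: Dict[str, int] = {}
    for label, ids in inputs.items():
        others = set().union(*(v for k, v in inputs.items() if k != label))
        counts[label] = len(ids - others)
    return counts


# Hypothetical input sets for one InfoCrystal.
inputs = {
    "information retrieval": {1, 2, 3},
    "visualization": {2, 3, 4},
    "graphical interfaces": {2, 3},
}
counts = rank_one_counts(inputs)
print(counts)

# A concept whose rank-one icon is empty can be dropped without losing items.
droppable = [label for label, n in counts.items() if n == 0]
print(droppable)
```

Sorting `counts` by value would give the highest or lowest rank-one icon, which corresponds to the other selection strategies mentioned above.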

7.4 How to Add an Input to an InfoCrystal?

There will be occasions where users want to add a further concept to an InfoCrystal, because they have discovered a further relevant concept in their exploration so far, or because they want to move a concept from one part of the InfoCrystal query structure to another location. In the former case, users can add a further concept to the outline of the existing query by using the outliner tool. In the latter case, users can make use of the click, drag and drop operations that the current implementation of the InfoCrystal supports. Users can add a new factor or concept to an existing InfoCrystal, called the receiving InfoCrystal, by clicking on an InfoCrystal that is not a direct input to the receiving InfoCrystal, dragging it to the receiving InfoCrystal, and releasing it there. This has the effect of adding the selected crystal as a further input to the receiving InfoCrystal. Hence, users can modify the structure of a query in a visual way by selecting its members, dragging them, and dropping them in the desired location. By rearranging the structure of the query hierarchy, users decrease the complexity of the InfoCrystal that loses an input and increase it for the InfoCrystal that receives a new input.
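The structural effect of such a drag-and-drop move can be sketched in Python as a tree operation. The class `Crystal` and the function `move` are hypothetical names; the check that prevents a crystal from being dropped into its own subtree is an added assumption that the text does not discuss. The icon count uses the fact that a crystal with n inputs has one interior icon per non-empty combination of inputs, i.e. 2^n - 1.

```python
from typing import List, Optional


class Crystal:
    """A node of the query structure with a link back to its parent."""

    def __init__(self, label: str, inputs: Optional[List["Crystal"]] = None):
        self.label = label
        self.parent: Optional["Crystal"] = None
        self.inputs: List["Crystal"] = []
        for child in inputs or []:
            self.add_input(child)

    def add_input(self, child: "Crystal") -> None:
        child.parent = self
        self.inputs.append(child)

    def icon_count(self) -> int:
        """Interior icons: one per non-empty combination of inputs."""
        return 2 ** len(self.inputs) - 1


def move(node: Crystal, receiver: Crystal) -> None:
    """Detach node from its parent and add it as an input to receiver."""
    if node.parent is receiver:
        raise ValueError("node is already a direct input of the receiver")
    ancestor: Optional[Crystal] = receiver
    while ancestor is not None:  # assumption: forbid moves that create a cycle
        if ancestor is node:
            raise ValueError("cannot drop a crystal into its own subtree")
        ancestor = ancestor.parent
    if node.parent is not None:
        node.parent.inputs.remove(node)
    receiver.add_input(node)


graphical = Crystal("graphical interfaces")
visualization = Crystal("visualization", [Crystal("visualization term"), graphical])
root = Crystal("root", [Crystal("information retrieval"), visualization])

print(root.icon_count(), visualization.icon_count())  # before the move
move(graphical, root)
print(root.icon_count(), visualization.icon_count())  # after the move
```

The crystal that loses an input has fewer relationships to display, while the receiving crystal has more, which is the trade-off in complexity described above.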

7.5 How to Update the Selection Pattern in a Modified InfoCrystal?

If we modify an InfoCrystal by adding or removing one of its inputs, then the question arises of which interior icons in the modified InfoCrystal to select. If we add a further input, called I(add), to an InfoCrystal, then an interior icon representing the relationship R in the unmodified InfoCrystal will be split in two and will be represented by two interior icons in the modified InfoCrystal. These two interior icons represent the relationships (R and I(add)) and (R and (not I(add))), respectively, and they will inherit the selection status of the interior icon satisfying R in the unmodified InfoCrystal. However, we cannot infer the selection status of the icon of rank one that satisfies only the criterion represented by the new input, unless the complement of the unmodified InfoCrystal has a selection status assigned to it. In this case we can elect to select this icon of rank one as a default.

If we remove an input, called I(remove), from an InfoCrystal, then the situation is more complicated. The interior icons that satisfy the same criteria and differ only with respect to the criterion for I(remove) can be paired. Each such pair of interior icons will be represented by a single icon in the modified InfoCrystal, which will inherit the selection status of its corresponding pair of interior icons in the unmodified InfoCrystal, provided these two icons share the same selection status. However, if these two icons do not share the same selection status, then we cannot infer the selection status of the corresponding interior icon in the modified InfoCrystal.
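The two update rules can be written down as a short Python sketch. It represents a selection pattern as a mapping from each interior icon (the set of inputs it satisfies) to `True`, `False`, or `None` for "cannot be inferred". The function names and the parameters `complement_status` and `default` are assumptions; in particular, falling back to a selected rank-one icon when no complement status is known is only one reading of the default mentioned above.

```python
from typing import Dict, FrozenSet, Optional

Selection = Dict[FrozenSet[str], Optional[bool]]  # None: status not inferable


def add_input(selection: Selection, new_input: str,
              complement_status: Optional[bool] = None,
              default: bool = True) -> Selection:
    """Split every icon R into (R and I(add)) and (R and not I(add))."""
    updated: Selection = {}
    for relation, status in selection.items():
        updated[relation] = status                # R and (not I(add))
        updated[relation | {new_input}] = status  # R and I(add)
    # Rank-one icon of the new input: inherit the complement's status if the
    # unmodified crystal has one, otherwise fall back to the chosen default.
    updated[frozenset({new_input})] = (
        complement_status if complement_status is not None else default
    )
    return updated


def remove_input(selection: Selection, removed: str) -> Selection:
    """Merge each pair of icons that differ only in the removed input."""
    updated: Selection = {}
    for relation, status in selection.items():
        if removed in relation:
            continue
        partner = selection[relation | {removed}]
        updated[relation] = status if status == partner else None
    return updated


# Hypothetical two-input crystal: only the icons involving "A" are selected.
pattern: Selection = {
    frozenset({"A"}): True,
    frozenset({"B"}): False,
    frozenset({"A", "B"}): True,
}
print(add_input(pattern, "C"))
print(remove_input(pattern, "B"))  # the pair for "A" agrees, so it stays selected
print(remove_input(pattern, "A"))  # the pair for "B" disagrees, so it is undecided
```

Adding an input and then removing it again restores the original pattern, because the two icons created by each split always share the same status.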