Four Critical Elements for Designing Information Exploration Systems
Beth Hetzler, Nancy Miller; Pacific Northwest National Laboratory
B Hetzler and N Miller. 1998. Four Critical Elements for Designing Information Exploration Systems. Presented at Information Exploration workshop for ACM SIGCHI '98. Los Angeles, CA. April 1998. PNNL-SA-29745
Designing an information exploration system requires attention to four critical components. Since information exploration is a highly interactive process, the user is a key element. The second and third critical elements are the presentation methods that are used to communicate information and the interaction techniques that enable that user to actively explore that information. Finally, powerful mathematics are needed to identify and manipulate features of the information. This paper describes how these four critical components can work together to flexibly meet varied user goals.
The User - Tasks and Paradigms
By its nature, information exploration is an interactive process, making the user important to design choices. The processes that are most natural for users and the human capabilities that allow users to interact productively with the information will help drive the design. And as Saracevic reminds us, humans and their information needs are diverse [Saracevic]. We believe that the process of information exploration must be flexible to accommodate various user methods. For example, one user may wish to focus on the similarities among documents; another may gain insight from the key themes. In their Data Visualization tutorial, Grinstein and Ward [Grinstein] list three main task types that visualization is used to accomplish: Production, Confirmatory, and Exploratory. Production is using visualization to communicate what is already known. Confirmatory refers to a situation where the user has a theory or hypothesis and wants to find evidence to confirm or refute that hypothesis. Exploratory is a situation where the user does not know what they are looking for.
In practice, we believe that all three tasks, and especially the last two, are often intermingled. A user may start out by purely exploring, wanting to get an understanding of what's in the information collection. For example, a user may first want a thematic overview, then seek detailed understanding of topics in a small subset, and then explore how topics change over time. A user may search for trends, for structure, for anomalies, or for gaps. As the exploration continues, the user begins to notice things of interest, perhaps unexpected things. The user may develop a hypothesis about why the data are showing the characteristics observed. The next step is to test out the hypothesis. The user may seek data to confirm it, may view the evidence over time, may find outliers which conflict, and so on. Once the user is satisfied regarding this hypothesis, he or she may proceed to explore other parts of the information collection, developing new hypotheses, and so on.
To be most effective in such explorations, the user must be able to employ multiple paradigms. A task and data type taxonomy developed by Shneiderman notes the need for both an overview of an entire collection and the ability to select a group for details on demand [Shneiderman]. We might call this a part vs. whole paradigm. Other useful paradigms for information exploration might include:
- relative vs. absolute: understanding relationships within an information collection or understanding how this information relates to an existing ontology
- object-based vs. theme-based: understanding how the objects relate to each other as determined by their attributes vs. how the various themes and attributes relate to each other as embodied in the objects
The key idea is not the desirability of any particular paradigm, but rather that for information exploration, a user should be able to employ multiple paradigms as well as interact among them.
Visual Presentation Methods
We have found that visualizations can provide a powerful method to communicate the salient features of the information, using methods that are natural for the human mind to perceive and understand. As an example, one visualization method for showing an overview of relationships among documents is a cluster projection in two or three dimensions, where proximity indicates similarity. The Galaxies visualization shown in the first figure is an example of such a 2D visualization. Documents are clustered according to similarities in their thematic content. Interactions allow the user to query for particular terms, form groups, and see changes over time.
Galaxies projection of 12,000 documents
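The idea of a cluster projection in which proximity indicates similarity can be illustrated with a minimal sketch. It assumes each document is already represented as a short vector of theme strengths and uses a plain principal-component projection computed by power iteration. This is an illustrative stand-in, not the projection method used by Galaxies, and the function names and sample data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-theme mean so the layout is centered on the collection."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # fixed start keeps runs repeatable
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v

def project_2d(doc_vectors):
    """Map each document's theme vector to (x, y) on the two main axes."""
    rows = center(doc_vectors)
    cov = covariance(rows)
    d = len(cov)
    axes = []
    for _ in range(2):
        v = top_eigenvector(cov)
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the variance already explained by this axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [tuple(sum(x * a for x, a in zip(row, axis)) for axis in axes)
            for row in rows]

# Hypothetical theme-strength vectors (three themes per document).
docs = [
    [0.9, 0.1, 0.0],  # two documents dominated by the first theme
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],  # two documents dominated by the third theme
    [0.1, 0.0, 0.9],
]
points = project_2d(docs)
```

In this toy collection the first two documents land near each other and far from the last two, which is the property a reader relies on when interpreting clusters in such a display.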
A new prototype visualization, Rainbows, starts with a 2D cluster projection and adds details on demand by providing the ability to display individual relationships explicitly as colored arcs between selected objects. Interaction tools allow the user to control the set of relationships displayed, the entities of interest, and the level of detail. Using arcs both above and below the plane, this visualization shows evidence both of associations and dis-associations between entities.
Rainbows view of relationships among entities
In another example, SPIRE's Themescape(TM) provides a visual summary of theme relationships and concentrations across a document collection [Wise] using a terrain metaphor. Mountains indicate dominant themes; valleys indicate weak themes. Interaction tools allow the user to probe for a list of themes dominant at any point, to view changes over time, and to select specific sets of documents for exploring in greater detail. One way of showing such detail might be to use a VecNSpec visualization, which displays detailed theme strengths of a selected document set, using a display inspired by the parallel coordinates approach [Inselberg]. As the figure illustrates, this visualization can clearly show the thematic differences between small groups of documents. The Themescape and VecNSpec visualizations can be used to help the user explore part/whole questions.
Visualizations like the cluster projections are based on a "relative" layout, determined solely on the basis of the information collection at hand. However, a user might also be interested in how documents relate to a reference organization. For example, a prototype visualization, 32D Hypercube, is designed to show the relationship of a document collection to a static ontology of concepts, such as UMLS (Unified Medical Language System). The ontology is shown along the x, y, and z axes. Documents are mapped into the space according to their dominant concepts. The visualization shows how well various portions of the ontology are represented in this collection, indicates which concept combinations are prevalent, and enables users to more easily compare two information collections. Pairing the 32D Hypercube with a relative visualization, such as Galaxies, allows the user to explore differences between the relative and absolute paradigms. Interaction tools would include manipulations such as rotate and zoom to better view the display; selection and highlighting for determining which objects are where; and subsetting to allow examination of a group within one of the other paradigms. One of the most useful aspects of the Hypercube visualization is an explicit representation of themes that are not covered in an information collection. Similarly, it provides an excellent method for viewing theme coverage over time. A recent analysis of newswire data clearly showed periodic patterns, indicating the lack of new news stories over weekends, and also showed how particular themes died out or exhibited gaps.
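The mapping of documents onto a fixed ontology, and the detection of concepts that a collection does not cover, might be sketched as follows. The text does not specify the mapping rule, so this sketch assumes that a document's x, y, and z coordinates are the ontology indices of its three most dominant concepts. The concept names are hypothetical placeholders rather than actual UMLS entries, and every document is assumed to carry at least three scored concepts.

```python
def hypercube_cells(doc_concepts, ontology):
    """Place each document at (x, y, z) = indices of its top three concepts.

    Assumed rule, for illustration only: the axes all list the same ontology,
    and the ordering of a document's dominant concepts picks the cell.
    """
    index = {concept: i for i, concept in enumerate(ontology)}
    cells = {}
    for doc_id, scores in doc_concepts.items():
        top3 = sorted(scores, key=scores.get, reverse=True)[:3]
        cells[doc_id] = tuple(index[c] for c in top3)
    return cells

def uncovered_concepts(cells, ontology):
    """List ontology concepts that no document has among its dominant three."""
    used = {i for cell in cells.values() for i in cell}
    return [c for i, c in enumerate(ontology) if i not in used]

# Hypothetical ontology and concept scores.
ontology = ["anatomy", "disease", "drug", "procedure", "organism"]
doc_concepts = {
    "d1": {"disease": 0.7, "drug": 0.2, "anatomy": 0.1},
    "d2": {"drug": 0.5, "disease": 0.3, "procedure": 0.2},
}
cells = hypercube_cells(doc_concepts, ontology)
gaps = uncovered_concepts(cells, ontology)
```

Because the ontology is static, the same cell means the same thing for any collection, which is what makes gaps explicit and lets two collections be compared cell by cell.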
In addition to such object-based visualizations, a user may be interested in examining relationships among the themes themselves. Another prototype visualization, the Cosmic Tumbleweed, displays the themes themselves in a 2D or 3D projection, with documents placed between the themes they most strongly embody. The result is a tumbleweed-like view (see figure). Themes displayed close to each other are strongly related; the density of documents between two themes shows the documents supporting that relationship. A document's base position is computed as a linear mixture of the positions of the two most dominant topics. Optionally, the third most dominant topic may be allowed to influence the final position of the document. Because the vertices of the Cosmic Tumbleweed represent actual topics, the visualization provides insights on the relationships between those topics, as well as an explicit relative mapping for the spatial layout of the visualization. In addition, placing documents on or near a line between their primary and secondary topics should allow the user to tell at a glance what the topical thrust of a document is, without any reading required.
This visualization can be viewed as the "dual" of the Galaxies visualization. The Galaxies plots documents, clustered based on their theme relationships. The Cosmic Tumbleweed plots themes, clustered based on their usage within documents. Together they provide the user with the capability for comparing the object-based vs. theme-based paradigms.
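The document placement described for the Cosmic Tumbleweed, a linear mixture of the positions of the two most dominant topics with an optional third, can be sketched as below. The mixing weights are not given in the text; this sketch assumes they are proportional to the document's theme strengths. The function name, theme names, and coordinates are hypothetical.

```python
def tumbleweed_position(theme_strengths, theme_positions, use_third=False):
    """Place a document as a weighted mix of its dominant themes' positions.

    Assumption: weights are the theme strengths, renormalized over the
    two (or optionally three) strongest themes.
    """
    ranked = sorted(theme_strengths, key=theme_strengths.get, reverse=True)
    dominant = ranked[:3 if use_third else 2]
    total = sum(theme_strengths[t] for t in dominant)
    if total <= 0:
        raise ValueError("document has no positive theme strength")
    dims = len(theme_positions[dominant[0]])
    return tuple(
        sum(theme_strengths[t] * theme_positions[t][k] for t in dominant) / total
        for k in range(dims)
    )

# Hypothetical theme layout and one document's theme strengths.
theme_positions = {
    "energy": (0.0, 0.0),
    "trade": (10.0, 0.0),
    "health": (0.0, 10.0),
}
doc = {"energy": 0.6, "trade": 0.3, "health": 0.1}

base = tumbleweed_position(doc, theme_positions)
nudged = tumbleweed_position(doc, theme_positions, use_third=True)
```

With two themes the document sits on the line between "energy" and "trade", closer to the stronger one. Allowing the third theme pulls it slightly off that line toward "health".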
Interactions Within and Among Visualizations
We've described briefly interactions that allow users to further explore individual visualizations. To exploit the full power of multiple paradigms, users must be able to interact not only within the visualizations, but also across multiple ones. Users must be able to select items in one display and see their corresponding locations in other displays. They should be able to create a subset from one visualization and portray it in another one. They should have the ability to "link" the visualizations so that changes or exploration in one are propagated to others. Using such methods, a user could explore how the documents clustered in the Galaxies cluster plot correspond to individual themes represented in the 32D Hypercube (as shown in the figure).
Exploring correspondences between visualizations
They could explore further to examine the relationships between those themes, using the Rainbows visualization. They could create a subset of documents from the 32D Hypercube, representing a cohesive set of themes, such as all documents primarily covering events in the Middle East. These documents could then be visualized in other tools, such as Themescape to provide an overview of the subset content, Cosmic Tumbleweed to see the interrelations of specific themes, and Rainbows to explore evidence of associations and dis-associations. In our recent experiments, we made heavy use of such cross-visualization interactions. We concluded that the interactions among these visual paradigms are as important as, if not more important than, the paradigms themselves.
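Linked selection and subsetting across displays can be sketched with a small coordinator that shares document identifiers between views. This is an illustrative structure under the assumption that every view keys its layout by a common document id. The class names, method names, and coordinates are hypothetical and do not describe the actual software.

```python
class View:
    """One display: maps document ids to that display's coordinates."""

    def __init__(self, name, positions):
        self.name = name
        self.positions = positions
        self.highlighted = {}

    def highlight(self, doc_ids):
        # Show where the selected documents fall in this particular layout.
        self.highlighted = {
            d: self.positions[d] for d in doc_ids if d in self.positions
        }

    def subset(self, doc_ids, name):
        # Create a new view restricted to the chosen documents.
        kept = {d: p for d, p in self.positions.items() if d in doc_ids}
        return View(name, kept)


class LinkedViews:
    """Propagates a selection made in one view to every linked view."""

    def __init__(self, *views):
        self.views = list(views)

    def select(self, doc_ids):
        for view in self.views:
            view.highlight(doc_ids)


# Hypothetical layouts for the same three documents in two displays.
galaxies = View("Galaxies", {"d1": (0.2, 0.7), "d2": (0.3, 0.6), "d3": (0.9, 0.1)})
hypercube = View("Hypercube", {"d1": (1, 2, 0), "d2": (2, 1, 3), "d3": (4, 0, 1)})

links = LinkedViews(galaxies, hypercube)
links.select({"d1", "d2"})                      # e.g. a cluster picked in Galaxies
focus = hypercube.subset({"d1", "d2"}, "focus")  # subset handed to another tool
```

Keeping the selection as a set of document ids, rather than screen coordinates, is what lets any display answer the question of where the same documents fall in its own layout.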
These visualizations and interactions must rest on a solid foundation of mathematics. Techniques to identify the key features of an information collection; representations that lend themselves to one or more interpretive displays; methods for efficiently manipulating the representations - these are all key elements of information exploration. The visualizations shown here are supported by a vector representation, based on statistical term distributions among documents. The mathematical software analyzes the textual content of a document collection, identifies key topics and themes, and generates mathematical signals representing the various documents in the collection in terms of the key topics and themes. Because the mathematics and the representation address both the document relationships and the theme relationships, the flexibility to create the different visualization paradigms is achieved. The user may also weight themes differently based on interest. We have also investigated the use of unusual projection surfaces such as fractals to convey the relationship of themes among documents [Miller]. Interestingly, user interaction tools are critical to assessing the efficacy of one projection scheme over another.
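A minimal sketch can show how a single vector representation supports both document relationships and theme relationships, and how user-assigned theme weights change the result. It assumes a small document-by-theme matrix and uses weighted cosine similarity as the comparison measure. The measure, the names, and the numbers are illustrative assumptions, not the method of the mathematical software described here.

```python
import math

def weighted_cosine(a, b, weights):
    """Cosine similarity in which each dimension counts by its weight."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical matrix: rows are documents, columns are themes.
doc_vectors = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
# Transposing the same matrix gives one vector per theme.
theme_vectors = [list(col) for col in zip(*doc_vectors)]

# Document-to-document similarity, with and without extra weight on theme 0.
equal = weighted_cosine(doc_vectors[0], doc_vectors[1], [1.0, 1.0, 1.0])
boosted = weighted_cosine(doc_vectors[0], doc_vectors[1], [4.0, 1.0, 1.0])

# Theme-to-theme similarity, based on usage across the three documents.
theme_sim = weighted_cosine(theme_vectors[0], theme_vectors[1], [1.0, 1.0, 1.0])
```

Raising the weight of the theme the two documents share makes them more similar, while reading the matrix by columns instead of rows yields the theme-based view from the same data.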
User capabilities and goals, powerful mathematical techniques and algorithms, engaging visualizations, and flexible interaction techniques - together they can support a level of exploration and understanding that is not provided by any subset alone.
[Grinstein] Grinstein, Georges, and Ward, Matthew. Introduction to Data Visualization. IEEE Visualization '97. (Phoenix AZ, October 1997).
[Inselberg] Inselberg, Alfred. Multidimensional Detective, in Proceedings of Information Visualization '97 (Phoenix AZ, October 1997) IEEE Service Center, 100-107.
[Miller] Miller, Nancy, Hetzler, Beth, et al. The Need for Metrics in Visual Information Analysis, in Workshop on New Paradigms in Information Visualization and Manipulation (Las Vegas NV, November 1997).
[Saracevic] Saracevic, Tefko. Users Lost: Reflections on the Past, Future, and Limits of Information Science (Salton Award Lecture). SIGIR Forum, Fall 1997.
[Shneiderman] Shneiderman, Ben. The Eyes Have It: A Task by Type Taxonomy for Information Visualizations, in Proceedings of IEEE Symposium on Visual Languages (Boulder CO, September 1996) 336-343.
[Wise] Wise, J. A., Thomas, J. J., et al. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents, in Proceedings of IEEE '95 Information Visualization (Atlanta GA, October 1995), IEEE Service Center, 51-58.