U.S. Department of Energy
Information Visualization

Core Area: Information Signatures

We create mathematical signatures from text, multimedia, and sensor data, discovering new ways of summarizing key features in large, heterogeneous data sets. We focus on performance and scalability, creating computationally efficient representations of complex data sets.

Research Topics and Products

IN-SPIRE™


IN-SPIRE™ provides tools for exploring textual data, including Boolean and “topical” queries, term gisting, and time/trend analysis tools. This suite of tools allows the user to rapidly discover hidden information relationships by reading only pertinent documents. IN-SPIRE™ has been used to explore technical and patent literature, marketing and business documents, web data, accident and safety reports, newswire feeds and message traffic, and more. It has applications in many areas, including information analysis, strategic planning, and medical research.

Many analytical tools are provided to work in concert with the visualizations, allowing users to investigate the document groupings, query the document contents, investigate time-based trends, and much more.

Related Items

IN-SPIRE™ benefits from the RAKE/CAST Keyword Extraction and Story Flow research projects.

Facets for Discovery and Exploration in Text Collections

Rose S, I Roberts, and N Cramer. 2011. "Facets for Discovery and Exploration in Text Collections." In IEEE Workshop on Interactive Visual Text Analytics for Decision Making, IEEE VisWeek 2011.

Abstract

Faceted classifications of text collections provide a useful means of partitioning documents into related groups; however, traditional approaches to faceting text collections rely on comprehensive analysis of the subject area or on annotated general attributes. In this paper we show the application of basic principles for facet analysis to the development of computational methods for facet classification of text collections. Integration with a visual analytics system is described, with summaries of user experiences.

Describing Story Evolution from Dynamic Information Streams

Rose SJ, RS Butner, WE Cowley, ML Gregory, and J Walker. 2009. "Describing Story Evolution from Dynamic Information Streams." In IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), Oct. 12-13, 2009, Atlantic City, NJ, pp. 99-106. IEEE, Piscataway, NJ.

Abstract

Sources of streaming information, such as news syndicates, publish information continuously. Information portals and news aggregators list the latest information from around the world, enabling information consumers to easily identify events in the past 24 hours. The volume and velocity of these streams causes information from prior days to quickly vanish despite its utility in providing an informative context for interpreting new information. Few capabilities exist to support an individual attempting to identify or understand trends and changes from streaming information over time. The burden of retaining prior information and integrating it with the new is left to the skills, determination, and discipline of each individual. In this paper we present a visual analytics system for linking essential content from information streams over time into dynamic stories that develop and change over multiple days. We describe particular challenges to the analysis of streaming information and explore visual representations for showing story change and evolution over time.

Visual Analysis of Weblog Content

Gregory ML, DA Payne, D McColgin, NO Cramer, and DV Love. 2007. "Visual Analysis of Weblog Content." In International Conference on Weblogs and Social Media (ICWSM '07), pp. 227-230. Boulder, CO, March 26-28, 2007.

Abstract

In recent years, one of the major advances of the World Wide Web has been social media, and one of the fastest-growing aspects of social media is the blogosphere. Blogs make content creation easy and are highly accessible through web pages and syndication. With their growing influence, a need has arisen to monitor the opinions and insight revealed within their content. This paper describes a technical approach for analyzing the content of blog data using a visual analytic tool, IN-SPIRE, developed by Pacific Northwest National Laboratory. We describe both how an analyst can explore blog data with IN-SPIRE and how the tool could be modified in the future to handle the specific nuances of analyzing blog data.

Analysis Experiences Using Information Visualization

Hetzler, E. and Turner A. 2004. "Analysis experiences using information visualization." IEEE Computer Graphics and Applications, 24:5, pp. 22-26.

Abstract

To deliver truly useful tools, researchers must learn how to map between the knowledge domains inherent in information collections and the knowledge domains in users' minds. The true measure of this work is not what the software shows, but what the user is able to understand by using it. This article summarizes lessons learned from an observational study of the application of the In-Spire visually-oriented text exploitation system in an operational analysis environment.

Discovering Knowledge Through Visual Analysis

Thomas JJ, PJ Cowley, OA Kuchar, LT Nowell, JR Thomson, and PC Wong. 2001. "Discovering Knowledge Through Visual Analysis." Journal of Universal Computer Science 7(6):517-529. doi:10.3217/jucs-007-06-0517

Abstract

This paper describes our vision for the near future in digital content analysis as it relates to the creation, verification, and presentation of knowledge. We focus on how visualization enables humans to make discoveries and gain knowledge. Visualization, in this context, is not just the picture representing the data but also a two-way interaction between humans and their information resources for the purposes of knowledge discovery, verification, and the sharing of knowledge with others. We present visual interaction and analysis examples to demonstrate how one current visualization tool analyzes large, diverse collections of text. This is followed by lessons learned and the presentation of a core concept for a new human information discourse.


Keyword Extraction and Themes with RAKE and CAST

The goal of this research is to provide more descriptive cues so that users have better insight into the features of a text collection. It also allows them to explore with greater precision and to identify or evaluate more specific relationships. To accomplish this, we created the Computation and Analysis of Significant Themes (CAST) algorithm. CAST computes a set of themes for a collection of documents based on automatically extracted keyword information.

The Rapid Automatic Keyword Extraction (RAKE) algorithm provides this keyword information. RAKE automatically extracts single- and multi-word keywords from individual documents and passes a set of high-value keywords to CAST, which clusters them into themes. Each computed theme comprises a set of highly associated keywords and a set of documents that are highly associated with the theme's keywords.

Whereas many text analysis methods focus on what distinguishes documents, RAKE and CAST focus on what describes documents, ideally characterizing what each document is essentially about. Keywords provide an advantage over other types of signatures as they are readily accessible to a user and can be easily applied to search other information spaces. The value of any particular keyword can be readily evaluated by a user for their particular interests and applied in or adapted to multiple contexts.
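
The extraction step can be sketched in a few lines. As described in Rose et al. (2010), RAKE splits a document's text into candidate phrases at stopwords (and, in the full method, punctuation), scores each word by its degree-to-frequency ratio over those candidates, and scores each phrase as the sum of its word scores. A minimal sketch, with a tiny illustrative stopword list rather than a real one:

```python
import re
from collections import defaultdict

# Illustrative stopword list only; real RAKE uses a full corpus-tuned list.
STOPWORDS = {"a", "an", "and", "of", "the", "is", "for", "in", "on",
             "to", "with", "from"}

def rake_keywords(text):
    words = re.findall(r"[a-z]+", text.lower())
    # Candidate phrases are runs of content words between stopwords.
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word score = degree / frequency over all candidate phrases.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # co-occurrence degree (incl. self)
    score = {w: degree[w] / freq[w] for w in freq}
    # Phrase score = sum of member word scores.
    ranked = {" ".join(p): sum(score[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda kv: -kv[1])

print(rake_keywords("rapid automatic keyword extraction of keywords"
                    " from individual documents")[0])
# -> ('rapid automatic keyword extraction', 16.0)
```

Because degree rewards words that co-occur in longer candidates, multi-word technical phrases rise to the top, which is exactly the property CAST exploits when clustering high-value keywords into themes.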

Related Items

RAKE and CAST are implemented in several products, including IN-SPIRE, Canopy, SRS, and Analytic Widgets.

Automatic Keyword Extraction from Individual Documents

Rose SJ, DW Engel, NO Cramer, and WE Cowley. 2010. "Automatic Keyword Extraction from Individual Documents." Chapter 1 in Text Mining: Applications and Theory, ed. MW Berry and J Kogan, pp. 3-20. John Wiley & Sons, Chichester, United Kingdom.

Abstract

This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method's configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.

Describing Story Evolution from Dynamic Information Streams

Rose SJ, RS Butner, WE Cowley, ML Gregory, and J Walker. 2009. "Describing Story Evolution from Dynamic Information Streams." In IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), Oct. 12-13, 2009, Atlantic City, NJ, pp. 99-106. IEEE, Piscataway, NJ.

Abstract

Sources of streaming information, such as news syndicates, publish information continuously. Information portals and news aggregators list the latest information from around the world, enabling information consumers to easily identify events in the past 24 hours. The volume and velocity of these streams causes information from prior days to quickly vanish despite its utility in providing an informative context for interpreting new information. Few capabilities exist to support an individual attempting to identify or understand trends and changes from streaming information over time. The burden of retaining prior information and integrating it with the new is left to the skills, determination, and discipline of each individual. In this paper we present a visual analytics system for linking essential content from information streams over time into dynamic stories that develop and change over multiple days. We describe particular challenges to the analysis of streaming information and explore visual representations for showing story change and evolution over time.

Starlight

[Screenshot of Starlight]

The Starlight Information Visualization System™ graphically depicts information, dramatically accelerating and improving human ability to derive meaningful knowledge from increasingly large and complex information resources. It is simultaneously a powerful information analysis tool and a platform for conducting advanced visualization research.

Starlight is explicitly designed to manipulate the types of relationships humans need to understand in order to solve complex, multifaceted, real-world problems. Graphical representations enable the underlying relationships to be visually interpreted. Viewers can interactively move among multiple representations of the same information in order to uncover correlations that may span multiple relationship types.

For more information, see the Starlight web site.

Edge Computing


Edge Computing pushes the frontier of computing applications, data, and services away from centralized nodes to the logical extremes of a network. It enables analytics and knowledge generation to occur at the source of the data. This approach requires leveraging resources that may not be continuously connected to a network, such as laptops, smartphones, tablets, and sensors.

An Edge Computing reference platform named Kaval, running on the Android operating system, has been developed at PNNL to help provide Edge Computing solutions. Kaval is a common platform for clients who need to collect and analyze data on a mobile device while potentially sharing that data with other devices in the field.

Scalable Reasoning System (SRS)

[Screenshot: SRS social media analysis]

The Scalable Reasoning System (SRS) is an analytic framework for developing web-based visualization applications. Using a growing library of visual and analytic components, developers can create custom applications for any domain and any data source.

SRS combines the simplicity and accessibility of web-based solutions with the power of an extensible, adaptable back-end analytics platform. SRS applications have been deployed to:

  • Analyze unstructured text
  • Explore hierarchical taxonomies
  • Support real-time analysis of trends and patterns in streaming social media data
  • Organize and provide visual search and navigation of large document repositories

Related Items

SRS benefits from the RAKE/CAST Keyword Extraction and Story Flow research projects.

The Scalable Reasoning System: Lightweight Visualization for Distributed Analytics

Pike W, J Bruce, B Baddeley, D Best, L Franklin, R May, D Rice, R Riensche, and K Younkin. 2009. "The Scalable Reasoning System: Lightweight Visualization for Distributed Analytics." Information Visualization 8(1):71-84.

Abstract

A central challenge in visual analytics is the creation of accessible, widely distributable analysis applications that bring the benefits of visual discovery to as broad a user base as possible. Moreover, to support the role of visualization in the knowledge creation process, it is advantageous to allow users to describe the reasoning strategies they employ while interacting with analytic environments. We introduce an application suite called the scalable reasoning system (SRS), which provides web-based and mobile interfaces for visual analysis. The service-oriented analytic framework that underlies SRS provides a platform for deploying pervasive visual analytic environments across an enterprise. SRS represents a 'lightweight' approach to visual analytics whereby thin client analytic applications can be rapidly deployed in a platform-agnostic fashion. Client applications support multiple coordinated views while giving analysts the ability to record evidence, assumptions, hypotheses and other reasoning artifacts. We describe the capabilities of SRS in the context of a real-world deployment at a regional law enforcement organization.

Canopy

[Screenshot of Canopy]

Canopy is a suite of visual analytic tools designed to support deep investigation of large multimedia collections. Canopy combines the understanding of data represented in multiple formats (video, image, and text) and presents that information to users through new visual representations. Users can explore relationships between documents and among subcomponents of documents. Canopy incorporates cutting-edge extraction techniques, state-of-the-art content analysis algorithms, and novel interactive information visualizations so that analysts can comprehend and articulate the big picture. Canopy helps to reduce analysts' workload and, ultimately, the effort of identifying critical intelligence for decision makers.

Canopy:

  • Aids and expedites triage of multimedia data. For example, Canopy can help with analytic problems such as: "I have data from a large collection of files; help me investigate this collection to determine the most relevant files without my having to watch every movie, view every image, and read all the text."
  • Bootstraps the analysis process by providing visual clues to potential data relationships and highlights connections, giving the user an understanding of all the data and additional context of its structure. This facilitates discovering previously unknown content and/or unexpected or non-obvious relationships.
  • Provides insight into multimedia content similarities and relationships by discovering and visualizing the relationships in an interactive and dynamic user interface.
  • Provides true multimedia analysis, not just stovepipe analysis of an individual type of data augmented with metadata. For example, a Word document with components such as embedded images and text is evaluated as a cohesive information item, where the association of these various document elements is preserved.

Related Items

Canopy benefits from the Lighthouse and Arc Weld research projects.

Interactive Visual Comparison of Multimedia Data through Type-specific Views

Burtner, R., Bohn, S., & Payne, D. (2013, February). Interactive Visual Comparison of Multimedia Data through Type-specific Views. In IS&T/SPIE Electronic Imaging (pp. 86540M-86540M). International Society for Optics and Photonics.

Abstract

Analysts who work with collections of multimedia to perform information foraging understand how difficult it is to connect information across diverse sets of mixed media. The wealth of information from blogs, social media, and news sites often can provide actionable intelligence; however, many of the tools used on these sources of content are not capable of multimedia analysis because they only analyze a single media type. As such, analysts are taxed to keep a mental model of the relationships among each of the media types when generating the broader content picture. To address this need, we have developed Canopy, a novel visual analytic tool for analyzing multimedia. Canopy provides insight into the multimedia data relationships by exploiting the linkages found in text, images, and video co-occurring in the same document and across the collection. Canopy connects derived and explicit linkages and relationships through multiple connected visualizations to aid analysts in quickly summarizing, searching, and browsing collected information to explore relationships and align content. In this paper, we will discuss the features and capabilities of the Canopy system and walk through a scenario illustrating how this system might be used in an operational environment.

Lighthouse

Lighthouse is a content analysis system that facilitates the analysis, synthesis, and retrieval of multimedia content. It accomplishes this through an ensemble of state-of-the-art characterization and decision support processes that provide search, classification, summarization, and temporal analysis. The system gives our customers the ability to find image and video duplicates, including both exact matches and fuzzy or partial matches, and includes classifiers for object recognition and face detection. Lighthouse provides temporal analysis of videos, including shot boundary detection, video summaries, and event detection, and it summarizes the image and video content of a collection by clustering similar content.

Lighthouse can also be used to identify relationships in collections of data. For example, a user may suspect that a video has been created from a portion of another video; to learn more, the user can explore the similarity measures provided by Lighthouse to trace the repurposed video back to the video of origin. Lighthouse matches industry standards, as demonstrated through an extensive set of benchmarks on collections commonly used within the image processing community.
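
One generic way to implement fuzzy or partial duplicate matching of the kind described above is a perceptual "average hash": downsample each image to an 8x8 grid of block means, threshold at the grid's mean to get a 64-bit signature, and compare signatures by Hamming distance. This is a common, well-known technique offered purely as an illustration; it is not Lighthouse's actual algorithm.

```python
import numpy as np

def average_hash(img, size=8):
    """64-bit perceptual signature of a 2-D grayscale image array."""
    h, w = img.shape
    h, w = h - h % size, w - w % size            # crop to a multiple of size
    blocks = img[:h, :w].reshape(size, h // size, size, w // size)
    small = blocks.mean(axis=(1, 3))             # size x size block averages
    return small > small.mean()                  # threshold at the grid mean

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int((a != b).sum())
```

Exact duplicates hash identically, and a uniform brightness shift leaves the hash unchanged because both the block means and the threshold shift together; a small Hamming-distance threshold (a few bits out of 64) then flags candidate near-duplicates for closer inspection.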

Related Items

Lighthouse is the content analysis system powering Canopy.

Multimedia Analysis + Visual Analytics = Multimedia Analytics

Chinchor N, JJ Thomas, PC Wong, M Christel, and MW Ribarsky. 2010. Multimedia Analysis plus Visual Analytics = Multimedia Analytics. IEEE Computer Graphics and Applications 30(5):52-60.

Abstract

Multimedia analysis has focused on images, video, and to some extent audio, and has made progress in single channels excluding text. Visual analytics has focused on the user's interaction with data during the analytic process, plus the fundamental mathematics, and has continued to treat text as did its precursor, information visualization. The general problem we address in this tutorial is combining multimedia analysis and visual analytics to deal with multimedia information gathered from different sources, with different goals or objectives, and containing all media types and combinations in common usage.

Graph Analytics


Graph analytics is the study and analysis of data that can be transformed into a graph representation consisting of nodes and links. Scientists at the Pacific Northwest National Laboratory (PNNL) have been actively involved in graph analytics R&D in social network, cyber communication, electric power grid, critical infrastructure, bioinformatics, and earth sciences applications. The mission of graph analytics research at PNNL is not research per se; its essential and enduring purpose is to produce pragmatic working solutions that meet real-life challenges.

We have developed a series of cutting-edge graph analytics technologies to explore and analyze graphs of different sizes and complexities. For the exploration of small-world graphs such as social networks, we developed the concept of a graph signature, which extracts the local features of a graph node or set of nodes, and used it to supplement the exploration of a complicated graph filled with hidden features. For larger graphs with about one million nodes, we developed a multi-resolution, middle-out, cross-zooming technique that allows users to interactively explore their graphs on a common desktop computer. Currently, we are developing an extreme-scale graph analytics pipeline designed to handle graphs with hundreds of millions of nodes and tens of billions of links. Much of this work has been loosely integrated into a graph analytics library known as Have Green.

Related Items

A Space-Filling Visualization Technique for Multivariate Small World Graphs

Wong PC, HP Foote, PS Mackey, G Chin, Jr, Z Huang, and JJ Thomas. 2012. "A Space-Filling Visualization Technique for Multivariate Small World Graphs." IEEE Transactions on Visualization and Computer Graphics 18(5):797-809. doi:10.1109/TVCG.2011.99

Abstract

We introduce an information visualization technique, known as GreenCurve, for large multivariate sparse graphs that exhibit small-world properties. Our fractal-based design approach uses spatial cues to approximate the node connections and thus eliminates the links between the nodes in the visualization. The paper describes a robust algorithm to order the neighboring nodes of a large sparse graph by solving the Fiedler vector of its graph Laplacian, and then fold the graph nodes into a space-filling fractal curve based on the Fiedler vector. The result is a highly compact visualization that gives a succinct overview of the graph with guaranteed visibility of every graph node. GreenCurve is designed with the power grid infrastructure in mind. It is intended for use in conjunction with other visualization techniques to support electric power grid operations. The research and development of GreenCurve was conducted in collaboration with domain experts who understand the challenges and possibilities intrinsic to the power grid infrastructure. The paper reports a case study on applying GreenCurve to a power grid problem and presents a usability study to evaluate the design claims that we set forth.
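
The node-ordering step described in the abstract can be sketched in a few lines: build the graph Laplacian L = D - A, take the eigenvector of its second-smallest eigenvalue (the Fiedler vector), and sort the nodes by its entries. The small path graph below, given in shuffled node labels, is an invented stand-in for a real power-grid topology.

```python
import numpy as np

def fiedler_order(adjacency):
    """Order graph nodes by the Fiedler vector of the graph Laplacian."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)   # symmetric matrix, so eigh applies
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return np.argsort(fiedler)

# Path 0-2-4-1-3 with shuffled labels: a good 1-D ordering should recover
# that sequence (up to reversal), placing the path's endpoints at the ends.
A = np.zeros((5, 5))
for i, j in [(0, 2), (2, 4), (4, 1), (1, 3)]:
    A[i, j] = A[j, i] = 1
print(fiedler_order(A))
```

For a path graph the Fiedler vector is monotone along the path, so sorting by it recovers the path order exactly; GreenCurve then folds such an ordering onto a space-filling curve so every node stays visible.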

Graph Analytics—Lessons Learned and Challenges Ahead

Wong PC, C Chen, C Gorg, B Shneiderman, J Stasko, and J Thomas. 2011. "Graph Analytics—Lessons Learned and Challenges Ahead." IEEE Computer Graphics and Applications 31(5):18-29. doi:10.1109/MCG.2011.72

Abstract

Graph analytics is one of the most influential and important R&D topics in the visual analytics community. Researchers with diverse backgrounds from information visualization, human-computer interaction, computer graphics, graph drawing, and data mining have pursued graph analytics from scientific, technical, and social approaches. These studies have addressed both distinct and common challenges. Past successes and mistakes can provide valuable lessons for revising the research agenda. In this article, six researchers from four academic and research institutes identify graph analytics' fundamental challenges and present both insightful lessons learned from their experience and good practices in graph analytics research. The goal is to critically assess those lessons and shed light on how they can stimulate research and draw attention to grand challenges for graph analytics. The article also establishes principles that could lead to measurable standards and criteria for research.

A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis

Jin S, Z Huang, Y Chen, D Chavarria-Miranda, JT Feo, and PC Wong. 2010. A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis. In IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), pp. 1-7. Institute of Electrical and Electronics Engineers, Piscataway, NJ.

Abstract

In Energy Management Systems, contingency analysis is commonly performed for identifying and mitigating potentially harmful power grid component failures. The exponentially increasing combinatorial number of failure modes imposes a significant computational burden for massive contingency analysis. It is critical to select a limited set of high-impact contingency cases within the constraint of computing power and time requirements to make it possible for real-time power system vulnerability assessment. In this paper, we present a novel application of parallel betweenness centrality to power grid contingency selection. We cross-validate the proposed method using the model and data of the western US power grid, and implement it on a Cray XMT system - a massively multithreaded architecture - leveraging its advantages for parallel execution of irregular algorithms, such as graph analysis. We achieve a speedup of 55 times (on 64 processors) compared against the single-processor version of the same code running on the Cray XMT. We also compare an OpenMP-based version of the same code running on an HP Superdome shared-memory machine. The performance of the Cray XMT code shows better scalability and resource utilization, and shorter execution time for large-scale power grids. This proposed approach has been evaluated in PNNL's Electricity Infrastructure Operations Center (EIOC). It is expected to provide a quick and efficient solution to massive contingency selection problems to help power grid operators to identify and mitigate potential widespread cascading power grid failures in real time.
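
The selection idea in the abstract can be illustrated with a toy example: rank components by betweenness centrality and treat the highest-ranked ones as candidate high-impact contingencies. The sketch below is the standard serial Brandes algorithm on an unweighted graph, a stand-in for the paper's parallel Cray XMT implementation, and the six-node "grid" is invented for illustration.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an undirected, unweighted graph.
    adj: dict mapping node -> list of neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack, queue = [], deque([s])
        pred = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0); sigma[s] = 1.0
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # w lies on a shortest path via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate path dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # undirected: each pair counted twice

# Two tightly meshed clusters joined by the line 2-3: the bridge endpoints
# carry every cross-cluster path, so they dominate the contingency ranking.
grid = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ranking = sorted(betweenness(grid).items(), key=lambda kv: -kv[1])
print(ranking[:2])  # nodes 2 and 3 rank highest
```

Ranking by centrality lets an operator spend limited simulation budget on the failures most likely to cascade, which is the essence of the contingency-selection problem the paper parallelizes.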

A Novel Visualization Technique for Electric Power Grid Analytics

Wong PC, K Schneider, P Mackey, H Foote, G Chin, R Guttromson, and J Thomas. 2009. "A Novel Visualization Technique for Electric Power Grid Analytics." IEEE Transactions on Visualization and Computer Graphics 15(3):410-423.

Abstract

The application of information visualization holds tremendous promise for the electric power industry, but its potential has so far not been sufficiently exploited by the visualization community. Prior work on visualizing electric power systems has been limited to depicting raw or processed information on top of a geographic layout. Little effort has been devoted to visualizing the physics of the power grids, which ultimately determines the condition and stability of the electricity infrastructure. Based on this assessment, we developed a novel visualization system prototype, GreenGrid, to explore the planning and monitoring of the North American Electricity Infrastructure. The paper discusses the rationale underlying the GreenGrid design, describes its implementation and performance details, and assesses its strengths and weaknesses against the current geographic-based power grid visualization. We also present a case study using GreenGrid to analyze the information collected moments before the last major electric blackout in the Western United States and Canada, and a usability study to evaluate the practical significance of our design in simulated real-life situations. Our result indicates that many of the disturbance characteristics can be readily identified with the proper form of visualization.

A Multi-Level Middle-Out Cross-Zooming Approach for Large Graph Analytics

Wong PC, PS Mackey, KA Cook, RM Rohrer, HP Foote, and MA Whiting. 2009. "A Multi-Level Middle-Out Cross-Zooming Approach for Large Graph Analytics." In IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), ed. J Stasko and JJ van Wijk, pp. 147-154. IEEE, Piscataway, NJ.

Abstract

This paper presents a working graph analytics model that embraces the strengths of the traditional top-down and bottom-up approaches with a resilient crossover concept to exploit the vast middle-ground information overlooked by the two extreme analytical approaches. Our graph analytics model is developed in collaboration with researchers and users, who carefully studied the functional requirements that reflect the critical thinking and interaction pattern of a real-life intelligence analyst. To evaluate the model, we implement a system prototype, known as GreenHornet, which allows our analysts to test the theory in practice, identify the technological and usage-related gaps in the model, and then adapt the new technology in their work space. The paper describes the implementation of GreenHornet and compares its strengths and weaknesses against the other prevailing models and tools.

Geometry-Based Edge Clustering for Graph Visualization

Cui WW, H Zhou, H Qu, PC Wong, and XM Li. 2008. "Geometry-Based Edge Clustering for Graph Visualization." IEEE Transactions on Visualization and Computer Graphics 14(6):1277-1284. doi:10.1109/TVCG.2008.135

Abstract

Graphs have been widely used to model relationships among data. For large graphs, excessive edge crossings make the display visually cluttered and thus difficult to explore. In this paper, we propose a novel geometry-based edge-clustering framework that can group edges into bundles to reduce the overall edge crossings. Our method uses a control mesh to guide the edge-clustering process; edge bundles can be formed by forcing all edges to pass through some control points on the mesh. The control mesh can be generated at different levels of detail either manually or automatically based on underlying graph patterns. Users can further interact with the edge-clustering results through several advanced visualization techniques such as color and opacity enhancement. Compared with other edge-clustering methods, our approach is intuitive, flexible, and efficient. The experiments on some large graphs demonstrate the effectiveness of our method.

A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs

Wong PC, HP Foote, PS Mackey, G Chin, Jr, HJ Sofia, and JJ Thomas. 2008. "A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs." Information Visualization 7:105-117.

Abstract

We present an information visualization tool, known as GreenMax, to visually explore large small-world graphs with up to a million graph nodes on a desktop computer. A major motivation for scanning a small-world graph in such a dynamic fashion is the demanding goal of identifying not just the well-known features but also the unknown–known and unknown–unknown features of the graph. GreenMax uses a highly effective multilevel graph drawing approach to pre-process a large graph by generating a hierarchy of increasingly coarse layouts that later support the dynamic zooming of the graph. This paper describes the graph visualization challenges, elaborates our solution, and evaluates the contributions of GreenMax in the larger context of visual analytics on large small-world graphs. We report the results of two case studies using GreenMax and the results support our claim that we can use GreenMax to locate unexpected features or structures behind a graph.

Have Green - A Visual Analytics Framework for Large Semantic Graphs

Wong PC, G Chin, Jr, HP Foote, PS Mackey, and JJ Thomas. 2006. "Have Green - A Visual Analytics Framework for Large Semantic Graphs." In IEEE Symposium on Visual Analytics Science and Technology, pp 67-74. Baltimore, Maryland, October 31-November 2, 2006.

Abstract

A semantic graph is a network of heterogeneous nodes and links annotated with a domain ontology. In intelligence analysis, investigators use semantic graphs to organize concepts and relationships as graph nodes and links in hopes of discovering key trends, patterns, and insights. However, as new information continues to arrive from a multitude of sources, the size and complexity of the semantic graphs will soon overwhelm an investigator's cognitive capacity to carry out significant analyses. We introduce a powerful visual analytics framework designed to enhance investigators' natural analytical capabilities to comprehend and analyze large semantic graphs. The paper describes the overall framework design, presents major development accomplishments to date, and discusses future directions of a new visual analytics system known as Have Green.

Graph Signatures for Visual Analytics

Wong PC, HP Foote, G Chin, Jr, PS Mackey, and KA Perrine. 2006. "Graph Signatures for Visual Analytics." IEEE Transactions on Visualization and Computer Graphics 12(6).

Abstract

We present a visual analytics technique to explore graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated use of brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization but also the advantages and strengths of our signature-guided approach presented in the paper.
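As a rough illustration of the data-signature idea (not the paper's actual feature set or its scaling-based projection), each node below receives a small vector of local-topology features, so that structurally similar nodes end up with similar vectors:

```python
# Illustrative data signatures: each node gets a feature vector describing
# its local topology (degree, triangle count, mean neighbor degree). The
# paper projects such vectors to a 2D scatterplot via multidimensional
# scaling; here we only compare the raw signatures.
from itertools import combinations
from math import dist

def signature(graph, u):
    nbrs = graph[u]
    degree = len(nbrs)
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    mean_nbr_deg = sum(len(graph[v]) for v in nbrs) / degree if degree else 0.0
    return (degree, triangles, mean_nbr_deg)

# Two triangles joined by an edge between nodes 2 and 3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
sigs = {u: signature(g, u) for u in g}
print(sigs[0] == sigs[1] == sigs[4] == sigs[5])  # corner nodes match: True
print(dist(sigs[0], sigs[2]))                    # bridge endpoints differ
```

Brushing a tight cluster of signature points in the scatterplot would select all nodes playing the same structural role in the graph view.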

Generating Graphs for Visual Analytics through Interactive Sketching

Wong PC, HP Foote, PS Mackey, KA Perrine, and G Chin, Jr. 2006. "Generating Graphs for Visual Analytics through Interactive Sketching." IEEE Transactions on Visualization and Computer Graphics 12(6).

Abstract

We introduce an interactive graph generator, GreenSketch, designed to facilitate the creation of descriptive graphs required for different visual analytics tasks. The human-centric design approach of GreenSketch enables users to master the creation process without specific training or prior knowledge of graph model theory. The customized user interface encourages users to gain insight into the connection between the compact matrix representation and the topology of a graph layout when they sketch their graphs. Both the human-enforced and machine-generated randomnesses supported by GreenSketch provide the flexibility needed to address the uncertainty factor in many analytical tasks. This paper describes over two dozen examples that cover a wide variety of graph creations from a single line of nodes to a real-life small-world network that describes a snapshot of telephone connections. While the discussion focuses mainly on the design of GreenSketch, we include a case study that applies the technology in a visual analytics environment and a usability study that evaluates the strengths and weaknesses of our design approach.

Clique

screenshot of clique

To help analysts detect and assess potentially malicious events in large volumes of streaming computer network traffic, PNNL researchers have developed a new behavioral-model-based anomaly detection technique. The Correlation Layers for Information Query and Exploration (CLIQUE) system builds models of expected behavior for user-defined host groups on a network and compares those models against a specified time window of data, either streaming in real time or explored retrospectively, to generate early indicators of non-normal network activity.

CLIQUE's visual interface allows analysts to view detailed displays of network activity and spot the machines, buildings, facilities, or other sources of traffic that are behaving anomalously. Users can navigate through their data temporally, viewing time periods as short as a few minutes or as long as several days. CLIQUE's models and visualizations are designed to scale to immense data volumes, operating on datasets comprising billions of transactions per day to help meet the data-intensive cyber security challenge.
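The behavioral-model idea can be caricatured in a few lines: learn a per-group baseline of flow counts per interval and flag intervals that deviate sharply. CLIQUE's real models are far richer; the data, threshold, and function names here are invented for illustration.

```python
# A much-simplified sketch of behavior-based anomaly flagging in the spirit
# of CLIQUE: fit a baseline (mean/stdev of flows per interval) for one host
# group, then flag intervals in a new window that deviate strongly.
from statistics import mean, stdev

def fit_baseline(history):
    """history: flow counts per interval for one host group."""
    return mean(history), stdev(history)

def flag_anomalies(baseline, window, z_threshold=3.0):
    """Return indices of intervals whose z-score exceeds the threshold."""
    mu, sigma = baseline
    return [i for i, count in enumerate(window)
            if sigma and abs(count - mu) / sigma > z_threshold]

history = [100, 104, 98, 101, 97, 103, 99, 102]   # normal traffic
baseline = fit_baseline(history)
window = [101, 99, 350, 100]                      # interval 2 is off-normal
print(flag_anomalies(baseline, window))
```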

Related Items

High-Throughput Real-Time Network Flow Visualization

Best DM, DV Love, WA Pike, and SJ Bohn. 2010. High-Throughput Real-Time Network Flow Visualization. FloCon2010, New Orleans, LA.

Abstract

This presentation and demonstration will introduce two interactive, high-throughput visual analysis tools, Traffic Circle and CLIQUE, and will discuss the analytic requirements of the U.S. government cyber security capabilities for which the tools were developed and are being deployed. Both tools take a time-based approach to visual analysis, with Traffic Circle displaying raw data and CLIQUE computing real-time behavioral models. Performance benchmarks will also be discussed; the tools are currently capable of ingesting and presenting data volumes on the order of hundreds of millions of flow records at once.

Real-Time Visualization of Network Behaviors for Situational Awareness

Best DM, SJ Bohn, DV Love, AS Wynne, and WA Pike. 2010. Real-Time Visualization of Network Behaviors for Situational Awareness. In Proceedings of the Seventh International Symposium on Visualization for Cyber Security, pp. 79-90. ACM, New York, NY.

Abstract

Plentiful, complex, and dynamic data make understanding the state of an enterprise network difficult. Although visualization can help analysts understand baseline behaviors in network traffic and identify off-normal events, visual analysis systems often do not scale well to operational data volumes (in the hundreds of millions to billions of transactions per day) nor to analysis of emergent trends in real-time data. We present a system that combines multiple, complementary visualization techniques coupled with in-stream analytics, behavioral modeling of network actors, and a high-throughput processing platform called MeDICi. This system provides situational understanding of real-time network activity to help analysts take proactive response steps. We have developed these techniques using requirements gathered from the government users for which the tools are being developed. By linking multiple visualization tools to a streaming analytic pipeline, and designing each tool to support a particular kind of analysis (from high-level awareness to detailed investigation), analysts can understand the behavior of a network across multiple levels of abstraction.

Related Papers

Interactive Visual Comparison of Multimedia Data through Type-specific Views

Burtner R, S Bohn, and D Payne. 2013. "Interactive Visual Comparison of Multimedia Data through Type-specific Views." In IS&T/SPIE Electronic Imaging, 86540M. International Society for Optics and Photonics.

Abstract

Analysts who work with collections of multimedia to perform information foraging understand how difficult it is to connect information across diverse sets of mixed media. The wealth of information from blogs, social media, and news sites often can provide actionable intelligence; however, many of the tools used on these sources of content are not capable of multimedia analysis because they only analyze a single media type. As such, analysts are taxed to keep a mental model of the relationships among each of the media types when generating the broader content picture. To address this need, we have developed Canopy, a novel visual analytic tool for analyzing multimedia. Canopy provides insight into the multimedia data relationships by exploiting the linkages found in text, images, and video co-occurring in the same document and across the collection. Canopy connects derived and explicit linkages and relationships through multiple connected visualizations to aid analysts in quickly summarizing, searching, and browsing collected information to explore relationships and align content. In this paper, we will discuss the features and capabilities of the Canopy system and walk through a scenario illustrating how this system might be used in an operational environment.

Coherent Image Layout using an Adaptive Visual Vocabulary

Dillard SE, MJ Henry, SJ Bohn, and LJ Gosink. 2012. "Coherent Image Layout using an Adaptive Visual Vocabulary." In IS&T/SPIE Electronic Imaging, Proc. SPIE 8661, Image Processing: Machine Vision Applications VI, 86610Q (March 6, 2013). PNNL-SA-92482, Pacific Northwest National Laboratory, Richland, WA. doi:10.1117/12.2004733

Abstract

When querying a huge image database containing millions of images, the result of the query may still contain many thousands of images that need to be presented to the user. We consider the problem of arranging such a large set of images into a visually coherent layout, one that places similar images next to each other. Image similarity is determined using a bag-of-features model, and the layout is constructed from a hierarchical clustering of the image set by mapping an in-order traversal of the hierarchy tree into a space-filling curve. This layout method provides strong locality guarantees so we are able to quantitatively evaluate performance using standard image retrieval benchmarks. Performance of the bag-of-features method is best when the vocabulary is learned on the image set being clustered. Because learning a large, discriminative vocabulary is a computationally demanding task, we present a novel method for efficiently adapting a generic visual vocabulary to a particular dataset. We evaluate our clustering and vocabulary adaptation methods on a variety of image datasets and show that adapting a generic vocabulary to a particular set of images improves performance on both hierarchical clustering and image retrieval tasks.
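The layout pipeline the abstract describes (an ordering of images mapped onto a space-filling curve) can be sketched with the standard Hilbert index-to-coordinate conversion; here a toy scalar feature and a plain sort stand in for bag-of-features similarity and the cluster-tree traversal, so all names and data are illustrative.

```python
# Sketch of the layout idea: order images by similarity, then map that
# order onto a Hilbert space-filling curve so similar images land in
# nearby grid cells. d2xy is the standard Hilbert index-to-(x, y) step.

def d2xy(n, d):
    """Map Hilbert-curve index d to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

features = {"img_a": 0.9, "img_b": 0.1, "img_c": 0.85, "img_d": 0.15}
order = sorted(features, key=features.get)       # similar features adjacent
layout = {name: d2xy(4, i) for i, name in enumerate(order)}
print(layout)
```

Because consecutive Hilbert indices are always adjacent grid cells, the locality of the ordering carries over to the 2D layout, which is the guarantee the paper's evaluation relies on.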

Speech Information Retrieval: a Review

Hafen RP, and MJ Henry. 2012. "Speech Information Retrieval: a Review." Multimedia Systems 18(6):499-518. doi:10.1007/s00530-012-0266-0

Abstract

Speech is an information-rich component of multimedia. Information can be extracted from a speech signal in a number of different ways, and thus there are several well-established speech signal analysis research fields. These fields include speech recognition, speaker recognition, event detection, and fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major speech analysis fields. The goal is to introduce enough background for someone new in the field to quickly gain high-level understanding and to provide direction for further study.


A Highly Parallel Implementation of K-Means for Multithreaded Architecture

Mackey PS, JT Feo, PC Wong, and Y Chen. 2011. "A Highly Parallel Implementation of K-Means for Multithreaded Architecture." In 19th High Performance Computing Symposium (HPC 2011): SCS Spring Simulation Multiconference (SpringSim 2011), April 3-7, 2011, Boston, MA. ACM, New York, NY.

Abstract

We present a parallel implementation of the popular k-means clustering algorithm for massively multithreaded computer systems, as well as a parallelized version of the KKZ seed selection algorithm. We demonstrate that as system size increases, sequential seed selection can become a bottleneck. We also present an early attempt at parallelizing k-means that highlights critical performance issues when programming massively multithreaded systems. For our case studies, we used data collected from electric power simulations and run on the Cray XMT.
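For reference, the two algorithms named above can be sketched sequentially in a few lines; the paper's contribution is their massively multithreaded parallelization, which this pure-Python version does not attempt, and the toy data is invented.

```python
# Sequential sketch of KKZ seed selection followed by Lloyd's k-means.
from math import dist

def kkz_seeds(points, k):
    """KKZ: start from the max-norm point; repeatedly add the point
    farthest from its nearest already-chosen seed."""
    seeds = [max(points, key=lambda p: dist(p, (0,) * len(p)))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=20):
    centers = kkz_seeds(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))
```

The sequential bottleneck the paper identifies is visible in `kkz_seeds`: each new seed depends on all previously chosen seeds, so the loop cannot be trivially parallelized the way the point-assignment step can.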

Graph Analytics—Lessons Learned and Challenges Ahead

Pak Chung Wong, Chaomei Chen, Carsten Gorg, Ben Shneiderman, John Stasko, Jim Thomas, Graph Analytics—Lessons Learned and Challenges Ahead, IEEE Computer Graphics and Applications, vol. 31, no. 5, pp. 18-29, Sep./Oct. 2011, doi:10.1109/MCG.2011.72

Abstract

Graph analytics is one of the most influential and important R&D topics in the visual analytics community. Researchers with diverse backgrounds from information visualization, human-computer interaction, computer graphics, graph drawing, and data mining have pursued graph analytics from scientific, technical, and social approaches. These studies have addressed both distinct and common challenges. Past successes and mistakes can provide valuable lessons for revising the research agenda. In this article, six researchers from four academic and research institutes identify graph analytics' fundamental challenges and present both insightful lessons learned from their experience and good practices in graph analytics research. The goal is to critically assess those lessons and shed light on how they can stimulate research and draw attention to grand challenges for graph analytics. The article also establishes principles that could lead to measurable standards and criteria for research.

Automatic Keyword Extraction from Individual Documents

Rose SJ, DW Engel, NO Cramer, and WE Cowley. 2010. Automatic Keyword Extraction from Individual Documents. Chapter 1 in Text Mining: Application and Theory, vol. 1, ed. MW Berry, J Kogan, pp. 3-20. John Wiley & Sons, Chichester, United Kingdom.

Abstract

This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method's configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
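The extraction method described (RAKE) is compact enough to sketch directly: candidate keywords are runs of words between stop words and punctuation, each word is scored by its degree-to-frequency ratio over the candidates, and a candidate scores the sum of its word scores. The stop-word list below is a tiny illustrative subset, not one generated by the chapter's stop-list method.

```python
# Compact sketch of RAKE keyword extraction as described above.
import re
from collections import defaultdict

STOP = {"a", "an", "and", "of", "the", "is", "over", "for", "in", "to"}

def rake(text):
    words = re.split(r"[^a-zA-Z]+", text.lower())
    candidates, phrase = [], []
    for w in words + [""]:
        if w and w not in STOP:
            phrase.append(w)
        elif phrase:                       # stop word or punctuation: flush
            candidates.append(tuple(phrase))
            phrase = []
    freq, degree = defaultdict(int), defaultdict(int)
    for cand in candidates:
        for w in cand:
            freq[w] += 1
            degree[w] += len(cand)         # degree counts co-occurring words
    scores = {cand: sum(degree[w] / freq[w] for w in cand) for cand in candidates}
    return sorted(scores, key=scores.get, reverse=True)

top = rake("Compatibility of systems of linear constraints over the set "
           "of natural numbers is considered.")
print(top[0])
```

Multi-word candidates outrank single words here because every member word contributes its score, which is why RAKE favors longer, more specific keywords.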

Events and Trends in Text Streams

Engel DW, PD Whitney, and NO Cramer. 2010. Events and Trends in Text Streams. Chapter 9 in Text Mining: Application and Theory, vol. 1, ed. MW Berry, J Kogan, pp. 3-20. John Wiley & Sons, Chichester, United Kingdom.

Abstract

Text streams--collections of documents or messages that are generated and observed over time--are ubiquitous. Our research and development are targeted at developing algorithms to find and characterize changes in topic within text streams. To date, this research has demonstrated the ability to detect and describe 1) short duration, atypical events and 2) the emergence of longer-term shifts in topical content. This technology has been applied to predefined temporally ordered document collections but is also suitable for application to near-real-time textual data streams.
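A toy stand-in for this kind of change detection: track per-term counts over time slices and flag terms whose current frequency jumps well above their trailing average. The chapter's actual algorithms are more sophisticated; the data, threshold, and function names here are invented.

```python
# Minimal sketch of short-duration event detection in a text stream:
# compare each term's count in the latest time slice against its trailing
# per-slice average and flag large jumps.
from collections import Counter

def emerging_terms(slices, factor=3.0, min_count=3):
    """slices: list of token lists, one per time step; inspect the last."""
    history = Counter()
    for tokens in slices[:-1]:
        history.update(tokens)
    baseline = {t: history[t] / (len(slices) - 1) for t in history}
    current = Counter(slices[-1])
    return sorted(t for t, c in current.items()
                  if c >= min_count and c > factor * baseline.get(t, 0.5))

days = [
    ["budget", "vote", "senate"] * 2,
    ["budget", "vote", "senate", "weather"] * 2,
    ["earthquake"] * 5 + ["budget", "vote"],
]
print(emerging_terms(days))
```

Longer-term topical shifts would instead look at sustained growth across many slices rather than a single-slice spike.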

Real-Time Visualization of Network Behaviors for Situational Awareness

Best DM, SJ Bohn, DV Love, AS Wynne, and WA Pike. 2010. Real-Time Visualization of Network Behaviors for Situational Awareness. In Proceedings of the Seventh International Symposium on Visualization for Cyber Security, pp. 79-90. ACM, New York, NY.

Abstract

Plentiful, complex, and dynamic data make understanding the state of an enterprise network difficult. Although visualization can help analysts understand baseline behaviors in network traffic and identify off-normal events, visual analysis systems often do not scale well to operational data volumes (in the hundreds of millions to billions of transactions per day) nor to analysis of emergent trends in real-time data. We present a system that combines multiple, complementary visualization techniques coupled with in-stream analytics, behavioral modeling of network actors, and a high-throughput processing platform called MeDICi. This system provides situational understanding of real-time network activity to help analysts take proactive response steps. We have developed these techniques using requirements gathered from the government users for which the tools are being developed. By linking multiple visualization tools to a streaming analytic pipeline, and designing each tool to support a particular kind of analysis (from high-level awareness to detailed investigation), analysts can understand the behavior of a network across multiple levels of abstraction.

Multimedia Analysis + Visual Analytics = Multimedia Analytics

Chinchor N, JJ Thomas, PC Wong, M Christel, and MW Ribarsky. 2010. Multimedia Analysis plus Visual Analytics = Multimedia Analytics. IEEE Computer Graphics and Applications 30(5):52-60.

Abstract

Multimedia analysis has focused on images, video, and to some extent audio and has made progress in single channels excluding text. Visual analytics has focused on the user interaction with data during the analytic process plus the fundamental mathematics and has continued to treat text as did its precursor, information visualization. The general problem we address in this tutorial is the combining of multimedia analysis and visual analytics to deal with multimedia information gathered from different sources, with different goals or objectives, and containing all media types and combinations in common usage.

High-Throughput Real-Time Network Flow Visualization

Best DM, DV Love, WA Pike, and SJ Bohn. 2010. High-Throughput Real-Time Network Flow Visualization. FloCon2010, New Orleans, LA.

Abstract

This presentation and demonstration will introduce two interactive, high-throughput visual analysis tools, Traffic Circle and CLIQUE, and will discuss the analytic requirements of the U.S. government cyber security capabilities for which the tools were developed and are being deployed. Both tools take a time-based approach to visual analysis, with Traffic Circle displaying raw data and CLIQUE computing real-time behavioral models. Performance benchmarks will also be discussed; the tools are currently capable of ingesting and presenting data volumes on the order of hundreds of millions of flow records at once.

A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis

Jin S, Z Huang, Y Chen, D Chavarria-Miranda, JT Feo, and PC Wong. 2010. A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis. In IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), pp. 1-7. Institute of Electrical and Electronics Engineers, Piscataway, NJ.

Abstract

In Energy Management Systems, contingency analysis is commonly performed for identifying and mitigating potentially harmful power grid component failures. The exponentially increasing combinatorial number of failure modes imposes a significant computational burden for massive contingency analysis. It is critical to select a limited set of high-impact contingency cases within the constraint of computing power and time requirements to make it possible for real-time power system vulnerability assessment. In this paper, we present a novel application of parallel betweenness centrality to power grid contingency selection. We cross-validate the proposed method using the model and data of the western US power grid, and implement it on a Cray XMT system - a massively multithreaded architecture - leveraging its advantages for parallel execution of irregular algorithms, such as graph analysis. We achieve a speedup of 55 times (on 64 processors) compared against the single-processor version of the same code running on the Cray XMT. We also compare an OpenMP-based version of the same code running on an HP Superdome shared-memory machine. The performance of the Cray XMT code shows better scalability and resource utilization, and shorter execution time for large-scale power grids. This proposed approach has been evaluated in PNNL's Electricity Infrastructure Operations Center (EIOC). It is expected to provide a quick and efficient solution to massive contingency selection problems to help power grid operators to identify and mitigate potential widespread cascading power grid failures in real time.
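The graph metric at the heart of the method, betweenness centrality, can be computed sequentially with Brandes' algorithm; this pure-Python sketch ranks nodes of a toy network rather than a power-grid model and omits the paper's parallelization entirely.

```python
# Brandes' algorithm for betweenness centrality on an unweighted graph.
from collections import deque

def betweenness(graph):
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, recording shortest-path counts and predecessors.
        distance = {s: 0}
        sigma = dict.fromkeys(graph, 0.0); sigma[s] = 1.0
        preds = {v: [] for v in graph}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies along the BFS order.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc  # for undirected graphs each pair is counted in both directions

# Two clusters joined through node "x": x carries all cross-cluster paths.
g = {"a": {"b", "x"}, "b": {"a", "x"}, "x": {"a", "b", "c", "d"},
     "c": {"d", "x"}, "d": {"c", "x"}}
bc = betweenness(g)
print(max(bc, key=bc.get))
```

In the contingency-selection setting, components with the highest centrality are the ones whose failure would disrupt the most shortest paths, making them natural high-impact candidates.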

A Multi-Level Middle-Out Cross-Zooming Approach for Large Graph Analytics

Wong PC, PS Mackey, KA Cook, RM Rohrer, HP Foote, and MA Whiting. 2009. A Multi-Level Middle-Out Cross-Zooming Approach for Large Graph Analytics. In IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), ed. J Stasko and JJ van Wijk, pp. 147-154. IEEE, Piscataway, NJ.

Abstract

This paper presents a working graph analytics model that embraces the strengths of the traditional top-down and bottom-up approaches with a resilient crossover concept to exploit the vast middle-ground information overlooked by the two extreme analytical approaches. Our graph analytics model is developed in collaboration with researchers and users, who carefully studied the functional requirements that reflect the critical thinking and interaction pattern of a real-life intelligence analyst. To evaluate the model, we implement a system prototype, known as GreenHornet, which allows our analysts to test the theory in practice, identify the technological and usage-related gaps in the model, and then adapt the new technology in their work space. The paper describes the implementation of GreenHornet and compares its strengths and weaknesses against the other prevailing models and tools.

A Novel Visualization Technique for Electric Power Grid Analytics

Wong PC, K Schneider, P Mackey, H Foote, G Chin, R Guttromson, and J Thomas. 2009. "A Novel Visualization Technique for Electric Power Grid Analytics." IEEE Transactions on Visualization and Computer Graphics 15(3):410-423.

Abstract

The application of information visualization holds tremendous promise for the electric power industry, but its potential has so far not been sufficiently exploited by the visualization community. Prior work on visualizing electric power systems has been limited to depicting raw or processed information on top of a geographic layout. Little effort has been devoted to visualizing the physics of the power grids, which ultimately determines the condition and stability of the electricity infrastructure. Based on this assessment, we developed a novel visualization system prototype, GreenGrid, to explore the planning and monitoring of the North American Electricity Infrastructure. The paper discusses the rationale underlying the GreenGrid design, describes its implementation and performance details, and assesses its strengths and weaknesses against the current geographic-based power grid visualization. We also present a case study using GreenGrid to analyze the information collected moments before the last major electric blackout in the Western United States and Canada, and a usability study to evaluate the practical significance of our design in simulated real-life situations. Our result indicates that many of the disturbance characteristics can be readily identified with the proper form of visualization.

Describing Story Evolution from Dynamic Information Streams

Rose SJ, RS Butner, WE Cowley, ML Gregory, and J Walker. 2009. Describing Story Evolution from Dynamic Information Streams. In IEEE Symposium on Visual Analytics Science and Technology (IEEE VAST 2009), Oct. 12-13, 2009, Atlantic City, NJ, pp. 99-106. IEEE, Piscataway, NJ.

Abstract

Sources of streaming information, such as news syndicates, publish information continuously. Information portals and news aggregators list the latest information from around the world, enabling information consumers to easily identify events in the past 24 hours. The volume and velocity of these streams cause information from prior days to vanish quickly, despite its utility in providing an informative context for interpreting new information. Few capabilities exist to support an individual attempting to identify or understand trends and changes in streaming information over time. The burden of retaining prior information and integrating it with the new is left to the skills, determination, and discipline of each individual. In this paper we present a visual analytics system for linking essential content from information streams over time into dynamic stories that develop and change over multiple days. We describe particular challenges in the analysis of streaming information and explore visual representations for showing story change and evolution over time.
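One minimal way to picture the story-linking step: thread each incoming document onto the existing story whose accumulated vocabulary it overlaps most, starting a new story when similarity falls below a cutoff. The system's actual linking features are richer; the cutoff, data, and function names here are invented.

```python
# Toy story threading: attach each document to the best-overlapping story
# (Jaccard similarity of term sets) or start a new story.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def thread_stories(docs, cutoff=0.2):
    stories = []                      # each story: {"terms": set, "docs": list}
    for doc in docs:
        terms = set(doc.lower().split())
        best = max(stories, key=lambda s: jaccard(terms, s["terms"]), default=None)
        if best and jaccard(terms, best["terms"]) >= cutoff:
            best["docs"].append(doc)
            best["terms"] |= terms    # the story's vocabulary evolves
        else:
            stories.append({"terms": terms, "docs": [doc]})
    return stories

stream = [
    "storm hits coast overnight",
    "coast storm damage assessed",
    "senate passes budget bill",
    "budget bill heads to house",
]
stories = thread_stories(stream)
print(len(stories))
```

Because each story's term set grows as documents join it, a story can drift over multiple days, which is exactly the evolution behavior the visualization is meant to expose.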

Two-stage Framework for Visualization of Clustered High Dimensional Data

Choo J, SJ Bohn, and H Park. 2009. Two-stage Framework for Visualization of Clustered High Dimensional Data. In IEEE Symposium on Visual Analytics Science and Technology (IEEE VAST). Pacific Northwest National Laboratory, Richland, WA. [Unpublished]

Abstract

In this paper, we discuss 2D visualization methods for high-dimensional data that are clustered and for which associated label information is available. We propose a two-stage framework for visualization of such data based on dimension reduction methods. In the first stage, we obtain the reduced-dimensional data by a supervised dimension reduction method, such as linear discriminant analysis, that preserves the original cluster structure in terms of its criterion. The resulting optimal reduced dimension depends on the optimization criteria and is often larger than 2. In the second stage, in order to further reduce the dimension to 2 for visualization purposes, we apply another dimension reduction method, such as principal component analysis, that minimizes the distortion in the lower-dimensional representation of the data obtained in the first stage. Using this framework, we propose several two-stage methods, and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.

A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs

Wong PC, HP Foote, PS Mackey, G Chin, Jr, HJ Sofia, and JJ Thomas. 2008. "A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs." Information Visualization 7:105-117.

Abstract

We present an information visualization tool, known as GreenMax, to visually explore large small-world graphs with up to a million graph nodes on a desktop computer. A major motivation for scanning a small-world graph in such a dynamic fashion is the demanding goal of identifying not just the well-known features but also the unknown–known and unknown–unknown features of the graph. GreenMax uses a highly effective multilevel graph drawing approach to pre-process a large graph by generating a hierarchy of increasingly coarse layouts that later support the dynamic zooming of the graph. This paper describes the graph visualization challenges, elaborates our solution, and evaluates the contributions of GreenMax in the larger context of visual analytics on large small-world graphs. We report the results of two case studies using GreenMax and the results support our claim that we can use GreenMax to locate unexpected features or structures behind a graph.

BioGraphE: High-performance bionetwork analysis using the Biological Graph Environment

Chin G, Jr, D Chavarría-Miranda, GC Nakamura, and HJ Sofia. 2008. "BioGraphE: High-performance bionetwork analysis using the Biological Graph Environment." BMC Bioinformatics.

Abstract

We introduce a computational framework for graph analysis called the Biological Graph Environment (BioGraphE), which provides a general, scalable integration platform for connecting graph problems in biology to optimized computational solvers and high-performance systems. This framework enables biology researchers and computational scientists to identify and deploy network analysis applications and to easily connect them to efficient and powerful computational software and hardware that are specifically designed and tuned to solve complex graph problems. In our particular application of BioGraphE to support network analysis in genome biology, we investigate the use of a Boolean satisfiability solver known as Survey Propagation as a core computational solver executing on standard high-performance parallel systems, as well as multi- threaded architectures.

Scalable Visual Analytics of Massive Textual Datasets

Krishnan M, SJ Bohn, WE Cowley, VL Crow, and J Nieplocha. 2007. Scalable Visual Analytics of Massive Textual Datasets. In IEEE International Parallel & Distributed Processing Symposium. Long Beach, CA, March 26-30, 2007.

Abstract

This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

Visual Analysis of Weblog Content

Gregory ML, DA Payne, D McColgin, NO Cramer, and DV Love. 2007. "Visual Analysis of Weblog Content." In International Conference on Weblogs and Social Media (ICWSM '07), pp. 227-230. Boulder, CO, March 26-28, 2007.

Abstract

In recent years, one of the advances of the World Wide Web is social media and one of the fastest growing aspects of social media is the blogosphere. Blogs make content creation easy and are highly accessible through web pages and syndication. With their growing influence, a need has arisen to be able to monitor the opinions and insight revealed within their content. This paper describes a technical approach for analyzing the content of blog data using a visual analytic tool, IN-SPIRE, developed by Pacific Northwest National Laboratory. We will describe both how an analyst can explore blog data with IN-SPIRE and how the tool could be modified in the future to handle the specific nuances of analyzing blog data.

Diverse Information Integration and Visualization

Havre SL, A Shah, C Posse, and BM Webb-Robertson. 2006. "Diverse Information Integration and Visualization." In Visualization and Data Analysis 2006 (EI10). SPIE, The International Society for Optical Engineering, San Jose, CA.

Abstract

This paper presents and explores a technique for visually integrating and exploring diverse information. Society produces, collects, and processes ever larger and diverse data including semi- and un-structured text, as well as transaction, communication, and scientific data. It is no longer sufficient to analyze one type of data or information in isolation. Users need to explore their data/information in the context of related information to discover often hidden, but meaningful, complex relationships. Our approach visualizes multiple, like entities across multiple dimensions where each dimension is a partitioning of the entities. The partitioning may be based on inherent or assigned attributes of the entities (or entity data) such as meta-data or prior knowledge captured in annotations. The partitioning may also be derived from entity data. For example, clustering, or unsupervised classification, can be applied to arrays of multidimensional entity data to partition the entities into groups of similar entities, or clusters. The same entities may be clustered on data from different experiment types or processing approaches. This reduction of diverse data/information on an entity to a series of partitions, or discrete (and unit-less) categories, allows the user to view the entities across a variety of data without concern for data types and units. Parallel coordinates visualize entity data across multiple dimensions of typically continuous attributes. We adapt parallel coordinates for dimensions with discrete attributes (partitions) to allow the comparison of entity partition patterns for identifying trends and outlier entities. We illustrate this approach through a prototype, Juxter (short for Juxtaposer).
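The reduction of entities to partition patterns can be illustrated directly: once each entity is a tuple of category labels, one per dimension, outlier entities are simply those whose pattern has little support across the collection. The entity names and labels below are invented for illustration.

```python
# Sketch of the Juxter-style reduction: each entity becomes a tuple of
# discrete partition labels, and entities with rare label patterns are
# flagged as outliers.
from collections import Counter

def outliers(partitions, max_support=1):
    """partitions: {entity: (label_dim1, label_dim2, ...)}"""
    support = Counter(partitions.values())
    return sorted(e for e, pattern in partitions.items()
                  if support[pattern] <= max_support)

genes = {
    "geneA": ("cluster1", "up", "membrane"),
    "geneB": ("cluster1", "up", "membrane"),
    "geneC": ("cluster1", "up", "membrane"),
    "geneD": ("cluster2", "down", "nucleus"),
    "geneE": ("cluster2", "down", "nucleus"),
    "geneF": ("cluster1", "down", "nucleus"),   # unusual combination
}
print(outliers(genes))
```

In the parallel-coordinates view, such an entity shows up as a polyline that crosses between the bundles the other entities follow.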

Generating Graphs for Visual Analytics through Interactive Sketching

Wong PC, HP Foote, PS Mackey, KA Perrine, and G Chin, Jr. 2006. "Generating Graphs for Visual Analytics through Interactive Sketching." IEEE Transactions on Visualization and Computer Graphics 12(6).

Abstract

We introduce an interactive graph generator, GreenSketch, designed to facilitate the creation of descriptive graphs required for different visual analytics tasks. The human-centric design approach of GreenSketch enables users to master the creation process without specific training or prior knowledge of graph model theory. The customized user interface encourages users to gain insight into the connection between the compact matrix representation and the topology of a graph layout as they sketch their graphs. Both the human-enforced and machine-generated randomness supported by GreenSketch provide the flexibility needed to address the uncertainty factor in many analytical tasks. This paper describes over two dozen examples that cover a wide variety of graph creations, from a single line of nodes to a real-life small-world network describing a snapshot of telephone connections. While the discussion focuses mainly on the design of GreenSketch, we include a case study that applies the technology in a visual analytics environment and a usability study that evaluates the strengths and weaknesses of our design approach.
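The connection the abstract draws between a compact matrix representation and graph topology can be made concrete with a toy example. This is not GreenSketch's actual data model, only a minimal sketch of the matrix-to-topology mapping it teaches:

```python
# Illustrative adjacency matrix for a tiny undirected graph: entry m[i][j] = 1
# means nodes i and j are connected. The matrix below encodes a 3-node path.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

def matrix_to_edges(m):
    """List the undirected edges (i, j), i < j, implied by an adjacency matrix."""
    return [(i, j)
            for i in range(len(m))
            for j in range(i + 1, len(m))
            if m[i][j]]

print(matrix_to_edges(adjacency))  # → [(0, 1), (1, 2)] — a path 0-1-2
```

Reading the sketch in the other direction — editing the drawn layout and watching the matrix update — is the kind of round trip the GreenSketch interface is built around.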

Graph Signatures for Visual Analytics

Wong PC, HP Foote, G Chin, Jr, PS Mackey, and KA Perrine. 2006. "Graph Signatures for Visual Analytics." IEEE Transactions on Visualization and Computer Graphics 12(6).

Abstract

We present a visual analytics technique to explore graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated use of brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization but also the advantages and strengths of our signature-guided approach presented in the paper.
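The first half of the pipeline — turning each node's local topology into a multidimensional signature vector — can be sketched in a few lines. The feature choice here (degree, triangle count, mean neighbour degree) and the toy graph are assumptions for illustration; the paper's actual signature definition and the scaling-based projection onto a scatterplot are not reproduced:

```python
from itertools import combinations

# Toy undirected graph as an adjacency dict (illustrative data only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def signature(g, node):
    """Build a small local-topology vector for one node:
    (degree, triangles through the node, mean neighbour degree)."""
    nbrs = g[node]
    degree = len(nbrs)
    triangles = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
    mean_nbr_deg = sum(len(g[n]) for n in nbrs) / degree if degree else 0.0
    return (degree, triangles, mean_nbr_deg)

sigs = {n: signature(graph, n) for n in graph}
print(sigs["a"], sigs["e"])  # → (2, 1, 2.5) (1, 0, 2.0)
```

Nodes with similar structural roles (here `a` and `b`, which sit in the same triangle) get identical signatures, so after projection they would land close together on the scatterplot — the similarity that brushing and linking against the graph view then lets an analyst interpret.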

Have Green - A Visual Analytics Framework for Large Semantic Graphs

Wong PC, G Chin, Jr, HP Foote, PS Mackey, and JJ Thomas. 2006. "Have Green - A Visual Analytics Framework for Large Semantic Graphs." In IEEE Symposium on Visual Analytics Science and Technology, pp 67-74. Baltimore, Maryland, October 31-November 2, 2006.

Abstract

A semantic graph is a network of heterogeneous nodes and links annotated with a domain ontology. In intelligence analysis, investigators use semantic graphs to organize concepts and relationships as graph nodes and links in hopes of discovering key trends, patterns, and insights. However, as new information continues to arrive from a multitude of sources, the size and complexity of the semantic graphs will soon overwhelm an investigator's cognitive capacity to carry out significant analyses. We introduce a powerful visual analytics framework designed to enhance investigators' natural analytical capabilities to comprehend and analyze large semantic graphs. The paper describes the overall framework design, presents major development accomplishments to date, and discusses future directions of a new visual analytics system known as Have Green.
