SECTION II
Technology and Systems

CHAPTER 4
Information Visualization

Bin Zhu
Boston University

Hsinchun Chen
The University of Arizona

Introduction

Advanced technology has resulted in the generation of about one million terabytes of information every year. Ninety-nine percent of this is available in digital format (Keim, 2001). More information will be generated in the next three years than was created during all of previous human history (Keim, 2001). Collecting information is no longer a problem, but extracting value from information collections has become progressively more difficult. Various search engines have been developed to make it easier to locate information of interest, but these work well only for a person who has a specific goal and who understands what information is stored and how it is stored. This usually is not the case.

Visualization was commonly thought of in terms of representing human mental processes (MacEachren, 1991; Miller, 1984). The concept is now associated with the amplification of these mental processes (Card, Mackinlay, & Shneiderman, 1999). Human eyes can process visual cues rapidly, whereas advanced information analysis techniques transform the computer into a powerful means of managing digitized information. Visualization offers a link between these two potent systems, the human eye and the computer (Gershon, Eick, & Card, 1998), helping to identify patterns and to extract insights from large amounts of information. The identification of patterns is important because it may lead to a scientific discovery, an interpretation of clues to solve a crime, the prediction of catastrophic weather, a successful financial investment, or a better understanding of human behavior in a computer-mediated environment.
Visualization technology shows considerable promise for increasing the value of large-scale collections of information, as evidenced by several commercial applications of TreeMap (e.g., http://www.smartmoney.com) and Hyperbolic tree (e.g., http://www.inxight.com) to visualize large-scale hierarchical structures. Although the proliferation of visualization technologies dates from the 1990s when sophisticated hardware and software made increasingly faster generation of graphical objects possible, the role of visual aids in facilitating the construction of mental images has a long history.

Annual Review of Information Science and Technology

Visualization has been used to communicate ideas, to monitor trends implicit in data, and to explore large volumes of data for hypothesis generation. Imagine traveling to a strange place without a map, having to memorize physical and chemical properties of an element without Mendeleyev’s periodic table, trying to understand the stock market without statistical diagrams, or browsing a collection of documents without interactive visual aids. A collection of information can lose its value simply because of the effort required for exhaustive exploration. Such frustrations can be overcome by visualization.

Visualization can be classified as scientific visualization, software visualization, or information visualization. Although the data differ, the underlying techniques have much in common. They use the same elements (visual cues) and follow the same rules of combining visual cues to deliver patterns. They all involve understanding human perception (Encarnacao, Foley, Bryson, & Feiner, 1994) and require domain knowledge (Tufte, 1990). Because most decisions are based on unstructured information, such as text documents, Web pages, or e-mail messages, this chapter focuses on the visualization of unstructured textual documents.
The chapter reviews information visualization techniques developed over the last decade and examines how they have been applied in different domains. The first section provides the background by describing visualization history and giving overviews of scientific, software, and information visualization as well as the perceptual aspects of visualization. The next section assesses important visualization techniques that convert abstract information into visual objects and facilitate navigation through displays on a computer screen. It also explores information analysis algorithms that can be applied to identify or extract salient visualizable structures from collections of information. Information visualization systems that integrate different types of technologies to address problems in different domains are then surveyed, followed by a survey and critique of visualization system evaluation studies. The chapter concludes with a summary and identification of future research directions.

Overview of Visualization

History and Background

Although (computer-based) visualization is a relatively new research area, visualization has a long history. For instance, the first known map was created in the 12th century (Tegarden, 1999), and multidimensional representations appeared in the 19th century (Tufte, 1983). Bertin (1967) identified basic elements of diagrams in 1967, and Tufte (1983) published his theory regarding maximizing the density of useful information in 1983. Both Bertin’s and Tufte’s theories have had substantial impact on subsequent information visualization. Most early visualization research focused on statistical graphics (Card et al., 1999) until the data explosion of the 1980s when supercomputers were able to run complex simulation models and advanced scientific sensors also generated huge quantities of data (Nielson, 1991).
Researchers from earth science, physics, chemistry, biology, and computer science turned to visualization for help in analyzing copious data and identifying patterns. The National Science Foundation (NSF) launched its “scientific visualization” initiative in 1985 (McCormick, Defanti, & Brown, 1987) and the Institute of Electrical and Electronics Engineers (IEEE) held its first visualization conference in 1990. At the same time, visualization technologies were being applied in many nonscientific contexts, including business, digital libraries, human behavior, and the Internet. As the application domains expanded and computer hardware and software became more powerful and affordable, visualization techniques continued to improve.

Since 1990, a vast amount of nonscientific data has been generated as a consequence of easy information creation and the emergence of the Internet. The term “information visualization” was first used in Robertson, Card, and Mackinlay (1989) to denote the presentation of abstract information through a visual interface. Early information visualization systems emphasized interactivity and animation (Robertson, Card, & Mackinlay, 1993), interfaces to support dynamic queries (Shneiderman, 1994), and various layout algorithms on a computer screen (Lamping, Rao, & Pirolli, 1995). Later visualization systems presenting the subject hierarchy of the Internet (H. Chen, Houston, Sewell, & Schatz, 1998), summarizing the contents of a document (Hearst, 1995), describing online behaviors (Donath, 2002; Zhu & Chen, 2001), displaying Web site usage patterns (Eick, 2001), and visualizing the structures of a knowledge domain (C. Chen & Paul, 2001) have been stimulated by the networked and virtual nature of human society resulting from the adoption of advanced technologies.

Information visualization is unquestionably an interdisciplinary research field.
It integrates the understanding of domain knowledge and human visual perception with computer graphics techniques. It also needs the support of information analysis algorithms (H. Chen et al., 1998). After a decade of focusing on system development, the lack of thorough, summative approaches to evaluating existing visualization systems has become increasingly apparent (C. Chen & Czerwinski, 2000). Special issues of the International Journal of Human-Computer Studies have demonstrated the level of effort being extended to tackle this issue (C. Chen & Czerwinski, 2000). We believe more disciplines will contribute to visualization research as the technology moves forward and application domains expand.

A Theoretical Foundation for Visualization

Visualization research is important because the human eye can process many visual cues simultaneously. For example, humans can detect a single dark pixel in a 500 x 500 array of white pixels in less than a second. The display can be replaced every second by another, enabling a search of 15 million pixels in a minute (Ware, 2000). Also, people have a truly remarkable ability to recall pictorial images. In one study, Standing, Conezio, and Haber (1970) showed subjects 2,560 pictures, each for 10 seconds over seven hours, in a four-day period. Afterward, subjects were asked to classify pictures presented at a rate of 16 pictures per second and achieved better than 90 percent accuracy.

People identify patterns through visual aids but may fail to do so when looking at tables and numbers. However, the human visual system identifies patterns according to its own rules. Because patterns will be invisible if they are not presented in certain ways, understanding visual perception can be helpful in the design of visualization systems. Ware (2000) surveyed perception studies related to visualization.
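The pixel-search figure attributed to Ware (2000) above can be checked with simple arithmetic. The short derivation below only restates the numbers given in the text (a 500 x 500 display, one display per second) and assumes nothing further:

```latex
\[
500 \times 500 = 250{,}000 \ \text{pixels per display}, \qquad
250{,}000 \ \frac{\text{pixels}}{\text{s}} \times 60 \ \frac{\text{s}}{\text{min}}
  = 15{,}000{,}000 \ \frac{\text{pixels}}{\text{min}}.
\]
```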
Believing that the best visualization is one that can help problem solving, Ware (2000) sketched a model of human memory by synthesizing studies by Card, Moran, and Newell (1983); Anderson, Matessa, and Lebiere (1997); Kieras and Meyer (1997); and Strothotte and Strothotte (1997). According to Ware (2000), the human memory structure contains iconic, working, and long-term memories, each of which can be enhanced by visualization in a different way.

Iconic memory is the memory buffer where pre-attentive processing operates. This involves a massive number of parallel processes that extract diverse visual cues for every visual point on the interface. Incoming visual information stays in iconic memory for less than a second before part of it is “read out” into working memory. Pre-attentive processing is important to visualization design because certain visual patterns can be detected at this stage without having to go through the cognition process. The theory of visual processing channels and their independent status is fundamental to understanding pre-attentive processing (Ware, 2000). Many studies work with this theory to help make an object visually appealing to viewers. Visual cues such as color and proximity are independent of each other because they are processed in different visual channels. As such, they can be employed independently to convey different attributes. This theory serves as the theoretical foundation for glyph representation (Chernoff, 1973). Other visual cues such as color and luminance can interfere with each other because their visual channels overlap. Gestalt laws (Koffka, 1935) suggest several ways to combine independent and related visual cues to deliver perceivable static patterns. Designing effective visualizations with computer animation also relies on understanding perception of motion patterns.
Working memory integrates information extracted from iconic memory with information loaded from long-term memory for problem solving. Abstract visual patterns perceived by pre-attentive processing are mapped into patterns of the information space at this stage. Working memory holds information for pending tasks; people’s attention decides the space allocated to a task. Similar to the RAM (random access memory) of a computer, input and intermediate results of an ongoing operation are stored, but discarded once the task is accomplished. Visualization can augment the working memory in two ways, memory extension and visual cognition extension (Ware, 2000). The high bandwidth of visual input enables working memory to load external information at the same speed as loading internal memory (Card et al., 1983; Kieras & Meyer, 1997). Visualization thus can serve as an external memory, saving space in the working memory. In addition to memory extension, visualization can facilitate internal computation. Because it makes solutions perceivable (Zhang, 1997), visualization reduces the cognitive load of mental reasoning and mental image construction that is necessary for certain tasks. Interaction with a visual interface can enhance such cognition extension. The best example is a computer aided design (CAD) system’s helping an engineer design a product without having to build it.

Long-term memory stores information associated with a lifetime’s experiences. It is not just a repository of information; it is a network of linked concepts (Collins & Loftus, 1975; Yufik & Sheridan, 1996). The way this network is built determines whether certain ideas will be easier to recall than others. A sketch of links between concepts is believed to be an effective learning aid for students (Jonassen, Beissner, & Yacci, 1993). Using proximity to represent relationships among concepts in constructing a concept map has a long history in psychology (Shepard, 1962).
Visualization systems such as Spatial Paradigm for Information Retrieval and Exploration (SPIRE) (Wise, Thomas, Pennock, Lantrip, Pottier, Schur, et al., 1995) and ET Map (H. Chen et al., 1998) also use proximity to indicate semantic relationships among concepts. Those systems generate a concept map from a large collection of text documents, helping users better understand the collection depicted.

In summary, visualization augments iconic memory, working memory, and long-term memory in different ways. Psychologists and neuroscientists have conducted many related studies; a complete survey is beyond the scope of this chapter. Interested readers are referred to Ware (2000). Most perception studies can be helpful to the design of visualization systems, but converting their results to design principles that can be applied immediately remains a challenge.

Visualization Classification: Application Focus

Visualization is commonly classified based on application focus. Categories usually include scientific visualization, software visualization, and information visualization. These categories are not mutually exclusive and have fuzzy boundaries. For instance, scientific visualization often involves visualizing the multidimensional attribute space of a physical object; this overlaps with information visualization, which delivers patterns embedded in large-scale information collections. Seesoft (Eick, Steffen, & Sumner, 1992) is a system monitoring the change of software code. It has been discussed in books about both information visualization (Card et al., 1999) and software visualization (Stasko, Domingue, Brown, & Price, 1998). The abstract nature of input leads Card et al. (1999) to regard both software visualization and information visualization as information visualization.
Scientific Visualization

Scientific visualization helps scientists and engineers more efficiently understand physical phenomena embedded in large volumes of data (Nielson, 1991). The data may come from complex simulation models or from sensors such as satellites, medical scans, or telescopes. What distinguishes scientific visualization is the fact that it is always about physical objects. This condition provides natural counterparts such as the earth, the human body, the molecule, DNA, or an airplane to which the information can be mapped. Developing mathematical models to describe physical objects plays an essential role in mapping information. Colors or other visual cues are usually added to a physical object to describe different attributes.

Isosurfaces, volume rendering, and glyphs are commonly used techniques for the description of attributes in scientific visualization. Isosurfaces depict the distribution of certain attributes. One example is the use of color contours to convey temperature distribution over a map. Volume rendering allows viewers to see the entire volume of 3-D data in a single image (Nielson, 1991). The 3-D data may come from medical magnetic resonance imaging (MRI), CAD, or remote sensing. Interaction between a visual display and its viewers directly affects the effectiveness of a volume-rendering visualization. Glyphs provide a way to display multiple attributes through combinations of various visual cues (Chernoff, 1973). Scientific visualization typically uses glyphs to describe flow information. A commonly used glyph is an arrow (Fayyad, Grinstein, & Wierse, 2002). A map with arrows representing magnitude and direction of wind at a place suggests the movement of air over a geographical area.

In addition to displaying distributions of attributes over a physical object, scientists and engineers also need visual aids to describe relationships among abstract attributes.
Techniques used to visualize some of these attributes overlap with those used in information visualization and are discussed in the next subsection.

Software Visualization and Information Visualization

Unlike scientific visualization, software visualization and information visualization usually do not have inherent geometries by which to map information. They share approaches to representing abstract information on a computer screen. For instance, the TreeMap representation has been used to represent a hierarchical relationship in software (Jeffery, 1998), financial data (http://www.smartmoney.com/marketmap), and Usenet messages (Smith & Fiore, 2001). However, each visualization type has its own application focus.

Software visualization helps people understand and use computer software effectively (Stasko et al., 1998). Generally two types of software visualization are used, program visualization and algorithm animation. Program visualization, also positioned as a subfield of software engineering, helps programmers manage complex software (Baecker & Price, 1998). For instance, the Microsoft Windows 95 system has ten million lines of code, for which maintenance can be expensive. Program visualization tackles this problem by visualizing the source code (Baecker & Marcus, 1990), the data structure employed, changes made to the software (Eick et al., 1992), and run-time performance. Program visualization can be an effective tool for software maintenance, understanding, optimization, and debugging. Algorithm animation, on the other hand, is mainly used for education. Starting with the movie Sorting Out Sorting (Baecker, 1981), various algorithm animation systems have been developed to motivate and support the learning of computational algorithms.

Information visualization helps users identify patterns, correlations, or clusters. The information visualized can be structured or unstructured. Structured information, usually in numerical format, has well-defined variables.
Examples include business transaction data, Internet traffic data, and Web usage data. Visualization of this type focuses on graphical representation to reveal patterns. Early on, standard, static graphics such as line graphs, scatter plots, bar charts, or pie charts were used to enhance understanding of stored data. Widely used commercial tools including Spotfire (http://www.spotfire.com), SAS/GRAPH (http://www.sas.com/technologiesh~query-reporting/graph), SPSS (http://spss.com), ILOG (http://www.ilog.com), and Cognos (http://www.cognos.com) offer interactive visualizations to help users gain value from structured information.

The recent integration of this type of visualization with various data mining techniques has attracted attention, as huge volumes of data are routinely being generated and stored in databases. Computerized visualizations are vehicles for the delivery of patterns or structures identified by data mining algorithms. Without visualization, such patterns or structures might be too complex to be understandable (Fayyad et al., 2002). Interaction between the visualization system and the user also permits the inclusion of human expertise or feedback in data mining, leading to more effective data exploration. At the same time, data mining algorithms serve as preprocessors, finding appropriate perspectives and dimensions for visualization. Stronger interaction between visualization and data mining algorithms can be found in the systems of Thearling, Becker, and Decoste (2002) and Johnston (2002) where data mining models are visualized to help users understand back-end algorithms. Such interaction is usually employed to facilitate computational steering, a process defined as the ongoing intervention of users in the execution of an otherwise independent computational process (Parker, Johnson, & Beazley, 1997).
Incorporating users’ skill and expertise, the computational steering approach may improve the efficiency and performance of a data-mining tool.

Unstructured information, on the other hand, usually does not have well-defined variables. Examples of unstructured information include a collection of office documents, a collection of Web sites, or an e-mail archive. Unlike the visualization of structured information, this type of application often needs to identify variables (e.g., titles, locations, subject keywords) and to construct visualizable structures before the graphical representation. Several commercial visualization systems, including Vantage Point (http://www.thevantagepoint.com), SemioMap (http://www.entrieva.com/entrieva), and Knowledgist (http://www.inventionmachine.com), have applied different information analysis technologies to understand the semantics of unstructured information.

In summary, software visualization and information visualization transform data and map information into a visual space differently. But both use similar metaphors to represent abstract information and they adopt similar techniques for user-computer interactions. The next section provides detailed descriptions of those representation and interaction approaches.

A Framework for Information Visualization Technologies

Previous studies have constructed various taxonomies to categorize visualization research from different perspectives. Chuah and Roth (1996) list the tasks of information visualization, and Bertin (1967) and Mackinlay (1986) describe the characteristics of basic visual variables and their applications to different data types. Card and Mackinlay (1997) expand the research of Bertin (1967) and Mackinlay (1986) by constructing a data type-based taxonomy.
Based on the features of data domains, the taxonomy divides the visualization field into several categories: scientific visualization, geographic information systems (GIS), multidimensional tables, information landscapes and spaces, nodes and links, trees, and text.

Although a taxonomy based on data type may help the implementer select appropriate visualization technologies, the taxonomy Chi (2000) has proposed indicates how to apply these technologies. Chi (2000) breaks the visualization data pipeline into four distinct stages: value, analytic abstraction, visual abstraction, and view. Visualization techniques are thus classified based on the data stage at which they are applied. Chi (2000) contends that there are three types of techniques for transforming data from one stage to the next and four types of technology operating within each stage. The technologies applied at the early data stages extract or construct visualizable structures; those applied at later stages convert these structures into visual metaphors and provide appropriate user-interface interactions. Chi (2000) surveys thirty visualization systems and lists the technologies they apply at different data stages.

Because the development of a visualization system usually integrates several techniques, it may be helpful to provide a framework of visualization technologies based on their functionalities. Shneiderman (1996) identified two aspects of visualization technology that can be directly applied to a given structure. One focuses on mapping abstract information to a visual representation and the other provides user-interface interactions for effective navigation over displays on a screen. To fulfill users’ requirements, visualization systems usually combine techniques from these two aspects. However, as indicated by C.
Chen (1999), when it comes to visualizing unstructured or high-dimensional information, another set of technologies is needed to create structures that characterize the data set. Along with representation and user-interface interaction, information analysis technology also helps support a visualization system. It serves as a preprocessor, deciding what is to be displayed on a computer screen. Such automatic preprocessing becomes especially critical when manual preprocessing is not possible.

The remainder of this section reviews the three research dimensions that support the development of an information visualization system: information representation, user-interface interaction, and information analysis. The framework described in this section is consistent with Chi’s (2000) taxonomy, but focuses more on the characteristics of technologies available in each dimension.

Information Representation

Shneiderman (1996) proposed seven types of representation methods: the 1-D, 2-D, 3-D, multidimensional, tree, network, and temporal approaches. We use this framework to review related research.

• The 1-D approach represents abstract information as one-dimensional visual objects and displays them on the screen in a linear (Eick et al., 1992; Hearst, 1995) or a circular (Salton, Allan, Buckley, & Singhal, 1995) manner. Representation in 1-D has been used either to display the contents of a single document (Hearst, 1995; Salton et al., 1995) or to provide an overview of a document collection (Eick et al., 1992). Colors usually represent some attributes of each visual object. For instance, colors indicate document type in the SeeSoft system (Eick et al., 1992) and depict the location in a document of search terms in TileBars (Hearst, 1995). A second axis may also play a role, presenting some characteristic of each visual object.
One example is the SeeSoft system that piles up documents on the x-axis and uses the y-axis to visualize the number of lines in each document. Figure 4.1 displays an interface from the TileBars system that shows the occurrence of search terms in documents. The darkness of each tile indicates the frequency of a search term in a document.

Figure 4.1 TileBars uses a 1-D approach to show term-document relevance (http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/mahPfg4.gif, © 1995 ACM, Inc.). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

• A 2-D approach represents information as two-dimensional visual objects. Visualization systems based on 2-D output of a self-organizing map (SOM) (Kohonen, 1995) belong to this category. Such systems display categories created over a large collection of textual documents, with the layout of each category based on its location in the two-dimensional area of the SOM. Spatial proximity on the interface represents the semantic proximity of the categories created. The challenge in this approach is to help users deal with the large number of categories that will have been created for the mass textual data.

• A 3-D approach represents information as three-dimensional visual objects. One example is the WebBook system (Card, Robertson, & York, 1996) that folds Web pages into three-dimensional books. Realistic metaphors such as rooms (Card et al., 1996), bookshelves (Card et al., 1996), or buildings (Andrews, 1995) are employed to depict abstract information. Visualization systems using a 3-D version of a tree or network representation also belong to this category. One example is the 3-D hyperbolic tree created by Munzner (2000) to visualize large-scale hierarchical relationships.
Figures 4.2 and 4.3 show screenshots of WebBook and WebForager, respectively, where the book metaphor is applied to organize Web pages from the same Web site and the WebForager provides a workspace to place books in use.

Figure 4.2 The WebBook (http://acm.org/sigchi/chi96/proceedings/papers/Card/skcltxt.html, © 1996 ACM, Inc.). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Figure 4.3 The WebForager (http://www.acm.org/sigchi/chi96/proceedings/papers/Card/skcltxt.html, © 1996 ACM, Inc.). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

• The multidimensional approach represents information as multidimensional objects and projects them into a three-dimensional or a two-dimensional space. This approach often represents textual documents as a set of key terms that identify the theme of a textual collection. A dimensionality reduction algorithm, such as multidimensional scaling (MDS), hierarchical clustering, K-means algorithms, or principal components analysis, is used to project document clusters or themes that have been sorted into a two-dimensional or three-dimensional space. The SPIRE system presented in Wise et al. (1995) and the VxInsight system (Boyack, Wylie, & Davidson, 2002) belong to this category. Figures 4.4 and 4.5 display two types of visualization developed for the SPIRE system. The Galaxy (Figure 4.4) clusters 567,437 abstracts of cancer literature based on the semantic similarity; the Themeview (Figure 4.5) visualizes relationships among topics of a document collection. Glyph representation, another type of multidimensional representation, uses graphical objects or symbols to represent data through visual parameters that are spatial (positions x or y), retinal (color and size), or temporal (Chernoff, 1973).
It has been used in various social visualization techniques (Donath, 2002) to describe human behavior during computer-mediated communication (CMC).

Figure 4.4 Galaxy visualization of text documents (http://www.pnl.gov/infoviz/galLcancer800.gif, reprinted with permission). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Figure 4.5 ThemeView. The height of a peak indicates the strength of a given topic in the collection of documents (http://www.pnl.gov/infoviz/theme~cnn800.gif, reprinted with permission). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

The tree approach is often used to represent hierarchical relationships. The most common example is an indented text list. Other tree structure systems include the Tree-Map (Johnson & Shneiderman, 1991), the Cone Tree system (Robertson, Mackinlay, & Card, 1991), and the Hyperbolic Tree (Lamping et al., 1995). One crucial challenge to this approach is that the number of nodes grows exponentially as the number of tree levels increases. As a consequence, different layout algorithms have been applied. For instance, the Tree-Map (Johnson & Shneiderman, 1991) allocates space according to attributes of nodes, while the Cone Tree (Robertson et al., 1991) takes advantage of the 3-D visual structure to pack more nodes on the screen. Figure 4.6 displays the visual interface of the Cat-a-Cone system (Hearst & Karadi, 1997), which applies the 3-D Cone Tree to visualize hierarchies in Yahoo!. The Hyperbolic Tree (Lamping et al., 1995), on the other hand, projects subtrees on a hyperbolic plane and puts the plane into the range of display. A 3-D version of the hyperbolic tree has also been developed by Munzner (2000) to visualize large-scale hierarchies (Figure 4.7).

Figure 4.6 Cat-a-Cone tree that displays hierarchies in Yahoo!. The label of a node can be brought to the foreground with a click (http://www.sims.berkeley.edu/~hearst/cac-overview.html, © 1997 ACM, Inc.). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

The network representation method is often applied when a simple tree structure is insufficient for representing complex relationships. Complexity is evident, for example, in citations among academic papers (C. Chen & Paul, 2001; Mackinlay, Rao, & Card, 1995) or among textual documents that are distributed over, and linked by, the Internet (Andrews, 1995). Various network visualizations have been created to represent citation relationships (Mackinlay et al., 1995) or to display the World Wide Web (Andrews, 1995). The spring-embedder model, originally proposed by Eades (1984), and its variants (Davidson & Harel, 1996; Fruchterman & Reingold, 1991) have become the most popular drawing algorithms for network relationships. Figure 4.8 presents the visualization of coauthorship among 555 scientists using a spring-embedder equivalent algorithm.

The temporal approach visualizes information based on temporal order. Location and animation are two commonly used visual variables to reveal the temporal aspect of information. Visual objects are usually listed along one axis according to the time when they occurred, while the other axis may be used to display the attributes of each temporal object (Eick et al., 1992; Robertson et al., 1993). For instance, the Perspective Wall (Robertson et al., 1993) lists objects along the x-axis based on time sequence and presents attributes along the y-axis. Using animation is another way to display temporal information. In the VxInsight system (Boyack et al., 2002), the landscape changes its appearance as a user chooses a different point of time on a time-slider.

The seven types of representation methods turn abstract textual documents into objects that can be displayed.
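To make the spring-embedder idea concrete, the following is a minimal sketch of a force-directed layout. It is not Eades's original algorithm or the code of any system cited here. It follows the general scheme used by later variants: every pair of nodes repels, linked nodes attract, and a decreasing "temperature" limits how far a node may move in one step. The function name `spring_layout`, the constants, and the example graph are illustrative assumptions.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} computed by a simple force-directed layout."""
    nodes = list(nodes)
    rng = random.Random(seed)                      # seeded for repeatable layouts
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / max(len(nodes), 1))        # assumed ideal edge length
    temperature = 0.1                              # assumed maximum step size

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes each other apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push

        # Attraction: each edge acts like a spring pulling its ends together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull

        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step

        temperature *= 0.98                        # cool down gradually

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical coauthorship graph: four authors linked in a chain.
authors = ["a", "b", "c", "d"]
coauthored = [("a", "b"), ("b", "c"), ("c", "d")]
layout = spring_layout(authors, coauthored, seed=1)
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason large networks usually need more elaborate variants than this sketch.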
A visualization system usually applies several methods at the same time. For instance, there are 2-D hyperbolic trees and 3-D hyperbolic trees. The multilevel ET Map created by H. Chen et al. (1998) combines the 2-D and tree approaches: a large set of Web sites is partitioned into hierarchical categories based on the sites’ content. The entire hierarchy is organized in a tree structure, and each node in the tree is a two-dimensional self-organizing map (SOM) on which the subcategories are displayed graphically. Some representation methods also need a precise information analysis technique at the back end. For instance, the TileBar system (Hearst, 1995) employs a text-tiling analysis algorithm to segment a document; similarly, the ThemeView and Galaxy (Wise et al., 1995) use multidimensional scaling to cluster and lay out documents on the screen.

Figure 4.7 A 3-D hyperbolic space (http://graphics.stanford.edu/papers/munzner-thesis/hyp-figs.html, reprinted with permission of Tamara Munzner). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Figure 4.8 Visualization of a large-scale co-authorship network (http://mpifg-koeln.mpg.de:80/~lk/netvis/Huge.html, reprinted with permission of Lothar Krempel). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

The “small screen problem” (Robertson et al., 1993) is common to representation methods of any type. To be effective, a representation method needs to be integrated with the user interface. Recent advances in hardware and software allow rapid user-interface interaction, and various combinations of representation methods and user-interface interactions have been employed. For instance, Cone Tree (Robertson et al., 1991) applies 3-D animation to provide direct manipulation of visual objects, and Lamping et al.
(1995) integrate hyperbolic projection with the fish-eye view technique to visualize a large hierarchy.

User-Interface Interaction

Immediate interaction between an interface and its users not only allows direct manipulation of the visual objects displayed, but also allows users to select what is to be displayed and what is not (Card et al., 1999). Shneiderman (1996) summarizes six types of interface functionality: overview, zoom, filtering, details on demand, relate, and history. Techniques have been developed to facilitate various types of interaction, and this subsection briefly reviews the two most commonly used interaction approaches: overview + detail and focus + context (Card et al., 1999).

Overview + detail provides multiple views, the first being an overview that shows overall patterns to users. Details about the part of interest to the user are displayed in further views. These views may be displayed at the same time or separately. When a detailed view is needed, two types of zooming are usually involved (Card et al., 1999): spatial zooming and semantic zooming. Spatial zooming refers to the process of enlarging selected visual objects to obtain a closer look, whereas semantic zooming provides additional information about a selected visual object by changing its appearance.

The focus + context technique provides detail (focus) and overview (context) dynamically on the same view. One example is the 3-D perception approach adopted by systems such as Information Landscape (Andrews, 1995) and Cone Tree (Robertson et al., 1991), where visual objects at the front appear larger than those at the back. Another commonly used focus + context technique is the fish-eye view (Furnas, 1986), a distortion technique that acts like a wide-angle lens to amplify part of the display. The objective is to simultaneously provide neighboring information in reduced detail and supply greater detail on the region of interest.
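The distortion behind a fish-eye view can be illustrated with a small sketch. The code below is not taken from Furnas (1986) or from any of the systems cited here. It assumes a one-dimensional display normalized to the interval [0, 1] and uses a distortion function of the form g(t) = (d + 1)t / (dt + 1), a form widely used for graphical fish-eye views, where t is the normalized distance from the focus and d controls the strength of the magnification. The function name `fisheye` and the parameter `distortion` are illustrative choices.

```python
def fisheye(x, focus, distortion=3.0):
    """Map a coordinate x in [0, 1] to its distorted position in [0, 1].

    Points near `focus` are spread apart (magnified); points near the
    display boundary are squeezed together, but none leave the display.
    """
    if x == focus:
        return focus
    boundary = 1.0 if x > focus else 0.0
    span = boundary - focus            # signed distance from focus to boundary
    if span == 0:
        return x
    t = (x - focus) / span             # normalized distance from focus, in [0, 1]
    g = (distortion + 1) * t / (distortion * t + 1)
    return focus + g * span


# Ten evenly spaced items, with the focus in the middle of the display.
positions = [i / 10 for i in range(11)]
distorted = [fisheye(p, focus=0.5) for p in positions]
```

With `distortion=0` the mapping is the identity; larger values give more room to the region around the focus while keeping every item on screen, which is the trade-off the focus + context approach aims for.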
In any focus + context approach, users can change the region of focus dynamically. A system that applies the fish-eye technique is the Hyperbolic Tree (Lamping et al., 1995), in which users can scrutinize the focus area and scan the surrounding nodes for the big picture. Other focus + context techniques include filtering, highlighting, and selective aggregation (Card et al., 1999). Overview + detail and focus + context are the two types of interaction usually provided by a visualization system to help users deal with large volumes of information.

Information Analysis

Confronted with large quantities of unstructured information, an information visualization system needs to apply information analysis to reduce complexity and to extract salient structure. Such an application often consists of two stages: indexing and analysis.

The indexing stage aims to extract the semantics of information to represent its content. Different preprocessing algorithms are needed for different media types, including text (natural language processing), image (color, shape, and texture-based segmentation), audio (indexing by sound and pitch), and video (scene segmentation). This subsection briefly reviews selected approaches to textual document processing.

Automatic indexing (Salton, 1989) is a method commonly used to represent the content of each document as a vector of key terms. When implemented using multiword (or multiphrase) matching (Girardi & Ibrahim, 1993), a natural language processing noun-phrasing technique can capture a rich linguistic representation of document content (Anick & Vaithyanathan, 1997). Most noun-phrasing techniques rely on a combination of part-of-speech tagging (POST) and grammatical phrase-forming rules. This approach has the potential to improve precision over other document indexing techniques.
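The idea of representing each document as a vector of key terms can be sketched in a few lines. The example below is a deliberately simplified illustration, not the indexing procedure described by Salton (1989): it tokenizes on letters only, removes a tiny, assumed stop-word list, and weights each remaining term by term frequency times inverse document frequency. It performs no stemming and no noun phrasing. The names `index_terms`, `tfidf_vectors`, and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, assumed stop-word list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def index_terms(text):
    """Return a Counter of key terms (lowercased, stop words removed)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def tfidf_vectors(documents):
    """Return one {term: weight} vector per document."""
    counts = [index_terms(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]


docs = [
    "the tree map of a web site",
    "a cone tree",
    "map of cancer literature",
]
vectors = tfidf_vectors(docs)
# In the second document, "cone" outweighs "tree" because "tree"
# also occurs in the first document and is therefore less distinctive.
```

Vectors of this kind are the input that later clustering and projection steps operate on.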
Examples of noun-phrasing tools include the Massachusetts Institute of Technology’s Chopper, Nptool (Voutilainen, 1997), and the Arizona Noun Phraser (Tolle & Chen, 2000).

Information extraction is another way to identify useful information from text documents automatically. It extracts names of entities of interest, such as persons (e.g., “John Doe”), locations (e.g., “Washington, D.C.”), and organizations (e.g., “National Science Foundation”) from textual documents. It also identifies other entities, such as dates, times, number expressions, dollar amounts, e-mail addresses, and Web addresses (URLs). Such information can be extracted based on either human-created rules or statistical patterns occurring in the text. Most existing information extraction approaches combine machine learning algorithms such as neural networks, decision trees (Baluja, Mittal, & Sukthankar, 1999), hidden Markov models (Miller, Leek, & Schwartz, 1999), and entropy maximization (Borthwick, Sterling, Agichtein, & Grishman, 1998) with a rule-based or a statistical approach. The best systems have been shown to achieve more than 90 percent in both precision and recall when extracting persons, locations, organizations, dates, times, currencies, and percentages from a collection of New York Times articles (Chinchor, 1998).

At the analysis stage, classification and clustering are commonly used to identify embedded patterns. Classification assigns objects to predefined groups (using supervised learning), whereas clustering aggregates objects dynamically based on their similarities (unsupervised learning). Both methods generate groups by analyzing characteristics of objects extracted at the indexing stage.
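The contrast between the two can be shown with a small sketch that uses two simple stand-ins: a k-nearest-neighbor classifier, which needs labeled examples, and K-means clustering, which does not. The data points, labels, and function names are invented for illustration, and the K-means initialization (taking the first k points as starting centroids) is a simplification assumed here rather than a recommended practice.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_classify(point, labeled, k=3):
    """Supervised: assign the majority label among the k nearest examples.

    `labeled` is a list of (vector, label) pairs with predefined labels.
    """
    nearest = sorted(labeled, key=lambda item: sq_dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans(points, k, iterations=20):
    """Unsupervised: group points into k clusters found from the data."""
    centroids = [tuple(p) for p in points[:k]]     # naive initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Clustering discovers the two groups without being told any labels.
cluster_labels, _ = kmeans(points, k=2)

# Classification needs predefined groups for the training examples.
training = list(zip(points, ["near", "near", "near", "far", "far", "far"]))
predicted = knn_classify((9, 9), training)
```

In a document setting, the two-dimensional points would be replaced by the term vectors produced at the indexing stage.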
Widely used classification methods include the naive Bayesian method (Koller & Sahami, 1997; Lewis & Ringuette, 1994; McCallum, Nigam, Rennie, & Seymore, 1999), k-nearest neighbor (Iwayama & Tokunaga, 1995; Masand, Linoff, & Waltz, 1992), and neural network models (Lam & Lee, 1999; Ng, Goh, & Low, 1997; Wiener, Pedersen, & Weigend, 1995). Unlike classification, clustering determines groups dynamically. A commonly used clustering algorithm is Kohonen’s self-organizing map (SOM), which produces a two-dimensional grid representation for N-dimensional features and has been widely applied in information retrieval (Kohonen, 1995; Lin, Soergel, & Marchionini, 1991; Orwig, Chen, & Nunamaker, 1997). Other popular clustering algorithms include multidimensional scaling, the k-nearest neighbor method, Ward’s algorithm (Ward, 1963), and the K-means algorithm.

Information analysis represents each textual document with semantically rich phrases or entities (indexes) and identifies interesting patterns by using classification and clustering algorithms. Supporting a visualization system with these methods of analysis enables the system to deal with larger and more complex collections of information.

Emerging Information Visualization Applications

Information visualization can be applied to any domain where people need to extract insights from a vast amount of information. This is evidenced by the publication of several new books. Bederson and Shneiderman (2003) document various applications of visualization developed at the University of Maryland; Börner and Chen (2003) record different visualization applications in the development of digital libraries. In addition, C. Chen (1999) describes many visualization applications in virtual environments.
This section explores various approaches to building visualization systems in the domains of digital libraries, the Web, and virtual communities, where large amounts of information are routinely generated.

Digital Library Visualization

Digital library research aims at enhancing information collection by facilitating access to, and the exploration of, stored information. A digital library may contain millions of objects including journal papers, books, maps, photographs, films, videos, and audio recordings. Because standard search engine techniques are no longer sufficient for accessing information in digital libraries, visualization can be applied to support both the browsing and the searching activities of users.

Browsing a Digital Library

Browsing is a way to retrieve information when a user does not have a specific goal (H. Chen et al., 1998; Marchionini, 1987). Visualization supports browsing by providing an effective overview that summarizes the contents of a collection. Interaction techniques are employed to lead a user to information of interest.

Providing a subject hierarchy is a conventional way to help users browse information in a digital library. For example, MEDLINE, the largest and most widely used medical bibliographic database in the world, utilizes the vocabulary of the Medical Subject Headings (MeSH) to index its textual documents manually and organizes MeSH terms into 15 hierarchies called the MeSH tree structures (Lowe & Barnett, 1994). A user can traverse the MeSH tree to locate appropriate medical terms. Such a large-scale subject hierarchy can readily become unmanageable because users can easily become lost when scrolling through the headings (Lowe & Barnett, 1994). Several visualization systems have been developed to display this large-scale hierarchy more effectively. The MeSHBROWSE system (Korn & Shneiderman, 1995) enables users to browse a subset of the MeSH tree interactively.
Subcategories for a selected category are displayed, but the two-dimensional tree representation employed suffers from the problem of limited space. To utilize the space on a computer screen more effectively, Hearst and Karadi (1997) proposed using a three-dimensional Cone Tree and animation to display the MeSH tree. However, being able to display a large-scale hierarchy is not enough. Both the MeSHBROWSE system and the 3-D Cone Tree rely on a MeSH tree that is manually generated. This approach cannot be adopted for other digital libraries unless there is an existing subject hierarchy. In addition, manual generation of a subject hierarchy is not only expensive but also too slow to catch emerging topics in a timely fashion.

Figure 4.9 The Interface of CancerMap. Category “Liver Neoplasms” was selected at the top level and the submap of “Liver Neoplasms” was displayed. Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

The CancerMap system described by H. Chen, Lally, Zhu, and Chau (2003) adopted the SOM and Arizona Noun Phraser (Tolle & Chen, 2000) approaches to generate a subject hierarchy automatically. Figure 4.9 presents two consecutive screen shots, displaying the top-level categories and the subcategories under the category of “Liver Neoplasms.” The empirical study described by H. Chen et al. (2003) indicates that this approach generated a meaningful subject hierarchy to supplement or enhance human-generated hierarchies in digital libraries. The interface applies the overview + detail approach by combining the 2-D display of SOM with a 1-D text-based alphabetic display. Such a combination appears to be a promising approach to visualizing large-scale subject hierarchies (Ong, Chen, Sung, & Zhu, in press). Users can find a correct path systematically when using a 1-D display.
The 2-D SOM map also provides more visual cues and delivers richer information about each node within a hierarchy: spatial location illustrates semantic relationships among categories, size shows the number of documents within a category, and color shows the number of levels beneath a category. These features allow easy comparison of categories on the same level. It appears that the best strategy for using the interface is to use the 1-D display for path management when traversing a hierarchy and to use the 2-D SOM map to compare categories at the same level (Ong et al., in press).

In addition to subject hierarchies, other approaches to support browsing behavior have been proposed. If all documents are geo-referenced, users may browse a digital library by geographical location. A clickable geographic map can serve as an overview (Cai, 2002). Christoffel and Schmitt (2002) built a virtual-reality interface to simulate a real-world library, aiming to provide an environment that would be familiar to users, who could navigate the interface as if walking in a library.

Searching a Digital Library

When a user has a specific goal, searching rather than browsing is often the preferred mode of interaction. Visualization can support searching behavior in two ways: query specification and search results analysis. Providing a subject hierarchy not only facilitates browsing but also suggests appropriate query terms for searching. Users can combine the terms in the hierarchy to specify their queries. Visualization approaches to providing an overview may also be applied to help users organize search results. For instance, H. Chen, Chau, and Zeng (2002) used dynamic SOM to categorize search results based on content. Other visualization systems such as VIBE (Olsen, Korfhage, & Sochats, 1993) and TileBars (Hearst, 1995) provide visual cues to indicate the extent of match between a document returned and a query term.
The VIBE system displays both documents and search terms, with the spatial distance between a document and a term indicating their semantic relationship: the shorter the distance, the stronger the relationship. TileBars, on the other hand, uses grayscale colors to indicate the frequency of search terms in a document (Figure 4.1). Visualization can also help users maintain their search results. For example, in Hearst and Karadi (1997), the system organizes documents returned into a book, with the book cover showing the search terms, thereby helping store and manage search results (Figure 4.4). With the proliferation of digital library content and services, we believe that visualization can significantly enhance the value of a digital library by facilitating browsing and searching.

Web Visualization

The vast Web information space has probably become the most dominant information and communication resource for both academic researchers and the general public. Its rapid growth and constant changes have also posed a formidable challenge to visualization research. Involving both academic and commercial efforts, Web visualization aims to provide a more effective way to access and maintain the Web. Two types of Web visualization, visualization of a single Web site and visualization of a collection of Web sites, will be discussed in the remainder of this subsection.

Visualization of a Single Web Site

The structure of a Web site can be visualized to provide “table of contents” information for effective Web site surfing and maintenance. Most sites have site maps for this purpose, but designing an effective graphical site map remains challenging, especially when a site may contain thousands of pages. A tree metaphor is commonly used to represent the hierarchical structure of a Web site.
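As a rough illustration of the tree metaphor, the sketch below derives a site hierarchy from page addresses and prints it as an indented text list. It rests on a strong assumption that does not hold for every site, namely that the directory path in each URL mirrors the site's hierarchy; the systems discussed in this section may instead derive structure from links between pages. The function names and the `example.com` addresses are hypothetical.

```python
from urllib.parse import urlsplit


def build_site_tree(urls):
    """Build a nested dict in which each key is one path segment."""
    tree = {}
    for url in urls:
        node = tree
        for part in urlsplit(url).path.strip("/").split("/"):
            if part:                        # skip empty segments (site root)
                node = node.setdefault(part, {})
    return tree


def indented_lines(tree, depth=0):
    """Render the tree as an indented text list, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(indented_lines(tree[name], depth + 1))
    return lines


pages = [
    "http://example.com/products/tv",
    "http://example.com/products/audio/speakers",
    "http://example.com/about",
]
site_tree = build_site_tree(pages)
print("\n".join(indented_lines(site_tree)))
```

An indented list of this kind works for a handful of pages; the graphical tree layouts described here address what happens when the same structure holds thousands of nodes.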
Visualizations such as the StarTree by InXight Software (http://www.inxight.com), the SiteBrain by Brain Technologies Corporation (http://mappa.mundi.net), and the Z-factor site map of Dynamic Diagrams (http://www.dynamicdiagrams.com) all employ a tree representation but differ in the type of tree used. Visual cues such as color, shape, or icon are applied to describe the attributes of a tree node in the hierarchy. The attributes may include the title of a page, the status of a page (the date of latest update), the type of page (text or image), or usage. A visualization system selects certain attributes for display based on the intended functionality. It may also link nodes with arrows in the tree to describe traffic direction within a Web site (Cugini & Scholtz, 1999). Eick (2001) describes several visual interfaces, all of which use the hyperbolic tree + fish-eye view approach. One interface assigns labels to nodes to construct a site map; another represents a node with a 3-D vertical line indicating the usage of the Web page. In addition, Chi, Pitkow, Mackinlay, Pirolli, Gossweiler, and Card (1998) used several Cone Trees along the x-axis in chronological order to depict the temporal evolution of a Web site. Each Cone Tree presents the usage pattern of the site over a four-week period, with colors to describe the usage of each Web page. Figure 4.10 shows an example of Web site visualization for a company (bestbuy.com) based on a hyperbolic tree.

Figure 4.10 A graphical site map. StarTree (by InXight), which applies hyperbolic tree and fish-eye view algorithms, was used to visualize a Web site’s structure. Colors were used to distinguish sub-trees (http://inxight.com/products/oem/star-tree/demos.php).
Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Most existing visualization systems for a single Web site apply a tree representation and use visual cues to describe each page, relying on users to identify patterns. The challenge faced with this type of visualization is the same as that faced by the tree representation: how to display a very large-scale tree on a computer screen in an understandable way. Almost no information analysis technology is involved because the tree metaphor appears to be a natural representation of the hierarchical structure of a Web site. However, as Web log analysis becomes more popular for understanding online behavior, visualization of a single Web site may need to apply information analysis technology to identify and display patterns embedded in the Web log data. Those patterns may include user demographics, browsing behaviors, and online purchases.

Visualization for a Collection of Web Sites (and Web Pages)

The common goal for visualizing collections of Web sites is to support information exploration over the Internet. Systems like ET Map (H. Chen et al., 1998) organize Web pages based on content, applying the output of a self-organizing map to project categories. Other visualization systems organize cyberspace based on the link structure among Web pages (Andrews, 1995; Bray, 1996). Three-dimensional icons are presented on a two-dimensional map, where each icon represents a single Web page or a Web site. The mapping of 3-D icons is based on predefined hierarchical categories (Andrews, 1995) or on the strength of linkages among Web sites (Bray, 1996). Visual cues are supplied to each 3-D icon to represent attributes including size, type, number of incoming/outgoing links, and title of the Web page or Web site.

Intensive computation is usually conducted to preprocess Web pages before visualization. ET Map (H.
Chen et al., 1998) used automatic indexing to represent the content of a Web page and SOM to generate the subject hierarchy. Bray (1996) calculated links among Web sites to measure the “visibility” (number of links pointing to the site) and the “luminosity” (number of outgoing links) of each Web site. With the ever-increasing quantity of Web sites and Web content, Web visualization promises to be a fertile ground for information visualization research.

Virtual Community Visualization

The Internet not only opens the door to information foraging but also offers new communication media such as e-mail, discussion groups, news groups, and chat rooms. These new media facilitate communication across geographical and time boundaries, stimulating the formation of virtual communities or new social networks centered on common interests and beliefs. The archives of communication contain rich information about discussion content and participant behavior, information that can be processed and displayed. The proliferation of computer-mediated communication and online communities inevitably poses challenges to people trying to locate a particular person or community, retrieve useful information from an archive, or manage their own communication archives. Many visualization systems have been developed to cope with these issues.

Visualization systems in this area generally belong to one of two categories: tools for communication management and tools for community analysis. ContactMap (Whittaker, Jones, & Terveen, 2002) and Chat Circles (Donath, Karahalios, & Viégas, 1999) belong to the first category. The ContactMap system acts like a visual address book with all contacts displayed on the computer screen as icons. An icon contains a picture and a name. A user can assign an icon to one or more predefined groups, and that icon is mapped on the screen according to its groups.
Interactions with a contact can be retrieved by a click on its icon. While ContactMap helps people manage their social networks, the Chat Circles system helps users form subgroups in a chat room. It assigns each user a colored circle enclosing text. The user needs to move his or her own circle closer to another circle in order to “speak to” and “hear” that person. Chat Circles 2 offers the capability of tracing the path of a circle in a chat room. Figure 4.11 presents a screen shot of Chat Circles 2 where the local user is “media lab.” Hollow circles represent other people far away from the local user, and semi-transparent, faded circles show the traces of people who have chatted on that spot before.

Figure 4.11 Interface of Chat Circles 2 (http://chatcircles.media.mit.edu, reprinted with permission of Judith Donath). Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Both ContactMap and Chat Circles facilitate communication within a community, but users may also need help to identify and to understand a community. Visualization systems such as the Loom (Donath et al., 1999), Conversation Map (Sack, 2000), Netscan Dashboard, and Netscan Treemap visualize the Usenet, the most popular discussion space on the Internet. Both the Loom and Conversation Map apply information analysis technology before visualization. The Loom system uses 2-D representation to describe the temporal patterns of postings in Usenet. Messages are mapped according to the sender and the time of posting. A rule-based algorithm is applied to classify messages into four categories: angry, peaceful, informational, and other. Conversation Map depicts a community by displaying its social and semantic relationships using the network metaphor. Information analysis techniques are applied to construct a semantic network. Message structure and quotation analysis are employed for constructing the social networks.
As part of the Netscan project in Microsoft Research, both Netscan Dashboard and Netscan Treemap use tree representation to describe different aspects of online discussion groups. Netscan Dashboard employs a conventional 2-D tree structure to display the hierarchical structure of a thread, while Netscan Treemap uses Treemap (Shneiderman, 1994) to present hierarchical relationships among Usenet newsgroups. These relationships can be inferred from the name of a newsgroup; the size of a node corresponds to the number of postings in a group.

PeopleGarden (Xiong & Donath, 1999) uses glyphs to summarize the social activity of a community. A flower metaphor is used to represent participants, with the number of petals representing the number of postings by the participant and the height of the flower conveying the length of time that the individual has stayed. As a community becomes a garden, the overall activity of this community can be seen at a glance. CommunicationGarden combines just such a floral representation with SOM to describe the liveliness of each subtopic within a community and to help locate the most active persons in a certain area. Active participants may not be the most knowledgeable, but will probably be the most helpful. Figures 4.12a, b, and c display the visualization components of the CommunicationGarden system: Content Summary (Figure 4.12a), Interaction Summary (Figure 4.12b), and Expert Indicator (Figure 4.12c). Each type displays a certain aspect of a computer-mediated communication process. In addition, Content Summary, Interaction Summary, and Expert Indicator divide their display panels into subgardens based on the output of SOM and the Arizona Noun Phraser. Thus each subgarden represents one subtopic.

Evaluation Research for Information Visualization

In spite of a decade of innovative visualization systems development, evaluation research for information visualization is still at an early stage (C. Chen & Yu, 2000).
Our literature survey identified two types of empirical study in information visualization: 1) empirical usability studies that aim to understand the pros and cons of specific visualization designs or systems, and 2) fundamental perception studies that try to investigate basic perceptual effects of certain visualization factors or stimuli. As a consequence of both the diversity of visualization systems and the relative novelty of computer-based visualization, stringent metrics-based evaluations such as those adopted in TREC (Text REtrieval Conference) or MUC (Message Understanding Conference) (Chinchor, 1998) are nonexistent.

Figure 4.12a Content Summary. The x-axis represents time; categories generated by the SOM are laid out vertically. Each dark line represents one message. The vertical thickness of each subtopic indicates its activity on a particular day. The length in the x-dimension of each subtopic represents the time duration of that subtopic. Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Empirical Usability Studies

Most empirical usability studies employ laboratory experiments to validate the performance of visualization systems and designs, for example, comparing a glyph-based interface and a text-based interface (Zhu & Chen, 2001), comparing different visualization techniques (Stasko, Catrambone, Guzdial, & McDonald, 2000), or studying a visualization system in a working environment (Graham, Kennedy, & Hand, 2000; Pohl & Purgathofer, 2000).

Figure 4.12b Interaction Summary. The panel is divided into subgardens based on the SOM output. Each subgarden is a subtopic. Each flower represents one thread, where the number of petals represents the number of messages posted for the thread, the number of leaves represents the number of participants in the thread, and the height of the flower represents the time duration of the thread.
Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Studies such as Stasko et al. (2000), Graham et al. (2000), Morse and Lewis (2000), and Zhu and Chen (2001) use simple but basic visual operations for evaluation. Sometimes referred to as the “de-featuring approach,” these studies examine generic operations such as searching objects with a given attribute value, specifying the attributes of an object, clustering objects based on similarity, counting objects, and visual object comparison. Accuracy of operation results and time to completion are two commonly used measures. Taking such an approach makes it easier to design an evaluation study and to attribute the task performance to differences in visualization designs. However, because of the complexity of real-life system interface tasks, the validity of such a design and the applicability of research conclusions are sometimes questioned by practitioners. For example, several studies have been conducted to evaluate popular tree representations such as Hyperbolic Tree (Pirolli, Card, & Van Der Wege, 2000), Treemap (Stasko et al., 2000), multilevel SOM (Ong et al., in press), and Microsoft Windows Explorer. These studies all involve simple visual operations of node searching and node comparison. Representations such as Treemap and multilevel SOM are effective for node-comparison operations because they offer more visual cues for each node, while hyperbolic tree and Microsoft Windows Explorer, providing a global picture, are more effective in supporting node-searching operations. But how these basic node-searching and node-comparison operations are related to a user’s real-life, complex searching or browsing tasks is unclear.

Figure 4.12c Expert Indicator. The interface is divided into subgardens based on the SOM output. Each subgarden is a subtopic. Each flower represents one person, where the number of petals represents the number of messages posted by this person for this subtopic, the number of leaves represents the number of threads in which this person participated in the subtopic, and the height of the flower represents how long this person has stayed in this subtopic. Figure available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

Complex, realistic, task-driven evaluation studies have been conducted frequently in visualization research, for example, Pohl and Purgathofer (2000); Risden, Czerwinski, Munzner, and Cook (2000); and North and Shneiderman (2000). The experimental tasks are based on functionalities that the visualization system aims to provide. Subjects conduct tasks such as maintaining a hierarchy of subject categories (Risden et al., 2000), writing a paper (Pohl & Purgathofer, 2000), or selecting appropriate visualization methods to display different information (North & Shneiderman, 2000). The usefulness of a given visualization system can be directly measured by this approach, but it is difficult to identify or isolate the visualization factors that contribute to user performance (partially due to the intertwining nature of the system, task, and user).

Although laboratory experimentation has been useful in information visualization research, we believe other well-grounded behavioral methods such as protocol analysis (to identify qualitative observations and comments), individual and focus group interviews (to solicit general feedback and group responses), ethnographic studies (to record behaviors and organizational cultures), and technology and system acceptance surveys (to understand group or organizational adoption process) also need to be considered.
Instead of relying on a one-time, quantitative laboratory experiment, visualization researchers can triangulate and substantiate their findings using qualitative, long-term assessment methodologies.

Fundamental Perception Studies and Theory Building

Unlike empirical usability studies, fundamental perception studies are grounded in psychology and neuroscience. Theories from those disciplines are used to understand the perceptual impact of such visualization parameters as animation (Bederson & Boltman, 1999), information density (Pirolli et al., 2000), 3-D effect (Tavanti & Lind, 2001), and combinations of visual cues (Nowell, Schulman, & Hix, 2002). What distinguishes this type of study from conventional perception studies is that it usually involves some form of computer-based visualization. For instance, Bederson and Boltman (1999) used the Pad++ program to study the impact of animation on users' learning of hierarchical relationships; a hyperbolic tree with fish-eye view was applied by Pirolli et al. (2000) to study the effect of information density. Hypotheses, tasks, and measures are developed under the guidance of theories from psychology. However, because of the unique system, task, and perception factor combinations, results may be applied only to the particular visualization system under study. A well-grounded visualization theory and research framework that can be used to guide visualization system development is urgently needed.

Summary and Future Directions

This chapter has reviewed information visualization research based on a framework of information representation, user-interface interaction, and information analysis. We have presented the field's history, theoretical foundations, and three important, emerging application domains: digital libraries, the Web, and virtual communities.
We summarized the status of visualization system evaluation research and suggested areas for future research, in particular, long-term, qualitative, theory-grounded evaluation studies.

Although this chapter focuses on the visualization of textual information, many associated techniques can be applied to multimedia visualization. For example, the visualization system described in Christel, Cubilo, Gunaratne, Jerome, O, and Solanki (2002) applied the video indexing and segmentation techniques developed at the Carnegie Mellon University Informedia project (Wactlar, Christel, Gong, & Hauptmann, 1999) to help users browse video digital libraries.

In summary, information visualization can help people gain insights from large-scale collections of unstructured information. Developments in computer hardware and software will not only advance information visualization technology but also stimulate wider adoption. Even though more, and more innovative, visualization systems are expected to be developed soon, there is also a critical need for advancing visualization theories and evaluation. Lastly, we suggest several promising research areas that could benefit from information visualization research: visual data mining, virtual reality-based visualization, and visualization for knowledge management.

Visual Data Mining

Visual data mining enables users to identify patterns that a data mining algorithm might find difficult to locate. Visualization could play two types of roles in a data mining tool. First, it could support interaction between users and data, that is, the exploration of an unknown data set. Integrated with such user-interface interaction approaches as zooming or fish-eye view, representation methods such as scatter plot, parallel coordinates, glyphs, and self-organizing maps can be applied to project data (Simoff, 2002). Second, visualization can support interaction with the analytical process and output of a data mining system.
Such interaction can incorporate human expertise and judgment (Hinneburg, Keim, & Wawryniuk, 1999; Niwa, Fujikawa, Tanaka, & Oyama, 2001; Wong, 1999), which may be critical to the performance of a system but impossible to incorporate in computer code. How to integrate data mining algorithms and various visualization techniques seamlessly in an effective analytical process is still a pressing research challenge.

Virtual Reality-Based (Immersive) Visualization

Although most visualization tools rely on human visual perception to deliver patterns, virtual reality (or immersive) technology tries to take advantage of the entire range of human perceptions, including auditory and tactile sensations. However, in addition to technological challenges such as input/output devices, virtual reality research still faces many human factors challenges, such as individual differences, input/sensor overload, and cyber-sickness (Kalawsky, 1993; Stanney, 1995). In spite of such challenges, we believe that the current generation, which has grown up with Internet surfing and video games, will be more ready to adopt future virtual reality-based visualization technologies.

Visualization for Knowledge Management

Visualization can support knowledge management by facilitating knowledge sharing and knowledge creation. Knowledge itself is difficult to visualize because it often exists only in someone’s mind (referred to as tacit knowledge) (Nonaka, 1994). Visualization can accelerate internalization by presenting information in an appropriate format or structure or by helping users find, relate, and consolidate information (and thus helping to form knowledge) (C. Chen & Paul, 2001; Cohen, Maglio, & Barrett, 1998; Foner, 1997; Vivacqua, 1999). As knowledge management, data mining, and knowledge discovery research advances, we may begin to move from “information visualization” to “knowledge visualization.”

Endnote

1.
The figures in this chapter are available in color at http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html

References

Anderson, J. R., Matessa, M., & Lebiere, C. (1997). ACT-R: A theory of higher-level cognition and its relation to visual attention. Human-Computer Interaction, 12, 439-462.
Andrews, K. (1995). Visualizing cyberspace: Information visualization in the Harmony Internet browser. Proceedings of InfoVis’95, IEEE Symposium on Information Visualization, 97-104.
Anick, P. G., & Vaithyanathan, S. (1997). Exploiting clustering and phrases for context-based information retrieval. Proceedings of the ACM SIGIR Annual International Conference on Research and Development in Information Retrieval, 314-323.
Baecker, R. (1981). Sorting out sorting [Film]. Toronto, Canada: Dynamic Graphics Project, University of Toronto.
Baecker, R., & Marcus, A. (1990). Human factors and typography for more readable programs. Reading, MA: Addison-Wesley.
Baecker, R., & Price, B. (1998). The early history of software visualization. In J. Stasko, J. Domingue, M. H. Brown, & B. A. Price (Eds.), Software visualization: Programming as a multimedia experience (pp. 29-34). Cambridge, MA: MIT Press.
Baluja, S., Mittal, V., & Sukthankar, R. (1999). Applying machine learning for high performance named-entity extraction. Proceedings of the Conference of the Pacific Association for Computational Linguistics, 365-378.
Bederson, B. B., & Boltman, A. (1999). Does animation help users build mental maps of spatial information? Proceedings of the IEEE Symposium on Information Visualization, 28-35.
Bederson, B. B., & Shneiderman, B. (2003). The craft of information visualization: Readings and reflection. San Francisco: Morgan Kaufmann.
Bertin, J. (1967). Semiology of graphics: Diagrams, networks, maps. Madison: University of Wisconsin Press.
Borner, K., & Chen, C. (2003). Visual interfaces to digital libraries. Berlin: Springer-Verlag.
Borthwick, A., Sterling, J., Agichtein, E., & Grishman, R. (1998). NYU: Description of the MENE named entity system as used in MUC-7. Proceedings of the Seventh Message Understanding Conference (MUC-7). Retrieved January 3, 2004, from http://www.itl.nist.gov/iad/894.02/related_projects/muc/proceedings/muc_7_toc.html
Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Domain visualization using VxInsight for science and technology management. Journal of the American Society for Information Science and Technology, 53(9), 764-774.
Bray, T. (1996). Measuring the Web. Computer Networks and ISDN Systems, 28(7-11), 992-1004.
Cai, G. (2002). GeoVIBE: A visual interface for geographic digital libraries. In K. Borner & C. Chen (Eds.), Visual interfaces to digital libraries (pp. 161-170). Berlin: Springer-Verlag.
Card, S. K., & Mackinlay, J. D. (1997). The structure of the information visualization design space. Proceedings of IEEE Symposium on Information Visualization 1997 (InfoVis'97), 92-99.
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco: Morgan Kaufmann.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: L. Erlbaum.
Card, S. K., Robertson, G. G., & York, W. (1996). The WebBook and the WebForager: An information workspace for the World Wide Web. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'96), 111-117.
Chen, C. (1999). Information visualization and virtual environments. Berlin: Springer-Verlag.
Chen, C., & Czerwinski, M. P. (2000). Empirical evaluation of information visualizations: An introduction. International Journal of Human-Computer Studies, 53, 631-635.
Chen, C., & Paul, R. J. (2001). Visualizing a knowledge domain's intellectual structure. IEEE Computer, 34(3), 65-71.
Chen, C., & Yu, Y. (2000).
Empirical studies of information visualization: A meta-analysis. International Journal of Human-Computer Studies, 53, 851-866.
Chen, H., Chau, M., & Zeng, D. (2002). CI Spider: A tool for competitive intelligence on the Web. Decision Support Systems, 34(1), 1-17.
Chen, H., Houston, A. L., Sewell, R. R., & Schatz, B. R. (1998). Internet browsing and searching: User evaluation of category map and concept space techniques. Journal of the American Society for Information Science, 49(7), 582-603.
Chen, H., Lally, A., Zhu, B., & Chau, M. (2003). HelpfulMed: Intelligent searching for medical information over the Internet. Journal of the American Society for Information Science and Technology, 54(7), 683-694.
Chernoff, H. (1973). The use of faces to represent points in K-dimensional space graphically. Journal of American Statistical Association, 68(342), 361-368.
Chi, E. H. (2000). A taxonomy of visualization techniques using the data state reference model. Proceedings of the IEEE Symposium on Information Visualization 2000 (InfoVis'00), 69-75.
Chi, E. H., Pitkow, J., Mackinlay, J., Pirolli, P., Gossweiler, R., & Card, S. K. (1998). Visualizing the evolution of Web ecologies. Proceedings of ACM Conference on Human Factors in Computing Systems (CHI'98), 400-407.
Chinchor, N. A. (1998). Overview of MUC-7/MET-2. Proceedings of the Seventh Message Understanding Conference (MUC-7). Retrieved January 3, 2004, from http://www.itl.nist.gov/iad/894.02/related_projects/muc/proceedings/muc_7_toc.html
Christel, M. G., Cubilo, P., Gunaratne, J., Jerome, W., O, E., & Solanki, S. (2002). Evaluating a digital video library Web interface. Proceedings of the 2nd ACM-IEEE Joint Conference on Digital Libraries, 389.
Christoffel, M., & Schmitt, B. (2002). Accessing libraries as easy as a game. In K. Borner & C. Chen (Eds.), Visual interfaces to digital libraries (pp. 25-38). Berlin: Springer-Verlag.
Chuah, M. C., & Roth, S. F.
(1996). On the semantics of interactive visualizations. Proceedings of IEEE Symposium on Information Visualization 1996 (INFOVIS ’96), 29-36.
Cohen, A. L., Maglio, P. P., & Barrett, R. (1998). The expertise browser: How to leverage distributed organizational knowledge. Workshop on Collaborative Information Seeking, Conference on Computer-Supported Cooperative Work (CSCW98). Retrieved January 3, 2004, from http://domino.watson.ibm.com/cambridge/research.nsf/2b4f81291401771785256976004a8d13/aa5a4c44~619d3c852566~006b~33?OpenDocument
Collins, A. M., & Loftus, E. F. (1975). A spreading activation theory of semantic processing. Psychological Review, 82, 407-428.
Cugini, J., & Scholtz, J. (1999). VISVIP: 3D visualization of paths through Web sites. Proceedings of the International Workshop on Web-based Information Visualization (WebVis’99), 259-263. Retrieved December 12, 2003, from http://www.itl.nist.gov/iaui/wrg/cugini/webmet/visvip/webvis-paper.html
Davidson, R., & Harel, D. (1996). Drawing graphs nicely using simulated annealing. ACM Transactions on Graphics, 15(4), 301-331.
Donath, J. (2002). Supporting community and building social capital: A semantic approach to visualizing online conversations. Communications of the ACM, 45(4), 45-49.
Donath, J., Karahalios, K., & Viega, F. (1999). Visualizing conversation. Journal of Computer-Mediated Communication, 4. Retrieved December 12, 2003, from http://www.ascusc.org/jcmc/vol4/issue4/donath.html
Eades, P. (1984). A heuristic for graph drawing. Congressus Numerantium, 42, 149-209.
Eick, S. G. (2001). Visualizing online activity. Communications of the ACM, 44(8), 45-50.
Eick, S. G., Steffen, J. L., & Sumner, E. E. (1992). Seesoft: A tool for visualizing line-oriented software. IEEE Transactions on Software Engineering, 18(11), 11-18.
Encarnacao, J., Foley, J. D., Bryson, S., & Feiner, S. K. (1994). Research issues in perception and user interfaces. IEEE Computer Graphics and Applications, 14(2), 67-69.
Fayyad, U., Grinstein, G. G., & Wierse, A.
(2002). Information visualization in data mining and knowledge discovery. San Francisco: Morgan Kaufmann.
Foner, L. N. (1997). Yenta: A multi-agent, referral-based matchmaking system. Proceedings of International Conference on Autonomous Agents, 301-307.
Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software Practice and Experience, 21, 1129-1192.
Furnas, G. W. (1986). Generalized fisheye views. Proceedings of the ACM Conference on Human Factors in Computing Systems, 16-23.
Gershon, N., Eick, S. G., & Card, S. (1998). Design: Information visualization. ACM Interactions, 5(2), 9-15.
Girardi, M. R., & Ibrahim, B. (1993). An approach to improving the effectiveness of software retrieval. Proceedings of 3rd Annual Irvine Software Symposium, 89-100.
Graham, M., Kennedy, J., & Hand, C. (2000). A comparison of set-based and graph-based visualizations. International Journal of Human-Computer Studies, 53, 789-807.
Hearst, M. (1995). TileBars: Visualization of term distribution information in full text information access. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 59-66.
Hearst, M. A., & Karadi, C. (1997). Cat-a-Cone: An interactive interface for specifying searches and viewing retrieval results using a large category hierarchy. Proceedings of the ACM SIGIR Annual International Conference on Research and Development in Information Retrieval, 246-255.
Hinneburg, A., Keim, D. A., & Wawryniuk, M. (1999, September/October). HD-Eye: Visual mining of high-dimensional data. IEEE Computer Graphics and Applications, 22-31.
Iwayama, M., & Tokunaga, T. (1995). Cluster-based text categorization: A comparison of category search strategies. Proceedings of the ACM SIGIR 18th Annual International Conference on Research and Development in Information Retrieval, 273-281.
Jeffery, C. L. (1998). A menagerie of program visualization techniques. In J. Stasko, J.
Domingue, M. H. Brown, & B. A. Price (Eds.), Software visualization: Programming as a multimedia experience (pp. 73-79). Cambridge, MA: MIT Press.
Johnson, B., & Shneiderman, B. (1991). Tree-maps: A space-filling approach to the visualization of hierarchical information structures. Proceedings of IEEE Visualization ’91 Conference, 284-291.
Johnston, W. (2002). Model visualization. In U. Fayyad, G. G. Grinstein, & A. Wierse (Eds.), Information visualization in data mining and knowledge discovery (pp. 223-228). San Francisco: Morgan Kaufmann.
Jonassen, D. H., Beissner, K., & Yacci, M. A. (1993). Structural knowledge: Techniques for conveying, assessing, and acquiring structural knowledge. Hillsdale, NJ: L. Erlbaum.
Kalawsky, R. S. (1993). The science of virtual reality. Wokingham, UK: Addison-Wesley.
Keim, D. A. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 39-44.
Kieras, D. E., & Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt-Brace.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer-Verlag.
Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. Proceedings of the 14th International Conference on Machine Learning (ICML’97), 170-178.
Korn, F., & Shneiderman, B. (1995). Navigating terminology hierarchies to access a digital library of medical images (Technical Report HCIL-TR-94-03). College Park, MD: University of Maryland.
Lam, S. L. Y., & Lee, D. L. (1999). Feature reduction for neural network based text categorization. Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA ’99), 195-202.
Lamping, J., Rao, R., & Pirolli, P. (1995). A focus + context technique based on hyperbolic geometry for visualizing large hierarchies.
Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 401-408.
Lewis, D. D., & Ringuette, M. (1994). Comparison of two learning algorithms for text categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR’94), 81-93.
Lin, X., Soergel, D., & Marchionini, G. (1991). A self-organizing semantic map for information retrieval. Proceedings of the ACM SIGIR 14th Annual International Conference on Research and Development in Information Retrieval, 262-269.
Lowe, H. J., & Barnett, G. O. (1994). Understanding and using the Medical Subject Headings (MeSH) vocabulary to perform literature searches. Journal of the American Medical Association, 271, 1103-1108.
MacEachren, M. (1991). The role of maps in spatial knowledge acquisition. The Cartographic Journal, 28, 152-162.
Mackinlay, J. D. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5, 110-141.
Mackinlay, J. D., Rao, R., & Card, S. K. (1995). An organic user interface for searching citation links. Proceedings of the ACM Conference on Human Factors in Computing Systems, 67-73.
Marchionini, G. (1987). An invitation to browse: Designing full text systems for novice users. Canadian Journal of Information Science, 12(3), 69-79.
Masand, B., Linoff, G., & Waltz, D. (1992). Classifying news stories using memory based reasoning. Proceedings of the ACM SIGIR 15th Annual International Conference on Research and Development in Information Retrieval, 59-64.
McCallum, A., Nigam, K., Rennie, J., & Seymore, K. (1999). A machine learning approach to building domain-specific search engines. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'99), 662-667.
McCormick, B. H., Defanti, T. A., & Brown, M. D. (1987). Visualization in scientific computing. Computer Graphics, 21(6), 1-14.
Miller, A. I.
(1984). Imagery in scientific thought: Creating 20th century physics. Boston: Birkhäuser.
Miller, D. R. H., Leek, T., & Schwartz, R. M. (1999). A hidden Markov model information retrieval system. Proceedings of the ACM SIGIR 22nd Annual International Conference on Research and Development in Information Retrieval (SIGIR '99), 214-221.
Morse, E., & Lewis, M. (2000). Evaluating visualizations: Using a taxonomic guide. International Journal of Human-Computer Studies, 53, 637-662.
Munzner, T. (2000). Interactive visualization of large graphs and networks. Unpublished doctoral dissertation, Stanford University.
Ng, H. T., Goh, W. B., & Low, K. L. (1997). Feature selection, perceptron learning, and a usability case study for text categorization. Proceedings of the ACM SIGIR 20th Annual International Conference on Research and Development in Information Retrieval, 67-73.
Nielson, G. M. (1991). Visualization in science and engineering computation. IEEE Computer, 6(1), 15-23.
Niwa, T., Fujikawa, K., Tanaka, K., & Oyama, M. (2001). Visual data mining using a constellation graph. In S. J. Simoff, M. Noirhomme-Fraiture, & M. H. Bohlen (Eds.), Proceedings of the International Workshop on Visual Data Mining. Retrieved January 3, 2004, from http://www-staff.it.uts.edu.au/~simeon/vdm_pkdd2001/web-proceedings/03-niwa.pdf
Nonaka, I. (1994). A dynamic theory of organizational knowledge creation. Organization Science, 5(1), 14-37.
North, C., & Shneiderman, B. (2000). Snap-together visualization: Can users construct and operate coordinated visualizations? International Journal of Human-Computer Studies, 53, 715-739.
Nowell, L., Schulman, R., & Hix, D. (2002). Graphical encoding for information visualization: An empirical study. Proceedings of the IEEE Symposium on Information Visualization (INFOVIS'02), 43-50.
Olsen, K. A., Korfhage, R. R., & Sochats, K. M. (1993). Visualization of a document collection: The VIBE system. Information Processing & Management, 29, 69-81.
Ong, T.
H., Chen, H., Sung, W. K., & Zhu, B. (in press). NewsMap: A knowledge map for online news. Decision Support Systems.
Orwig, R., Chen, H., & Nunamaker, J. F. (1997). A graphical self-organizing approach to classifying electronic meeting output. Journal of the American Society for Information Science, 48(2), 157-170.
Parker, S., Johnson, C., & Beazley, D. (1997). Computational steering software systems and strategies. IEEE Computational Science & Engineering, 4(4), 50-59.
Pirolli, P., Card, S. K., & Van Der Wege, K. M. (2000). The effect of information scent on searching information visualizations of large tree structures. Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2000), 161-172.
Pohl, M., & Purgathofer, P. (2000). Hypertext authoring and visualization. International Journal of Human-Computer Studies, 53, 809-825.
Risden, K., Czerwinski, M. P., Munzner, T., & Cook, D. D. (2000). An initial examination of ease of use for 2D and 3D information visualizations of Web content. International Journal of Human-Computer Studies, 53, 695-714.
Robertson, G. G., Card, S. K., & Mackinlay, J. D. (1989). The cognitive co-processor for interactive user interfaces. Proceedings of UIST’89, ACM Symposium on User Interface Software and Technology, 10-18.
Robertson, G. G., Card, S. K., & Mackinlay, J. D. (1993). Information visualization using 3D interactive animation. Communications of the ACM, 36(4), 56-71.
Robertson, G. G., Mackinlay, J. D., & Card, S. K. (1991). Cone Trees: Animated 3D visualizations of hierarchical information. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 189-194.
Sack, W. (2000). Conversation Map: A content-based Usenet newsgroup browser. Proceedings of the 5th International Conference on Intelligent User Interfaces, 233-240.
Salton, G. (1989). Automatic text processing. Reading, MA: Addison-Wesley.
Salton, G., Allan, J., Buckley, C., & Singhal, A.
(1995). Automatic analysis, theme generation, and summarization of machine-readable text. Science, 264(3), 1421-1426.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with unknown distance function: Part I. Psychometrika, 27(2), 125-140.
Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6), 70-77.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualization. Proceedings of IEEE Workshop on Visual Languages, 336-343.
Simoff, S. J. (2002). VDM@ECML/PKDD2001: The International Workshop on Visual Data Mining at ECML/PKDD 2001. ACM SIGKDD Explorations Newsletter, 3(2), 78-81.
Smith, M., & Fiore, A. T. (2001). Visualization components for persistent conversation. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI’01), 136-143.
Standing, L., Conezio, I., & Haber, R. N. (1970). Perception and memory for pictures: Single trial learning of 2500 visual stimuli. Psychonomic Science, 19(2), 73-74.
Stanney, K. (1995). Realizing the full potential of virtual reality: Human factors issues that could stand in the way. Proceedings of IEEE Virtual Reality Annual International Symposium (VRAIS95), 28-34.
Stasko, J., Catrambone, R., Guzdial, M., & McDonald, K. (2000). An evaluation of space-filling information visualizations for depicting hierarchical structures. International Journal of Human-Computer Studies, 53, 663-695.
Stasko, J., Domingue, J., Brown, M. H., & Price, B. A. (1998). Software visualization: Programming as a multimedia experience. Cambridge, MA: MIT Press.
Strothotte, C., & Strothotte, T. (1997). Seeing between the pixels. Berlin: Springer-Verlag.
Tavanti, M., & Lind, M. (2001). 2D vs. 3D: Implications on spatial memory. Proceedings of the IEEE Symposium on Information Visualization (INFOVIS’01), 139-148.
Tegarden, D. P. (1999). Business information visualization.
Communications of the Association for Information Systems, 1, Article 4. Retrieved January 29, 2004, from http://cais.isworld.org/articles/default.asp?vol=1&art=4
Thearling, K., Becker, B., & Decoste, D. (2002). Visualizing data mining models. In U. Fayyad, G. G. Grinstein, & A. Wierse (Eds.), Information visualization in data mining and knowledge discovery (pp. 205-222). San Francisco: Morgan Kaufmann.
Tolle, K. M., & Chen, H. (2000). Comparing noun phrasing techniques for use with medical digital library tools. Journal of the American Society for Information Science, 51(4), 352-370.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.
Vivacqua, A. S. (1999). Agents for expertise location. Proceedings of the AAAI Spring Symposium on Intelligent Agents in Cyberspace, 9-13.
Voutilainen, A. (1997). A short introduction to Nptool. Retrieved December 12, 2003, from http://www.lingsoft.fi/doc/nttool/intro
Wactlar, H. D., Christel, M. G., Gong, Y., & Hauptmann, A. G. (1999). Lessons learned from the creation and deployment of a terabyte digital video library. IEEE Computer, 32(2), 66-73.
Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
Ware, C. (2000). Information visualization: Perception for design. San Francisco: Morgan Kaufmann.
Whittaker, S., Jones, Q., & Terveen, L. (2002). Managing long term communications: Conversation and contact management. Proceedings of the 35th Annual Hawaii International Conference on System Sciences, 115b.
Wiener, E., Pedersen, J. O., & Weigend, A. S. (1995). A neural network approach to topic spotting. Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval (SDAIR’95), 23-34.
Wise, J. A., Thomas, J. J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., & Crow, V. (1995).
Visualizing the non-visual: Spatial analysis and interaction with information from text documents. Proceedings of InfoVis’95, IEEE Symposium on Information Visualization, 51-58.
Wong, P. C. (1999, September/October). Visual data mining. IEEE Computer Graphics and Applications, 2-3.
Xiong, R., & Donath, J. (1999). Creating data portraits for users. Proceedings of the 12th Annual ACM Symposium on User Interface Software and Technology, 37-44.
Yufik, Y. M., & Sheridan, T. B. (1996). Virtual networks: New framework for operator modeling and interface optimization in complex supervisory control systems. Annual Reviews in Control, 20, 179-195.
Zhang, J. (1997). The nature of external representations in problem solving. Cognitive Science, 21(2), 179-217.
Zhu, B., & Chen, H. (2001). Social visualization for computer-mediated communication: A knowledge management perspective. Proceedings of the Eleventh Workshop on Information Technologies and Systems (WITS ’01), 23-28.