Measuring the Cognitive Overload Factor of a Data Visualization

A simple empirical method to compute COF, illustrated

Murali Kashaboina
Towards Data Science


TL;DR: An empirical method to compute the cognitive overload factor of data visualization using an evaluation rubric is proposed. The evaluation criteria and scoring method used by the rubric are explained. An example data visualization is used to demonstrate the method. Further extensions and customization of the evaluation rubric are recommended.

Data Visualization

Data visualization transforms raw data into visual representations such as graphs and charts to effectively communicate the information of interest to target audiences, information that would otherwise be hard to interpret directly from the raw data. Representing raw data in visual form narrates a story to target audiences through data findings, summarized data views, synthesized information, quantitative and qualitative trends, and other patterns mined from the data.

One crucial aspect of compelling storytelling is reducing cognitive overload. Cognitive overload occurs when the volume of information supplied exceeds an individual's information processing capacity. Essentially, cognitive overload happens when the information presented triggers excessive thinking on the part of the individual trying to synthesize and comprehend it. Such overburdening of mental capacities leads to severe discomfort, frustration, and inefficient decision-making. What matters most is how easy or hard it is for the target audience to process the visual information cognitively. The presence of clutter in the visual, the inability of the visual to present the intended information clearly, the lack of well-designed aesthetics, and other visual deficiencies can cause significant cognitive overload for the consumer, defeating the purpose of visualization.

Primary Challenge

The primary challenge is the lack of a method to measure the cognitive quality of visualized data. The absence of an objective method makes interpreting cognitive quality subjective, because perceived cognitive overload differs significantly from one individual to another.

Proposed Approach

The proposed approach is an empirical method using a simple evaluation rubric to compute the cognitive overload quantitatively. The consequent empirical cognitive overload factor (COF) is a numeric value on a ratio scale between 0 and 1.

Evaluation Rubric

A rubric is an assessment and scoring tool that defines specific evaluation criteria, markers to measure the levels to which the evaluation criteria are met, and scoring scales to grade the measuring markers. A rubric is a reliable tool for assessing an object of interest using specific and highly relevant evaluation criteria. Therefore, the evaluation criteria in a rubric are critical to the effectiveness of the use of the rubric.

The developed rubric contains nine evaluation criteria across the rows and five measuring markers across the columns, forming a 9x5 evaluation grid. The five column markers measure the degree to which an evaluation criterion meets expectations: not addressed, does not meet expectations, approaches expectations, meets expectations, and exceeds expectations. Each marker is assigned a score on a 0-to-4 scale: 0 for 'not addressed,' 1 for 'does not meet expectations,' 2 for 'approaches expectations,' 3 for 'meets expectations,' and 4 for 'exceeds expectations.' The explanation of each measurement marker applied to each evaluation criterion is defined in the rubric's corresponding cell, as shown in Figure 1.
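The 9x5 grid and its 0-to-4 scale translate naturally into a small data structure. Below is a minimal sketch in Python; the criterion names are shorthand labels I have chosen for illustration, not the rubric's official wording:

```python
# Marker-to-score mapping on the 0-to-4 scale described above.
MARKER_SCORES = {
    "not addressed": 0,
    "does not meet expectations": 1,
    "approaches expectations": 2,
    "meets expectations": 3,
    "exceeds expectations": 4,
}

# Shorthand labels for the nine rubric criteria (illustrative names).
CRITERIA = [
    "similarity", "proximity", "closure", "enclosure", "connection",
    "continuity", "clutter", "clear communication", "aesthetics",
]

def score_visualization(ratings):
    """Sum the marker scores for a dict mapping each criterion to a marker."""
    missing = set(CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    return sum(MARKER_SCORES[ratings[c]] for c in CRITERIA)
```

For example, a visualization rated 'meets expectations' on all nine criteria would secure a total score of 9 x 3 = 27.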

Figure 1: A Rubric for Evaluating Cognitive Overload Factor of Data Visualizations

Evaluation Criteria

The nine main criteria, shown in the first column of Figure 1, evaluate a data visualization on the six Gestalt Principles of visual perception (similarity, proximity, closure, enclosure, connection, and continuity), plus the presence of clutter in the visual, the ability of the visual to present the intended information clearly, and the design aesthetics of the visual representation.

Gestalt Similarity Principle Criterion
The Gestalt Similarity Principle criterion assesses if the visualization groups similar elements using similar sizes, shapes, and colors. Such grouping of similar elements helps humans easily identify the similarities and differences in visual representations.

Gestalt Proximity Principle Criterion
The Gestalt Proximity Principle criterion assesses if the visual presents elements of the same group close to each other. Such proximity helps humans classify elements placed closer together as belonging to the same group.

Gestalt Closure Principle Criterion
The Gestalt Closure Principle criterion evaluates if the elements of the visual are presented using complete shapes. Past studies have shown that humans prefer complete shapes when comprehending an element. If an element is represented using incomplete shapes, humans tend to fill in the gaps to interpret it, causing unexpected cognitive overload.

Gestalt Enclosure Principle Criterion
The Gestalt Enclosure Principle criterion assesses if the visual presents elements of the same group enclosed or bound together. Such enclosures help humans easily perceive elements belonging together, easing cognitive comprehension.

Gestalt Connection Principle Criterion
The Gestalt Connection Principle criterion evaluates if the visual presents elements of the same group as connected. Such connections help human audiences easily classify elements belonging to the same group.

Gestalt Continuity Principle Criterion
The Gestalt Continuity Principle criterion evaluates if elements that appear to be continuations of one another form a complete image. Such visualization helps human audiences perceive the complete image formed from continuous elements. Any discontinuity results in humans trying to complete the image themselves, causing unwanted cognitive overload.

Presence of Clutter Criterion
This criterion evaluates if the visual is too cluttered. The visual can become cluttered because of unnecessary chart borders, grid-lines, data labels, markers, noisy axis labels, inconsistent color schemes, etc. Clutter causes cognitive overload and distracts the audience from the primary information.
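As an illustration of removing such clutter, here is a minimal sketch using matplotlib with synthetic data. The particular styling choices (hiding the top and right spines, dropping gridlines and tick marks) are one possible declutter pass, not a prescription:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 14, 12, 18], color="tab:blue")

# Hide the chart borders (spines) that add no information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Drop gridlines and tick marks to reduce visual noise.
ax.grid(False)
ax.tick_params(length=0)

# One clear, left-aligned title instead of scattered data labels.
ax.set_title("Monthly trend", loc="left")
fig.savefig("decluttered.png")
```

Each removed element is one fewer thing competing with the primary information for the audience's attention.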

Clear Communication of Intended Information
This critical criterion evaluates if the visualization presents data by highlighting the important content, reducing the noise and eliminating distractions, and creating a clear hierarchy of information flow.

Furthermore, the visual should address the primary information topic’s ‘who,’ ‘what,’ and ‘how’ attributes. The ‘who’ attribute helps identify the target audience early in the process and helps understand the specific needs of those target audiences that data visualization should address. The narrower the target audience, the more specific and compelling the narrated data story can be. The ‘what’ attribute helps understand what information the target audience expects and what actions and decisions are expected from the target audience through the visualized data. Such a ‘what’ aspect helps realize the true purpose of visualizing data in tune with the target audience’s expectations. The ‘how’ attribute helps address how the available data will be processed and transformed to highlight the points of interest for the audiences through the data visualization.

Overall Aesthetics Criterion
The Overall Aesthetics criterion evaluates the thoughtful use of color schemes, consistent vertical and horizontal alignment of elements, and optimal use of white space so the visual is not cluttered with noisy information. Aesthetically designed visuals can have a pleasing impact on the target audience, reducing cognitive overload.

Cognitive Overload Factor (COF)

The COF of a data visualization can be calculated from its secured total score (S) on the rubric. The maximum score any data visualization can secure is 36 (nine criteria at a maximum of 4 points each). The proposed empirical formula for COF is:

COF = 1 - (Secured Total Score / Maximum Score) = 1 - (S / 36)
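The formula translates directly into code. A minimal sketch, with an illustrative function name:

```python
MAX_SCORE = 36  # 9 criteria x maximum marker score of 4

def cognitive_overload_factor(secured_score, max_score=MAX_SCORE):
    """Compute COF = 1 - S / max_score on the [0, 1] ratio scale."""
    if not 0 <= secured_score <= max_score:
        raise ValueError("secured score must lie between 0 and the maximum")
    return 1 - secured_score / max_score
```

A perfect score of 36 yields a COF of 0.0, while a score of 0 yields the worst COF of 1.0.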

A COF close to 0 implies negligible cognitive overload. A COF close to 1 indicates the worst cognitive overload. A COF of about 0.3 could be optimal since cognitive overload may not be eliminated entirely. Further experimental evaluations are needed to determine a more accurate optimal COF.

COF Evaluators

It is recommended that the COF of a data visualization be evaluated by its specific target audiences or consumers. Target audiences are better suited because they know the visualization’s ‘who,’ ‘what,’ and ‘how’ attributes and the overall visualization context. Individuals from the target group can evaluate COF independently, and an average of all the evaluated values can be considered an estimated COF.

Example Evaluation

Figure 2 shows an example data visualization comparing monthly and yearly temperatures in Chicago, Illinois. The visualization has two sections. The first section is a bubble chart showing the months with the highest average temperatures in 2019. The second section is a heat map showing the months with the highest recorded temperatures between 2009 and 2019.

Figure 2: Comparison of Monthly and Yearly Temperatures in Chicago, Illinois

The target audience of the example visualization is data scientists who are specifically interested in Chicago's historical temperatures. They have the specific context for such temperature information and are therefore well suited to evaluate the COF of the example data visualization effectively.

Evaluation Using the Rubric
A data scientist who belongs to the target audience group could use the rubric to score the above visualization, as shown in Figure 3. The hypothetical total score evaluated by such a data scientist is 23. Likewise, other data scientists' hypothetical scores could be 18, 27, 21, 24, and 20. The average score is 22.17. Therefore, the estimated COF using the empirical formula is 0.3843, implying that the cognitive overload is close to optimal and that the visualization can still be enhanced to reduce the COF further.
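The hypothetical numbers above can be reproduced with a short calculation, averaging the six evaluators' rubric totals and applying the empirical COF formula:

```python
# Hypothetical rubric totals from six data scientists in the target audience.
scores = [23, 18, 27, 21, 24, 20]

average_score = sum(scores) / len(scores)  # arithmetic mean of the totals
estimated_cof = 1 - average_score / 36     # empirical COF formula

print(round(average_score, 2))  # 22.17
print(round(estimated_cof, 4))  # 0.3843
```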

Figure 3: Hypothetical Evaluation By A Data Scientist

Recommendations

The COF evaluation rubric suggested in this article used the Gestalt Principles of visual perception, the presence of clutter in the visual, the ability of the visual to present the intended information clearly, and the design aesthetics of the visual representation. The rubric formed a simple 9x5 evaluation grid with nine evaluation criteria across the rows and five measuring markers across the columns. The scoring method was intentionally kept simple to help frame a COF quantification model and demonstrate the concept. It is recommended that interested readers extend this model further with additional finer criteria and customize it based on specific needs.

References

Best practices for data analytics reporting lifecycles: Quality in report building and data validation. (2018). Journal of AHIMA, 89(9), 40–45.

Carter, M. (2013). Designing science presentations: A visual guide to figures, papers, slides, posters, and more. Elsevier Science & Technology.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. Wiley.

Vetter, T. R. (2017). Descriptive statistics: Reporting the answers to the 5 basic questions of who, what, why, when, where, and a sixth, so what? Anesthesia & Analgesia, 125(5), 1797–1802.
