This post is part of a series on visualizing data.
Selecting an appropriate chart type to present data visually is one of three critical steps in creating an effective visual presentation of information. The second step is to provide supporting details about the data. Axis labels, legends, units, and other elements deliver the context within which to interpret results. A lack of such details forces the audience to make assumptions about the data or leaves them confused; in either case, it is a failure of clear communication. Here we discuss the building blocks of graphs and how to implement them effectively, and we illustrate the suggestions with a couple of examples. The figure above, inspired by Stephen Few, illustrates the elements that compose various charts and that we reference throughout this discussion.
Axis, category and unit labels
Axes reflect what kind of information is presented and ought to be labeled. Numerical axis labels should include the units the data are measured in. For example, if company revenue across different continents is reported, do the numbers represent revenue in dollars, yen, rubles, or some other currency? When a measure from a standard tool, such as a Net Promoter Score or System Usability Scale (SUS), is reported, the metric should be referenced in the axis label in lieu of units.
An axis that represents data categories does not necessarily need an axis label, but it must include category labels if more than one data point is shown. For instance, if monthly revenue is presented, months would serve as category labels for each numerical value shown in the chart, while the axis label (“Months”) can be omitted. When a single data point is shown, either an axis label or a category label is sufficient. See elements B through G in the figure above.
Scale, numerical axis and tick marks
A scale divides an axis into equal segments, and tick marks denote those even segments of the scale. Tick marks on quantitative scales establish where on the axis specific number values are placed. Intervals that are nice round numbers, such as 10s or 100s, make the chart easy to read. Tick marks should be designed to minimize visual clutter while still allowing the reader to quickly match a data point to its value. A good test for too few tick marks is whether the audience can quickly and easily extract an approximate value of a particular data element in the chart. If not, then the number of tick marks should be increased.
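The idea of round tick intervals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `nice_step` and `tick_values`, the default target of about five intervals, and the choice of 1, 2, or 5 times a power of ten as "round" steps are ours, not something prescribed by the guidelines above.

```python
import math

def nice_step(data_max, target_ticks=5):
    """Pick a round tick interval: 1, 2, or 5 times a power of ten.

    Assumption: target_ticks is the approximate number of intervals
    we would like between 0 and data_max.
    """
    if data_max <= 0 or target_ticks < 1:
        raise ValueError("data_max must be positive and target_ticks at least 1")
    raw_step = data_max / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Take the smallest round step that is at least as large as the raw step.
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            return step
    return 10 * magnitude  # fallback for floating-point edge cases

def tick_values(data_max, target_ticks=5):
    """Return tick positions from 0 up to the first tick at or above data_max."""
    step = nice_step(data_max, target_ticks)
    count = math.ceil(data_max / step)
    return [i * step for i in range(count + 1)]

# Hypothetical data maximum of 87: ticks fall on multiples of 20.
print(tick_values(87))
```

Raising `target_ticks` produces more, finer tick marks, which is the adjustment to make when readers cannot estimate values from the chart.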
With very few exceptions, a numerical axis should include 0 so that values are represented appropriately. Manipulating the size of scale increments or the minimum and maximum points of the scale distorts how the data appear. If the goal is to demonstrate a difference between data points, appropriate statistical procedures, rather than manipulation of numerical axes, should be used as supporting evidence.
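A small piece of arithmetic shows why a truncated axis distorts a bar chart. The sketch below compares the ratio of two values with the ratio of the bar lengths a reader would actually see; the function name `drawn_ratio` and the values 100, 90, and 80 are hypothetical and chosen only for illustration.

```python
def drawn_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the drawn bar lengths when the numerical axis starts at axis_min."""
    if axis_min >= min(value_a, value_b):
        raise ValueError("axis_min must be below both values")
    return (value_a - axis_min) / (value_b - axis_min)

# Axis starts at 0: bar lengths are proportional to the values themselves.
true_ratio = drawn_ratio(100, 90)

# Axis starts at 80: the bars are drawn with lengths 20 and 10.
truncated_ratio = drawn_ratio(100, 90, axis_min=80)

print(round(true_ratio, 2), truncated_ratio)
```

With the axis starting at 0, the first bar is about 11% longer than the second; with the axis starting at 80, it is drawn twice as long, even though the data have not changed.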
Tick marks are also used on categorical axes. However, because category labels already denote distinct data points, tick marks serve no additional purpose and should be eliminated to reduce visual clutter. See elements G and J on the figure above.
When more than one category of data is presented in a chart, a legend explains what the different colors or patterns represent. Occasionally, variables are labeled directly on the chart, removing the need for a legend. Line graphs can often be shown without a legend by labeling categories next to the lines that represent them. However, the choice to label categories directly on the graph should be made after considering the complexity of the data and the graph design; because a legend reduces visual clutter, it is often a better choice than labeling categories on a chart. A legend can be omitted when a single category is shown and category labels are used to describe the data.
Legends should be placed close enough to the data components for easy reference, but in a way that does not interfere with the data shown. Moreover, because they serve a supporting role to the graph, legends should be designed to look less prominent than data elements. For example, they should have no borders around them and should be fairly small while still legible. See element H in the figure above.
The title communicates what information is presented in the chart. An effectively worded title removes the need for a subtitle that can visually clutter the chart. Whenever possible, the title should state the main takeaway of the data rather than give a general description of what is shown in the graph. For example, while “Mobile device breakdown in Houston” informs the reader what the data are, the title “Majority of Houston residents have iPhones” expands that description into a summary of results. Titles are typically placed at the top of the graph and should be positioned close to the information they describe without interfering with it. See element A in the figure above.
Graphs occasionally include additional elements, such as confidence intervals or data labels, that may not be necessary in every instance. When designing the chart, consider whether these components serve any real purpose, or whether they only add visual clutter.
Tick marks are usually sufficient to reference a data point to its value when graphs are fairly simple and show a small set of data. However, when charts are wide and contain large sets of data, grid lines help guide the eye between data categories and their numerical values. In large graphs, grid lines also help increase precision. Moreover, grid lines can be used when the goal is to highlight small differences between data values. In the cases where grid lines are used, they should be designed to appear less prominent than the data, and used sparingly to avoid visual clutter in the graph. See element L on the figure.
Reference lines and zones
A reference line or zone can provide context for the data by visually comparing it to some predetermined value. Reference lines or zones are especially useful when the goal is to show data deviations from the norm or to highlight that a benchmark was met or surpassed. For instance, when presenting a software usability score (as measured by SUS) to a stakeholder who is unfamiliar with the metric, highlighting the desirable range for such scores will help her meaningfully evaluate the reported results.
Reference lines can also be used to highlight significant events during the period of data recording that may have impacted the results. For example, e-commerce website traffic may significantly increase at the start of a marketing campaign and drop off soon after the campaign wraps up; marking such an event on a website traffic report can help stakeholders make informed business decisions about their marketing strategy and website. See element M on the figure above.
A trend line on a graph shows an overall change in the pattern of data across time or some other variable represented by the horizontal axis. Trend lines can be useful for highlighting a pattern in the data that may otherwise be obscured by individual data points. For example, if a particular stock price varies drastically over a period of time, it might be hard to gauge whether its overall performance is improving. A trend line, in this case, could show a decline, improvement, or no change in price over time. However, if the data points themselves already show a clear pattern, then trend lines only add visual clutter and should be omitted from the graph. If trend lines are used, and if the overall pattern of data is the main takeaway of the chart, the trend line should be visually highlighted as more important than individual data. Alternatively, if the trend line is secondary to the graph, it should be visually subtle. See element K in the figure above.
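One common way to obtain a trend line is an ordinary least-squares fit of a straight line, sketched below. This is one possible method rather than the only one, and the function name `fit_trend_line` and the price series are hypothetical values made up for illustration.

```python
def fit_trend_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical stock prices over six periods: noisy, but drifting upward.
periods = [0, 1, 2, 3, 4, 5]
prices = [10, 14, 9, 15, 11, 16]

slope, intercept = fit_trend_line(periods, prices)
# Endpoints of the trend line to draw across the chart.
trend_start = intercept + slope * periods[0]
trend_end = intercept + slope * periods[-1]
```

A positive slope indicates an overall increase even when consecutive points move in opposite directions, which is the situation in which a trend line earns its place on the chart.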
Ranges and error bars
In some cases, variations in data values, as reflected by such measures as standard deviation or confidence interval, may be required in the graph. Visually, a range or an error bar is shown as a horizontal or vertical line extending past the data point. Whether such detail ought to be included in the chart is dictated by research methods, statistical analyses, type of data reported, and by the message the graph is expected to communicate. If the range of values is more important than individual data points, then error bars should be visually prominent. Otherwise, error bars ought to be subtle.
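As a rough sketch of where the length of an error bar can come from, the code below computes a sample mean and the half-width of a confidence interval around it. It assumes a normal approximation, which is our simplification: for small samples a t-based interval would be somewhat wider, and the appropriate measure depends on the research methods and analysis. The function name and the ratings are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_error_bar(sample, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    center = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center, z * standard_error

# Hypothetical preference ratings on a 0 to 10 scale.
ratings = [6, 7, 8, 7, 9, 5, 8, 6]
center, half_width = mean_with_error_bar(ratings)

# The error bar would extend from lower to upper around the plotted mean.
lower, upper = center - half_width, center + half_width
```

Whether the bar is drawn prominently or subtly is then a design decision, as described above.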
Data labels show specific value information for each data point on the chart. The purpose of graphs is to visually showcase patterns in the data and not to communicate numerical precision; tables are much better suited for the latter. Therefore, because tick marks on a numerical scale and grid lines already allow the reader to quickly match data points to their values, data labels are redundant and add visual clutter to the chart. In the rare case when numerical data are presented on a chart without a quantitative axis, data labels can be used. In such instances, ensure that numbers are rounded to remove unnecessary precision, and position data labels relative to the data points in a way that makes it easy to read and reference the information quickly without cluttering the chart. See element N in the figure above.
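Rounding data labels can be as simple as keeping a fixed number of significant digits. The helper below is a minimal sketch; the name `round_label`, the default of two significant digits, and the use of thousands separators are our assumptions, and the right level of precision depends on the data and the audience.

```python
import math

def round_label(value, significant_digits=2):
    """Format a value as a data label rounded to a few significant digits."""
    if value == 0:
        return "0"
    # Position of the leading digit, e.g. 6 for 1,234,567 and 0 for 7.38.
    leading = math.floor(math.log10(abs(value)))
    decimals = significant_digits - 1 - leading
    rounded = round(value, decimals)
    # Show decimal places only when the rounding kept any.
    return f"{rounded:,.{max(decimals, 0)}f}"

# Hypothetical values as they might appear in a raw data export.
labels = [round_label(v) for v in (1234567.89, 48700, 7.38)]
print(labels)
```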
Do’s and don’ts in practice
Many options are available in terms of chart type, layout, style, and supporting elements. Here we illustrate a couple of use cases that contrast poor and improved graph design.
Case 1: Bar graph
Axis and chart titles on the right-side graph provide focused information about what the data show. Specifically, the graph on the right specifies that preference ratings were measured on a scale from 0 to 10. Additionally, the title on the right summarizes the data in a single takeaway message. As shown on the left-side graph, slanted category labels can create a rough visual edge, especially when label length varies, and can be harder to read than horizontally placed labels; whenever possible, opt for the latter. Because only three data points are represented, it is easy to extract their values by referencing the numerical axis. In the example on the left, the grid lines add unnecessary visual clutter and can be completely removed from the graph. Additionally, the scale on the numerical axis can be simplified to include tick marks for every 2 points rather than every 1 point. Finally, error bars on the right-side graph are less visually salient than on the left, but still provide information about variability in preference ratings in the sample.
Case 2: Line graph
As in the previous example, the graph on the right has more descriptive axis and chart titles that provide focused information about the presented data. A reference line on the right marks the occurrence of a business event, which provides additional context within which the audience can interpret the data. Data labels clutter the chart on the left side and are not really necessary. As shown on the right side, each data point can be quickly aligned with its value by using the grid lines. The scale on the numerical axis can also be adjusted to show larger scale increments. Tick marks on the categorical axis, as shown on the left, serve no purpose and can be removed to reduce visual clutter. Circles on the right graph highlight exactly where on the line the data points are placed, and allow an easy reference between a point and the month it is associated with on the categorical axis. Finally, labeling data directly on the chart removes the need for a legend and reduces visual clutter.
Once the appropriate chart type to present data visually is selected and supporting details about the data are provided, the last step in effective visual communication with graphs is to carefully design these charts. Design ensures not only that the graphs look aesthetically appealing, but also, and more importantly, that the intended message is communicated. For example, color or texture can be used to highlight a particular data value from a larger set. Similarly, a color gradient can emphasize change in data, such as increase or decrease in cost of operations at a business, over time. A poorly designed chart may not only fail to communicate the intended message, but may also mislead the audience. In the business realm, visualized data are used to inform decisions; hence, graphs that are unclear or misleading can negatively impact operations and the bottom line. Successful data visualization stems from the synthesis of appropriate research methods, analysis, and keen application of design principles.