YiddisheViz
Excellence in statistical graphics is the communication of complex ideas with clarity, precision and efficiency. Graphics should reveal the data.
The principles of excellence are:
Unfortunately, graphics can be used to distort the truth. Data graphics are no different from words in this regard: any form of communication can be used to deceive. Luckily, graphical lies are systematic and quite predictable, nearly always exaggerating the rate of recent change. In any case, graphical excellence begins with telling the truth about the data.
A graphic does not distort if the visual representation of the data is consistent with the numbers it represents. The amount of distortion can be measured by the following formula:
Lie Factor = size of the effect shown in the graphic / size of the effect in the data
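As a rough illustration, the formula can be sketched in a few lines of Python. This sketch assumes that the "size of the effect" is the relative change between two values, which is one common reading of the formula; the function names and the numbers are hypothetical and chosen only for the example.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second (assumed definition)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no effect, so the ratio is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 2 cm and 8 cm tall (a 300% increase).
print(lie_factor(100, 150, 2.0, 8.0))  # 6.0
```

A lie factor of 1 means the graphic shows the effect at its true size; the further the value is from 1, the greater the distortion.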
Another important principle of graphical integrity is context. Each part of a graphic generates visual expectations about its other parts, and these expectations often determine what the eye sees. For example, a timeline that moves in one-year intervals is expected to continue that way, even if some larger or smaller intervals are thrown in, so irregular intervals mislead the eye. Therefore, show data variation, not design variation. At its core, the principle is that graphics must not quote data out of context.
Unfortunately, nearly all those who produce graphics for mass publication are trained exclusively in the fine arts and have had little experience with the analysis of data. A general dislike of quantitative evidence, combined with contempt for the intelligence of the audience, guarantees graphical mediocrity. Substantive and quantitative expertise must also participate in the design of data graphics.
Graphical sophistication comes when graphs are relational. Such a design links two or more variables, not counting time. Everything changes over time, so a time series by itself is not a good explanatory variable: chronology is descriptive, not a cause-and-effect explanation. Relational design is essential for competent statistical analysis, since it confronts statements about cause and effect with evidence, showing how one variable affects another. A good application is to add a spatial dimension to the design of a graphic, so that the data move over space as well as time (for example, a ‘small multiple’ graph).
Statistical graphics are instruments to help people reason about quantitative information. The fundamental principle of good statistical graphics is: above all else, show the data. How well a graphic does this can be measured through its data-ink, the non-erasable core of the graphic: the non-redundant ink arranged in response to the variation in the numbers it represents. When designing a graphic, you should aim to maximise its data-ink ratio:
Data-Ink Ratio = data-ink / total ink used to print the graphic
Therefore, one should aim to erase non-data-ink, within reason, to maximise the data-ink ratio. In fact, you should even try to erase redundant data-ink.
That being said, there remain many other considerations in the design of statistical graphics that might mean you should not erase: not only efficiency, but also complexity, structure, density and even beauty.
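The data-ink ratio defined above can be sketched numerically. The example below is only an illustration: it assumes the ink used by each element of a graphic has somehow been measured (in arbitrary units), and the element names and amounts are hypothetical.

```python
def data_ink_ratio(ink_by_element: dict, data_elements: set) -> float:
    """Share of the total ink that is spent on elements carrying data."""
    total_ink = sum(ink_by_element.values())
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    data_ink = sum(amount for name, amount in ink_by_element.items()
                   if name in data_elements)
    return data_ink / total_ink


# Hypothetical ink measurements for one chart, in arbitrary units.
ink = {"data_points": 30.0, "axis_ticks": 10.0, "grid_lines": 40.0, "frame": 20.0}
before = data_ink_ratio(ink, {"data_points"})

# Erase the grid and the frame, which carry no data in this example.
trimmed = {name: amount for name, amount in ink.items()
           if name not in {"grid_lines", "frame"}}
after = data_ink_ratio(trimmed, {"data_points"})

print(before, after)  # 0.3 0.75
```

Erasing the two non-data elements raises the ratio from 0.3 to 0.75 without touching the data themselves.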
Interior decoration of graphics generates a lot of ink that tells the viewer nothing new. This can be referred to as chart junk, and should be removed wherever possible. Examples include moiré vibration and unnecessary grids (a grid that is necessary should be muted, lest it compete with the data).
The same ink should often serve more than one graphical purpose. A graphical element may carry data information and also perform a design function usually left to non-data-ink. The principle, then, is to mobilize every graphical element, perhaps several times over, to show the data.
Data graphics can also be read in multiple ways. Central to maintaining clarity with complex data are graphical methods that organize and order the flow of information presented to the eye. How can the data information be arranged so that the viewer is able to peel back layer after layer of data from a graphic? The answer is multiple viewing depths and multiple viewing angles. Depths include:
Graphics should be reserved for the richer, more complex, more difficult statistical material. If something can be summarised in one or two numbers, a graphic is not needed. Tables usually outperform graphics in reporting on small data sets of 20 numbers or fewer. Data maps are a good example of expressing complex data graphically: they place millions of bits of information on a single page before our eyes.
An empirical measure of graphical performance is data density.
Data Density of a display = number of entries in a table / area of data display
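A minimal sketch of this measure follows. The function name, the units (square centimetres) and the two example displays are assumptions made for illustration only.

```python
def data_density(n_entries: int, width: float, height: float) -> float:
    """Number of data entries shown per unit area of the display."""
    area = width * height
    if area <= 0:
        raise ValueError("display area must be positive")
    return n_entries / area


# Hypothetical displays, both 10 cm wide and 5 cm tall (50 square cm).
sparse = data_density(4, 10.0, 5.0)     # a chart showing only four numbers
dense = data_density(1000, 10.0, 5.0)   # a map-like display of 1,000 numbers

print(sparse, dense)  # 0.08 20.0
```

The sparse chart carries 0.08 numbers per square centimetre, which is the kind of material that would usually fit better in a table or a sentence.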
High-density displays can be genuinely interactive, allowing viewers to select, narrate, recast, and personalise the data for their own thinking. Thus the control of information is given over to the viewers. Simple things belong in tables or in the text; graphics can give a sense of large and complex data sets that cannot be managed in any other way. The principle is: maximise data density, within reason.
One way to achieve this is to shrink the graphics down, which leads to a powerful technique known as small multiples. These resemble the frames of a movie: a series of graphics showing the same combination of variables, indexed by changes in another variable.
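The data preparation behind small multiples can be sketched as follows: split the records into one panel per value of the indexing variable, and keep the same two plotted variables in every panel. Sharing axis limits across panels is an added assumption here (it makes the frames directly comparable), and the function name, field names and records are all hypothetical.

```python
from collections import defaultdict


def small_multiples(records, index_key, x_key, y_key):
    """Group records into panels by index_key and compute shared axis limits."""
    if not records:
        raise ValueError("at least one record is required")
    panels = defaultdict(list)
    for record in records:
        panels[record[index_key]].append((record[x_key], record[y_key]))
    xs = [record[x_key] for record in records]
    ys = [record[y_key] for record in records]
    # Assumption: every panel uses the same limits so frames can be compared.
    shared_limits = {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}
    return dict(sorted(panels.items())), shared_limits


# Hypothetical records: the same two variables, indexed by year.
records = [
    {"year": 2001, "rainfall": 10, "yield": 2.0},
    {"year": 2001, "rainfall": 30, "yield": 3.5},
    {"year": 2002, "rainfall": 20, "yield": 2.5},
    {"year": 2002, "rainfall": 40, "yield": 4.0},
]
panels, limits = small_multiples(records, "year", "rainfall", "yield")

for year, points in panels.items():
    print(year, points)
print(limits)
```

Each entry in `panels` would be drawn as one small frame; any plotting library could take it from there.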
The basic structures for showing data are the sentence, the table and the graphic. The conventional sentence is a poor way to show more than two numbers because it prevents comparisons within the data. Tables work well when the data presentation requires many localized comparisons.
Words and pictures or tables belong together. Viewers need the help that words can provide, so words on data graphics are data-ink. The principle of data-text integration is that data graphics are paragraphs about data and should be treated as such. Tables and graphics should therefore be run into the text wherever possible. A word of caution is necessary: for graphics meant for data exploration, words should tell the viewer how to read the design, not what to read in it.
Another point worth mentioning is that the use of two or three varying dimensions to show one-dimensional data is a weak and ineffective technique (for example, the dreaded 3D bar chart). The number of information-carrying dimensions depicted should not exceed the number of dimensions in the data.
Be wary of colour coding. The translation from visual to verbal needs to be quickly learned, automatic and implicit, so that the visual image flows right through the verbal decoder initially necessary to understand the graphic. As Paul Valery wrote, “Seeing is forgetting the name of the thing one sees”. Colour often achieves the opposite: attempts to give colours an order result in verbal decoding and the mumbling of little mental phrases. Grey scale does have a visual hierarchy, and is therefore much better.
Other design choices that help readability include: