The Misrepresentation of Data – How graphs can lie


We rely heavily on statistics and data to justify our statements and arguments. However, this trust can sometimes be misused to manipulate readers into believing something false, so that the conclusion they draw is far from the actual truth. As data grows ever more important in a rapidly evolving digital world, it is vital to be familiar with how statistics and data can become misleading.

Many different methods are used to convey a specific message to an audience, often by emphasising or downplaying certain parts of the data. They range from manipulating the y-axis to hiding relevant data and obfuscating data, among many others. These methods are not necessarily used with intent and can be entirely accidental, but either way they can create misunderstandings.

The first method we discuss is manipulating the axis, either by truncating it or by expanding or compressing its scale. A truncated graph is one whose y-axis does not begin at 0, which can make a fairly insignificant change look like a very large difference. This emphasises a relatively small change and leads the viewer to overestimate its importance. Similarly, expanding or compressing the axis can make changes and trends in the graph seem more or less significant than they truly are.

In the example above, the same set of data is used in both graphs. In the graph on the left, however, the y-axis has been truncated: instead of starting at 0%, it starts at 48.50%. This makes the difference between the two groups seem much greater than it really is. The actual difference is a mere 1.80%, yet the truncated axis makes it look much larger. In the graph on the right, the y-axis is not truncated and starts at 0%, so the difference between the two groups can be seen accurately, showing just how small it is.
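To put a rough number on this effect, the sketch below compares how tall the two bars appear on each axis. Only the 48.50% baseline and the 1.80% gap come from the example; the group values of 50.00% and 51.80% are hypothetical, chosen to reproduce that gap. It assumes a bar's drawn height is simply its value minus the axis baseline.

```python
def drawn_height(value: float, baseline: float) -> float:
    """Length of a bar drawn from the axis baseline up to `value`."""
    return value - baseline


def visual_ratio(a: float, b: float, baseline: float) -> float:
    """How many times taller bar b looks than bar a on an axis starting at `baseline`."""
    return drawn_height(b, baseline) / drawn_height(a, baseline)


# Hypothetical group values, chosen only to match the 1.80% gap in the example.
group_a, group_b = 50.00, 51.80

truncated = visual_ratio(group_a, group_b, baseline=48.50)  # axis starts at 48.50%
full = visual_ratio(group_a, group_b, baseline=0.0)          # axis starts at 0%

print(f"Truncated axis: bar B looks {truncated:.2f}x as tall as bar A")
print(f"Full axis:      bar B looks {full:.3f}x as tall as bar A")
```

With these assumed values, the truncated axis makes the second bar look about 2.2 times as tall as the first, while the full axis shows it only about 3.6% taller.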

The next method is “cherry picking” data. This involves including only certain points on the graph and leaving out potentially relevant points that, if included, could completely change the story and trends the graph shows. In this way, the viewer may be misled into believing that the data supports a certain trend, when a completely different trend would appear if all the points were included.

In the example above, the same set of data has again been used. In the graph on the left, however, some of the data has been omitted, leading the viewer to believe there is a smooth, linear positive trend when this is not the case. The graph on the right shows that there are points at which the values fall. Such points may be removed to give the reader a more favourable visual impression and make it seem as if there has been a positive correlation throughout.
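The effect of cherry picking can be illustrated with a small sketch. The series below is hypothetical (it is not the data from the example), and the hypothetical `cherry_pick_rising` helper models one simple way of selecting points: keeping only those that set a new high, which silently discards every dip.

```python
from typing import List, Tuple

# Hypothetical (x, value) pairs with two dips; not the article's data.
series: List[Tuple[int, float]] = [
    (1, 10.0), (2, 12.0), (3, 11.0), (4, 14.0),
    (5, 16.0), (6, 13.0), (7, 18.0), (8, 20.0),
]


def cherry_pick_rising(points: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Keep only points that exceed every earlier value, dropping all dips."""
    kept = []
    best = float("-inf")
    for x, y in points:
        if y > best:
            kept.append((x, y))
            best = y
    return kept


def count_declines(points: List[Tuple[int, float]]) -> int:
    """Count consecutive pairs of points where the value goes down."""
    return sum(1 for (_, a), (_, b) in zip(points, points[1:]) if b < a)


picked = cherry_pick_rising(series)
print("All points:   ", series, "declines:", count_declines(series))
print("Cherry-picked:", picked, "declines:", count_declines(picked))
```

The full series contains two declines, while the cherry-picked subset rises at every step, even though every point it keeps is genuine.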

The third, and most common, method is distorting the data through graphic forms. This can mean using a graph format that is inappropriate for the type of data, or displaying the data in a way that does not follow conventions. In our daily lives, we are more likely to encounter this kind of misrepresentation than the others. Examples include using a third dimension where it is unnecessary, which can mislead us through the perspective from which we view the graph. It can also be done by overcomplicating the graph, which obfuscates the data and makes the graph harder to interpret, defeating the very purpose of a graph: to make statistical data easier to interpret.

Having covered some of the ways in which graphs can mislead us, we can now look at how to identify and quantify how distorted a graph is. One way to do this is the Lie factor, a measure introduced by Edward Tufte. The Lie factor of a graph is calculated as:

$$\text{Lie factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$

where the size of an effect is calculated as:

$$\text{size of effect} = \frac{\lvert \text{second value} - \text{first value} \rvert}{\text{first value}}$$

A graph with a Lie factor above 1 exaggerates the changes and differences contained in the data, whereas a Lie factor below 1 diminishes and obscures them. An accurate graph with no distortion has a Lie factor of 1.
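As a minimal sketch, the Lie factor can be computed as the size of effect shown in the graphic divided by the size of effect in the data, where each size of effect is |second value − first value| / first value. The function names are my own, and the inputs are assumptions: hypothetical group values of 50.00% and 51.80%, drawn as bars on an axis starting at 48.50% as in the truncated-axis example. The "shown" values are taken to be the visible bar lengths measured from the baseline.

```python
def size_of_effect(first: float, second: float) -> float:
    """Size of effect: |second - first| / first."""
    if first == 0:
        raise ValueError("the first value must be non-zero")
    return abs(second - first) / first


def lie_factor(data: tuple, shown: tuple) -> float:
    """Effect shown in the graphic divided by the effect actually present in the data."""
    data_effect = size_of_effect(*data)
    if data_effect == 0:
        raise ValueError("the data shows no change, so the Lie factor is undefined")
    return size_of_effect(*shown) / data_effect


# Hypothetical group values (only the 1.80% gap and 48.50% baseline come from the text).
values = (50.00, 51.80)

# On an axis starting at 48.50%, each bar's visible length is value - baseline.
baseline = 48.50
truncated_lengths = (values[0] - baseline, values[1] - baseline)

# On an axis starting at 0%, each bar's visible length equals its value.
full_lengths = values

lf_truncated = lie_factor(values, truncated_lengths)
lf_full = lie_factor(values, full_lengths)

print(f"Truncated axis Lie factor: {lf_truncated:.1f}")
print(f"Full axis Lie factor:      {lf_full:.1f}")
```

Under these assumptions, the truncated graph has a Lie factor of roughly 33, showing the difference about 33 times larger than it really is, while the graph starting at 0% scores exactly 1.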

We should always look critically at the visual representations around us so that we do not draw false conclusions from potentially misleading graphical impressions.


