Graphing and the Visual Presentation of Data
by Justin Daniel Meyer
version: 09Sep05

Part 1: Introduction and Basics
Part 2: Graph Elements
Part 3: Graph Types
Part 4: Putting the Graph Together
Part 5: Conclusion and Other Resources

 

Tables: Data and Figures

While graphs and charts are good for presenting and analyzing many types of data, sometimes a good table is the best way to go. This is commonly the case with text-based data points: the elements, sections of a class, rating a set of products over several categories, a project risk analysis matrix for the boss, et cetera.

The use of a table permits several layers of information to be grouped at one level for use by the viewer. In the example below, E-321 Section Schedule, Fall 2002, we have section name, TA, section status (gray = full, yellow = open), time, and date. We also have some additional data in a related secondary table below the first. That's 5 fields of data for each class/point on the table. The format is also desirable, because each of the viewers can compare the master schedule to theirs and easily extract the data points they wish to have in their personal schedule.

The periodic table is another example where several levels or "dimensions" of data are compressed to 2-D, while still maintaining the clarity and usability required. If there is a large amount of information, as in the periodic table, it is not clear a table is best--especially since some of the data lends itself to visual representation on a graph. In situations where either a table or a chart may be used, and if it is not clear which has the advantage, remember the viewer. What format would the viewer prefer for use and extraction of relevant data?

Tables can be highly stylized, as in the E-321 Section Schedule, Fall 2002 example below, or they can be very plain. Consider the amount of usage, number of viewers, type of conveyance, and audience. If a table (such as a schedule) will be reused regularly, put a little extra effort into its appearance, and try to think ahead to future needs--it will be evident to the viewer and future users.

 

Axes and Scale(s)

In order to convey the story told by the data, a series must be constrained to a system of order. The most common and useful is the simple 2-D, x-y plot. The independent variable (sometimes considered the "input") is assigned to the x-axis, and the resulting values (sometimes referred to as the "output" of the function) are placed on the y-axis. To each axis is assigned a scale of values and units to provide meaning and reference. Very common x-axis reference scales include time, date, distance, position, quantity, and category. Common y-axis measurement scales include energy, intensity, quantity, frequency, monetary value, and velocity.

Just as there are a variety of units typically used on the scales, there are a few common scale 'types'. Linear is by far the most common scale type, but others include log/exponential and inverse (1/x). These scale types transform the data so as to make interpretation easier and more accurate. For example, as in [insert graph and title here], the x-axis is a base-10 log scale, and the resulting near-linear data confirm that the series exhibits a logarithmic relationship (time to ?????????). In fact, if you consider for a moment, these scale "types" are little more than mathematical operations performed on the x-values (and sometimes y-values) of the entire series. One could easily substitute any function needed...from 1/x to ln(x_final/x_initial) to x log(x) + 45x...or any formula. [As is done in an 'Arrhenius' graph.]
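The idea that a scale "type" is just a function applied to the x-values can be sketched in a few lines of Python. The data here are made up for illustration (they follow y = 3 log10(x) + 2 exactly), and the helper names `pearson_r` and `apply_scale` are hypothetical. The only point is that the series looks far from a straight line against raw x, but perfectly linear once x has been passed through log10.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; 1.0 means the points lie on a rising straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def apply_scale(values, transform):
    """A scale 'type' is simply a function applied to every value of the series."""
    return [transform(v) for v in values]

# Hypothetical series with a logarithmic relationship (illustration only).
x = [1, 10, 100, 1000, 10000]
y = [3 * math.log10(v) + 2 for v in x]

r_linear = pearson_r(x, y)                        # raw x: clearly not a straight line
r_log = pearson_r(apply_scale(x, math.log10), y)  # log10 x-axis: straight line
inverse_x = apply_scale(x, lambda v: 1 / v)       # an inverse (1/x) scale works the same way

print(f"linear x-axis: r = {r_linear:.3f}")
print(f"log10 x-axis:  r = {r_log:.3f}")
```

Any other transformation can be passed to `apply_scale` in the same way, which is all a custom axis type amounts to.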

When setting up your axes, strive for clarity. Pick an appropriate range of data to show, as not all of the series may be needed. If trends for two or more series are being compared, choose a y-axis scale which brings out the similarities. This can usually be achieved by changing to a log scale, or by mathematical transformation of one or more of the series. This consideration will prevent one plot from overpowering the others.
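A small numerical sketch shows why a log scale keeps one series from overpowering another. The two series below are invented for illustration: both double at every step, but one is 1000 times larger. On a linear y-axis the small series would occupy about 0.1% of the height used by the large one; after a log10 transformation both span the same height and differ only by a constant offset of 3.

```python
import math

# Hypothetical data: same trend, very different magnitudes.
small = [2.0 ** k for k in range(6)]     # 1, 2, 4, 8, 16, 32
large = [1000.0 * v for v in small]      # 1000, 2000, ..., 32000

def span(values):
    """Height the series occupies on the y-axis."""
    return max(values) - min(values)

linear_ratio = span(small) / span(large)   # fraction of axis height the small series gets

log_small = [math.log10(v) for v in small]
log_large = [math.log10(v) for v in large]
log_ratio = span(log_small) / span(log_large)

# On the log scale the two series are parallel, separated by log10(1000) = 3.
offsets = [b - a for a, b in zip(log_small, log_large)]

print(f"linear scale: small series spans {linear_ratio:.1%} of the large one")
print(f"log scale:    small series spans {log_ratio:.0%} of the large one")
```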

Once the scales are chosen, care must be taken to provide the viewer with the most accurate representation possible. While some graphs may lend themselves to range clipping (removing large sections of scale 'unused' by data points) and other types of scale manipulation, these should generally be avoided unless absolutely necessary. Other manipulations, such as multiple scales (on the y-axis) with different zero points, should likewise be avoided; if you must use them, point this out and make it clear to the viewer.


Units

Units are critical to any set of data and the numbers within. Without units, the data has little to no meaning. In almost every instance, it is absolutely necessary to include units when reporting a value: on an axis, labeling a data point on the plot area, et cetera. When reporting data, it is also important to use units which are standard for the case, if applicable. If no particular unit is standard, choose units which are sensible, and are part of the SI unit system.

Always include units, even if they cancel out (meters/meters) or are arbitrary (Arbitrary Units). There should be no doubt in the viewer's interpretation or mind.
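One way to make the "always include units" rule hard to forget is to build every axis title through a helper that refuses an empty unit. This is only a sketch: the function name `axis_title` and the "Quantity (unit)" layout are assumptions for illustration, not a convention prescribed by this guide.

```python
def axis_title(quantity, unit):
    """Build an axis title such as 'Power (kW)'; a missing unit is an error.

    Ratios whose units cancel and arbitrary scales still get an explicit
    unit string, e.g. 'm/m' or 'Arbitrary Units'.
    """
    unit = unit.strip()
    if not unit:
        raise ValueError(
            f"no unit given for {quantity!r}; use e.g. 'm/m' or 'Arbitrary Units'"
        )
    return f"{quantity} ({unit})"

# Illustrative quantities (hypothetical examples).
print(axis_title("Power", "kW"))
print(axis_title("Strain", "m/m"))
print(axis_title("Intensity", "Arbitrary Units"))
```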

 

Charts and Diagrams

Not all information is strictly numerical, although most may be reduced to such. Some information is best left at a higher level. The Integrated Learning Environ. for Interface Design diagram below depicts the relationships between individuals, research groups, and entities outside Stevens. This information would be significantly harder to interpret if it were to be reduced to binary information (1 = connected, 0 = not connected).

 

Labeling

Data Points:
In some instances, it is desirable to point out features or explain aberrations and errors. If one were to create a graph showing power out over frequency, it might be necessary to label the top two or three points with the power amount (Y-axis value) and the frequency at which it was achieved (X-axis value). The text of these labels should be the same size as the scale labels. They should also be fully formatted: "42kW at 14.25MHz", not listed as a data point (42, 14.25).
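The labeling rule for data points can be sketched as a short helper that picks the highest points of a series and writes each label out in full. The sweep data, the function name `label_top_points`, and the default units are hypothetical; only the "42kW at 14.25MHz" format is taken from the text.

```python
def label_top_points(points, n=3, x_unit="MHz", y_unit="kW"):
    """Return fully formatted labels for the n points with the largest y-value.

    points is a list of (x, y) pairs, e.g. (frequency, power out).
    """
    top = sorted(points, key=lambda p: p[1], reverse=True)[:n]
    # ':g' drops trailing zeros, so 14.0 prints as '14' and 14.25 as '14.25'.
    return [f"{y:g}{y_unit} at {x:g}{x_unit}" for x, y in top]

# Hypothetical frequency sweep: (frequency in MHz, power out in kW).
sweep = [(13.50, 18), (14.00, 35), (14.25, 42), (14.50, 31), (15.00, 12)]

labels = label_top_points(sweep, n=2)
for text in labels:
    print(text)
```

The labels come out in descending order of power, so the first one belongs to the main peak.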

Series:
In most cases, a legend is sufficient to inform the viewer of the function of each plot on the graph. In some cases, as in the X-Ray Spectra Overlap (Oxford EDS) graph, the series should be labeled directly. Most often this is required if a graph has many series (over 7) which are not visually distinct in character.

General:
Acronyms should be avoided, unless they are widely known to your audience. If necessary, they may be used to save space, but a reference must be nearby.

 

Format

When laying out your graph, use the available area, a full page if possible. Table 1 lists good text sizes for various graph elements on a graph using a full page. When the graph is smaller, use your judgement, but try to maintain the ratios here. For a 1/2 page plot, these values should not be cut down to less than 60%, for readability reasons. A graph, unless very simple, should never be sized less than 1/4 of a piece of letter paper.

Table 1

Graph Element                 No Smaller Than   Good Size   No Larger Than
Axis Scale Numbers (E)        8 pt              12 pt       16 pt
Axis Title (D)                12 pt             14 pt       16 pt
Data Point Label (M)          6 pt              10 pt       12 pt
Legend Text (I)               10 pt             12 pt       14 pt
Label (M)                     10 pt             12 pt       14 pt
Trend Line / Plot Eqn. (O)    10 pt             12 pt       14 pt
Title (B)                     16 pt             18-24 pt    30 pt
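The sizing advice around Table 1 can be turned into a small calculation: scale every "good size" by the same factor so the ratios are preserved, but never go below 60%. This is a sketch under two stated assumptions: the 60% floor is applied to any reduced graph (the text states it for a 1/2 page plot), and the title's 18-24 pt range is represented by its lower end. The names `TEXT_SIZES_PT` and `scaled_sizes` are hypothetical.

```python
# (no smaller than, good size, no larger than) in points, copied from Table 1.
# Assumption: the title's "good" range of 18-24 pt is stored as 18 pt.
TEXT_SIZES_PT = {
    "Axis Scale Numbers": (8, 12, 16),
    "Axis Title": (12, 14, 16),
    "Data Point Label": (6, 10, 12),
    "Legend Text": (10, 12, 14),
    "Label": (10, 12, 14),
    "Trend Line / Plot Eqn.": (10, 12, 14),
    "Title": (16, 18, 30),
}

MIN_FACTOR = 0.6  # do not cut text below 60% of the full-page size

def scaled_sizes(factor):
    """Scale all 'good' sizes by one factor, clamped to the range 0.6-1.0."""
    factor = min(1.0, max(MIN_FACTOR, factor))
    return {
        name: round(good * factor, 1)
        for name, (_smallest, good, _largest) in TEXT_SIZES_PT.items()
    }

# A half-page graph: a naive 50% scaling is clamped to 60%.
for name, size in scaled_sizes(0.5).items():
    print(f"{name}: {size} pt")
```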

 

How much is too much?

When presenting your idea(s), it is very easy to go overboard and include far more information than necessary. When considering just how much detail to include, you must consider both your audience and the method of conveyance--how long you have to get your point across with the graph. We will discuss audience later, but essentially, the less time the viewer has, the cleaner and simpler your graph must be.

Too much data or extraneous information and you will begin to confuse your viewer and obscure your message. In the E-321 Section Schedule chart above and the Financial Software Market pie charts below, the data are kept lean and clear. These graphs convey a simple message and serve a single purpose with little effort required by any viewer.

Now, all rules can be broken under the right circumstances: if you expect a question or need to explain something, it can be very useful to include a statement or a few extra data labels on your graph. The impact of this extra and possibly distracting information may be reduced by shrinking the font size, placing it outside the plot area, or changing the text color to a low-percentage grey scale (25-50% is often good).
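If your graphing tool wants a color code rather than a "percent grey", the conversion is a one-liner. The sketch below assumes the common print and office-software convention that "N% grey" means N% black (so 25% is a light grey and 100% is black); the function name `grey_hex` is hypothetical.

```python
def grey_hex(percent_black):
    """Convert a grey percentage (0 = white, 100 = black) to an RGB hex code."""
    if not 0 <= percent_black <= 100:
        raise ValueError("percent_black must be between 0 and 100")
    level = round(255 * (1 - percent_black / 100))  # same value for R, G and B
    return f"#{level:02x}{level:02x}{level:02x}"

# The 25-50% range suggested above for secondary text.
print(grey_hex(25))
print(grey_hex(50))
```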

A good example of secondary-level data may be found in the Integrated Learning Environ. for Interface Design chart above. The "top" level shows the relationships between people and organizations, with further detail showing projects, affiliations, and task area(s). This extra detail is given its own level of presence by coloring, shading, and grouping effects.

Some graphs must--by their very purpose--contain large amounts of data or series, as in the X-Ray Spectra Overlap (Oxford EDS) graph below. This graph serves a very specialized function--to aid in the identification of peaks from Energy Dispersive Spectroscopy analysis. While you do not need to understand the technique, the graph is pretty simple to use. During analysis of a material, some energy peak is observed, and its center (plus error) is measured and plotted on this graph. This creates a range similar to the five common windows (Ni, Al, Hf, Cr, Co) already on the graph. Then, elements with overlapping energies (on the y-axis) may be determined, and correlated with each successive peak. In this case, ease of use and completeness of the data set are the most important factors for the viewer, not instant comprehension.

The relationships between the graph and the viewer, the time available for conveying the message or use of the graph, and the level of understanding of the intended audience will govern how much information and detail are included in your graph.

 

Drawing Elements

There are some cases where your graph is just right but the concept is not easily recognizable, or you have the minimum amount of information to tell your story and it still looks congested, or a detail just cannot be made any larger without other unacceptable concessions. In these and related cases, there are many little drawing "elements" one can use to guide the viewer and provide clarity in conveying the concept.

There are several typical elements, which serve different visual and interpretive functions. For drawing attention to small items/features, an arrow is most suitable. Make the arrow easy to see, slightly longer than it is wide, and not overpowering. Color the arrow to stand out, or to be grouped with similarly colored items, if desired. If you need to label several points or features which are located close together, the labels may be spaced apart and kept 1-2 inches away, with thin lines going from the base of each label to almost touching the corresponding feature. An example of this may be observed in the Effect of Employee Dress on Sales Revenues: FY 2004 graph below. The more labels you wish to include, the further away and more staggered they will need to be. In some cases you might like to make a comparison to a standard. In the Rapid Crystallization of MOCVD Al2O3 graph, there are no gridlines. On the x-axis, however, there are gray lines which denote the most intense alpha-Al2O3 peaks in a standard sample pattern. [These lines are easier to see on the large, full-resolution version of the picture.] This permits ready comparison of each scan to the expected pattern of the known standard.

Whichever you choose, how and where you implement each element will decide its usefulness and effectiveness. A good set of guidelines for drawing elements: make them slightly stylized, but minimal (like a font with small serifs); do not color text unless the font is thick; keep the graph's appearance orderly; and sometimes an inset window showing detail is best.

 

Appendices

It is often the case one has more to convey than may reasonably be put on a graph. There may also be related information or calculations which are relevant and should be provided for the viewer, particularly in a report, paper, or other scholarly work.

Typically, a guide to data analysis and calculations is included as an appendix, if it is more than a paragraph or contains many formulae or transformations. Another use is to include a graph depicting details or a different interpretation (maybe to disprove a point or show completeness of analysis). The third common use is to print out a table of the data captured and used.

When required (by company policy, for example), including the full data set is obviously acceptable, but if it is not required, use your judgement. Generally, if the data exceeds 5 pages or 25% of the report, it is probably not worth it. [Don't forget you can size it down, too! Few people will read the data, so 8 point font or printing 4 pages on one to save space and paper is much preferred.] For lab reports at the college level, data series which exceed 2 pages are usually not included. There is no point in killing another tree when the analysis is what counts.
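The rule of thumb for appending raw data can be written as a short check. This is a sketch, not a policy: the function name `include_full_data` and its parameters are hypothetical, the report length is assumed to be counted without the data appendix, and the stricter 2-page habit for college lab reports is not modeled.

```python
import math

def include_full_data(data_pages, report_pages, required=False, pages_per_sheet=1):
    """Decide whether a full data table belongs in the appendix.

    required        -- True if company policy (or similar) demands the data
    pages_per_sheet -- e.g. 4 when printing four data pages on one sheet
    """
    if required:
        return True
    printed = math.ceil(data_pages / pages_per_sheet)
    # Not worth it once the data exceeds 5 pages or 25% of the report.
    return printed <= 5 and printed <= 0.25 * report_pages

print(include_full_data(3, 20))                      # short table, long report
print(include_full_data(8, 40))                      # too long at full size
print(include_full_data(8, 40, pages_per_sheet=4))   # fits once printed 4-up
```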

 

Plot Area / Legends

Customizing the plot area and legend is probably the easiest step, but one that is often either given cursory consideration or ignored completely. This is unfortunate, as the size of the plot area means it often has the largest impact on the presentation of a graph to the viewer.

The basic parameters controlled by/within the plot area are plot/series size, background color, and size and utility of the legend. First off, the maximum available area should be used for the plot area, without encroaching on the axis and graph titles. The meat of any graph is the data, so as much room as possible should be provided for it. Taking a step back for a moment, the area the entire graph occupies is strongly related to that of the plot area. On any scholarly work (report, presentation, or paper), the graph should be accorded an appropriate amount of space. For very simple, single-series graphs, a size of ~1/4 page is fine. For multi-plot graphs, ~1/2 page should be accorded. For very detailed or involved graphs, the full page or screen should be used (excepting margins, title, et cetera, of course). For company reports, graphs should be either 1/2 page or larger, as you don't want to make the boss strain to read your graph. For college lab courses, I have _never_ heard a TA or professor complain that a well-formatted graph was too large, so I would encourage the use of at least 1/2 a page for all but the simplest of graphs.
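The sizing guidance above reduces to a few cases, sketched here as a lookup. The function name `recommended_page_fraction` and its flags are hypothetical, and deciding whether a graph counts as "very detailed" is left to the caller's judgement.

```python
def recommended_page_fraction(n_series, very_detailed=False, company_report=False):
    """Suggest how much of a page a graph should occupy (0.25, 0.5 or 1.0)."""
    if very_detailed:
        return 1.0                       # full page or screen
    fraction = 0.25 if n_series <= 1 else 0.5
    if company_report:
        fraction = max(fraction, 0.5)    # never make the boss strain to read it
    return fraction

print(recommended_page_fraction(1))                        # simple, single series
print(recommended_page_fraction(3))                        # multi-plot graph
print(recommended_page_fraction(1, company_report=True))   # company report
```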

 

Effects (Eye Candy)

With the development of 'office' software suites and graphing "wizards", a significant number of special effects (a.k.a. "eye candy") have been put at the fingertips of users everywhere. Some of these graph and chart "options" are absolutely useless.

Some of the items I have observed or encountered include: cones/cylinders/tubes/pyramids instead of bars, perspective drawing on 2-D plots, graded fill color on bar plots, plot area shading (gray background) or a 'watermark' photo, and, of course, clip art. Most of these have little to no value or place in a graph used for engineering, science, or business. Some of them even look nice, but are generally not worth the time and effort to implement. Even if they are done right, they will usually detract from the viewer's interpretation of your message. [Some actually visually distort the representation of your data--see the statistics section.]

 

Audience

Just how much detail is warranted or needed depends on whom you expect to be viewing your graph. Different levels of detail in the title, error bar inclusion, and data accuracy will be necessary for the general public versus classmates / professors versus research journal readers.

General Public
Nice, clean graph with no aberrations
limits: 2 series, 1 concept

College Educated
Aberrations permitted, use footnote explanation if necessary
limits: 2-4 series okay, but must be easy to read, 1-2 concepts

Business / Management
Keep to conveying the overall picture
Want to have cause and effect spelled out for them
limits: 2-3 series, 1 concept (2 if related)

Engineers / Scientists
Want to understand the relationship between cause and effect, check the details
limits: as many series as needed without looking cluttered, 1-3 concepts

Journal / Conference
Want to check and understand concepts and underlying interpretation
limits: 3-5 series, 1-3 concepts
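The audience limits listed above can be collected into a small table and used as a checklist before a graph goes out. This is a sketch only: the names `AUDIENCE_LIMITS` and `check_graph` are hypothetical, the upper end of each range is used as the limit, and "as many series as needed without looking cluttered" is modeled as no numeric limit.

```python
# audience: (max series, max concepts, max concepts if the concepts are related)
# None means no fixed numeric limit.
AUDIENCE_LIMITS = {
    "General Public": (2, 1, 1),
    "College Educated": (4, 2, 2),
    "Business / Management": (3, 1, 2),
    "Engineers / Scientists": (None, 3, 3),
    "Journal / Conference": (5, 3, 3),
}

def check_graph(audience, n_series, n_concepts, concepts_related=False):
    """Return a list of warnings; an empty list means the graph is within limits."""
    max_series, max_concepts, max_related = AUDIENCE_LIMITS[audience]
    concept_limit = max_related if concepts_related else max_concepts
    warnings = []
    if max_series is not None and n_series > max_series:
        warnings.append(f"{n_series} series exceeds the limit of {max_series}")
    if n_concepts > concept_limit:
        warnings.append(f"{n_concepts} concepts exceeds the limit of {concept_limit}")
    return warnings

print(check_graph("General Public", n_series=4, n_concepts=1))
print(check_graph("Business / Management", 3, 2, concepts_related=True))
```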

 


 
Copyright (c) 2005 by Justin Daniel Meyer. Material on this page subject to this reproduction agreement.