Message added by Mayank Gupta,

Rootogram is a graphical tool introduced by John Tukey in 1971 to visualize how closely a data set follows a theoretical distribution, such as the normal distribution. It plots the square roots of the frequencies (or of the residuals) rather than the raw frequencies used in a histogram, which makes deviations in the tails of the distribution easier to see.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Johanan Collins on 27th Jan 2022.

 

Applause for all the respondents - Johanan Collins, Manish Manjhi, Sanchita Roy, Afzal Wadood.

Featured Replies

Q 440. Rootogram is a modified version of the good old histogram. What are the advantages of working with a rootogram? Explain using an example.

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Johanan Collins


John Wilder Tukey, an American statistician and mathematician, developed the rootogram. He is also known for the Fast Fourier Transform algorithm, the Tukey lambda distribution, Tukey's test of additivity, Tukey's range test and the Teichmüller-Tukey lemma.

 

Oxford Reference Definition

Oxford Reference defines a hanging rootogram as “a diagram suggested by Tukey in 1971, for comparing an observed bar chart or histogram (with equal-width categories) with a theoretical probability distribution. The comparison is made easier by ‘hanging’ the observed results from the theoretical curve so that the discrepancies are seen by comparison with the horizontal axis rather than a sloping curve. As in the rootogram, the vertical axis is scaled to the square root of the frequencies so as to draw attention to discrepancies in the tails of the distribution.”

 

R Package Documentation Definition

The R package documentation describes the rootogram function as a way to “graphically compare (square roots) of empirical frequencies with fitted frequencies from a probability model.” It adds: “Rootograms graphically compare frequencies of empirical distributions and fitted probability models. For the observed distribution, the histogram is drawn on a square root scale (hence the name) and superimposed with a line for the fitted frequencies. The histogram can be ‘standing’ on the x-axis (as usual), or ‘hanging’ from the fitted curve, or a ‘suspended’ histogram of deviations can be drawn.”

 

Paper on the Use of Rootograms for Count Data

A rootogram is a visual tool that was initially used by Tukey to assess the goodness of fit of univariate distributions. In their paper “Visualizing Count Data Regressions Using Rootograms”, Christian Kleiber of the Universität Basel and Achim Zeileis of the Universität Innsbruck use rootograms to examine issues such as overdispersion and excess zeros in regression models for count data. For count data regressions, the plots take the form of bar plots of the observed and expected frequencies. Rootograms can be used to assess the fit of both continuous data and count data.

 

Rootograms compare the observed frequencies, drawn as bars (a histogram), with the expected frequencies, drawn as a curve, both on a square root scale. Taking the square root compresses large frequencies much more than small ones, which adjusts for the differences in scale across the intervals. As a result, deviations in intervals with small observed or expected frequencies become more visible in the plot.

For example, a frequency of 9 compared with a frequency of 3600 is a ratio of only 1:400, whereas their square roots, 3 and 60, are in the ratio 1:20. On the square root scale the smaller frequency is therefore visually magnified 20 times relative to the larger one.
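The arithmetic in this example is easy to check. A minimal Python sketch (the helper `ratio` is a hypothetical name used only for illustration):

```python
import math

def ratio(small: float, large: float) -> float:
    """Relative height of the smaller bar compared with the larger one."""
    return small / large

small, large = 9, 3600
raw = ratio(small, large)                            # 9 / 3600 = 1/400
rooted = ratio(math.sqrt(small), math.sqrt(large))   # 3 / 60 = 1/20
magnification = rooted / raw                         # how much larger the small bar looks

print(f"raw 1:{1 / raw:.0f}, square root 1:{1 / rooted:.0f}, magnification {magnification:.0f}x")
# prints "raw 1:400, square root 1:20, magnification 20x"
```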

 

There are three types of rootograms. The standing rootogram shows the bars rising from the horizontal axis with the curve drawn over them, so the deviations are not aligned on a common baseline. It is the least used of the three because it simply plots the bars and the curve representing the model, and the fit is not shown directly. The hanging rootogram aligns all the deviations along the horizontal axis: the bars hang from the curve representing the expected frequencies. The suspended rootogram shows mainly the deviations rather than the observed frequencies. Both the hanging and the suspended rootogram use the horizontal reference line to show the deviations between the observed and expected frequencies.
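To make the difference between the three types concrete, here is a minimal sketch that computes where each bar starts and ends on the square root scale. It assumes one common convention: standing bars rise from 0 to sqrt(observed), hanging bars run from sqrt(expected) - sqrt(observed) up to the curve at sqrt(expected), and suspended bars show only the deviation sqrt(expected) - sqrt(observed) measured from 0. Other tools may flip the sign of the deviation, and the function name `rootogram_bars` is hypothetical.

```python
import math

Bar = tuple[float, float]  # (bottom, top) of one bar on the square root scale

def rootogram_bars(observed: list[float], expected: list[float], style: str) -> list[Bar]:
    """Bar extents for a standing, hanging or suspended rootogram.

    The fitted curve passes through sqrt(expected) for the standing and hanging
    styles; the suspended style uses the horizontal line at zero as its reference.
    """
    bars = []
    for o, e in zip(observed, expected):
        r_obs, r_exp = math.sqrt(o), math.sqrt(e)
        if style == "standing":      # bar rises from the x-axis to sqrt(observed)
            bars.append((0.0, r_obs))
        elif style == "hanging":     # bar hangs from the curve; its bottom is the deviation
            bars.append((r_exp - r_obs, r_exp))
        elif style == "suspended":   # only the deviation, drawn between it and zero
            bars.append((r_exp - r_obs, 0.0))
        else:
            raise ValueError(f"unknown style: {style}")
    return bars

observed = [4, 16, 25, 9]
expected = [9, 16, 16, 9]
print(rootogram_bars(observed, expected, "hanging"))
# prints [(1.0, 3.0), (0.0, 4.0), (-1.0, 4.0), (0.0, 3.0)]
```

In the hanging style a bar whose bottom sits exactly on zero marks a bin where observed and expected agree; a bottom above zero means fewer observations than expected, and a bottom below zero means more.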

Example of Rootograms for the Poisson and Negative Binomial Distributions

[Figure: rootograms for the Poisson and negative binomial fits]

Analysis of the Above Rootograms

Rootograms are used to detect patterns such as runs of positive or negative deviations. The top row of the figure above shows only small deviations when a Poisson model is fitted to Poisson data: the expected and observed frequencies agree closely. The bottom row shows large deviations when fitting a negative binomial distribution: the expected frequencies do not track the observed frequencies.
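To see how a rootogram exposes a poor fit, the sketch below simulates overdispersed counts (geometric counts, i.e. a negative binomial with size 1), fits a Poisson distribution by matching the sample mean, and prints the hanging rootogram deviation sqrt(expected) - sqrt(observed) for each count value. The data are simulated and are not the data behind the figure; the function names, the seed and the parameters are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter

def simulate_geometric(n: int, p: float, seed: int = 1) -> list[int]:
    """Draw n counts of failures before the first success (negative binomial, size 1)."""
    rng = random.Random(seed)
    log_q = math.log(1.0 - p)
    # Inverse-transform sampling: P(X >= k) = (1 - p) ** k, with 1 - random() in (0, 1]
    return [math.floor(math.log(1.0 - rng.random()) / log_q) for _ in range(n)]

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k, computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def hanging_deviations(data: list[int], max_count: int) -> list[float]:
    """sqrt(expected) - sqrt(observed) for counts 0..max_count under a Poisson fit."""
    n = len(data)
    lam = sum(data) / n                  # Poisson maximum-likelihood estimate
    observed = Counter(data)             # missing counts default to 0
    return [math.sqrt(n * poisson_pmf(k, lam)) - math.sqrt(observed[k])
            for k in range(max_count + 1)]

data = simulate_geometric(5000, p=0.2)   # mean 4, variance 20: strongly overdispersed
devs = hanging_deviations(data, max_count=12)
for k, d in enumerate(devs):
    print(f"{k:2d} {d:+6.2f}")
```

With this simulated data the deviations form runs rather than scattering randomly: negative for the smallest counts (more zeros and ones than a Poisson model allows), positive around the mean, and negative again in the long right tail. This is the kind of pattern the rootogram is meant to reveal.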

 

References

Kleiber, C., and Zeileis, A. (2016). Visualizing Count Data Regressions Using Rootograms. The American Statistician, 70(3), 296-303.

Oxford Reference. https://www.oxfordreference.com/view/10.1093/oi/authority.20110803095919378

R Package Documentation (topmodels). https://rdrr.io/rforge/topmodels/man/rootogram.html

We all know about the good old histogram: a bar chart with a continuous numeric axis. For example, here is a simple histogram of transaction-wise freight variation:

 

Truck Freight Distribution Histogram

[Figure: number of transactions per freight cost bucket]

 

The x-axis shows the freight cost bucket and the y-axis shows the number of transactions.
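A histogram like this is just a count of transactions per equal-width cost bucket. The sketch below shows that bucketing step in Python; the freight costs and the 1000-wide bucket are made-up values, not the data behind the chart above, and `histogram_counts` is a hypothetical helper.

```python
import math
from collections import Counter

def histogram_counts(values: list[float], bin_width: float) -> tuple[float, list[int]]:
    """Bucket values into equal-width bins.

    Returns (left edge of the first bin, counts per bin); empty bins count as 0.
    """
    start = math.floor(min(values) / bin_width) * bin_width
    bins = Counter(int((v - start) // bin_width) for v in values)
    return start, [bins.get(i, 0) for i in range(max(bins) + 1)]

# Hypothetical freight costs per transaction (illustrative values only)
costs = [4200, 4850, 5100, 5150, 5600, 5900, 6100, 6300, 7400]
start, counts = histogram_counts(costs, bin_width=1000)
print(start, counts)  # prints 4000 [2, 4, 2, 1]
```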

Next, to understand the overall distribution pattern, we overlay a normal distribution curve on top of the histogram.
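The overlaid curve corresponds to an expected count for each bucket. A minimal sketch, assuming the normal curve is fitted with the sample mean and sample standard deviation (the post does not say how its curve was fitted) and using Python's `statistics.NormalDist`; the function name and the freight values are hypothetical:

```python
from statistics import NormalDist

def normal_expected_counts(values: list[float], start: float,
                           bin_width: float, n_bins: int) -> list[float]:
    """Expected count per equal-width bin under a normal fitted to `values`."""
    dist = NormalDist.from_samples(values)   # sample mean and sample standard deviation
    n = len(values)
    expected = []
    for i in range(n_bins):
        lo = start + i * bin_width
        hi = lo + bin_width
        # Probability mass in [lo, hi) scaled by the number of observations
        expected.append(n * (dist.cdf(hi) - dist.cdf(lo)))
    return expected

# Hypothetical freight costs, bucketed into 1000-wide bins starting at 4000
costs = [4200, 4850, 5100, 5150, 5600, 5900, 6100, 6300, 7400]
expected = normal_expected_counts(costs, start=4000, bin_width=1000, n_bins=4)
print([round(e, 2) for e in expected])
```

These are the heights the overlaid curve passes through at each bucket; a rootogram plots the square roots of the same numbers.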

 

Truck Freight Distribution Histogram

[Figure: freight histogram with a normal distribution curve overlaid]

 

Looking at the overlay of the distribution curve on the histogram above, the curve crossing the bars is not flat, so it is difficult to judge how far the top of each bar lies above or below the curve at the bar's horizontal midpoint:

[Figure: normal curve overlaid on the histogram bars]

 

Thus, to solve this visualization challenge, and to display data so that interesting features become apparent, Tukey proposed the rootogram, in this form also known as Tukey’s hanging rootogram.

 

Truck Freight Distribution Histogram

[Figure: hanging rootogram of the truck freight data]

 

Now the differences are much easier to estimate, since the bars hang from the curve and the flat x-axis serves as the reference line for comparison.

 

One more critical point about the rootogram is that it plots the square roots of the numbers of observations falling in the different ranges of a quantitative variable. The square roots are used to (approximately) equalize the variance of the deviations between the curve and the bars, which would otherwise increase with increasing frequency.
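The variance-stabilizing effect can be checked with a quick simulation, assuming the bin counts behave like Poisson counts (a standard model for histogram frequencies, but an assumption nonetheless). The variance of raw Poisson counts equals their mean, while the variance of their square roots stays close to 1/4 once the mean is moderately large. The sampler below uses Knuth's multiplication method; the seed, the means and the sample size are arbitrary choices.

```python
import math
import random
from statistics import pvariance

def poisson_sample(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)
results = {}  # mean -> (variance of counts, variance of square-rooted counts)
for lam in (4, 16, 64, 256):
    counts = [poisson_sample(lam, rng) for _ in range(5000)]
    results[lam] = (pvariance(counts), pvariance([math.sqrt(c) for c in counts]))
    print(f"mean {lam:>3}: var(count) = {results[lam][0]:7.2f}, "
          f"var(sqrt(count)) = {results[lam][1]:.3f}")
```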

A rootogram is a data visualization technique that summarizes the distribution of a variable. It has the frequencies on the Y axis and the response variable on the X axis, where the frequencies are the square roots of the absolute or relative frequencies. Variants include the absolute-count rootogram, the relative rootogram (which converts counts into proportions), the cumulative rootogram and the cumulative relative rootogram. As a variation on the histogram, bars are plotted for the observed frequencies and a curve for the fitted frequencies, all on a square-root scale. Overlaying the distribution curve tells us how the actual histogram differs from a distribution estimate. Mathematician John Tukey noted that comparing the distribution of data with a theoretical distribution using an ordinary histogram can be difficult, because the small frequencies are dominated by the larger ones, which makes it hard to understand the pattern of differences between the histogram bars and the curve.
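A small sketch of how the four variants could be computed from histogram counts, assuming each variant is simply the square root of the corresponding absolute, relative, cumulative or cumulative relative frequency (the function name `rootogram_heights` is hypothetical):

```python
import math
from itertools import accumulate

def rootogram_heights(counts: list[int]) -> dict[str, list[float]]:
    """Square-root bar heights for four rootogram variants of one histogram."""
    total = sum(counts)
    cumulative = list(accumulate(counts))   # running totals across the bins
    return {
        "absolute": [math.sqrt(c) for c in counts],
        "relative": [math.sqrt(c / total) for c in counts],
        "cumulative": [math.sqrt(c) for c in cumulative],
        "cumulative relative": [math.sqrt(c / total) for c in cumulative],
    }

heights = rootogram_heights([1, 4, 9, 2])
for name, values in heights.items():
    print(name, [round(v, 3) for v in values])
```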

Advantages: The visualization becomes much clearer if the bars hang from the fitted curve, or if a ‘suspended’ histogram of deviations is drawn. When the observed results are ‘hanging’ from the theoretical curve, the discrepancies are judged against the straight reference line at zero (the horizontal axis) rather than against a sloping curve.

 

[Figure: example rootogram]

Image courtesy: andrewpwheeler.com

 

 

Similar to a histogram, a rootogram is a graphical tool to visually depict the distribution of variable data. One limitation of the histogram is that comparing it with a distribution can be difficult, because small frequencies are dominated by the larger frequencies and it is hard to understand the pattern of the histogram bars when compared with the distribution curve. Comparison becomes easier by ‘hanging’ the observed results from the theoretical curve; that way the discrepancies are seen by comparison with the horizontal axis rather than a sloping curve. Here the vertical axis is scaled to the square root of the frequencies so as to draw attention to discrepancies in the tails of the distribution.

 

Reference: https://datavizproject.com/data-type/rootogram/

Brilliant explanations were provided by all respondents. The best answer was provided by Johanan Collins; his answer also highlights the differences between the three variations of the rootogram.
