Benchmark Six Sigma Forum

Message added by Mayank Gupta

Box-Cox Transformation is a method to transform non-normal data to normal thus allowing the application of statistical tools on the normalized data.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Natwar Lal on 26th Aug 2020.

 

Applause for all the respondents - Natwar Lal, Sourabh Nandi, Krishan Narula, Aritra Das Gupta, Prasath.
 

Box-Cox Transformation

Featured Replies

Q 291. Let us say, you are dealing with Non Normal data and Box-Cox transformation has been found to have the capability to transform the data to Normally distributed data. For which purposes would you like to use the Box-Cox transformation? For what kind of analyses, will it be inappropriate to use such a transformation? 

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Natwar Lal

Convert non-normal data to normal for capability analysis (Pp and Ppk values).

  • Solution

Box-Cox Transformation is the most commonly used method to transform non-normal data to normal data. It transforms the original data by raising it to a power (usually denoted by lambda). The value of lambda is typically searched between -5 and 5.

 

Why will we need to transform the data?

The short answer to the long theory is the following two reasons:

1. Properties of normal distribution

2. Normality is a pre-requisite condition for parametric statistical analysis

 

If we expect the data to be normally distributed but it is not, then before applying the transformation we should first check for data entry issues. Most of the time, however, process data does not follow a normal distribution, and that is where transformations come in handy.

 

Analysis that can be performed after applying Box-Cox transformation

1. Stability analysis - one of the prerequisites for continuous-data control charts is that the data should follow a normal distribution

2. Capability analysis - the original data gets transformed, but the capability of the process computed from the transformed data is still usable. If one knows the underlying distribution of the data, this transformation may not be required; however, not everyone is familiar with the many types of distributions

3. Regression analysis (or any of its variants) where the residuals are non-normal due to heteroscedasticity (i.e. the data does not have constant variance)

 

Analysis that should not be performed after applying Box-Cox transformation

1. Descriptive Statistics - there are measures that can handle non-normal data (Median and IQR)

2. Inferential Statistics -  there are non-parametric tests (median tests) that can be performed for non-normal data. These tests do not require one to understand the underlying distribution and are robust enough to handle non-normal data

 

 

 

 

There are many statistical analyses that can only be conducted if the data is normal. When we are dealing with data, there are times when the data turns out to be non-normal. Hence it becomes extremely important to convert the data to normal so that certain statistical analyses, such as process capability, ANOVA and other such analyses, can be conducted.

Box-Cox transformation is a remedial method that can be used for transforming non-normal data to normal data. It was developed by two statisticians, George Box and David Cox. They developed a procedure that uses Lambda to transform data into a normal-shaped curve. The value of Lambda is the power to which all data should be raised. The best value is found by searching between Lambda = -5 and Lambda = +5.

There are certain challenges when we use the Box-Cox transformation. There are times when the transformation does not result in a normal distribution, so it is extremely important to check, after the transformation is done, whether the result is a normal distribution. The Box-Cox transformation will only work if all the data points are positive and greater than zero.


 

Box-Cox transformation is done only for the capability analysis of non-normal data. Use this transformation to convert the non-normal data, along with its specification limits, to normal, and then do the capability analysis. This transformation cannot be used for a capability study of discrete data.
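
The idea of transforming both the data and the specification limits before computing capability can be sketched as follows. This is a minimal illustration, not a replacement for a statistical package: the function names are hypothetical, Ppk is computed with the overall sample standard deviation, and the (Y^λ - 1)/λ form is assumed because it is increasing in Y for every λ, so the lower limit stays below the upper limit after transformation.

```python
import math
import statistics


def box_cox(y, lam):
    """Box-Cox transform of one positive value (log at lambda = 0)."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def ppk(data, lsl, usl):
    """Ppk from the overall (sample) standard deviation."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return min((usl - mean) / (3 * sd), (mean - lsl) / (3 * sd))


def ppk_box_cox(data, lsl, usl, lam):
    """Transform the data AND both spec limits with the same lambda, then compute Ppk."""
    transformed = [box_cox(y, lam) for y in data]
    return ppk(transformed, box_cox(lsl, lam), box_cox(usl, lam))
```

Note that the lower specification limit must itself be positive for the transform to be defined; a one-sided specification would need only the relevant half of the `min(...)`.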

What is a Box-Cox Transformation?
A Box-Cox transformation is a method to transform non-normal dependent variables into a normal shape. Normality is an important assumption for many statistical techniques; if the data is not normal, applying a Box-Cox transformation means that a broader number of tests can be run. The Box-Cox transformation is named after two British statisticians, George Box and Sir David Roxbee Cox, who collaborated on a 1964 paper and developed the technique.


Running the Box-Cox Test
The essence of the Box-Cox transformation is an exponent, lambda (λ), ranging from -5 to 5. All values of λ are considered, and the optimal value for the data is selected. The "optimal value" is the one that results in the best approximation of a normal distribution curve. The transformation of Y has the form:

$$
Y(\lambda) = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln Y, & \lambda = 0 \end{cases}
$$

This test only works for positive data. However, Box & Cox did propose a second formula that can be used for negative Y-values:

$$
Y(\lambda) = \begin{cases} \dfrac{(Y + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\ \ln(Y + \lambda_2), & \lambda_1 = 0 \end{cases}
$$

where λ2 is a constant chosen so that Y + λ2 is positive for all data points.
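
As a rough sketch of the two forms described above (the function names are illustrative and not taken from any package), the one-parameter transform and its shifted two-parameter variant could be written like this:

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # limiting case as lambda -> 0
    return (y ** lam - 1.0) / lam


def box_cox_shifted(y, lam1, lam2):
    """Two-parameter form: shift by lam2 first, so that y + lam2 > 0."""
    return box_cox(y + lam2, lam1)


if __name__ == "__main__":
    for lam in (-1, 0, 0.5, 1, 2):
        print(lam, box_cox(4.0, lam))
```

With λ = 1 the transform only subtracts 1 from each value, so the shape of the data is unchanged; that is why a fitted λ close to 1 is usually read as "no transformation needed".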

The formulas are deceptively simple. Testing all possible values by hand is unnecessarily laborious; most software packages include an option for a Box-Cox transformation, including:

  • R: use the command boxcox(object) from the MASS package.
  • Minitab: click the Options box (for example, while fitting a regression model) and then click Box-Cox Transformations/Optimal λ.
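
To give a feel for what such software does internally, here is a minimal sketch of a grid search for λ between -5 and 5. It assumes the standard profile log-likelihood criterion; the grid step, the function names and the simulated log-normal sample are all choices made for this illustration, and real packages use more refined optimisation.

```python
import math
import random


def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def log_likelihood(data, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n   # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in data)


def best_lambda(data, lo=-5.0, hi=5.0, step=0.05):
    """Evaluate every lambda on the grid and keep the most likely one."""
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))


# Simulated log-normal data: a log transform (lambda near 0) should fit best.
random.seed(42)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
lam_hat = best_lambda(sample)
print("estimated lambda:", lam_hat)
```

The grid search mirrors the "try all values of λ and pick the best" description; the estimate should then be sanity-checked with a normality plot rather than trusted on its own.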

Most Common Box-Cox Transformations

[Image: table of the most common Box-Cox transformations]
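
The λ values most often quoted as "convenient" and the simple forms they correspond to can be listed in a few lines of code. This is a sketch of the commonly cited set, written under the assumption that the simplified power form Y' = Y^λ is used (with log Y at λ = 0); it is not a reproduction of the original image.

```python
import math

# Commonly cited "convenient" lambda values and the simple form each implies.
COMMON_TRANSFORMS = [
    (-2.0, "1 / Y^2"),
    (-1.0, "1 / Y"),
    (-0.5, "1 / sqrt(Y)"),
    (0.0, "log(Y)"),
    (0.5, "sqrt(Y)"),
    (1.0, "Y (no transformation)"),
    (2.0, "Y^2"),
]


def simple_power(y, lam):
    """Simplified power form Y' = Y^lambda, with log(Y) at lambda = 0."""
    return math.log(y) if lam == 0 else y ** lam


for lam, label in COMMON_TRANSFORMS:
    print(f"lambda = {lam:>4}: {label:<22} Y = 4 -> {simple_power(4.0, lam):.4f}")
```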

 

The relation between Box-Cox and Multiple Regression Analysis?
Box-Cox transformation is an essential tool in multiple regression analysis. Linear models assume that the relationship between the response variable Y and the predictor variables X is linear. However, this is not always the case, so when the association between the dependent variable and the independent variables is not linear and one still wishes to fit a linear model to the data, consider a Box-Cox transformation. This transforms the response variable, after which a linear model is fitted to the data to analyze the effect of the predictor variables. A fundamental assumption of linear models is that the error terms are normally distributed. A significant violation of this assumption increases the risk of committing a Type I or Type II error.
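
A small, idealised illustration of the point about linearity: the response below is generated as an exact exponential function of X (an assumption made purely for the demo, with no noise), so a straight line fits the raw Y poorly, while the λ = 0 case of Box-Cox (the log transform) makes the relationship exactly linear. The helper `fit_line` is a hypothetical name for an ordinary least-squares fit.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


xs = [float(x) for x in range(1, 21)]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]          # curved in X

_, _, r2_raw = fit_line(xs, ys)                      # line through raw Y
slope, intercept, r2_log = fit_line(xs, [math.log(y) for y in ys])  # lambda = 0

print("R^2 raw:", r2_raw, " R^2 after log transform:", r2_log)
```

With real, noisy data the improvement is less clean, and the residuals of the transformed model should still be checked for normality and constant variance.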


Use of Box-Cox transformation during the DMAIC process?
Process capability studies are performed during the Measure phase of DMAIC. The first step in a process capability analysis is to check whether the data follows a normal distribution or not (as with ANOVA). The Box-Cox approach helps to address non-normally distributed data by transforming it toward normality. However, there is no guarantee that the transformed data is normal, because the method does not check for normality. The Box-Cox procedure looks for the Lambda that gives the smallest standard deviation. Hence it is always advisable to check the transformed data for normality using a probability plot or Quantile-Quantile (Q-Q) plot.

 

An example: 
Figure 1 shows non-normally distributed cycle time data. Running the Box-Cox transformation in a statistical analysis program produces an output that shows the best Lambda values (Figure 2).

Figure 1: Example of Non-normally Distributed Cycle Time Data

Figure 2: Example Box-Cox Plot of Data

 

The lower and upper confidence limits show that the best results for normality were reached with Lambda values between -2.48 and -0.69. Although the best value is -1.54 (given in Figure 2), the procedure is more practical when the value is rounded to a whole number; this makes the transformation more straightforward to apply. The best whole-number values here are -1 and -2 (the inverse of Y and of Y², respectively). The histogram in Figure 3 shows the data transformed using Lambda = -1, now more normally distributed.

Figure 3: Data Transformed Using Lambda = -1
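
The rounding step in this example can be expressed as a small helper: list the convenient λ values that fall inside the reported confidence interval. The candidate list and the function name are assumptions made for this sketch; the interval is the one quoted in the example.

```python
# Hypothetical set of "convenient" lambda values to round to.
CONVENIENT = (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)


def candidate_lambdas(lower, upper):
    """Convenient lambda values lying inside the confidence interval."""
    return [v for v in CONVENIENT if lower <= v <= upper]


# Interval from the cycle time example: -2.48 to -0.69 (best estimate -1.54).
candidates = candidate_lambdas(-2.48, -0.69)
print(candidates)
```

Both -2 and -1 fall inside the interval, which matches the text; the example then proceeds with -1, the simple inverse.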

 

When does Box-Cox work?
The Box-Cox transformation is not a guarantee of normality. It does not check for normality; the method looks for the smallest standard deviation. The assumption is that, among all transformations with Lambda values between -5 and +5, the transformed data has the highest likelihood of being normally distributed when the standard deviation is the smallest, but this is not guaranteed. It is essential to always check the transformed data for normality using a probability plot.
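
One numeric stand-in for "looking at a probability plot" is the correlation between the sorted data and the corresponding normal quantiles: the closer to 1, the straighter the plot. The sketch below uses only the Python standard library; the plotting positions (i + 0.5)/n and the simulated log-normal data are assumptions chosen for the illustration, and a formal test such as Anderson-Darling from a statistics package would normally be used alongside the plot.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_correlation(data):
    """Correlation between sorted data and standard normal quantiles."""
    n = len(data)
    xs = sorted(data)
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = fmean(xs), fmean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)


random.seed(7)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(300)]   # skewed data
transformed = [math.log(y) for y in raw]                      # lambda = 0

r_raw = qq_correlation(raw)
r_transformed = qq_correlation(transformed)
print("before:", r_raw, " after:", r_transformed)
```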


Furthermore, the Box-Cox power transformation only works if all the data is positive and greater than 0. However, this can usually be achieved easily by adding a constant (C) to all data such that it all becomes positive before it is transformed. The transformation equation is then: Y' = (Y + C)^λ
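
Choosing the constant C can be sketched as below. The `margin` parameter (how far above zero the smallest shifted value should sit) and the function name are hypothetical; the text only requires that every value be positive after the shift.

```python
def shift_to_positive(data, margin=1.0):
    """Return (shifted data, C) so that the smallest shifted value equals at least `margin`."""
    c = max(0.0, margin - min(data))   # no shift if the data is already large enough
    return [y + c for y in data], c


shifted, c = shift_to_positive([-3.0, 0.0, 2.0])
print(c, shifted)
lam = -1.0
print([y ** lam for y in shifted])     # Y' = (Y + C)^lambda
```

Whatever constant is used must be recorded and applied to the specification limits as well, since the capability analysis is carried out on the shifted and transformed scale.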

 

Cautions for the Box-Cox Transformation 

  • John and Draper (1980) showed that the Box-Cox transformation was not satisfactory even when the best value of the transformation parameter had been chosen.
  • Doksum and Wong (1983) indicated that the Box-Cox transformation should be used with caution in some cases, such as failure time and survival data.

When to use it and when not to - has been well answered by Natwar Lal and he is the winner for this question. Do check the response by Sourabh Nandi who has explained in detail about the usefulness and caution points. 
