Correlation

October 10, 2017

Correlation does not prove the cause-effect relationship between two variables. Why do we still use it in root cause analysis? Please answer in your own words.

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

All questions so far can be seen here - https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/
Please visit the forum home page at https://www.benchmarksixsigma.com/forum/ to respond to the latest question open till the next Tuesday/ Friday evening 5 PM as per Indian Standard Time.
The best answer is always shown at the top among responses and the author finds honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.

October 11, 2017

Correlation helps us determine the degree of association between two variables. It quantifies the relationship between them.

October 11, 2017

Despite the fact that correlation does not necessarily mean causation, exploring and, where relevant, identifying correlations remains a key step in the improvement project life-cycle.

Exploring correlations of the observed effect with potential causes is a preliminary step in identifying root causes. This shortlists the potential causes from the huge universe of causes, leaving the project team with a reduced list of potential, most-likely causes to be investigated further. Without this shortlisting, the project team can get drowned in the sheer number of potential causes, the recovery from which can use up valuable project time.

Further, when people are introduced to and trained in structured problem solving, correlation is a good route for inducting them into a "cause-and-effect" mode of thinking. The logic of a relationship between a potential cause and an effect, summed up as correlation, is relatively easy to understand because examples can be quoted from all walks of life; following those examples with a quantified correlation further embeds the concept in people's minds. This also applies when training people in data-based decision making and data-driven improvement.

Additionally, the coefficient of correlation is a relatively easy-to-understand measure and can be used to illustrate both positive and negative correlation. When combined with visual tools like the scatter diagram, which is easy to create in popular spreadsheet applications, the concept of correlation becomes even easier to understand.
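To make the coefficient concrete, here is a minimal Python sketch that computes Pearson's r directly from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squares). The training-hours and scrap figures are hypothetical values chosen only for illustration; in Python 3.10 or later, statistics.correlation from the standard library returns the same Pearson value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours of operator training vs. scrap pieces per shift
training_hours = [2, 4, 6, 8, 10, 12]
scrap_pieces = [14, 12, 11, 8, 6, 5]

r = pearson_r(training_hours, scrap_pieces)
print(f"r = {r:.3f}")
```

With these made-up numbers r comes out close to -0.99: a strong negative relationship on the scatter diagram, which still says nothing on its own about whether training reduces scrap.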

As an extension, the concept of interactions between potential causes, resulting in varied impacts on the effect can also be well illustrated and understood by correlation analysis.

In summary, while correlation does not imply causation, causation typically displays correlation, making correlation an essential step in the root-cause analysis process.

October 11, 2017

In the healthcare sector, our ultimate goal is to fix the problem, reduce risk, and keep our patients and their families safe.

There is a specific correlation between the incident and the related causal factors, i.e., the root causes for which corrective actions are recommended. Once we have the root cause, we can work on the corrective actions to fix it.

A cause is that which produces an effect, or gives rise to any action, phenomenon or condition. For example, if a change in X produces a change in Y, then X is said to be the cause of Y. Every cause is itself the result of some prior cause or causes.

Two variables may be found to be causally associated, depending on the study. But if two variables are found to be associated or correlated, that doesn't mean a cause-and-effect relationship exists between them.

In conclusion, the corrective action we choose must be justified by a cause, in that there is a specific relationship between them.

October 11, 2017

Correlation does not prove the cause-effect relationship between two variables. However, we still use it in root cause analysis because it helps us identify whether two variables are related, which points to where an impact may exist.

October 11, 2017

Though correlation does not prove the cause-effect relationship between two variables, it is still a good first step to establish that there is some kind of relationship between them. It shows whether the relationship is positive or negative, and how strong it is. This plays a prominent role in root cause analysis, giving higher confidence that the causes identified actually apply to the problem. Otherwise, relying only on qualitative root cause analysis methods would bring in subjectivity. Correlation helps quantify the relationship before further root cause analysis is done.

October 11, 2017

Correlation is one of the most common and useful statistical evaluation tools. A correlation is a single number that describes the degree of relationship between two variables. The study of how variables are correlated is called correlation analysis. It is used when one wants to establish whether there are possible connections between variables. It is often wrongly assumed that correlation analysis determines cause and effect; this is not the case, because other variables not included in the analysis may also affect the results. A correlation only means that a change in one variable is accompanied by a change in the other over a period of time.

An example of data that has a high correlation:

· Calorie intake and your weight.

An example of data that has a low correlation (or none at all):

· A dog’s name and the type of dog biscuit they prefer

This relationship may be either positive or negative. Positive correlation is when an increase in one variable is accompanied by an increase in the other variable. Negative correlation is when an increase in one variable is accompanied by a decrease in the other.

A correlation coefficient puts a value on the relationship. Correlation coefficients take values between -1 and 1. A ‘0’ (zero) means there is no linear relationship between the variables, while -1 or 1 means there is a perfect negative or positive correlation between them. Following is the graphical representation of correlation.

[Figure: graphical representation of correlation]

The coefficient thus worked out can help predict future trends between the two variables and allows businesses to use these statistics for budgets and business plans. Root cause analysis should then be carried out to recognize the root causes responsible, and consensus decision-making used to select the right solution.

October 11, 2017

It is widely quoted that "Correlation does not imply causation". However, overemphasizing this statement often leads us to overlook the fact that a correlation may have a reason behind it, i.e., it may be more than mere coincidence.

The real clue could be that there is some underlying cause which is common to both the events and that is the reason why both are moving hand-in-hand. Consider an example from a study which found a high correlation between consumption of ice cream and death by drowning. Common sense would suggest that consumption of ice cream cannot lead to drowning. However, if one wanted to do a proper root cause analysis of the drowning deaths and find ways to reduce them, it would be informative to explore the correlation with ice cream consumption.

The real reason both are correlated is that both are related to the weather. Ice cream consumption increases significantly during summer. Similarly, hot weather leads more people to go swimming, which in turn leads to more drowning cases. Of course, there are other causes of drowning that are unrelated to weather, like the person not knowing how to swim! But that would not explain why more people don't know how to swim in certain months compared with the rest of the year!!

Thus exploring the correlation can give insights into the true root causes, which can lead to better-targeted actions. In this case, it could mean appointing more lifeguards during the summer months than during the rest of the year.
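The ice cream and drowning pattern can be reproduced with simulated numbers. The Python sketch below is only an illustration under assumed data: it generates daily temperatures and lets temperature drive both series, with no direct link between ice cream and drownings. It then compares the plain correlation with a partial correlation that removes the linear effect of temperature. Partial correlation is a technique swapped in here for illustration, not one named in the post, and every name and coefficient is made up.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the simulation is repeatable
n_days = 2000

# Simulated (not real) data: temperature drives both series;
# ice cream sales have no effect on drownings in this model.
temperature = [rng.uniform(10, 40) for _ in range(n_days)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson_r(ice_cream, drownings)
controlled = partial_r(ice_cream, drownings, temperature)
print(f"ice cream vs drownings:             r = {raw:.2f}")
print(f"same, controlling for temperature:  r = {controlled:.2f}")
```

With this setup the raw r is strong (around 0.8) while the partial correlation is close to zero, which is the pattern one would expect when a common cause explains the association.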

October 11, 2017

Correlation projects the statistical relationship between two variables, whereas causality points to the cause of an event. Correlation does not imply causation; it only provides a possible lead for an analysis.

Consider a plot of two variables in a manufacturing setup: defects produced against month of the year. There could be a positive correlation (directly proportional), a negative correlation (inversely proportional) or no relationship between the two variables. However, to arrive at the causation of one variable on the other, these are the possible investigation patterns:

The month of the year has a direct cause-and-effect relationship with the defects
The defects influence the month of the year (reverse causation)
Defects in a particular month might influence other months' deliveries (reinforcing)
Defects and the month of the year might not be related directly; an unaccounted factor (e.g., leadership) could be causing both.

For an investigation, be it problem solving (DMAIC) or problem avoiding (DFSS):

Correlation could be the only data available to tell the story between two variables. This mode of study is less reliable but can indicate a pattern (e.g., employee satisfaction at companies offering cars compared with companies offering incentives).
Correlation among variables offers a good start for basic cause trees and helps break presumptions in an investigation (e.g., black cars plotted against road accidents).
Sometimes it might not be practical to collect causal information, such as data from the clinical trials of a drug. The correlation between the factors (blinded, groups, sites) and the environment (phases, diseases, medication) should be validated at a stated significance level and power (to satisfy regulatory bodies and pharma companies).
Causal data is expensive and time-consuming to compile (e.g., call audits of the entire lot in a center to determine customer satisfaction).

In order to bark up the right tree, a researcher has to recognize the assumptions, evaluate the arguments and draw conclusions.

October 11, 2017

Solution

A strong correlation between two variables is not, by itself, a sufficient condition for a cause-effect relationship.

Let us consider two events P and Q that have shown a correlation. The various possibilities may be examined as follows:

1. Event P may be dependent on Event Q (Direct causation)

This is the straightforward and genuine conclusion one may derive from a correlation. For example, days and nights are caused by the rotation of the earth.

2. Event Q may be dependent on Event P (reverse causation)

How would it sound if we concluded that the rotation of the earth is caused by days and nights? This being an obvious example, one would not make such a mistake. However, for less familiar events, going just by correlation, the direction of cause and effect may be mistakenly reversed.

3. Event P and Event Q may both be a resultant of a third variable, that acts as a common cause for both these events, but they do not impact each other.

For example, we see a negative correlation between the number of people travelling by public transport and farmers' productivity. In reality, neither affects the other; both are influenced by a third factor, namely a shortage of fuel. More people switched to public transport instead of using their own cars, and farmers were hit by the diesel shortage, which reduced their productivity.

4. Event P causes Event Q; and Event Q causes Event P (bidirectional or cyclic causation)

When more people invest in the stock market, the market indices go up, which in turn draws more people to invest.

5. Event P causes Event R which in turn causes Event Q (indirect causation);

Longer working hours result in eating more junk food, which in turn causes obesity. So we cannot generalize the expectation that obesity can be reduced by reducing long work hours.

6. In reality there is no connection between Event P and Event Q; it is a spurious correlation

We find a correlation between the number of cellphone users in India and the number of women joining yoga classes in the UK. Practically, they are not related; hence it is a case of spurious correlation.
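A common source of spurious correlation like this is two unrelated series that both simply grow over time. The Python sketch below simulates that situation under assumed data (series_p and series_q are hypothetical stand-ins, not real cellphone or yoga figures) and shows one simple check: correlating the period-to-period changes instead of the levels. Differencing is a technique chosen here for illustration; it is not the only way to deal with trends.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def diffs(series):
    """Period-to-period changes: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(7)  # fixed seed so the simulation is repeatable
periods = list(range(500))

# Two simulated series with independent noise that both trend upward
series_p = [3.0 * t + rng.gauss(0, 10) for t in periods]
series_q = [0.5 * t + rng.gauss(0, 2) for t in periods]

level_r = pearson_r(series_p, series_q)
change_r = pearson_r(diffs(series_p), diffs(series_q))
print(f"correlation of levels:  {level_r:.2f}")
print(f"correlation of changes: {change_r:.2f}")
```

The levels correlate almost perfectly because of the shared upward trend, while the changes show essentially no correlation, which is what one would expect if the two series have no real connection.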

The above scenarios and examples bring out the fact that while a correlation exercise helps us eliminate certain suspected causes, it may not establish the real cause unless we have a good understanding of the events and processes under study and of the underlying logical or scientific possibilities of a relationship. Often, a relationship indicated by a statistical correlation has to be validated with other tools or trials before a cause-effect relationship is established.

October 11, 2017

During the RCA process, the original idea is for the correlation matrix to act as a subjective screening tool that filters out less important process inputs.

However, in a team session with participants from different departments or sections of the business (cross-functional participants), it is good to put in as many inputs as possible, even if the cause-effect relationship between two variables is not evident, as this creates a non-threatening atmosphere and motivates all participants. However, to ensure the success of the project, all participants should be on the same page at the end of the exercise. The expertise of the project manager and the process owner is of utmost importance here. The objectives and actionables become clearer from here on.

Thus, even when correlation does not prove a cause-effect relationship between two input variables, it is prudent to use it in root cause analysis, especially with the fishbone diagram, where many inputs are contributed, to keep morale high.

October 11, 2017

Correlation:

This is a statistical technique used to see whether or not variables are mutually related, and to what extent or degree.

The variables are related from a cause-and-effect perspective.

For example:- Productivity leads to profitability.

Now, given that correlation does not prove the cause-and-effect relationship between two variables, why do we still use it in root cause analysis?

My answer would be:

It may still be necessary to use the correlation technique because, at the time we do the root cause analysis, we don't know whether or not a correlation exists. Hence it is only by assuming a correlation that we can go further to find the real root cause.

It is like diagnosing a health issue using the correlation technique. Doctors have to start with many assumptions about why a particular health issue has arisen.

For example:-

A child visited an eye specialist with an eyesight problem.

The doctor says there is a correlation between brushing teeth at a particular time and the eyesight problem.

It seems this cannot be proved as a cause.

But the real root cause turns out to be brushing his teeth at a particular time. This reason seems unconnected to the eyesight problem, yet only through root cause analysis is this correlation between brushing teeth at a particular time and the eyesight problem identified.

In this case, the child brushes his teeth in front of the mirror at 7:30 AM, when sharp sunlight is reflected directly into his eyes, causing strain and leading to the eyesight issue.

In the beginning, this assumed cause does not seem provable as correlated with the problem, but it actually is.

In the first example, productivity leads to profitability. There could be many other reasons why profitability increases that seem unrelated, but actually are.

Productivity is not the end of the causal chain; the actual cause might be that most of the factory workers got married and are very happy.

October 11, 2017

We use correlation in root cause analysis because correlation analysis measures the degree of linear relationship between two variables. It is used with the scatter diagram, which provides a graphical representation of the relationship between two continuous variables. Correlation does not guarantee causation; by itself it does not imply a cause-and-effect relationship. From a scatter diagram we can judge the strength of the relationship by the width or tightness of the scatter, and determine its direction, e.g., positive or negative. Correlation values of -1 or +1 imply an exact linear relationship. However, the real value of correlation lies in quantifying less than perfect relationships.

The simplest tool used in correlation and regression analysis is the scatter plot, or scatter diagram, which is a plot of one variable versus another. One variable, called the independent variable, is usually shown on the horizontal axis; the other, called the dependent variable, is shown on the vertical axis. Scatter diagrams are used to evaluate cause-and-effect relationships, on the assumption that the independent variable is causing a change in the dependent variable. Scatter plots are used to answer questions like "Does the length of training have anything to do with the amount of scrap an operator makes?"

A correlation problem considers the joint variation of two variables, neither of which is restricted by the experimenter. Proving cause and effect requires a sound scientific understanding of the situation at hand, because statistics cannot by themselves establish cause and effect.

October 11, 2017

Correlation is used to assess the relationship between two or more variables. It essentially indicates whether an input variable (X) may influence the output variable (Y).

Properties of a Correlation
1. It explores a relationship of the form y = f(x)
2. It measures the strength of association between two quantitative variables
      - It could be a positive correlation, a negative correlation or no correlation.
      - Positive correlation indicates that y tends to increase as x increases (a direct relationship)
      - Negative correlation indicates that y tends to decrease as x increases (an inverse relationship)
3. It lies between -1 and +1
4. Coefficient of linear correlation r is defined as the measure of strength
   a. r >0 indicates positive linear relationship
   b. r<0 indicates negative linear relationship
   c. r=0 indicates no linear relationship
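The word "linear" in these properties matters. The small Python check below (hypothetical values, chosen only to make the arithmetic exact) shows that an exact straight-line relationship gives r = 1, while an exact but curved relationship, y = x squared over a symmetric range of x, gives r = 0 even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
line = [2 * x + 1 for x in xs]   # exact linear relationship
parabola = [x * x for x in xs]   # exact, but non-linear relationship

print(round(pearson_r(xs, line), 6))      # 1.0
print(round(pearson_r(xs, parabola), 6))  # 0.0
```

So r = 0 rules out a linear relationship only; a scatter diagram is still worth drawing to spot curved patterns.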

Let us see why we use correlation in root cause analysis even though we know correlation does not mean causation.
- With the help of a scatter diagram, investigate each of the vital factors (x) against the Y variable or CTQ
- Doing so can support or refute the hypothesis we have formulated

Eg: A mid-sized company's internal website opens slowly. Employees think this could be because the internet speed has slowed down. But the team managing this project identified multiple vital factors, so a scatter diagram was drawn for each vital factor against Page Response Time (CTQ). The factors were: internet speed, DB operation time, and scalability (>1000 persons).

Initial hypothesis (the employees' assumption): Internet speed is affecting the response time
Alternative hypothesis: DB operations are primarily affecting the response time

When the scatter diagrams were plotted with values, it was found that DB operations were taking time because of back-and-forth calls between the database and the application, caused by poor coding. This disproved the initial internet-speed hypothesis.
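In conventional statistical terms, the null hypothesis behind each scatter diagram would be "there is no association between this factor and page response time". One simple way to test that is a permutation test, sketched below in Python. It is an illustration swapped in for the visual check described in the post, and the measurements and names (db_time_ms, internet_mbps, response_s) are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value for H0: no association between xs and ys.

    Shuffling ys breaks any link with xs; the p-value is the share of
    shuffled datasets whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements for ten page loads
db_time_ms = [120, 150, 180, 210, 260, 300, 340, 400, 450, 520]
internet_mbps = [48, 52, 45, 50, 47, 53, 49, 46, 51, 50]
response_s = [1.1, 1.3, 1.4, 1.8, 2.0, 2.4, 2.7, 3.1, 3.3, 3.9]

p_db = permutation_p_value(db_time_ms, response_s)
p_net = permutation_p_value(internet_mbps, response_s)
print(f"DB time vs response:        p = {p_db:.4f}")
print(f"Internet speed vs response: p = {p_net:.4f}")
```

A small p-value says the association is unlikely to be chance; establishing that DB operations cause the slowness still needs the technical investigation the post describes.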

Conclusion:

As seen above, this is where and how Correlation can help in giving pointers to RCA

October 11, 2017

Correlation, as a technique, measures the extent to which two or more variables fluctuate together. It is used to perform "basic level analysis". It helps eliminate insignificant factors while establishing the "degree of relationship" of a few or several (sometimes 10-30 or more) X's (independent variables) with Y (the dependent variable).

That way, it helps in setting the foundation for further analysis and saves time by narrowing the Focus Areas.
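The screening step can be sketched in a few lines of Python: compute r between each candidate X and Y, then rank the candidates by the absolute value of r so the team looks at the strongest relationships first. The factor names and data below are hypothetical, and ranking by |r| is just one simple screening rule, not a prescribed procedure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def screen_factors(factors, y):
    """Rank candidate X's by |r| with Y, strongest first."""
    scores = {name: pearson_r(xs, y) for name, xs in factors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: eight observations of Y and three candidate X's
y = [2.3, 3.8, 6.1, 8.2, 9.9, 12.1, 14.2, 15.8]
factors = {
    "db_operation_time": [1, 2, 3, 4, 5, 6, 7, 8],
    "internet_speed": [3, 7, 1, 8, 2, 6, 4, 5],
    "concurrent_users": [5, 5, 6, 6, 7, 6, 8, 7],
}

ranking = screen_factors(factors, y)
for name, r in ranking:
    print(f"{name:20s} r = {r:+.2f}")
```

Factors near the bottom of the list become candidates for elimination, while those at the top still need confirmation before being treated as root causes.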

Today, many projects are hard-pressed for time and have limited funds. Such tools help enable speedier progress.

October 11, 2017

	Correlation	Cause and Effect
Definition	Tests the degree of relationship between two or more variables	Identifies causality between variables
Detail	If the value of one variable increases or decreases, the other variable also increases or decreases	The action or occurrence of one variable causes or changes the other variable
Benefit	Can be used to predict future events	The effect can be controlled by acting on its causes
Example	Smoking is correlated with alcoholism	Smoking causes an increase in the risk of developing lung cancer

October 11, 2017

Correlation between two variables A and B does not imply causation (i.e. whether A Causes B or vice versa), but it reveals information about the relationship between the variables. It is used to understand:

(a) If A influences B

(b) If B influences A;

(c) Strength of the relationship between A & B

(d) If relationship between A & B is positive or negative

Correlation is used for a few reasons:

1. To see whether two variables are associated, without necessarily inferring a cause-and-effect relationship.

Example: inferring whether marketing expenses were affecting a company's sales.

Suppose, we are doing the RCA for Sales drop during a period and lack of Marketing is suggested as one of the reasons.

We can calculate the correlation between the two variables “Marketing Expenses” and “Sales Revenue” after collecting data for, say, 12 months.

Value of Correlation Coefficient “r”	Strength of relationship
-1.0 to -0.5 or 1.0 to 0.5	Strong Negative or Strong Positive
-0.5 to -0.3 or 0.3 to 0.5	Moderate Negative or Moderate Positive
-0.3 to -0.1 or 0.1 to 0.3	Weak Negative or Weak Positive
-0.1 to 0.1	None or very weak
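The table can be turned into a small lookup function. Its ranges overlap at 0.1, 0.3 and 0.5, so this Python sketch assumes that a boundary value belongs to the stronger band; the function name strength_of_r is hypothetical, and other cut-offs are equally defensible.

```python
def strength_of_r(r):
    """Label a correlation coefficient using the bands in the table.

    Assumption: a value exactly on a boundary (0.1, 0.3, 0.5) is
    placed in the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:
        return "None or very weak"
    direction = "Positive" if r > 0 else "Negative"
    if size >= 0.5:
        return f"Strong {direction}"
    if size >= 0.3:
        return f"Moderate {direction}"
    return f"Weak {direction}"


for value in (0.72, -0.35, 0.2, 0.05):
    print(value, "->", strength_of_r(value))
```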

If we find “r” is moderate or Strong then we can look at improving Marketing as a solution to improve sales. In case we find that marketing budget has indeed been reduced during that period, then it would be a logical recommendation to increase Marketing Budget. Thus, we see that though we do not have a Cause-effect relation between Marketing and Sales, using Correlation we are still able to identify a critical X to improve our Y (Sales Revenue)

If we find “r” is none or very weak, then we can conclude that the marketing budget does not appear to affect sales, at least not linearly.

2. Let us consider that 2 variables A and B do not have a cause-effect relationship, but have a strong positive correlation with each other. Now if the cause for either A or B is known, then both A & B can be controlled.

3. If it has been determined that two variables are correlated, then given the value of one variable the value of the other can be figured out.
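Strictly speaking, figuring out one variable from the other is done with a regression line fitted to the same data; the correlation coefficient alone gives strength and direction but not the equation. Below is a minimal ordinary least-squares sketch in Python. The marketing and sales figures are hypothetical and in arbitrary units.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly figures (arbitrary units): marketing spend vs. sales
marketing = [2, 3, 4, 5, 6, 7]
sales = [21, 24, 29, 31, 36, 39]

a, b = fit_line(marketing, sales)
# Estimate sales for a spend inside the observed range (2 to 7);
# extrapolating outside it is much less trustworthy.
estimate = a + b * 6.5
print(f"sales = {a:.2f} + {b:.2f} * spend; estimate at 6.5: {estimate:.1f}")
```

The fitted line describes association within the observed range only; it does not show that raising spend will raise sales.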

4. Testing a hypothesis regarding a cause-effect relation. If the hypothesis that different quantities of a new organic manure “X” have different effects on plant growth is to be tested, the lab technician can vary the quantity of X and see whether plant growth varies. Because X is deliberately varied while other conditions are held constant, a positive correlation supports the cause-effect hypothesis.

October 11, 2017

Correlation gives a relationship between two variables. It gives relationship between input X and output Y. It gives a good graphical representation of how the two variables are related.

But from correlation alone one cannot determine whether input X is the cause of output Y. It may be that both variables are inputs (X), or both are outputs (Y) of some other cause.

As an example, consider the crime rate increasing in Ahmedabad while ice cream sales decrease. There can be a good correlation between the two events, but the real root cause can be something different.

In Ahmedabad, weather is pleasant in winter, so in winter the criminals become more active and hence crime rate increases while on the other hand since it is cold people prefer hot drinks rather than ice cream. So ice cream sales have decreased.

This is a major drawback of correlation. Even knowing this, most people still use this analysis.

It is a good starting point: we can filter out the many inputs that do not show a relationship with the output Y. This can save a lot of time.

Further, we can carry out root cause analysis based on the correlation study, seeing how strongly each parameter is related to the output.

October 11, 2017

Correlation is a way of finding a relationship between two or more sets of data. It measures the strength of the relationship (strong, moderate or weak) and its direction (positive or negative).

To find the correlation, the input variables should be independent variables, i.e., not impacted by changes to other variables in the process, and they should show observed variation.

As a rule of thumb, if the absolute value of the correlation coefficient is greater than 0.85, the relationship is considered good.

Correlation does not prove the cause-effect relationship between two variables. Why do we still use it in root cause analysis?

Yes, correlation is a good tool for understanding the relationship between the problem and the independent variables found or listed during the investigation. Usually, after listing the probable root causes, we select the independent variables and compute the correlation, which shows whether each probable root cause has a positive or negative relationship with the problem.

In such a way we can use the correlation tool to find the appropriate variable which is leading to the cause.

October 11, 2017

The two terms correlation and causation (cause-and-effect relationship) are often confused with each other, whereas they are distinctly different. If two variables are correlated, it means that when one variable (say X) changes, the other variable (say Y) also changes in a positive or negative direction, but this doesn't necessarily mean that X is causing Y to change. For example, if we plot car sales and inflation over the last 10 years, both are likely to be positively correlated, yet inflation is not the cause of the increase in car sales, nor are car sales causing inflation. On the other hand, sweater sales increase as the temperature dips in winter: sales are positively correlated with the drop in temperature, and here the dip in temperature is indeed the cause of the increase in sweater sales.

In other words, correlation doesn't assure a cause-and-effect relationship, but if there is a cause-and-effect relationship, there will usually be a correlation. Hence correlation analysis remains an essential part of cause-and-effect analysis; it is a kind of hypothesis testing that helps confirm the cause-and-effect relationship.

October 11, 2017

Correlation: A relationship between two sets of variables.

Causation: A particular event or action which triggers a second event or action.

Root Cause Analysis: Analyse various factors that could be the cause of a particular event which is different than the expected normal.

Therefore, even though correlation does not guarantee a causal relationship, it is still a huge advantage to use correlation in RCA to narrow down the probable cause-and-effect relationships from a plethora of possibilities (especially in surveillance).

October 11, 2017

Question: Correlation does not prove the cause-effect relationship between two variables. Why do we still use it in root cause analysis? Please answer in your own words.

➢ Correlation – means that two variables change together; a change in one variable does not automatically cause a change in the other. It is a statistical measure of the size and direction of the relationship between the two variables considered.

➢ Causation – means the change in one variable causes the change in the values of the other variable. Also called cause and effect.

➢ Association – also called correlation.

Two types of relationship:

✓ An action/ one variable causes the other variable. (E.g. Eating sugar causes diabetes)

✓ It is correlated / associated between two variables. (E.g. Diabetes and hypertension are correlated, but diabetes does not cause hypertension.)

Yes. Correlation is a necessary condition for causation, but not a sufficient one. Sometimes correlation alone is enough. At other times, experimental rather than observational data is needed to establish causation: causation requires experimental evidence, whereas correlation can be based on observations alone.

Eg. A study by an insurance agency found that male drivers are more prone to accidents, hence insurance agencies charge them higher premiums. In this case you can't change the cause: the gender of a driver can't be changed experimentally. Instead, male and female drivers can be studied as separate groups and the results analyzed for correlation.

Causation – Suppose the CEO of Ola suddenly passed away. There would be a change in the system, and sharply higher cab prices might be experienced for some days. The CEO's death would then be the cause of the high/revised cab prices.

Let's talk about each relationship in detail…

1. No relationship – No association. Changes in one variable are not matched by any systematic change in the other; for example, one variable remains constant while the other increases or decreases.

Contrast this with sweets/sugary products, whose consumption is correlated with the likelihood of obesity. Detailed studies have shown that the more sweets/sugary products you eat, the more weight you put on. But where is the causation? Do sugary products cause one to gain weight, or does gaining weight cause an increased consumption of sugary products? (A controlled experiment with rats showed that the group fed yogurt with artificial sweetener gained more weight than the group fed normal yogurt.) Many more such experiments are still being carried out in medical laboratories.

2. Negative correlation – it means the two measured variables move in opposite directions (i.e. when one increases the other decreases, or when one decreases the other increases).

For example, palm size is negatively correlated with a person's longevity. Women's palms are typically smaller than men's, yet women live longer than men on average. Hence the two are negatively correlated, although palm size itself does not shorten life; gender is the underlying factor.

3. Positive correlation – it means the two measured variables move together in the same direction (i.e. when one increases, the other also increases, or when one decreases, the other also decreases).

For example, whenever the weather outside is hot, more fruit juices/ice creams are sold. This is a positive correlation, as temperature and ice cream sales move in the same direction.

Another example, where a third variable is the cause but the two observed variables are correlated:

A strong correlation can be shown between the amount of crime and the amount of ice cream sold by vendors. In such a case, which is the cause and which is the effect? We can't single out one as the cause of the other. The answer is that a third variable is causing both crime and ice cream sales: summer, when crime is highest and ice cream/juice sales are recorded at their highest.
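The third-variable effect can be simulated. The sketch below assumes a hypothetical model in which daily temperature follows the seasons and both crime and ice cream sales depend on temperature plus independent noise; nothing links crime to ice cream directly. Correlating the two series gives a strong r, but correlating what is left of each after removing the straight-line effect of temperature (a partial correlation, a standard technique that the post does not name) brings r close to 0.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """What is left of y after removing a least-squares straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(42)
# Hypothetical model: seasonal temperature (deg C) with day-to-day noise.
temperature = [25 + 10 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 2)
               for day in range(365)]
# Each outcome depends on temperature only, plus its own independent noise.
ice_cream_sales = [50 + 3 * t + rng.gauss(0, 5) for t in temperature]
crime_index = [20 + 2 * t + rng.gauss(0, 5) for t in temperature]

r_raw = pearson_r(ice_cream_sales, crime_index)
r_partial = pearson_r(residuals(ice_cream_sales, temperature),
                      residuals(crime_index, temperature))

print(f"ice cream vs crime:                r = {r_raw:.2f}")
print(f"after controlling for temperature: r = {r_partial:.2f}")
```

In a real investigation the confounder has to be suspected and measured first; the correlation between the two observed variables gives no hint that it exists.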

We tend to see patterns and then gather information around them to support the views we have already concluded. This behaviour is called confirmation bias. An observational study can only conclude that two things coincide, not that one causes the other. A relationship cannot be proved outright, but the opposite claim can be disproved with the help of hypothesis testing.

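Statistically, it is possible to disprove a claim of no relationship. So never try to prove a correlation directly; instead, use a double negative: set up the null hypothesis that there is no correlation and disprove it by rejecting the null hypothesis. Rejecting it is evidence that the correlation is real, but still not that it is causal.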
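One way to run this test is a permutation test, used here as a stand-in for the t-test on r that statistical packages usually report. The null hypothesis is "no association": shuffling Y breaks any real pairing with X, so the shuffled |r| values show how strong a correlation can look by chance alone. The data, the 10,000 shuffles and alpha = 0.05 are all assumptions for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for H0: x and y are not associated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # The +1 counts the observed pairing itself, so p can never be exactly 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical measurements.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_related = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
y_unrelated = [5, 3, 8, 1, 9, 2, 7, 4, 6, 5]

alpha = 0.05
for label, y in [("related", y_related), ("unrelated", y_unrelated)]:
    p = permutation_p_value(x, y)
    decision = "reject H0 (evidence of correlation)" if p < alpha else "fail to reject H0"
    print(f"{label:9s} p = {p:.4f} -> {decision}")
```

Note that "fail to reject H0" for the unrelated series does not prove there is no relationship; it only means the data give no evidence of one.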

With such considerations in mind, scientists must carefully design and control their experiments to weed out bias, circular reasoning, self-fulfilling prophecies and hidden variables.

Importance of Causation and correlation:

Correlation is important for identifying the extent to which a relationship exists between two variables. After confirming the relationship, it is also important to investigate whether one variable causes the other. Understanding both gives us the insight to target the right factors and get the best outcome.

Correlation measurement & Values:

The correlation coefficient, represented by r, identifies the direction and degree of association between two variables. It is a numerical value ranging from -1.0 to +1.0.

Negative Correlation – r < 0, which indicates a negative relationship between the variables (one rises as the other falls).

Positive Correlation – r > 0, which indicates a positive relationship between the variables, meaning that both variables move in tandem.

No correlation – r = 0, which indicates that there is no linear relationship between the variables.
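The meaning of the sign of r can be checked with a short Python sketch of the Pearson formula, r = sum((x - x_mean)(y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2)). The temperature and sales figures below are hypothetical, chosen only to produce one clearly positive, one clearly negative and one near-zero case.

```python
import math


def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their individual spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily observations.
temperature_c = [20, 22, 25, 28, 30, 33]
ice_cream_sales = [110, 118, 131, 142, 150, 163]  # rises with temperature
hot_drink_sales = [95, 90, 80, 72, 66, 58]        # falls as temperature rises
unrelated_count = [14, 9, 12, 15, 8, 13]          # no systematic pattern

r_pos = pearson_r(temperature_c, ice_cream_sales)
r_neg = pearson_r(temperature_c, hot_drink_sales)
r_none = pearson_r(temperature_c, unrelated_count)
print(f"positive: {r_pos:+.3f}  negative: {r_neg:+.3f}  near zero: {r_none:+.3f}")
```

Because the numerator is divided by both spreads, r has no units and always falls between -1 and +1, whatever the scale of the original measurements.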

Limitations
While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship.
For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call out fee and an hourly fee which progressively decreases the longer the job goes for, the relationship between hours worked and income would be non-linear, where the correlation coefficient may be closer to 0.
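The tradesperson example can be sketched numerically. The call-out fee, starting rate, hourly reduction and rate floor below are hypothetical values, not taken from the reference. For a curve that keeps rising, like this tiered fee, r drops below 1 but usually stays fairly high; the U-shaped case, added here for contrast, shows the extreme where y is completely determined by x and r is still 0.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def income_linear(hours, rate=50.0):
    """Flat hourly rate: income is a straight line in hours."""
    return rate * hours


def income_tiered(hours, call_out=80.0, first_rate=50.0, step=5.0, floor=10.0):
    """Hypothetical fee: call-out charge plus an hourly rate that drops each hour."""
    total = call_out
    for k in range(hours):
        total += max(first_rate - step * k, floor)
    return total


hours = list(range(1, 11))
r_linear = pearson_r(hours, [income_linear(h) for h in hours])
r_tiered = pearson_r(hours, [income_tiered(h) for h in hours])

# U-shaped relationship: y depends entirely on x, yet there is no linear trend.
xs = list(range(-5, 6))
r_u_shape = pearson_r(xs, [x * x for x in xs])

print(f"linear fee: r = {r_linear:.3f}")
print(f"tiered fee: r = {r_tiered:.3f}")
print(f"U-shape:    r = {r_u_shape:.3f}")
```

A scatter plot before calculating r is the simplest guard against this limitation, because the shape of the relationship is visible there even when r hides it.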

Care is needed when interpreting the value of 'r'. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered.
For example, sales of ice creams and the sales of sunscreen can increase and decrease across a year in a systematic manner, but it would be a relationship that would be due to the effects of the season (ie hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than due to any direct relationship between sales of sunscreen and ice cream.

The correlation coefficient should not be used to say anything about cause and effect relationship. By examining the value of 'r', we may conclude that two variables are related, but that 'r' value does not tell us if one variable was the cause of the change in the other.
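One concrete reason r says nothing about cause and effect is that it is symmetric: swapping the two variables gives exactly the same value. The sketch below uses hypothetical sugar-intake and body-weight figures to echo the earlier question of whether sugar drives weight gain or weight gain drives sugar intake.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations: daily sugar intake (g) and body weight (kg).
sugar_g = [20, 35, 50, 40, 65, 80]
weight_kg = [62, 66, 71, 68, 77, 84]

r_xy = pearson_r(sugar_g, weight_kg)
r_yx = pearson_r(weight_kg, sugar_g)
# Same value both ways round, so r cannot indicate which variable drives the other.
print(f"r(sugar, weight) = {r_xy:.3f}, r(weight, sugar) = {r_yx:.3f}")
```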

Limitations reference: http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+correlation+and+causation

How can causation be established?

People often misread correlation as meaning that, if there is a relationship, the related variable must be the real causal factor. Acting on that assumption can cost the organisation extra effort, time and human resources before the real root cause, or the third variable that actually causes the effect, is established.

Likewise, assuming that observational data alone is enough to establish solutions is the wrong approach, unless the right variables have been selected.

A controlled study is the most effective way to separate the real causes from the other variables studied. In a controlled study, the sample picked is divided into two groups, making sure that the groups are as comparable as possible in every way. The study is then conducted, the results are monitored, and later they are analysed for correlation and causation between the variables.

For example, in pharmaceutical trials, two groups of patients with a similar disease are selected. One group is given the existing treatment, whereas the other receives an advanced or new medication. If the two groups have noticeably different outcomes, the different treatments may have caused the different outcomes.
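The two-group design can be simulated. This sketch uses random assignment (randomization), the usual way of making the groups comparable, to stand in for "making sure that the groups are comparable"; the patients, the severity scale, the outcome model and the +5 point benefit of the new drug are all hypothetical. Because severity ends up balanced across the groups by chance rather than by anyone's choice, the gap in mean outcome estimates the effect of the drug.

```python
import random
import statistics


def randomize(patients, seed=0):
    """Randomly split patients into a control group and a treatment group."""
    rng = random.Random(seed)
    shuffled = list(patients)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simulate_outcome(patient, new_drug, rng):
    """Hypothetical recovery score: worse with severity, +5 points with the new drug."""
    benefit = 5.0 if new_drug else 0.0
    return 60 - 0.5 * patient["severity"] + benefit + rng.gauss(0, 3)


rng = random.Random(1)
patients = [{"id": i, "severity": rng.uniform(0, 40)} for i in range(400)]
control, treatment = randomize(patients, seed=2)

# Randomization should leave the groups comparable on severity...
sev_ctrl = statistics.mean([p["severity"] for p in control])
sev_trt = statistics.mean([p["severity"] for p in treatment])

# ...so a difference in mean outcome can be attributed to the drug.
ctrl_scores = [simulate_outcome(p, False, rng) for p in control]
trt_scores = [simulate_outcome(p, True, rng) for p in treatment]
effect = statistics.mean(trt_scores) - statistics.mean(ctrl_scores)

print(f"mean severity: control {sev_ctrl:.1f}, treatment {sev_trt:.1f}")
print(f"estimated treatment effect: {effect:+.2f} points (simulated true effect: +5)")
```

In a real trial the estimated difference would then be checked with a hypothesis test before claiming the drug works.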

Observational studies are often used to investigate correlation and causation in the population of interest. These studies look at the groups' behaviours and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between two variables.

To conclude, hypothesis testing is used to confirm the correlation between two variables and, combined with a controlled experiment, to test for causation. This makes it an important technique in root cause analysis for finding the critical X's.

Thanks

Kavitha


Solved by Venugopal R
