Message added by Mayank Gupta,

Correlation is a statistical measure that quantifies the strength and direction of the linear relationship between two quantitative, continuous variables. The relationship can be one of the following:

Positive - as one variable increases, the other tends to increase
Negative - as one variable increases, the other tends to decrease
No Correlation - an increase in one variable is not associated with a linear change in the other

 

Covariance is a measure of the linear relationship between two variables. Covariance is not standardized, unlike the correlation coefficient. Hence, covariance values can range from negative infinity to positive infinity. Positive covariance values indicate that above average values of one variable are associated with above average values of the other variable and below average values are similarly associated.  Negative covariance values indicate that above average values of one variable are associated with below average values of the other variable. For samples, the covariance is calculated as the sum of the product of deviations of the data values about their means divided by n-1.
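Written out as a formula, the sample covariance described above looks as follows. The symbols are notation assumed here, not taken from the original post: x_i and y_i are the paired observations, x̄ and ȳ are their sample means, and n is the number of pairs.

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Dividing by n instead of n-1 gives the population version of the same quantity.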

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Mohamed Asif on 9th Apr 2022.

 

Applause for all the respondents - Tamilarasan, Anshul Vaidya, Mohamed Asif.

Question

Posted

Q 462. What is the difference between correlation and covariance? Provide examples to highlight the usage of these terms.

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

4 answers to this question


  • 0
Posted

Both Correlation and Covariance measure the linear association between two variables.

To be specific, here is the key difference:

Correlation measures both the strength and the direction of the linear relationship between two variables.

Covariance indicates only the direction of the linear relationship between two variables; its magnitude depends on the units of measurement.

 

Specific comparison:

Values:

Correlation: Standardized

Covariance: Unstandardized

 

Units:

Correlation: Does not have units (dimensionless)

Covariance: Has units (the product of the units of the two variables)

 

Scale:

Correlation: Change in scale does not affect the value of Correlation

Covariance: Change in Scale will affect the value of Covariance

 

Range:

Correlation: -1 to +1

Covariance: -∞ to +∞

 

Why does the Correlation value lie between -1 and +1?

Correlation is simply the covariance divided by the product of the standard deviations of the two variables, which confines the value to the range -1 to +1. In other words, it is a standardized (scaled) version of covariance.
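The effect of this standardization can be checked with a minimal Python sketch. The height and weight figures are hypothetical values made up for illustration, and the helper names `covariance` and `correlation` are assumptions of this sketch rather than anything from the original answer. It rescales one variable from metres to centimetres and compares both measures.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Sample covariance: sum of products of deviations divided by n - 1.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance divided by the product of the two standard deviations.
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical data: heights in metres and weights in kilograms.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52.0, 58.0, 69.0, 74.0, 88.0]

# The same heights expressed in centimetres (a pure change of scale).
height_cm = [h * 100 for h in height_m]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)

print(f"covariance (m, kg):  {cov_m:.4f}")
print(f"covariance (cm, kg): {cov_cm:.4f}")  # 100 times larger
print(f"correlation (m):     {r_m:.4f}")
print(f"correlation (cm):    {r_cm:.4f}")    # unchanged by the rescaling
```

The covariance grows by the same factor as the rescaled variable, while the correlation stays put, which is why correlation is the easier number to compare across data sets.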

 

Inferences from Analysis:

Covariance Inference:

Positive - both variables tend to increase or decrease together (direct relationship)

Negative - if one variable increases, the other tends to decrease (inverse relationship)

 

Correlation Inference:

+1 - Perfect Positive linear relationship

0 - No linear relationship

-1 - Perfect Negative Linear relationship

 

Some more examples:

Correlation Examples:
Pearson r Relationship
0 No linear relationship
0.466 Moderate positive relationship
0.95 Large positive relationship
-0.96 Large negative relationship

 

Covariance Examples:
Covariance Relationship
0.0036 Positive
0 No linear relationship
-0.007 Negative
-0.0376 Negative

 

Covariance can take any real value, and because its magnitude depends on the units of the variables, the number is hard to interpret on its own.

 

Sample Data Set:

G Price CO Price
49000 95.17
48600 98.4
48600 98.4
48600 98.4
48250 97.17
48000 97.16
47800 101.24
47800 101.24
47800 101.24
47950 103.66

 

Based on the same data set, the association summary is:

Correlation(R) -0.74682
Covariance(G,CO) -744.37
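As a cross-check, the summary can be recomputed from the ten rows of the sample data set with a short Python sketch. The variable names are assumptions chosen for illustration; the numbers are the ones posted above. The sketch computes the covariance with both the n and the n-1 divisor, since the two conventions give different values.

```python
import math

# Sample data set from the post (columns as given: G Price, CO Price).
g_price = [49000, 48600, 48600, 48600, 48250, 48000, 47800, 47800, 47800, 47950]
co_price = [95.17, 98.4, 98.4, 98.4, 97.17, 97.16, 101.24, 101.24, 101.24, 103.66]

n = len(g_price)
mean_g = sum(g_price) / n
mean_co = sum(co_price) / n

# Sum of the products of deviations about the means, plus the sums of squares.
cross = sum((g - mean_g) * (c - mean_co) for g, c in zip(g_price, co_price))
ss_g = sum((g - mean_g) ** 2 for g in g_price)
ss_co = sum((c - mean_co) ** 2 for c in co_price)

cov_population = cross / n       # divides by n
cov_sample = cross / (n - 1)     # divides by n - 1
# The divisor cancels in the ratio, so r is the same under either convention.
r = cross / math.sqrt(ss_g * ss_co)

print(f"Population covariance: {cov_population:.2f}")
print(f"Sample covariance:     {cov_sample:.2f}")
print(f"Correlation (r):       {r:.5f}")
```

The population covariance comes out at about -744.37 and the correlation at about -0.74682, matching the summary, so the posted covariance appears to use the n divisor. With the n-1 divisor the covariance is about -827.08.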

 


 

There are numerous applications of Correlation and Covariance; some are listed below:

Data science: covariance is one of the most frequently used measures.

Insights from covariance analysis can give more clarity on multivariate data.

 

Stock market: Investors, traders and analysts often use correlation and covariance, specifically to understand how the stock returns of one company move relative to those of another, which can help reduce investment risk.

Implied Correlation Index by CBOE (Chicago Board Options Exchange): This tracks the correlation implied by comparing the implied volatility of index options with the implied volatilities of options on a weighted portfolio of the index's component stocks.


 

Banking and Insurance: Exploratory analysis can give more insight into the relationships between variables, which helps in analysing customer churn and retention.

  • 1
Posted

 

 

1. Correlation depicts the magnitude and direction of the linear relation between two data series. The sample correlation ranges from -1 to +1, and its sign matches the sign of the slope of the regression line. Negative correlation values between -1 and 0 represent a negative association: values of one data series tend to decrease as values of the other increase. Likewise, positive correlation values between 0 and +1 represent a positive association: values of one data series tend to increase as values of the other increase. As a rule of thumb, values between -0.7 and -1 indicate strong negative correlation, values between -0.5 and -0.7 indicate modest negative correlation, and values between 0 and -0.4 indicate weak or poor correlation. Similarly, values between 0.7 and +1 indicate strong positive correlation, values between 0.5 and 0.7 indicate modest correlation, and values between 0 and 0.4 indicate weak or poor correlation.

2. Schematically, correlation may be represented as a sloping line on a scatter plot (for example, a downward-sloping line for a negative correlation); the actual direction of the line depends on the values of the data points.

3. Covariance reflects how the data points of two series deviate together from their respective mean values. An upward-sloping line on a scatter plot may be used to represent, schematically, deviations from the means that move in the same direction. The theoretical value of the covariance between two variables may range from -∞ to +∞.

4. Correlation is the natural choice when the two data series are of different types or are measured in different units and scales, whereas covariance is meaningful when the data values are of a similar type and share the same units and scale. Either the covariance matrix or the correlation matrix may be used (depending on the similarity or difference in variable types and scale units) to compute the eigenvalues and eigenvectors employed in principal component analysis (PCA). The two matrices can yield different eigenvalues and eigenvectors, and therefore different principal components.

5. The correlation value may be obtained by dividing the covariance by the product of the standard deviations of the two data series.

6. Inference based on correlation may be affected by autocorrelation, which can arise when the dependent variable is not fully explained by the independent predictor variable. This can be illustrated with a residual plot, where the residuals (the difference between the observed value of the dependent variable and the fitted value, plotted on the y-axis against the fitted value on the x-axis) exhibit correlation between consecutive values. In the case of autocorrelation, the error term (the difference between the expected value and the actual value) carries information that helps predict the dependent variable, contrary to the analyst's expectation that the regressand is explained by the explanatory regressor alone.
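Point 4 above, that PCA can give different results on the covariance matrix and on the correlation matrix, can be illustrated with a minimal Python sketch for two variables. It reuses the sample data set posted earlier in this thread. The helper `eigenvalues_2x2_symmetric` and the variable names are assumptions of this sketch, and the closed-form eigenvalue formula applies only to 2x2 symmetric matrices, so this is not a general PCA implementation.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]],
    # returned as (larger, smaller).
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return centre + radius, centre - radius

# Sample data set from earlier in the thread (G Price, CO Price).
x = [49000, 48600, 48600, 48600, 48250, 48000, 47800, 47800, 47800, 47950]
y = [95.17, 98.4, 98.4, 98.4, 97.17, 97.16, 101.24, 101.24, 101.24, 103.66]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
var_x = sum((v - mx) ** 2 for v in x) / (n - 1)
var_y = sum((v - my) ** 2 for v in y) / (n - 1)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
r = cov_xy / math.sqrt(var_x * var_y)

# PCA on the covariance matrix: the variable with the larger scale dominates.
cov_l1, cov_l2 = eigenvalues_2x2_symmetric(var_x, cov_xy, var_y)
share_cov = cov_l1 / (cov_l1 + cov_l2)

# PCA on the correlation matrix [[1, r], [r, 1]]: both variables weigh equally.
cor_l1, cor_l2 = eigenvalues_2x2_symmetric(1.0, r, 1.0)
share_cor = cor_l1 / (cor_l1 + cor_l2)

print(f"Variance share of first component (covariance matrix):  {share_cov:.5f}")
print(f"Variance share of first component (correlation matrix): {share_cor:.5f}")
```

On the covariance matrix the first component absorbs almost all of the variance, simply because one column is measured in tens of thousands and the other in tens. On the correlation matrix the share is (1 + |r|) / 2, which depends only on the strength of the association.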

 

 

 

  • 0
Posted

Correlation and covariance are statistical concepts that are used to determine the relationship between two random variables.

 

The main difference is that covariance is used to find the direction of a linear relationship between two variables, whereas correlation is used to assess both the direction and the strength of that relationship.

 

Covariance :-

Covariance describes the relationship between two random variables and the extent to which they vary together.

Covariance is used to find the direction of a linear relationship between two variables.

However, the covariance statistic cannot be used to assess the strength of a linear relationship.

 

- If both variables tend to increase or decrease together, the covariance is positive.

- If one variable tends to increase as the other declines, the covariance is negative.

 

Correlation :-

Correlation quantifies the relationship between two random variables and describes how a change in one variable is associated with a change in the other.

The most common first step in establishing whether or not two variables are related is to plot them on a “scatter plot”.

  • It measures the strength of the LINEAR association between two variables
  • Will not measure a strong NON-LINEAR relationship
  • DOES NOT measure cause and effect
  • Can be used as 1st step to regression
  • Should be used in conjunction with graphical techniques
  • The measure of the strength of the linear association in a correlation analysis is “R”, the correlation coefficient.
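The point that correlation does not capture a strong non-linear relationship can be seen in a small Python sketch. The data are hypothetical and the helper name `pearson_r` is an assumption of this sketch: y depends entirely on x in both cases, once through a parabola and once through a straight line.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: cross-deviation sum over the root of the
    # product of the two sums of squared deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data, symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v ** 2 for v in x]     # perfect but non-linear dependence
y_linear = [2 * v + 1 for v in x]     # perfect linear dependence

r_quadratic = pearson_r(x, y_quadratic)
r_linear = pearson_r(x, y_linear)

print(f"r for y = x^2:    {r_quadratic:.3f}")
print(f"r for y = 2x + 1: {r_linear:.3f}")
```

The parabola gives a correlation of zero even though y is fully determined by x, which is why the coefficient should be read together with a scatter plot.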


 

Correlation and Covariance - Example

 

The monthly revenue from customers and the response to advertisements are used as aggregate measures of the effectiveness of a company's business development function.

 


 

In Minitab: Stat >> Basic Statistics >> Correlation


 

Correlation Result: R = 0.997, a very strong positive correlation.


Covariance Result:

The covariance between Revenue and Adv. Response is 67.5856, which indicates that the relationship is positive.

  • 0
Posted

This was a tricky question to answer. In the published answers there are two distinct approaches - one more conceptual and theoretical while the other involves explaining with an example.

 

 

While Anshul's answer is a treatise on the question and is a must read, Mohamed's answer scores a bit more for the example. Hence, Mohamed's answer has been selected as the winner. 

 
