
Message added by Mayank Gupta, April 23, 2024

R-Squared Predicted [R-sq (pred)] is an indicator of how well the model predicts the response for new observations. Models that have larger R-sq (pred) values have better predictive ability.

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Sumukha Nagaraja on 22nd Apr 2024.

Applause for all the respondents - Anish Mohandas, Sumukha Nagaraja, Nikita Chordia.

R-Squared Predicted

April 19, 2024

Q 662. In advanced regression techniques, we use R-sq (Pred) to assess the predictive performance of a model. Why do we need to assess it separately? Aren't R-sq or R-sq (adj) sufficient indicators of a good model? Support your answer with suitable examples.

Note for website visitors -

This platform hosts two weekly questions, one on Tuesday and the other on Friday.
All previous questions can be found here: https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/.
To participate in the current question, please visit the forum homepage at https://www.benchmarksixsigma.com/forum/.
The question will be open until Tuesday or Friday at 5 PM Indian Standard Time, depending on the launch day.
Responses will not be visible until they are reviewed, and only non-plagiarised answers with less than 5-10% plagiarism will be approved.
If you are unsure about plagiarism, please check your answer using a plagiarism checker tool such as https://smallseotools.com/plagiarism-checker/ before submitting.
All correct answers shall be published, and the top-rated answer will be displayed first. The author will receive an honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.
Some people seem to be using AI platforms to find forum answers. This is a risky approach as AI responses are error prone as our questions are application-oriented (they are never straightforward). Have a look at this funny example - https://www.benchmarksixsigma.com/forum/topic/39458-using-ai-to-respond-to-forum-questions/
We also use an AI content detector at https://crossplag.com/ai-content-detector/. Only answers with less than 15-20% AI-generated content will be approved.

April 21, 2024

R-squared measures the proportion of variance in the dependent variable (y) that is explained by the independent variables (x). It ranges from 0 to 1, and a higher R-squared value indicates that the model fits the data better. It is calculated on the training dataset, the same data used to build the regression model.

R-squared prediction is used to assess the predictive performance of a regression model. It is usually calculated on a test dataset (unseen data) to show how well the model could predict in the real world.
Example:
R-squared is a good indicator of how well the model fits the training data, whereas R-squared prediction shows how well the model fits unseen data. This check is usually done with test data.
One good example is a loan default prediction model created by a bank that wants to predict whether a customer will default on a loan based on various parameters (X factors such as age, gender, nationality, loan amount, occupation, purpose of loan, etc.). The regression model is built on historical data using the training dataset. R-squared value = 0.93, which indicates that the model fits the training data well. The model was then applied to the test dataset to check how well it fits unseen data. R-squared prediction value = 0.87, which indicates that the model is also a good fit for unseen data. Had the R-squared prediction value been much lower than 0.93, it would have suggested that the model was overfitted to the training data.
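The train-versus-test comparison described above can be sketched in a few lines. This is only an illustration on synthetic data with a single predictor, not the bank's model; the data, the 70/30 split and the helper names `fit_line` and `r_squared` are all assumptions made for the example.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(y, y_hat):
    """1 - SSE/SST, with SST measured around the mean of y."""
    my = sum(y) / len(y)
    sse = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Synthetic data (an assumption for illustration): y = 3 + 2x + noise
random.seed(0)
x = [random.uniform(0, 10) for _ in range(100)]
y = [3 + 2 * xi + random.gauss(0, 2) for xi in x]

# Fit on the first 70 points, evaluate on the 30 held-out points
x_train, y_train = x[:70], y[:70]
x_test, y_test = x[70:], y[70:]
a, b = fit_line(x_train, y_train)

r2_train = r_squared(y_train, [a + b * xi for xi in x_train])
r2_test = r_squared(y_test, [a + b * xi for xi in x_test])
print(f"R-sq (train) = {r2_train:.3f}, R-sq (test) = {r2_test:.3f}")
```

If the two values are close, the fit carries over to unseen data; a test value far below the training value points to overfitting.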

April 22, 2024

Solution

In advanced regression techniques, we use R-sq (Pred) to assess the predictive performance of a model. It needs to be assessed separately even though R-sq and R-sq (Adj) are calculated as part of the model, because those two measure how well the model, including any newly added factor, fits the existing data; they do not assess how well it predicts new data. For a model to be relied on for prediction, R-sq (Pred) should be high and close to R-sq and R-sq (Adj). It can also be used to test whether a new factor or new data fits the model. An R-sq (Pred) well below R-sq is also a warning sign of overfitting, for example from too many or redundant predictors.

E.g., consider predicting the prices of flats from factors such as area, locality, number of bedrooms and amenities. You build a model on historical data, and R-sq and R-sq (Adj) come out as 0.82 and 0.81 respectively, which indicates that the model explains 81-82% of the variability in the historical data. R-sq (Pred) is 0.75, meaning the model is expected to explain about 75% of the variability in new data. The predicted value is usually lower than the other two measures because it is evaluated on data the model has not seen. It is the more relevant figure when the model is used for forecasting future sales and for decision making.

April 23, 2024

To assess the full capability of a model, it is essential to understand its explanatory performance as well as its predictive performance.
For explanatory performance, we generally look at the difference between the observed values and the values fitted by the regression model. If the differences are small, the model is a good fit. R-squared is an indicator of goodness of fit: it gives the percentage of variation in the response that is explained by the model.
If R-squared is 0%, none of the variation is explained by the model, and 100% means all of the variation is explained. Hence, one may assume that a high value of R-sq is good and a low value is bad. However, that's not always the case.
If R-squared is low, it is worth checking whether the predictors are statistically significant. If they are, you can still draw some conclusions from the model. Also, in some fields such as psychology, lower R-sq values are generally acceptable given the nature of the study.
If R-squared is high, the model may still not be a good fit. In some cases R-sq may be biased, for example when a linear model is used to explain non-linear data. In other cases R-squared may be high due to overfitting. Generally, if we add more variables to the model, R-squared will increase (it never decreases) even if the added variable is not significant. To solve this, R-squared needs to be modified so that it accounts for the number of variables. Adjusted R-squared is that modified version; it increases only if the added variable improves the model more than would be expected by chance.
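The effect of the adjustment can be checked with simple arithmetic. The sketch below uses the standard adjusted R-squared formula, 1 - (1 - R-sq)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name `adjusted_r2` and the sample figures (n = 30, R-sq of 0.800 and 0.802) are assumptions chosen for illustration, not values from the thread.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical case: a fourth predictor lifts R-sq only from 0.800 to 0.802
before = adjusted_r2(0.800, n=30, p=3)
after = adjusted_r2(0.802, n=30, p=4)

print(f"{before:.4f}")  # 0.7769
print(f"{after:.4f}")   # 0.7703
```

R-sq rose slightly, but adjusted R-sq fell, which signals that the extra variable did not earn its place in the model.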

To assess predictive performance, each observation is systematically removed from the dataset in turn, the regression equation is re-estimated without it, and the model's prediction for the removed observation is compared with its actual value. Predicted R-squared summarises these prediction errors and indicates how well a regression model predicts responses for new observations.
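The leave-one-out procedure described above can be sketched as follows. It assumes the common definition of predicted R-squared as 1 - PRESS / SS Total, where PRESS is the sum of squared leave-one-out prediction errors. The single-predictor model, the sample data and the function names are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def total_ss(y):
    """Total sum of squares around the mean of y."""
    my = sum(y) / len(y)
    return sum((yi - my) ** 2 for yi in y)

def r_squared(x, y):
    """Ordinary R-squared: fit and evaluate on the same data."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / total_ss(y)

def predicted_r_squared(x, y):
    """Predicted R-squared = 1 - PRESS / SS Total (leave-one-out)."""
    press = 0.0
    for i in range(len(y)):
        # Refit the model without observation i
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared error when predicting the removed observation
        press += (y[i] - (a + b * x[i])) ** 2
    return 1 - press / total_ss(y)

# Made-up sample data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r2 = r_squared(x, y)
r2_pred = predicted_r_squared(x, y)
print(f"R-sq = {r2:.4f}, R-sq (pred) = {r2_pred:.4f}")
```

For a least-squares fit, each leave-one-out error is at least as large as the corresponding ordinary residual, so R-sq (pred) can never exceed R-sq. The size of the gap is what shows how much of the fit is specific to the data used to build the model.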

Rohit Gandhi locked this topic

April 23, 2024

Sumukha has given the winning answer to this question. Short and crisp. Well done!

Answer from Anish is also a must read.

Rohit Gandhi unlocked this topic
