Message added by Mayank Gupta,

R-Squared Predicted [R-sq (pred)] is an indicator of how well the model predicts the response for new observations. Models that have larger R-sq (pred) values have better predictive ability.


An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Sumukha Nagaraja on 22nd Apr 2024.


Applause for all the respondents - Anish Mohandas, Sumukha Nagaraja, Nikita Chordia.


Q 662. In advanced regression techniques, we use R-sq (Pred) to assess the predictive performance of a model. Why do we need to assess it separately? Aren't R-sq or R-sq (adj) sufficient indicators of a good model? Support your answer with suitable examples.



4 answers to this question


  • 0

In advanced regression techniques, we use R-sq (Pred) to assess the predictive performance of a model. It needs to be assessed separately because R-sq and R-sq (Adj), although calculated as part of the model, measure only the goodness of fit to the data used to build it; they do not tell us how well the model predicts new observations. For a model to be useful for prediction, R-sq (Pred) should be high and close to R-sq and R-sq (Adj). An R-sq (Pred) that is much lower than the other two indicates that the model is overfitted. It can also be used to test whether adding a new factor actually improves prediction, which helps in spotting unnecessary predictors (for example, ones that are highly correlated with factors already in the model).


Eg. Consider predicting the prices of flats based on factors such as area, locality, number of bedrooms and amenities. You create a model from historical data, and R-sq and R-sq (Adj) are calculated as 0.82 and 0.81 respectively, which indicates that the model explains 81-82% of the variability in the historical data. R-sq (Pred) is 0.75, suggesting that the model would explain about 75% of the variability in new data. R-sq (Pred) is usually lower than the other two because the new observations were not used to fit the model. It is therefore the more relevant value when the model is used for future sales estimates and decision making.


  • 2

R squared = Measures the proportion of variance in the dependent variable (y) that is explained by the independent variables (x). It ranges from 0 to 1, and a higher R-squared value indicates that the model is a better fit. The regression model is created based on the training dataset.

R squared prediction = Used to assess the predictive performance of a regression model. It is calculated on data the model has not seen (for example a test dataset, or each observation left out in turn) to know how well the model could predict in the real world.
R-squared is a good indicator of how well the model fits the training data, whereas R-squared prediction shows how well the model fits unseen data.
One good example is the loan default prediction model created by a bank that wants to predict whether a customer will default on a loan based on various parameters (X factors such as age, gender, nationality, loan amount, occupation, purpose of loan etc). The regression model is created from the historical data using the training dataset. R-squared value = 0.93. This indicates that the model fits well. The regression model was then used on the test dataset to see how well the model fits unseen data. R-squared prediction value = 0.87, which also indicates that the model is a good fit for unseen data.
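The training/test comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (a single continuous X and Y, not the bank's loan data), the 150/50 split is arbitrary, and the helper names `design` and `r_squared` are made up for this example. R-sq on the test data is computed as 1 - SSE/SST using the test-set mean, which is one common convention.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
n = 200
x = rng.uniform(0, 10, n)
y = 5.0 + 1.5 * x + rng.normal(0, 1.0, n)

# Randomly hold out 50 observations as the "unseen" test dataset.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

def design(values):
    """Design matrix with an intercept column (hypothetical helper)."""
    return np.column_stack([np.ones(len(values)), values])

# Fit ordinary least squares on the training data only.
beta, *_ = np.linalg.lstsq(design(x[train]), y[train], rcond=None)

def r_squared(y_true, y_hat):
    """1 - SSE/SST, with SST taken around the mean of y_true."""
    sse = np.sum((y_true - y_hat) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

r2_train = r_squared(y[train], design(x[train]) @ beta)
r2_test = r_squared(y[test], design(x[test]) @ beta)
print(f"R-sq (training data): {r2_train:.3f}")
print(f"R-sq (test data):     {r2_test:.3f}")
```

If the two values are close, the model is generalising; a test value far below the training value would suggest overfitting.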


  • 0

To assess the full capability of the model, it is essential to understand its explanatory performance as well as its predictive performance.
When looking at a regression model from the explanatory side, we generally consider the difference between the observed values and the fitted values. If the differences are small, the model is a good fit. R-square is an indicator of goodness of fit: it gives the percentage of variation in the response that is explained by the model.
If R-squared is 0%, none of the variation is explained by the model, and 100% means all of the variation is explained. Hence, one may assume that a high value of R-sq is good and a low value is bad. However, that's not always the case.
If R-square is low, it is worth checking whether the predictors are statistically significant. If they are, you can still draw some conclusions from the model. Also, in some fields such as psychology, lower R-sq values are generally acceptable given the nature of the study.
A high R-square does not always mean a good fit. In some cases R-sq may be biased, for example when a linear model is used to explain non-linear data. In other cases R-square may be high due to overfitting. Generally, if we add more variables to the model, R-square will increase (or at least not decrease) even if the added variable is not significant. To solve this, we need to modify R-squared so that it is penalised for the number of variables. R-squared adjusted is that modified version, which increases only if the added variable improves the model more than would be expected by chance.
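The penalty for extra variables can be shown with the usual adjusted R-squared formula, R-sq (adj) = 1 - (1 - R-sq)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The numbers below are hypothetical and chosen only to illustrate the effect; `adjusted_r2` is a made-up helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model: 50 observations, 5 predictors, R-sq = 0.800.
before = adjusted_r2(0.800, n=50, p=5)

# Add a 6th predictor that barely helps: R-sq creeps up to 0.801.
after = adjusted_r2(0.801, n=50, p=6)

print(f"R-sq (adj) with 5 predictors: {before:.4f}")
print(f"R-sq (adj) with 6 predictors: {after:.4f}")
```

Here R-sq rises slightly when the sixth predictor is added, but R-sq (adj) falls, which is the signal that the extra variable is not earning its place.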

To assess the predictive performance, we systematically remove each observation from the data set, re-estimate the regression equation, and determine how well the model predicts the removed observation. Predictive R-squared, calculated from these prediction errors, indicates how well a regression model predicts responses for new observations.
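The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least squares model with an intercept and the common definition R-sq (pred) = 1 - PRESS/SST, where PRESS is the sum of squared errors from predicting each removed observation. The data are synthetic, and `r_squared_summary` is a hypothetical helper, not a function from any statistics package.

```python
import numpy as np

def r_squared_summary(X, y):
    """Return (R-sq, R-sq(adj), R-sq(pred)) for an OLS fit with intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))

    # Leave-one-out: refit without observation i, then predict observation i.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ b_i) ** 2)

    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    r2_pred = 1.0 - press / sst
    return r2, r2_adj, r2_pred

# Synthetic data (an assumption): one real predictor plus 8 pure-noise columns.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 2.0, n)
noise_cols = rng.normal(size=(n, 8))

small = r_squared_summary(x.reshape(-1, 1), y)
big = r_squared_summary(np.column_stack([x, noise_cols]), y)

print("1 predictor : R-sq=%.3f  adj=%.3f  pred=%.3f" % small)
print("9 predictors: R-sq=%.3f  adj=%.3f  pred=%.3f" % big)
```

R-sq can never go down when the noise columns are added, and R-sq (pred) can never exceed R-sq. With irrelevant predictors the gap between the two typically widens, which is the overfitting signal that R-sq and R-sq (adj) alone may not show clearly.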
