
# R-Squared Predicted

R-Squared Predicted [R-sq (pred)] is an indicator of how well the model predicts the response for new observations. Models that have larger R-sq (pred) values have better predictive ability.

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Sumukha Nagaraja on 22nd Apr 2024.

Applause for all the respondents - Anish Mohandas, Sumukha Nagaraja, Nikita Chordia.

## Question

Q 662. In advanced regression techniques, we use R-sq (pred) to assess the predictive performance of a model. Why do we need to assess it separately? Aren't R-sq or R-sq (adj) sufficient indicators of a good model? Support your answer with suitable examples.


## Recommended Posts


In advanced regression techniques, we use R-sq (pred) to assess the predictive performance of a model. It needs to be assessed separately because R-sq and R-sq (adj), which are calculated as part of the model, measure the goodness of fit to the data used to build the model but do not assess how well the model will predict new observations. For a model to be reliably predictive, R-sq (pred) should be high and reasonably close to R-sq and R-sq (adj); a large gap between them can also signal problems such as overfitting or multicollinearity in the model.

E.g. consider predicting the prices of flats based on factors such as area, locality, number of bedrooms, and amenities. You build a model on historical data and obtain R-sq and R-sq (adj) values of 0.82 and 0.81 respectively, indicating that the model explains 81-82% of the variability in the historical data. R-sq (pred) is 0.75, meaning the model is expected to explain about 75% of the variability in new data. The predicted value is typically lower because new data was not used to fit the model. It is this predicted performance that matters most for forecasting future sales and for decision making.



R-squared measures the proportion of variance in the dependent variable (y) that is explained by the independent variables (x). It ranges from 0 to 1, and a higher R-squared value indicates that the model is a good fit. The regression model is created from the training dataset.

R-squared predicted is used to assess the predictive performance of a regression model. It is usually computed on a test dataset (unseen data) to gauge how well the model would predict in the real world.
Example:
R-squared is a good indicator of how well the model fits the training data, whereas R-squared predicted shows how well the model fits unseen data; it is usually evaluated on test data.
A good example is a loan default prediction model built by a bank to predict whether a customer will default, based on various parameters (X factors such as age, gender, nationality, loan amount, occupation, purpose of loan, etc.). The regression model is built on historical data using the training dataset and yields an R-squared value of 0.93, indicating that the model fits the training data well. The model is then applied to the test dataset to check its fit on unseen data; an R-squared predicted value of 0.87 indicates that the model is a good fit for unseen data as well.
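The train-versus-test comparison described here can be sketched with synthetic data. Everything below (the data, the noise level, the helper name `r_squared`) is an illustrative assumption, not taken from the bank example:

```python
import numpy as np

# Synthetic data: a single predictor x and a noisy linear response y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 3.0 * x + 5.0 + rng.normal(0, 2.0, 40)

# Split into training (first 30) and held-out test (last 10) observations.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

# Ordinary least squares fit on the training data only.
slope, intercept = np.polyfit(x_train, y_train, 1)

def r_squared(x_vals, y_vals):
    """R^2 = 1 - SS_residual / SS_total for the fitted line."""
    pred = slope * x_vals + intercept
    ss_res = np.sum((y_vals - pred) ** 2)
    ss_tot = np.sum((y_vals - np.mean(y_vals)) ** 2)
    return 1 - ss_res / ss_tot

print(f"R-sq (train): {r_squared(x_train, y_train):.3f}")
print(f"R-sq (test):  {r_squared(x_test, y_test):.3f}")
```

Because the test observations played no part in the fit, their R-squared is the more honest estimate of real-world performance.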



To assess the full capability of a model, it is essential to understand its explanatory performance as well as its predictive performance.
In the explanatory view of a regression model, we generally consider the difference between observed values and fitted values. If the difference is small, the model is a good fit. R-squared is an indicator of goodness of fit: it gives the percentage of variation explained by the model.
If R-squared is 0%, none of the variation is explained by the model; 100% means all of it is explained. Hence one may assume that a high R-sq value is good and a low value is bad, but that is not always the case.
If R-squared is low, it is worth checking whether the predictors are statistically significant; if they are, you can still draw some conclusions from the model. Also, in some fields such as psychology, lower R-sq values are acceptable given the nature of the studies.
A high R-squared value does not always mean a good fit either. In some cases R-sq is biased, for example when a linear model is used to explain non-linear data. R-squared can also be high due to overfitting: adding more variables to the model generally increases R-squared even when the added variables are not significant. To address this, R-squared is modified so that it is not inflated simply by the number of variables; adjusted R-squared is that modified version, and it only increases if an added variable improves the model.
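The overfitting point can be shown numerically: adding a pure-noise predictor never lowers plain R-squared, while adjusted R-squared penalizes the extra term. A minimal NumPy sketch on synthetic data (the function name `fit_r2` and the sample sizes are illustrative assumptions):

```python
import numpy as np

def fit_r2(X, y):
    """OLS fit; return plain R^2 and adjusted R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    n, p = X1.shape                               # p counts the intercept
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(1)
n = 25
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

r2_base, adj_base = fit_r2(x.reshape(-1, 1), y)
# Add a pure-noise column with no relationship to y.
noise = rng.normal(size=n)
r2_noisy, adj_noisy = fit_r2(np.column_stack([x, noise]), y)

print(f"R-sq:     {r2_base:.4f} -> {r2_noisy:.4f}  (never decreases)")
print(f"Adjusted: {adj_base:.4f} -> {adj_noisy:.4f}")
```

The plain R-squared after adding the noise column is mathematically guaranteed to be at least as large as before, which is exactly why it cannot be trusted for variable selection on its own.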

To assess predictive performance, each observation is systematically removed from the data set, the regression equation is re-estimated, and we determine how well the model predicts the removed observation. Predicted R-squared indicates how well a regression model predicts responses for new observations.
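This leave-one-out procedure is the basis of the PRESS (prediction sum of squares) calculation behind predicted R-squared. A minimal NumPy sketch on synthetic data (the function name `predicted_r2` and the data are illustrative assumptions):

```python
import numpy as np

def predicted_r2(X, y):
    """Predicted R^2 = 1 - PRESS / SS_total, via leave-one-out refits."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # add intercept column
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        # Refit with observation i removed, then predict it.
        beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ beta) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - press / ss_tot

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 1.5 * x + rng.normal(0, 1.0, 30)
print(f"R-sq (pred): {predicted_r2(x.reshape(-1, 1), y):.3f}")
```

Because every prediction is made by a model that never saw that observation, predicted R-squared is always at most the ordinary R-squared, and a large drop between the two is a warning sign of overfitting.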



Sumukha has given the winning answer to this question. Short and crisp. Well done!
