Coefficient of Determination (R-square)

If you work with data and models, you have almost certainly crossed paths with R-square. Many of us already know it, but this mini write-up is meant to refresh the theory and its fundamental concepts.

What is R-square and what does it do?

R-square, also called the coefficient of determination and often written R2, is a statistical measure used in regression analysis to assess the goodness of fit of a model. It indicates how well the independent variable(s) explain the variation in the dependent variable.

What are the baseline model and the fitted model?

R-square is defined relative to a baseline model, which is treated as the worst reasonable model. This baseline model doesn’t use any independent variables to predict the value of the dependent variable Y. Instead, it takes the mean of the observed values of Y and always predicts that mean.

Any regression model that we fit is compared to this baseline model to understand its goodness of fit. The baseline model is the default control model.

In other words, R-square tells you how good your model is compared with the baseline model (the model that always predicts the mean).

Baseline model: Take the average of the Y variable, say the average salary. The prediction is that single value for every observation, which plots as a flat horizontal line.

Fitted model: It uses X to predict Y and follows the best-fit line, that is, the line with the smallest sum of squared errors.
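To make the two models concrete, here is a minimal sketch in Python with NumPy. The experience and salary numbers are made up for illustration and do not come from any real dataset; the fitted model is assumed to be an ordinary least-squares straight line.

```python
import numpy as np

# Hypothetical data: years of experience (X) and salary in thousands (Y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 45.0, 50.0, 65.0])

# Baseline model: ignore X and always predict the mean of Y.
baseline_pred = np.full_like(y, y.mean())

# Fitted model: least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
fitted_pred = slope * x + intercept

# Sum of squared errors for each model.
sse_baseline = np.sum((y - baseline_pred) ** 2)
sse_fitted = np.sum((y - fitted_pred) ** 2)

print(f"baseline prediction: {y.mean():.1f}")
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"squared error, baseline: {sse_baseline:.1f}, fitted: {sse_fitted:.1f}")
```

The fitted line leaves a much smaller squared error than the flat baseline line, and that gap is exactly what R-square measures.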

R-square Formula

The formula for R-square is built from three sums of squares:

SSE: Sum of Squared Errors. It is the unexplained deviation with respect to the regression line.

SST: Sum of Squares Total. It is the total observed deviation of Y from its mean.

SSR: Sum of Squares due to Regression. This is the deviation explained by the regression line.

R2 = SSR/SST = 1 - SSE/SST  (Stick to this; there are many variations of this equation.) The two forms agree for a least-squares fit with an intercept, because in that case SST = SSR + SSE.
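The three sums of squares can be computed directly. The sketch below reuses made-up salary data and an ordinary least-squares line; the helper name `r_square_parts` is hypothetical and chosen only for this illustration.

```python
import numpy as np

def r_square_parts(y, y_pred):
    """Return (SSE, SSR, SST) for observed values y and predictions y_pred."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_mean = y.mean()
    sse = np.sum((y - y_pred) ** 2)       # unexplained deviation
    ssr = np.sum((y_pred - y_mean) ** 2)  # deviation explained by the line
    sst = np.sum((y - y_mean) ** 2)       # total deviation from the mean
    return sse, ssr, sst

# Hypothetical data: years of experience (X) and salary in thousands (Y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 45.0, 50.0, 65.0])

# Least-squares straight line with an intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

sse, ssr, sst = r_square_parts(y, y_pred)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(f"SSE={sse:.1f}  SSR={ssr:.1f}  SST={sst:.1f}")
print(f"SSR/SST={r2_from_ssr:.4f}  1-SSE/SST={r2_from_sse:.4f}")
```

Both forms give the same value here because the line was fitted by least squares with an intercept. For predictions from some other source the two forms can disagree, and 1 - SSE/SST is the safer one to use.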

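What range of values can R-square take?

For a least-squares model with an intercept, evaluated on the data it was fitted to, R-square ranges from 0 to 1. A value of 1 indicates a perfect fit where all the variation in the dependent variable is explained by the independent variable(s), and 0 indicates that the independent variable(s) explain none of the variation. Outside that setting (for example, on new data or for a model without an intercept), 1 - SSE/SST can be negative, which means the model predicts worse than the baseline mean.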
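The boundary cases can be checked with a few lines of Python. This is a sketch using the 1 - SSE/SST form and made-up numbers. The 0 to 1 range applies to a least-squares fit with an intercept scored on its own training data, so the deliberately bad predictions in the last case fall outside it.

```python
import numpy as np

def r_square(y, y_pred):
    """R-square in the 1 - SSE/SST form."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y - y_pred) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

# Hypothetical observed salaries in thousands.
y = np.array([30.0, 35.0, 45.0, 50.0, 65.0])

perfect = r_square(y, y)                            # predictions equal the observations
baseline = r_square(y, np.full_like(y, y.mean()))   # always predict the mean
worse = r_square(y, y[::-1])                        # deliberately bad: values in reverse order

print(f"perfect fit:         {perfect:.2f}")
print(f"baseline (mean):     {baseline:.2f}")
print(f"worse than baseline: {worse:.2f}")
```

The first two cases land on 1 and 0. The third is negative because its squared error is larger than that of the baseline model.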


How to interpret R-square?

If R-square = 0.93, it means that 93% of the variation in the dependent variable Y is explained by the independent variable X in our model. In that sense the goodness of fit is 93%, which indicates a good fit.

What is a good R-square value?

Rule of thumb: an R2 of 0.6 (60%) is considered decent. This still depends on the problem at hand and the data we are handling; for some models I have seen R2 values of 0.25 or even lower.

Adjusted R2 is a related measure that deserves careful attention, especially once a model has more than one independent variable.

Validation of Linear Regression: R2 and adjusted R2

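In simple linear regression we have one Y and one X, so R2 and adjusted R2 are nearly the same; adjusted R2 differs only by a small correction that depends on the sample size.

Thus R2 alone is sufficient in simple linear regression.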
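For reference, the usual definition is adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The sketch below uses a hypothetical helper name and made-up values of n and p. With one independent variable the correction is small, which is why the two values are close in simple linear regression.

```python
def adjusted_r_square(r2, n, p):
    """Adjusted R-square for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = 0.93  # example value, not from a real model

one_predictor = adjusted_r_square(r2, n=20, p=1)
five_predictors = adjusted_r_square(r2, n=20, p=5)

print(f"R2 = {r2:.4f}")
print(f"adjusted R2, 1 predictor:  {one_predictor:.4f}")
print(f"adjusted R2, 5 predictors: {five_predictors:.4f}")
```

The penalty grows with the number of independent variables, so adjusted R2 becomes more informative as more predictors are added.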

R-square is a useful tool for evaluating goodness of fit in regression analysis, but it should be considered together with other statistical measures, such as p-values and confidence intervals, to evaluate the reliability and significance of the regression model.

Hope you enjoyed learning about the basic concept of R-square. Stay tuned for more such content: we will be talking about adjusted R-square in our next blog at Assessment Yoda.
