Originally posted on Data Science Central

This article on going deeper into regression analysis, with assumptions, plots and solutions, was posted by Manish Saraswat. Manish, who works in marketing and data science at Analytics Vidhya, believes that education can change this world. R, data science and machine learning keep him busy.

Regression analysis marks the first step in predictive modeling. No doubt, it’s fairly easy to implement. Neither its syntax nor its parameters create any kind of confusion. But merely running one line of code doesn’t solve the purpose. Neither does just looking at R² or MSE values. Regression tells much more than that!

In R, calling the plot(model_name) function on a fitted regression model returns four diagnostic plots. Each plot provides significant information, or rather an interesting story, about the data. Sadly, many beginners either fail to decipher the information or don’t care about what these plots say. Once you understand these plots, you’ll be able to bring significant improvement to your regression model.
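As a minimal sketch of that call, assuming R's built-in mtcars data set and an arbitrary choice of predictors (neither comes from the original article), the four plots can be drawn on a single page like this:

```r
# Illustrative model only: mtcars and the predictors wt and hp are assumptions,
# not the data used in the original article.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Arrange the plotting area as a 2 x 2 grid so all four plots fit on one page.
old_par <- par(mfrow = c(2, 2))

# By default, plot() on an lm object draws: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage.
plot(model)

# Restore the previous graphics settings.
par(old_par)
```

Without the par(mfrow = ...) call, R shows the plots one after another instead of side by side.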

For model improvement, you also need to understand the regression assumptions and how to fix the model when they are violated.

In this article, I’ve explained the important regression assumptions and plots (with fixes and solutions) to help you understand the regression concept in further detail. As said above, with this knowledge you can bring drastic improvements in your models.

What you can find in this article:

Assumptions in Regression

What if these assumptions get violated?

  1. Linear and Additive
  2. Autocorrelation
  3. Multicollinearity
  4. Heteroskedasticity
  5. Normal Distribution of error terms
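The checks below are one possible sketch of how the last four assumptions could be examined numerically using base R only. They are not necessarily the methods used in the full article. The mtcars data set, the chosen predictors, and all variable names are assumptions made for illustration. Linearity and additivity are usually judged visually from the residuals vs fitted plot, so they are not computed here.

```r
# Illustrative model only: mtcars and these predictors are assumptions.
model <- lm(mpg ~ wt + hp + disp, data = mtcars)
e <- residuals(model)
n <- length(e)
X <- model.matrix(model)[, -1, drop = FALSE]  # predictors without the intercept

# Autocorrelation: Durbin-Watson statistic, always between 0 and 4.
# Values near 2 suggest little first-order autocorrelation in the residuals.
# (mtcars is not a time series, so this is for illustration only.)
dw <- sum(diff(e)^2) / sum(e^2)

# Multicollinearity: variance inflation factor (VIF) for each predictor,
# obtained by regressing that predictor on all the others.
vif <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

# Heteroskedasticity: Koenker's studentized form of the Breusch-Pagan test.
# Regress the squared residuals on the predictors; n * R^2 is compared
# against a chi-squared distribution.
aux <- lm(e^2 ~ X)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = ncol(X), lower.tail = FALSE)

# Normality of error terms: Shapiro-Wilk test on the residuals.
sw_p <- shapiro.test(e)$p.value

print(list(durbin_watson = dw, vif = vif,
           breusch_pagan_p = bp_p, shapiro_wilk_p = sw_p))
```

Small p-values from the last two tests would count as evidence against constant variance and against normal errors, respectively. A large VIF for a predictor suggests it is strongly explained by the other predictors.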

Interpretation of Regression Plots

  1. Residual vs Fitted Values
  2. Normal Q-Q Plot
  3. Scale-Location Plot
  4. Residuals vs Leverage Plot
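To make the four plots less of a black box, here is a hedged sketch of the quantities each one displays, extracted with standard R accessor functions. The model, the data set (mtcars), and the variable names are illustrative assumptions, not taken from the original article.

```r
# Illustrative model only: mtcars and the predictors are assumptions.
model <- lm(mpg ~ wt + hp, data = mtcars)

fitted_vals <- fitted(model)          # x-axis of plots 1 and 3
raw_resid   <- residuals(model)       # y-axis of plot 1 (Residuals vs Fitted)
std_resid   <- rstandard(model)       # standardized residuals, used in plots 2, 3 and 4
scale_loc   <- sqrt(abs(std_resid))   # y-axis of plot 3 (Scale-Location)
leverage    <- hatvalues(model)       # x-axis of plot 4 (Residuals vs Leverage)
cooks       <- cooks.distance(model)  # drawn as contour lines in plot 4

# A single plot can be drawn by number with the `which` argument of plot():
# 1 = Residuals vs Fitted, 2 = Normal Q-Q, 3 = Scale-Location,
# 5 = Residuals vs Leverage.
plot(model, which = 1)

# Show the three observations with the largest Cook's distance.
print(head(sort(cooks, decreasing = TRUE), 3))
```

Inspecting these vectors directly can help when a plot flags an observation and you want to see exactly which row of the data it refers to.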

You can find the full article here. For other articles about regression analysis, click here. 

Note from the Editor: For a robust regression that will work even if all these model assumptions are violated, click here. It is simple (it can be implemented in Excel and it is model-free), efficient and very comparable to the standard regression (when the model assumptions are not violated). And if you need confidence intervals for the predicted values, you can use the simple model-free confidence intervals (CI) described here. These CIs are equivalent to those taught in statistics courses, but you don't need to know stats to understand how they work or to use them. Finally, to measure goodness-of-fit, instead of R-Squared or MSE, you can use this metric, which is more robust against outliers.
