Outliers in Regression



At this point, you should be familiar with the concept of an outlier, since we've discussed outliers several times throughout the course. In this post, we'll concentrate on outliers in the context of linear regression. We'll also go through how to spot the different types of outliers and how to deal with them.

Suppose a scatter plot shows a cloud of points gathered together, plus a solitary point that lies far away from the others. What effect does this outlier have on the least squares line? To answer this question, consider where the line would go if this particular outlier didn't exist. In that situation, there would be no relationship between the two variables, because the points in the cloud are scattered randomly, and the line would be roughly horizontal.

So without the outlier there is no relationship between x and y, and this one solitary outlier gives the impression that there is. There are several types of outliers, and how we handle them depends on the type. Outliers are points that fall away from the majority of the data. Leverage points are outliers that fall horizontally away from the center of the cloud but do not influence the slope of the regression line. Influential points are outliers that do influence the slope of the regression line.
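The cloud-plus-outlier situation described above can be sketched in a few lines of Python. The data here are made up for illustration (they are not from the course), and NumPy's polyfit is just one convenient way to get a least squares slope.

```python
import numpy as np

# Hypothetical data: a cloud with no linear trend, plus one far-away point.
cloud_x = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
cloud_y = np.array([3, 1, 4, 1, 3, 2, 4, 1, 4, 2], dtype=float)
outlier_x, outlier_y = 20.0, 15.0


def fit_slope(x, y):
    """Return the least squares slope of y regressed on x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


# Fit once on the cloud alone and once with the outlier appended.
slope_without = fit_slope(cloud_x, cloud_y)
slope_with = fit_slope(np.append(cloud_x, outlier_x),
                       np.append(cloud_y, outlier_y))

print(f"slope without the outlier: {slope_without:.3f}")
print(f"slope with the outlier:    {slope_with:.3f}")
```

With these particular numbers the cloud on its own gives an essentially flat line, while adding the single far-away point tilts the line clearly upward, so in the terms used here that point is influential.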

To see whether a point is influential, visualize the regression line with and without the point and ask: does the slope of the line change significantly? Now consider another outlying point. What kind of outlier is it? First, we check whether the point lies horizontally far from the rest of the data. It does, so it has the potential to be a leverage point. Next, we ask whether it is also influential. Imagine where the line would go with the point present and with it absent. The line appears to stay in the same position, because the outlier lies on the trajectory of the regression line. It therefore has no effect on the slope, so it is a leverage point rather than an influential point.
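The "with and without" check can also be done numerically rather than by eye. The sketch below uses invented data and a hypothetical helper, slope_change. To make the contrast as clean as possible, the far-away point is placed exactly on the line fitted to the cloud, which is an idealized case. A second version of the same point, moved well off that line, is included for comparison.

```python
import numpy as np


def slope_change(x, y, index):
    """Slope fitted on all points minus slope fitted without point `index`."""
    keep = np.arange(len(x)) != index
    slope_all = np.polyfit(x, y, deg=1)[0]
    slope_rest = np.polyfit(x[keep], y[keep], deg=1)[0]
    return slope_all - slope_rest


# Hypothetical cloud: roughly y = 1 + 2x with a small alternating wobble.
x = np.arange(1, 11, dtype=float)
y = 1 + 2 * x + np.array([0.5, -0.5] * 5)

# Line fitted to the cloud alone, used to place the far-away point.
cloud_slope, cloud_intercept = np.polyfit(x, y, deg=1)
far_x = 30.0
on_line_y = cloud_intercept + cloud_slope * far_x   # on the line's trajectory
off_line_y = on_line_y - 40.0                       # far below the trajectory

x_all = np.append(x, far_x)
change_on = slope_change(x_all, np.append(y, on_line_y), index=10)
change_off = slope_change(x_all, np.append(y, off_line_y), index=10)

print(f"slope change, point on the trajectory:  {change_on:.4f}")
print(f"slope change, point off the trajectory: {change_off:.4f}")
```

Both versions of the point sit equally far from the cloud horizontally. Only the one that departs from the line's trajectory moves the slope, which is the distinction between a leverage point and an influential point as described here. How large a change counts as "significant" is a judgment call that the code does not make for you.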

Remember, you don't want to remove outlying points simply for the sake of removing them, because they can be the most interesting cases. Last but not least, a word about influential points. Consider the following statement and decide whether it is true: influential points always reduce R squared. Influential points do tend to make life more difficult, but do they always decrease R squared? No. In the first example, a single influential point created the appearance of a relationship where the rest of the data showed none, which raises R squared rather than lowering it. The takeaway is to always look at a scatter plot before fitting a model. If we relied only on the correlation coefficient and R squared to judge whether the model is a good fit, we would never see the irregularity in the data or the fact that a single influential point is driving the entire relationship.
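As a rough numerical check of the R squared question, the sketch below compares R squared with and without a single far-away point. The data are hypothetical, and the code relies on the fact that in simple linear regression R squared equals the squared correlation coefficient.

```python
import numpy as np

# Hypothetical data: a trendless cloud plus one far-away point.
cloud_x = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
cloud_y = np.array([3, 1, 4, 1, 3, 2, 4, 1, 4, 2], dtype=float)
all_x = np.append(cloud_x, 20.0)
all_y = np.append(cloud_y, 15.0)


def r_squared(x, y):
    """R squared of a simple linear regression (squared correlation)."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


r2_without = r_squared(cloud_x, cloud_y)
r2_with = r_squared(all_x, all_y)

print(f"R squared without the point: {r2_without:.3f}")
print(f"R squared with the point:    {r2_with:.3f}")
```

For this made-up dataset, R squared is close to zero for the cloud alone and much higher once the single point is included. The summary number looks better precisely because of the point that a scatter plot would flag as suspicious.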


Your comments and suggestions on this topic are highly appreciated; please leave them in the comment section below. You can also follow this blog to receive notifications of new posts.



