Linear Regression with R

This post is mainly about building a linear regression model for the diamonds dataset that ships with the ggplot2 library in R.

Below is the code used to create the linear regression (LR) model for the diamonds dataset.

library(ggplot2)
library(caTools)
View(diamonds)

# Creating our training and test data frames
split_values <- sample.split(diamonds$price, SplitRatio = 0.65)  # uses a 0.65 to 0.35 split
train_reg <- subset(diamonds, split_values == TRUE)
test_reg <- subset(diamonds, split_values == FALSE)

# Building the linear model
mod_regress <- lm(price ~ ., data = train_reg)
result_regress <- predict(mod_regress, test_reg)
Final_Data <- cbind(Actual = test_reg$price, Predicted = result_regress)
Final_Data <- as.data.frame(Final_Data)
View(Final_Data)

# Calculating the error
error <- Final_Data$Actual - Final_Data$Predicted
Final_Data <- cbind(Final_Data, error)
rmse <- sqrt(mean(Final_Data$error^2))
rmse
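The RMSE step at the end of the code above can be checked by hand on a few numbers. Here is a minimal sketch, where the vectors `actual` and `predicted` are made up for illustration and have nothing to do with the diamonds data:

```r
# Toy values, invented purely for illustration
actual    <- c(100, 200, 300, 400)
predicted <- c(110, 190, 310, 390)

error <- actual - predicted   # -10, 10, -10, 10
rmse  <- sqrt(mean(error^2))  # square the errors, average them, take the root
rmse
# → 10
```

Every prediction here is off by exactly 10, so the RMSE is 10 as well. On real data the errors differ in size, and the squaring step makes the large ones count for more.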

For more background on how to read the diagnostic plots above, the article by Bommae Kim at https://data.library.virginia.edu/diagnostic-plots/ gives a far better explanation than the one I am about to give.

The Residuals vs Fitted plot helps indicate whether the predictor variables and the outcome variable have a linear or non-linear relationship. Residuals spread evenly around a horizontal line are a good indicator of a linear relationship. If the residuals are not spread evenly around a horizontal line, the relationship could be non-linear. In the plot above, the spread around the horizontal line is roughly even, so the relationship can be assumed to be linear. However, the line rises slightly and steadily, which could hint at something the model does not account for.
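A plot of this kind can be drawn with base R's plot() method for lm objects. The following is only a sketch: the name `fit` is made up here, and it fits on the full diamonds dataset to keep things short, whereas the model in this post is fit on the training split.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Assumption: fit on the full dataset to keep the sketch self-contained;
# the model earlier in the post uses the 65% training split instead.
fit <- lm(price ~ ., data = diamonds)

# which = 1 asks plot() for the Residuals vs Fitted panel only
plot(fit, which = 1)
```

To look at the model from the post instead, swap `fit` for `mod_regress`.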

A Normal Q-Q plot shows whether the residuals are normally distributed. If they are, the points follow the straight reference line closely. A good result would be residuals lined up well along that line. By that standard, the Normal Q-Q plot above looks concerning.
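For anyone wanting to reproduce a Q-Q plot of the residuals, here is a rough sketch using base R. The names `fit` and `std_res` are made up for this example, and the model is again fit on the full diamonds dataset rather than the training split used in the post.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Assumption: full-data fit, used here only to keep the sketch self-contained
fit <- lm(price ~ ., data = diamonds)

# Standardized residuals put the points on a scale comparable to a standard normal
std_res <- rstandard(fit)

qqnorm(std_res)  # sample quantiles against theoretical normal quantiles
qqline(std_res)  # reference line; points far from it suggest non-normal residuals
```

Calling plot(fit, which = 2) should give a similar panel in one step.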