In this tutorial, I will illustrate how to add a fitted line to a scatter plot in R Programming language. The tutorial has two parts.

Before starting the application part, I’ll briefly explain the concept of the scatter plot and the fitted line, respectively.

A Scatter plot can help you identify the relationship between two variables. It is a plot of one variable against another variable.

Among these two variables, one is the explanatory variable (independent variable), and the other is the response variable. (dependent variable, target variable). The decision of the explanatory and response variables requires attention to have a scatter plot representing the true relationship.

The most effective usage of the scatter plot is for investigating the relationship between two numerical variables.

As stated above, the scatter plot is used to understand the relationship between two variables. If these two variables are related (or correlated), then we can also represent their relationship by a line on the plot. A fitted line provides us to obtain the estimated regression function between the explanatory variable and target variable.

After this brief introduction, let’s get started! We will use a very famous dataset in this tutorial,mtcars.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Adding a fitted line to an R base scatter plot

We will discover the underlying relationship between horsepower(hp) and miles per gallon (mpg). To draw a scatter plot in base R, we use plot()function. Here is how we are supposed to use it.

plot(exploratory variable,response variable)

plot(mtcars$hp,mtcars$mpg)

Now, we will manipulate the axis names and the title of the plot to have a better view.

plot(mtcars$hp,mtcars$mpg,
     xlab  = "Horsepower",
     ylab = "Miles per Gallon",
     main = "The relationship between horsepower and miles per gallon")

As seen from the figure above, the two variables are related to each other. When the horsepower increases, mpg decreases. Now, we can represent this relationship by adding a fitted line to this scatter plot. To do so, we will use abline() and lm() functions together.

abline() is used to add one or more lines to the current plot and lm() is used to fit a linear model.

plot(mtcars$hp,mtcars$mpg,
     xlab  = "Horsepower",
     ylab = "Miles per Gallon",
     main = "The relationship between horsepower and miles per gallon")
abline(lm(mpg~hp,data = mtcars))

We can change the color and thickness of the line by using col and lwd arguments, respectively.

plot(mtcars$hp,mtcars$mpg,
     xlab  = "Horsepower",
     ylab = "Miles per Gallon",
     main = "The relationship between horsepower and miles per gallon")
abline(lm(mpg~hp,data = mtcars),
       col = "red",
       lwd = 2)

In the example above, we add a straight fitted line to the scatter plot. It is also possible to add a smoothed fitted line using lowess function. We also use lines function instead of abline command.

plot(mtcars$hp,mtcars$mpg,
     xlab  = "Horsepower",
     ylab = "Miles per Gallon",
     main = "The relationship between horsepower and miles per gallon")
lines(lowess(x = mtcars$hp, y = mtcars$mpg),
       col = "red",
       lwd = 2)

The line representing the negative relationship between two variables of interest is not straight due to the random variation.

So far, we have shown how to add a fitted line to a base scatter plot. Now, let’s follow the same steps for ggplot2 scatter plot.

Adding a fitted line to ggplot2 scatter plot

You have to install the ggplot2 package if you are interested in creating a ggplot2 object. Then, load the package.

#install.packages(ggplot2) # install the ggplot2 package
library(ggplot2) #load the package
## Warning: package 'ggplot2' was built under R version 4.1.2

According to the ggplot2 idea, a plot can be split into 3 parts.

The geometry function for scatter plot in ggplot2 is geom_point().

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()

Now, we can change the axis names and add title to the plot.

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()+
  labs(x  = "Horsepower",
     y = "Miles per Gallon",
     title = "The relationship between horsepower and miles per gallon")

We can add the straight fitted line to the plot first. The geometric function used for adding a fitted line is geom_smooth. Since we draw a regression function line, we also use method = lm argument in the function.

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()+
  labs(x  = "Horsepower",
     y = "Miles per Gallon",
     title = "The relationship between horsepower and miles per gallon")+
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

The line on the figure above has a grey band. It is a confidence interval, but we don’t want to have it. To remove it, we also use se=FALSE argument.

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()+
  labs(x  = "Horsepower",
     y = "Miles per Gallon",
     title = "The relationship between horsepower and miles per gallon")+
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'

We can change the color and thickness of the line by using col and size arguments, respectively.

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()+
  labs(x  = "Horsepower",
     y = "Miles per Gallon",
     title = "The relationship between horsepower and miles per gallon")+
  geom_smooth(method = "lm", 
              se = FALSE,
              col = "red",
              size =2)
## `geom_smooth()` using formula 'y ~ x'

If we are interested in adding a smoothed fitted line rather than a straight line, we just remove the method=lm argument from geom_smooth() function.

ggplot(mtcars,aes(x = hp, y = mpg))+
  geom_point()+
  labs(x  = "Horsepower",
     y = "Miles per Gallon",
     title = "The relationship between horsepower and miles per gallon")+
  geom_smooth(se = FALSE,
              col = "red",
              size =2)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

In summary, I illustrate how to add a fitted line to a scatter plot (both base R and ggplot2 graph). For any questions and comments, you can contact me via e-mail.

Back to main page.