Data Visualization

Data visualization is an essential tool in the realm of information analysis and presentation. It involves the graphical representation of data and statistics to uncover meaningful insights and communicate complex ideas in a clear and concise manner. As the volume and complexity of data continue to grow exponentially, data visualization plays a crucial role in transforming raw numbers into visual narratives that are easily digestible and comprehensible. By leveraging charts, graphs, maps, and interactive visual elements, data visualization allows individuals and organizations to explore patterns, identify trends, detect anomalies, and make data-driven decisions. Its importance lies in its ability to facilitate understanding, reveal patterns and correlations, highlight outliers, and support storytelling, enabling stakeholders to derive actionable insights from vast amounts of information.

Data visualization has a rich history that dates back centuries, evolving alongside humanity’s quest for understanding and representing information visually. Early examples of data visualization can be traced back to ancient civilizations, where visual symbols and hieroglyphics were used to convey important information and record historical events. In the 18th and 19th centuries, pioneers like William Playfair and Florence Nightingale revolutionized the field by introducing graphical methods to illustrate statistical data. Playfair’s invention of line, bar, and pie charts provided a visual framework for comparing quantities, while Nightingale’s famous coxcomb diagram effectively communicated the impact of preventable diseases on soldiers during the Crimean War. The advent of computer technology in the 20th century brought about a new era of data visualization, with advancements in software and hardware enabling the creation of interactive and dynamic visual representations. Today, with the explosion of big data and the need for data-driven decision-making, data visualization has become an indispensable tool in various domains, including business, science, journalism, and public policy. Its historical significance lies in its ability to transform complex data into intuitive visuals, empowering individuals and organizations to gain insights, communicate effectively, and make informed decisions in an increasingly data-driven world.

Coxcomb Diagram by Nightingale

ggplot2 Package

ggplot2, developed by Hadley Wickham, is a plotting system for R based on the grammar of graphics. It tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) and provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics.

To get more information about this package, you can visit http://ggplot2.org/

There are also many videos, books, and web pages about this package. One of them is my book. :)

An Introduction to ggplot2, Ozancan Ozdemir, 2022

The name ggplot2 (often shortened to ggplot) refers to the “Grammar of Graphics”, which rests on the principle that a plot can be split into the following basic parts:

Plot = data + Aesthetics + Geometry

Each plot type has its own geometry function. Some of the most common ones are listed below.

    geom_point() = Scatter Plot
    geom_bar() = Bar Plot
    geom_line() = Line Plot
    geom_histogram() = Histogram
    geom_boxplot() = Box Plot
    geom_density() = Density Plot

For example:

    library(ggplot2)
    ggplot(data,aes(x=x,y=y))+geom_point()
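
The call above is a template in which data, x and y are placeholders. A minimal runnable sketch follows, assuming a small hypothetical data frame named toy that is made up purely for illustration:

```r
library(ggplot2)

# Hypothetical toy data, used only to make the template call runnable
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 3, 6, 5))

# Plot = data (toy) + aesthetics (aes(x, y)) + geometry (geom_point())
p <- ggplot(toy, aes(x = x, y = y)) + geom_point()
p
```

Swapping geom_point() for another geometry function from the list changes the plot type while the data and aesthetics stay the same.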

In addition to those functions, we use the following arguments or functions for our plots.

Functions

geom_text() = Add labels or numbers to your plot

coord_flip() = Swap the x and y axes of your plot

theme() = Adjust the non-data elements of your plot, e.g. the size of the axis names

facet_wrap() & facet_grid() = Draw the same plot for different subsets of your data

scale_color_manual() = Set the colors of your plot manually

labs() = Set the title, axis names, etc.

Arguments

col = Color your plot by a third variable (in the aesthetics part)

group = Split your data into groups by a third variable (in the aesthetics part)

color = Change the outline color of your box, bar or bin (in the geom part)

fill = Change the fill color of your box, bar or bin (in the geom part)
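
The difference between using an argument in the aesthetics part and in the geom part can be sketched as follows. The object names p_mapped and p_fixed are only illustrative, and the colors are arbitrary choices:

```r
library(ggplot2)

# Mapping: the fill depends on a variable, so it goes inside aes()
# and ggplot2 draws a legend for it
p_mapped <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar()

# Setting: one fixed fill and outline color for every bar,
# so the arguments go in the geom, outside aes()
p_fixed <- ggplot(diamonds, aes(x = cut)) +
  geom_bar(fill = "darkred", color = "black")

p_mapped
p_fixed
```

In the first plot each bar is split by clarity level; in the second every bar has the same appearance and no legend is needed.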

Important Note

The ggplot2 package works with data.frame and tibble objects.

Why is ggplot2 better?

(Ozdemir, O, Lab Notes, 2019)

Case Study

In this tutorial, we will use the diamonds dataset, which is built into ggplot2, to illustrate the plot types.

library(ggplot2)
knitr::kable(head(diamonds))
carat  cut        color  clarity  depth  table  price  x     y     z
0.23   Ideal      E      SI2      61.5   55     326    3.95  3.98  2.43
0.21   Premium    E      SI1      59.8   61     326    3.89  3.84  2.31
0.23   Good       E      VS1      56.9   65     327    4.05  4.07  2.31
0.29   Premium    I      VS2      62.4   58     334    4.20  4.23  2.63
0.31   Good       J      SI2      63.3   58     335    4.34  4.35  2.75
0.24   Very Good  J      VVS2     62.8   57     336    3.94  3.96  2.48

Check the class of the variables in the data.

dplyr::glimpse(diamonds)
## Rows: 53,940
## Columns: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.~
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver~
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,~
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, ~
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64~
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58~
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34~
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.~
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.~
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.~

Question: What is the frequency distribution of cut?

Bar Plot

As seen above, cut is a categorical variable. The most appropriate and most frequently used graphical tool for a single categorical variable is the bar plot.

There are two ways to create a bar plot in ggplot2.

The first one is to calculate a frequency table manually.

t = table(diamonds$cut)
class(t)
## [1] "table"
t
## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551

Then, convert your table object into a data frame or tibble.

df = data.frame(t)
df
##        Var1  Freq
## 1      Fair  1610
## 2      Good  4906
## 3 Very Good 12082
## 4   Premium 13791
## 5     Ideal 21551
library(ggplot2)
ggplot(df,aes(x=Var1,y=Freq))+geom_bar(stat="identity")

# stat = "identity" is required here because the bar heights are already stored in Freq
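
As a side note, ggplot2 also provides geom_col(), which draws bars from precomputed heights without the stat argument. A short sketch, using the illustrative name cut_counts for the frequency table:

```r
library(ggplot2)

# Frequency table of cut as a data frame (columns Var1 and Freq)
cut_counts <- data.frame(table(diamonds$cut))

# Bar heights taken directly from Freq, written in two equivalent ways
p_bar <- ggplot(cut_counts, aes(x = Var1, y = Freq)) + geom_bar(stat = "identity")
p_col <- ggplot(cut_counts, aes(x = Var1, y = Freq)) + geom_col()

p_col
```

Both objects produce the same bars, so the choice between them is a matter of taste.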

The labs() function adds a title and axis names to your plot.

ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
  labs(title="Bar Plot of CUT",y="Freq",x="Level")

geom_text() adds numbers or text to your plot. You can also use annotate() for the same purpose.

ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
  labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),fontface="bold")

# the label aesthetic is required for geom_text()

To adjust the position of the text, you can use the vjust or hjust arguments.

ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
  labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")

# vjust = -0.25 moves each label slightly above its bar

In an ideal bar graph, values should be arranged in descending order, from largest to smallest.

Ordering x axis

Use the reorder() function on the x-axis variable.

ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity")+
  labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")
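
To see what reorder() does behind the scenes, the following sketch compares the factor levels before and after reordering. The names cut_counts and ordered_cut are only illustrative:

```r
library(ggplot2)

cut_counts <- data.frame(table(diamonds$cut))

# Original level order follows the definition of the cut variable
levels(cut_counts$Var1)
# "Fair" "Good" "Very Good" "Premium" "Ideal"

# reorder() sorts the levels by the second argument in ascending order;
# the minus sign turns this into a descending order of Freq
ordered_cut <- reorder(cut_counts$Var1, -cut_counts$Freq)
levels(ordered_cut)
# "Ideal" "Premium" "Very Good" "Good" "Fair"
```

ggplot2 places the bars on a discrete axis in the order of the factor levels, which is why the bars appear from the most to the least frequent cut.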

If you have raw data to visualize, you can use the stat = "count" argument in place of stat = "identity". Since stat = "count" is the default for geom_bar(), it does not need to be written out.

ggplot(diamonds,aes(x=cut, fill = cut))+geom_bar()+
  labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(stat="count",aes(label=after_stat(count)),vjust=-0.25,fontface="bold")

# after_stat(count) replaces the ..count.. notation, which was deprecated in ggplot2 3.4.0

Question: What is the distribution of carat?

Histogram/Density Plot/Box Plot

Here, we are interested in the distribution of carat, a numerical variable. A histogram or a density plot is a good tool for exploring the shape of a numerical variable visually. Alternatively, you can use a Q-Q plot or a box plot.

class(diamonds$carat)
## [1] "numeric"

geom_histogram() is the geometry function for a histogram.

ggplot(diamonds,aes(x=carat))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(diamonds,aes(x=carat))+geom_histogram(fill="darkred")+labs(title="Histogram of Carat",y="Count",x="Carat")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
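
The message above appears because no bin width was chosen. A minimal sketch of setting it explicitly follows; the value 0.1 is only an assumption for illustration, not a recommendation for this data:

```r
library(ggplot2)

# binwidth is given in the units of the x variable (carat);
# 0.1 is an illustrative choice, other values may suit the data better
p_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1, fill = "darkred", color = "white") +
  labs(title = "Histogram of Carat", y = "Count", x = "Carat")

p_hist
```

Trying a few bin widths is usually worthwhile, because features such as multiple peaks can appear or vanish depending on this choice.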

For a density plot, you can use geom_density().

ggplot(diamonds,aes(x=carat))+geom_density()+labs(title="Density Plot of Carat",y="Density",x="Carat")

The plots show that carat has a right-skewed, multimodal distribution.

ggplot(diamonds,aes(x=carat))+geom_histogram(fill="darkred",aes(y=after_stat(density)))+labs(title="Histogram and Density Plot of Carat",y="Density",x="Carat")+geom_density(col="orange")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# aes(y = after_stat(density)) is required so that the histogram is on the same scale as the density curve
# after_stat(density) replaces stat(density), which was deprecated in ggplot2 3.4.0

Combining multiple plots into single window

The grid.arrange() function from the gridExtra package combines your ggplot2 objects into a single window. To do so, you have to assign each plot to an object.

g1<-ggplot(diamonds,aes(x=carat))+geom_boxplot()+labs(title="g1") # horizontal box plot
g2<-ggplot(diamonds,aes(x=factor(1),y=carat))+geom_boxplot()+labs(title="g2") # factor(1) gives a vertical box plot of a single variable
library(gridExtra)
grid.arrange(g1,g2,ncol=2)

A box plot can be used to observe the distribution of a continuous variable, but it is not particularly successful in identifying multimodal distributions. It is, however, effective in detecting extreme outliers. A vertical box plot in ggplot2 is usually drawn with two variables, a categorical one on the x-axis and a continuous one on the y-axis. If you want a vertical box plot for a single variable, you can assign a constant factor such as factor(1) to the x-axis.

ggplot(diamonds,aes(x=factor(1),y=carat))+geom_boxplot(fill="darkred")+labs(title="Box Plot of Carat")

Question: What is the association between carat and price?

Scatter Plot

When you are interested in the association between two continuous variables, a scatter plot is useful. geom_point() is the corresponding geometry function.

ggplot(diamonds,aes(x=carat,y=price))+geom_point()+labs(title = "The relationship between Carat and Price")

ggplot(diamonds,aes(x=carat,y=price))+geom_point(col="darkred")+labs(title = "The relationship between Carat and Price")

Adding trend line

If you want to add a trend line representing the underlying relationship between two continuous variables of interest, you can use geom_smooth(). By default, geom_smooth() chooses the smoothing method from the size of the data: loess (local regression) for fewer than 1,000 observations and a generalized additive model (GAM) otherwise, which is why the message below reports method = 'gam' for the diamonds data. You can use the method argument to fit the trend line with another modeling technique.

For example, specifying method = "lm" within geom_smooth() fits a straight regression line instead of a smooth curve, and method = "loess" requests a local regression. Polynomial regression is also possible by combining method = "lm" with a formula such as y ~ poly(x, 2). Which one to use depends on the nature of your data and the relationship you want to capture.

ggplot(diamonds,aes(x=carat,y=price))+geom_point(col="darkred")+labs(title = "The relationship between Carat and Price")+geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Price and carat have a strong positive relationship, but some outliers deviate from it.
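
As a sketch of the method argument, the following version requests a straight regression line. The transparency value and the choice to hide the confidence band are assumptions made only to keep the picture readable:

```r
library(ggplot2)

# method = "lm" fits a linear regression of price on carat;
# se = FALSE hides the confidence band around the line
p_lm <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(col = "darkred", alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "The relationship between Carat and Price")

p_lm
```

Comparing this line with the smooth curve above is a quick way to judge how far the relationship departs from linearity.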

More and More

Question: What is the association between carat and price by cut type?

Facet in ggplot

Faceting is a technique that divides the chart window into multiple smaller sections, forming a grid-like layout where each section displays a similar chart. Typically, each section represents a specific group or subset of the dataset, showcasing the same type of graph. This approach, often referred to as small multiples, provides a way to compare and analyze different segments of the data in a concise and structured manner.

ggplot(diamonds,aes(x=carat,y=price))+geom_point()+facet_wrap(.~cut)

# facet_wrap() expects a formula; .~cut and ~cut give the same result
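
facet_grid() works with the same kind of formula but arranges the panels strictly in rows and columns. A minimal sketch that stacks one panel per cut type in rows, with an arbitrary transparency value:

```r
library(ggplot2)

# cut ~ . puts each level of cut in its own row and uses a single column
p_grid <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  facet_grid(cut ~ .)

p_grid
```

Writing the formula as . ~ cut instead would place the panels side by side in columns.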

Question: What is the association between carat and price by depth and cut?

Here, we would like to represent three numerical variables and one categorical variable visually at the same time.

Bubble Plot: A bubble plot is a scatter plot with a third numeric variable mapped to circle size. It also enables us to include one categorical variable as a fourth one.

To explain the concept, I will use a random sample of the data, which prevents overplotting.

set.seed(123)
s = diamonds[sample(1:53940,50),]

You can add the third variable, which is numerical, via the size argument, and the fourth variable, which is categorical, via the col argument.

ggplot(s,aes(x=carat,y=price,size=depth,col=cut))+geom_point()+labs(title="Association between Carat and Price by Cut and Depth")

The plot suggests that as depth and cut quality increase, we can expect higher carat and price.
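
When the bubbles are hard to tell apart, the size range and the transparency can be adjusted. The sketch below assumes a size range of 1 to 8 and an alpha of 0.6, both chosen only for illustration:

```r
library(ggplot2)

set.seed(123)
s <- diamonds[sample(nrow(diamonds), 50), ]

# scale_size() controls the smallest and largest bubble;
# alpha makes overlapping bubbles partly transparent
p_bubble <- ggplot(s, aes(x = carat, y = price, size = depth, col = cut)) +
  geom_point(alpha = 0.6) +
  scale_size(range = c(1, 8), name = "Depth") +
  labs(title = "Association between Carat and Price by Cut and Depth")

p_bubble
```

A wider size range exaggerates the differences in depth, so it is worth checking the legend before reading too much into the bubble sizes.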

Question: How is depth distributed by clarity?

ggplot(diamonds,aes(x=clarity,y=depth))+geom_point()+labs(title="Relationship between Clarity and Depth")

Such a plot suffers from overplotting, which makes interpretation harder. It is a common problem in data sets with a large number of observations (this data set has 53,940 observations).

There are several suggested solutions for this problem, and one of them is jittering.

Jitter Plot: Random noise is added to the location of each point to reduce overplotting.

ggplot(diamonds,aes(x=clarity,y=depth))+geom_point(position="jitter")+labs(title="Relationship between Clarity and Depth")

# position = "jitter" adds the random noise to the points
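
The amount of noise can also be controlled with geom_jitter(), a shortcut for geom_point(position = "jitter"). In the sketch below, the width of 0.2 and the alpha of 0.1 are assumptions chosen for illustration:

```r
library(ggplot2)

# width spreads the points horizontally within each clarity level;
# height = 0 leaves the depth values exactly as they are
p_jit <- ggplot(diamonds, aes(x = clarity, y = depth)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.1) +
  labs(title = "Relationship between Clarity and Depth")

p_jit
```

Restricting the noise to the categorical axis keeps the numerical values readable, since only the position within each category is randomized.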

To see the effect of jittering more clearly, let us use the sample data set. Using a sample of the data is itself one of the solutions to overplotting.

ggplot(s,aes(x=clarity,y=depth))+geom_point()+labs(title="Relationship between Clarity and Depth for Sample Data Set")

ggplot(s,aes(x=clarity,y=depth))+geom_point(position="jitter")+labs(title="Relationship between Clarity and Depth for Sample Data Set")

The second example shows that jittering separates the overlapping points and makes the relationship between depth and clarity more visible.

More and More

Question: How are diamond prices distributed across cut types?

Box Plot

In addition to showcasing a single variable, box plots are also useful for observing the variation of a continuous variable across different levels of a categorical variable. By creating box plots for each level of the categorical variable, we can visually compare the distributions, central tendencies, and spreads of the continuous variable within each category. This allows us to identify potential differences, outliers, and trends across the categories, providing valuable insights into the relationship between the continuous and categorical variables.

ggplot(diamonds, aes(x = cut, y = price, fill = cut)) +
  geom_boxplot() +
  labs(x = "Cut", y = "Price", title = "Price Distribution by Diamond Cut")

Violin Plot: Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values. geom_violin() is the geometric function for the violin plot.

ggplot(diamonds,aes(x=cut,y=price))+geom_violin()

Don’t leave a violin plot alone.

Combine it with a box plot.

ggplot(diamonds,aes(x=cut,y=price,fill=cut))+geom_violin()+geom_boxplot(width=0.15)+labs(title="Distribution of Price by Cut")

# width sets the width of the box plot inside the violin

Price has a right-skewed distribution for every cut type except Good. Also, Ideal cut diamonds are cheaper than the others on average.

Extra

statsExpressions package

This package lets you print the output of a statistical test on the related plot. It can be applied to many types of plots.

For other examples, see the package documentation.

library(ggplot2)
library(ggforce)
library(statsExpressions)
# plot with subtitle
ggplot(diamonds,aes(x=cut,y=price)) +
  geom_violin() +geom_boxplot(width=0.15)+
  labs(
    title = "Fisher's one-way ANOVA",
    subtitle = oneway_anova(diamonds, cut, price, var.equal = TRUE)$expression[[1]]
  )

Consider the first research question again and answer it with a different visual.

What is the frequency distribution of cut?

Lollipop Chart: A lollipop plot is basically a bar plot in which each bar is replaced by a line and a dot. It shows the relationship between a numeric and a categorical variable. A lollipop is built using geom_point() for the circle and geom_segment() for the stem.

t = table(diamonds$cut)
df = data.frame(t)
df
##        Var1  Freq
## 1      Fair  1610
## 2      Good  4906
## 3 Very Good 12082
## 4   Premium 13791
## 5     Ideal 21551
ggplot(df,aes(x=Var1,y=Freq)) +geom_point() + geom_segment(aes(x=Var1, xend=Var1, y=0, yend=Freq))

ggplot(df,aes(x=Var1,y=Freq)) +geom_point(size=5, color="darkred", fill="yellow", alpha=0.7, shape=21, stroke=2) +geom_segment(aes(x=Var1, xend=Var1, y=0, yend=Freq))+labs(title="Lollipop Plot of Cut",x="Cut Types",y="Frequency")

Question: What is the range of depth within each cut type?

Dumbbell Plot: A dumbbell plot, a.k.a. dumbbell chart, is great for displaying changes between two points in time, two conditions, or differences between two groups.

Before drawing this plot, your data must be brought into a suitable shape.

Data Manipulation

# minimum and maximum depth for each cut type
min_depth = aggregate(depth~cut,data=diamonds,min)
max_depth = aggregate(depth~cut,data=diamonds,max)
dumbell_data=cbind(min_depth,max_depth)
dumbell_data
##         cut depth       cut depth
## 1      Fair  43.0      Fair  79.0
## 2      Good  54.3      Good  67.0
## 3 Very Good  56.8 Very Good  64.9
## 4   Premium  58.0   Premium  63.0
## 5     Ideal  43.0     Ideal  66.7

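This is not enough yet. The combined table has a duplicated cut column and two columns named depth, so drop the third column and rename the rest.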

dumbell_data = dumbell_data[,-3]
colnames(dumbell_data) = c("cut","min","max")
dumbell_data
##         cut  min  max
## 1      Fair 43.0 79.0
## 2      Good 54.3 67.0
## 3 Very Good 56.8 64.9
## 4   Premium 58.0 63.0
## 5     Ideal 43.0 66.7
library(ggalt)
ggplot(dumbell_data, aes(y=cut, x=min, xend=max)) + 
  geom_dumbbell(size=3, color="gray80", 
                colour_x = "gold1", colour_xend = "darkred",
                dot_guide=TRUE, dot_guide_size=0.1)
## Warning: Using the `size` aesthetic with geom_segment was deprecated in ggplot2 3.4.0.
## i Please use the `linewidth` aesthetic instead.
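
If ggalt is not available, a similar picture can be sketched with ggplot2 alone. The example below is an alternative under stated assumptions, not the ggalt implementation: it builds the minimum and maximum table in one step with dplyr, and the name depth_range and the styling values are only illustrative. The linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)
library(dplyr)

# One row per cut type with the smallest and largest depth
depth_range <- diamonds %>%
  group_by(cut) %>%
  summarise(min = min(depth), max = max(depth))

# A segment joins the two ends; two point layers draw the end dots
p_db <- ggplot(depth_range, aes(y = cut)) +
  geom_segment(aes(x = min, xend = max, yend = cut),
               color = "gray80", linewidth = 1.5) +
  geom_point(aes(x = min), color = "gold1", size = 3) +
  geom_point(aes(x = max), color = "darkred", size = 3) +
  labs(x = "Depth", y = "Cut", title = "Range of Depth by Cut")

p_db
```

Because the table already has distinct min and max columns, no column dropping or renaming is needed afterwards.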

Fair diamonds have the widest range of depth, while Premium diamonds have the most consistent depth values.

Line Plot: A line plot is a type of graph that displays data points connected by straight lines, typically used to visualize the trend or pattern of a continuous variable over a continuous range. It is effective in showing the relationship between two variables and can reveal trends, fluctuations, or changes in the data over time or across different categories.

Consider the economics dataset, which is also built into ggplot2.

head(economics)
## # A tibble: 6 x 6
##   date         pce    pop psavert uempmed unemploy
##   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
## 1 1967-07-01  507. 198712    12.6     4.5     2944
## 2 1967-08-01  510. 198911    12.6     4.7     2945
## 3 1967-09-01  516. 199113    11.9     4.6     2958
## 4 1967-10-01  512. 199311    12.9     4.9     3143
## 5 1967-11-01  517. 199498    12.8     4.7     3066
## 6 1967-12-01  525. 199657    11.8     4.8     3018

We use geom_line() geometric function to produce a line plot. Note that we use two variables to draw a line plot.

# Create a line plot with the economics dataset
ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Year", y = "Number of Unemployed", title = "Unemployment Trend") 

It is also possible to draw multiple lines on the same plot.

# Create a line plot with two lines using the economics dataset
ggplot(data = economics, aes(x = date)) +
  geom_line(aes(y = unemploy, color = "Unemployed")) +
  geom_line(aes(y = pop, color = "Population")) +
  labs(x = "Year", y = "Count", title = "Unemployment and Population Trends")
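
Because pop is far larger than unemploy, the unemployment line looks almost flat when both share one axis. One possible workaround, sketched below, uses the economics_long dataset that ships with ggplot2, where the value01 column holds each series rescaled to the range 0 to 1. The name two_series is only illustrative:

```r
library(ggplot2)

# Keep the two series of interest from the long-format data
two_series <- subset(economics_long, variable %in% c("unemploy", "pop"))

# value01 is each series rescaled to [0, 1], so both lines are comparable in shape
p_lines <- ggplot(two_series, aes(x = date, y = value01, color = variable)) +
  geom_line() +
  labs(x = "Year", y = "Rescaled value (0 to 1)",
       title = "Unemployment and Population Trends")

p_lines
```

The rescaled axis no longer shows counts, so this version is suited to comparing the shape of the trends rather than their levels.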

Appearance

The ggplot2 package provides several functions that improve the appearance of a plot, so that the visual has the properties of an ideal graph.

The appearance of ggplot objects can be improved using the themes in the ggplot2 package and in other theme packages such as ggthemes and bbplot.

We will consider two scenarios. In the first one, our purpose is just to illustrate the frequency of each diamond cut type. In the second one, we will emphasize the number of very good cut diamonds.

Scenario 1

ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+theme_bw()

library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()

You can also use the theme() function for customization.

library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
  theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10))

In an ideal bar plot, the bars are ordered from the highest to the lowest.

library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
  theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10))

When you put labels on top of the bars, you do not need the y-axis labels.

library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
  theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10),axis.text.y = element_blank())

Then, put the legend on the top left of the plot.

library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
  labs(title="The Bar Plot of Cut",y="Freq",x="Level",subtitle ="diamonds data set is used.")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
  theme(plot.title = element_text(size=18),plot.subtitle = element_text(size=10),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=9),axis.text.y = element_blank(),legend.position='top', 
        legend.justification='left',
        legend.direction='horizontal')

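In the second scenario, our purpose is to emphasize the very good cut diamonds, so we apply the Gestalt principles: the bar of interest is drawn in a distinct color and the other bars are grayed out.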

library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
  labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkgrey","darkgrey","darkred","darkgrey","darkgrey"))+geom_text(aes(label=Freq),hjust=-0.25,fontface="bold")+bbc_style()+coord_flip()+
  theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10),axis.text.x = element_blank(),legend.position='none')
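
Since the same theme() settings are repeated in several plots above, they can be stored once and reused. The sketch below uses theme_bw() from ggplot2 as the base so that it runs without bbplot; the names my_theme and cut_counts and the bar color are only illustrative:

```r
library(ggplot2)

cut_counts <- data.frame(table(diamonds$cut))

# A theme is an ordinary object: combine a complete theme with
# your own adjustments and add the result to any plot
my_theme <- theme_bw() +
  theme(plot.title = element_text(size = 18),
        axis.text = element_text(size = 12, face = "bold"),
        legend.position = "none")

p_themed <- ggplot(cut_counts, aes(x = reorder(Var1, -Freq), y = Freq)) +
  geom_col(width = 0.5, fill = "steelblue") +
  labs(title = "Bar Plot of Cut", y = "Freq", x = "Level") +
  my_theme

p_themed
```

Keeping the styling in one object makes it easier to give every plot in a report the same look.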

You can find more details in my lecture notes on data visualization.

Drawing a Map in R

Mapping with Leaflet

Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. The leaflet R package lets you create these maps right from the R console or RStudio, with features such as interactive panning and zooming, map tiles, markers, polygons, lines, popups, and GeoJSON. It can also render spatial objects from the sp or sf packages or data frames with latitude/longitude columns, use map bounds and mouse events to drive Shiny logic, and display maps in non-spherical Mercator projections. (Leaflet Introduction)

Please install the leaflet package to use the functions it provides.

install.packages("leaflet")
library(leaflet)

Basic Usage

You create a Leaflet map with these basic steps:

  1. Create a map widget by calling leaflet().
  2. Add layers to the map by using layer functions (e.g. addTiles, addMarkers, addPolygons) to modify the map widget.
  3. Print the map widget to display it.

Here is a basic example:

library(leaflet)
leaflet()
leaflet()%>%addTiles() # Add default OpenStreetMap map tiles
leaflet()%>%addTiles() %>%addMarkers(lng=174.768, lat=-36.852, popup="The birthplace of R")
leaflet()%>%setView(lng=174.768, lat=-36.852, zoom=20)%>%addTiles() 
# setView() sets the center of the map view and the zoom level.

Adding circles to your map

Let’s consider data taken from the Istanbul Municipality Open Data portal. The data show the locations of the fire stations in the city.

# read the fire station list; the Koordinat column stores "lat,long" as text
itfaiye <- openxlsx::read.xlsx("https://github.com/ozancanozdemir/ozancanozdemir.github.io/raw/master/istanbul_iftaiye.xlsx")
# split each coordinate pair at the comma and convert the pieces to numbers
sep_val <-as.numeric(unlist(strsplit(itfaiye$Koordinat,"[,]")))
# the values alternate lat, long, lat, long, ..., so fill a two-column matrix row by row
info<-data.frame(matrix(sep_val,ncol = 2,byrow = TRUE))
colnames(info) <- c("lat","long")
# keep the station name and district, then attach the coordinates
itfaiye_bilgi<-cbind(itfaiye[,1:2],info)
head(itfaiye_bilgi)
##                                         İstasyon.Adı Bulunduğu.İlçe      lat
## 1                           Adalar İtfaiye İstasyonu         ADALAR 40.87173
## 2        Akpınar Mahallesi Gönüllü İtfaiye İstasyonu     EYÜPSULTAN 41.27822
## 3                      Akşemsettin İtfaiye İstasyonu  GAZİOSMANPAŞA 41.09198
## 4        Alacalı Mahallesi Gönüllü İtfaiye İstasyonu           ŞİLE 41.18316
## 5                        Alibeyköy İtfaiye İstasyonu     EYÜPSULTAN 41.07984
## 6 Anadolu Kavağı Mahallesi Gönüllü İtfaiye İstasyonu         BEYKOZ 41.17439
##       long
## 1 29.13763
## 2 28.80994
## 3 28.91740
## 4 29.45651
## 5 28.93722
## 6 29.08878
# add some circles to a map
leaflet(itfaiye_bilgi) %>% addCircles()
## Assuming "long" and "lat" are longitude and latitude, respectively

This map is not meaningful without providing tiles.

leaflet(itfaiye_bilgi)%>%addTiles()%>%addCircles(lng = ~long, lat = ~lat)

You can change your tile provider.

leaflet(itfaiye_bilgi)%>%addProviderTiles("Esri")%>%addCircles(lng = ~long, lat = ~lat)
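
Circles can also carry a popup that opens on click. The sketch below is self-contained and therefore uses a small hypothetical data frame named places, built from the coordinates in the basic example above, instead of the fire station data. The radius value is an arbitrary choice:

```r
library(leaflet)

# Hypothetical one-row data frame, used only to keep the example runnable offline
places <- data.frame(name = "The birthplace of R",
                     lng = 174.768, lat = -36.852)

# The formula notation (~column) looks the columns up in the data
# passed to leaflet(); popup sets the text shown on click
m <- leaflet(places) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~name, radius = 8)

m
```

With the fire station data, the same pattern would use the station name column for the popup, assuming the column names shown in the output above.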

You can look at my tutorials if you are interested in drawing a map with ggplot2 and leaflet in R.

Drawing Turkey map with ggplot

Drawing Izmir map with leaflet