ggplot2
Data visualization is an essential tool in the realm of information analysis and presentation. It involves the graphical representation of data and statistics to uncover meaningful insights and communicate complex ideas in a clear and concise manner. As the volume and complexity of data continue to grow exponentially, data visualization plays a crucial role in transforming raw numbers into visual narratives that are easily digestible and comprehensible. By leveraging charts, graphs, maps, and interactive visual elements, data visualization allows individuals and organizations to explore patterns, identify trends, detect anomalies, and make data-driven decisions. Its importance lies in its ability to facilitate understanding, reveal patterns and correlations, highlight outliers, and support storytelling, enabling stakeholders to derive actionable insights from vast amounts of information.
Data visualization has a rich history that dates back centuries, evolving alongside humanity’s quest for understanding and representing information visually. Early examples of data visualization can be traced back to ancient civilizations, where visual symbols and hieroglyphics were used to convey important information and record historical events. In the 18th and 19th centuries, pioneers like William Playfair and Florence Nightingale revolutionized the field by introducing graphical methods to illustrate statistical data. Playfair’s invention of line, bar, and pie charts provided a visual framework for comparing quantities, while Nightingale’s famous coxcomb diagram effectively communicated the impact of preventable diseases on soldiers during the Crimean War. The advent of computer technology in the 20th century brought about a new era of data visualization, with advancements in software and hardware enabling the creation of interactive and dynamic visual representations. Today, with the explosion of big data and the need for data-driven decision-making, data visualization has become an indispensable tool in various domains, including business, science, journalism, and public policy. Its historical significance lies in its ability to transform complex data into intuitive visuals, empowering individuals and organizations to gain insights, communicate effectively, and make informed decisions in an increasingly data-driven world.
ggplot2
The ggplot2 package, developed by Hadley Wickham, is a plotting system for R based on the grammar of graphics. It tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (such as drawing legends) and provides a powerful model of graphics that makes it easy to produce complex multi-layered plots.
To get more information about this package, you can visit http://ggplot2.org/
There are also many videos, books, and web pages about this package. One of them is my book. :)
An Introduction to ggplot2, Ozancan Ozdemir, 2022
ggplot2, often shortened to ggplot, implements the “Grammar of Graphics”, which is based on the principle that a plot can be split into the following basic parts:
Plot = data + Aesthetics + Geometry
Data refers to the information you want to visualize.
Aesthetics are the specific variables you use in the drawing, i.e. the x and y variables. They also tell R how the data are displayed in a plot, e.g. the color, size, and shape of points, transparency, etc. All aesthetics for a plot are specified in the aes() function call.
Geometry refers to the type of graphic (bar chart, histogram, box plot, line plot, density plot, dot plot, etc.). To see the list of geometric functions, please visit https://ggplot2.tidyverse.org/reference/
Here, you can see some functions from the list.
geom_point() = Scatter Plot
geom_bar() = Bar Plot
geom_line() = Line Plot
geom_histogram() = Histogram
geom_boxplot() = Box Plot
geom_density() = Density Plot
For example:
library(ggplot2)
ggplot(data,aes(x=x,y=y))+geom_point()
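As a minimal sketch of the data + aesthetics + geometry idea, the snippet below fills in the template above with R's built-in mtcars data set. The choice of mtcars and of the wt and mpg columns is only an assumption for illustration; any data frame with two numeric columns would do.

```r
library(ggplot2)

# data:       the built-in mtcars data frame (illustrative choice)
# aesthetics: car weight on the x axis, fuel consumption on the y axis
# geometry:   points, i.e. a scatter plot
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Printing the object draws the plot
print(p)
```

Because the plot is stored in an object, further layers can be added later with +.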
In addition to those functions, we use the following arguments or functions for our plots.
Functions
geom_text() = Add label or number on your plot
coord_flip() = Rotate your plot
theme() = Arrange the theme of your plot, e.g size of axis names etc.
facet_wrap() & facet_grid() = Draw separate panels for subsets of your data
scale_color_manual() = Change the color of your plot manually.
labs() = Set title, axis name etc.
Arguments
col = Change color of your plot by third variable (in aesthetics part)
group= Divide your data into third group (in aesthetics part)
color = Change frame of your box / bar or bin (in geom part)
fill = Fill your box / bar or bin (in geom part)
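The difference between the "aesthetics part" and the "geom part" can be sketched as follows. Inside aes(), a color is mapped to a variable; outside aes(), it is set to one fixed value. The mtcars data set and the cyl column are assumptions used only for this illustration.

```r
library(ggplot2)

# Mapping: the colour depends on a third variable, so it goes inside aes()
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point()

# Setting: one fixed colour for every point, so it goes in the geom call
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "darkred")
```

The first plot gets a legend with one entry per cylinder count, while the second has no legend because nothing in the data drives the color.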
Important Note
The ggplot2 package works with data.frame and tibble objects.
Why is ggplot2 better?
Excellent themes can be created with a single command.
Its colors are nicer than those of base graphics.
It is easy to visualize data with multiple variables.
It provides a platform to create simple graphs that convey a plethora of information.
(Ozdemir, O, Lab Notes, 2019)
In this tutorial, we will use the diamonds dataset, which is built into ggplot2, to illustrate the plot types.
library(ggplot2)
knitr::kable(head(diamonds))
carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
Check the class of the variables in the data.
dplyr::glimpse(diamonds)
## Rows: 53,940
## Columns: 10
## $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.~
## $ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver~
## $ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,~
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, ~
## $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64~
## $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58~
## $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34~
## $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.~
## $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.~
## $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.~
Question: What is the frequency distribution of cut?
Bar Plot
As seen above, cut is a categorical variable. The most appropriate and frequently used graphical tool for a single categorical variable is the bar plot.
There are two ways to create a bar plot in ggplot2.
The first one is to calculate a frequency table manually.
t = table(diamonds$cut)
class(t)
## [1] "table"
t
##
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551
Then, convert your table object into a data frame or tibble.
df = data.frame(t)
df
## Var1 Freq
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
library(ggplot2)
ggplot(df,aes(x=Var1,y=Freq))+geom_bar(stat="identity")
# stat = "identity" is required here because the bar heights are already stored in Freq
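ggplot2 also provides geom_col(), which draws bars whose heights are taken directly from the y values. The sketch below rebuilds the frequency table by hand from the output shown above, so that it runs on its own, and draws the same bar plot without the stat argument.

```r
library(ggplot2)

# Frequency table of cut, copied from the output above
df <- data.frame(
  Var1 = factor(c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")),
  Freq = c(1610, 4906, 12082, 13791, 21551)
)

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity")
p_col <- ggplot(df, aes(x = Var1, y = Freq)) +
  geom_col()
```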
The labs() function helps you add a title and axis names to your plot.
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
labs(title="Bar Plot of CUT",y="Freq",x="Level")
geom_text() helps you add numbers or text to your plot. You can also use annotate().
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),fontface="bold")
# the label aesthetic is required by geom_text()
To adjust the position of the text, you can use the vjust or hjust arguments.
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity")+
labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")
# the label aesthetic is required by geom_text()
In an ideal bar graph, values should be arranged in descending order, from largest to smallest.
Ordering x axis
Use the reorder() function on the x-axis variable.
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity")+
labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")
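To see what reorder() does on its own, the following sketch applies it to the frequency table outside of ggplot. The table is rebuilt by hand from the output above so that the snippet is self-contained; the negative sign sorts the levels from the largest frequency to the smallest.

```r
# Frequency table of cut, copied from the output above
df <- data.frame(
  Var1 = factor(c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")),
  Freq = c(1610, 4906, 12082, 13791, 21551)
)

# reorder() sorts the factor levels by increasing value of its second argument,
# so -Freq puts the most frequent level first
ordered_cut <- reorder(df$Var1, -df$Freq)
levels(ordered_cut)
# "Ideal" "Premium" "Very Good" "Good" "Fair"
```

ggplot2 draws a discrete axis in the order of the factor levels, which is why the bars come out sorted.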
If you have raw data to visualize, you can use stat = "count" in place of stat = "identity". This is the default for geom_bar(), so it does not need to be written out.
ggplot(diamonds,aes(x=cut, fill = cut))+geom_bar()+
labs(title="Bar Plot of CUT",y="Freq",x="Level")+geom_text(stat="count",aes(label=after_stat(count)),vjust=-0.25,fontface="bold")
# after_stat(count) replaces the ..count.. notation, which was deprecated in ggplot2 3.4.0
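As a quick check that both approaches give the same bars, the sketch below compares the counts computed by geom_bar() with the manual frequency table. It uses layer_data(), a ggplot2 helper that returns the data a layer actually draws.

```r
library(ggplot2)

# Let geom_bar() count the rows of the raw data
p_count <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# Counts computed by the stat versus the manual frequency table
bar_heights   <- layer_data(p_count)$count
manual_counts <- as.vector(table(diamonds$cut))

all(bar_heights == manual_counts)
```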
Question: What is the distribution of carat?
Histogram/Density Plot/Box Plot
As seen from the glimpse() output above, carat is a numerical variable, and we are interested in its distribution. A histogram or a density plot is the best tool to explore the shape of a numerical variable visually. Alternatively, you can also use a Q-Q plot or a box plot.
class(diamonds$carat)
## [1] "numeric"
geom_histogram() is the geometry function for a histogram.
ggplot(diamonds,aes(x=carat))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(diamonds,aes(x=carat))+geom_histogram(fill="darkred")+labs(title="Histogram of Carat",y="Count",x="Carat")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
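The message above suggests choosing the bin width yourself. A minimal sketch, assuming a bin width of 0.1 carat (an arbitrary choice for illustration, not a recommendation):

```r
library(ggplot2)

# binwidth sets the width of each bin in data units (carats);
# boundary = 0 makes the bin edges fall on multiples of the bin width
p_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1, boundary = 0,
                 fill = "darkred", color = "white") +
  labs(title = "Histogram of Carat", y = "Count", x = "Carat")
```

With an explicit binwidth the message is no longer printed, and narrower bins make the separate peaks in carat easier to see.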
For a density plot, you can use geom_density().
ggplot(diamonds,aes(x=carat))+geom_density()+labs(title="Density Plot of Carat",y="Density",x="Carat")
The plot shows that carat has a right-skewed, multimodal distribution.
ggplot(diamonds,aes(x=carat))+geom_histogram(fill="darkred",aes(y=after_stat(density)))+labs(title="Histogram and Density Plot of Carat",y="Density",x="Carat")+geom_density(col="orange")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# aes(y = after_stat(density)) puts the histogram on the density scale so that both layers share the y axis
# after_stat(density) replaces stat(density), which was deprecated in ggplot2 3.4.0
Combining multiple plots into single window
The grid.arrange() function from the gridExtra package combines your ggplot2 objects into a single window. To do this, you have to assign an object name to each plot.
g1<-ggplot(diamonds,aes(x=carat))+geom_boxplot()+labs(title="g1")
g2<-ggplot(diamonds,aes(x=factor(1),y=carat))+geom_boxplot()+labs(title="g2") # factor(1) supplies a single dummy group for the x axis
library(gridExtra)
grid.arrange(g1,g2,ncol=2)
A box plot can be used to observe the distribution of a continuous variable, but it is not particularly successful in identifying multimodal distributions. However, it is effective in detecting extreme outliers. In fact, to create a box plot in ggplot2, you typically need two variables. However, if you want to create a box plot for a single variable, you can assign a factor object to the x-axis.
ggplot(diamonds,aes(x=factor(1),y=carat))+geom_boxplot(fill="darkred")+labs(title="Box Plot of Carat")
Question: What is the association between carat and price?
Scatter Plot
When you are interested in the association between two continuous variables, a scatter plot is useful. geom_point() is the geometric function.
ggplot(diamonds,aes(x=carat,y=price))+geom_point()+labs(title = "The relationship between Carat and Price")
ggplot(diamonds,aes(x=carat,y=price))+geom_point(col="darkred")+labs(title = "The relationship between Carat and Price")
Adding trend line
If you want to add a trend line representing the underlying relationship between two continuous variables of interest, you can use the geom_smooth() function. By default, geom_smooth() chooses the smoothing method from the size of the data: loess (local regression) for fewer than 1,000 observations and a generalized additive model (GAM) otherwise, which is why the message below reports method = 'gam' for the diamonds data. You can use the method argument to fit the trend line with another modeling technique.
For example, method = "lm" fits a straight line by linear regression, while method = "loess" fits a smooth local-regression curve. You can also supply a formula, such as a polynomial, depending on the nature of your data and the relationship you want to capture.
ggplot(diamonds,aes(x=carat,y=price))+geom_point(col="darkred")+labs(title = "The relationship between Carat and Price")+geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Price and carat have a strong positive relationship, but some outliers deviate from it.
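To compare smoothing methods, here is a minimal sketch that fits a straight line and a loess curve to the same points. It uses a random sample of 500 diamonds (the sample size and the seed are assumptions chosen only to keep loess fast) and switches off the confidence band with se = FALSE.

```r
library(ggplot2)

# A random sample keeps the loess fit quick; 500 is an arbitrary size
set.seed(123)
s500 <- diamonds[sample(nrow(diamonds), 500), ]

base_plot <- ggplot(s500, aes(x = carat, y = price)) +
  geom_point(col = "darkred", alpha = 0.4)

# Straight line fitted by linear regression
p_lm <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Smooth curve fitted by local regression
p_loess <- base_plot +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```

Giving the formula explicitly also silences the message that geom_smooth() prints when it picks the method and formula itself.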
More and More
Question: What is the association between carat and price by cut type?
Facet in ggplot
Faceting is a technique that divides the chart window into multiple smaller sections, forming a grid-like layout where each section displays a similar chart. Typically, each section represents a specific group or subset of the dataset, showcasing the same type of graph. This approach, often referred to as small multiples, provides a way to compare and analyze different segments of the data in a concise and structured manner.
ggplot(diamonds,aes(x=carat,y=price))+geom_point()+facet_wrap(.~cut)
# facet_wrap() takes a one-sided formula; ~cut works the same as .~cut
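facet_grid(), listed earlier next to facet_wrap(), lays the panels out by two variables at once. A minimal sketch, assuming we want color in the rows and cut in the columns (an illustrative choice):

```r
library(ggplot2)

# The left side of the formula defines the rows, the right side the columns
p_grid <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  facet_grid(color ~ cut)
```

This gives one panel for each combination of the 7 colors and 5 cuts, so it works best when both variables have only a few levels.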
Question: What is the association between carat and price by depth and cut?
Here, we would like to represent three numerical variables and one categorical variable visually at the same time.
Bubble Plot: A bubble plot is a scatter plot with a third numeric variable mapped to circle size. It also enables us to include one categorical variable as a fourth one.
To explain the concept, I would like to use a sample of the data, which prevents overplotting.
set.seed(123)
s = diamonds[sample(1:53940,50),]
You can add your third variable, the numerical one, via the size argument, and your fourth variable, the categorical one, via the col argument.
ggplot(s,aes(x=carat,y=price,size=depth,col=cut))+geom_point()+labs(title="Association between Carat and Price by Cut and Depth")
The plot suggests that as depth and cut quality increase, we can expect higher carat and price.
Question: How is depth distributed across clarity levels?
ggplot(diamonds,aes(x=clarity,y=depth))+geom_point()+labs(title="Relationship between Clarity and Depth")
Such a plot suffers from overplotting, which makes interpretation harder. It is a common problem in data sets with a large number of observations (this data set has 53,940).
There are several suggested solutions for this problem, and one of them is jittering.
Jitter Plot: Random noise is added to the location of each point to reduce overplotting.
ggplot(diamonds,aes(x=clarity,y=depth))+geom_point(position="jitter")+labs(title="Relationship between Clarity and Depth")
# position = "jitter" adds the random noise
To see the effect of jittering, let us use the sample data set. Using a sample of the data is itself one of the solutions to overplotting.
ggplot(s,aes(x=clarity,y=depth))+geom_point()+labs(title="Relationship between Clarity and Depth for Sample Data Set")
ggplot(s,aes(x=clarity,y=depth))+geom_point(position="jitter")+labs(title="Relationship between Clarity and Depth for Sample Data Set")
The second example shows that jittering makes the polynomial relationship between depth and clarity more visible.
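position = "jitter" moves the points both horizontally and vertically. If you want to keep the depth values exact and spread the points only sideways, position_jitter() lets you control the amount of noise. The width of 0.2 and the seed below are assumptions chosen for illustration.

```r
library(ggplot2)

# Same sample as above
set.seed(123)
s <- diamonds[sample(1:53940, 50), ]

# width: horizontal noise, height = 0: no vertical noise,
# seed: makes the jittered positions reproducible
p_jit <- ggplot(s, aes(x = clarity, y = depth)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  labs(title = "Relationship between Clarity and Depth for Sample Data Set")
```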
Question: How are diamond prices distributed across cut types?
Box Plot
In addition to showcasing a single variable, box plots are also useful for observing the variation of a continuous variable across different levels of a categorical variable. By creating box plots for each level of the categorical variable, we can visually compare the distributions, central tendencies, and spreads of the continuous variable within each category. This allows us to identify potential differences, outliers, and trends across the categories, providing valuable insights into the relationship between the continuous and categorical variables.
ggplot(diamonds, aes(x = cut, y = price, fill = cut)) +
geom_boxplot() +
labs(x = "Cut", y = "Price", title = "Price Distribution by Diamond Cut")
Violin Plot: Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values. geom_violin() is the geometric function for the violin plot.
ggplot(diamonds,aes(x=cut,y=price))+geom_violin()
Do not leave a violin plot on its own; combine it with a box plot.
ggplot(diamonds,aes(x=cut,y=price,fill=cut))+geom_violin()+geom_boxplot(width=0.15)+labs(title="Distribution of Price by Cut")
# width sets the width of the box plot
Price has a right-skewed distribution for every cut except Good. Also, Ideal cut diamonds are cheaper than the others on average.
Extra
statsExpressions package
This package helps you print the results of a statistical test on the related plot. It can be applied to many types of plots.
For other examples, see the package documentation.
library(ggplot2)
library(ggforce)
library(statsExpressions)
# plot with subtitle
ggplot(diamonds,aes(x=cut,y=price)) +
geom_violin() +geom_boxplot(width=0.15)+
labs(
title = "Fisher's one-way ANOVA",
subtitle = oneway_anova(diamonds, cut, price, var.equal = TRUE)$expression[[1]]
)
Consider the first research question again and answer it with a different visual:
What is the frequency distribution of cut?
Lollipop Chart: A lollipop plot is basically a bar plot in which each bar is replaced by a line and a dot. It shows the relationship between a numeric and a categorical variable. A lollipop is built using geom_point() for the circle and geom_segment() for the stem.
t = table(diamonds$cut)
df = data.frame(t)
df
## Var1 Freq
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
ggplot(df,aes(x=Var1,y=Freq)) +geom_point() + geom_segment(aes(x=Var1, xend=Var1, y=0, yend=Freq))
ggplot(df,aes(x=Var1,y=Freq)) +geom_point(size=5, color="darkred", fill="yellow", alpha=0.7, shape=21, stroke=2) +geom_segment(aes(x=Var1, xend=Var1, y=0, yend=Freq))+labs(title="Lollipop Plot of Cut",x="Cut Types",y="Frequency")
Question: What is the range of depth within each cut type?
Dumbbell Plot: A dumbbell plot, a.k.a. dumbbell chart, is great for displaying changes between two points in time, two conditions, or differences between two groups.
Before drawing this plot, your data set must be arranged in a suitable form.
Data Manipulation
# Minimum and maximum depth for each cut
min_depth = aggregate(depth~cut,data=diamonds,min)
max_depth = aggregate(depth~cut,data=diamonds,max)
dumbell_data=cbind(min_depth,max_depth)
dumbell_data
## cut depth cut depth
## 1 Fair 43.0 Fair 79.0
## 2 Good 54.3 Good 67.0
## 3 Very Good 56.8 Very Good 64.9
## 4 Premium 58.0 Premium 63.0
## 5 Ideal 43.0 Ideal 66.7
The combined table repeats the cut column, so drop the duplicate and give the columns clear names.
dumbell_data = dumbell_data[,-3]
colnames(dumbell_data) = c("cut","min","max")
dumbell_data
## cut min max
## 1 Fair 43.0 79.0
## 2 Good 54.3 67.0
## 3 Very Good 56.8 64.9
## 4 Premium 58.0 63.0
## 5 Ideal 43.0 66.7
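The same table can also be built with a single aggregate() call. This is only an alternative sketch, not the approach used above: the function passed to FUN returns both the minimum and the maximum, and the resulting matrix column is then unpacked into ordinary columns. The name depth_range is introduced here for illustration.

```r
library(ggplot2)  # only needed for the diamonds data set

# One pass over the data: FUN returns a named vector with both statistics
rng <- aggregate(depth ~ cut, data = diamonds,
                 FUN = function(v) c(min = min(v), max = max(v)))

# rng$depth is a two-column matrix; data.frame() turns it into min and max columns
depth_range <- data.frame(cut = rng$cut, rng$depth)
depth_range
```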
library(ggalt)
ggplot(dumbell_data, aes(y=cut, x=min, xend=max)) +
geom_dumbbell(size=3, color="gray80",
colour_x = "gold1", colour_xend = "darkred",
dot_guide=TRUE, dot_guide_size=0.1)
## Warning: Using the `size` aesthetic with geom_segment was deprecated in ggplot2 3.4.0.
## i Please use the `linewidth` aesthetic instead.
Fair diamonds have the widest range of depth, while Premium diamonds have the most consistent depth values.
Line Plot: A line plot is a type of graph that displays data points connected by straight lines, typically used to visualize the trend or pattern of a continuous variable over a continuous range. It is effective in showing the relationship between two variables and can reveal trends, fluctuations, or changes in the data over time or across different categories.
Consider the economics data.
head(economics)
## # A tibble: 6 x 6
## date pce pop psavert uempmed unemploy
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1967-07-01 507. 198712 12.6 4.5 2944
## 2 1967-08-01 510. 198911 12.6 4.7 2945
## 3 1967-09-01 516. 199113 11.9 4.6 2958
## 4 1967-10-01 512. 199311 12.9 4.9 3143
## 5 1967-11-01 517. 199498 12.8 4.7 3066
## 6 1967-12-01 525. 199657 11.8 4.8 3018
We use the geom_line() geometric function to produce a line plot. Note that we use two variables to draw a line plot.
# Create a line plot with the economics dataset
ggplot(data = economics, aes(x = date, y = unemploy)) +
geom_line() +
labs(x = "Year", y = "Number of Unemployed", title = "Unemployment Trend")
It is also possible to draw multiple lines on the same plot.
# Create a line plot with two lines using the economics dataset
ggplot(data = economics, aes(x = date)) +
geom_line(aes(y = unemploy, color = "Unemployed")) +
geom_line(aes(y = pop, color = "Population")) +
labs(x = "Year", y = "Count", title = "Unemployment and Population Trends")
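Another way to draw several lines is to stack the series into a long data frame and map color to the series name, so that a single geom_line() call draws every line. The sketch below uses psavert and uempmed from the economics data; picking these two columns is an assumption made for illustration because their values are of a similar size.

```r
library(ggplot2)

# Stack two series into long format: one row per date and series
long <- rbind(
  data.frame(date = economics$date, series = "psavert",
             value = economics$psavert),
  data.frame(date = economics$date, series = "uempmed",
             value = economics$uempmed)
)

# One geom_line() call; color separates the series and builds the legend
p_long <- ggplot(long, aes(x = date, y = value, color = series)) +
  geom_line() +
  labs(x = "Year", y = "Value", title = "Two Series from the economics Data")
```

Adding a third series then only means adding rows to the data, not another layer to the plot.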
The ggplot2 package provides several functions that improve the appearance of a plot, so that the final visual has the properties of an ideal graph.
The appearance of ggplot objects can be improved using the themes in the ggplot2 package and in other theme packages such as ggthemes and bbplot.
We will consider two scenarios. In the first one, our purpose is just to illustrate the frequency of each diamond cut type. In the second one, we will emphasize the number of Very Good cut diamonds.
Scenario 1
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+theme_bw()
library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()
You can also use the theme() function for customization.
library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10))
In an ideal bar plot, the bars are ordered from highest to lowest.
library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10))
When you put labels on top of the bars, you do not need the y-axis text.
library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10),axis.text.y = element_blank())
Then, put the legend on the top left of the plot.
library(bbplot)
ggplot(df,aes(x=reorder(Var1,-Freq),y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
labs(title="The Bar Plot of Cut",y="Freq",x="Level",subtitle ="diamonds data set is used.")+scale_fill_manual(values=c("darkred","gold1","maroon","steelblue","darkblue"))+geom_text(aes(label=Freq),vjust=-0.25,fontface="bold")+bbc_style()+
theme(plot.title = element_text(size=18),plot.subtitle = element_text(size=10),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=9),axis.text.y = element_blank(),legend.position='top',
legend.justification='left',
legend.direction='horizontal')
Scenario 2
If our purpose is to emphasize the Very Good diamond cuts, we can apply a Gestalt principle: draw that bar in a distinct color and grey out the rest.
library(bbplot)
ggplot(df,aes(x=Var1,y=Freq,fill=Var1))+geom_bar(stat="identity",width=0.5)+ylim(0,25000)+
labs(title="Bar Plot of Cut",y="Freq",x="Level")+scale_fill_manual(values=c("darkgrey","darkgrey","darkred","darkgrey","darkgrey"))+geom_text(aes(label=Freq),hjust=-0.25,fontface="bold")+bbc_style()+coord_flip()+
theme(plot.title = element_text(size=18),axis.text = element_text(size=12,face="bold"),legend.text = element_text(size=10),axis.text.x = element_blank(),legend.position='none')
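Instead of listing one color per bar by position, the highlight can also be derived from the data, which keeps working if the bars are reordered. This is a sketch under stated assumptions: the highlight column and the labels "focus" and "other" are names introduced here for illustration, and the frequency table is rebuilt by hand from the earlier output so that the snippet is self-contained.

```r
library(ggplot2)

# Frequency table of cut, copied from the output above
df <- data.frame(
  Var1 = factor(c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")),
  Freq = c(1610, 4906, 12082, 13791, 21551)
)

# Flag the level to emphasize (hypothetical helper column)
df$highlight <- ifelse(df$Var1 == "Very Good", "focus", "other")

# A named vector ties each flag to its color, independent of bar order
p_focus <- ggplot(df, aes(x = Var1, y = Freq, fill = highlight)) +
  geom_col(width = 0.5) +
  scale_fill_manual(values = c(focus = "darkred", other = "darkgrey")) +
  coord_flip() +
  theme(legend.position = "none")
```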
You can see more details from my lecture note for data visualization.
Mapping with Leaflet
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps, and the leaflet package makes it available from R. It provides features such as interactive panning and zooming, map tiles, markers, polygons, lines, popups, and GeoJSON, and it lets you create maps right from the R console or RStudio. It can also render spatial objects from the sp or sf packages or data frames with latitude/longitude columns, use map bounds and mouse events to drive Shiny logic, and display maps in non-spherical Mercator projections. (Leaflet Introduction)
Please install the leaflet package to use the functions it provides.
install.packages("leaflet")
library(leaflet)
Basic Usage
You create a Leaflet map with these basic steps:
library(leaflet)
leaflet()
leaflet()%>%addTiles() # Add default OpenStreetMap map tiles
leaflet()%>%addTiles() %>%addMarkers(lng=174.768, lat=-36.852, popup="The birthplace of R")
leaflet()%>%setView(lng=174.768, lat=-36.852, zoom=20)%>%addTiles()
# setView() sets the center of the map view and the zoom level.
Adding Circles to your map
Let’s consider data taken from the Istanbul Municipality Open Data portal. The data show the locations of the fire stations in the city.
itfaiye <- openxlsx::read.xlsx("https://github.com/ozancanozdemir/ozancanozdemir.github.io/raw/master/istanbul_iftaiye.xlsx")
sep_val <-as.numeric(unlist(strsplit(itfaiye$Koordinat,"[,]")))
# The coordinates alternate between latitude and longitude, so fill a two-column matrix row by row
m = matrix(sep_val,ncol = 2,byrow = TRUE)
info<-data.frame(m)
colnames(info) <- c("lat","long")
itfaiye_bilgi<-cbind(itfaiye[,1:2],info)
head(itfaiye_bilgi)
## İstasyon.Adı Bulunduğu.İlçe lat
## 1 Adalar İtfaiye İstasyonu ADALAR 40.87173
## 2 Akpınar Mahallesi Gönüllü İtfaiye İstasyonu EYÜPSULTAN 41.27822
## 3 Akşemsettin İtfaiye İstasyonu GAZİOSMANPAŞA 41.09198
## 4 Alacalı Mahallesi Gönüllü İtfaiye İstasyonu ŞİLE 41.18316
## 5 Alibeyköy İtfaiye İstasyonu EYÜPSULTAN 41.07984
## 6 Anadolu Kavağı Mahallesi Gönüllü İtfaiye İstasyonu BEYKOZ 41.17439
## long
## 1 29.13763
## 2 28.80994
## 3 28.91740
## 4 29.45651
## 5 28.93722
## 6 29.08878
# add some circles to a map
leaflet(itfaiye_bilgi) %>% addCircles()
## Assuming "long" and "lat" are longitude and latitude, respectively
This map is not meaningful without providing tiles.
leaflet(itfaiye_bilgi)%>%addTiles()%>%addCircles(lng = ~long, lat = ~lat)
You can change your tile provider.
leaflet(itfaiye_bilgi)%>%addProviderTiles("Esri")%>%addCircles(lng = ~long, lat = ~lat)
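Circles can also carry a popup, like the marker in the basic example. The sketch below uses addCircleMarkers() with the first three stations copied from the head() output above, so that it runs without downloading the file. The short station names, the stations object, and the radius are assumptions introduced for illustration.

```r
library(leaflet)

# First three stations, with coordinates copied from the head() output above
stations <- data.frame(
  name = c("Adalar", "Akpınar", "Akşemsettin"),
  lat  = c(40.87173, 41.27822, 41.09198),
  long = c(29.13763, 28.80994, 28.91740)
)

# The formula interface (~column) looks the columns up in the data frame;
# popup text appears when a circle is clicked
m <- leaflet(stations) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~long, lat = ~lat, popup = ~name, radius = 6)

# Type m at the console to display the map
```

Unlike addCircles(), whose radius is given in meters, addCircleMarkers() uses a radius in pixels, so the markers stay the same size at every zoom level.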
You can look at my tutorials if you are interested in drawing a map with ggplot2 and leaflet in R.