11  Data visualization with ggplot2

In this textbook, the ggplot2 package is the preferred tool for creating plots. With this tool and its extensions, R users can produce elegant, sophisticated, and professional-looking types of graphs that communicate results efficiently to the desired audience.

When we have finished this chapter, we should be able to:

Learning objectives
  • Use ggplot2 to generate publication-ready graphics.
  • Understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring by groups.
  • Understand how to save plots in a variety of formats.
  • Find extensions for ggplot2 to produce custom graphics.

 

11.1 Introduction to ggplot2

The main idea of ggplot2 is that any plot can be made up of the following principal components:

  • data and coordinate system
  • geometric objects such as points, bars, lines.
  • aesthetic mappings that describe how variables are mapped to visual properties or aesthetics (e.g., color, size, shape) of the graph.
  • themes that style all the visual elements which are not part of data.

The key to understanding ggplot2 is thinking about a figure in multiple layers.

Figure 11.1: The ggplot figure is built layer by layer by adding new graphical elements.
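
As a minimal sketch of this layered idea, the code below builds a plot one component at a time. It uses the built-in mtcars dataset purely as a stand-in (an assumption made here so that the example runs on its own; it is not the COVID-19 data used in the rest of the chapter).

```r
library(ggplot2)

# Data and aesthetic mappings (mtcars is a stand-in dataset)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# A geometric object: one layer of points, colored by a grouping variable
p <- p + geom_point(aes(color = factor(cyl)))

# Coordinate system (Cartesian is the default; stated explicitly here)
p <- p + coord_cartesian()

# Theme: styles the visual elements that are not part of the data
p <- p + theme_minimal()

p
```

Each `+` adds one component to the same plot object, so we can stop after any step and print the intermediate result.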

11.2 Covid-19 data

In this chapter, we will graphically explore the association between a country’s wealth and COVID-19 cases. However, there are more variables that may be associated with both wealth and COVID-19 cases, such as the testing rate. For example, wealthier countries may have a national program to distribute tests for the virus, provide advice on how to apply a self-test, and report the results to a national organization. Without the resources of wealthy countries to buy and distribute tests, a lack of reported cases in developing countries could indicate a scarcity of testing. In this case, using diagrams to depict multivariable associations may be helpful.

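library(readr)
library(dplyr)   # provides glimpse(), mutate(), and filter()
library(here)    # builds file paths relative to the project root
covid_data <- read_csv(here("data", "covid_data.csv"))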

Let’s have a look at the types of variables:

glimpse(covid_data)
Rows: 132,236
Columns: 12
$ iso3c           <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW"…
$ country         <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", …
$ date            <chr> "3/13/2020", "3/14/2020", "3/15/2020", "3/16/2020", "3…
$ confirmed       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ deaths          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ total_tests     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ region          <chr> "Latin America & Caribbean", "Latin America & Caribbea…
$ income          <chr> "High income", "High income", "High income", "High inc…
$ population      <dbl> 106766, 106766, 106766, 106766, 106766, 106766, 106766…
$ pop_density     <dbl> 593.1, 593.1, 593.1, 593.1, 593.1, 593.1, 593.1, 593.1…
$ life_expectancy <dbl> 76.3, 76.3, 76.3, 76.3, 76.3, 76.3, 76.3, 76.3, 76.3, …
$ gdp_capita      <dbl> 26631.5, 26631.5, 26631.5, 26631.5, 26631.5, 26631.5, …

The data frame contains 132236 rows and 12 variables that are described as follows:

  • iso3c: ISO3c country code as defined by ISO 3166-1 alpha-3

  • country: Country name

  • date: Calendar date

  • confirmed: Confirmed Covid-19 cases as reported by JHU CSSE¹ (accumulated)

  • deaths: Covid-19-related deaths as reported by JHU CSSE (accumulated)

  • total_tests: Accumulated test counts as reported by Our World in Data

  • region: Country region as classified by the World Bank (time-stable): East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa.

  • income: Country income group as classified by the World Bank (time-stable)

  • population: Country population as reported by the World Bank (original identifier ‘SP.POP.TOTL’, time-stable)

  • pop_density: Country population density as reported by the World Bank (original identifier ‘EN.POP.DNST’, time-stable)

  • life_expectancy: Average life expectancy at birth of country citizens in years as reported by the World Bank (original identifier ‘SP.DYN.LE00.IN’, time-stable)

  • gdp_capita: Country gross domestic product (GDP) per capita, measured in 2010 US dollars as reported by the World Bank (original identifier ‘NY.GDP.PCAP.KD’, time-stable)

¹ JHU CSSE: Johns Hopkins University Center for Systems Science and Engineering (Johns Hopkins Coronavirus Resource Center)

11.2.1 Data preparation for the plots

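The data cover the period from 1/1/2020 to 9/9/2021. Suppose we are interested in countries with a population of more than 1 million and in the situation up to June 12, 2021. Because the confirmed cases and the test counts are accumulated, the rows for that single date already hold the totals up to then. We also want to calculate the cases per 100,000 inhabitants and the tests per capita: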

library(lubridate)  # provides mdy()

dat <- covid_data |>
  mutate(date = mdy(date),
         region = factor(region),
         income = factor(income),
         cases_per_100k = round(confirmed / population * 100000, digits = 1),
         tests_per_capita = round(total_tests / population, digits = 2)) |>
  filter(date == "2021-06-12", population > 1000000)
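
To see what this preparation step does, here is a minimal sketch on a tiny, made-up data frame. The object toy and its values are hypothetical; only the column names follow covid_data.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data with the same column names as covid_data
toy <- tibble(
  country    = c("A", "A", "B"),
  date       = c("6/11/2021", "6/12/2021", "6/12/2021"),
  confirmed  = c(90, 100, 5),          # accumulated counts
  population = c(2e6, 2e6, 5e5)
)

toy_prepared <- toy |>
  mutate(date = mdy(date),             # character "m/d/y" -> Date
         cases_per_100k = round(confirmed / population * 100000, digits = 1)) |>
  filter(date == as.Date("2021-06-12"), population > 1000000)

toy_prepared
```

Only country A on June 12, 2021 is kept: the earlier row of A is dropped by the date condition, and country B is dropped because its population is below 1 million.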

11.3 Basic steps for creating a ggplot graph

The ggplot2 package is part of the tidyverse, so it is installed automatically when we install the tidyverse “meta” package. Furthermore, it is one of the core tidyverse packages that are loaded into the R session when we run the command library(tidyverse).

11.3.1 Step 0: Start with a default blank ggplot object

ggplot()
Figure 11.2: A default blank ggplot object.

11.3.2 Step 1: Add the dataset and define the x and y

The ggplot() function has two basic named arguments. The first argument, data, specifies the dataset that we are going to use for the plot. The second argument, mapping, defines which variables are mapped to x and y aesthetics of position.

Let’s provide the dataset “dat” to the first argument of ggplot() and map the variable gdp_capita to the x position and the variable cases_per_100k to the y position inside the aes():

ggplot(data = dat, mapping = aes(x = gdp_capita, y = cases_per_100k))
Figure 11.3: Variables are mapped to x and y axes on a canvas with grid lines.

Note that we don’t usually have to spell out the names of the arguments data and mapping. Therefore, the following command is equivalent:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k))

As we can observe, only a grey canvas is created when running the code. This is because we also need to provide a geometry!

11.3.3 Step 2: Add geometry

Geoms are the geometric objects that make up ggplot2 visualizations. Each geom is called with a function that begins with “geom_” and ends with the name of the geometric object (e.g., point, bar, line) (Table 11.1).

Table 11.1: Common geometries used in ggplot graphs.
geom_ Example
geom_point()
geom_line()
geom_text()
geom_label()
geom_histogram()
geom_density()
geom_bar()
geom_boxplot()

Let’s select which style we want to use. We are interested in exploring the association between two numeric variables with a scatter plot (see also Chapter 28). So, we will add points using a geom layer called geom_point. In this case, geom_point() will inherit the x and y aesthetics from the ggplot() function:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point()
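
The inheritance of aesthetics can be illustrated with a small sketch. It uses the built-in mtcars dataset as a stand-in (an assumption, so that the example is self-contained) and compares a mapping placed in ggplot() with the same mapping placed inside the geom.

```r
library(ggplot2)

# Mappings given in ggplot() are inherited by every geom layer
p_global <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Mappings given inside a geom apply to that layer only
p_local <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# layer_data() returns the data that a layer actually draws;
# here both layers receive the same x and y positions
all.equal(layer_data(p_global)[, c("x", "y")],
          layer_data(p_local)[, c("x", "y")])
```

Placing the mapping in ggplot() is convenient when several geoms share the same x and y, as in the examples that follow.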

11.3.4 Step 3: Add aesthetics to geometry

Each “geom” has a number of aesthetics that define its visual properties. We can map our data to anything that our “geom” supports. For example, geom_point() understands the following aesthetics (x and y are required): x, y, alpha, color, fill, group, shape, size, stroke.

So, if we want to add more variables to a plot, we can use aesthetics like color, shape, and size.

A. color aesthetics

Color is an important characteristic of graphs. If we decide to use color, we should consider which colors to use and where to use them. Color palettes (or colormaps) are classified into three main categories in ggplot2:

  1. Sequential (continuous or discrete) palette that is used for quantitative data. It consists of shades of a single color that vary from light to dark (Figure 11.4).
Figure 11.4: Example of sequential color scales.
  2. Diverging palette that creates a gradient between three different colors, allowing us to easily identify low, middle, and high values within our data (Figure 11.5).
Figure 11.5: Example of diverging color scales.
  3. Qualitative palette that is used mainly for discrete or categorical data. This palette consists of a discrete set of distinct colors with no implied order (Figure 11.6).
Figure 11.6: Example of qualitative color scales.
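
The three palette types can be sketched with scale functions that ship with ggplot2. The dataset (mtcars), the palette names ("Blues", "Set2"), and the colors chosen below are illustrative assumptions, not the palettes shown in the figures above.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# 1. Sequential: shades of one color for a quantitative variable
p_seq <- base +
  geom_point(aes(color = hp)) +
  scale_color_distiller(palette = "Blues", direction = 1)

# 2. Diverging: low, middle, and high colors around a midpoint
p_div <- base +
  geom_point(aes(color = hp)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                        midpoint = median(mtcars$hp))

# 3. Qualitative: distinct colors with no implied order
p_qual <- base +
  geom_point(aes(color = factor(cyl))) +
  scale_color_brewer(palette = "Set2")
```

Printing p_seq, p_div, or p_qual displays the corresponding plot.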

Now, suppose we want to group the points according to the categorical variable region using different colors, as follows:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region))        

Here, we added the color argument inside aes(). The values of the categorical variable region were mapped to the color aesthetic of geom_point(). Because region is categorical, ggplot2 automatically applied a qualitative color palette. Additionally, ggplot2 automatically created a legend to show the correspondence between the regions and the colors.

It is crucial to understand the difference between including the color² argument inside or outside of the aes() function. For example, let’s run the following code:

² ggplot2 understands both color and colour as well as the short version col.

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(color = "deeppink")

In this case, we set the color argument to a fixed value (“deeppink”) in the geom function instead of using aes(), so ggplot changed the color of the points “globally”.
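
A common pitfall follows from this distinction. The sketch below (using the built-in mtcars dataset as a stand-in, which is an assumption of this example) shows what happens when a fixed color is placed inside aes() by mistake.

```r
library(ggplot2)

# Setting: a fixed color outside aes() is applied as-is
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "deeppink")

# Pitfall: inside aes(), "deeppink" is treated as a data value (one category),
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped_constant <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "deeppink"))

unique(layer_data(p_set)$colour)              # the color we asked for
unique(layer_data(p_mapped_constant)$colour)  # a default palette color instead
```

So aes() is for mapping variables to aesthetics, while fixed values belong outside of it.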

In R, colors can be specified in quotes either by name (e.g., "deeppink") or as a hexadecimal color (hex code) that starts with a # (e.g., "#FF1493"). Table 11.2 presents some examples:

Table 11.2: Examples of name and hex color code of different colors.
Name Hex code
coral #FF7F50
aquamarine3 #66CDAA
chartreuse2 #76EE00
yellow #FFFF00
blue #0000FF
brown #A52A2A
deeppink #FF1493
black #000000

The main advantage of the Hex color system is that it is very compact and we can pick out any color we desire.
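
If we know a color name and want its hex code, base R can do the conversion. The following is a small sketch that uses only functions from the grDevices package, which is loaded by default.

```r
# col2rgb() returns the red, green, and blue components (0-255) of a color
rgb_values <- col2rgb("deeppink")

# rgb() turns the three components back into a hex code
hex_code <- rgb(rgb_values[1], rgb_values[2], rgb_values[3],
                maxColorValue = 255)
hex_code
#> [1] "#FF1493"
```

The function colors() lists all the color names that R understands.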

 

B. shape aesthetics

Alternatively, we can group the points according to the region variable using different point shapes, as follows:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(shape = region))

We observe that ggplot2 by default allows only six different point shapes to be displayed, so the points of the seventh region are not drawn. However, we will see how to change this using appropriate scales.

The point shape symbols commonly used in R are shown in Figure 11.7 below:

Figure 11.7: Points shapes symbols and their codes commonly used in R.
Important
  • The point shapes from 0 to 14 have an outline (we use color to change the color).
  • The point shapes from 15 to 20 are solid shapes (we use color to change color).
  • Point shapes options from 21 to 25 allow us to use both the outline and the inside color, so they can be controlled separately (we use color to change the color of the outline and fill to change the inside color).

The default geom_point() uses the shape symbol 19 that is a solid circle. If we decide to use a shape symbol between 21 and 25, we can set color and fill aesthetics to each point. The following examples help us understand how to set the color and fill arguments for the shape symbol 24 that is a triangle:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(shape = 24)

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(shape = 24, color = "red")

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(shape = 24, color = "red", fill = "yellow")

 

C. size aesthetics

Next, we can add a third variable tests_per_capita using the size aesthetic:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita))

11.4 Add a new geom (text information for each point)

Let’s add the name of the country for each data point. The geom_text_repel() function from the add-on package ggrepel allows us to add text labels for each data point that repel away from each other to avoid overlapping text.

library(ggrepel)  # provides geom_text_repel()

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita)) +
  geom_text_repel(aes(label = country), seed = 123)

11.5 Change the default properties of the plot with scales

11.5.1 Change the scale of the y axis

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10()

The y axis now uses a log10 scale, while the continuous variable tests_per_capita is mapped to size.
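
To make clear what scale_y_log10() does, here is a minimal sketch with a made-up data frame (toy and its values are hypothetical). The scale transforms the y values before they are drawn, while the axis labels stay in the original units.

```r
library(ggplot2)

toy <- data.frame(x = 1:3, y = c(1, 10, 100))  # hypothetical values

p_log <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10()

# The positions that are actually plotted are log10(y)
layer_data(p_log)$y
#> [1] 0 1 2
```

On this scale, equal distances along the y axis correspond to equal ratios rather than equal differences.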

11.5.2 Change the default point shapes

# default
ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, shape= region)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10()

# modified
ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, shape= region)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10() +
  scale_shape_manual(values = c(4, 9, 2, 1, 0, 19, 8))
(a) Default
(b) Modified
Figure 11.8: Change the default point shapes

However, when a variable is mapped to size (here, tests_per_capita), it’s a good idea not to map another variable to shape (here, region). This is because it is difficult to compare the sizes of different shapes (e.g., a size 4 square with a size 4 triangle).

11.5.3 Change the default colors

# default
ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, color = region)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10()

# modified
library(ggsci)  # provides scale_color_jco()
ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, color = region)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10() +
  scale_color_jco()
(a) Default
(b) Modified
Figure 11.9: Change the default colors
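
If no ready-made palette fits, colors can also be assigned by hand with scale_color_manual(), which is part of ggplot2. The sketch below uses the built-in mtcars dataset as a stand-in and three hex codes from Table 11.2; both choices are assumptions made only for illustration.

```r
library(ggplot2)

# Named vector: factor level -> hex code
my_colors <- c("4" = "#FF7F50", "6" = "#66CDAA", "8" = "#0000FF")

p_manual <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  scale_color_manual(values = my_colors)

p_manual
```

Naming the vector ties each color to a specific level, so the assignment does not depend on the order of the levels.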

11.6 Modify axis, legend, and plot labels with labs

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, color = region)) +
  geom_text_repel(aes(label = country), seed = 123) +
  scale_y_log10() +
  scale_color_jco() +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country", 
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data from {tidycovid19} package",
       tag = 'A')

11.7 Modify theme elements with theme

Theme elements are the non-data elements of a graph such as:

  • line
  • text
  • title, subtitle, caption
  • grid (major, minor)
  • background
  • ticks
Syntax of the theme() function

The default display of theme elements can be overridden by using the theme() function, which has two parts in its structure, an element name and an element function, in a form similar to:

theme(element_name = element_function(arguments))

Element name

We are able to modify the appearance of theme elements in plot, panel, axis, and legend compartments of a simple ggplot graph (Figure 11.10).

Figure 11.10: Main compartments in a ggplot graph.

The theme system enables us to specify the display of elements for a particular compartment of the graph by creating element names in the general form of compartment.element. For example, we can specify the title in plot, axis, and legend with the element names plot.title, axis.title, and legend.title, respectively.

Element function

Depending on the type of element that we want to modify, there are three pertinent functions that start with element_:

  • element_line(): specifies the display of lines
  • element_text(): specifies the display of text elements
  • element_rect(): specifies the display of borders and backgrounds

NOTE: (a) There is also the element_blank() that suppresses the appearance of elements we’re not interested in. (b) Other features of the graph, such as the position of legend, are not specified within an element_function.
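
Both points of the note can be sketched in one short example. It uses the built-in mtcars dataset as a stand-in (an assumption of this example).

```r
library(ggplot2)

# A theme() call can mix element functions with plain values
th <- theme(panel.grid.minor = element_blank(),  # (a) hide the minor grid lines
            legend.position = "bottom")          # (b) a plain value, no element_*()

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  th

p
```

Storing the theme() call in an object such as th lets us reuse the same settings in several plots.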

 

  Example

Suppose we want to change the color (from black to red) and the width of the x-axis line of our graph. The syntax would be similar to:

Figure 11.11: Syntax of the theme() function for changing the x-axis line.
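
In code, the change described in this example (a red and wider x-axis line) could be written as in the sketch below. The built-in mtcars dataset and the width of 1 are assumptions made so that the example runs on its own.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # element name    = element function(arguments)
  theme(axis.line.x = element_line(color = "red", linewidth = 1))

p
```

Here axis.line.x is the element name (compartment axis, element line, restricted to x), and element_line() is the element function that receives the new color and width.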

 

Next, we present some examples for each element function to illustrate the previous concepts.

11.7.1 A. element_line()

With element_line(), we can customize all the lines of the plot that are not part of the data. Figure 11.12 shows the basic line elements (axes, ticks, and grids) that we can control in a simple ggplot graph.

Figure 11.12: Anatomy of Line Elements in ggplot2.

1. The X and Y axes lines:

  • both X and Y axes: axis.line = element_line()
  • only X axis: axis.line.x = element_line()
  • only Y axis: axis.line.y = element_line()
Figure 11.13: The X and Y axis lines.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point() +
  theme(axis.line.x = element_line(color = "red", linewidth = 1),
        axis.line.y = element_line(color = "green", linewidth = 1, linetype = 5))
Figure 11.14: Customized X and Y axis lines.

As we can observe, the default line type is a solid line (red x-axis line; linetype = 1). To change the line type, for example from a solid to a long-dashed line, we used the linetype = 5 option (green y-axis line) (Figure 11.14).

2. The ticks on X and Y axes:

  • both X and Y ticks: axis.ticks = element_line()
  • only X ticks: axis.ticks.x = element_line()
  • only Y ticks: axis.ticks.y = element_line()

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point() +
  theme(axis.ticks.x = element_line(color = "red", linewidth = 5),
        axis.ticks.y = element_line(color = "green", linewidth = 5))

3. The major and minor grid lines of the panel:

Major grid

  • major grid lines (vertical and horizontal): panel.grid.major = element_line()
  • major vertical grid lines (cross X): panel.grid.major.x = element_line()
  • major horizontal grid lines (cross Y): panel.grid.major.y = element_line()

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point() +
  theme(panel.grid.major.x = element_line(color = "red", linewidth = 0.55),
        panel.grid.major.y = element_line(color = "green", linewidth = 0.55))

Minor grid

  • minor grid lines (vertical and horizontal): panel.grid.minor = element_line()
  • minor vertical grid lines (cross X): panel.grid.minor.x = element_line()
  • minor horizontal grid lines (cross Y): panel.grid.minor.y = element_line()

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point() +
  theme(panel.grid.minor.x = element_line(color = "red", linewidth = 0.35, linetype = 2),
        panel.grid.minor.y = element_line(color = "green", linewidth = 0.35, linetype = 2))

We can also modify the display of both major and minor grids of the ggplot.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point() +
  theme(panel.grid.major = element_line(color = "blue", linewidth = 0.55),
        panel.grid.minor = element_line(color = "deeppink", linewidth = 0.35, linetype = 2))

11.7.2 B. element_text()

Figure 11.15: Anatomy of Text Elements in ggplot2.

1. The title of X and Y axes:

  • X axis: axis.title.x = element_text()
  • Y axis: axis.title.y = element_text()

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(axis.title.x = element_text(color = "red", size = 18, angle = 10),
        axis.title.y = element_text(color = "green", size = 10))
Figure 11.16: Customized X and Y axis titles.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(axis.text.x = element_text(color = "red", size = 16, face="bold", angle = 90),
        axis.text.y = element_text(color = "green", size = 10))
Figure 11.17: Customized X and Y axis tick labels.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(plot.title = element_text(color = "deeppink"),
        plot.subtitle = element_text(color = "blue"),
        plot.caption = element_text(color = "orange", size = 8),
        plot.tag = element_text(color = "green", size = 20))
Figure 11.18: Customized plot title, subtitle, caption, and tag.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(legend.title = element_text(color = "red", size = 16),
        legend.text = element_text(color = "green", size = 10))
Figure 11.19: Customized legend title and legend text.

11.7.3 C. element_rect()

Figure 11.20: Anatomy of Rect Elements in ggplot2.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(plot.background = element_rect(fill = "deeppink"))
Figure 11.21: Customized plot background.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(panel.background = element_rect(fill = "deeppink"))
Figure 11.22: Customized panel background.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(legend.background = element_rect(fill = "deeppink"))
Figure 11.23: Customized legend background.

  Example

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(color = region)) +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per \ncapita, and COVID-19 testing rate by country",
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data",
       tag = 'A') +
  theme(legend.key = element_rect(fill = "deeppink"))
Figure 11.24: Customized legend keys.

11.8 Themes

11.8.1 Default ggplot theme

The default theme of ggplot2 is theme_gray():

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, color = region)) +
  geom_text_repel(aes(label = country), 
                  min.segment.length = 0, seed = 42, 
                  box.padding = 0.1, color = "black", size = 5) +
  scale_y_log10() +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants (log scale)",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per capita, and COVID-19 testing rate by country", 
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data Johns Hopkins University")
Figure 11.25: The plot with the default theme_gray().

11.8.2 Predefined theme

We can customize the theme elements of our graph by applying a predefined theme rather than changing everything by hand. There are ready-to-use themes in the ggplot2 and ggthemes packages.

Examples of built-in themes from “ggplot2”:

  • theme_bw() – dark on light ggplot2 theme
  • theme_dark() – lines on a dark background instead of light
  • theme_minimal() – no background annotations, minimal feel.
  • theme_classic() – theme with no grid lines.
  • theme_void() – empty theme with no elements

Examples of themes from “ggthemes”:

  • theme_economist() – theme approximates the style of “The Economist”.
  • theme_excel_new() – theme based on current Excel plot defaults.
  • theme_fivethirtyeight() – theme inspired by FiveThirtyEight plots.
  • theme_gdocs() – theme based on Google Docs Chart defaults.
  • theme_hc() – theme based on Highcharts plots.

In our case, we will use theme_fivethirtyeight() from “ggthemes” to understand how we can work with predefined themes.

library(ggthemes)  # provides theme_fivethirtyeight()

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_point(aes(size = tests_per_capita, color = region)) +
  geom_text_repel(aes(label = country), 
                  min.segment.length = 0, seed = 42, 
                  box.padding = 0.1, color = "black", size = 5) +
  scale_y_log10() +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants (log scale)",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per capita, and COVID-19 testing rate by country", 
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data Johns Hopkins University") +
  theme_fivethirtyeight()
Figure 11.26: The plot with theme_fivethirtyeight().

If we take a look at the code for theme_fivethirtyeight (we just run theme_fivethirtyeight in our console to see the code), we’ll observe that axis.title is set to element_blank(). So this theme has no axis titles by default, and the x and y labels that we set with labs() are not displayed. To turn them back on, we need to override this default by adding theme(axis.title = element_text()) after theme_fivethirtyeight().
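
A minimal sketch of restoring the axis titles is shown below. It assumes that the ggthemes package is installed and uses the built-in mtcars dataset with made-up axis labels as a stand-in.

```r
library(ggplot2)
library(ggthemes)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight", y = "Miles per gallon") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text())  # restore the axis titles

p
```

The order matters: a theme() call placed before theme_fivethirtyeight() would be overwritten by the complete theme that follows it.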

11.8.3 Publication Quality Figure

Finally, we can customize the plot further to produce a publication-quality figure:

ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_smooth(color = "red") +
  geom_point(aes(size = tests_per_capita, color = region), shape = 1, stroke = 2) +
  geom_text_repel(aes(label = country), 
                  min.segment.length = 0, seed = 42, 
                  box.padding = 0.1, color = "black", size = 5) +
  scale_y_log10() +
  scale_color_jco() +
  labs(x = "GDP per capita ($)",
       y = "Cases per 100,000 inhabitants (log scale)",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per capita, and COVID-19 testing rate by country", 
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data Johns Hopkins University") +
  theme_fivethirtyeight(base_size = 16) +
  theme(plot.title = element_text(size = 17),
        axis.title = element_text(face = "bold"),
        legend.title = element_text(face = "bold")) +
  guides(color = guide_legend(override.aes = list(size = 4)))
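Figure 11.27: Customized plot with a smoothing curve.

We can also use a log scale for the x axis and fit a straight line with method = lm:
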
ggplot(dat, aes(x = gdp_capita, y = cases_per_100k)) +
  geom_smooth(method = lm, color = "red") +
  geom_point(aes(size = tests_per_capita, color = region), shape = 1, stroke = 2) +
  geom_text_repel(aes(label = country), 
                  min.segment.length = 0, seed = 42, 
                  box.padding = 0.1, color = "black", size = 5) +
  scale_x_log10() +
  scale_y_log10() +
  scale_color_jco() +
  labs(x = "GDP per capita ($) (log scale)",
       y = "Cases per 100,000 inhabitants (log scale)",
       color = "Region",
       size = "Proportion tested",
       title = "Confirmed cases per 100,000 inhabitants, GDP per capita, and COVID-19 testing rate by country", 
       subtitle = "June 12, 2021",
       caption = "Source Data: Covid-19 related data Johns Hopkins University") +
  theme_fivethirtyeight(base_size = 16) +
  theme(plot.title = element_text(size = 17),
        axis.title = element_text(face = "bold"),
        legend.title = element_text(face = "bold")) +
  guides(color = guide_legend(override.aes = list(size = 4)))
Figure 11.28: Customized plot with both axes on a log scale and a linear fit.
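
One of the learning objectives of this chapter is to save plots in a variety of formats. A minimal sketch with the ggsave() function of ggplot2 is given below. The dataset (mtcars), the file names, the plot dimensions, and the use of a temporary folder are all assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output folder; a temporary directory keeps the sketch self-contained
out_dir <- tempdir()

# The file extension selects the output format
pdf_file <- file.path(out_dir, "scatter.pdf")
png_file <- file.path(out_dir, "scatter.png")

ggsave(pdf_file, plot = p, width = 7, height = 5)             # vector format
ggsave(png_file, plot = p, width = 7, height = 5, dpi = 300)  # raster format

file.exists(c(pdf_file, png_file))
```

The width and height are given in inches by default. Vector formats such as PDF scale without loss of quality, while for raster formats such as PNG the dpi argument controls the resolution.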