Let say that you receive a raw data.frame with
coordinate points per row, and I want to create a Thematic map with it,
how can I make it using {ggplot2}? Or that you use a
package to get spatial data but in a object class like
SpatVector or SpatialPolygonsDataFrame, how
can I use them with {ggplot2}?
Figure 1. (A) Map from a raw data.frame with
coordinate points. (B) Dot map with the same data converted to a
sf object.
We need sf class objects to create ggplot
maps! So, today we are going to learn how to convert
foreign spatial objects to sf, either polygon or
point data, to keep making ggplot2 maps!
Convert foreign objects to
sf class using the st_as_sf()
function from the {sf} package.
Convert foreign polygon data in
SpatVector class to sf.
Convert foreign point data in
data.frame class to sf.
This lesson requires the following packages:
if(!require('pacman')) install.packages('pacman')
pacman::p_load(malariaAtlas,
ggplot2,
cholera,
geodata,
dplyr,
here,
sf)
pacman::p_load_gh("afrimapr/afrilearndata",
"wmgeolab/rgeoboundaries"){geodata}An advantage of the {geodata} package is that it
downloads country borders data with a multiple column output,
providing the names of all the levels above the one requested.
For example, here is an example with the first three administrative levels in Bolivia:
bolivia_level3 <- gadm(country = "Bolivia", level = 3, path = tempdir())bolivia_level3 %>%
as_tibble() %>%
select(starts_with("NAME_")) #👈👈👈👈👈👈👈👈👈👈👈👈👈## # A tibble: 344 × 3
## NAME_1 NAME_2 NAME_3
## <chr> <chr> <chr>
## 1 Beni Cercado San Javier
## 2 Beni Cercado Trinidad
## 3 Beni General José Ballivián Puerto Menor de Rurrenabaque
## 4 Beni General José Ballivián Reyes
## 5 Beni General José Ballivián San Borja
## 6 Beni General José Ballivián Santa Rosa
## 7 Beni Iténez Baures
## 8 Beni Iténez Huacaraje
## 9 Beni Iténez Magdalena
## 10 Beni Mamoré Puerto Siles
## # ℹ 334 more rows
This is useful for situations where you want to filter()
only a specific subset of sub-divisions (e.g. level 3) within a region
(level 1) or department (level 2).
bolivia_level3 %>%
as_tibble() %>%
select(starts_with("NAME_")) %>%
filter(NAME_1 == "Santa Cruz") #👈👈👈👈👈👈👈👈👈👈👈👈👈👈## # A tibble: 56 × 3
## NAME_1 NAME_2 NAME_3
## <chr> <chr> <chr>
## 1 Santa Cruz Andrés Ibáñez Cotoca
## 2 Santa Cruz Andrés Ibáñez El Torno
## 3 Santa Cruz Andrés Ibáñez La Guardia
## 4 Santa Cruz Andrés Ibáñez Porongo
## 5 Santa Cruz Andrés Ibáñez Santa Cruz de la Sierra
## 6 Santa Cruz Angel Sandoval San Matías
## 7 Santa Cruz Chiquitos Pailón
## 8 Santa Cruz Chiquitos Roboré
## 9 Santa Cruz Chiquitos San José
## 10 Santa Cruz Cordillera Boyuibe
## # ℹ 46 more rows
However, I can not create ggplot maps with it because it
is a SpatVector class object:
gadm(country = "Bolivia", level = 3, path = tempdir())👉 class : SpatVector 👈
geometry : polygons
dimensions : 344, 16 (geometries, attributes)
extent : -69.64525, -57.45443, -22.90657, -9.670923
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
sfThe best way to solve this issue is to use the
st_as_sf() function from the {sf} package
# spatvector
bolivia_level3 %>% class()## [1] "SpatVector"
## attr(,"package")
## [1] "terra"
# sf
bolivia_level3 %>% st_as_sf() %>% class() #👈👈👈👈👈👈👈👈## [1] "sf" "data.frame"
This solution is useful for object classes like
SpatVector and SpatialPolygonsDataFrame.
Convert the country borders of Germany from SpatVector
to a sf class object using the st_as_sf()
function:
gadm(country = "Germany", level = 2, path = tempdir()) %>%
...........Now with sf objects, we can create a ggplot like
this:
ggplot() +
geom_sf(data = geoboundaries(country = "Bolivia")) +
geom_sf(data = bolivia_level3 %>%
st_as_sf() %>% #👈👈👈👈
filter(NAME_1 == "Santa Cruz"),
mapping = aes(fill = NAME_1))Why is the class object called
SpatVector?
“SpatVector” stands for “Spatial Vector”, and “Vector” stands for “Vector data”.
Vector data is the formal name for the Geometry types like Point, Lines, and Polygons.
It requires a Coordinate Reference System (CRS) to relate the spatial elements of the data with the surface of Earth.
It is also the most common format of Spatial data used in GIS. Which is commonly stored in Shapefiles.
Do not get confused by the vector class object, which is
an R class just like data.frame and
matrix.
Above is an example of converting foreign
Polygon data to sf class. Now let’s see
briefly see how to convert foreign Point
data.
{malariaAtlas}The {malariaAtlas} package download, visualize and
manipulate global malaria data hosted by the Malaria Atlas Project.
The malariaAtlas package enables users to download data
like:
The getPR() function downloads all the publicly
available PR points for a country (or countries) and returns it as a
data.frame.
zimbabwe_malaria_pr <- getPR(country = "Zimbabwe", species = "BOTH")The species argument is a string specifying the
Plasmodium species and can be Pf (Plasmodium
falciparum), Pv (Plasmodium vivax) or
BOTH.
The output is a data.frame object that contains
longitude and latitude as variables:
class(zimbabwe_malaria_pr)## [1] "pr.points" "data.frame"
zimbabwe_malaria_pr %>%
as_tibble()## # A tibble: 567 × 28
## dhs_id site_id site_name latitude longitude rural_urban country country_id
## <lgl> <int> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 NA 16142 Chilonga -18.4 27.2 UNKNOWN Zimbab… ZWE
## 2 NA 17018 Ume - Kahobo -17.8 28.8 UNKNOWN Zimbab… ZWE
## 3 NA 18443 Bandanyenje -18.9 32.0 UNKNOWN Zimbab… ZWE
## 4 NA 68 Piringani -17.0 30.1 UNKNOWN Zimbab… ZWE
## 5 NA 19390 Zidulini -18.7 28.7 UNKNOWN Zimbab… ZWE
## 6 NA 17564 Lutumba -22.1 30.2 UNKNOWN Zimbab… ZWE
## 7 NA 14287 Seula School -21.6 28.4 UNKNOWN Zimbab… ZWE
## 8 NA 2943 Ndlovu -18.9 27.8 UNKNOWN Zimbab… ZWE
## 9 NA 11029 Bambaninga -20.4 31.8 UNKNOWN Zimbab… ZWE
## 10 NA 20058 Pumula Miss… -19.6 27.1 UNKNOWN Zimbab… ZWE
## # ℹ 557 more rows
## # ℹ 20 more variables: continent_id <chr>, month_start <int>, year_start <int>,
## # month_end <int>, year_end <int>, lower_age <int>, upper_age <int>,
## # examined <int>, positive <int>, pr <dbl>, species <chr>, method <chr>,
## # rdt_type <chr>, pcr_type <lgl>, malaria_metrics_available <chr>,
## # location_available <chr>, permissions_info <chr>, citation1 <chr>,
## # citation2 <chr>, citation3 <chr>
Download the publicly available Parasite Ratio (PR) points for
Plasmodium falciparum in Sierra Leone using the
{malariaAtlas} package.
q3 <- ________(________ = "Sierra Leone",species = "Pf")
q3autoplot() can be used to quickly and easily visualize
the downloaded PR survey points.
autoplot(zimbabwe_malaria_pr)However, if you would like to make a custom plot for
your particular research needs, with complete control of the output
aesthetics, you would probably prefer {ggplot2} for
the work!
sfTo convert a data.frame to an sf object, we
can use the st_as_sf() from the {sf}
package.
In case of Point patterns data (like for a Dot map):
coords argument need to specify the names of the
variables holding the Coordinates, andcrs argument specify the Coordinate
Reference System (CRS) to be assigned (here WGS84, which is the
CRS code #4326).zimbabwe_malaria_pr_sf <-
# data frame
zimbabwe_malaria_pr %>%
# convert to sf
sf::st_as_sf(coords = c("longitude","latitude"), #👈👈👈👈👈👈
crs = 4326)Inside the coords argument,you need to
pay attention to two things:
"quotation_marks" for each column
name,longitude, and then the
latitude.Mission accomplished! This is the output of the conversion:
zimbabwe_malaria_pr_sf %>%
dplyr::select(site_name,pr)## Simple feature collection with 567 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 25.8063 ymin: -22.2333 xmax: 32.9708 ymax: -16.0662
## Geodetic CRS: WGS 84
## First 10 features:
## site_name pr geometry
## 9 Chilonga 0.0000 POINT (27.1667 -18.4)
## 11 Ume - Kahobo 0.0750 POINT (28.8209 -17.8091)
## 17 Bandanyenje 0.0000 POINT (31.95 -18.8667)
## 19 Piringani 0.0000 POINT (30.1333 -16.9667)
## 21 Zidulini 0.0080 POINT (28.6667 -18.7333)
## 24 Lutumba 0.0000 POINT (30.1757 -22.122)
## 26 Seula School 0.0000 POINT (28.4033 -21.6176)
## 30 Ndlovu 0.0800 POINT (27.75 -18.8667)
## 34 Bambaninga 0.0080 POINT (31.7562 -20.4146)
## 38 Pumula Mission 0.0078 POINT (27.116 -19.617)
Now we have converted our data.frame into an
sf object and can directly plot this object with
{ggplot2} code:
ggplot(data = zimbabwe_malaria_pr_sf) +
geom_sf(aes(size = pr))Or add one previous layer with the administrative boundaries of Zimbabwe:
zimbabwe_boundaries_adm1 <- geoboundaries(country = "Zimbabwe",
adm_lvl = 1)In the same ggplot:
ggplot() +
# boundaries data
geom_sf(data = zimbabwe_boundaries_adm1,
fill = "white") +
# disease data
geom_sf(data = zimbabwe_malaria_pr_sf,
aes(size = pr),
color = "red",
alpha=0.5)sf::st_as_sf() does not allow missing values in
coordinates!
So you can use {dplyr} verbs like filter()
to drop problematic rows from variable x using this
notation:filter(!is.na(x)).
With the publicly available Parasite Ratio (PR) points of
Plasmodium falciparum in Sierra Leone stored in
q3:
Convert its data.frame output to a sf
object using the st_as_sf() function, with the code
4326 to set a WGS84 CRS. (More on all these
codes coming soon, don’t be intimidated !)
q4 <-
q3 %>%
filter(!is.na(longitude)) %>%
sf::________(coords = c("________","________"),
crs = 4326)
q4In this lesson, we have learned how to convert
foreign Polygon and Point data to a
sf class object from SpatVector and
data.frame objects, respectively. We also learned about
Vector data and how Point data needs a
Coordinate Reference Systems (CRS) to obtain an sf
object.
Figure 2. Concept map
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
Seimon, Dilinie. Administrative Boundaries. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/admin_boundaries.html
Varsha Ujjinni Vijay Kumar. Malaria. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/malaria.html
Batra, Neale, et al. The Epidemiologist R Handbook. Chapter 28: GIS Basics. (2021). Retrieved 01 April 2022, from https://epirhandbook.com/en/gis-basics.html
Lovelace, R., Nowosad, J., & Muenchow, J. Geocomputation with R. Chapter 2: Geographic data in R. (2019). Retrieved 01 April 2022, from https://geocompr.robinlovelace.net/spatial-class.html
Moraga, Paula. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapter 2: Spatial data and R packages for mapping. (2019). Retrieved 01 April 2022, from https://www.paulamoraga.com/book-geospatial/sec-spatialdataandCRS.html
This work is licensed under the Creative Commons Attribution Share Alike license.