Let say that you receive a raw data.frame
with
coordinate points per row, and I want to create a Thematic map with it,
how can I make it using {ggplot2}
? Or that you use a
package to get spatial data but in a object class like
SpatVector
or SpatialPolygonsDataFrame
, how
can I use them with {ggplot2}
?
We need sf
class objects to create ggplot
maps! So, today we are going to learn how to convert
foreign spatial objects to sf
, either polygon or
point data, to keep making ggplot2
maps!
Convert foreign objects to
sf
class using the st_as_sf()
function from the {sf}
package.
Convert foreign polygon data in
SpatVector
class to sf
.
Convert foreign point data in
data.frame
class to sf
.
This lesson requires the following packages:
if(!require('pacman')) install.packages('pacman')
::p_load(malariaAtlas,
pacman
ggplot2,
cholera,
geodata,
dplyr,
here,
sf)
::p_load_gh("afrimapr/afrilearndata",
pacman"wmgeolab/rgeoboundaries")
{geodata}
An advantage of the {geodata}
package is that it
downloads country borders data with a multiple column output,
providing the names of all the levels above the one requested.
For example, here is an example with the first three administrative levels in Bolivia:
<- gadm(country = "Bolivia", level = 3, path = tempdir()) bolivia_level3
%>%
bolivia_level3
as_tibble() %>%
select(starts_with("NAME_")) #👈👈👈👈👈👈👈👈👈👈👈👈👈
## # A tibble: 344 × 3
## NAME_1 NAME_2 NAME_3
## <chr> <chr> <chr>
## 1 Beni Cercado San Javier
## 2 Beni Cercado Trinidad
## 3 Beni General José Ballivián Puerto Menor de Rurrenabaque
## 4 Beni General José Ballivián Reyes
## 5 Beni General José Ballivián San Borja
## 6 Beni General José Ballivián Santa Rosa
## 7 Beni Iténez Baures
## 8 Beni Iténez Huacaraje
## 9 Beni Iténez Magdalena
## 10 Beni Mamoré Puerto Siles
## # ℹ 334 more rows
This is useful for situations where you want to filter()
only a specific subset of sub-divisions (e.g. level 3) within a region
(level 1) or department (level 2).
%>%
bolivia_level3
as_tibble() %>%
select(starts_with("NAME_")) %>%
filter(NAME_1 == "Santa Cruz") #👈👈👈👈👈👈👈👈👈👈👈👈👈👈
## # A tibble: 56 × 3
## NAME_1 NAME_2 NAME_3
## <chr> <chr> <chr>
## 1 Santa Cruz Andrés Ibáñez Cotoca
## 2 Santa Cruz Andrés Ibáñez El Torno
## 3 Santa Cruz Andrés Ibáñez La Guardia
## 4 Santa Cruz Andrés Ibáñez Porongo
## 5 Santa Cruz Andrés Ibáñez Santa Cruz de la Sierra
## 6 Santa Cruz Angel Sandoval San Matías
## 7 Santa Cruz Chiquitos Pailón
## 8 Santa Cruz Chiquitos Roboré
## 9 Santa Cruz Chiquitos San José
## 10 Santa Cruz Cordillera Boyuibe
## # ℹ 46 more rows
However, I can not create ggplot
maps with it because it
is a SpatVector
class object:
gadm(country = "Bolivia", level = 3, path = tempdir())
👉 class : SpatVector 👈
geometry : polygons
dimensions : 344, 16 (geometries, attributes)
extent : -69.64525, -57.45443, -22.90657, -9.670923
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
sf
The best way to solve this issue is to use the
st_as_sf()
function from the {sf}
package
# spatvector
%>% class() bolivia_level3
## [1] "SpatVector"
## attr(,"package")
## [1] "terra"
# sf
%>% st_as_sf() %>% class() #👈👈👈👈👈👈👈👈 bolivia_level3
## [1] "sf" "data.frame"
This solution is useful for object classes like
SpatVector
and SpatialPolygonsDataFrame
.
Convert the country borders of Germany from SpatVector
to a sf
class object using the st_as_sf()
function:
gadm(country = "Germany", level = 2, path = tempdir()) %>%
...........
Now with sf
objects, we can create a ggplot like
this:
ggplot() +
geom_sf(data = geoboundaries(country = "Bolivia")) +
geom_sf(data = bolivia_level3 %>%
st_as_sf() %>% #👈👈👈👈
filter(NAME_1 == "Santa Cruz"),
mapping = aes(fill = NAME_1))
Why is the class object called
SpatVector
?
“SpatVector” stands for “Spatial Vector”, and “Vector” stands for “Vector data”.
Vector data is the formal name for the Geometry types like Point, Lines, and Polygons.
It requires a Coordinate Reference System (CRS) to relate the spatial elements of the data with the surface of Earth.
It is also the most common format of Spatial data used in GIS. Which is commonly stored in Shapefiles.
Do not get confused by the vector
class object, which is
an R class just like data.frame
and
matrix
.
Above is an example of converting foreign
Polygon data to sf
class. Now let’s see
briefly see how to convert foreign Point
data.
{malariaAtlas}
The {malariaAtlas}
package download, visualize and
manipulate global malaria data hosted by the Malaria Atlas Project.
The malariaAtlas
package enables users to download data
like:
The getPR()
function downloads all the publicly
available PR points for a country (or countries) and returns it as a
data.frame
.
<- getPR(country = "Zimbabwe", species = "BOTH") zimbabwe_malaria_pr
The species
argument is a string specifying the
Plasmodium species and can be Pf
(Plasmodium
falciparum), Pv
(Plasmodium vivax) or
BOTH
.
The output is a data.frame
object that contains
longitude
and latitude
as variables:
class(zimbabwe_malaria_pr)
## [1] "pr.points" "data.frame"
%>%
zimbabwe_malaria_pr as_tibble()
## # A tibble: 567 × 28
## dhs_id site_id site_name latitude longitude rural_urban country country_id
## <lgl> <int> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 NA 16142 Chilonga -18.4 27.2 UNKNOWN Zimbab… ZWE
## 2 NA 17018 Ume - Kahobo -17.8 28.8 UNKNOWN Zimbab… ZWE
## 3 NA 18443 Bandanyenje -18.9 32.0 UNKNOWN Zimbab… ZWE
## 4 NA 68 Piringani -17.0 30.1 UNKNOWN Zimbab… ZWE
## 5 NA 19390 Zidulini -18.7 28.7 UNKNOWN Zimbab… ZWE
## 6 NA 17564 Lutumba -22.1 30.2 UNKNOWN Zimbab… ZWE
## 7 NA 14287 Seula School -21.6 28.4 UNKNOWN Zimbab… ZWE
## 8 NA 2943 Ndlovu -18.9 27.8 UNKNOWN Zimbab… ZWE
## 9 NA 11029 Bambaninga -20.4 31.8 UNKNOWN Zimbab… ZWE
## 10 NA 20058 Pumula Miss… -19.6 27.1 UNKNOWN Zimbab… ZWE
## # ℹ 557 more rows
## # ℹ 20 more variables: continent_id <chr>, month_start <int>, year_start <int>,
## # month_end <int>, year_end <int>, lower_age <int>, upper_age <int>,
## # examined <int>, positive <int>, pr <dbl>, species <chr>, method <chr>,
## # rdt_type <chr>, pcr_type <lgl>, malaria_metrics_available <chr>,
## # location_available <chr>, permissions_info <chr>, citation1 <chr>,
## # citation2 <chr>, citation3 <chr>
Download the publicly available Parasite Ratio (PR) points for
Plasmodium falciparum in Sierra Leone
using the
{malariaAtlas}
package.
<- ________(________ = "Sierra Leone",species = "Pf")
q3 q3
autoplot()
can be used to quickly and easily visualize
the downloaded PR survey points.
autoplot(zimbabwe_malaria_pr)
However, if you would like to make a custom plot for
your particular research needs, with complete control of the output
aesthetics, you would probably prefer {ggplot2}
for
the work!
sf
To convert a data.frame
to an sf
object, we
can use the st_as_sf()
from the {sf}
package.
In case of Point patterns data (like for a Dot map):
coords
argument need to specify the names of the
variables holding the Coordinates, andcrs
argument specify the Coordinate
Reference System (CRS) to be assigned (here WGS84, which is the
CRS code #4326
).<-
zimbabwe_malaria_pr_sf # data frame
%>%
zimbabwe_malaria_pr # convert to sf
::st_as_sf(coords = c("longitude","latitude"), #👈👈👈👈👈👈
sfcrs = 4326)
Inside the coords
argument,you need to
pay attention to two things:
"quotation_marks"
for each column
name,longitude
, and then the
latitude
.Mission accomplished! This is the output of the conversion:
%>%
zimbabwe_malaria_pr_sf ::select(site_name,pr) dplyr
## Simple feature collection with 567 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 25.8063 ymin: -22.2333 xmax: 32.9708 ymax: -16.0662
## Geodetic CRS: WGS 84
## First 10 features:
## site_name pr geometry
## 9 Chilonga 0.0000 POINT (27.1667 -18.4)
## 11 Ume - Kahobo 0.0750 POINT (28.8209 -17.8091)
## 17 Bandanyenje 0.0000 POINT (31.95 -18.8667)
## 19 Piringani 0.0000 POINT (30.1333 -16.9667)
## 21 Zidulini 0.0080 POINT (28.6667 -18.7333)
## 24 Lutumba 0.0000 POINT (30.1757 -22.122)
## 26 Seula School 0.0000 POINT (28.4033 -21.6176)
## 30 Ndlovu 0.0800 POINT (27.75 -18.8667)
## 34 Bambaninga 0.0080 POINT (31.7562 -20.4146)
## 38 Pumula Mission 0.0078 POINT (27.116 -19.617)
Now we have converted our data.frame
into an
sf
object and can directly plot this object with
{ggplot2}
code:
ggplot(data = zimbabwe_malaria_pr_sf) +
geom_sf(aes(size = pr))
Or add one previous layer with the administrative boundaries of Zimbabwe:
<- geoboundaries(country = "Zimbabwe",
zimbabwe_boundaries_adm1 adm_lvl = 1)
In the same ggplot:
ggplot() +
# boundaries data
geom_sf(data = zimbabwe_boundaries_adm1,
fill = "white") +
# disease data
geom_sf(data = zimbabwe_malaria_pr_sf,
aes(size = pr),
color = "red",
alpha=0.5)
sf::st_as_sf()
does not allow missing values in
coordinates!
So you can use {dplyr}
verbs like filter()
to drop problematic rows from variable x
using this
notation:filter(!is.na(x))
.
With the publicly available Parasite Ratio (PR) points of
Plasmodium falciparum in Sierra Leone stored in
q3
:
Convert its data.frame
output to a sf
object using the st_as_sf()
function, with the code
4326
to set a WGS84
CRS. (More on all these
codes coming soon, don’t be intimidated !)
<-
q4 %>%
q3 filter(!is.na(longitude)) %>%
::________(coords = c("________","________"),
sfcrs = 4326)
q4
In this lesson, we have learned how to convert
foreign Polygon and Point data to a
sf
class object from SpatVector
and
data.frame
objects, respectively. We also learned about
Vector data and how Point data needs a
Coordinate Reference Systems (CRS) to obtain an sf
object.
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
Seimon, Dilinie. Administrative Boundaries. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/admin_boundaries.html
Varsha Ujjinni Vijay Kumar. Malaria. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/malaria.html
Batra, Neale, et al. The Epidemiologist R Handbook. Chapter 28: GIS Basics. (2021). Retrieved 01 April 2022, from https://epirhandbook.com/en/gis-basics.html
Lovelace, R., Nowosad, J., & Muenchow, J. Geocomputation with R. Chapter 2: Geographic data in R. (2019). Retrieved 01 April 2022, from https://geocompr.robinlovelace.net/spatial-class.html
Moraga, Paula. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapter 2: Spatial data and R packages for mapping. (2019). Retrieved 01 April 2022, from https://www.paulamoraga.com/book-geospatial/sec-spatialdataandCRS.html
This work is licensed under the Creative Commons Attribution Share Alike license.