1 Introduction

So far, all spatial data visualized contain locations measured in angular units (longitude/latitude). But, what if data came in a different coordinate system measured in linear units? Like the planar coordinates of this summary figure!

Figure 1. Actual Earth, Geoid, Ellipsoid and Planar Coordinates. The Ellipsoid uses angular units (longitude/latitude), and Planar Coordinates uses linear units (foots or meters). Source: Rhumbline.

In this lesson we are going to learn how to set up our data with a UTM Coordinate System, measured in meters instead of longitude/latitude, and how to transform spatial objects from UTM to longitude/latitude, and vice versa!

2 Learning objectives

  1. Configure data with UTM coordinate system using the st_as_sf() function.

  2. Change the CRS projection of a sf object using the st_transform() function.

3 Prerequisites

This lesson requires the following packages:

if(!require('pacman')) install.packages('pacman')

pacman::p_load(malariaAtlas,
               colorspace,
               ggplot2,
               cholera,
               spData,
               dplyr,
               here,
               rio,
               sf)

pacman::p_load_gh("afrimapr/afrilearndata",
                  "wmgeolab/rgeoboundaries")

This lesson requires familiarity with {dplyr}: if you need to brush up, have a look at our introductory course on data wrangling.

4 Set up a CRS Projection to UTM coordinates

Let’s say that you receive a data frame with coordinates, but without a CRS projection. If you want to make a ggplot map with it, you know that you can use st_as_sf() from the {sf} package.

As we learned in a previous lesson, for point data we need to specify the coordinates and the CRS:

fatalities %>% 
  st_as_sf(coords = c("x","y"),
           crs = 4326) %>% 
  ggplot() +
  geom_sf(alpha = 0.3)

However, what if you receive coordinates different to longitude and latitude?

To exemplify this scenario, we are going to use malaria prevalence in The Gambia. We use data of malaria prevalence in children obtained at 65 villages in The Gambia.

4.1 Malaria prevalence

First, we read the gambia_summarized.rds file using its file path with the {readr} package.

gambia_point_summary <- 
  read_rds(here("data/gambia_summarized.rds"))

gambia_point_summary
## # A tibble: 65 × 5
##          x       y total positive  prev
##      <dbl>   <dbl> <int>    <dbl> <dbl>
##  1 349631. 1458055    33       17 0.515
##  2 358543. 1460112    63       19 0.302
##  3 360308. 1460026    17        7 0.412
##  4 363796. 1496919    24        8 0.333
##  5 366400. 1460248    26       10 0.385
##  6 366688. 1463002    18        7 0.389
##  7 368420. 1460569    38       24 0.632
##  8 370400. 1460301    56        7 0.125
##  9 370628. 1499589    57        6 0.105
## 10 374403. 1501392    26        8 0.308
## # ℹ 55 more rows

It is a data frame with 65 observations and 5 variables:

  • x: x coordinate of the village (UTM),
  • y: y coordinate of the village (UTM),
  • total: total number of tests performed,
  • positive: number of positive tests, and
  • prev: malaria prevalence in each village.

UTM stants for Universal Transverse Mercator, another coordinate system.

Set up the CRS projection

Now we can plot the malaria prevalence. However, in order to use ggplot2 and geom_sf(), we need to transform the data.frame to an sf object.

To use the st_as_sf() function from the {sf} package, we need to specify a CRS projection. But in this case, the units of the x and y variables are not in Geographic coordinates (longitude/latitude). Instead, these data coordinates are in UTM format (Easting/Northing), also called Projected coordinates.

CRS coordinate systems:

  • Geographic (or unprojected) reference systems use longitude and latitude for referencing a location on the Earth’s three-dimensional ellipsoid surface.

  • Projected coordinate reference systems use easting and northing Cartesian coordinates for referencing a location on a two-dimensional representation of the Earth.

Figure 2. A geographic coordinate system measured in angular units is compared to a projected coordinate system measured in linear units. Source: ArcGIS Pro.

In R, it looks like this:

Figure 3. Coordinate systems. Left: Geographic, Right: Projected.

All Projected CRSs are based on a Geographic CRS and rely on map projections to convert the three-dimensional surface of the Earth into Easting and Northing (x and y) values for a two-dimensional Earth representation.

Figure 4. Coordinate systems. Left: Geographic, Right: Projected. Source: ayresriverblog.

Which of the following options of Coordinate Reference System (CRS) types:

  1. "geographic_crs"
  2. "projected_crs"

…corresponds to each of these datasets, given the magnitude of the values in their x and y columns:

the parana dataset?

parana <- import("https://github.com/cran/geoR/raw/master/data/parana.rda")
as_tibble(parana$coords)

the fatalities dataset?

pacman::p_load(cholera)
as_tibble(fatalities)

Set UTM Projected coordinates with st_as_sf()

First, we need to set UTM coordinates. For this, we specify the projection of The Gambia, that is, UTM zone 28 ("+proj=utm +zone=28") in the st_as_sf() function of the {sf} package.

gambia_projected <- gambia_point_summary %>% 
  # first, specify the projection of gambia
  # UTM zone 28
  st_as_sf(coords = c("x", "y"),
           crs = "+proj=utm +zone=28")

gambia_projected
## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349631.3 ymin: 1458055 xmax: 622086.1 ymax: 1510610
## Projected CRS: +proj=utm +zone=28
## # A tibble: 65 × 4
##    total positive  prev           geometry
##  * <int>    <dbl> <dbl>        <POINT [m]>
##  1    33       17 0.515 (349631.3 1458055)
##  2    63       19 0.302 (358543.1 1460112)
##  3    17        7 0.412 (360308.1 1460026)
##  4    24        8 0.333 (363795.7 1496919)
##  5    26       10 0.385 (366400.5 1460248)
##  6    18        7 0.389 (366687.5 1463002)
##  7    38       24 0.632 (368420.5 1460569)
##  8    56        7 0.125 (370399.5 1460301)
##  9    57        6 0.105 (370627.7 1499589)
## 10    26        8 0.308 (374402.6 1501392)
## # ℹ 55 more rows

Confirm the presence of the:

  • CRS text (CRS: +proj=utm +zone=28) inside the header of the new sf object, and
  • the unit the geometry column in meters (<POINT [m]>).
  • The UTM system divides the Earth into 60 zones of 6 degrees of longitude in width. Each of the zones uses a transverse Mercator projection that maps a region of large north-south extent.

Figure 5. UTM zones in the USA. Source: Chrismurf at English Wikipedia, via Wikimedia Commons (CC-BY)

In the UTM system, a position on the Earth is given by the:

  • UTM zone number,
  • Hemisphere (north or south), and
  • Easting and northing coordinates in the zone which are measured in meters.
    • Eastings are referenced from the central meridian of each zone, and
    • northings are referenced from the equator.

parana_data contains the average rainfall over different years for the period May-June (dry-season). It was collected at 143 recording stations throughout Parana State, Brasil.

Set UTM coordinate system to the parana_data. Parana State is located in the UTM zone number 22.

parana_data <- as_tibble(parana$coords) %>% 
  mutate(Rainfall = parana$data)
q3 <- parana_data %>% 
  st_as_sf(coords = c("east", "north"), 
           crs = "+proj=______ +zone=______")
q3

Transform to Geographic coordinates with st_transform()

We can transform the UTM projected coordinates to geographic coordinates (longitude/latitude with datum WGS84) using st_transform() where we set CRS to "+proj=longlat +datum=WGS84".

gambia_geographic <- gambia_projected %>% 
  # second, transform 
  # projected coordinates to
  # geographic coordinates
  st_transform(crs = "+proj=longlat +datum=WGS84")

gambia_geographic
## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -16.38755 ymin: 13.18541 xmax: -13.87273 ymax: 13.66438
## Geodetic CRS:  +proj=longlat +datum=WGS84
## # A tibble: 65 × 4
##    total positive  prev             geometry
##  * <int>    <dbl> <dbl>          <POINT [°]>
##  1    33       17 0.515 (-16.38755 13.18541)
##  2    63       19 0.302 (-16.30543 13.20444)
##  3    17        7 0.412 (-16.28914 13.20374)
##  4    24        8 0.333 (-16.25869 13.53742)
##  5    26       10 0.385 (-16.23294 13.20603)
##  6    18        7 0.389 (-16.23041 13.23094)
##  7    38       24 0.632 (-16.21431 13.20902)
##  8    56        7 0.125 (-16.19604 13.20668)
##  9    57        6 0.105 (-16.19569 13.56187)
## 10    26        8 0.308 (-16.16088 13.57834)
## # ℹ 55 more rows

Confirm the update of the:

  • CRS text to CRS: +proj=longlat +datum=WGS84 inside the header, and
  • the units of the geometry column to degrees (<POINT [°]>).

A PROJ string includes the following information:

  • +proj=: the projection of the data (e.g. utm, longlat, or laea)
  • +zone=: the zone of the data, specific to the UTM projection (e.g. 28)
  • +datum=: the datum use (e.g. WGS84)
  • +units=: the units for the coordinates of the data (e.g. m)

With the UTM coordinate system data stored in q3:

Transform its Projected CRS to a Geographic CRS using the longitude/latitude (longlat) projection with datum WGS84.

q4 <- q3 %>% 
  st_transform(crs = "+proj=______ +datum=______")
q4

To reproduce John Snow’s map shown in previous lessons, we needed to set and transform different CRS in the same coding pipeline, using st_set_crs() and st_transform().

mdsr::CholeraDeaths %>%
  # british national grid
  st_set_crs(27700) %>%
  # to wgs84
  st_transform(4326)

Projected CRS are a choice made by a public mapping agency. So, with local data sources, work with the CRS in which the data was provided, to ensure compatibility, even if the official CRS is not the most accurate.

Map prevalences

Now that you have set up the right CRS projection to your data, you can overlap these points with other Vector data objects:

gambia_adm_2 <- geoboundaries(country = "Gambia", adm_lvl = 2)
ggplot() +
  geom_sf(data = gambia_adm_2) +
  geom_sf(data = gambia_geographic, mapping = aes(color = prev)) +
  colorspace::scale_color_continuous_sequential(palette="Reds 3")

Which CRS to use?

“There exist no all-purpose projections, all involve distortion when far from the center of the specified frame” (Bivand, Pebesma, and Gómez-Rubio 2013).

  • When Geographic CRS, the answer is often WGS84.
    • It is used by default for web mapping, in GPS datasets, and vector datasets.
    • WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326.
    • This ‘magic number’ can be used to convert objects with unusual projected CRSs into something that is widely understood.

5 Wrap up

In this lesson, we learned how to transform the CRS between different coordinate systems (projected and geographic).

Figure 6. A geographic coordinate system measured in angular units is compared to a projected coordinate system measured in linear units. Source: ArcGIS Pro.

In the next lesson, we are going to use all our previous learning to built one single thematic map by layers, and enrich them with text and labels referring to specific places or regions, and important map elements like scale bars and a north arrow!

Contributors

The following team members contributed to this lesson:

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license. Creative Commons License