---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
error = TRUE,
fig.path = 'README-'
# for CRAN release
#, eval = FALSE
)
```
# `geohashTools`
```{r, results = 'asis', echo = FALSE}
local({
pkg = 'geohashTools'
repo = file.path('MichaelChirico', pkg)
cat(sprintf('
![logo](logo.png "%1$s")
[![codecov](https://codecov.io/gh/%2$s/branch/master/graph/badge.svg)](https://app.codecov.io/gh/%2$s?branch=master)
[![Build Status](https://app.travis-ci.com/%2$s.svg?branch=master)](https://app.travis-ci.com/%2$s)
[![CRAN status](https://www.r-pkg.org/badges/version/%1$s)](https://cran.r-project.org/package=%1$s)
', pkg, repo))
})
```
This package provides tools for working with [Gustavo](https://github.com/niemeyer) [Niemeyer](https://twitter.com/gniemeyer)'s [geohash](https://en.wikipedia.org/wiki/Geohash) system of nestable, compact global coordinates based on [Z-order curves](https://en.wikipedia.org/wiki/Z-order_curve). The system consists of carving the earth into equally-sized rectangles (when projected into latitude/longitude space) and nesting this process recursively.
Originally, we adapted C++ source from [Hiroaki Kawai](https://github.com/hkwi), but have now rewritten the implementation completely with a new approach in C.
## Encoding geohashes
Encoding is the process of turning latitude/longitude coordinates into geohash strings. For example, Parque Nacional Tayrona in Colombia is located at roughly 11.3113917 degrees of latitude, -74.0779006 degrees of longitude. This can be expressed more compactly as:
```{r tayrona}
library(geohashTools)
gh_encode(11.3113917, -74.0779006)
```
These 6 characters identify this point on the globe to within about 1.2 kilometers (east-west) and 0.6 kilometers (north-south).
The park is quite large, so a cell this small covers only a fraction of it; we can "zoom out" by reducing the precision (the number of characters in the output, `6` by default):
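As a rough check on those figures, here is a minimal sketch of the arithmetic. It assumes a spherical Earth of radius 6371 km (an approximation), and `cell_size_km` is a hypothetical helper, not part of the package. A geohash of precision `p` carries `5 * p` bits, alternating between longitude and latitude, with longitude taking the extra bit when the total is odd:

```{r cell_size_km}
# hypothetical helper, not part of geohashTools
cell_size_km = function(latitude, precision = 6L) {
  n_bits = 5L * precision
  lon_bits = ceiling(n_bits / 2) # longitude is bisected first
  lat_bits = floor(n_bits / 2)
  # full extent of one cell, in degrees
  width_deg = 360 / 2^lon_bits
  height_deg = 180 / 2^lat_bits
  # assumption: spherical Earth with radius 6371 km
  km_per_deg = 2 * pi * 6371 / 360
  c(
    # east-west distances shrink with the cosine of the latitude
    east_west = width_deg * km_per_deg * cos(latitude * pi / 180),
    north_south = height_deg * km_per_deg
  )
}
round(cell_size_km(11.3113917), 2L)
```

The east-west figure depends on latitude, so the same precision gives narrower cells further from the equator.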
```{r tayrona_zoom_out}
gh_encode(11.3113917, -74.0779006, precision = 5L)
```
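The nesting is easiest to see from the algorithm itself. The following is a minimal base-R sketch of the usual geohash bisection scheme, not the package's C implementation; `gh_encode_sketch` is a hypothetical name, and it handles a single point only:

```{r encode_sketch}
# hypothetical helper illustrating the bisection scheme; one point at a time
gh_encode_sketch = function(lat, lon, precision = 6L) {
  base32 = strsplit('0123456789bcdefghjkmnpqrstuvwxyz', '')[[1L]]
  lat_rng = c(-90, 90)
  lon_rng = c(-180, 180)
  n_bits = 5L * precision
  bits = integer(n_bits)
  for (i in seq_len(n_bits)) {
    if (i %% 2L == 1L) {
      # odd-numbered bits halve the longitude range
      mid = mean(lon_rng)
      if (lon >= mid) {
        bits[i] = 1L
        lon_rng[1L] = mid
      } else {
        lon_rng[2L] = mid
      }
    } else {
      # even-numbered bits halve the latitude range
      mid = mean(lat_rng)
      if (lat >= mid) {
        bits[i] = 1L
        lat_rng[1L] = mid
      } else {
        lat_rng[2L] = mid
      }
    }
  }
  # each column holds the 5 bits of one base-32 character
  idx = colSums(matrix(bits, nrow = 5L) * c(16L, 8L, 4L, 2L, 1L))
  paste(base32[idx + 1L], collapse = '')
}
gh_encode_sketch(11.3113917, -74.0779006)
gh_encode_sketch(11.3113917, -74.0779006, precision = 5L)
```

Each extra character just appends five more bisection steps, which is why the precision-5 result is a prefix of the precision-6 one.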
<!-- TODO(michaelchirico): restore this. Site appears to be down as of this CRAN submission, seemingly temporarily -->
### Public Art in Chicago
We can use this as a simple, regular level of spatial aggregation for spatial points data, e.g., counting presence of public art throughout the city of Chicago, as captured in a dataset
<!-- http s://data.cityofchicago.org/Parks-Recreation/Parks-Public-Art/sj6t-9cju -->
provided by the City:
NB: As of this writing, the Chicago data portal is down, apparently temporarily. However, I'd like to submit this package update to CRAN in order to avoid deprecation issues around {rgdal} & co, so please check back on the website for a working version of the code below.
```{r chicago_art}
## first, pull the data internally from https://data.cityofchicago.org
# api_stem = 'https://data.cityofchicago.org/api/views'
# URL = file.path(api_stem, 'sj6t-9cju/rows.csv?accessType=DOWNLOAD')
# suppressPackageStartupMessages(library(data.table))
# art = fread(URL)
# count art by geohash
# gh_freq = art[, .N, by = .(geohash = gh_encode(LATITUDE, LONGITUDE, 5L))]
# only show the top 10
# gh_freq[order(-N)][1:10]
```
This is pretty impractical _per se_ (where is `dp3wm`?); we'll return to this once we've introduced more functionality.
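While the live data is unavailable, the aggregation pattern itself can be sketched in base R with a handful of made-up geohashes. The strings below are hypothetical placeholders, not real art locations. Since a geohash's prefix names the enclosing coarser cell, truncating with `substr` is enough to count at a lower precision:

```{r toy_aggregation}
# hypothetical precision-6 geohashes, for illustration only
gh6 = c('dp3wm1', 'dp3wm9', 'dp3wmk', 'dp3wq2', 'dp3wq2', 'dp3tyz')
# the first 5 characters identify the enclosing precision-5 cell
gh5 = substr(gh6, 1L, 5L)
gh_freq = sort(table(gh5), decreasing = TRUE)
gh_freq
```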
## Decoding geohashes
The reverse of encoding geohashes is of course decoding them -- taking a given geohash string and converting it into global coordinates. For example, the Ethiopian coffee growing region of Yirgacheffe is roughly at `sc54v`:
```{r yirgacheffe}
gh_decode('sc54v')
```
It can also be helpful to know just how precisely we've identified these coordinates; the `include_delta` argument gives the cell half-widths in both directions in addition to the cell centroid:
```{r yirgacheffe_delta}
gh_decode('sc54v', include_delta = TRUE)
```
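For intuition about where the centroid and half-widths come from, here is a minimal base-R sketch that replays the bisection from the bits of the geohash. It is an illustration rather than the package's implementation; `gh_decode_sketch` is a hypothetical name, the element names are chosen for the sketch, and it accepts a single geohash:

```{r decode_sketch}
# hypothetical helper: replay the bisection encoded in one geohash
gh_decode_sketch = function(gh) {
  base32 = strsplit('0123456789bcdefghjkmnpqrstuvwxyz', '')[[1L]]
  idx = match(strsplit(gh, '')[[1L]], base32) - 1L
  # expand each character into 5 bits, most significant first
  bits = as.vector(vapply(
    idx, function(v) (v %/% c(16L, 8L, 4L, 2L, 1L)) %% 2L, integer(5L)
  ))
  lat_rng = c(-90, 90)
  lon_rng = c(-180, 180)
  for (i in seq_along(bits)) {
    if (i %% 2L == 1L) {
      # odd-numbered bits pick a half of the longitude range
      mid = mean(lon_rng)
      if (bits[i] == 1L) {
        lon_rng[1L] = mid
      } else {
        lon_rng[2L] = mid
      }
    } else {
      # even-numbered bits pick a half of the latitude range
      mid = mean(lat_rng)
      if (bits[i] == 1L) {
        lat_rng[1L] = mid
      } else {
        lat_rng[2L] = mid
      }
    }
  }
  list(
    latitude = mean(lat_rng),
    longitude = mean(lon_rng),
    delta_latitude = diff(lat_rng) / 2,
    delta_longitude = diff(lon_rng) / 2
  )
}
str(gh_decode_sketch('sc54v'))
```

The centroid is the midpoint of the final ranges, and the half-widths are half of what remains of each range after the last bisection.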
In terms of latitude and longitude, all geohashes with the same precision have the same dimensions (though the physical size of the "rectangle" changes depending on the latitude); as such it's easy to figure out the cell half-widths from the precision alone using `gh_delta`:
```{r gh_delta}
gh_delta(5L)
```
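The relationship between precision and half-width can be written down directly. The sketch below is a hypothetical base-R helper (`gh_delta_sketch` is not part of the package) that assumes the usual bit layout, in which longitude takes the extra bit whenever `5 * precision` is odd:

```{r delta_sketch}
# hypothetical helper: cell half-widths in degrees for a given precision
gh_delta_sketch = function(precision) {
  n_bits = 5L * precision
  c(
    latitude = 180 / 2^(floor(n_bits / 2) + 1),
    longitude = 360 / 2^(ceiling(n_bits / 2) + 1)
  )
}
# one row per precision, from 1 to 8
t(sapply(1:8, gh_delta_sketch))
```

At odd precisions the two half-widths are equal in degrees; at even precisions the longitude half-width is twice the latitude one.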
## Geohash neighborhoods
One unfortunate consequence of the geohash system is that, while geohashes that are lexicographically similar (e.g. `wxyz01` and `wxyz12`) are certainly close to one another, the converse is not true -- for example, `7gxyru` and `k58n2h` are neighbors! Put another way, small movements on the globe occasionally produce visually huge jumps in the geohash-encoded output.
Fret not -- one tool for helping overcome this is the `gh_neighbors` function (`gh_neighbours` is also registered, for the Commonwealthy among us), which will return all of the geohashes adjacent to a given geohash (or vector of geohashes) at the same level of precision.
For example, the Merlion statue in Singapore is roughly at `w21z74nz`, but this level of precision zooms in a bit too far. The geohash neighborhood thereof can be found with:
```{r neighbors}
gh_neighbors('w21z74nz')
```
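To see why `7gxyru` and `k58n2h` are neighbors despite sharing no characters, it helps to view each geohash as a pair of integer grid indices. The sketch below de-interleaves the bits in base R; `gh_to_cell` is a hypothetical helper for illustration and says nothing about how `gh_neighbors` works internally:

```{r neighbor_cells}
base32 = strsplit('0123456789bcdefghjkmnpqrstuvwxyz', '')[[1L]]
# hypothetical helper: de-interleave one geohash into integer grid indices
# (row counts cells northward, col counts cells eastward)
gh_to_cell = function(gh) {
  idx = match(strsplit(gh, '')[[1L]], base32) - 1L
  # expand each character into 5 bits, most significant first
  bits = as.vector(vapply(
    idx, function(v) (v %/% c(16L, 8L, 4L, 2L, 1L)) %% 2L, integer(5L)
  ))
  is_lon = seq_along(bits) %% 2L == 1L
  # read a most-significant-first bit vector as an integer
  to_int = function(b) sum(b * 2^rev(seq_along(b) - 1L))
  c(row = to_int(bits[!is_lon]), col = to_int(bits[is_lon]))
}
rbind(gh_to_cell('7gxyru'), gh_to_cell('k58n2h'))
```

The two cells share a row, and their columns are consecutive integers that straddle the prime meridian. Crossing it flips the leading longitude bit and every longitude bit after it, so every character of the geohash changes.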
## API to other GIS tools in R
`geohashTools` offers several helper functions for interfacing your geohash objects with GIS tools in R, namely `sp` and `sf`. This will facilitate the best part of working with GIS data -- the visualizations!
Returning to public art locations in Chicago, we can visualize the spatial aggregations carried out above by converting to `sf`, combining with a shapefile of Chicago, and plotting:
```{r chicago_plot}
#| message = FALSE,
#| results = 'hide',
#| fig.alt = 'A viridis-color-scaled plot of Chicago overlaid with two types
#| of polygons: (1) the erose, semi-regular map of neighborhoods; and (2)
#| the regular, rectangular map of geohashes with public art. The salient
#| features of the plot are further described in the README body below.',
#| fig.width = 8,
#| fig.height = 8,
#| out.width = '\\textwidth'
library(sf)
## first, pull neighborhood shapefiles from https://data.cityofchicago.org
# tmp = tempfile(fileext = '.zip')
# shp_url = file.path(
# api_stem, '9wp7-iasj', 'files',
# 'TMTPQ_MTmUDEpDGCLt_B1uaiJmwhCKZ729Ecxq6BPfM?filename=Neighborhoods_2012.zip'
# )
# download.file(shp_url, tmp)
# chicago = paste0('/vsizip/', tmp) |>
# st_read(quiet = TRUE) |>
# st_transform(crs = 4326L)
# artSF = gh_to_sf(
# art[, .N, by = .(geohash = gh_encode(LATITUDE, LONGITUDE, 6L))],
# gh_col = 'geohash'
# )
# plot(st_geometry(chicago), lwd = 0.5, main = 'Public Art Locations in Chicago')
# plot(artSF['N'], add = TRUE)
```
Chicago connoisseurs will recognize the biggest concentration around Lincoln Park, with another concentration along the waterfront near Millennium/Grant Parks.
The process for `sp` is similar; just replace `gh_to_sf` with `gh_to_spdf`.
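Conceptually, each geohash becomes a rectangle in longitude/latitude space. The sketch below builds the corner ring for one cell in base R; it illustrates the geometry only and makes no claim about how `gh_to_sf` is implemented. `cell_ring` is a hypothetical name, and the inputs are the centroid and half-widths of `sc54v`, worked out by hand from the bisection scheme:

```{r cell_ring}
# hypothetical helper: closed ring of (x, y) = (longitude, latitude) corners,
# ordered SW, SE, NE, NW and back to SW
cell_ring = function(latitude, longitude, delta_latitude, delta_longitude) {
  x = longitude + c(-1, 1, 1, -1, -1) * delta_longitude
  y = latitude + c(-1, -1, 1, 1, -1) * delta_latitude
  cbind(x = x, y = y)
}
# centroid and half-widths of 'sc54v'
ring = cell_ring(6.13037109375, 38.21044921875, 0.02197265625, 0.02197265625)
ring
```

A closed five-row matrix like this is the kind of input that `sf::st_polygon(list(ring))` accepts, should you want to build geometries by hand.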
# See also
You might benefit from these more interactive online tools for working with geohashes:
- http://www.movable-type.co.uk/scripts/geohash.html
- https://gofreerange.com/geohash-explorer