Here's one solution for mapping each CustomersZip$ID
with the closest StoreZip$Store
:
library(data.table)
# library(geosphere) # distHaversine
CustomersZip[
, Store := StoreZip$Store[
which.min(geosphere::distHaversine(
cbind(first(lng), first(lat)),
StoreZip[, cbind("lng", "lat"), with = FALSE])) ]
, by = ID ]
CustomersZip
# Zip ID lat lng Store
# <int> <char> <num> <num> <int>
# 1: 1001 65484654805 42.07 -72.63 11113
# 2: 1001 64846124846 42.07 -72.63 11113
# 3: 1001 68421548945 42.07 -72.63 11113
# 4: 1001 84051545484 42.07 -72.63 11113
# 5: 1001 97545154848 42.07 -72.63 11113
# 6: 1002 64841515484 42.38 -72.52 11112
Walk-through:
distHaversine
operates on two arguments, typically matrices (or frames) with two columns each; how it calculates distances depends on the number of points in each of its arguments p1
and p2
:
- if
p1
has one point, then it calculates all p2
points to the individual p1
point; similarly if p2
has one point;
- if
p1
and p2
have the same number of points, it calculates the distance point-wise, so that row1 with row1, row2 with row2, etc; it does not do a cartesian expansion of "row1 with row1,row2,row3,...", "row2 with row1,row2,row3,...", so that has to happen externally
- because of that, we work one customer at a time, and find the store with the minimum distance (
which.min
) and store its Store
id
- though not a factor with this sample data, I chose to group this by
CustomersZip$ID
and use just the first
lat/lng found for that customer; if it's possible that a customer will have different points, then remove the first
calls, and group with by = seq_len(nrow(CustomersZip))
instead; if this step is unnecessary, it will still calculate the same results, the only loss will be some efficiency by calculating the same distances multiple times
Reusable data (a subset of what is in the question):
StoreZip <- fread("
Zip Store lat lng
01026 11111 42.48 -72.93
01040 11112 42.22 -72.64
01104 11113 42.13 -72.57
01104 11114 42.13 -72.57
01301 11115 42.63 -72.59
01301 11116 42.63 -72.59")
CustomersZip <- fread("
Zip ID lat lng
01001 65484654805 42.07 -72.63
01001 64846124846 42.07 -72.63
01001 68421548945 42.07 -72.63
01001 84051545484 42.07 -72.63
01001 97545154848 42.07 -72.63
01002 64841515484 42.38 -72.52", colClasses = list(character="ID"))
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…