spatial clustering of flow data #28

Merged
merged 37 commits into from
May 23, 2024
37 commits
d06c81f
clustering script skeleton
Hussein-Mahfouz Jan 31, 2024
6215da5
flow dissimilarity
Hussein-Mahfouz Feb 1, 2024
11848ce
hdbscan v1
Hussein-Mahfouz Feb 2, 2024
554bf7a
pre cleaning up clustering code
Hussein-Mahfouz Feb 6, 2024
57d8470
weighted cluster using DBSCAN, ref #27
Hussein-Mahfouz Feb 6, 2024
4d28f1c
start on jittering
Hussein-Mahfouz Feb 7, 2024
0843e13
jittering flows, ref # 27
Hussein-Mahfouz Feb 8, 2024
c0bc27e
fix dist_mat calculation, ref #27
Hussein-Mahfouz Feb 9, 2024
46e0c87
flows by mode
Hussein-Mahfouz Feb 13, 2024
ace397f
first attempt at clusters with mode, ref #27
Hussein-Mahfouz Feb 14, 2024
ed4d91f
parameter sensitivity for dbscan
Hussein-Mahfouz Feb 19, 2024
892f9e1
add missing argument
Hussein-Mahfouz Feb 20, 2024
06f5e64
dbscan sensitivity plots, ref #27
Hussein-Mahfouz Feb 20, 2024
793752c
filter od pairs based on poor supply. Jitter not working now. ref #27
Hussein-Mahfouz Feb 21, 2024
929a93e
scenarios 1, 2, and 3
Hussein-Mahfouz Mar 1, 2024
0c84a73
plot for scenario 3
Hussein-Mahfouz Mar 5, 2024
d9cc4e2
scenario 3 demand plot
Hussein-Mahfouz Mar 5, 2024
b60611c
asp = 0 for tmap cropping whitespace
Hussein-Mahfouz Mar 5, 2024
93a7124
edit plots and save gtfs sf
Hussein-Mahfouz Mar 6, 2024
6b2cff7
cluster convex hull poly maps
Hussein-Mahfouz Mar 6, 2024
c9e246a
figure: bar plot of transfers
Hussein-Mahfouz Mar 11, 2024
616b8e2
poly and line for cluster maps
Hussein-Mahfouz Mar 25, 2024
99cb38e
cluster maps with bus diff
Hussein-Mahfouz Mar 26, 2024
a8dabe5
cluster maps with bus diff - concave_hull()
Hussein-Mahfouz Mar 27, 2024
f34647d
split clustering vis to new script
Hussein-Mahfouz Mar 27, 2024
eedc50a
validation: maps with urban rural background
Hussein-Mahfouz Mar 27, 2024
9ae0e95
maps with panels
Hussein-Mahfouz Mar 27, 2024
2661978
figures to help with validation
Hussein-Mahfouz Apr 16, 2024
af52dab
edit tmap error
Hussein-Mahfouz Apr 16, 2024
5235c4a
urban rural facet ggplots
Hussein-Mahfouz Apr 22, 2024
676c6bc
ggplot figures of filtered clusters only
Hussein-Mahfouz Apr 22, 2024
2078b14
edit ggplot dimensions
Hussein-Mahfouz Apr 22, 2024
5c98da1
distribution plots: speed and demand
Hussein-Mahfouz May 16, 2024
b73ff37
save clusters and one big map
Hussein-Mahfouz May 16, 2024
3f9592d
one big map
Hussein-Mahfouz May 17, 2024
0d6a715
fix demand percentiles, add density plots, and edit cutoffs
Hussein-Mahfouz May 20, 2024
4f01458
trip level density plot
Hussein-Mahfouz May 20, 2024
72 changes: 72 additions & 0 deletions R/dbscan_sensitivity.R
@@ -0,0 +1,72 @@
#' Sensitivity analysis of DBSCAN parameters
#'
#' @param distance_matrix a precalculated distance matrix between desire lines (a data frame with flow IDs as row names)
#' @param options_epsilon a vector of options for the epsilon parameter
#' @param options_minpts a vector of options for the minPts parameter
#' @param weights a vector where each value is the number of people travelling between an OD pair (passed to the weights argument of dbscan)
#' @param flows the original OD matrix, with a flow_ID column and a commute_all column that holds the flow between each OD pair
#'
#' @return a data frame with columns \code{id} (identifies the eps and minPts used), \code{cluster}, \code{size} (number of desire lines in the cluster) and \code{commuters_sum} (total number of commuters in the cluster)
#'
#' @examples
#'
#' @export
dbscan_sensitivity <- function(distance_matrix, options_epsilon, options_minpts, weights, flows) {

  # data frame with all combinations of eps and minpts
  options_parameters <- tidyr::expand_grid(eps = options_epsilon, minpts = options_minpts)

  # create empty list to store the results for each eps and minpts combination
  results <- vector(mode = "list", length = nrow(options_parameters))

  # get results for each eps and minpts combination (seq_len() is safe if the grid is empty)
  for (i in seq_len(nrow(options_parameters))) {
    # print iteration
    print(paste0("running dbscan for option ", i, " of ", nrow(options_parameters),
                 " : eps = ", options_parameters$eps[i],
                 " | minpts = ", options_parameters$minpts[i]))

    # clustering using dbscan
    cluster_dbscan_i <- dbscan::dbscan(distance_matrix,
                                       minPts = options_parameters$minpts[i], # e.g. 125
                                       eps = options_parameters$eps[i], # e.g. 9.5
                                       weights = weights)


    # Get results

    # add column with clustering results to distance matrix
    cluster_dbscan_res_i <- distance_matrix %>%
      mutate(cluster = cluster_dbscan_i$cluster)

    # keep only the flow ID (taken from the row names) and the cluster label for joining
    cluster_dbscan_res_i <- cluster_dbscan_res_i %>%
      rownames_to_column(var = "flow_ID") %>%
      select(flow_ID, cluster)


    # add flow data to get commuters in each cluster
    cluster_dbscan_res_i <- flows %>%
      inner_join(cluster_dbscan_res_i, by = "flow_ID")

    # check size per cluster and total commuters per cluster
    cluster_res_i <- cluster_dbscan_res_i %>%
      st_drop_geometry() %>%
      group_by(cluster) %>%
      summarise(size = n(), commuters_sum = sum(commute_all)) %>%
      arrange(desc(size)) %>%
      ungroup()

    # add id that identifies which parameters were used to get this result
    cluster_res_i <- cluster_res_i %>%
      mutate(id = paste0("eps_", options_parameters$eps[i], "_minpts_", options_parameters$minpts[i]))

    # save in list
    results[[i]] <- cluster_res_i

  }
  # turn results into one data frame
  results <- bind_rows(results)
  # return this output
  return(results)
}
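Because `weights` holds the number of commuters per flow, `minPts` in this function behaves more like a commuter threshold than a count of desire lines. The base-R sketch below illustrates that idea on toy data. It assumes the usual weighted-DBSCAN core-point rule (a flow is a core point when the summed weights of all flows within `eps`, itself included, reach `minPts`). It is not the dbscan package's implementation, and `coords`, `commuters` and `is_core()` are hypothetical names invented for the illustration.

```r
# Toy stand-in for the real inputs (all values are made up):
# six flows placed on a line, and the commuters carried by each flow.
coords <- c(0, 1, 2, 10, 11, 30)
dist_mat <- as.matrix(dist(coords))
commuters <- c(50, 40, 60, 20, 30, 200)

# Assumed weighted core-point rule: sum the weights of every flow
# within eps of each flow (itself included) and compare with minpts.
is_core <- function(dist_mat, weights, eps, minpts) {
  neighbour_weight <- apply(dist_mat <= eps, 1, function(nb) sum(weights[nb]))
  neighbour_weight >= minpts
}

# Sweep the same kind of eps x minpts grid that dbscan_sensitivity() builds
grid <- expand.grid(eps = c(1.5, 5), minpts = c(100, 150))
grid$n_core <- mapply(
  function(eps, minpts) sum(is_core(dist_mat, commuters, eps, minpts)),
  grid$eps, grid$minpts
)
print(grid)
```

In this toy grid, raising `minpts` from 100 to 150 at `eps = 1.5` drops one core flow, while the single heavy flow (200 commuters) stays a core point on its own for every combination. That is the kind of effect the sensitivity table is meant to expose.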