Spatial clustering of flow data #27
Also: you could try just aggregating origins or destinations to make it simpler.
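Aggregating turns the problem from clustering lines into clustering points, which standard point-based methods handle directly. The sketch below sums trips per origin (or destination) zone. The project code is in R; this Python sketch is only an illustration. The `(origin_zone, destination_zone, count)` input shape is an assumption.

```python
from collections import defaultdict


def aggregate_flows(flows, by="origin"):
    """Sum trip counts per origin or destination zone.

    `flows` is a list of (origin_zone, destination_zone, count) tuples;
    this input shape is a hypothetical illustration.
    """
    if by not in ("origin", "destination"):
        raise ValueError("by must be 'origin' or 'destination'")
    index = 0 if by == "origin" else 1
    totals = defaultdict(int)
    for flow in flows:
        totals[flow[index]] += flow[2]
    return dict(totals)


flows = [("A", "X", 10), ("A", "Y", 5), ("B", "X", 7)]
print(aggregate_flows(flows, by="origin"))       # {'A': 15, 'B': 7}
print(aggregate_flows(flows, by="destination"))  # {'X': 17, 'Y': 5}
```

The aggregated totals could then serve as point weights for the zone centroids. This loses the pairing between origins and destinations, which is the trade-off for simplicity.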
I have made a first pass at using HDBSCAN to cluster lines with a custom distance matrix. The distance between lines is based on equation 3 here. Initial results with minPts = 50: this is just proof that the distance function works, but there are many issues:
To Do:
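Regarding the custom distance between desire lines mentioned above, here is a rough idea of how a precomputed distance matrix for line clustering can be built.

- **Assumed distance.** The function combines the origin-to-origin and destination-to-destination gaps as sqrt(alpha · d_O² + beta · d_D²). This is an assumed stand-in and not necessarily equation 3 of the linked reference.
- **Assumed inputs.** `alpha`, `beta` and the `(ox, oy, dx, dy)` flow layout are hypothetical choices.
- **Language.** The project itself is in R; this Python version is only illustrative.

```python
import math


def flow_distance(f1, f2, alpha=1.0, beta=1.0):
    """Dissimilarity between two desire lines given as (ox, oy, dx, dy).

    Assumed form: sqrt(alpha * d_origin**2 + beta * d_dest**2), with
    coordinates in projected metres. Not necessarily the referenced equation.
    """
    d_origin = math.hypot(f1[0] - f2[0], f1[1] - f2[1])
    d_dest = math.hypot(f1[2] - f2[2], f1[3] - f2[3])
    return math.sqrt(alpha * d_origin ** 2 + beta * d_dest ** 2)


def distance_matrix(flows, **kwargs):
    """Full symmetric n x n matrix, the input a precomputed-distance clusterer expects."""
    n = len(flows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = flow_distance(flows[i], flows[j], **kwargs)
            matrix[i][j] = matrix[j][i] = d
    return matrix


flows = [(0, 0, 1000, 0), (0, 300, 1000, 400), (5000, 0, 5000, 2000)]
m = distance_matrix(flows)
print(round(m[0][1], 1))  # 500.0 (origins 300 m apart, destinations 400 m apart)
```

Only the upper triangle is computed, and it is mirrored into the lower triangle, since the distance is symmetric.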
This looks great to me, Hussein; no other comments at this stage.
It seems I cannot create large matrices with pivot_wider in tidyr (see issue 1097). This is a problem for point [1] above, as it prevents me from going down to the OA (or custom) hexagon level. I need to find another package that allows this.
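A quick back-of-envelope check suggests that the full matrix itself, rather than pivot_wider specifically, may be the main memory limit.

- **Size of a dense matrix.** A dense n × n matrix of 64-bit numbers needs 8n² bytes. For 50,000 desire lines that is 50,000² × 8 = 2 × 10¹⁰ bytes, about 18.6 GiB.
- **Extra cost of pivoting.** The long table fed to pivot_wider typically holds n² rows plus ID columns, so peak memory during pivoting is higher still.
- **NAs do not save space.** Filling cells with NA does not shrink a numeric matrix, since each NA still occupies an 8-byte slot.

The Python sketch below is illustrative only.

```python
def dense_matrix_gib(n_flows, bytes_per_value=8):
    """Memory needed for a dense n x n matrix of 64-bit floats, in GiB."""
    return n_flows ** 2 * bytes_per_value / 1024 ** 3


for n in (1_000, 10_000, 50_000):
    print(n, round(dense_matrix_gib(n), 2))
# 1000 0.01
# 10000 0.75
# 50000 18.63
```

Because the cost grows with n², going from MSOA to OA or hexagon level multiplies memory by the square of the increase in the number of lines.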
And what's the max distance you need?
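The max distance matters because DBSCAN only needs each line's neighbours within `eps`. With a known cutoff, only close pairs need storing, so memory grows with the number of close pairs rather than with n².

The sketch below makes these assumptions:

- **Distance.** It uses an assumed dissimilarity, sqrt(d_O² + d_D²). This may differ from the metric used in the project.
- **Pruning.** The flow distance can never be smaller than the gap between origin x coordinates. So after sorting by origin x, the inner loop stops as soon as that gap exceeds `eps`.
- **Naming.** `eps_neighbours` is a hypothetical helper.
- **Language.** Python is used for illustration; the project is in R.

```python
import math


def flow_distance(f1, f2):
    """Assumed dissimilarity: sqrt(d_origin**2 + d_dest**2) for (ox, oy, dx, dy) flows."""
    return math.hypot(math.hypot(f1[0] - f2[0], f1[1] - f2[1]),
                      math.hypot(f1[2] - f2[2], f1[3] - f2[3]))


def eps_neighbours(flows, eps):
    """Return {i: [j, ...]} holding only pairs with flow_distance <= eps."""
    order = sorted(range(len(flows)), key=lambda k: flows[k][0])
    neighbours = {k: [] for k in range(len(flows))}
    for a, i in enumerate(order):
        for b in range(a + 1, len(order)):
            j = order[b]
            # Distance >= |origin x gap|, so no later flow can be within eps.
            if flows[j][0] - flows[i][0] > eps:
                break
            if flow_distance(flows[i], flows[j]) <= eps:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


flows = [(0, 0, 1000, 0), (0, 300, 1000, 400), (5000, 0, 5000, 2000)]
print(eps_neighbours(flows, eps=600))  # {0: [1], 1: [0], 2: []}
```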
@Robinlovelace I am looking at all OD pairs with commuting flows. I am computing the "distance" between each pair of desire lines using the metric in eq 1 here. I'm not sure how I would use the function you linked me to; I need to think about it. The clustering algorithm takes a square matrix, so I still need to create a full matrix even if it is full of NAs (that is where I run into memory issues). I am now able to use the weights parameter in DBSCAN; see drt-potential/code/demand_cluster_flows.R, lines 229 to 232 in 57d8470.
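The idea behind weighting in DBSCAN is that a point counts as a core point when the summed weight of its neighbourhood, including itself, reaches minPts. This means a single desire line carrying many commuters can act like many points. scikit-learn documents the same semantics for `sample_weight`.

This Python sketch of weighted DBSCAN is not the R dbscan implementation:

- **Input.** It runs on precomputed neighbour lists.
- **Labels.** It follows the R dbscan convention: 0 is noise and clusters are numbered from 1.
- **Naming.** `weighted_dbscan` is a hypothetical name.

```python
from collections import deque


def weighted_dbscan(neighbours, weights, min_pts):
    """DBSCAN over precomputed (symmetric) eps-neighbourhoods with point weights.

    A point is core if its own weight plus its neighbours' weights >= min_pts.
    Returns labels where 0 is noise and clusters are numbered from 1.
    """
    n = len(weights)
    labels = [0] * n
    visited = [False] * n

    def is_core(i):
        return weights[i] + sum(weights[j] for j in neighbours[i]) >= min_pts

    cluster = 0
    for i in range(n):
        if visited[i] or not is_core(i):
            continue
        cluster += 1
        visited[i] = True
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core(p):
                continue  # border point: joins the cluster but does not expand it
            for q in neighbours[p]:
                if labels[q] == 0:
                    labels[q] = cluster
                if not visited[q]:
                    visited[q] = True
                    queue.append(q)
    return labels


nb = {0: [1], 1: [0], 2: []}
print(weighted_dbscan(nb, [1, 1, 1], min_pts=50))    # [0, 0, 0]
print(weighted_dbscan(nb, [30, 25, 10], min_pts=50))  # [1, 1, 0]
```

Without weights, two neighbouring lines never reach minPts = 50. With commuter counts as weights, they form a cluster, while the isolated low-flow line stays noise.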
The results are still not good, and I think this is mainly because I am using MSOA centroids. I will try odjitter to distribute the flows spatially and see how that affects the results.
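To illustrate why jittering helps, here is a simplified stand-in. It is not odjitter's algorithm or interface; odjitter has its own options and samples locations from the zones themselves.

- **Splitting.** Each centroid-to-centroid flow is split into lines of at most `max_per_line` trips.
- **Jittering.** Each line's origin and destination are moved to a random point within `radius` metres of the zone centroid.
- **Naming.** `jitter_flow`, `max_per_line` and `radius` are hypothetical names.
- **Language.** The project is in R; Python is used for illustration.

```python
import math
import random


def _jitter(xy, radius, rng):
    # Uniform random point in a disc around the centroid (sqrt keeps density uniform).
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0, 2 * math.pi)
    return xy[0] + r * math.cos(theta), xy[1] + r * math.sin(theta)


def jitter_flow(origin_xy, dest_xy, trips, max_per_line, radius, rng):
    """Split one centroid-based flow into smaller desire lines with jittered ends.

    Returns (ox, oy, dx, dy, trips) tuples whose trip counts sum to `trips`.
    """
    n_lines = max(1, math.ceil(trips / max_per_line))
    base, extra = divmod(trips, n_lines)
    lines = []
    for k in range(n_lines):
        count = base + (1 if k < extra else 0)
        lines.append((*_jitter(origin_xy, radius, rng),
                      *_jitter(dest_xy, radius, rng), count))
    return lines


rng = random.Random(42)
lines = jitter_flow((0, 0), (1000, 0), trips=23, max_per_line=10, radius=200, rng=rng)
print([line[4] for line in lines])  # [8, 8, 7]
```

Spreading line ends away from single centroids gives the density-based clustering genuine spatial variation to work with, while the trip totals per OD pair are preserved.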
Sounds good, the
Do you recommend using the
Good question. I recommend the
Results are a bit better after (a) jittering and (b) weighting the flows in DBSCAN: flows from different origins or destinations are now being clustered together. I still don't understand why most lines are in the big cluster 0. TODO:
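On the large cluster 0: if the labels come from the R dbscan package (`dbscan()` or `hdbscan()`), label 0 marks noise points rather than a real cluster. A big cluster 0 may therefore simply be the lines that are not dense enough under the current minPts and eps settings.

A quick summary of line counts and trip totals per label makes the noise share explicit. `summarise_clusters` is a hypothetical helper, and Python is used only for illustration.

```python
from collections import Counter


def summarise_clusters(labels, weights):
    """Count desire lines and total trips per label; label 0 is treated as noise."""
    lines = Counter(labels)
    trips = Counter()
    for label, w in zip(labels, weights):
        trips[label] += w
    total = sum(weights)
    noise_share = trips[0] / total if total else 0.0
    return dict(lines), dict(trips), noise_share


labels = [0, 0, 0, 1, 1, 2]
weights = [5, 5, 10, 20, 30, 30]
lines, trips, share = summarise_clusters(labels, weights)
print(lines)  # {0: 3, 1: 2, 2: 1}
print(trips)  # {0: 20, 1: 50, 2: 30}
print(share)  # 0.2
```

In this toy case, half of the lines are noise but they carry only 20% of the trips. Comparing both views may show whether the noise is mostly low-flow lines.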
This is great to see, and I can think of applications in other projects. Great work, Hussein, on figuring out spatial clustering of OD data.
The last commit fixes a mistake in the distance matrix calculation. I was only calculating distances between flows from the same origin zone, so distances between flows with different origin zones were never used. Results now show clusters containing flows that start in different zones. I still need to work on the points mentioned above.
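The difference between the buggy and fixed calculations comes down to which pairs of flows get a distance at all. The actual fix is in the commit referenced above. This Python sketch, with hypothetical helper names and toy zone labels, only illustrates the pair enumeration: restricting pairs to a shared origin zone silently drops every cross-zone pair.

```python
from itertools import combinations


def pairs_within_origin(origins):
    """Buggy version: only pairs of flows that share an origin zone."""
    return [(i, j) for i, j in combinations(range(len(origins)), 2)
            if origins[i] == origins[j]]


def all_pairs(origins):
    """Fixed version: every pair of flows, whatever their origin zone."""
    return list(combinations(range(len(origins)), 2))


origins = ["A", "A", "B", "C"]  # origin zone of each flow (toy labels)
print(len(pairs_within_origin(origins)), len(all_pairs(origins)))  # 1 6
```

With four flows there are 4 × 3 / 2 = 6 pairs, but only one of them shares an origin. The buggy version therefore leaves five of the six distances unset, which is why clusters could not span origin zones.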
Clustering flow data can show where demand is concentrated. The clusters can then be overlaid on public transport (PT) supply to identify gaps. Try:
First pass
Second (more ambitious) pass
Limitations to mention