Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent geometry orientation and conversion #267

Open
tzakharko opened this issue Jan 17, 2025 · 6 comments
Open

Inconsistent geometry orientation and conversion #267

tzakharko opened this issue Jan 17, 2025 · 6 comments

Comments

@tzakharko
Copy link

tzakharko commented Jan 17, 2025

I've run into an odd issue working with some data found in the wild. Consider this MRE:

library(sf)
library(s2)

# sf polygon as loaded from a database
poly <- structure(list(list(structure(c(12.9718058342186, 12.9718058342186, 
12.9728528296958, 12.9994465148162, 13.0258308008412, 13.0090788732063, 
12.9931645419531, 12.9676178523098, 12.9529599156293, 12.9223876476955, 
12.8830206177534, 12.8428159914296, 12.7963293922426, 12.7460736093378, 
12.7305012888338, 12.7291122826075, 12.7257618970805, 12.725552497985, 
12.7259712961759, 12.7351848563751, 12.7542401740599, 12.7718296980765, 
12.7860688365662, 12.785931374378, 12.801773768724, 12.8160129072137, 
12.8537047443923, 12.9047981236788, 12.9491907319114, 12.9718058342186, 
-4.04623539727417, -4.04623539727417, -4.05062180563782, -4.0750599304832, 
-4.1013770804975, -4.10806066333969, -4.12226309084477, -4.14356625610239, 
-4.16236269020371, -4.18742390394239, -4.21331964830488, -4.22626719749194, 
-4.23169675058643, -4.2296084654374, -4.23006522722775, -4.21833162843474, 
-4.18136751731289, -4.15421762373774, -4.1429397005941, -4.1201745144563, 
-4.09281365829253, -4.07861070745388, -4.06858494389262, -4.06833641784592, 
-4.05960342472729, -4.05166618507748, -4.04289335590868, -4.04080457305779, 
-4.0433111118331, -4.04623539727417), dim = c(30L, 2L)))), class = c("XY", 
"MULTIPOLYGON", "sfg"))

# the polygon contains degenerate vertices, so we fix it using s2_union()
as_s2_geography(poly) # Error:  Loop 0 is not valid: Edge 0 is degenerate (duplicate vertex)
as_s2_geography(poly, check = FALSE) |> s2_union() |> s2_is_valid() # TRUE

# now it gets weird... surely this tiny area does cover the entire Earth globe?
as_s2_geography(poly, check = FALSE) |> s2_union() |> s2_area() # ~ 510e6 km2
as_s2_geography(poly, check = FALSE, oriented = FALSE) |> s2_union() |> s2_area() # ~ 510e6 km2
as_s2_geography(poly, check = FALSE, oriented = FALSE, planar = TRUE) |> s2_union() |> s2_area() # ~ 510e6 km2

# doing a roundtrip conversion to sf fixes the area calculation
as_s2_geography(poly, check = FALSE) |> s2_union() |> st_as_sf() |> as_s2_geography() |> s2_area() # ~ 488 km2

# lwgeom also produces the correct calculation
lwgeom::st_geod_area(st_sfc(poly, crs = "WGS84")) # ~ 488 km2

It seems that s2 interprets the polygon as a hole on the Earth’s surface, which kind of makes sense due to the CW orientation of the vertices in the data. There are a few things that confuse me, however:

  • Shouldn't oriented = FALSE take care of this by reorienting the polygon so that the exterior area is larger than the interior area?
  • Why does the orientation change when converting it back to sf?

In general, how should one deal with situations like this? And how does one catch it to ensure correct data treatment? Right now, what I am doing is manually checking the polygon orientation and reversing the vertex sequence if needed (with logging to double-check the data later), which seems cumbersome?

P.S. If I manually remove the duplicate vertex at index 2, the result is as expected:

xy <- st_coordinates(poly) |> as_tibble()
xy <- xy[-2, ] # remove the 2 vertex
pp <- s2_make_polygon(xy$X, xy$Y, oriented = FALSE)

s2_area(pp) # ~ 488 km2
@edzer
Copy link
Member

edzer commented Jan 17, 2025

First of all, I believe there is no CW or CCW in spherical coordinates, because there is no natural "up" and "down"; there's "to the left" and "to the right" in winding direction.

I get different outcomes from yours:

library(sf)
# Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(s2)

# sf polygon as loaded from a database
poly <- structure(list(list(structure(c(12.9718058342186, 12.9718058342186,
12.9728528296958, 12.9994465148162, 13.0258308008412, 13.0090788732063,
12.9931645419531, 12.9676178523098, 12.9529599156293, 12.9223876476955,
12.8830206177534, 12.8428159914296, 12.7963293922426, 12.7460736093378,
12.7305012888338, 12.7291122826075, 12.7257618970805, 12.725552497985,
12.7259712961759, 12.7351848563751, 12.7542401740599, 12.7718296980765,
12.7860688365662, 12.785931374378, 12.801773768724, 12.8160129072137,
12.8537047443923, 12.9047981236788, 12.9491907319114, 12.9718058342186,
-4.04623539727417, -4.04623539727417, -4.05062180563782, -4.0750599304832,
-4.1013770804975, -4.10806066333969, -4.12226309084477, -4.14356625610239,
-4.16236269020371, -4.18742390394239, -4.21331964830488, -4.22626719749194,
-4.23169675058643, -4.2296084654374, -4.23006522722775, -4.21833162843474,
-4.18136751731289, -4.15421762373774, -4.1429397005941, -4.1201745144563,
-4.09281365829253, -4.07861070745388, -4.06858494389262, -4.06833641784592,
-4.05960342472729, -4.05166618507748, -4.04289335590868, -4.04080457305779,
-4.0433111118331, -4.04623539727417), dim = c(30L, 2L)))), class = c("XY",
"MULTIPOLYGON", "sfg"))

# the polygon contains degenerate vertices, so we fix it using s2_union()
#as_s2_geography(poly) # Error:  Loop 0 is not valid: Edge 0 is degenerate (duplicate vertex)
as_s2_geography(poly, check = FALSE) |> s2_union() |> s2_is_valid() # TRUE
# [1] TRUE

# now it gets weird... surely this tiny area does cover the entire Earth globe?
as_s2_geography(poly, check = FALSE) |> s2_union() |> s2_area() # ~ 510e6 km2
# [1] 488638373
as_s2_geography(poly, check = FALSE, oriented = FALSE) |> s2_union() |> s2_area() # ~ 510e6 km2
# [1] 488638373
as_s2_geography(poly, check = FALSE, oriented = FALSE, planar = TRUE) |> s2_union() |> s2_area() # ~ 510e6 km2
# [1] 488638373

# doing a roundtrip conversion to sf fixes the area calculation
as_s2_geography(poly, check = FALSE) |> s2_union() |> st_as_sf() |> as_s2_geography() |> s2_area() # ~ 488 km2
# [1] 488638373

# lwgeom also produces the correct calculation
lwgeom::st_geod_area(st_sfc(poly, crs = "WGS84")) # ~ 488 km2
# 486487599 [m^2]
sessionInfo()
# R version 4.4.2 (2024-10-31)
# Platform: x86_64-pc-linux-gnu
# Running under: Ubuntu 24.04.1 LTS

# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
# LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

# time zone: Europe/Berlin
# tzcode source: system (glibc)

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] s2_1.1.7.9000 sf_1.0-20    

# loaded via a namespace (and not attached):
#  [1] compiler_4.4.2     magrittr_2.0.3     class_7.3-23       tools_4.4.2       
#  [5] DBI_1.2.3          units_0.8-5.4      proxy_0.4-27       wk_0.9.4          
#  [9] Rcpp_1.0.13-1      KernSmooth_2.23-26 grid_4.4.2         e1071_1.7-16      
# [13] lwgeom_0.2-15      classInt_0.4-11   

What is your sessionInfo()?

@tzakharko
Copy link
Author

Thank you @edzer! I've upgraded to the development version of s2 (s2_1.1.7.9000) and it has fixed the issue. Do you plan to release it officially on CRAN any time soon? That would make things easier for the downstream users.

First of all, I believe there is no CW or CCW in spherical coordinates, because there is no natural "up" and "down"; there's "to the left" and "to the right" in winding direction.

As far as I understand, that is a question of terminology. The official S2 documentation also uses CW vs. CCW (and they state that s2 outer rings have CCW orientation). Here CCW is defined as right hand winding with the thumb aligned with the surface normal (pointing "outwards").

@tzakharko
Copy link
Author

tzakharko commented Jan 17, 2025

It seems I celebrated too early. Here is an example where it still fails for me using the development version:

poly <- structure(list(list(structure(c(146.524290362134, 146.524290362134, 
146.522452708975, 146.51993537588, 146.517418042786, 146.513985315839, 
146.512154528134, 146.51261222506, 146.514671861228, 146.516502648933, 
146.519706527417, 146.525885435922, 146.5272585267, 146.528402769016, 
146.528860465942, 146.528173920553, 146.528691142753, 146.527716223627, 
146.513756467376, 146.50528907424, 146.493388954157, 146.488354287968, 
146.482404227927, 146.480344591758, 146.479200349443, 146.481717682537, 
146.487210045652, 146.495677438788, 146.505975619629, 146.521079618196, 
146.531377799037, 146.541912483177, 146.530562760851, 146.483100285672, 
146.481774245432, 146.476393631571, 146.440538826952, 146.42222450229, 
146.401330695282, 146.416033744658, 146.423514243464, 146.433316276381, 
146.446729584584, 146.45756341044, 146.461948530429, 146.474845942163, 
146.48077875156, 146.48464797508, 146.485937716254, 146.486969509192, 
146.483616182142, 146.481036699795, 146.476135683336, 146.472008511582, 
146.466849546888, 146.465559805715, 146.464785961011, 146.464785961011, 
146.469131017065, 146.47530992557, 146.479200349443, 146.482861924853, 
146.486065803337, 146.488811984894, 146.492473560304, 146.492015863378, 
146.493160105694, 146.495448590325, 146.503687134998, 146.510781437355, 
146.523368102827, 146.524290362134, -6.25664811954823, -6.25664811954823, 
-6.26286593229153, -6.26538326538597, -6.26927368925921, -6.27980071856327, 
-6.28391999089964, -6.28803926323601, -6.29810859561381, -6.30291441333958, 
-6.30726253413907, -6.31641647266433, -6.31847610883253, -6.3205357450007, 
-6.32648580504212, -6.33724168280932, -6.35043084891866, -6.35051489367095, 
-6.36149961990129, -6.3681362253321, -6.37797670924677, -6.38415561775131, 
-6.3919364654978, -6.39925961631801, -6.41207513025338, -6.42214446263118, 
-6.43061185576705, -6.44022349121858, -6.44663124818627, -6.45349670208022, 
-6.45761597441659, -6.46247758921973, -6.46918424332112, -6.50116982442007, 
-6.50228275105007, -6.49136779150265, -6.4320396975288, -6.40418128818455, 
-6.36987417297359, -6.37529108590163, -6.37709672354431, -6.37967620589101, 
-6.37967620589101, -6.37916030942166, -6.37812851648298, -6.3729695517896, 
-6.36858443180024, -6.36419931181087, -6.35929829535216, -6.35155984831209, 
-6.34330550480269, -6.33995217775199, -6.33505116129328, -6.33143988600791, 
-6.32370143896785, -6.31854247427447, -6.31467325075444, -6.31286761311175, 
-6.31115295801231, -6.30840677645473, -6.30405865565523, -6.29787974715066, 
-6.29330277788805, -6.28460653628903, -6.2749949008375, -6.26492556845971, 
-6.26469671999658, -6.26492556845971, -6.26561211384911, -6.26309478075466, 
-6.25691587225009, -6.25664811954823), dim = c(72L, 2L)))), class = c("XY", 
"MULTIPOLYGON", "sfg"))

as_s2_geography(poly, check = FALSE) |> s2_union() |> s2_area() # 510e6 km2
lwgeom::st_geod_area(st_sfc(poly, crs = "WGS84")) # 135.6 km2
as_s2_geography(poly, check = FALSE) |> s2_union() |> st_as_sf() |> as_s2_geography() |> s2_area() # 136.2 km2

I suppose for now I will do the roundtrip conversion + sanity checks. Still, might be good to know why it works for some data and doesn't work for some other.

P.S. sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] s2_1.1.7.9000 sf_1.0-19    

loaded via a namespace (and not attached):
 [1] compiler_4.4.2     magrittr_2.0.3     class_7.3-22       tools_4.4.2        DBI_1.2.3          units_0.8-5        proxy_0.4-27       wk_0.9.4           Rcpp_1.0.13-1      KernSmooth_2.23-24 grid_4.4.2         e1071_1.7-16      
[13] classInt_0.4-10 

@paleolimbot
Copy link
Collaborator

Thank you for reporting! I'll take a closer look in the coming days as we move towards releasing the dev version to CRAN. Getting a bullet-proof st_make_valid() for S2 is hard and collecting examples like this one is mission critical for our journey to get there 🙂

@tzakharko
Copy link
Author

tzakharko commented Jan 19, 2025

@paleolimbot Appreciate your efforts! I also might be able to provide more test cases of similar kind if you are interested.

@edzer
Copy link
Member

edzer commented Jan 19, 2025

Here's a recipe to make Sudan valid on S2 in ne_countries():

library(sf)                                                                                                                 |   name_en                       geometry                                       
library(rnaturalearth)                                                                                                      |15   Sudan MULTIPOLYGON (((24.56737 8....                                       
ne = ne_countries()                                                                                                         |19  Russia MULTIPOLYGON (((178.7253 71...                                       
which(!st_is_valid(ne))                                                                                                     |> ne = st_make_valid(ne)                                                        
ne[c(15,19),"name_en"]                                                                                                      |> which(!st_is_valid(ne)) # 15, Sudan:                                          
ne = st_make_valid(ne)                                                                                                      |[1] 15                                                                          
which(!st_is_valid(ne)) # 15, Sudan:                                                                                        |> sud = st_geometry(ne)[15] |> st_set_crs(NA) |> st_make_valid() |> st_set_crs(s
# Sudan needs to be made valid in R2:                                                                                       |t_crs(ne))                                                                      
sud = st_geometry(ne)[15] |> st_set_crs(NA) |> st_make_valid() |> st_set_crs(st_crs(ne))                                    |> g = st_geometry(ne) # extract geometry                                        
g = st_geometry(ne) # extract geometry                                                                                      |> g[15] = sud         # replace Sudan geometry                                  
g[15] = sud         # replace Sudan geometry                                                                                |> st_geometry(ne) = g # replace back geometry                                   
st_geometry(ne) = g # replace back geometry                                                                                 |> all(st_is_valid(ne))                                                          
all(st_is_valid(ne))                

would be great if that weren't all needed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants