[SEDONA-455] geoparquet.metadata data source for inspecting GeoParquet metadata #1180
What will happen if the GeoParquet file has more than one geometry column?
The |
@Kontinuation, does this work if the CRS is a PROJJSON string?
Sure. This is an example of the
In older versions of the example GeoParquet files, the CRS may not be a PROJJSON object. Such files can still be loaded properly by the geoparquet.metadata data source.
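To make the two questions above concrete, here is a minimal Python sketch of how the GeoParquet `geo` file metadata can describe several geometry columns, and how a reader might tolerate a CRS that is either a PROJJSON object or a plain string. This is not Sedona's implementation: the helper name `summarize_geo_metadata`, the `crs_kind` labels, and the sample metadata (including the abbreviated CRS values) are assumptions made up for illustration. Only the `primary_column`, `columns`, `encoding`, and `crs` keys come from the GeoParquet specification.

```python
import json


def summarize_geo_metadata(geo_json):
    """Flatten a GeoParquet "geo" metadata JSON string into one row per geometry column.

    Hypothetical helper for illustration only; not part of Sedona.
    """
    geo = json.loads(geo_json)
    primary = geo.get("primary_column")
    rows = []
    for name, col in geo.get("columns", {}).items():
        crs = col.get("crs")
        # The CRS may be a PROJJSON object (a dict after parsing) or,
        # in older files, a plain string. Both are accepted here.
        if isinstance(crs, dict):
            crs_kind = "projjson"
        elif isinstance(crs, str):
            crs_kind = "string"
        else:
            crs_kind = "missing"
        rows.append(
            {
                "column": name,
                "is_primary": name == primary,
                "encoding": col.get("encoding"),
                "crs_kind": crs_kind,
            }
        )
    return rows


# Illustrative metadata with two geometry columns. The CRS values are
# abbreviated stand-ins, not complete PROJJSON or WKT definitions.
example = json.dumps(
    {
        "version": "1.0.0",
        "primary_column": "geometry",
        "columns": {
            "geometry": {
                "encoding": "WKB",
                "crs": {"type": "GeographicCRS", "name": "WGS 84"},
            },
            "centroid": {
                "encoding": "WKB",
                "crs": 'GEOGCS["WGS 84"]',
            },
        },
    }
)

rows = summarize_geo_metadata(example)
for row in rows:
    print(row)
```

Each geometry column gets its own entry under `columns`, so a file with more than one geometry column simply yields more than one row in this sketch.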
…t metadata (apache#1180)
* An initial implementation of geoparquet.metadata source for Spark 3.3.0
* Add documentation for geoparquet.metadata data source.
* Port geoparquet.metadata data source to Spark 3.5
* Port geoparquet.metadata data source to Spark 3.4
* Use a more proper class for GeoParquetMetadataScan; fixed compilation for Spark 3.3
* Fix compatibility issues for Spark 3.0 ~ 3.2
* Add reference to G-Research/spark-extension for inspecting comprehensive parquet metadata
* Add python test for geoparquet.metadata
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject.
What changes were proposed in this PR?
This patch adds a new data source named geoparquet.metadata, implemented using the DataSourceV2 API. It produces a dataframe containing GeoParquet metadata for each data file. Here is an example resulting dataframe loaded from geoparquet.metadata:
How was this patch tested?
Add new tests for the newly added data source.
Did this PR include necessary documentation updates?
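For readers who want to try the data source described in this PR, a minimal PySpark sketch follows. It assumes a Spark session that already has a Sedona build containing this change on its classpath. The wrapper name `load_geoparquet_metadata` and the path in the comment are hypothetical; only the format name `geoparquet.metadata` comes from the PR itself.

```python
def load_geoparquet_metadata(spark, path):
    """Return a DataFrame of GeoParquet metadata for the data files under `path`.

    `spark` is assumed to be a SparkSession with Sedona available;
    this wrapper is illustrative and not part of Sedona.
    """
    # "geoparquet.metadata" is the data source name introduced by this PR.
    return spark.read.format("geoparquet.metadata").load(path)


# Typical use, given an existing session named `spark` (hypothetical path):
#   df = load_geoparquet_metadata(spark, "/path/to/geoparquet")
#   df.printSchema()
#   df.show()
```

Calling `printSchema()` first is a convenient way to see which metadata fields the installed version exposes before selecting from them.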