Mjohns 0.4.0 docs 20240124 #519

Merged
2 changes: 1 addition & 1 deletion README.md
@@ -49,7 +49,7 @@ We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.

> DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.

As of the 0.4.0 release, Mosaic issues the following ERROR when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:
:warning: **Mosaic 0.4.x series issues the following ERROR on a standard, non-Photon cluster [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:**

> DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.
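
The version constraint above can be captured in a small helper. This is only a sketch: `mosaic_requirement` is a hypothetical name, not part of Mosaic, and the `>=0.4,<0.5` specifier for DBR 13 is an assumption derived from the "0.4.x series" wording. Only the specifier for DBR < 13 is quoted from the message above.

```python
def mosaic_requirement(dbr_major: int) -> str:
    """Return a pip requirement string for a Databricks Runtime major version.

    Hypothetical helper for illustration; not part of the Mosaic API.
    """
    if dbr_major < 13:
        # Quoted from the deprecation message for older runtimes.
        return "databricks-mosaic<0.4,>=0.3"
    if dbr_major == 13:
        # Assumption: pin to the 0.4.x series, which supports DBR 13 only.
        return "databricks-mosaic>=0.4,<0.5"
    # The README lists no other runtime for the 0.4.x series.
    raise ValueError(f"Mosaic 0.4.x does not list DBR {dbr_major} as supported")
```

In a notebook, the returned string would be passed to `%pip install`; the cluster must still be Photon-enabled or Runtime ML, as noted above.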

213 changes: 11 additions & 202 deletions docs/source/api/raster-functions.rst
@@ -190,6 +190,7 @@ rst_combineavg
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_combineavg_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -229,58 +230,6 @@ rst_combineavg
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
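
To make the averaging semantics described for rst_combineavg concrete, here is a plain-Python sketch that averages same-shaped, single-band grids cell by cell. It is not Mosaic's implementation, which operates on raster tiles; `combine_avg` and the nested-list grid representation are illustrative assumptions, and noData handling is ignored.

```python
from typing import List

Grid = List[List[float]]


def combine_avg(grids: List[Grid]) -> Grid:
    """Average same-shaped grids cell by cell (illustrative only)."""
    if not grids:
        raise ValueError("at least one grid is required")
    rows, cols = len(grids[0]), len(grids[0][0])
    # Mirror the documented precondition: all inputs share the same extent.
    for g in grids:
        if len(g) != rows or any(len(r) != cols for r in g):
            raise ValueError("all grids must have the same extent")
    return [
        [sum(g[r][c] for g in grids) / len(grids) for c in range(cols)]
        for r in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[3.0, 4.0], [5.0, 6.0]]
print(combine_avg([a, b]))  # [[2.0, 3.0], [4.0, 5.0]]
```

The output grid has the same shape as the inputs, matching the statement that the output raster keeps the extent and pixel size of the input rasters.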

rst_combineavgagg
*****************

.. function:: rst_combineavgagg(tile)

Aggregates a group of raster tiles into a single raster by averaging the pixel values.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy()\
.agg(mos.rst_combineavgagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy()
.agg(rst_combineavgagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_combineavgagg(tile)
FROM table
GROUP BY 1
+----------------------------------------------------------------------------------------------------------------+
| rst_combineavgagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+


rst_derivedband
***************
@@ -295,6 +244,7 @@ rst_derivedband
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_derivedband_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -364,96 +314,6 @@ rst_derivedband
+----------------------------------------------------------------------------------------------------------------+


rst_derivedbandagg
******************

.. function:: rst_derivedbandagg(tile, python_func, func_name)

Aggregates a group of raster tiles into a single raster by applying the provided Python function.
The rasters must have the same extent, number of bands, and pixel type.
The rasters must have the same pixel size and coordinate reference system.
The output raster will have the same extent as the input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A grouped column containing raster tile(s).
:type tile: Column (RasterTileType)
:param python_func: The source code of a function to evaluate in Python.
:type python_func: Column (StringType)
:param func_name: Name of the function to evaluate in Python.
:type func_name: Column (StringType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

from textwrap import dedent
from pyspark.sql import functions as F

df\
.select(
"date", "tile",
F.lit(dedent(
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""")).alias("py_func1"),
F.lit("average").alias("func1_name")
)\
.groupBy("date", "py_func1", "func1_name")\
.agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df
.select(
"date", "tile",
lit(
"""
|import numpy as np
|def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
| out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
|""".stripMargin).as("py_func1"),
lit("average").as("func1_name")
)
.groupBy("date", "py_func1", "func1_name")
.agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT
date, py_func1, func1_name,
rst_derivedbandagg(tile, py_func1, func1_name)
FROM (
SELECT
date, tile,
"""
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
    out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
""" as py_func1,
"average" as func1_name
FROM table
)
GROUP BY date, py_func1, func1_name
LIMIT 1
+----------------------------------------------------------------------------------------------------------------+
| rst_derivedbandagg(tile,py_func1,func1_name) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+


rst_frombands
**************

@@ -527,6 +387,7 @@ rst_fromcontent

.. tabs::
.. code-tab:: py

# binary is python bytearray data type
df = spark.read.format("binaryFile")\
.load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\
@@ -538,6 +399,7 @@ rst_fromcontent
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

//binary is scala/java Array(Byte) data type
val df = spark.read
.format("binaryFile")
@@ -910,9 +772,12 @@ rst_mapalgebra

Here are examples of the 'json_spec': (1) shows default indexing, (2) shows reusing an index,
and (3) shows band indexing.
(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'

.. code-block:: text

(1) '{"calc": "A+B/C"}'
(2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}'
(3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}'
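
A json_spec string like the ones above can be produced with Python's standard `json` module rather than written by hand. This is a sketch only: the helper name `make_json_spec` and its parameters are hypothetical, and only the keys (`calc`, `<name>_index`, `<name>_band`) come from the examples above.

```python
import json
from typing import Dict, Optional


def make_json_spec(calc: str,
                   indexes: Optional[Dict[str, int]] = None,
                   bands: Optional[Dict[str, int]] = None) -> str:
    """Build a json_spec string (hypothetical helper, not part of Mosaic)."""
    spec = {"calc": calc}
    # Optional per-variable tile index, e.g. {"A": 0} becomes "A_index": 0.
    for name, idx in (indexes or {}).items():
        spec[f"{name}_index"] = idx
    # Optional per-variable band number, e.g. {"A": 1} becomes "A_band": 1.
    for name, band in (bands or {}).items():
        spec[f"{name}_band"] = band
    return json.dumps(spec)


print(make_json_spec("A+B/C"))
# {"calc": "A+B/C"}
print(make_json_spec("A+B/C", indexes={"A": 0, "B": 1, "C": 1}))
# {"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}
```

Building the string from a dictionary avoids quoting mistakes when the spec is passed to SQL or wrapped in a literal column.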

:param tile: A column containing the raster tile.
:type tile: Column (RasterTileType)
@@ -1011,6 +876,7 @@ rst_merge
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.
Also, see :doc:`rst_merge_agg </api/spatial-aggregations>` function.

:param tiles: A column containing an array of raster tiles.
:type tiles: Column (ArrayType(RasterTileType))
@@ -1048,63 +914,6 @@ rst_merge
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+
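
The output grid rules stated for rst_merge (extent covering all inputs, pixel size of the highest-resolution input) can be illustrated in plain Python. This sketch only computes the resulting extent and pixel size; it does not perform the merge itself. `RasterInfo` and `merged_grid` are hypothetical names, and a single square pixel size per raster is a simplifying assumption.

```python
from typing import List, NamedTuple, Tuple


class RasterInfo(NamedTuple):
    bounds: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    pixel_size: float  # assumed square pixels, in CRS units


def merged_grid(rasters: List[RasterInfo]) -> RasterInfo:
    """Return the extent and pixel size a merged raster would have."""
    if not rasters:
        raise ValueError("at least one raster is required")
    # The output extent is the union of all input bounding boxes.
    xmin = min(r.bounds[0] for r in rasters)
    ymin = min(r.bounds[1] for r in rasters)
    xmax = max(r.bounds[2] for r in rasters)
    ymax = max(r.bounds[3] for r in rasters)
    # Highest resolution means the smallest pixel size.
    pixel_size = min(r.pixel_size for r in rasters)
    return RasterInfo((xmin, ymin, xmax, ymax), pixel_size)


coarse = RasterInfo((0.0, 0.0, 10.0, 10.0), 1.0)
fine = RasterInfo((5.0, 5.0, 20.0, 15.0), 0.5)
print(merged_grid([coarse, fine]))
# RasterInfo(bounds=(0.0, 0.0, 20.0, 15.0), pixel_size=0.5)
```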

rst_mergeagg
************

.. function:: rst_mergeagg(tile)

Combines a grouped aggregate of raster tiles into a single raster.
The rasters do not need to have the same extent.
The rasters must have the same coordinate reference system.
The rasters are combined using gdalwarp.
The noData value must be initialised; otherwise, invalid pixels may introduce artifacts in the output raster.
The rasters are stacked in the order they are provided.
This order is not guaranteed, since this is an aggregation function.
If the order of the rasters matters, first collect the rasters, sort them by metadata, and then use the
rst_merge function.
The output raster will have the extent covering all input rasters.
The output raster will have the same number of bands as the input rasters.
The output raster will have the same pixel type as the input rasters.
The output raster will have the same pixel size as the highest resolution input rasters.
The output raster will have the same coordinate reference system as the input rasters.

:param tile: A column containing raster tiles.
:type tile: Column (RasterTileType)
:rtype: Column: RasterTileType

:example:

.. tabs::
.. code-tab:: py

df.groupBy("date")\
.agg(mos.rst_mergeagg("tile")).limit(1).display()
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: scala

df.groupBy("date")
.agg(rst_mergeagg(col("tile"))).limit(1).show
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

.. code-tab:: sql

SELECT rst_mergeagg(tile)
FROM table
GROUP BY date
+----------------------------------------------------------------------------------------------------------------+
| rst_mergeagg(tile) |
+----------------------------------------------------------------------------------------------------------------+
| {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } |
+----------------------------------------------------------------------------------------------------------------+

rst_metadata
*************