Commit

Merge pull request #38 from danielfromearth/feature/issue-24-test-the…
…-docker-image-with-harmony

Feature/issue 24 test the docker image with harmony
danielfromearth authored Oct 31, 2023
2 parents 1a5601b + 4420c19 commit da13976
Showing 11 changed files with 95 additions and 52 deletions.
6 changes: 0 additions & 6 deletions .pre-commit-config.yaml
@@ -38,9 +38,3 @@ repos:
- id: yamllint
args: ["-d {extends: relaxed, rules: {line-length: {max: 120}}}"]
stages: [commit, push]

- repo: https://github.com/pryorda/dockerfilelint-precommit-hooks
rev: v0.1.0
hooks:
- id: dockerfilelint
stages: [commit, push]
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -7,9 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- [issue/6](https://github.com/danielfromearth/batchee/issues/6): Create Adapter code that processes a Harmony Message and STAC Catalog
- [issue/7](https://github.com/danielfromearth/batchee/issues/7): Create working Docker image
- [issue/13](https://github.com/danielfromearth/batchee/issues/13): Add simple command line interface for testing
- [issue/16](https://github.com/danielfromearth/batchee/issues/16): Add a logo
- [issue/6](https://github.com/danielfromearth/batchee/issues/6): Create Adapter code that processes a Harmony Message and STAC Catalog
### Changed
- [issue/11](https://github.com/danielfromearth/batchee/issues/11): Rename from concat_batcher to batchee
- [issue/21](https://github.com/danielfromearth/batchee/issues/21): Improve CICD workflows
15 changes: 9 additions & 6 deletions docker/Dockerfile → Dockerfile
@@ -9,8 +9,9 @@ RUN apt-get update \
#hdf5-helpers \
&& pip3 install --upgrade pip \
&& pip3 install cython \
&& apt-get clean \
&& pip3 install poetry
&& pip3 install poetry \
&& apt-get clean


# Create a new user
RUN adduser --quiet --disabled-password --shell /bin/sh --home /home/dockeruser --gecos "" --uid 1000 dockeruser
@@ -28,12 +29,13 @@ ARG DIST_PATH

USER root
RUN mkdir -p /worker && chown dockeruser /worker
COPY ../pyproject.toml /worker
COPY pyproject.toml /worker
# COPY ../pyproject.toml /worker
USER dockeruser

WORKDIR /worker

ENV PYTHONPATH=${PYTHONPATH}:${PWD}
# ENV PYTHONPATH=${PYTHONPATH}:${PWD}

COPY --chown=dockeruser $DIST_PATH $DIST_PATH
USER dockeruser
@@ -46,5 +48,6 @@ RUN poetry config virtualenvs.create false
RUN poetry install --no-dev

USER dockeruser
# Run the Batchee Harmony service
ENTRYPOINT ["batchee_harmony"]
COPY --chown=dockeruser ./docker-entrypoint.sh docker-entrypoint.sh
# Run the service
ENTRYPOINT ["./docker-entrypoint.sh"]
33 changes: 33 additions & 0 deletions README.md
@@ -5,3 +5,36 @@
_____

_Batchee_ groups together filenames so that further operations (such as concatenation) can be performed separately on each group of files.

## Installing
_____

For local development, one can clone the repository and then use poetry or pip from the local directory:

```shell
git clone <Repository URL>
```

###### (Option A) using poetry:
i) Follow the instructions for installing `poetry` [here](https://python-poetry.org/docs/).

ii) Run ```poetry install``` from the repository directory.

###### (Option B) using pip: Run ```pip install .``` from the repository directory.

## Usage
_____

```shell
batchee [file_names ...]
```

###### Or, if installed using a `poetry` environment:
```shell
poetry run batchee [file_names ...]
```

#### Options

- `-h`, `--help` show this help message and exit
- `-v`, `--verbose` Enable verbose output to stdout; useful for debugging
21 changes: 8 additions & 13 deletions batcher/harmony/cli.py
@@ -1,38 +1,33 @@
"""A Harmony CLI wrapper around the concatenate-batcher"""
import sys
from argparse import ArgumentParser

import harmony

from batcher.harmony.service_adapter import ConcatBatching as HarmonyAdapter


def main(argv, **kwargs):
"""Main Harmony CLI entrypoint
def main(config: harmony.util.Config = None) -> None:
"""Parse command line arguments and invoke the service to respond to them.
Parses command line arguments and invokes the appropriate method to respond to them
Parameters
----------
config : harmony.util.Config
harmony.util.Config is injectable for tests
Returns
-------
None
"""

config = None
# Optional: harmony.util.Config is injectable for tests
if "config" in kwargs:
config = kwargs.get("config")

parser = ArgumentParser(
prog="Pre-concatenate-batching", description="Run the pre-concatenate-batching service"
)
harmony.setup_cli(parser)

args = parser.parse_args(argv[1:])
args = parser.parse_args()
if harmony.is_harmony_cli(args):
harmony.run_cli(parser, args, HarmonyAdapter, cfg=config)
else:
parser.error("Only --harmony CLIs are supported")


if __name__ == "__main__":
main(sys.argv)
main()
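
A note on the signature change above: the `[tool.poetry.scripts]` entries in `pyproject.toml` point at `batcher.harmony.cli:main`, and a console-script entry point calls its target with no arguments. The old `main(argv, **kwargs)` would therefore fail when launched as `batchee_harmony`, whereas `parser.parse_args()` with no argument falls back to `sys.argv[1:]`. The following is a minimal sketch of that difference, not the project's code: `build_parser`, `old_main`, `new_main`, and the `--verbose` flag are hypothetical stand-ins, and the Harmony calls are left out.

```python
import sys
from argparse import ArgumentParser, Namespace


def build_parser() -> ArgumentParser:
    # Hypothetical stand-in for the parser that harmony.setup_cli() configures.
    parser = ArgumentParser(prog="demo")
    parser.add_argument("--verbose", action="store_true")
    return parser


def old_main(argv, **kwargs) -> Namespace:
    # Old style: the caller must pass the full argv list explicitly.
    return build_parser().parse_args(argv[1:])


def new_main(config=None) -> Namespace:
    # New style: parse_args() with no argument reads sys.argv[1:].
    return build_parser().parse_args()


# A console-script entry point invokes its target as main(), with no arguments.
try:
    old_main()
    old_style_callable = True
except TypeError:
    old_style_callable = False

# Simulate a command line, call the new-style entry point, then restore argv.
saved_argv = sys.argv
sys.argv = ["demo", "--verbose"]
try:
    args = new_main()
finally:
    sys.argv = saved_argv
```

Keeping `config` as an optional keyword argument preserves the test hook: a test can still call `main(config=...)` directly.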
13 changes: 11 additions & 2 deletions batcher/harmony/service_adapter.py
@@ -50,6 +50,7 @@ def invoke(self):

def process_catalog(self, catalog: pystac.Catalog):
"""Converts a list of STAC catalogs into a list of lists of STAC catalogs."""
self.logger.info("process_catalog() started.")
try:
result = catalog.clone()
result.id = str(uuid4())
@@ -58,16 +59,20 @@ def process_catalog(self, catalog: pystac.Catalog):
# Get all the items from the catalog, including from child or linked catalogs
items = list(self.get_all_catalog_items(catalog))

self.logger.info(f"length of items==={len(items)}.")

# Quick return if catalog contains no items
if len(items) == 0:
return result

# # --- Get granule filepaths (urls) ---
netcdf_urls: list[str] = _get_netcdf_urls(items)
self.logger.info(f"netcdf_urls==={netcdf_urls}.")

# --- Map each granule to an index representing the batch to which it belongs ---
batch_indices: list[int] = get_batch_indices(netcdf_urls)
batch_indices: list[int] = get_batch_indices(netcdf_urls, self.logger)
sorted(set(batch_indices), key=batch_indices.index)
self.logger.info(f"batch_indices==={batch_indices}.")

# --- Construct a dictionary with a separate key for each batch ---
grouped: dict[int, list[Item]] = {}
@@ -83,14 +88,16 @@ def process_catalog(self, catalog: pystac.Catalog):
bounding_box = _get_output_bounding_box(batch_items)
properties = _get_output_date_range(batch_items)

self.logger.info(f"constructing new pystac.Item for batch_id==={batch_id}.")

# Construct a new pystac.Item with every granule in the batch as a pystac.Asset
output_item = Item(
str(uuid4()), bbox_to_geometry(bounding_box), bounding_box, None, properties
)

for idx, item in enumerate(batch_items):
output_item.add_asset(
"data",
f"data_{idx}",
Asset(
batch_urls[idx],
title=batch_urls[idx],
@@ -101,6 +108,8 @@ def process_catalog(self, catalog: pystac.Catalog):

result.add_item(output_item)

self.logger.info("STAC catalog creation complete.")

return result

except Exception as service_exception:
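
A note on the asset key change from `"data"` to `f"data_{idx}"`: a `pystac.Item` stores its assets in a dictionary keyed by the asset key, so adding every granule under the same key leaves only the last one. The sketch below models this with plain dictionaries rather than `pystac` objects. The URLs and variable names are hypothetical, and the dictionaries are a simplification of the real `Item.assets` mapping.

```python
# Hypothetical granule URLs standing in for batch_urls.
batch_urls = ["granule_a.nc", "granule_b.nc", "granule_c.nc"]

# Before the change: every asset is stored under the same key,
# so each assignment overwrites the previous one.
assets_same_key: dict[str, str] = {}
for url in batch_urls:
    assets_same_key["data"] = url

# After the change: the index makes each key unique,
# so every granule in the batch is retained.
assets_indexed: dict[str, str] = {}
for idx, url in enumerate(batch_urls):
    assets_indexed[f"data_{idx}"] = url
```

Separately, the bare `sorted(set(batch_indices), key=batch_indices.index)` expression in the hunk above returns a new list that is never assigned, so as shown it has no effect on `batch_indices`.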
8 changes: 7 additions & 1 deletion batcher/tempo_filename_parser.py
@@ -4,6 +4,8 @@
from argparse import ArgumentParser
from pathlib import Path

default_logger = logging.getLogger(__name__)

tempo_granule_filename_pattern = re.compile(
r"^.*TEMPO_"
r"(?P<product_type>[1-9A-Z]+)"
@@ -17,13 +19,15 @@
)


def get_batch_indices(filenames: list) -> list[int]:
def get_batch_indices(filenames: list, logger: logging.Logger = default_logger) -> list[int]:
"""
Returns
-------
list[int]
batch index for each filename in the original list, e.g. [0, 0, 0, 1, 1, 1, ...]
"""
logger.info(f"get_batch_indices() starting --- with {len(filenames)} filenames")

# Make a new list with days and scans, e.g. [('20130701', 'S009'), ('20130701', 'S009'), ...]
day_and_scans: list[tuple[str, str]] = []
for name in filenames:
@@ -35,6 +39,8 @@ def get_batch_indices(filenames: list) -> list[int]:
# Unique day-scans are determined (while keeping the same order). Each will be its own batch.
unique_day_scans: list[tuple[str, str]] = sorted(set(day_and_scans), key=day_and_scans.index)

logger.info(f"unique_day_scans==={unique_day_scans}.")

# Map each day/scan to an integer
batch_mapper: dict[tuple[str, str], int] = {
day_scan: idx for idx, day_scan in enumerate(unique_day_scans)
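
The batching steps visible in this hunk can be sketched as follows. This is an illustration under assumptions, not the file's actual code: `batch_indices_from_day_scans` is a hypothetical helper that starts from already-parsed `(day, scan)` tuples (the filename regex is only partly shown above, so it is not reproduced), and the final list comprehension is an assumed last step consistent with the docstring's `[0, 0, 0, 1, 1, 1, ...]` example. The optional `logger` parameter mirrors the new signature, which lets the Harmony adapter pass in its own logger.

```python
import logging

default_logger = logging.getLogger(__name__)


def batch_indices_from_day_scans(
    day_and_scans: list[tuple[str, str]],
    logger: logging.Logger = default_logger,
) -> list[int]:
    """Map each (day, scan) pair to the index of its batch."""
    logger.info("starting with %d day/scan pairs", len(day_and_scans))

    # Unique pairs in order of first appearance; each one becomes a batch.
    unique_day_scans = sorted(set(day_and_scans), key=day_and_scans.index)

    # Map each unique pair to an integer batch index.
    batch_mapper = {day_scan: idx for idx, day_scan in enumerate(unique_day_scans)}

    # Assumed final step: look up the batch index for every input pair.
    return [batch_mapper[day_scan] for day_scan in day_and_scans]


day_and_scans = [
    ("20130701", "S009"),
    ("20130701", "S009"),
    ("20130701", "S010"),
    ("20130702", "S009"),
]
indices = batch_indices_from_day_scans(day_and_scans)
```

With the default logger nothing is printed unless logging is configured at `INFO` level or lower; passing a service logger routes the same messages into that service's logs.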
10 changes: 10 additions & 0 deletions docker-entrypoint.sh
@@ -0,0 +1,10 @@
#!/bin/bash
set -e

if [ "$1" = 'batchee' ]; then
exec batchee "$@"
elif [ "$1" = 'batchee_harmony' ]; then
exec batchee_harmony "$@"
else
exec batchee_harmony "$@"
fi
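
Reading the script as written, `"$@"` still contains the selector word in `$1` when it is forwarded, so `batchee` or `batchee_harmony` appears to receive its own name as its first argument. The dispatch can be modeled in Python to make that visible. `resolve_command` is a hypothetical helper that returns the argument list the script would `exec`; it is a sketch of the shell logic, not part of the repository.

```python
def resolve_command(args: list[str]) -> list[str]:
    """Return the argv that docker-entrypoint.sh would exec for the given args."""
    if args and args[0] == "batchee":
        # exec batchee "$@": all original arguments, including "batchee", are forwarded.
        return ["batchee", *args]
    # Both the explicit 'batchee_harmony' branch and the fallback run batchee_harmony "$@".
    return ["batchee_harmony", *args]


cli_command = resolve_command(["batchee", "file_1.nc", "file_2.nc"])
harmony_command = resolve_command(["--harmony-action", "invoke"])
```

If forwarding the selector is not intended, a `shift` before `exec` in the first two branches of the shell script would drop it.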
8 changes: 0 additions & 8 deletions docker/Readme.md

This file was deleted.

19 changes: 9 additions & 10 deletions poetry.lock

Some generated files are not rendered by default.

11 changes: 6 additions & 5 deletions pyproject.toml
@@ -15,17 +15,18 @@ packages = [
python = "^3.10"
harmony-service-lib = "^1.0.23"

[tool.poetry.scripts]
batchee_harmony = 'batcher.harmony.cli:main'
batchee = 'batcher.tempo_filename_parser:main'

[tool.poetry.group.dev.dependencies]
coverage = "^7.3.2"
ruff = "^0.1.3"
pytest = "^7.4.3"
black = "^23.10.1"
mypy = "^1.6.1"
ruff = "^0.1.3"
pytest-cov = "^4.1.0"

[tool.poetry.scripts]
batchee_harmony = 'batcher.harmony.cli:main'
batchee = 'batcher.tempo_filename_parser:main'

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
