Merge pull request #5 from clemensv/usgs
USGS Instantaneous Values Service
clemensv authored Sep 24, 2024
2 parents dd424dd + cedf61b commit 885e17f
Showing 90 changed files with 19,666 additions and 5 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -23,6 +23,7 @@ The container image documentation provides detailed information:
* [NOAA Tides and Currents - Water level and current data](noaa/CONTAINER.md)
* [RSS Feeds - News and blog posts](rss/CONTAINER.md)
* [Pegelonline - Water level and current data](pegelonline/CONTAINER.md)
* [USGS Instantaneous Values - Water quality and quantity data](usgs-iv/CONTAINER.md)

Details about the tools and the data sources are provided in the respective
README files.
@@ -73,6 +74,15 @@ retrieve real-time data from the [Nextbus](https://www.nextbus.com/) service and
feed that data into Azure Event Hubs and Microsoft Fabric Event Streams. The tool
can also be used to query the Nextbus service interactively.

### USGS Instantaneous Values - Water quality and quantity data

The [USGS Instantaneous Values tool](usgs-iv/README.md) is a command line tool that
can be used to retrieve real-time water quality and quantity data from the
United States Geological Survey (USGS) Instantaneous Values API. The data is
available for over 1.5 million stations in the United States and its territories.
The USGS data is updated every 15 minutes, and the data volume is relatively low.
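
For orientation, a request to the Instantaneous Values service can be put together roughly as in the following Python sketch. It assumes the public `https://waterservices.usgs.gov/nwis/iv/` endpoint with its `format`, `sites`, `parameterCd`, and `period` query parameters. The `build_iv_url` helper, the site number, and the parameter code are illustrative and are not taken from the tool's source.

```python
from urllib.parse import urlencode

# Public endpoint of the USGS Instantaneous Values web service (assumed here).
IV_ENDPOINT = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(sites, parameter_codes, period="PT2H"):
    """Build a query URL for recent instantaneous values (hypothetical helper)."""
    query = {
        "format": "json",
        "sites": ",".join(sites),                 # comma-separated site numbers
        "parameterCd": ",".join(parameter_codes),  # e.g. 00060 = discharge
        "period": period,                          # ISO 8601 duration, here two hours
    }
    # Keep commas unescaped so multi-value lists stay readable.
    return IV_ENDPOINT + "?" + urlencode(query, safe=",")


# Example site number and parameter code, chosen only for illustration.
url = build_iv_url(["01646500"], ["00060"])
print(url)
```

The sketch only builds the URL; fetching and parsing the response is left out so that it runs without network access.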


### Forza Motorsport PC - Racing game telemetry data

The
4 changes: 0 additions & 4 deletions pegelonline/EVENTS.md
@@ -14,8 +14,6 @@ This document describes the events that are emitted by the PegelOnline API Bridg

### Message: de.wsv.pegelonline.Station

*A PEGELONLINE station with location and water body information.*

#### CloudEvents Attributes:

| **Name** | **Description** | **Type** | **Required** | **Value** |
@@ -57,8 +55,6 @@ This document describes the events that are emitted by the PegelOnline API Bridg

### Message: de.wsv.pegelonline.CurrentMeasurement

*The current measurement for a PEGELONLINE station.*

#### CloudEvents Attributes:

| **Name** | **Description** | **Type** | **Required** | **Value** |
3 changes: 2 additions & 1 deletion tools/generate-events-md.ps1
@@ -23,6 +23,7 @@ pushd $PSScriptRoot
python .\printdoc.py ..\gtfs\xreg\gtfs.xreg.json --title "GTFS API Bridge Events" --description "This document describes the events that are emitted by the GTFS API Bridge." > ..\gtfs\EVENTS.md
python .\printdoc.py ..\pegelonline\xreg\pegelonline.xreg.json --title "PegelOnline API Bridge Events" --description "This document describes the events that are emitted by the PegelOnline API Bridge." > ..\pegelonline\EVENTS.md
python .\printdoc.py ..\rss\xreg\feeds.xreg.json --title "RSS API Bridge Events" --description "This document describes the events that are emitted by the RSS API Bridge." > ..\rss\EVENTS.md
python .\printdoc.py ..\noaa\noaa\noaa.xreg.json --title "NOAA Tides and Currents API Bridge Events" --description "This document describes the events that are emitted by the NOAA API Bridge." > ..\noaa\EVENTS.md
python .\printdoc.py ..\noaa\xreg\noaa.xreg.json --title "NOAA Tides and Currents API Bridge Events" --description "This document describes the events that are emitted by the NOAA API Bridge." > ..\noaa\EVENTS.md
python .\printdoc.py ..\usgs-iv\xreg\usgs_iv.xreg.json --title "USGS Instantaneous Values API Bridge Events" --description "This document describes the events that are emitted by the USGS Instantaneous Values API Bridge." > ..\usgs-iv\EVENTS.md

popd
145 changes: 145 additions & 0 deletions usgs-iv/CONTAINER.md
@@ -0,0 +1,145 @@
# USGS Instantaneous Values Service bridge to Apache Kafka, Azure Event Hubs, and Fabric Event Streams

This container image provides a bridge between the USGS Instantaneous Values
Service and Apache Kafka, Azure Event Hubs, and Fabric Event Streams. The bridge
fetches entries from specified feeds and forwards them to the configured Kafka
endpoints.

## Functionality

The bridge retrieves data from the USGS Instantaneous Values Service and writes the entries to a
Kafka topic as [CloudEvents](https://cloudevents.io/) in a JSON format, which is
documented in [EVENTS.md](EVENTS.md). You can specify multiple feed URLs by
providing them in the configuration.
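
As a rough illustration of what a consumer might do with these messages, the following Python sketch decodes one structured-mode CloudEvents JSON payload and checks the four context attributes that the CloudEvents specification requires (`specversion`, `id`, `source`, `type`). The sample event, including its `type` name and `data` fields, is made up for illustration; the actual event types and schemas are the ones documented in EVENTS.md.

```python
import json

# Context attributes that the CloudEvents 1.0 specification marks as required.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def parse_cloudevent(raw: bytes) -> dict:
    """Decode a structured-mode JSON CloudEvent and check its required attributes."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("a structured-mode CloudEvent must be a JSON object")
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in event]
    if missing:
        raise ValueError("missing CloudEvents attributes: " + ", ".join(missing))
    return event


# Hypothetical sample message; the type name and data fields are placeholders
# and are not taken from EVENTS.md.
sample = json.dumps({
    "specversion": "1.0",
    "id": "example-1",
    "source": "https://example.invalid/usgs-iv",
    "type": "example.usgs.Reading",
    "data": {"site_no": "00000000", "value": 1.23},
}).encode("utf-8")

event = parse_cloudevent(sample)
print(event["type"], event["data"])
```

In a real consumer, `raw` would be the value of a Kafka record read from the configured topic.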

## Database Schemas and handling

If you want to build a full data pipeline with all events ingested into a
database, the integration with Fabric Eventhouse and Azure Data Explorer is
described in [DATABASE.md](../DATABASE.md).

## Installing the Container Image

Pull the container image from the GitHub Container Registry:

```shell
$ docker pull ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

To use it as a base image in a Dockerfile:

```dockerfile
FROM ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

## Using the Container Image

The container defines a command that starts the bridge, reading data from the
USGS services and writing it to Kafka, Azure Event Hubs, or
Fabric Event Streams.

### With a Kafka Broker

Ensure you have a Kafka broker configured with TLS and SASL PLAIN
authentication. Run the container with the following command:

```shell
$ docker run --rm \
-e KAFKA_BOOTSTRAP_SERVERS='<kafka-bootstrap-servers>' \
-e KAFKA_TOPIC='<kafka-topic>' \
-e SASL_USERNAME='<sasl-username>' \
-e SASL_PASSWORD='<sasl-password>' \
ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

### With Azure Event Hubs or Fabric Event Streams

Use the connection string to establish a connection to the service. Obtain the
connection string from the Azure portal, Azure CLI, or the "custom endpoint" of
a Fabric Event Stream.

```shell
$ docker run --rm \
-e CONNECTION_STRING='<connection-string>' \
ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

### Preserving State Between Restarts

To preserve the state between restarts and avoid reprocessing feed entries,
mount a volume to the container and set the `USGS_LAST_POLLED_FILE` environment variable:

```shell
$ docker run --rm \
-v /path/to/state:/mnt/state \
-e USGS_LAST_POLLED_FILE='/mnt/state/usgs_last_polled.json' \
... other args ... \
ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

## Environment Variables

### `CONNECTION_STRING`

An Azure Event Hubs-style connection string used to connect to Azure Event Hubs
or Fabric Event Streams. This replaces the need for `KAFKA_BOOTSTRAP_SERVERS`,
`SASL_USERNAME`, and `SASL_PASSWORD`.
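
To make the relationship between the two configuration styles concrete, the sketch below shows how an Event Hubs-style connection string is commonly mapped onto Kafka client settings: the namespace host becomes the bootstrap server on port 9093, the user name is the literal `$ConnectionString`, and the password is the full connection string. This reflects the documented behavior of the Event Hubs Kafka endpoint; whether the bridge performs the mapping in exactly this way is an assumption, and the function name and sample values are hypothetical.

```python
def kafka_settings_from_connection_string(connection_string: str):
    """Derive Kafka client settings and topic from an Event Hubs-style string.

    Hypothetical helper; returns (config, topic).
    """
    connection_string = connection_string.strip()
    parts = {}
    for segment in connection_string.split(";"):
        if not segment:
            continue
        # Split on the first '=' only, since keys may end with '=' padding.
        key, _, value = segment.partition("=")
        parts[key.strip()] = value.strip()

    endpoint = parts.get("Endpoint", "")
    if not endpoint.startswith("sb://"):
        raise ValueError("connection string has no sb:// Endpoint")
    host = endpoint[len("sb://"):].rstrip("/")

    config = {
        "bootstrap.servers": f"{host}:9093",   # Kafka endpoint of the namespace
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",  # literal value, not a placeholder
        "sasl.password": connection_string,
    }
    return config, parts.get("EntityPath")


# Made-up connection string for illustration only.
example = ("Endpoint=sb://example-ns.servicebus.windows.net/;"
           "SharedAccessKeyName=send;SharedAccessKey=abc123=;EntityPath=usgs")
config, topic = kafka_settings_from_connection_string(example)
print(config["bootstrap.servers"], topic)
```

The configuration keys follow the librdkafka naming used by common Kafka clients; a different client library would need the equivalent options under its own names.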

### `KAFKA_BOOTSTRAP_SERVERS`

The address of the Kafka broker. Provide a comma-separated list of host and port
pairs (e.g., `broker1:9092,broker2:9092`). The client communicates with
TLS-enabled Kafka brokers.

### `KAFKA_TOPIC`

The Kafka topic where messages will be produced.

### `SASL_USERNAME`

Username for SASL PLAIN authentication. Ensure your Kafka brokers support SASL PLAIN authentication.

### `SASL_PASSWORD`

Password for SASL PLAIN authentication.

### `USGS_LAST_POLLED_FILE`

The file path where the bridge stores the state of processed entries. This helps
in resuming data fetching without duplication after restarts. Default is
`/mnt/state/usgs_last_polled.json`.
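
The idea behind such a state file can be sketched as follows. The layout used here, a JSON object mapping a site identifier to the timestamp of the last forwarded reading, is an assumption for illustration; the bridge's actual file format is not documented in this section, and `load_state` and `save_state` are hypothetical helpers.

```python
import json
import os
import tempfile


def load_state(path: str) -> dict:
    """Return the stored state, or an empty dict on first start."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def save_state(path: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # os.replace swaps the file in one step, so a crash does not leave
    # a half-written state file behind.
    os.replace(temp_path, path)


# Demonstration in a throwaway directory instead of /mnt/state.
state_path = os.path.join(tempfile.mkdtemp(), "usgs_last_polled.json")
state = load_state(state_path)               # empty on first run
state["00000000"] = "2024-09-24T12:00:00Z"   # placeholder site and timestamp
save_state(state_path, state)
print(load_state(state_path))
```

Mounting a volume at the directory that contains the file, as shown in "Preserving State Between Restarts", is what makes this state survive a container restart.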

## Deploying into Azure Container Instances

You can deploy the USGS Instantaneous Values Service bridge as a container directly to Azure Container
Instances, providing the information explained above. Just click the button below and go.

[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fclemensv%2Freal-time-sources%2Fmain%2Fusgs_iv%2Fazure-template.json)

## Additional Information

- **Source Code**: [GitHub Repository](https://github.com/clemensv/real-time-sources/tree/main/usgs_iv)
- **Documentation**: Refer to [EVENTS.md](EVENTS.md) for the JSON event format.
- **License**: MIT

## Example

To run the bridge and send the data to an Azure Event Hub, keeping the polling state in a mounted volume:

```shell
$ docker run --rm \
-e CONNECTION_STRING='Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=...' \
-v /path/to/state:/mnt/state \
ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

This setup allows you to integrate USGS services data into your data processing pipelines, enabling real-time data analysis and monitoring.

## Notes

- Ensure that you have network connectivity to the USGS services.
- The bridge efficiently handles data fetching and forwarding, but monitor resource usage if you are fetching data from many feeds at a high frequency.

## Support

For issues or questions, please open an issue on the [GitHub repository](https://github.com/clemensv/real-time-sources/issues).
24 changes: 24 additions & 0 deletions usgs-iv/Dockerfile
@@ -0,0 +1,24 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# OCI image labels; no spaces around '=' so that Docker parses each one as key=value
LABEL org.opencontainers.image.source="https://github.com/clemensv/real-time-sources/tree/main/usgs_iv"
LABEL org.opencontainers.image.title="USGS Instantaneous Values Service bridge to Kafka endpoints"
LABEL org.opencontainers.image.description="This container is a bridge between USGS feeds and Kafka endpoints. It fetches entries from feeds and forwards them to the configured Kafka endpoints."
LABEL org.opencontainers.image.documentation="https://github.com/clemensv/real-time-sources/blob/main/usgs_iv/CONTAINER.md"
LABEL org.opencontainers.image.licenses="MIT"

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install the required Python packages
RUN pip install .

# Define environment variables (default values)
ENV CONNECTION_STRING=""
ENV LOG_LEVEL="INFO"

# Run the application
CMD ["python", "-m", "usgs_iv", "feed"]