Comparing changes

base repository: hotosm/raw-data-api
base: 1.5.11
head repository: hotosm/raw-data-api
compare: develop

Commits on Jan 29, 2025

  1. 22c20f1
  2. ab65795
  3. 42589bf
  4. Black code

    emi420 committed Jan 29, 2025
    533caec
  5. - Remove debug parameter

    emi420 committed Jan 29, 2025
    909f541
  6. 1a38f9a

Commits on Jan 30, 2025

  1. Merge pull request #291 from hotosm/refactor/multipleCustomviz

    Refactor for custom visualizations for HDX custom exports
    kshitijrajsharma authored Jan 30, 2025
    144b898

Commits on Feb 3, 2025

  1. 10fa2bc
  2. 364e3da

Commits on Feb 4, 2025

  1. Merge pull request #292 from hotosm/bump/snapshot_fix

    Fix error with param type on S3FileTransfer.upload
    kshitijrajsharma authored Feb 4, 2025
    c15035b

Commits on Feb 9, 2025

  1. e2843ed

Commits on Feb 10, 2025

  1. fix(app.py): timeout reset

    reset timeout with dropback
    kshitijrajsharma committed Feb 10, 2025
    245c62f
  2. eef1125
  3. 26538a8
  4. d0d0034
  5. a01c429

Commits on Feb 18, 2025

  1. 05ef83e
  2. c0c2662
  3. d03e0ea
  4. 532e977
  5. 2cfb6d9

13 changes: 13 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,13 @@
+# These are supported funding model platforms
+
+github: hotosm
+patreon: # Replace with a single Patreon username
+open_collective: # Replace with a single Open Collective username
+ko_fi: # Replace with a single Ko-fi username
+tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
+community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
+liberapay: # Replace with a single Liberapay username
+issuehunt: # Replace with a single IssueHunt username
+otechie: # Replace with a single Otechie username
+lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
+custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
4 changes: 4 additions & 0 deletions .gitignore
@@ -24,3 +24,7 @@ Pipfile.lock
 #backend
 backend/data
 backend/.env
+
+# OS files
+.DS_Store
+
9 changes: 6 additions & 3 deletions API/api_worker.py
@@ -16,10 +16,11 @@
 
 # Reader imports
 from src.app import CustomExport, PolygonStats, RawData, S3FileTransfer
-from src.config import ALLOW_BIND_ZIP_FILTER
+from src.config import ALLOW_BIND_ZIP_FILTER, CELERY_BROKER_HEARTBEAT
 from src.config import CELERY_BROKER_URL as celery_broker_uri
 from src.config import CELERY_RESULT_BACKEND as celery_backend
 from src.config import (
+CELERY_WORKER_LOST_WAIT,
 DEFAULT_HARD_TASK_LIMIT,
 DEFAULT_README_TEXT,
 DEFAULT_SOFT_TASK_LIMIT,
@@ -50,13 +51,15 @@
 celery = Celery("Raw Data API")
 celery.conf.broker_url = celery_broker_uri
 celery.conf.result_backend = celery_backend
+celery.conf.broker_heartbeat = CELERY_BROKER_HEARTBEAT
+celery.conf.worker_lost_wait = CELERY_WORKER_LOST_WAIT
 # celery.conf.task_serializer = "pickle"
 # celery.conf.result_serializer = "json"
 # celery.conf.accept_content = ["application/json", "application/x-python-serialize"]
 celery.conf.task_track_started = True
 celery.conf.update(result_extended=True)
-# celery.conf.task_reject_on_worker_lost = True
-# celery.conf.task_acks_late = True
+celery.conf.task_reject_on_worker_lost = False
+celery.conf.task_acks_late = False # to avoid task duplication
 
 if WORKER_PREFETCH_MULTIPLIER:
 celery.conf.update(worker_prefetch_multiplier=WORKER_PREFETCH_MULTIPLIER)
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,42 @@
+## 1.5.17 (2025-02-18)
+
+### Fix
+
+- **taskreject**: avoid worker acks
+
+## 1.5.16 (2025-02-18)
+
+### Fix
+
+- **workers**: avoid requeue of task if not acks
+- **worker**: set task_acks_late to False in api_worker.py
+
+## 1.5.15 (2025-02-11)
+
+### Fix
+
+- **worker**: added worker lost wait in worker
+- **worker**: fixes bug on the heartbeat
+
+## 1.5.14 (2025-02-10)
+
+### Fix
+
+- **app.py**: timeout reset
+
+## 1.5.13 (2025-02-03)
+
+### Fix
+
+- **S3FileTransfer.upload**: fix upload when file_path is not a string
+
+## 1.5.12 (2025-01-29)
+
+### Refactor
+
+- Remove unused stats collection, fix stats templates
+- Multiple custom visualizations for custom/HDX exports
+
 ## 1.5.11 (2025-01-28)
 
 ### Fix
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "raw-data-api"
-version = "1.5.11"
+version = "1.5.17"
 description = "Set of high-performant APIs for transforming and exporting OpenStreetMap (OSM) data in different GIS file formats."
 readme = "README.md"
 authors = [
2 changes: 1 addition & 1 deletion requirements.txt
@@ -27,7 +27,7 @@ humanize==4.9.0
 python-slugify==8.0.1
 geomet==1.1.0
 PyYAML==6.0.1
-geojson-stats==0.2.5
+geojson-stats==0.2.6
 transliterate==1.10.2
 
 ## documentation
Binary file removed src/.DS_Store
Binary file not shown.
28 changes: 23 additions & 5 deletions src/app.py
@@ -949,7 +949,7 @@ def upload(self, file_path, file_name, file_suffix=None):
 start_time = time.time()
 
 try:
-if file_path[-5:] == ".html":
+if type(file_path) == str and file_path[-5:] == ".html":
 self.s_3.upload_file(
 str(file_path),
 BUCKET_NAME,
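The fix short-circuits the slice with `type(file_path) == str`, so any non-`str` argument, including a `pathlib.Path` that names an `.html` file, now skips the HTML branch. A minimal sketch of a check that treats strings and path-like objects the same way; `is_html_file` is a hypothetical helper, not part of `src/app.py`, and it assumes anything that is not a path should simply count as "not HTML":

```python
import os
from pathlib import Path


def is_html_file(file_path) -> bool:
    """Return True if file_path is a str or os.PathLike whose name ends in ".html"."""
    # Assumption: non-path inputs (e.g. open file objects) are treated as non-HTML
    if not isinstance(file_path, (str, os.PathLike)):
        return False
    # os.fspath turns Path objects into plain strings, so one suffix check covers both
    return os.fspath(file_path).endswith(".html")


print(is_html_file("exports/stats-summary.html"))  # True
print(is_html_file(Path("exports/stats-summary.html")))  # True
print(is_html_file(Path("exports/data.zip")))  # False
```

Using `endswith(".html")` keeps the same semantics as the original `file_path[-5:] == ".html"` comparison while avoiding the type test.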
@@ -1063,6 +1063,7 @@ def get_osm_analytics_meta_stats(self):
 MAX_RETRIES = 2 # Maximum number of retries
 INITIAL_DELAY = 1 # Initial delay in seconds
 MAX_DELAY = 8
+API_TIMEOUT = 10
 
 retries = 0
 delay = INITIAL_DELAY
@@ -1071,7 +1072,9 @@ def get_osm_analytics_meta_stats(self):
 try:
 query = generate_polygon_stats_graphql_query(self.INPUT_GEOM)
 payload = {"query": query}
-response = requests.post(self.API_URL, json=payload, timeout=45)
+response = requests.post(
+self.API_URL, json=payload, timeout=API_TIMEOUT
+)
 response.raise_for_status()
 return response.json()
 except Exception as e:
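`API_TIMEOUT` now bounds each stats request at 10 seconds instead of 45, inside a retry loop governed by `MAX_RETRIES`, `INITIAL_DELAY` and `MAX_DELAY`. The `except` branch is cut off in this hunk, so the sketch below assumes a conventional capped exponential backoff; `post_with_retries` and its `send`/`sleep` parameters are hypothetical names, not the project's code:

```python
import time


def post_with_retries(
    send, max_retries=2, initial_delay=1.0, max_delay=8.0, sleep=time.sleep
):
    """Call send() until it succeeds, retrying up to max_retries times.

    Assumed policy: the delay doubles after each failure and is capped at
    max_delay. The original except-branch is not visible in the diff.
    """
    retries = 0
    delay = initial_delay
    while True:
        try:
            return send()
        except Exception:
            if retries >= max_retries:
                raise  # out of retries: surface the last error
            retries += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)


attempts = []


def flaky_send():
    # Fails twice, then succeeds on the third call
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return {"data": "ok"}


print(post_with_retries(flaky_send, sleep=lambda seconds: None))  # {'data': 'ok'}
```

In context, `send` could wrap something like `lambda: requests.post(url, json=payload, timeout=API_TIMEOUT)`. Injecting `sleep` keeps the sketch testable without real delays.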
@@ -1299,6 +1302,7 @@ def __init__(self, params, uid=None):
 self.default_export_base_name = (
 self.iso3.upper() if self.iso3 else self.params.dataset.dataset_prefix
 )
+
 self.default_export_path = os.path.join(
 export_path,
 self.uuid,
@@ -1307,6 +1311,7 @@ def __init__(self, params, uid=None):
 )
 if os.path.exists(self.default_export_path):
 shutil.rmtree(self.default_export_path, ignore_errors=True)
+
 os.makedirs(self.default_export_path)
 
 if USE_DUCK_DB_FOR_CUSTOM_EXPORTS is True:
@@ -1941,9 +1946,22 @@ def add_resource(self, resource_meta):
 
 # Add customviz if available
 if resource_meta.get("stats_html"):
-self.dataset.update(
-{"customviz": [{"url": resource_meta["stats_html"]}]}
-)
+dataset_customviz = self.dataset.get("customviz")
+if not dataset_customviz:
+dataset_customviz = [
+{
+"name": resource_meta["name"],
+"url": resource_meta["stats_html"],
+}
+]
+else:
+dataset_customviz.append(
+{
+"name": resource_meta["name"],
+"url": resource_meta["stats_html"],
+}
+)
+self.dataset.update({"customviz": dataset_customviz})
 
 def upload_dataset(self, dump_config_to_s3=False):
 """
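Previously each resource with `stats_html` replaced `customviz` with a single URL-only entry; after this change entries accumulate, one per resource, each carrying the resource `name`. The same append-or-create logic as a standalone function, where `add_customviz` is a hypothetical helper operating on a plain dict in place of `self.dataset`:

```python
def add_customviz(dataset: dict, resource_meta: dict) -> None:
    """Append a {"name", "url"} entry for resource_meta to dataset["customviz"]."""
    if not resource_meta.get("stats_html"):
        return  # no stats page for this resource, nothing to add
    # A missing key, None, or an empty list all start a fresh list, as in the diff
    customviz = dataset.get("customviz") or []
    customviz.append({"name": resource_meta["name"], "url": resource_meta["stats_html"]})
    dataset["customviz"] = customviz


dataset = {}
add_customviz(dataset, {"name": "roads", "stats_html": "https://example.com/roads.html"})
add_customviz(dataset, {"name": "buildings", "stats_html": "https://example.com/b.html"})
add_customviz(dataset, {"name": "pois"})  # skipped: no stats_html
print([viz["name"] for viz in dataset["customviz"]])  # ['roads', 'buildings']
```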
7 changes: 7 additions & 0 deletions src/config.py
@@ -52,6 +52,13 @@ def get_bool_env_var(key, default=False):
 "CELERY", "CELERY_RESULT_BACKEND", fallback="redis://localhost:6379"
 )
 
+CELERY_BROKER_HEARTBEAT = os.environ.get("CELERY_BROKER_HEARTBEAT") or config.get(
+"CELERY", "CELERY_BROKER_HEARTBEAT", fallback=120
+)
+CELERY_WORKER_LOST_WAIT = os.environ.get("CELERY_WORKER_LOST_WAIT") or config.get(
+"CELERY", "CELERY_WORKER_LOST_WAIT ", fallback=10
+)
+
 WORKER_PREFETCH_MULTIPLIER = int(
 os.environ.get("WORKER_PREFETCH_MULTIPLIER")
 or config.get("CELERY", "WORKER_PREFETCH_MULTIPLIER", fallback=1)
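Environment variables always arrive as strings, and `WORKER_PREFETCH_MULTIPLIER` is wrapped in `int()` while the two new settings are not, so `CELERY_BROKER_HEARTBEAT` is an `int` only when it falls back to the default. Separately, `configparser` strips whitespace around option names when it parses a file, so the lookup key `"CELERY_WORKER_LOST_WAIT "` (with a trailing space) appears unable to match a value set in the config file. A minimal sketch of an env-then-config-then-default lookup with explicit `int` coercion; `get_int_setting` is a hypothetical helper, not part of `src/config.py`:

```python
import configparser
import os


def get_int_setting(config, section, key, default):
    """Read key from the environment, then config[section], then default; return an int."""
    key = key.strip()  # guard against stray whitespace in the key name
    # An empty env var falls through to the config file, like the `or` chain above
    raw = os.environ.get(key) or config.get(section, key, fallback=None)
    # Both sources yield strings, so coerce here; only the default is already an int
    return int(raw) if raw else default


config = configparser.ConfigParser()
config.read_string("[CELERY]\nCELERY_WORKER_LOST_WAIT = 30\n")
print(get_int_setting(config, "CELERY", "CELERY_WORKER_LOST_WAIT ", 10))
print(get_int_setting(config, "CELERY", "CELERY_BROKER_HEARTBEAT", 120))
```

With neither variable set in the environment, the first call reads 30 from the config section and the second falls back to 120.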
43 changes: 3 additions & 40 deletions src/post_processing/geojson_stats.py
@@ -1,61 +1,24 @@
 from geojson_stats.stats import Stats
 from geojson_stats.html import Html
 
-CONFIG_AREA = ["building"]
-CONFIG_LENGTH = ["highway", "waterway"]
-
 
 class GeoJSONStats(Stats):
-"""Used for collecting stats while processing GeoJSON files line by line"""
+"""Used for collecting stats while processing GeoJSON files"""
 
 def __init__(self, filters, *args, **kwargs):
 super().__init__(*args, **kwargs)
 
 self.config.clean = True
 self.config.properties_prop = "properties.tags"
 
-if filters and filters.tags:
-for tag in CONFIG_AREA:
-if self.check_filter(filters.tags, tag):
-self.config.keys.append(tag)
-self.config.value_keys.append(tag)
-self.config.area = True
-
-for tag in CONFIG_LENGTH:
-if self.check_filter(filters.tags, tag):
-self.config.keys.append(tag)
-self.config.value_keys.append(tag)
-self.config.length = True
-
-def check_filter(self, tags, tag):
-"""
-Check if a tag is present in tag filters
-"""
-
-if tags.all_geometry:
-if tags.all_geometry.join_or and tag in tags.all_geometry.join_or:
-return True
-if tags.all_geometry.join_and and tag in tags.all_geometry.join_and:
-return True
-if tags.polygon:
-if tags.polygon.join_or and tag in tags.polygon.join_or:
-return True
-if tags.polygon.join_and and tag in tags.polygon.join_and:
-return True
-if tags.line:
-if tags.line.join_or and tag in tags.line.join_or:
-return True
-if tags.line.join_and and tag in tags.line.join_and:
-return True
-
 def raw_data_line_stats(self, json_object: dict):
 """
 Process a GeoJSON line (for getting stats) and return that line
 """
 self.get_object_stats(json_object)
 
-def html(self, tpl):
+def html(self, tpl, tpl_params):
 """
 Returns stats Html object, generated from stats data using a template
 """
-return Html(tpl, self)
+return Html(tpl, self, tpl_params)
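`html()` now forwards a `tpl_params` mapping to `geojson_stats.html.Html`, and the building template gains an `<h2>${title}</h2>` heading that `processor.py` fills with `{"title": f"{export_filename}.geojson"}`. How `Html` merges stats and parameters is not shown in this diff; assuming the `${...}` placeholders follow Python's `string.Template` syntax, the substitution could look like this (`render_stats_html` and `TPL` are hypothetical):

```python
from string import Template

# Hypothetical fragment modelled on stats_building_tpl.html
TPL = "<h2>${title}</h2>\n<h3>${area}</h3>\n<h4>Km2 of buildings</h4>"


def render_stats_html(tpl: str, stats: dict, tpl_params: dict) -> str:
    """Fill ${...} placeholders from computed stats plus caller-supplied params."""
    # Later mappings win, so tpl_params can override a stats value with the same key
    values = {**stats, **tpl_params}
    # safe_substitute leaves unknown placeholders in place instead of raising KeyError
    return Template(tpl).safe_substitute(values)


print(render_stats_html(TPL, {"area": "12.5"}, {"title": "buildings.geojson"}))
```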
42 changes: 23 additions & 19 deletions src/post_processing/processor.py
@@ -4,9 +4,17 @@
 import os
 import pathlib
 
+CATEGORIES_CONFIG = {
+"roads": {"tag": "highway", "length": True, "area": False},
+"buildings": {"tag": "building", "length": False, "area": True},
+"waterways": {"tag": "waterway", "length": True, "area": False},
+"railways": {"tag": "railway", "length": True, "area": False},
+"default": {"tag": None, "length": False, "area": False},
+}
+
 
 class PostProcessor:
-"""Used for posst-process data while processing GeoJSON files line by line"""
+"""Used for post-process GeoJSON files"""
 
 options = {}
 filters = {}
@@ -27,6 +35,13 @@ def post_process_line(self, line: str):
 
 return json.dumps(line_object)
 
+def get_categories_config(self, category_name):
+"""
+Get configuration for categories
+"""
+config = CATEGORIES_CONFIG.get(category_name)
+return config if config else CATEGORIES_CONFIG["default"]
+
 def custom(
 self, category_name, export_format_path, export_filename, file_export_path
 ):
@@ -35,25 +50,12 @@ def custom(
 """
 self.geoJSONStats.config.properties_prop = "properties"
 
-category_tag = ""
-if category_name == "roads":
-category_tag = "highway"
-self.geoJSONStats.config.length = True
-elif category_name == "buildings":
-category_tag = "building"
-self.geoJSONStats.config.area = True
-elif category_name == "waterways":
-category_tag = "waterway"
-self.geoJSONStats.config.length = True
-elif category_name == "railways":
-category_tag = "railway"
-self.geoJSONStats.config.length = True
+category_config = self.get_categories_config(category_name)
+category_tag = category_config["tag"]
+self.geoJSONStats.config.length = category_config["length"]
+self.geoJSONStats.config.area = category_config["area"]
 
 if self.options["include_stats"]:
-if category_tag:
-self.geoJSONStats.config.keys.append(category_tag)
-self.geoJSONStats.config.value_keys.append(category_tag)
-
 path_input = os.path.join(export_format_path, f"{export_filename}.geojson")
 path_output = os.path.join(
 export_format_path, f"{export_filename}-post.geojson"
@@ -102,7 +104,9 @@ def custom(
 project_root,
 "{tpl}_tpl.html".format(tpl=tpl),
 )
-geojson_stats_html = self.geoJSONStats.html(tpl_path).build()
+geojson_stats_html = self.geoJSONStats.html(
+tpl_path, {"title": f"{export_filename}.geojson"}
+).build()
 upload_html_path = os.path.join(file_export_path, "stats-summary.html")
 with open(upload_html_path, "w") as f:
 f.write(geojson_stats_html)
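Moving the per-category settings into `CATEGORIES_CONFIG` also changes one behaviour: an unrecognised category now explicitly resets `length` and `area` to `False` through the `"default"` entry, whereas the old `if`/`elif` chain left whatever values were already set. A self-contained sketch of the lookup, with a hypothetical `StatsConfig` dataclass standing in for `self.geoJSONStats.config`:

```python
from dataclasses import dataclass

# Copied from the CATEGORIES_CONFIG table in processor.py
CATEGORIES_CONFIG = {
    "roads": {"tag": "highway", "length": True, "area": False},
    "buildings": {"tag": "building", "length": False, "area": True},
    "waterways": {"tag": "waterway", "length": True, "area": False},
    "railways": {"tag": "railway", "length": True, "area": False},
    "default": {"tag": None, "length": False, "area": False},
}


@dataclass
class StatsConfig:
    """Hypothetical stand-in for geoJSONStats.config (only the two flags used here)."""

    length: bool = False
    area: bool = False


def apply_category(config: StatsConfig, category_name: str):
    """Set length/area from the table and return the category's OSM tag (None if unknown)."""
    category = CATEGORIES_CONFIG.get(category_name, CATEGORIES_CONFIG["default"])
    config.length = category["length"]
    config.area = category["area"]
    return category["tag"]


config = StatsConfig()
print(apply_category(config, "buildings"), config)  # building StatsConfig(length=False, area=True)
print(apply_category(config, "schools"), config)  # None StatsConfig(length=False, area=False)
```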
20 changes: 13 additions & 7 deletions src/post_processing/stats_building_tpl.html
@@ -3,31 +3,31 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/hot.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Archivo:ital,wght@0,100..900;1,100..900&display=swap" rel="stylesheet">
<title>HOT Export Stats</title>
<style type="text/css">

:root,
:host,
.hot-theme-light {
--hot-color-red-700: #C53639;
--hot-color-gray-950: #2C3038;
--hot-font-sans: Archivo, -apple-system, Roboto, Helvetica, Arial, sans-serif,
'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol';
--hot-font-size-2x-large: 2.25rem; /* 36px */
--hot-font-size-medium: 1rem; /* 16px */
--hot-font-size-small: 0.875rem; /* 14px */
--hot-font-weight-normal: 400;
--hot-border-radius-medium: 0.25rem; /* 4px */
--hot-color-neutral-0: #fff;
--hot-color-gray-200: #C4C3C5;
--hot-color-red-700: #C53639;
--hot-color-gray-50: #F3F3F3;
--hot-color-gray-200: #C4C3C5;
--hot-color-gray-100: #E1E0E1;
--hot-color-gray-400: #9A969B;
--hot-color-gray-950: #2C3038;
--hot-spacing-medium: 1rem;
--hot-spacing-x-small: 0.5rem; /* 8px */
--hot-color-gray-100: #E1E0E1;
}
body {
font-family: var(--hot-font-sans);
@@ -42,6 +42,11 @@
vertical-align: baseline;
background: transparent;
}
h2 {
margin-left: var(--hot-spacing-medium);
color: var(--hot-color-gray-400);
font-weight: var(--hot-font-weight-normal);
}
.container {
display: flex;
width: 100%;
@@ -96,9 +101,10 @@
</head>
<body>
<div id="root">
<h2>${title}</h2>
<div class="container">
<div class="box featured">
<h3>${key_building_area}</h3>
<h3>${area}</h3>
<h4>Km2 of buildings</h4>
</div>
<div class="box">