Refacto pipeline #101

Open
wants to merge 524 commits into
base: main
a79e47e
Merge branch 'main' into feat/refacto_pipeline_first_steps
tgrandje Sep 13, 2024
18659cb
Update pipeline.yaml
tgrandje Sep 13, 2024
11ad113
Merge branch 'main' into feat/refacto_pipeline_first_steps
tgrandje Sep 13, 2024
1de24e4
restrict fix-geometry to topjson format
tgrandje Sep 14, 2024
fdd618c
Update geodata pod output
tgrandje Sep 14, 2024
6abd1a3
update crossproduct with selection of best upstream candidate
tgrandje Sep 14, 2024
5733155
update result from geodatasets preparation
tgrandje Sep 14, 2024
f0e6f13
Update pipeline
tgrandje Sep 14, 2024
f681bb8
Test map-reduce on pipeline for geodata generation
tgrandje Sep 14, 2024
c10c4d3
Bugfix fanout
tgrandje Sep 14, 2024
b6b0752
Fix selection of vintage in dev environment
tgrandje Sep 14, 2024
190e4f5
Fix pipeline quotes
tgrandje Sep 14, 2024
54a1503
Typo
tgrandje Sep 14, 2024
428b14f
bugfix map-reduce
tgrandje Sep 14, 2024
9d23b59
Fix output dir creation
tgrandje Sep 14, 2024
b617625
Bugfix years/year
tgrandje Sep 14, 2024
7f536ae
Move source tagging during geodataset creation & fix pipeline
tgrandje Sep 15, 2024
b6cdb10
Fix output
tgrandje Sep 15, 2024
f854524
handle missing base geodataset
tgrandje Sep 15, 2024
8f61e2a
Update result dir creation
tgrandje Sep 15, 2024
a979079
fix typo
tgrandje Sep 15, 2024
1babe78
Fix typo
tgrandje Sep 15, 2024
2863e99
Update output path
tgrandje Sep 15, 2024
bf9fc95
shorten output for argo
tgrandje Sep 15, 2024
379a8a6
Update crossproduct input preprocessing
tgrandje Sep 15, 2024
72cdc57
Fix geodata output
tgrandje Sep 15, 2024
7ac7f13
Fix metadata output
tgrandje Sep 15, 2024
763e366
Update crossproduct default parameters
tgrandje Sep 15, 2024
c4254c6
Fix crossproduct input handling
tgrandje Sep 15, 2024
1bef525
test fanout step to avoid reaching limit of output number of characters
tgrandje Sep 16, 2024
1eddb49
Fix typos
tgrandje Sep 16, 2024
d2ae836
Fix file path
tgrandje Sep 16, 2024
f258aac
Update config from environment + set warning
tgrandje Sep 16, 2024
2ba6573
Fix result json path for fanout
tgrandje Sep 16, 2024
7d18ff7
Update .gitignore
tgrandje Sep 16, 2024
62b0cb5
Update log
tgrandje Sep 16, 2024
391c80c
Set intermediate format from config
tgrandje Sep 16, 2024
c25aa2d
clean garbage
tgrandje Sep 16, 2024
e622ec7
Add EPCI & ARRONDISSEMENT to configs/constants
tgrandje Sep 16, 2024
ca55657
refacto crossproduct
tgrandje Sep 16, 2024
01f4174
handle ARRONDISSEMENT & CANTON from metadata
tgrandje Sep 16, 2024
52764cb
Refacto split_merge_tiles
tgrandje Sep 16, 2024
9c76556
Update pipeline.yaml
tgrandje Sep 16, 2024
dd2a893
Add to_frame method to S3Dataset
tgrandje Sep 16, 2024
1a536ba
Update mapshaper_split_from_s3.py
tgrandje Sep 16, 2024
ec7aa3d
Refacto enrich
tgrandje Sep 16, 2024
0471092
Fix metadata with lowercase columns
tgrandje Sep 17, 2024
6d7a2d8
fix type hint
tgrandje Sep 17, 2024
683f7cc
Fix CANTON's available territorial splits
tgrandje Sep 17, 2024
91bd8df
Add _get_columns + start fixing enrich with rename arg
tgrandje Sep 17, 2024
3cf792e
Better logging
tgrandje Sep 17, 2024
6a8297b
Fix metadata with missing MAYOTTE from ARR
tgrandje Sep 17, 2024
fcc2c37
Add available dissolutions/split + integrity control
tgrandje Sep 17, 2024
a629e1a
Fix mapshaper enrich
tgrandje Sep 17, 2024
fdf9ae3
Remove unused CV/CANOV from metadata
tgrandje Sep 17, 2024
cf8c345
Bugfix on automatic merge with REG (error on MAYOTTE)
tgrandje Sep 17, 2024
b9d23de
Add TODO for TOM
tgrandje Sep 17, 2024
16c84d5
Fix rename ARRONDISSEMENT
tgrandje Sep 17, 2024
e086cd2
Fix geodata for creating downstream datasets
tgrandje Sep 17, 2024
694a3b1
Comment out debug
tgrandje Sep 17, 2024
2a08304
Update pipeline.yaml
tgrandje Sep 17, 2024
e07d799
Update pipeline.yaml
tgrandje Sep 17, 2024
490c408
Update pipeline.yaml
tgrandje Sep 17, 2024
f1f02cf
fix import
tgrandje Sep 17, 2024
b920fac
Fix prepare geodatasets
tgrandje Sep 17, 2024
86b650f
test setting pipeline steps inside task
tgrandje Sep 17, 2024
637a967
Fix typo
tgrandje Sep 17, 2024
8d91bc9
Test update withParam
tgrandje Sep 17, 2024
59eeedf
Fix indentation
tgrandje Sep 17, 2024
9fad6cc
Fix pipeline (?)
tgrandje Sep 17, 2024
901d36d
Merge branch 'main' into feat/refacto_pipeline_first_steps
tgrandje Sep 17, 2024
f702195
test redefinition of arguments
tgrandje Sep 17, 2024
0cf0908
Fix pipeline (?)
tgrandje Sep 17, 2024
ed37c01
Update pipeline.yaml
tgrandje Sep 17, 2024
1273d1b
Test template task without input
tgrandje Sep 17, 2024
cacf8f8
Fix input (?) for nested steps
tgrandje Sep 18, 2024
af214b0
Fix typo
tgrandje Sep 18, 2024
54d897f
Update input value
tgrandje Sep 18, 2024
9d66531
Fix year input
tgrandje Sep 18, 2024
8498753
Fix argument
tgrandje Sep 18, 2024
5061d80
Fix typos
tgrandje Sep 18, 2024
44e18f3
Fix typo
tgrandje Sep 18, 2024
2e11565
Add keys to dict_correspondance (dissolve by COMMUNE from IRIS or by …
tgrandje Sep 18, 2024
1b47289
Move fix-geometry option
tgrandje Sep 18, 2024
9b40af0
Manage gpkg format & add docstrings
tgrandje Sep 18, 2024
fe59a49
Fix missing ARR from CANTON metadata
tgrandje Sep 18, 2024
757fb63
Temp removal of fix-geometry
tgrandje Sep 18, 2024
107130d
Fix typo
tgrandje Sep 18, 2024
b95ad79
Better subprocess for windows
tgrandje Sep 18, 2024
a1f9ba3
Add copy of gis_file into
tgrandje Sep 18, 2024
880eee5
Fix to gpkg
tgrandje Sep 18, 2024
340179d
Fix typo
tgrandje Sep 19, 2024
f59954a
Manage exception and cascade traceback
tgrandje Sep 19, 2024
a8fa2de
Fix logs of failed datasets
tgrandje Sep 19, 2024
e48e487
Move from hard-coded fields to regex patterns (without hard coded vin…
tgrandje Sep 19, 2024
994f7b5
Fix split/dissolve with selection of column from regex pattern
tgrandje Sep 19, 2024
9e57e44
Update spyder dev dependency
tgrandje Sep 19, 2024
b21c07e
Add basic coverage for TOM affine transformations
tgrandje Sep 19, 2024
bd5a4f0
Add download of COG COMMUNE file
tgrandje Sep 19, 2024
8eacdcb
Add coverage of fields on geometric dissolution
tgrandje Sep 19, 2024
609b19c
Better logging for splitting datasets
tgrandje Sep 19, 2024
5efc979
Improve datasets' generation
tgrandje Sep 19, 2024
5e8a238
Start refacto of metadata with COMMUNE/ARM
tgrandje Sep 19, 2024
ecb668d
refacto metadata with cache
tgrandje Sep 19, 2024
1714c6d
Add ultramarine in sources & download pipeline
tgrandje Sep 19, 2024
14297a1
prepare metadata for TOM
tgrandje Sep 19, 2024
05c7f49
Add BANATIC SIREN/INSEE for cities in sources
tgrandje Sep 20, 2024
c8f38ce
Check header "Content-length" after full GET request
tgrandje Sep 20, 2024
ec46fa5
Check unique URLs for every vintage
tgrandje Sep 20, 2024
af4bbb1
Add EPCI-FP & EPT to sources from INSEE
tgrandje Sep 20, 2024
9ac7f9a
Add python-calamine for bad INSEE xlsx
tgrandje Sep 20, 2024
9bea2bc
Fix metadata preparation
tgrandje Sep 20, 2024
187f345
Update pipeline_constants.py
tgrandje Sep 20, 2024
d53d9fe
Update dict_correspondance.py
tgrandje Sep 20, 2024
0d71b60
invert IRIS/COMMUNE preference for metadata (to keep population)
tgrandje Sep 20, 2024
242f43d
Fix typo
tgrandje Sep 20, 2024
f54fd8f
Update geodataset.py
tgrandje Sep 20, 2024
e6d72b6
Fix column name from metadata
tgrandje Sep 20, 2024
0256bfb
handle existing tempdir for local tests
tgrandje Sep 20, 2024
023015a
filter vintage -> force default value in dev pipeline
tgrandje Sep 20, 2024
c0e706d
handle missing ENVIRONMENT environ variable
tgrandje Sep 20, 2024
abf0943
Fix typos
tgrandje Sep 20, 2024
25b060c
Upgrade cartiflette version
tgrandje Sep 20, 2024
c4f01b7
Reset specific selections for geodatasets' affine transformations
tgrandje Sep 20, 2024
07d7c0e
Fix dissolved layer's name & handle multiple keys for dissolution
tgrandje Sep 20, 2024
002a42a
Fix dissolution/split in s3geodataset
tgrandje Sep 20, 2024
64be7c9
Fix geopackage storage on S3
tgrandje Sep 20, 2024
cbdd5a9
quick and dirty hack to handle shapefile generation
tgrandje Sep 20, 2024
01f11de
Comment out tests
tgrandje Sep 20, 2024
96304c8
Merge branch 'main' into feat/refacto_pipeline_first_steps
tgrandje Sep 20, 2024
3a7f9e2
Fix create_path_bucket induced bug on S3 part
tgrandje Sep 20, 2024
94875ee
Handle missing siren
tgrandje Sep 20, 2024
411abe0
handle missing metadata on vintage
tgrandje Sep 21, 2024
f68941e
remove dict_corresp from arguments (keep this a constant)
tgrandje Sep 21, 2024
d3a464b
Update dict_correspondance.py
tgrandje Sep 21, 2024
20456d3
Add staticmethod find_column_name to S3Dataset
tgrandje Sep 21, 2024
93758ba
Check dissolution/aggregation's presence before starting dataset's ge…
tgrandje Sep 21, 2024
c6797f5
Refacto S3GeoDataset with new find_column_name method
tgrandje Sep 21, 2024
daf0080
Try to get IdF entities by BBox
tgrandje Sep 21, 2024
e00a2e1
Fix france_entiere/drom bug
tgrandje Sep 21, 2024
4296d13
temp simplification of pipeline
tgrandje Sep 21, 2024
25a6dc8
Fix ARM bug
tgrandje Sep 21, 2024
18a1f1f
Switch to ARM for base geodataset (from adminexpress)
tgrandje Sep 22, 2024
49d7b2a
mark metadata with IDF tag and switch from COMMUNE to ARM
tgrandje Sep 22, 2024
82f0757
Rename "mercator" to "epsg" (allow for native different projections)
tgrandje Sep 22, 2024
03a7afa
Try to capture/drop IDF entities for bringing closer
tgrandje Sep 22, 2024
ba09ba8
Fix typo
tgrandje Sep 22, 2024
6a1bcab
Adjust zoom level for IdF dep level
tgrandje Sep 22, 2024
a1fb81c
Update crossproduct
tgrandje Sep 22, 2024
62a2af1
Update pipeline with result on last step and updated config default v…
tgrandje Sep 22, 2024
b5f44c2
Fix IDF js selector
tgrandje Sep 22, 2024
99bdeb5
drop IDF at finish
tgrandje Sep 22, 2024
9ef5cbd
Rename crossproduct outputs (epsg, format_output)
tgrandje Sep 22, 2024
ffbed77
Fix pipeline
tgrandje Sep 22, 2024
bd18794
Fix typo
tgrandje Sep 22, 2024
89c215d
Update defaults and make sure dir exists for ultimate result
tgrandje Sep 22, 2024
f2e8dd0
Fix(?) CANTON & ARR unicity
tgrandje Sep 22, 2024
92eb80e
Hack to add PARIS to CANTON metadata and ensure IDF tagging
tgrandje Sep 22, 2024
3e9cd07
Fix ARM dissolution (?)
tgrandje Sep 22, 2024
a49af38
Reduce zoom for IDF
tgrandje Sep 22, 2024
e2d79d1
Update IDF Zoom Level
tgrandje Sep 23, 2024
fb94a52
Fix paris CANTON bug
tgrandje Sep 23, 2024
21ca088
Update prepare_cog_metadata.py
tgrandje Sep 23, 2024
973bdad
Collect IRIS populations
tgrandje Sep 23, 2024
ee9d11f
Fix IRIS-POPULATION vintage
tgrandje Sep 23, 2024
72651f1
Remove unused localpath argument
tgrandje Sep 23, 2024
aa3c108
Refacto metadata factory
tgrandje Sep 23, 2024
c303f79
Handle missing geodata vintages
tgrandje Sep 23, 2024
73cbc5d
Fix zoom level
tgrandje Sep 23, 2024
9969340
Fix upload COMMUNE metadata
tgrandje Sep 23, 2024
7b44059
Update crossproduct
tgrandje Sep 23, 2024
f6014b7
Simplify pipeline constants
tgrandje Sep 23, 2024
9f995f3
Update keys for joining metadata/geodata
tgrandje Sep 23, 2024
7222b51
Remove calamine
tgrandje Sep 23, 2024
60b8e48
handle missing CANTONS in metadata
tgrandje Sep 24, 2024
67fbf70
Allow to create dynamic field with mapshaper
tgrandje Sep 24, 2024
7615a41
Fix keys for CANTON & ARR
tgrandje Sep 24, 2024
d1ea745
Create a dynamic CAN field (composite of dep & canton) in geodataset
tgrandje Sep 24, 2024
5c5786b
handle collection of raw xlsx files
tgrandje Sep 24, 2024
d4ad6a0
collect mayotte population
tgrandje Sep 24, 2024
841d8c9
Update mapshaper_remove_cities_with_districts.py
tgrandje Sep 24, 2024
b94322d
Update CANTON metadata factory
tgrandje Sep 24, 2024
a81e000
Create a composite COMMUNE/TOM from IRIS base layer
tgrandje Sep 24, 2024
b4cd216
Update docstring
tgrandje Sep 24, 2024
aa1fe1c
Fix typos
tgrandje Sep 24, 2024
ea11a8f
Cancel mayotte collection
tgrandje Sep 24, 2024
aadc909
Zoom in ARM
tgrandje Sep 24, 2024
0ae900e
Fix read_excel with polars
tgrandje Sep 24, 2024
ebccb8e
Analyse columns to be kept with dropna before dissolve
tgrandje Sep 24, 2024
8b6e458
Hack to avoid zooming too largely outside IdF
tgrandje Sep 24, 2024
bf350e4
Fix typo
tgrandje Sep 24, 2024
2936ae0
Zoom in EPT
tgrandje Sep 24, 2024
bb86bb0
Bugfix
tgrandje Sep 24, 2024
fcd1960
Bugfix
tgrandje Sep 24, 2024
aceb3ac
Add sources with INSEE zonages
tgrandje Sep 24, 2024
3519263
All labels for all zoning
tgrandje Sep 24, 2024
94ce27a
Fix strange pyogrio.errors.CRSError on excel files
tgrandje Sep 24, 2024
a08a2cd
Fix zoning labels
tgrandje Sep 24, 2024
39ebfe5
Fix typo
tgrandje Sep 24, 2024
2afe301
Reset pipeline constants for full test
tgrandje Sep 24, 2024
ad8d685
Set all vintages with preprod env
tgrandje Sep 24, 2024
fee9e23
Fix sources
tgrandje Sep 25, 2024
c816c77
Update selection of vintages
tgrandje Sep 25, 2024
c9f48ce
Fix typo in IRIS-GE source
tgrandje Sep 25, 2024
8731a0b
Bugfix : dynamic years filtering for collection, using dev environmen…
tgrandje Sep 25, 2024
42f7557
Rename pipeline step to avoid confusion
tgrandje Sep 25, 2024
24f6e06
Better logging in crossproduct and pruning dead code
tgrandje Sep 25, 2024
97d3ee1
Update datasets with resolution config
tgrandje Sep 25, 2024
bcdf249
Escape single quote for mapshaper
tgrandje Sep 25, 2024
4f1bd6c
raise ValueError if no available combination
tgrandje Sep 25, 2024
0d33f10
Disable failFast for dag
tgrandje Sep 25, 2024
1d96065
Set pipeline configurations for dev environment
tgrandje Sep 26, 2024
788a47f
test pipeline with multithreading on last step
tgrandje Sep 26, 2024
b5e7ef6
Update log entries
tgrandje Sep 26, 2024
c8c1e18
Fix multithreading
tgrandje Sep 27, 2024
0a9bb1d
Test multithreading in dev environ
tgrandje Sep 27, 2024
901c8de
Merge branch 'feat/refacto_pipeline_first_steps' of https://github.co…
tgrandje Sep 27, 2024
f1306a9
Update mapshaper_split_from_s3.py
tgrandje Sep 28, 2024
c6cf7e0
Update mapshaper_split_from_s3.py
tgrandje Sep 28, 2024
5abbd8f
Fix pipeline
tgrandje Sep 28, 2024
61b4ad7
Test preprod pipeline
tgrandje Sep 28, 2024
f7729d2
Move format_output & crs out from crossproduct
tgrandje Sep 28, 2024
dacd0a4
Fix pipeline arg
tgrandje Sep 29, 2024
f4d381d
Fix arguments
tgrandje Sep 29, 2024
1f0c38b
Fix return
tgrandje Sep 30, 2024
58cb56a
Fix pipeline return
tgrandje Oct 1, 2024
18660ac
Update make_metadata_datasets.py
tgrandje Oct 1, 2024
0752aec
dev pipeline
tgrandje Oct 1, 2024
d3d4a43
Remove closer IDF for AAV
tgrandje Oct 1, 2024
07e32ed
update spyder
tgrandje Oct 1, 2024
f551be5
update source with fixed URL for 2019/2020 on admin-express-carto
tgrandje Oct 1, 2024
6e5d03e
Fix metadata
tgrandje Oct 1, 2024
f96d922
Update mapshaper_split_from_s3.py
tgrandje Oct 1, 2024
3db0db3
Test on preprod config
tgrandje Oct 2, 2024
f30cd9e
Merge branch 'main' into feat/refacto_pipeline_first_steps
linogaliana Oct 17, 2024
67e56db
Zoom IDF on 75, 92, 93, 94 only
tgrandje Oct 18, 2024
073b9b8
test pipeline
tgrandje Oct 18, 2024
bb1825b
bugfix
tgrandje Oct 18, 2024
b407cce
Update pipeline.yaml
tgrandje Oct 18, 2024
c3f1974
Update pipeline.yaml
tgrandje Oct 18, 2024
ff8bc51
Remove IdF dissolution
tgrandje Oct 18, 2024
9da8cb4
Remove closer IDF for EPT
tgrandje Oct 18, 2024
4e4bd6a
Fix json default
tgrandje Oct 18, 2024
cacbef9
create FRANCE_ENTIERE_IDF_DROM_RAPPROCHES
tgrandje Oct 18, 2024
ec227c1
Fix loop without multithreading
tgrandje Oct 28, 2024
46dcca1
Remove closer IDF for EPCI & EPT
tgrandje Oct 28, 2024
31e438c
refacto Idf zoom/position
tgrandje Oct 28, 2024
ef68adc
fix type hint
tgrandje Oct 28, 2024
ab3b8de
Fix Paris canton
tgrandje Oct 29, 2024
56b4b30
Preprod pipeline
tgrandje Oct 29, 2024
3 changes: 2 additions & 1 deletion .gitignore
@@ -130,4 +130,5 @@ dmypy.json
 
 *.sqlite
 *.sqlite*
-/argo-pipeline/src/cartiflette-s3-cache
+**/cartiflette-s3-cache/*
+*.json
2 changes: 2 additions & 0 deletions Dockerfile
@@ -31,4 +31,6 @@ COPY docker/test.py .
 RUN curl https://install.python-poetry.org/ | python -
 RUN poetry install --only main --no-interaction
 
+# TODO : is this necessary? This should throw an exception if datasets have not
+# already been (manually) uploaded ?
 CMD ["python", "test.py"]
11 changes: 7 additions & 4 deletions argo-pipeline/api.py
@@ -1,10 +1,12 @@
 """A simple API to expose cartiflette files"""
 
+import typing
 from fastapi import FastAPI, Response
 from fastapi.responses import FileResponse
 
 from cartiflette.api import download_from_cartiflette_inner
-from cartiflette.config import PATH_WITHIN_BUCKET
+from cartiflette.config import PATH_WITHIN_BUCKET, DATASETS_HIGH_RESOLUTION
+from cartiflette.pipeline_constants import COG_TERRITOIRE  # , IRIS
 
 app = FastAPI(
     title="API de récupération des fonds de carte avec <code>cartiflette</code>",
@@ -42,9 +44,10 @@ def download_from_cartiflette_api(
         year=2022,
         crs=4326,
         simplification=simplification,
-        provider="IGN",
-        dataset_family="ADMINEXPRESS",
-        source="EXPRESS-COG-CARTO-TERRITOIRE",
+        provider="Cartiflette",
+        dataset_family="production",
+        # TODO : source can also be IRIS[DATASETS_HIGH_RESOLUTION]
+        source=COG_TERRITOIRE[DATASETS_HIGH_RESOLUTION],
         return_as_json=False,
         path_within_bucket=PATH_WITHIN_BUCKET,
     )
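
The change in argo-pipeline/api.py replaces a hard-coded `source` string with `COG_TERRITOIRE[DATASETS_HIGH_RESOLUTION]`, which suggests a constant indexed by a resolution flag. A minimal sketch of that pattern follows. It assumes `DATASETS_HIGH_RESOLUTION` is a boolean and `COG_TERRITOIRE` is a dict keyed by it; the dict values and the helper `pick_source` are hypothetical and are not taken from cartiflette.

```python
# Hypothetical stand-ins for the cartiflette constants used in the diff.
# Assumption: the flag is a boolean and the constant is a dict keyed by it.
DATASETS_HIGH_RESOLUTION = False

COG_TERRITOIRE = {
    # Illustrative values only; the real source names live in
    # cartiflette.pipeline_constants and may differ.
    False: "EXPRESS-COG-CARTO-TERRITOIRE",
    True: "EXPRESS-COG-TERRITOIRE",
}


def pick_source(sources: dict, high_resolution: bool) -> str:
    """Return the source name matching the requested resolution."""
    try:
        return sources[high_resolution]
    except KeyError as exc:
        # Fail loudly instead of silently falling back to another source.
        raise ValueError(
            f"no source configured for high_resolution={high_resolution}"
        ) from exc


source = pick_source(COG_TERRITOIRE, DATASETS_HIGH_RESOLUTION)
print(source)
```

Keeping the source names in one constant means the API and the pipeline cannot drift apart when the resolution setting changes, which appears to be the intent of the diff. The same lookup would apply to the `IRIS` constant mentioned in the TODO, if it follows the same shape.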