support parallel running script
smercier committed Sep 12, 2019
1 parent db37fb0 commit c4246c5
Showing 6 changed files with 63 additions and 57 deletions.
48 changes: 24 additions & 24 deletions README.md
@@ -24,7 +24,7 @@ We need to create a development virtual environment available only for the activ
cd ./pgrastertime
echo 'PATH="$HOME/.local/bin/:$PATH"' >>~/.bashrc
pip3 install --user pipenv
pip3 install
pipenv install
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
pipenv run pip install "GDAL<=$(gdal-config --version)"
@@ -89,11 +89,12 @@ Basic postgresql database init

```
psql -h localhost -p 5432 -U postgres -W
CREATE USER loader WITH PASSWORD 'ChangeMe';
CREATE USER loader WITH PASSWORD 'loader';
ALTER USER loader WITH SUPERUSER;
CREATE DATABASE pgraster WITH OWNER loader ENCODING 'UTF8';
\q
psql -h localhost -p 5432 -d pgraster -U loader -c "CREATE EXTENSION postgis;"
psql -h localhost -p 5432 -U loader -W -d pgraster -f ./sql/init_exta.sql
psql -h localhost -p 5432 -d pgraster -U loader -W -c "CREATE EXTENSION postgis;"
psql -h localhost -p 5432 -d pgraster -U loader -W -f ./sql/init_exta.sql
```
From the pgrastertime directory, copy the `development.ini` file to `local.ini` and
edit the database connection string for the sqlalchemy module:
@@ -115,7 +116,7 @@ psql -h localhost -p 5432 -U loader -W -d pgraster -f ./sql/wis/dfo_all_tables.s
# Running pgrastertime

```
python pgrastertime.py -h
python3 pgrastertime.py -h
usage: pgrastertime [-h] [--config config_file] --tablename TABLENAME
[--sqlfiles SQLFILES] [--dataset DATASET]
[--reader READER]
@@ -161,75 +162,74 @@ The `local.ini` is the default configuration file. You can have multiple config
can use `-c` flag to use a different one.

```
python pgrastertime.py -c myconf_dev.ini -r ./data/ -p xml
python3 pgrastertime.py -c myconf_dev.ini -r ./data/ -p xml
```

*NOTE:* When using GDAL with a path, add this environment variable:
```
export GDAL_DATA=/usr/share/gdal/2.2/
```

## Examples

The first iteration of pgRastertime was designed to import your raster data into a PostgreSQL database. You need to
edit your `local.ini` file to set your PostgreSQL connection info, local path and postprocess file.

```
python pgrastertime.py -t testtable -r ./data/18g063330911_0250.object.xml -p load
python3 pgrastertime.py -t testtable -r ./data/18g063330911_0250.object.xml -p load
```

A dedicated driver was added for a raster format defined by an XML file. You can create your own
driver in the `processes` folder. For example, the `xml_import.py` driver imports the files linked to a specific XML object file.

```
python pgrastertime.py -t testtable -r ./data/18g063330911_0250.object.xml -p xml
python3 pgrastertime.py -f -t testtable -r ./data/18g063330911_0250.object.xml -p xml
```

You can add post-process SQL script(s) on the command line (multiple scripts are separated by commas).
Post-process scripts (`-s` option) are executed after each raster is updated in the table. Use the `pgrastertime` template
name in your scripts; pgrastertime will find and replace it with the target table name given by the `-t` flag.

```
python pgrastertime.py -s ./sql/basePostProcess.sql -t testtable -f -r ./data/ -p xml
python3 pgrastertime.py -s ./sql/basePostProcess.sql -t testtable -f -r ./data/ -p xml
```
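
The template substitution described above can be pictured with a small sketch. It is only an illustration: the function name `render_sql` and the plain string replacement are assumptions, and the real pgrastertime code may do this differently.

```python
def render_sql(sql_text, tablename, placeholder="pgrastertime"):
    """Return sql_text with every occurrence of the placeholder
    replaced by the target table name (hypothetical helper)."""
    return sql_text.replace(placeholder, tablename)


# Example: a post-process statement written against the template name.
template = "UPDATE pgrastertime SET resolution = 4; SELECT count(*) FROM pgrastertime_metadata;"
print(render_sql(template, "testtable"))
```

Because this is a blind text replacement, any other use of the word `pgrastertime` inside a script (a comment, a schema name) would be rewritten as well.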

If secteur_sondage (dfo) is loaded in the database, we can use the postprocess script.
python pgrastertime.py -s ./sql/postprocess.sql -t testtable -f -r ./data/ -p xml
```
python3 pgrastertime.py -s ./sql/postprocess.sql -t testtable -f -r ./data/ -p xml
```
To validate the postprocess:
select metadata_id,resolution , st_scalex(raster),st_area(tile_geom) ,filename ,st_numbands(raster)
from soundingue ;
```
select metadata_id,resolution , st_scalex(raster),st_area(tile_geom) ,filename ,st_numbands(raster) from soundingue;
```

* The optional force flag `-f` forces overwriting the target table. When force is not used and `-r` is a directory, validation is performed so that ONLY rasters not already processed are imported. This check is made through the metadata table of the target raster table.
The optional force flag `-f` forces overwriting the target table. When force is not used and `-r` is a directory, validation is performed so that ONLY rasters not already processed are imported. This check is made through the metadata table of the target raster table.

You can `deploy` your pgrastertable table (`-t` flag) to your production table through the `./sql/deploy.sql` script (edit this
script to suit your needs).

```
python pgrastertime.py -p deploy -t testtable
python3 pgrastertime.py -p deploy -t testtable
```

sql validation
SQL for validation
```
select count(*) from soundings_4m; -- should be greater than 0
select count(*) from soundings_error; -- should be 0
```

The validation script can be used and updated for your needs.

```
python pgrastertime.py -p validate -t datatest
python3 pgrastertime.py -p validate -t datatest
```

This custom sedimentation process needs multiple input values, added with `-m` flags. This example writes the table processed by a custom PostgreSQL function to a tif file:

```
python pgrastertime.py -t soundings_4m -m time_start='2017-12-31' -m time_end='2018-10-22' -m resolution=4 -d ./datatest/secteur.shp -o ./datatest/sm1.tif -of gtiff -v -p sedimentation
python3 pgrastertime.py -t soundings_4m -m time_start='2017-12-31' -m time_end='2018-10-22' -m resolution=4 -d ./datatest/secteur.shp -o ./datatest/sm1.tif -of gtiff -v -p sedimentation
```

In this example, the output is a PostgreSQL table, `my_table`:

```
python pgrastertime.py -t soundings_4m -m time_start='2017-12-31' -m time_end='2018-10-22' -m resolution=4 -d ./datatest/secteur.shp -o my_table -of pg -v -p sedimentation
python3 pgrastertime.py -t soundings_4m -m time_start='2017-12-31' -m time_end='2018-10-22' -m resolution=4 -d ./datatest/secteur.shp -o my_table -of pg -v -p sedimentation
```

## Todo list
8 changes: 4 additions & 4 deletions development.ini
@@ -2,15 +2,15 @@

# Connection to PGSQL
# This DB needs to have the PostGIS Raster extension enabled
sqlalchemy.url = postgresql://user:password@localhost:5432/pgrastertime
sqlalchemy.url = postgresql://user:password@localhost:5432/pgraster

# Resolutions at which we need to create overviews
output.resolutions = 0.25,0.5,1,2,4,8,16

# customizable pgrastertime table
db.sqlpath = /sql
db.pgrastertable = /sql/pgrastertime_table.sql
db.metadatatable = /sql/metadata_table.sql
db.sqlpath = ./sql
db.pgrastertable = ./sql/pgrastertime_table.sql
db.metadatatable = ./sql/metadata_table.sql


# A generic, single database configuration.
3 changes: 2 additions & 1 deletion pgrastertime/commandline.py
@@ -60,8 +60,9 @@ def parse_arguments():
'--output-format', '-of', default='', choices=['gtiff', 'pg'],
help='Output format Geotiff or PostGIS table'
)
## Python 3.6.8 doesn't seem to support action='append' here; needs investigation
parser.add_argument(
'--param', '-m', nargs='?',default='', action='append',
'--param', '-m', nargs='?',default='',
help='Option(s) input'
)
parser.add_argument(
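
Review note on the `--param` change above: dropping `action='append'` means a repeated `-m` keeps only the last value, while the README's sedimentation example passes `-m` three times. One possible explanation for the original failure, not verified against this repository, is the combination of `action='append'` with `default=''`: argparse appends to a copy of the current value, and a string has no `append` method. The sketch below assumes a `None` default instead; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Minimal parser showing a repeatable -m/--param option."""
    parser = argparse.ArgumentParser(prog="pgrastertime")
    parser.add_argument(
        "--param", "-m", action="append", default=None,
        help="Option(s) input as key=value; may be given several times",
    )
    return parser


args = build_parser().parse_args(
    ["-m", "time_start=2017-12-31", "-m", "resolution=4"]
)
# args.param is a list of every -m value, or None when -m is absent.
print(args.param or [])
```

With this shape, the consumer receives a list (or `None`), so code such as `getParams` can keep iterating over the values.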
5 changes: 2 additions & 3 deletions pgrastertime/processes/xml_import.py
@@ -30,7 +30,6 @@ def getConParam(self):
return conDic

def insertXML(self, xml_filename):

#this bash file create ins.sql to run
cmd = "sh ./xml.sh " + xml_filename + " " + self.tablename
if subprocess.call(cmd, shell=True) != 0:
@@ -44,11 +43,11 @@ def insertXML(self, xml_filename):
con_pg['pg_port'],
con_pg['pg_user'],
con_pg['pg_dbname'])

print(cmd)
if subprocess.call(cmd, shell=True) != 0:
print("Fail to insert sql in database...")
return False
os.remove("ins.sql")
#os.remove("ins.sql")

return True

44 changes: 25 additions & 19 deletions pgrastertime/processes/xml_resampling.py
@@ -25,9 +25,11 @@ def __init__(self, xml_filename, tablename, force, sqlfiles, verbose, dry_run,us
def getParams(self,param):
# this process need gdal_path parametre
p = ''
for i in self.userparam:
if i.split("=")[0].lower() == param:
p = i.split("=")[1]
#print('self.userparam:'+self.userparam)
#for i in self.userparam:
#if i.split("=")[0].lower() == param:
if self.userparam.split("=")[0].lower() == param:
p = self.userparam.split("=")[1]
return p
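
Review note on `getParams`: the new version assumes `self.userparam` is a single `key=value` string, while the old one assumed a list. A tolerant lookup could accept both shapes. The sketch below is an assumption-laden illustration (`get_param` is a hypothetical free function, not part of this commit); it also uses `partition` so that a value containing `=` is not truncated.

```python
def get_param(userparam, name, default=""):
    """Look up `name` in userparam, which may be None, a single
    'key=value' string, or a list of such strings (hypothetical helper)."""
    if not userparam:
        return default
    items = [userparam] if isinstance(userparam, str) else userparam
    value = default
    for item in items:
        key, sep, val = item.partition("=")
        # Like the original loop, the last matching entry wins.
        if sep and key.strip().lower() == name:
            value = val
    return value


print(get_param("gdal_path=/opt/gdal", "gdal_path"))
print(get_param(["resolution=4", "gdal_path=/opt/gdal"], "gdal_path"))
```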


@@ -52,16 +54,16 @@ def insertXML(self, xml_filename):


os.environ["PGPASSWORD"] = con_pg['pg_pw']
cmd = "psql -q -h %s -p %s -U %s -d %s -f ins.sql" % (
cmd = "psql -q -h %s -p %s -U %s -d %s -f %s.sql" % (
con_pg['pg_host'],
con_pg['pg_port'],
con_pg['pg_user'],
con_pg['pg_dbname'])

con_pg['pg_dbname'],
self.tablename)
if subprocess.call(cmd, shell=True) != 0:
print("Fail to insert sql in database...")
return False
os.remove("ins.sql")
os.remove( self.tablename + ".sql")

return True
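
Review note on the `psql` call: the command is built by string formatting and run with `shell=True`, so a table name containing spaces or shell metacharacters would break it. A possible alternative, sketched here under the assumption that `con_pg` keeps the keys used above, passes an argument list and a per-call environment. `build_psql_command` and `run_sql_file` are hypothetical names.

```python
import os
import subprocess


def build_psql_command(con_pg, sql_file):
    """Return the psql invocation as an argument list (no shell quoting needed)."""
    return [
        "psql", "-q",
        "-h", str(con_pg["pg_host"]),
        "-p", str(con_pg["pg_port"]),
        "-U", str(con_pg["pg_user"]),
        "-d", str(con_pg["pg_dbname"]),
        "-f", sql_file,
    ]


def run_sql_file(con_pg, sql_file):
    """Run the SQL file with psql; remove it only when psql succeeds."""
    # Pass the password to this child process only, not to os.environ.
    env = dict(os.environ, PGPASSWORD=str(con_pg["pg_pw"]))
    if subprocess.call(build_psql_command(con_pg, sql_file), env=env) != 0:
        print("Failed to insert SQL into database...")
        return False
    os.remove(sql_file)
    return True
```

A call such as `run_sql_file(con_pg, self.tablename + ".sql")` would then mirror the behaviour of the hunk above.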

@@ -78,7 +80,6 @@ def ifLoaded(self,rester_prefix):
# NOTE: at this stage, even at the first raster, the table is already created but without any row...

sql = "SELECT count(*) as cnt FROM %s_metadata WHERE objnam='%s'" % (self.tablename,raster_id)
print(sql)
try:
r = DBSession().execute(sql).fetchone()
if r[0] == 0:
@@ -178,11 +179,11 @@ def getGDALcmd(self, gdalwarp_path, source_file, target_file, resolution, resamp
target_file)
return cmd

def clearTmp(self):
def clearTmp(self,prefix):
dir_tmp_name = "/tmp/"
tmp = os.listdir(dir_tmp_name)
for item in tmp:
if item.endswith(".tiff"):
if item.startswith(prefix):
os.remove(os.path.join(dir_tmp_name, item))
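
Review note on `clearTmp`: matching on `startswith(prefix)` in `/tmp/` also removes files belonging to another run whose table name begins with the same string (for example `testtable` and `testtable2`), and `/tmp/` is hard-coded although `tempfile` may use a different directory. A different technique, shown here only as a sketch with hypothetical helper names, gives each run its own directory and removes that directory at the end.

```python
import os
import shutil
import tempfile


def make_workdir(tablename):
    """Create a private temp directory for one import run."""
    return tempfile.mkdtemp(prefix=tablename + "_")


def tmp_raster_path(workdir, resolution, label):
    """Build a temp raster path such as <workdir>/4_mean_step1.tiff."""
    return os.path.join(workdir, "%s_%s.tiff" % (resolution, label))


def clear_workdir(workdir):
    """Remove the run's directory and everything in it."""
    shutil.rmtree(workdir, ignore_errors=True)


workdir = make_workdir("testtable")
step1 = tmp_raster_path(workdir, "4", "mean_step1")
clear_workdir(workdir)
```

Since `mkdtemp` creates a uniquely named directory, two parallel runs on similarly named tables cannot delete each other's files.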

def ImportXmlObject(self, raster_prefix):
@@ -193,6 +194,12 @@ def ImportXmlObject(self, raster_prefix):

# Check gdal_path param
gdal_path = self.getParams('gdal_path')
if gdal_path !='':
#we will need to set the GDAL_DATA path
os.environ["GDAL_DATA"] = gdal_path + '/data/'
# and fix the path of bin file
gdal_path = gdal_path + '/apps/'


# loop in all raster type
nb_of_raster = 0
@@ -213,7 +220,7 @@ def ImportXmlObject(self, raster_prefix):
# start at the nearest resolution
if float(resolution) >= float(reader.resolution):

output_raster_filename = tempfile.NamedTemporaryFile().name + "_" + resolution + ".tiff" #reader.get_file(resolution)
output_raster_filename = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + ".tiff" #reader.get_file(resolution)
# we will keep a sequential number for all resolution, 1, 2, 3, 4, 5 ...
resolution_id += 1
step1=step2=step2a=step2b=step3=step4=''
@@ -247,13 +254,13 @@ def ImportXmlObject(self, raster_prefix):

if raster_type == 'mean':
## see http://10.208.34.178/projects/wis-sivn/wiki/Resampling
tmp_step1 = tempfile.NamedTemporaryFile().name + "_" + resolution + "_mean_step1.tiff"
tmp_step1 = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_mean_step1.tiff"
step1 = "python gdal_calc.py --overwrite -A %s -B %s --calc='%s' --outfile='%s'" % (
raster_dict['density'][resolution_id-1],
raster_dict['mean'][resolution_id-1],
"nan_to_num(multiply(A,B))", # "A*B",
tmp_step1)
tmp_step2 = tempfile.NamedTemporaryFile().name + "_" + resolution + "_mean_step2.tiff"
tmp_step2 = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_mean_step2.tiff"
step2 = self.getGDALcmd(reader.gdalwarp_path,
tmp_step1,
tmp_step2,
@@ -278,27 +285,26 @@ def ImportXmlObject(self, raster_prefix):

if raster_type == 'stddev':
## see http://10.208.34.178/projects/wis-sivn/wiki/Resampling
tmp_step1 = tempfile.NamedTemporaryFile().name + "_" + resolution + "_stddev_step1.tiff"
tmp_step1 = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_stddev_step1.tiff"
step1 = "python gdal_calc.py --overwrite -A %s -B %s --calc='%s' --outfile='%s'" % (
raster_dict['density'][resolution_id-1],
raster_dict['stddev'][resolution_id-1],
"nan_to_num((A-1)*(B*B))",
tmp_step1)

tmp_step2 = tempfile.NamedTemporaryFile().name + "_" + resolution + "_stddev_step2.tiff"
tmp_step2 = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_stddev_step2.tiff"
step2 = self.getGDALcmd(reader.gdalwarp_path,
tmp_step1,
tmp_step2,
resolution,
'sum')
#reclass 1 - 0
tmp_step2a = tempfile.NamedTemporaryFile().name + "_" + resolution + "_stddev_step2a.tiff"
tmp_step2a = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_stddev_step2a.tiff"
step2a = "python gdal_calc.py --overwrite --co='COMPRESS=DEFLATE' -A %s --calc='%s' --outfile='%s'" % (
raster_dict['density'][resolution_id-1],
"1*(A<3.4028234663852886e+38)",
tmp_step2a)

tmp_step2b = tempfile.NamedTemporaryFile().name + "_" + resolution + "_stddev_step2b.tiff"
tmp_step2b = tempfile.NamedTemporaryFile(prefix=self.tablename).name + "_" + resolution + "_stddev_step2b.tiff"
step2b = self.getGDALcmd(reader.gdalwarp_path,
tmp_step2a,
tmp_step2b,
@@ -409,7 +415,7 @@ def ImportXmlObject(self, raster_prefix):


# we need to flush all tmp file of this object
self.clearTmp()
self.clearTmp(self.tablename)

return "SUCCESS"

12 changes: 6 additions & 6 deletions xml.sh
@@ -1,8 +1,8 @@
echo "create temporary table metadatatemp as SELECT xml " > ins.sql
echo "\$\$ " >> ins.sql
cat $1 >> ins.sql
echo "\$\$ AS objectXml;" >> ins.sql
sed -i '3d' ins.sql
echo "create temporary table metadatatemp as SELECT xml " > $2.sql
echo "\$\$ " >> $2.sql
cat $1 >> $2.sql
echo "\$\$ AS objectXml;" >> $2.sql
sed -i '3d' $2.sql
echo "INSERT INTO $2_metadata
SELECT xmltable.*
FROM metadatatemp,
@@ -38,4 +38,4 @@ echo "INSERT INTO $2_metadata
srftyp text PATH 'Attribute[@name = \"srftyp\"]/Value',
sursso text PATH 'Attribute[@name = \"sursso\"]/Value',
uidcre text PATH 'Attribute[@name = \"uidcre\"]/Value');
">>ins.sql
">>$2.sql
