Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest process for the realtime data is not using origin consistently #1

Open
caparker opened this issue Sep 27, 2022 · 0 comments
Open

Comments

@caparker
Copy link
Collaborator

During the process of importing the realtime data the sensor_nodes_id is found by either matching the location or site_name + source_name and in both cases is also requiring that the existing sensor_node has origin = 'OPENAQ'.

UPDATE tempfetchdata_nodes t
SET sensor_nodes_id = sn.sensor_nodes_id
, added = FALSE
FROM sensor_nodes sn
WHERE t.geom IS NOT NULL
AND st_dwithin(sn.geom, t.geom, .0001)
AND origin='OPENAQ';

UPDATE tempfetchdata_nodes t
SET sensor_nodes_id = sn.sensor_nodes_id
, added = FALSE
FROM sensor_nodes sn
WHERE t.sensor_nodes_id is null
AND t.site_name is not null
AND t.source_name is not null
AND t.site_name = sn.site_name
AND t.source_name=sn.source_name
AND origin='OPENAQ';

However, when a sensor node is not found and must be added we are not adding the origin. This means that even after its added it cant be found again and so it will be added again.

-- Create new nodes where they don't exist
WITH sn AS (
INSERT INTO sensor_nodes (
    site_name,
    metadata,
    geom,
    source_name,
    city,
    country,
    ismobile
)
SELECT
    site_name,
    metadata,
    geom,
    source_name,
    city,
    country,
    ismobile
FROM tempfetchdata_nodes t
WHERE t.sensor_nodes_id is NULL
RETURNING *
)
UPDATE tempfetchdata_nodes tf SET sensor_nodes_id = sn.sensor_nodes_id
FROM sn WHERE tf.sensor_nodes_id is null
and row(tf.site_name, tf.geom, tf.source_name) is not distinct
from row(sn.site_name, sn.geom, sn.source_name);

For now I am going to just update that sql statement to include the origin value but we might want to think about a method that does not hardcode the origin or even depend on it.

WITH sn AS (
INSERT INTO sensor_nodes (
    site_name,
    metadata,
    geom,
    source_name,
    city,
    country,
    ismobile,
    origin
)
SELECT
    site_name,
    metadata,
    geom,
    source_name,
    city,
    country,
    ismobile,
    'OPENAQ'
FROM tempfetchdata_nodes t
WHERE t.sensor_nodes_id is NULL
RETURNING *
), inserted AS (
UPDATE tempfetchdata_nodes tf SET
 sensor_nodes_id = sn.sensor_nodes_id
 , added = TRUE
FROM sn
WHERE tf.sensor_nodes_id is null
and row(tf.site_name, tf.geom, tf.source_name) is not distinct from row(sn.site_name, sn.geom, sn.source_name)
)
SELECT COUNT(1) INTO __inserted_nodes
FROM sn;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant