Discussion: FROST-Server and TimescaleDB #538

scottlimmer · 2019-02-25T23:18:43Z

scottlimmer
Feb 25, 2019

Wondering if anyone has had any thoughts on the feasibility of adding TimescaleDB to the FROST-Server schema; the observations tables being the obvious candidate.

I've trialled TimescaleDB on a custom Postgres schema and found the performance to be exponentially better than a traditional database (80M row dataset).

I think there'd be benefits in converting the observations table into a hypertable however I think code review would be required to get the most out of TimescaleDB and its additional time-based functions

hylkevds · 2019-02-26T08:06:56Z

hylkevds
Feb 26, 2019
Maintainer

Timescale came up recently here as well, but we've not had time to look into it.
Have you tried turning the Observations table into a hypertable?
It would be good to know if:

It can be done without any changes to the FROST source
What the performance increase is for different queries (insert, update, delete & get, with and without the appropriate indices)
What changes to FROST should be made to get the most out of Timescale

0 replies

riedel · 2019-04-05T15:21:15Z

riedel
Apr 5, 2019

My feeling is that this makes only sense if FROST could expose some aggregate queries based on time series (see #26), which is outside of the standard.

Particularly a function like time_bucket_gapfill() would be of high value to anyone who wants implement a decent scalable web interface based on SensorThings.

0 replies

hylkevds · 2019-04-06T17:47:48Z

hylkevds
Apr 6, 2019
Maintainer

In the HERACLES project we store and visualise 100Hz data. By now we have nearly two years of data for three 100 Hz accelerometers, and four 40Hz sensors.
You can have a look at the graphs at: https://heracles-kb.server.de/servlet/is/6679/

You are right that aggregates are required to make this work, but there is no need to have this in the API. In almost all our use cases we calculate aggregates using an external tool (https://github.com/hylkevds/SensorThingsProcessor), that pushes the aggregate data back into FROST as a MultiDatastream with Average, Min, Max and Std-dev. Using this it is already trivial to build a decent scalable web interface based on SensorThings.

The problem with making aggregate functions part of the standard is that there are many ways to calculate aggregates, that depend on the exact sampling regime of the sensor. Simply taking the mathematical average of the numbers may work for sensors with a fixed sampling interval, but will not give sensible values in many other cases.

0 replies

chenkianwee · 2020-05-23T16:40:09Z

chenkianwee
May 23, 2020

Hi I have been working on a database project for my laboratory. In our lab, we build our own sensors and distribute it across campus. I realise the database will benefit greatly from adopting the sensorthings api. As mentioned above, I am also interested to integrate timescaledb with the sensorthings data model (especially the observations table).

I have tried to convert the observations table into a hypertable (timescaledb) but it failed as timescaledb requires the primary key to be a timestamp column. I was planning to use the result_time column. However, the observations table uses the id column as the primary key.

As mentioned in hylkevds post it is not required to integrate aggregate functions into the API, which is also not in the standard. However, I thought converting the observations table to hypertable will benefit the backend in maintaining the database and also in exploring the data locally, it might also help with the performance when ingesting and updating data.

It is possible to export the data and perform explorations on the data externally (e.g. in the case of using timescaledb, copy the observations table and migrate it to a hypertable and perform explorations ). It is an extra step, I thought it might be good if the frost-server can provide the option to convert the observations table into a hypertable. Just some thoughts on the subject.

Thus, one of my question is, instead of the id as primary key, is it possible to use a timestamp column as key , how will that affect the performance of frost-server.

0 replies

riedel · 2020-05-23T17:45:15Z

riedel
May 23, 2020

@limond sucessfully adopted timescale DB below frost for us with considerable speed up. We should document our index setup ( @johannes-riesterer )

0 replies

hylkevds · 2020-05-23T17:45:25Z

hylkevds
May 23, 2020
Maintainer

That timescaledb forces the primary key to be the time column surprises me greatly. That makes it useless for any application that gathers data from multiple sources, as a time collision is guaranteed to happen eventually... Unless the restriction is that the time column has to be a part of the primary key, and you can make it a combined PK of the time with another column.

That said, FROST doesn't really care what the DB thinks is the primary key, as long as the ID column is unique. So you are free to change the PK to whatever you see fit, and FROST won't complain until there is a PK violation and the insert fails.

0 replies

chenkianwee · 2020-05-23T18:10:18Z

chenkianwee
May 23, 2020

thanks for the reply. @hylkevds I think it is the case that time column has to be a part of the primary key and you can use another column (https://docs.timescale.com/latest/using-timescaledb/schema-management#indexing-best-practices). But I am not sure, I am new and exploring the implementation of these software stacks.

@riedel it will be useful to see how you guys integrated timescaledb. Thanks. Do let me know where I can reference your integration.

0 replies

riedel · 2020-05-25T11:58:01Z

riedel
May 25, 2020

@chenkianwee you can find our deployment here: https://github.com/SmartAQnet/smartaqnet-infrastructure/tree/cluster-dev

0 replies

limond · 2020-05-25T12:30:41Z

limond
May 25, 2020

@riedel @chenkianwee I was mentioned above, so I would like to add a few words despite me no longer working for TECO.
The README in the repository is probably outdated since I extensively documented the setup and maintenance in the project's internal wiki.
It might help to release these wiki pages and add them to the repository.
However, the index and hypertable setup in use can be found here: https://github.com/SmartAQnet/smartaqnet-infrastructure/blob/cluster-dev/dev/dev-databases/prepare_hypertable.sql

0 replies

riedel · 2020-05-25T12:47:43Z

riedel
May 25, 2020

Thanks @limond, good to hear from you (I was just was about to post the sql file after looking for it myself)

The README in the repository is probably outdated since I extensively documented the setup and maintenance in the project's internal wiki.

I just asked @johannes-riesterer to merge the documentation back, but I think there is not much more on the index/hypertable setup part except for this quote

The preparation differs from the original FROST instructions. The postgis server uses a TimescaleDB instance. In order to initialize the database correctly, point your DOCKER_HOST to the writable worker. Search for the container id of the postgis container (docker ps). You can >execute psql commands like this: docker exec -it psql [...]
Go into the /dev/dev-databases directory of the repository.
Execute the following commands:
docker exec -it psql --username=sensorthings < globals.sql
docker exec -it psql --username=sensorthings < schema.sql
docker exec -it psql --username=sensorthings < prepare_hypertable.sql

These will create and configure database users (globals.sql), the schema of all tables but Observations (schema.sql) and the Observations table as a TimescaleDB hypertable (prepare_hypertable.sql).

0 replies

chenkianwee · 2020-05-25T14:52:05Z

chenkianwee
May 25, 2020

Thanks @limond and @riedel , this is very helpful !! I had a quick look at the information, looks like shld be enough information for my setup. Thanks again.

0 replies

ndevilleBE · 2024-07-16T09:07:07Z

ndevilleBE
Jul 16, 2024

Dear all,
Testing with TimescaleDB, I'm facing an issue with the FROST API performances. I mostly followed @limond documentation to set up my database. Here are the indexes I have

CREATE INDEX observations_idx ON public."OBSERVATIONS" USING btree ("ID", "PHENOMENON_TIME_START");

CREATE INDEX observations_datastream_id_idx ON public."OBSERVATIONS" USING btree ("DATASTREAM_ID", "PHENOMENON_TIME_START", "PHENOMENON_TIME_END");

CREATE INDEX observations_id_idx ON public."OBSERVATIONS" USING btree ("ID", "FEATURE_ID");

If I fetch data using the API including the Feature of Interest ($expand=...), it seems that the API slice the request in different queries:

it queries the Observations based on time -> fast enough with the correct index
base on the features ID from the first query, it queries the FEATURE table then join the OBSERVATIONS table for each observations with:

select "e0"."FEATURE"::jsonb#>'{ coordinates }', "e0"."ID" from "FEATURES" as "e0" left outer join "OBSERVATIONS" as "e1" on "e1"."FEATURE_ID" = "e0"."ID" where "e1"."ID" = $1 order by "e0"."ID" asc offset $2 rows fetch next $3 rows only

The issue with this approach is that it looks into all chunks for each feature! Here is a small sample of the query planning (I have around 1500 "1-week chunk" over 40 years of data).

Limit  (cost=529.00..3073.85 rows=200 width=40)
  ->  Nested Loop  (cost=529.00..15492.71 rows=1176 width=40)
        ->  Merge Append  (cost=528.43..5390.87 rows=1176 width=8)
              Sort Key: e1_1.FEATURE_ID

          ->  Index Only Scan using _hyper_2_657_chunk_observations_id_idx on _hyper_2_657_chunk e1_1  (cost=0.28..4.30 rows=1 width=8)

                Index Cond: ("ID" = 1196511637)
          ->  Index Only Scan using _hyper_2_658_chunk_observations_id_idx on _hyper_2_658_chunk e1_2  (cost=0.28..4.30 rows=1 width=8)

                Index Cond: ("ID" = 1196511637)
          ->  Index Only Scan using _hyper_2_659_chunk_observations_id_idx on _hyper_2_659_chunk e1_3  (cost=0.14..4.16 rows=1 width=8)

                Index Cond: ("ID" = 1196511637)
          ->  Index Only Scan using _hyper_2_660_chunk_observations_id_idx on _hyper_2_660_chunk e1_4  (cost=0.27..4.29 rows=1 width=8)``

The url looks like:

https://sensors.naturalsciences.be/sta/v1.1/Things(1)?$select=Datastreams&$expand=Datastreams($expand=Observations($filter=phenomenonTime%20gt%202024-03-19T12:31:00.000000Z%20and%20phenomenonTime%20lt%202024-03-19T12:33:00.000000Z;$expand=FeatureOfInterest($select=feature/coordinates,@iot.id);[email protected],result,phenomenonTime,resultQuality),ObservedProperty([email protected],name);[email protected],unitOfMeasurement/name,Observations)

Without the timescaleDB hypertable, the query is much faster (30x) as there is only one table to look into.

Any suggestions on possible improvements? New index, improved URL or different chunks size?
Thanks,

0 replies

hylkevds · 2024-07-16T10:41:48Z

hylkevds
Jul 16, 2024
Maintainer

These two indices are the wrong way around:

CREATE INDEX observations_idx ON public."OBSERVATIONS" USING btree ("ID", "PHENOMENON_TIME_START");
CREATE INDEX observations_id_idx ON public."OBSERVATIONS" USING btree ("ID", "FEATURE_ID");

They first separate the observations based on ID, and then group them by PhenomenonTimeStart / FeatureID. But the ID field is already unique, so the second column never takes effect.
Also, do you have an index on just OBSERVATIONS.ID ? Since it seems to have trouble finding your Observation with a given ID.

0 replies

ndevilleBE · 2024-07-16T15:35:48Z

ndevilleBE
Jul 16, 2024

I changed the indexes as you mentioned and added a new one on "ID". The scan of each chunks is 4x faster but still the API answer is slow and that is because each feature is selected separately.
Am I correct to say that if I select 10.000 observations via the API and uses "$expand=Feature" then the API will translate that to 10.000 sql queries? It seems inefficient.
Any possibility to avoid that by modifying the URL or the API behaviour?
Thanks

1 reply

hylkevds Jul 16, 2024
Maintainer

Yes, that is an optimisation that is still on my TODO list, making it so that for single-entity expands a join on the main query is used instead of separate ones. But that is not a trivial change to make.

ndevilleBE · 2024-07-16T17:38:58Z

ndevilleBE
Jul 16, 2024

ok I understand.
Any possibility to expose a view via the API? So the join is done at the database level?

4 replies

hylkevds Jul 18, 2024
Maintainer

The change was not as large as I feared, but not much less complicated: dc76a3c. This will use the main query for single-entity navigation property expands.
It's currently in the develop-2.x branch, and it would be great if you could test it out. You can try the docker tag develop-2.x-2.4.0-SNAPSHOT.

ndevilleBE Jul 22, 2024

Great, thanks for the update.
The first tests show a massive improvement compared to previous versions! Please let me know if/when you can push it to the production version.

ndevilleBE Sep 5, 2024

Dear @hylkevds,
Any update on this update?
Thanks

hylkevds Sep 5, 2024
Maintainer

It's in version 2.4.0, released recently.

ndevilleBE · 2024-09-09T09:06:27Z

ndevilleBE
Sep 9, 2024

Continuing the work on the TimescaleDB and sensorthings optimization.
The issue with Timescale is that it only allows chunks exclusion if the Time is used in the query. If you try to update an Observation based on its ID, it'll be slow as the query planner will perform a sequential scan over each chunk.
My use case is that I need to update the result_quality value of millions of Observations. I try to do a batch update with ?batch. My initial data looks like:
"requests": [ { "id": "1", "atomicityGroup": "Group1", "method": "patch", "url": "Observations(1171241930)", "body": { "resultQuality": 0 } }, { "id": "2", "atomicityGroup": "Group1", "method": "patch", "url": "Observations(1171241960)", "body": { "resultQuality": 0 } },

But this is slow with TimescalDB. So I tried to include in the URL the time information to exclude the unnecessary chunks with:

"url": "Observations?$filter=phenomenonTime gt 2024-03-19T12:31:00Z and phenomenonTime lt 2024-03-19T13:33:00Z and @iot.id eq 1171242119",
With this approach, I have an error message "message": "PATCH only allowed on Entities."

Any suggestion on how to solve this issue?
Many thanks

4 replies

hylkevds Sep 9, 2024
Maintainer

With an index on "ID" it should not be using a sequential scan. It may still do an index-scan on all chunks, but that should not be slow...

ndevilleBE Sep 9, 2024

I do have an index on ID. For some reasons, some chunks are seq scan and other uses the Index Scan. Still, the ideal scenario for me would be to ignore unnecessary chunks. It updates 'only' 12 data per seconds.
Is it possible to combine Patch and Filter operations?

ndevilleBE Sep 16, 2024

@hylkevds Any update by any chance?
I wonder if the limitation to combine patch and time filter operation is something not foreseen by the standard or rather a technical choice?!
Thanks

hylkevds Sep 16, 2024
Maintainer

That's from OData, Entity updates are sent to the canonical URL of the Entity, and that only contains the primary key.

georghas · 2024-09-13T12:26:02Z

georghas
Sep 13, 2024

Einen wunderschönen buongiorno alle mitenand,

I am also attempting to integrate the TimescaleDB postgres extension into the OBSERVATIONS table.
Like @ndevilleBE, I followed @limond's documentation.
So far it works. However it is very slow.
Primarily CreateObservations operations are very slow. (About 20-30 times slower.)

For example, I sent a POST request to /v1.1/CreateObservations with this body:

CreateObservations body

[
  {
    "Datastream": {
      "@iot.id": 1
    },
    "components": [
      "phenomenonTime",
      "result",
      "FeatureOfInterest/id"
    ],
    "dataArray": [
      ["2017-06-29T09:46:51.754Z",2.0953409217499532e-36,1]
    ]
  }
]

In Postgres I enabled both log_statement and auto_explain.log_analyze. Thus the executed statement and an 'explain analyze' of the executed statement are shown. This is the result of the query:

truncated Postgres log

2024-09-13 09:49:55 2024-09-13 07:49:55.798 UTC [40] LOG:  execute S_8: BEGIN
2024-09-13 09:49:55 2024-09-13 07:49:55.814 UTC [40] LOG:  execute <unnamed>: select count(*) from "DATASTREAMS" where "DATASTREAMS"."ID" = $1
2024-09-13 09:49:55 2024-09-13 07:49:55.814 UTC [40] DETAIL:  parameters: $1 = '1'
2024-09-13 09:49:55 2024-09-13 07:49:55.845 UTC [40] LOG:  duration: 0.619 ms  plan:
2024-09-13 09:49:55     Query Text: select count(*) from "DATASTREAMS" where "DATASTREAMS"."ID" = $1
2024-09-13 09:49:55     Query Parameters: $1 = '1'
2024-09-13 09:49:55     Aggregate  (cost=2.29..2.30 rows=1 width=8) (actual time=0.613..0.614 rows=1 loops=1)
2024-09-13 09:49:55       ->  Seq Scan on "DATASTREAMS"  (cost=0.00..2.29 rows=1 width=0) (actual time=0.608..0.609 rows=1 loops=1)
2024-09-13 09:49:55             Filter: ("ID" = '1'::bigint)
2024-09-13 09:49:55             Rows Removed by Filter: 22
2024-09-13 09:49:55 2024-09-13 07:49:55.845 UTC [40] LOG:  execute <unnamed>: select count(*) from "FEATURES" where "FEATURES"."ID" = $1
2024-09-13 09:49:55 2024-09-13 07:49:55.845 UTC [40] DETAIL:  parameters: $1 = '1'
2024-09-13 09:49:55 2024-09-13 07:49:55.890 UTC [40] LOG:  duration: 0.025 ms  plan:
2024-09-13 09:49:55     Query Text: select count(*) from "FEATURES" where "FEATURES"."ID" = $1
2024-09-13 09:49:55     Query Parameters: $1 = '1'
2024-09-13 09:49:55     Aggregate  (cost=2.37..2.38 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=1)
2024-09-13 09:49:55       ->  Index Only Scan using "FEATURES_pkey" on "FEATURES"  (cost=0.15..2.37 rows=1 width=0) (actual time=0.018..0.019 rows=1 loops=1)
2024-09-13 09:49:55             Index Cond: ("ID" = '1'::bigint)
2024-09-13 09:49:55             Heap Fetches: 1
2024-09-13 09:49:55 2024-09-13 07:49:55.894 UTC [40] LOG:  execute <unnamed>: insert into "OBSERVATIONS" ("RESULT_NUMBER", "RESULT_TYPE", "RESULT_JSON", "PHENOMENON_TIME_END", "FEATURE_ID", "PHENOMENON_TIME_START", "RESULT_STRING", "DATASTREAM_ID", "RESULT_BOOLEAN") values ($1, $2, $3::json, $4, $5, $6, $7, $8, $9) returning "OBSERVATIONS"."ID"
2024-09-13 09:49:55 2024-09-13 07:49:55.894 UTC [40] DETAIL:  parameters: $1 = '2.0953409217499532e-36', $2 = '0', $3 = NULL, $4 = '2017-06-29 09:46:51.754+00', $5 = '1', $6 = '2017-06-29 09:46:51.754+00', $7 = '2.0953409217499532E-36', $8 = '1', $9 = NULL
2024-09-13 09:49:55 2024-09-13 07:49:55.905 UTC [40] LOG:  duration: 10.974 ms  plan:
2024-09-13 09:49:55     Query Text: insert into "OBSERVATIONS" ("RESULT_NUMBER", "RESULT_TYPE", "RESULT_JSON", "PHENOMENON_TIME_END", "FEATURE_ID", "PHENOMENON_TIME_START", "RESULT_STRING", "DATASTREAM_ID", "RESULT_BOOLEAN") values ($1, $2, $3::json, $4, $5, $6, $7, $8, $9) returning "OBSERVATIONS"."ID"
2024-09-13 09:49:55     Query Parameters: $1 = '2.0953409217499532e-36', $2 = '0', $3 = NULL, $4 = '2017-06-29 09:46:51.754+00', $5 = '1', $6 = '2017-06-29 09:46:51.754+00', $7 = '2.0953409217499532E-36', $8 = '1', $9 = NULL
2024-09-13 09:49:55     Custom Scan (HypertableModify)  (cost=0.00..0.01 rows=1 width=199) (actual time=8.706..8.709 rows=1 loops=1)
2024-09-13 09:49:55       ->  Insert on "OBSERVATIONS"  (cost=0.00..0.01 rows=1 width=199) (actual time=8.706..8.709 rows=1 loops=1)
2024-09-13 09:49:55             ->  Custom Scan (ChunkDispatch)  (cost=0.00..0.01 rows=1 width=199) (actual time=6.773..6.774 rows=1 loops=1)
2024-09-13 09:49:55                   ->  Result  (cost=0.00..0.01 rows=1 width=199) (actual time=0.417..0.417 rows=1 loops=1)
2024-09-13 09:49:57 2024-09-13 07:49:57.045 UTC [40] LOG:  execute <unnamed>: select "e0"."RESULT_BOOLEAN", "e0"."RESULT_QUALITY", "e0"."PHENOMENON_TIME_START", "e0"."PARAMETERS", "e0"."DATASTREAM_ID", "e0"."RESULT_STRING", "e0"."RESULT_TYPE", "e0"."VALID_TIME_END", "e0"."PHENOMENON_TIME_END", "e0"."FEATURE_ID", "e0"."ID", "e0"."RESULT_JSON", "e0"."RESULT_TIME", "e0"."RESULT_NUMBER", "e0"."VALID_TIME_START" from "OBSERVATIONS" as "e0" where "e0"."ID" = $1 offset $2 rows fetch next $3 rows only
2024-09-13 09:49:57 2024-09-13 07:49:57.045 UTC [40] DETAIL:  parameters: $1 = '2508554', $2 = '0', $3 = '2'
2024-09-13 09:49:57 2024-09-13 07:49:57.269 UTC [40] LOG:  duration: 211.740 ms  plan:
2024-09-13 09:49:57     Query Text: select "e0"."RESULT_BOOLEAN", "e0"."RESULT_QUALITY", "e0"."PHENOMENON_TIME_START", "e0"."PARAMETERS", "e0"."DATASTREAM_ID", "e0"."RESULT_STRING", "e0"."RESULT_TYPE", "e0"."VALID_TIME_END", "e0"."PHENOMENON_TIME_END", "e0"."FEATURE_ID", "e0"."ID", "e0"."RESULT_JSON", "e0"."RESULT_TIME", "e0"."RESULT_NUMBER", "e0"."VALID_TIME_START" from "OBSERVATIONS" as "e0" where "e0"."ID" = $1 offset $2 rows fetch next $3 rows only
2024-09-13 09:49:57     Query Parameters: $1 = '2508554', $2 = '0', $3 = '2'
2024-09-13 09:49:57     Limit  (cost=0.28..5.28 rows=2 width=170) (actual time=154.645..211.726 rows=1 loops=1)
2024-09-13 09:49:57       ->  Append  (cost=0.28..1829.32 rows=731 width=170) (actual time=154.635..211.715 rows=1 loops=1)
2024-09-13 09:49:57             ->  Index Scan using "_hyper_1_1413_chunk_OBSERVATIONS_ID" on _hyper_1_1413_chunk e0_1  (cost=0.28..2.50 rows=1 width=170) (actual time=1.239..1.239 rows=0 loops=1)
2024-09-13 09:49:57                   Index Cond: ("ID" = '2508554'::bigint)
2024-09-13 09:49:57             ->  Index Scan using "_hyper_1_1414_chunk_OBSERVATIONS_ID" on _hyper_1_1414_chunk e0_2  (cost=0.28..2.50 rows=1 width=170) (actual time=0.758..0.758 rows=0 loops=1)
2024-09-13 09:49:57                   Index Cond: ("ID" = '2508554'::bigint)
2024-09-13 09:49:57             ->  Index Scan using "_hyper_1_1415_chunk_OBSERVATIONS_ID" on _hyper_1_1415_chunk e0_3  (cost=0.28..2.50 rows=1 width=170) (actual time=0.726..0.726 rows=0 loops=1)
2024-09-13 09:49:57                   Index Cond: ("ID" = '2508554'::bigint)

[…]

2024-09-13 09:49:57             ->  Index Scan using "_hyper_1_2143_chunk_OBSERVATIONS_ID" on _hyper_1_2143_chunk e0_731  (cost=0.28..2.50 rows=1 width=170) (actual time=0.257..0.257 rows=0 loops=1)
2024-09-13 09:49:57                   Index Cond: ("ID" = '2508554'::bigint)
2024-09-13 09:49:57 2024-09-13 07:49:57.273 UTC [40] LOG:  execute S_3: COMMIT

The actual insert into operation takes 10.974 ms, which is ok.
However afterwards, a select operation is executed. It takes 211.740 ms, which is not ok.
From my understanding, this is because the select only looks for a certain id, without a time constraint. TimescaleDB does not know in which of their chunks this id is located, so all chunks (in my case around 700 chunks) are queried.

This raises three questions:

Why is this select statement executed in the first place? Looking at the program flow, it appears that:

data is inserted
the same data - that was just inserted and even might still be in memory - is queried

Is this not redundant? Should data not be forwarded internally instead of writing it to the database and then querying it again right after?
I ran the same POST request against a FROST server without the TimescaleDB extension on its underlying database. The same select query only took 0.117 ms. I can definitely see why this change might not be important for the default implementation of FROST.

The parameter for fetch next $3 rows only is set to 2. Why is that? "OBSERVATIONS"."ID" is supposed to be unique. So it seems unnecessary to try to fetch more rows than 1, does it not?
Again, this does not seem to make a difference in the default implementation of the FROST database. However, for a database extended with TimescaleDB it can decrease the query time significantly. With a fetch limit of 1, chunks are only queried until a result is found, the remaining chunks are skipped. With a fetch limit of 2 however, all chunks are always queried because there is never a second result. Again, ID is unique after all.
Can a time constraint be added to the query? For example a simple where "e0"."ID" = 888240 and "e0"."PHENOMENON_TIME_START" >= NOW() - INTERVAL '1 YEAR'? This would make TimescaleDB only query chunks within this time range. Though this is probably stupid because you can not assume the date by the id.
You could attach the time to the id internally, changing it to a combined primary key. But that is probably too much of a change and it might even slow down the standard implementation slightly, would it not? So it is not desired, is it?

Admittedly, I know very little about the implementation of the FROST server, so it's much appreciated if somebody could shed some light on this matter.

Muchas Thanks!

27 replies

ndevilleBE Oct 2, 2024

Hello @georghas,
Sorry for the late reply, I hadn't access to the logs: Here is what I see now

2024-10-02 16:11:20.729 CEST [3327398] stauser@sensorthings LOG:  execute <unnamed>: select count(*) from "OBSERVATIONS" where "OBSERVATIONS"."ID" = $1
2024-10-02 16:11:20.729 CEST [3327398] stauser@sensorthings DETAIL:  parameters: $1 = '833908198'
2024-10-02 16:11:20.732 CEST [3327398] stauser@sensorthings LOG:  execute <unnamed>: select count(*) from "OBSERVATIONS" where "OBSERVATIONS"."ID" = $1
2024-10-02 16:11:20.732 CEST [3327398] stauser@sensorthings DETAIL:  parameters: $1 = '833908198'
2024-10-02 16:11:20.735 CEST [3327398] stauser@sensorthings LOG:  execute <unnamed>: select "e0"."RESULT_BOOLEAN", "e0"."RESULT_QUALITY", "e0"."PHENOMENON_TIME_START", "e0"."PARAMETERS", "e0"."DATASTREAM_ID", "e0"."RESULT_STRING", "e0"."RESULT_TYPE", "e0"."VALID_TIME_END", "e0"."PHENOMENON_TIME_END", "e0"."FEATURE_ID", "e0"."ID", "e0"."RESULT_JSON", "e0"."RESULT_TIME", "e0"."RESULT_NUMBER", "e0"."VALID_TIME_START", "e0"."MULTI_DATASTREAM_ID" from "OBSERVATIONS" as "e0" where "e0"."ID" = $1 offset $2 rows fetch next $3 rows only
2024-10-02 16:11:20.735 CEST [3327398] stauser@sensorthings DETAIL:  parameters: $1 = '833908198', $2 = '0', $3 = '1'
2024-10-02 16:11:20.737 CEST [3327398] stauser@sensorthings LOG:  execute <unnamed>: update "OBSERVATIONS" set "RESULT_QUALITY" = $1::json where "OBSERVATIONS"."ID" = $2 returning "OBSERVATIONS"."RESULT_NUMBER", "OBSERVATIONS"."ID", "OBSERVATIONS"."RESULT_JSON", "OBSERVATIONS"."VALID_TIME_START", "OBSERVATIONS"."MULTI_DATASTREAM_ID", "OBSERVATIONS"."PHENOMENON_TIME_END", "OBSERVATIONS"."PHENOMENON_TIME_START", "OBSERVATIONS"."RESULT_STRING", "OBSERVATIONS"."RESULT_TIME", "OBSERVATIONS"."RESULT_BOOLEAN", "OBSERVATIONS"."PARAMETERS", "OBSERVATIONS"."RESULT_TYPE", "OBSERVATIONS"."RESULT_QUALITY", "OBSERVATIONS"."VALID_TIME_END", "OBSERVATIONS"."FEATURE_ID", "OBSERVATIONS"."DATASTREAM_ID"
2024-10-02 16:11:20.737 CEST [3327398] stauser@sensorthings DETAIL:  parameters: $1 = '4', $2 = '833908198'
2024-10-02 16:11:20.738 CEST [3327398] stauser@sensorthings LOG:  execute S_2: COMMIT

georghas Oct 4, 2024

The UPDATE statement does use RETURNING, but there still is a SELECT.
Most peculiar, the SELECT is before the PATCH operation.
@hylkevds could you share your expertise why that is?

hylkevds Oct 4, 2024
Maintainer

Are you using fine-grained security? That may cause extra select queries.
Or JSON-Patch, but that would do a select-for-update...

It also looks like the check if the entity exists is currently done twice, that can be improved. Though the query cache will probably catch that.

ndevilleBE Oct 4, 2024

I use JSON Patch because I need to update lots of data at once. So I POST a JSON file to the ?batch URL.
Regarding the fine-grained security, I mimicked the example here but it is a bit unclear what the exact difference is between persistence_db_username and auth_db_username. Does it affects the queries??

hylkevds Oct 4, 2024
Maintainer

JSON-Patch != JSON-Batch... One if for Patching json, the other for sending batches of requests. You can use JSON-Patch inside a JSON-Batch :)

That docker-compose is using the normal security bits, handling access-rights based on HTTP actions (GET/POST/PATCH/DELETE) and not on data-details, stating that a user is only allowed to access data when it is related to a certain Thing.
The normal authorisation should not cause additional requests though...

The persistence_db and auth_db distinction exist to allow the user tables to be in a different database from the observational data.

Discussion: FROST-Server and TimescaleDB #538

Replies: 17 comments · 36 replies

hylkevds Feb 26, 2019 Maintainer

hylkevds Apr 6, 2019 Maintainer

hylkevds May 23, 2020 Maintainer

hylkevds Jul 16, 2024 Maintainer

hylkevds Jul 16, 2024 Maintainer

hylkevds Jul 18, 2024 Maintainer

hylkevds Sep 5, 2024 Maintainer

hylkevds Sep 9, 2024 Maintainer

hylkevds Sep 16, 2024 Maintainer

hylkevds Oct 4, 2024 Maintainer

hylkevds Oct 4, 2024 Maintainer

Replies: 17 comments 36 replies

hylkevds
Feb 26, 2019
Maintainer

hylkevds
Apr 6, 2019
Maintainer

hylkevds
May 23, 2020
Maintainer

hylkevds
Jul 16, 2024
Maintainer

hylkevds Jul 16, 2024
Maintainer

hylkevds Jul 18, 2024
Maintainer

hylkevds Sep 5, 2024
Maintainer

hylkevds Sep 9, 2024
Maintainer

hylkevds Sep 16, 2024
Maintainer

hylkevds Oct 4, 2024
Maintainer

hylkevds Oct 4, 2024
Maintainer