postgresql - Best way to model state changes for point in time queries

Question

Welcome To Ask or Share your Answers For Others

postgresql - Best way to model state changes for point in time queries

asked Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

postgresql - Best way to model state changes for point in time queries

I'm working on a system that needs to be able to find the "state" of an item at a particular time in history. The state is binary (either on or off). In this case it's to determine where to direct (to a particular "keyspace") a piece of timestamped data as determined by the timestamp of the data. I'm having a hard time deciding what the best way to model the data is.

Method 1 is to use the tstzrange with state being implied by the bounds of the range:

create extension btree_gist;
create table core.range_director (
    range tstzrange,
    directee_id text,
    keyspace text,
    -- allow a directee to be directed to multiple keyspaces at once
    exclude using gist (directee_id with =, keyspace with =, range with &&)
);

insert into core.range_director values
    ('[2021-01-15 00:00:00 -0:00,2021-01-20 00:00:00 -0:00)', 'THING_ID', 'KEYSPACE_1'),
    ('[2021-01-15 00:00:00 -0:00,)', 'THING_ID', 'KEYSPACE_2');

select keyspace from core.range_director
    where directee_id = 'THING_ID' and range_director.range @> '2021-01-15'::timestamptz;
-- returns KEYSPACE_1 and KEYSPACE_2
select keyspace from core.range_director
    where directee_id = 'THING_ID' and range_director.range @> '2021-01-21'::timestamptz;
-- returns KEYSPACE_2

Method 2 is to have explicit state changes:

create table core.status_director (
    status_time timestamptz,
    status text,
    directee_id text,
    keyspace text
); -- not sure what pk to use for this method

insert into core.status_director values
    ('2021-01-15 00:00:00 -0:00','Open','THING_ID','KEYSPACE_1'),
    ('2021-01-20 00:00:00 -0:00','Closed','THING_ID','KEYSPACE_1'),
    ('2021-01-15 00:00:00 -0:00','Open','THING_ID','KEYSPACE_2');

select distinct on(keyspace) keyspace, status from core.status_director
    where directee_id = 'THING_ID'
    and status_time < '2021-01-16'
    order by keyspace, status_time desc;
-- returns KEYSPACE_1:Open KEYSPACE_2:Open

select distinct on(keyspace) keyspace, status from core.status_director
    where directee_id = 'THING_ID'
    and status_time < '2021-01-21'
    order by keyspace, status_time desc;
-- returns KEYSPACE_1:Closed, KEYSPACE_2:Open
-- so, client code has to ensure that it only directs to status=Open keyspaces

Maybe there are other methods that would work as well, but these two seem to make the most sense to me. The benefit of the first method is the really easy query, but the down side is that you now have to update rows to close the state whereas in the second method you can just post new states which seems easier.

The table could conceivable grow into thousands or tens of thousands of rows, but will probably not grow into millions (but does the best method change depending on the expected row count?). I have a couple of similar tables with the same point-in-time "state" queries so it's really important that I get the model for them right.

My instinct is to go with Method 1, but are there any footguns or performance considerations that I'm not thinking of that would urge the use case towards Method 2 (or another method I haven't considered?)

question from:https://stackoverflow.com/questions/66068313/best-way-to-model-state-changes-for-point-in-time-queries

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T03:01:04+0000

No footguns with Method 1, just great big huge cannons. With that method how do you determine the current status. You need to scan each status change and for each one toggle the status, or perhaps use something like "count(*)%2" odd gives one state even another. What happens if any row gets deleted, or data purged and you do not know how many state transactions there were. With the Method 2 you retrieve the greatest date and directly obtain the status.
For myself I would do Method 3. That being Method1 + Method 2. Yes I would have a date range of the status and the status value itself. That gives me complex historical analysis as I have the complete history as well as direct access to current status at any time.

Categories

postgresql - Best way to model state changes for point in time queries

postgresql - Best way to model state changes for point in time queries

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags