Обсуждение: filtering based on table of start/end times
Hi,
At first glance, this seemed simple to implement, but this is giving me
a bit of a headache.
Say we have a table as follows:
CREATE TABLE voltage_series
( voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), "time" timestamp
withouttime zone NOT NULL, voltage numeric, CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id));
So it contains a time series of voltage measurements. Now suppose we
have another table of start/end times that we'd like to use to filter
out (or keep) records in voltage_series:
CREATE TABLE voltage_log
( record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp without time
zoneNOT NULL, time_end timestamp without time zone NOT NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id));
where each record represents start/end times where the voltage
measurement should be removed/kept. The goal is to retrieve the records
in voltage_series that are not included in any of the periods defined by
the start/end times in voltage_log.
I've looked at the OVERLAPS operator, but it's not evident to me whether
that is the best approach. Any tips would be appreciated.
Cheers,
--
Seb
<br /><div class="moz-cite-prefix">On 11/7/2014 3:12 PM, Seb wrote:<br /></div><blockquote
cite="mid:8761eqbwip.fsf@net82.ceos.umanitoba.ca"type="cite"><pre wrap="">Hi,
At first glance, this seemed simple to implement, but this is giving me
a bit of a headache.
Say we have a table as follows:
CREATE TABLE voltage_series
( voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), "time" timestamp
withouttime zone NOT NULL, voltage numeric, CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id));
So it contains a time series of voltage measurements. Now suppose we
have another table of start/end times that we'd like to use to filter
out (or keep) records in voltage_series:
CREATE TABLE voltage_log
( record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp without time
zoneNOT NULL, time_end timestamp without time zone NOT NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id));
where each record represents start/end times where the voltage
measurement should be removed/kept. The goal is to retrieve the records
in voltage_series that are not included in any of the periods defined by
the start/end times in voltage_log.
I've looked at the OVERLAPS operator, but it's not evident to me whether
that is the best approach. Any tips would be appreciated.
Cheers,
</pre></blockquote><br /> Hello Seb,<br /><br /> Any reason this won't work for you?<br /><br /><blockquote><tt>SELECT
*</tt><br/><tt> FROM voltage_series</tt><br /><tt> WHERE time NOT IN (SELECT DISTINCT time FROM voltage_series,
voltage_logWHERE time BETWEEN time_beg AND time_end);</tt><br /></blockquote><br /> Might not be the fastest way to do
it,if the tables are large. Apologies if I've not understood your question properly.<br /><br /> Regards,<br />
Bryan<br/><br /><br /><br /><pre class="moz-signature" cols="72">--
______________
Postdoctoral Researcher
GA Cooperative Fish & Wildlife Research Unit
Warnell School of Forestry & Natural Resources
University of Georgia
Athens 30602-2152</pre>
I already sent this but used a wrong address. Sorry for the mess.
On 07.11.2014 21:12, Seb wrote:
> Hi,
>
> At first glance, this seemed simple to implement, but this is giving me
> a bit of a headache.
>
> Say we have a table as follows:
>
> CREATE TABLE voltage_series
> (
> voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass),
> "time" timestamp without time zone NOT NULL,
> voltage numeric,
> CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id));
>
> So it contains a time series of voltage measurements. Now suppose we
> have another table of start/end times that we'd like to use to filter
> out (or keep) records in voltage_series:
>
> CREATE TABLE voltage_log
> (
> record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass),
> time_beg timestamp without time zone NOT NULL,
> time_end timestamp without time zone NOT NULL,
> CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id));
>
> where each record represents start/end times where the voltage
> measurement should be removed/kept. The goal is to retrieve the records
> in voltage_series that are not included in any of the periods defined by
> the start/end times in voltage_log.
>
> I've looked at the OVERLAPS operator, but it's not evident to me whether
> that is the best approach. Any tips would be appreciated.
>
> Cheers,
>
Something like this should work:
SELECT *
FROM voltage_series AS vs
LEFT JOIN voltage_log vl ON vs.time BETWEEN vl.time_beg AND vl.time_end
WHERE vl.id IS NULL
This is untested, but I think it should work.
greetings
Tim
On Sat, 08 Nov 2014 11:26:53 +0100,
Tim Schumacher <tim@bandenkrieg.hacked.jp> wrote:
> I already sent this but used a wrong address. Sorry for the mess.
> On 07.11.2014 21:12, Seb wrote:
>> Hi,
>> At first glance, this seemed simple to implement, but this is giving
>> me a bit of a headache.
>> Say we have a table as follows:
>> CREATE TABLE voltage_series ( voltage_record_id integer NOT NULL
>> DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass),
>> "time" timestamp without time zone NOT NULL, voltage numeric,
>> CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id));
>> So it contains a time series of voltage measurements. Now suppose we
>> have another table of start/end times that we'd like to use to filter
>> out (or keep) records in voltage_series:
>> CREATE TABLE voltage_log ( record_id integer NOT NULL DEFAULT
>> nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp
>> without time zone NOT NULL, time_end timestamp without time zone NOT
>> NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id));
>> where each record represents start/end times where the voltage
>> measurement should be removed/kept. The goal is to retrieve the
>> records in voltage_series that are not included in any of the periods
>> defined by the start/end times in voltage_log.
>> I've looked at the OVERLAPS operator, but it's not evident to me
>> whether that is the best approach. Any tips would be appreciated.
>> Cheers,
> Something like this should work:
> SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON vs.time
> BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL
> This is untested, but I think it should work.
Thank you all for your suggestions. The above proved very fast with
the millions of records and several other joins involved.
--
Seb
On 07.11.2014 21:12, Seb wrote:
> Hi,
>
> At first glance, this seemed simple to implement, but this is giving me
> a bit of a headache.
>
> Say we have a table as follows:
>
> CREATE TABLE voltage_series
> (
> voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass),
> "time" timestamp without time zone NOT NULL,
> voltage numeric,
> CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id));
>
> So it contains a time series of voltage measurements. Now suppose we
> have another table of start/end times that we'd like to use to filter
> out (or keep) records in voltage_series:
>
> CREATE TABLE voltage_log
> (
> record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass),
> time_beg timestamp without time zone NOT NULL,
> time_end timestamp without time zone NOT NULL,
> CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id));
>
> where each record represents start/end times where the voltage
> measurement should be removed/kept. The goal is to retrieve the records
> in voltage_series that are not included in any of the periods defined by
> the start/end times in voltage_log.
>
> I've looked at the OVERLAPS operator, but it's not evident to me whether
> that is the best approach. Any tips would be appreciated.
>
> Cheers,
>
Something like this should work:
SELECT *
FROM voltage_series AS vs
LEFT JOIN voltage_log vl ON vs.time BETWEEN vl.time_beg AND vl.time_end
WHERE vl.id IS NULL
This is untested, but I think it should work.
greetings
Tim
On Sun, 09 Nov 2014 12:43:22 -0600, Seb <spluque@gmail.com> wrote: > On Sat, 08 Nov 2014 11:26:53 +0100, > Tim Schumacher <tim@bandenkrieg.hacked.jp> wrote: [...] >> Something like this should work: >> SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON >> vs.time BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL >> This is untested, but I think it should work. > Thank you all for your suggestions. The above proved very fast with > the millions of records and several other joins involved. Sorry to come back with a related issue, which is proving troublesome. There's another log table, that looks just like voltage_log, but has an additional column with an integer indicating what problem occurred during the period: CREATE TABLE voltage_diagnostic_log ( record_id serial, time_beg timestamp without time zone NOT NULL, time_end timestamp without time zone NOT NULL, diagnosticinteger, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); So that a view can be built for the voltage_series table where its columns can be adjusted based on the diagnostic integer (there are other columns besides voltage in the actual table), if the time stamp falls within a period of the voltage_diagnostic_log table. To illustrate, the view needs to be able to have field definitions such as (pseudo-code): CASE WHEN diagnostic=1 THEN voltage * 0.88 ELSE voltage END AS voltage_corrected, CASE WHEN diagnostic=2 THEN pressure - 2.5 ELSE pressure END AS pressure_corrected, The problem is that each record in voltage_series can have several matching records in voltage_diagnostic_log. Any insights welcome! -- Seb
On Wed, 12 Nov 2014 15:49:57 -0600, Seb <spluque@gmail.com> wrote: > Sorry to come back with a related issue, which is proving troublesome. > There's another log table, that looks just like voltage_log, but has > an additional column with an integer indicating what problem occurred > during the period: > CREATE TABLE voltage_diagnostic_log ( record_id serial, time_beg > timestamp without time zone NOT NULL, time_end timestamp without time > zone NOT NULL, diagnostic integer, CONSTRAINT voltage_log_pkey PRIMARY > KEY (record_id)); [...] For posterity's sake, the only solution I was able to find was to first create a view with a separate boolean column for each diagnostic value via crosstab(). From there it was possible to use a WITH subquery to remove rows with a particular diagnostic value (as suggested previously), and then have the main SELECT statement do a second left join to the crosstab view so that it could make use of the rest of the boolean columns as sources for CASE statements for each field. Cheers, -- Seb