Discussion: Table design question...


Table design question...

From
Joerg Hessdoerfer
Date:
Hi, all!

Ok, I've banged my head on this, and would like to hear some opinion from the 
list (I'm just short of trying it, though I don't have the hardware yet).

We have a long-running data-logging application, where we essentially get data 
records at varying speeds from a facility.
The data can be treated as arrays of floats, arrays of strings, or binary 
dumps, where each element in the arrays represents one value.
Additionally, there is the possibility of having a varying number of 
'multisampler' values in one packet, where each value is itself an array. 
Note that within one packet the arrays need not be of the same length!

So, this gives me at least 2 tables for the data, one that can handle the 
'scalar' data (float, string, binary) and one that needs an array for every 
value (floats only).

All data will have two time stamps (facility and receive times), as data may 
be sent off-line to the logger.

So, all this boils down to the simple question: Is it better (in terms of 
indices) to have a separate table for the time stamps and join the data to it 
via a foreign key id field, or to have both timestamps in each data table?
What if I were to create more data tables? 

Please keep in mind, that I expect to have multi-million records in each 
table ;-)
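Sketched as illustrative DDL (column and table names are just my working 
guesses, nothing is fixed yet), the two layouts would be roughly:

```sql
-- Option A: one shared table holds both time stamps;
-- data rows reference it via a foreign key.
CREATE TABLE times (
    id   serial PRIMARY KEY,
    scet timestamp without time zone,  -- facility time
    recv timestamp without time zone   -- receive time
);

CREATE TABLE data_float (
    times_id int4 REFERENCES times(id),
    value    float
);

-- Option B: each data table carries its own pair of time stamps.
CREATE TABLE data_float_b (
    scet  timestamp without time zone,
    recv  timestamp without time zone,
    value float
);
```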

Thx, Joerg
-- 
Leading SW developer  - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW:  http://www.sea-gmbh.com



Re: Table design question...

From
Joerg Hessdoerfer
Date:
Thank you for your help,

On Tuesday 06 August 2002 11:46, you wrote:
> On Tuesday 06 Aug 2002 10:17 am, Joerg Hessdoerfer wrote:
>
> [ data logging with variable formats for data some of which might be arrays
> for a single item - nice :-/ ]
>

Yep. This is what it boils down to - and no one can tell me in advance what 
the data type for each entry really is... it can even change mid-operation 
(now how's that for design!).

> > So, all this boils down to the simple question: Is it better (in terms of
> > indices) to have a separate table for the time stamps and join the data
> > to it via a foreign key id field, or have both timestamps in each data
> > table? What if I were to create more data tables?
>
> First instinct is to have timestamps, data-type and reference in one table
> and separate tables for each data-type for the data.
> Second instinct - have one table with null columns for unused types and
> take a hit on the wasted space.
>
> But - it really depends on how you're going to use the data. Will you be
> listing all entries between time X and Y or viewing entries of a specific
> type? Do you have any feel for what the common operations are going to be?
>
> - Richard Huxton

Well, the access is mainly 'export dumps' in a special format, where varying 
data items need to be exported from between two time stamps. That's where my 
idea of using arrays came from. I'd have something like

CREATE TABLE data_float(
times_id    int4 references times(id),
names   int4[],
values  float[]
);

Where names would index into a table with the name->id mapping, because I 
could throw out all unneeded data items after the SELECT(), thus having much 
less data overhead (and far fewer records; we're talking ~500 data items per 
packet - varying, of course ;-).
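A minimal sketch of that mapping and of an export between two time stamps 
(table name, item names and dates are illustrative only):

```sql
-- Illustrative name->id mapping table.
CREATE TABLE item_names (
    id   int4 PRIMARY KEY,
    name text NOT NULL
);

-- Export everything logged between two facility time stamps;
-- unneeded items get thrown out client-side after the SELECT.
SELECT t.scet, t.recv, d.names, d.values
FROM   data_float d
JOIN   times t ON t.id = d.times_id
WHERE  t.scet BETWEEN '2002-08-01' AND '2002-08-07';
```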

But is the above better than 

CREATE TABLE data_float(
scet    timestamp without time zone,
recv    timestamp without time zone,
names   int4[],
values  float[]
);

given I have at least two data tables (to accommodate the 'array in one 
value' data items, which are much less common (roughly 1 in 250)).
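Either way, the exports are range scans on the time stamps, so I'd index 
those columns; for the second layout that would be something like (index 
names are just illustrative):

```sql
-- Range exports scan on the time stamps, so index them.
CREATE INDEX data_float_scet_idx ON data_float (scet);
CREATE INDEX data_float_recv_idx ON data_float (recv);

-- A typical export, able to use the scet index:
SELECT d.names, d.values
FROM   data_float d
WHERE  d.scet BETWEEN '2002-08-01' AND '2002-08-07';
```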

Thanks, Joerg
-- 
Leading SW developer  - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW:  http://www.sea-gmbh.com


Re: Table design question...

From
Richard Huxton
Date:
On Tuesday 06 Aug 2002 11:35 am, Joerg Hessdoerfer wrote:
> Thank you for your help,
>
> On Tuesday 06 August 2002 11:46, you wrote:
> > On Tuesday 06 Aug 2002 10:17 am, Joerg Hessdoerfer wrote:
> >
> > [ data logging with variable formats for data some of which might be
> > arrays for a single item - nice :-/ ]
>
> Yep. This is what it boils down to - and no one can tell me in advance what
> the data type for each entry really is... it can even change mid-operation
> (now how's that for design!).

> Well, the access is mainly 'export dumps' in a special format, where
> varying data items need to be exported from between two time stamps.
> That's where my idea of using arrays came from. I'd have something like
>
> CREATE TABLE data_float(
> times_id    int4 references times(id),
> names   int4[],
> values  float[]
> );

> But is the above better than
>
> CREATE TABLE data_float(
> scet    timestamp without time zone,
> recv    timestamp without time zone,
> names   int4[],
> values  float[]
> );
>
> given I have at least two data tables (to accommodate the 'array in one
> value' data items, which are much less common (roughly 1 in 250)).

If your exports are all of one type (i.e. all arrays or all integer or all
float) I'd go for (B) since that makes things nice and easy. On the other
hand, if you want to export different types in one query you'll need to do a
union unless you put all possible fields in one table:

CREATE TABLE data_all (
  scet        timestamp,
  recv        timestamp,
  data_type   char(1) CHECK (data_type IN ('i','f','a')),
  intval      int4,
  floatval    float,
  arr_names   int4[],
  arr_values  int4[]
);

If you go the route of (A) or (B) I'd run a few tests to see which is more
efficient. You can always hide the implementation details behind a view and
some functions.
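For instance (a sketch only - this assumes both data tables share the same 
scet/recv/names/values column layout, and the names are made up):

```sql
-- A UNION ALL view presenting the per-type tables as one logical table,
-- hiding the split behind a single relation to query against.
CREATE VIEW data_all AS
    SELECT f.scet, f.recv, 'f' AS data_type, f.names, f.values
    FROM   data_float f
  UNION ALL
    SELECT a.scet, a.recv, 'a', a.names, a.values
    FROM   data_array a;
```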

- Richard Huxton


Re: Table design question...

From
Joerg Hessdoerfer
Date:
Hi again,

On Tuesday 06 August 2002 14:59, you wrote:
[...]
> If your exports are all of one type (i.e. all arrays or all integer or all
> float) I'd go for (B) since that makes things nice and easy. On the other
> hand, if you want to export different types in one query you'll need to do
> a union unless you put all possible fields in one table:
>
> CREATE TABLE data_all (
>   scet        timestamp,
>   recv    timestamp,
>   data_type    char(1) CHECK (data_type IN ('i','f','a')),
>   intval    int4,
>   floatval    float,
>   arr_names    int4[],
>   arr_values    int4[]
> );
>
> If you go the route of (A) or (B) I'd run a few tests to see which is more
> efficient. You can always hide the implementation details behind a view and
> some functions.
>
> - Richard Huxton
>
[...]

Yes, I'm currently going the testing route - my initial idea was also 
something like your table def above, but that rules itself out (I really 
think 500,000,000 rows is a bit too much ;-). Don't get me wrong, I think a 
'big iron' could manage that easily, but I need to make do with a quite 
common PeeCee (running Linux, that is)...

Thanks again for your input, Joerg
-- 
Leading SW developer  - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW:  http://www.sea-gmbh.com