I actually tackled a problem very much like this in the distant past with a different DB. I think one of the practical
questions you have to ask is whether you really need all that detailed data, or whether storing summarized data
would serve.
If (for example) the components are fabricated on wafers and in "lots", you might just want your DB loading script to
determine/calculate the count, min, max, median, avg, standard deviation, 6-sigma, etc. for each parametric test
(e.g. "width") per wafer/lot and store that in your DB. If the end users are going to be looking at wafer/lot stats for
these millions of pieces of data as opposed to individual chips, you'll actually be serving them better, because the stat
data will be "pre-calculated" (if you will) and will query out a lot faster. And you won't be pounding your DB so hard in
the process.
Something to think about!
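A minimal sketch of that pre-aggregation idea, loading-script side. All names here (raw_measurement, wafer_stats, lot_id, wafer_id) are illustrative assumptions, not from the original post; I've stuck to aggregates (count, min, max, avg, stddev) that plain PostgreSQL provides out of the box, so median/6-sigma would need extra work:

```sql
-- Hypothetical summary table; column names and types are assumptions.
CREATE TABLE wafer_stats (
    lot_id      integer NOT NULL,
    wafer_id    integer NOT NULL,
    test_name   text    NOT NULL,
    n           bigint,
    min_val     double precision,
    max_val     double precision,
    avg_val     double precision,
    stddev_val  double precision,
    PRIMARY KEY (lot_id, wafer_id, test_name)
);

-- Run by the loading script once per wafer/lot, right after ingest:
INSERT INTO wafer_stats
SELECT lot_id, wafer_id, test_name,
       count(*), min(value), max(value), avg(value), stddev(value)
FROM raw_measurement
WHERE lot_id = 42 AND wafer_id = 7     -- the wafer just loaded
GROUP BY lot_id, wafer_id, test_name;
```

End users then query the small wafer_stats table instead of scanning millions of raw rows.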
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Yan Cheng Cheok
Sent: Monday, January 04, 2010 8:13 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Shall I use PostgreSQL Array Type in The Following Case
I realize there is an Array data type in PostgreSQL.
http://www.postgresql.org/docs/8.1/interactive/arrays.html
Currently, I need to use a database to store measurement results from a semiconductor factory.
They are producing semiconductor units. Every semiconductor unit can have a variable number of measurement parameters.
I plan to design the table in the following way.
SemiconductorComponent
======================
ID |
Measurement
=================
ID | Name | Value | SemiconductorComponent_ID
Example of data :
SemiconductorComponent
======================
1 |
2 |
Measurement
=================
1 | Width | 0.001 | 1
2 | Height | 0.021 | 1
3 | Thickness | 0.022 | 1
4 | Pad0_Length | 0.031 | 1
5 | Pad1_Width | 0.041 | 1
6 | Width | 0.001 | 2
7 | Height | 0.021 | 2
8 | Thickness | 0.022 | 2
9 | Pad0_Length | 0.031 | 2
10| Pad1_Width | 0.041 | 2
11| Pad2_Width | 0.041 | 2
12| Lead0_Width | 0.041 | 2
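The two-table layout above could be declared roughly as follows (a sketch only — the lowercase identifiers and the column types are my own assumptions):

```sql
CREATE TABLE semiconductor_component (
    id bigserial PRIMARY KEY
);

CREATE TABLE measurement (
    id           bigserial PRIMARY KEY,
    name         text NOT NULL,
    value        double precision NOT NULL,
    -- one component row owns many measurement rows
    component_id bigint NOT NULL
        REFERENCES semiconductor_component (id)
);
```
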
Assume a factory is producing 24 million units in 1 day.
The SemiconductorComponent table will have 24 million rows in 1 day.
Assume one SemiconductorComponent unit has 50 measurement parameters (can be more or less, depending on the
SemiconductorComponent type).
The Measurement table will have 24 * 50 million rows in 1 day.
Is it efficient to design that way?
**I wish to have super fast write speed, and reasonable fast read speed from the database.**
Or shall I make use of PostgreSQL Array facility?
SemiconductorComponent
======================
ID | Array_of_measurement_name | Array_of_measurement_value
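The array variant could be sketched like this (again, identifier names and types are my assumptions; note the two arrays are kept parallel by position):

```sql
CREATE TABLE semiconductor_component (
    id                 bigserial PRIMARY KEY,
    measurement_names  text[],
    measurement_values double precision[]
);

-- One row stores every measurement for a unit:
INSERT INTO semiconductor_component (measurement_names, measurement_values)
VALUES (ARRAY['Width', 'Height', 'Thickness'],
        ARRAY[0.001, 0.021, 0.022]);
```
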
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general