Обсуждение: Partitioned Database and Choosing Subtables

Поиск
Список
Период
Сортировка

Partitioned Database and Choosing Subtables

От
Bill Thoen
Дата:
I've got a ver 8.4.5 partitioned data base with records organized by US
state, so the partitions are set up by state. When I query this database
and include  the key field that tells postgres what partition you ,
everything works as I expect. It searches only the specified partition,
and it's fast . But that's only if I use a constant, like this:

SELECT lions,  tigers, bears FROM WildLife
WHERE state_pt = 'CO';

What I want to be able to do is put this key value in a table and PG
look in whatever partition the column specifies, like so:

SELECT lions,  tigers, bears, statecode FROM WildLife
WHERE state_pt = statecode;

However when I try anything other than a constant, it search EVERY
partition, sequentially, which is not what I want it to do. So is there
any way to specify the partition to search using a variable/column name?

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961

Re: Partitioned Database and Choosing Subtables

От
Christophe Pettus
Дата:
On Mar 14, 2011, at 8:30 PM, Bill Thoen wrote:

> I've got a ver 8.4.5 partitioned data base with records organized by US state, so the partitions are set up by state.
WhenI query this database and include  the key field that tells postgres what partition you , everything works as I
expect.It searches only the specified partition, and it's fast . But that's only if I use a constant, like this: 
>
> SELECT lions,  tigers, bears FROM WildLife
> WHERE state_pt = 'CO';
>
> What I want to be able to do is put this key value in a table and PG look in whatever partition the column specifies,
likeso: 
>
> SELECT lions,  tigers, bears, statecode FROM WildLife
> WHERE state_pt = statecode;
>
> However when I try anything other than a constant, it search EVERY partition, sequentially, which is not what I want
itto do. So is there any way to specify the partition to search using a variable/column name? 

To answer the specific question you ask, you can always tell Postgres to search a particular child table:

    SELECT lions, tigers, bears FROM Wildlife_CA ...

But your example seems somewhat unclear to me.  Are both "state_pt" and "statecode" columns in Wildlife?  If so,
Postgresis going to have to search every partition, because it can't just from the partition constraint know which
entrieswill match and which do not until it looks inside every record. 

Or did you mean 'statecode' to be a column in a different table, on which you're joining?

--
-- Christophe Pettus
   xof@thebuild.com


Re: Partitioned Database and Choosing Subtables

От
"Igor Neyman"
Дата:

> -----Original Message-----
> From: Bill Thoen [mailto:bthoen@gisnet.com]
> Sent: Monday, March 14, 2011 11:31 PM
> To: pgsql-general@postgresql.org
> Subject: Partitioned Database and Choosing Subtables
>
> I've got a ver 8.4.5 partitioned data base with records
> organized by US state, so the partitions are set up by state.
> When I query this database and include  the key field that
> tells postgres what partition you , everything works as I
> expect. It searches only the specified partition, and it's
> fast . But that's only if I use a constant, like this:
>
> SELECT lions,  tigers, bears FROM WildLife WHERE state_pt = 'CO';
>
> What I want to be able to do is put this key value in a table
> and PG look in whatever partition the column specifies, like so:
>
> SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
> state_pt = statecode;
>
> However when I try anything other than a constant, it search
> EVERY partition, sequentially, which is not what I want it to
> do. So is there any way to specify the partition to search
> using a variable/column name?
>
> --
> *Bill Thoen*
> GISnet - www.gisnet.com
> 303-786-9961
>

Try using "dynamic" sql:

EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
state_pt = ' || statecode INTO ...;

See:
http://www.postgresql.org/docs/8.4/interactive/plpgsql-statements.html


Regards,
Igor Neyman


Re: Partitioned Database and Choosing Subtables

От
Bill Thoen
Дата:
On 3/15/2011 12:02 PM, Igor Neyman wrote:
>
>
>> -----Original Message-----
>> From: Bill Thoen [mailto:bthoen@gisnet.com]
>> Sent: Monday, March 14, 2011 11:31 PM
>> To: pgsql-general@postgresql.org
>> Subject: Partitioned Database and Choosing Subtables
>>
>> I've got a ver 8.4.5 partitioned data base with records
>> organized by US state, so the partitions are set up by state.
>> When I query this database and include  the key field that
>> tells postgres what partition you , everything works as I
>> expect. It searches only the specified partition, and it's
>> fast . But that's only if I use a constant, like this:
>>
>> SELECT lions,  tigers, bears FROM WildLife WHERE state_pt = 'CO';
>>
>> What I want to be able to do is put this key value in a table
>> and PG look in whatever partition the column specifies, like so:
>>
>> SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
>> state_pt = statecode;
>>
>> However when I try anything other than a constant, it search
>> EVERY partition, sequentially, which is not what I want it to
>> do. So is there any way to specify the partition to search
>> using a variable/column name?
>>
>> --
>> *Bill Thoen*
>> GISnet - www.gisnet.com
>> 303-786-9961
>>
> Try using "dynamic" sql:
>
> EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
> state_pt = ' || statecode INTO ...;
>
> See:
> http://www.postgresql.org/docs/8.4/interactive/plpgsql-statements.html
>
Thanks Igor. It was a nice try -- and I thought it would work, but the
Planner had other plans. Basically, I tried the interactive method using
a PREPARE statement with one text parammeter followed by an EXECUTE
statement. Unfortunately, the query still went rummaging across the
entire database sequentially.

To recap, (I still hope there's a chance for a workable answer) here's a
simple contrived scenario

Incidents Table
-----------
total  integer,          -- total number of human:critter encounters
statecode char(2),       -- by state
outcome char(20)         -- outcome of incident {fatal, serious, minor,
close call}


Wildlife Table           -- the Partitioned table. Uses state_pt to
separate data tables.
                             table  1: state_pt='AK'
                             table  2: state_pt='AL'
                             ...
                             table 50: state_pt='WY'
-----------------
lions integer,           -- total population of lions in the state
tigers integer,          -- total tigers
bears integer,           -- total bears
outcome char(20),        -- outcome of incident {fatal, serious, minor,
close call}
... more attributes...
state_pt                 -- key index that separates the sub tables
inthe partitioned datbase


Some incidents table data:

total |   statecode  | outcome
------+--------------+-------------
    7  |     CA       |  fatal
   12  |     CA       |  close call
    3  |     CO       |  fatal
   16  |     CO       |  close call
    4  |     WY       |  minor



If I do the following it finds the data and goes directly to the right
table.

     SELECT lions, tigers, bears FROM Wildlife
     WHERE state_pt = 'CO';

Or this:
\set theState '\''CO'\''


     SELECT lions, tigers, bears FROM Wildlife
     WHERE state_pt = :theState ;

It works perfectly. But when I tried this:

PREPARE foo (text) AS
     SELECT lions, tigers, bears FROM Wildlife
     WHERE state_pt = $1;

EXECUTE foo( 'CO' )

It sequentially scanned everything, so I didn't even try the scenario
I'm trying to create, which is using the results of a select to supply
the key code to the Planner as to what partition it should search for
reach record.

SELECT lions, tigers , bears FROM Wildlife a JOIN incidents b ON
a.outcome=b.outcome
WHERE state_pt=statecode;

Apparently, the Planner is pretty fussy about how you refer to one of
your subtables, but I almost always use queries that involve multiple
states,and the knowledge of what state table a particular record needs
to access is carried as one of that record's attributes. I'm surprised
that there's so little info on this. I tseems like an obvious
application  of partitioned databases, but maybe I just don't understand
how it should be used?

It seems that the only variables it likes are host variables, and there
maybe something down that road...
Anywya the quuestion is still open if anyone has some good ideas.

Re: Partitioned Database and Choosing Subtables

От
Alban Hertroys
Дата:
>> Try using "dynamic" sql:
>>
>> EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
>> state_pt = ' || statecode INTO ...;

> Thanks Igor. It was a nice try -- and I thought it would work, but the Planner had other plans. Basically, I tried
theinteractive method using a PREPARE statement with one text parammeter followed by an EXECUTE statement.
Unfortunately,the query still went rummaging across the entire database sequentially. 


That's not the same as using dynamic SQL. In fact, that would behave just like your previous, problematic, query.

Dynamic SQL makes that the planner sees a new query each time, that needs to be planned from scratch.

OTOH, using prepared statements the query plan is stored with placeholders for the parameter values at the moment the
PREPAREexecutes. That query plan is already set in stone - it will be a generic plan that can't be optimised for
differentvalues of parameters anymore. Not what you want. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d8069c0235881427912932!



Re: Partitioned Database and Choosing Subtables

От
Bill Thoen
Дата:
On 3/16/2011 12:40 AM, Alban Hertroys wrote:
>>> Try using "dynamic" sql:
>>>
>>> EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
>>> state_pt = ' || statecode INTO ...;
>> Thanks Igor. It was a nice try -- and I thought it would work, but the Planner had other plans. Basically, I tried
theinteractive method using a PREPARE statement with one text parammeter followed by an EXECUTE statement.
Unfortunately,the query still went rummaging across the entire database sequentially. 
>
> That's not the same as using dynamic SQL. In fact, that would behave just like your previous, problematic, query.
>
> Dynamic SQL makes that the planner sees a new query each time, that needs to be planned from scratch.
>
Thanks for explaining this. I think I can just see what you mean. I can
also see that I'm way out of my depth here. It's just so much fun when
you mix deadlines with ignorance.

> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.

Good point. I'm feeling a bit chainsaw-ish right now.

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961