Discussion: multiple table scan performance


multiple table scan performance

From:
Samuel Gendler
Date:
I've got some functionality that necessarily must scan a relatively large table.  Even worse, the total workload is actually 3 similar, but different queries, each of which requires a table scan.  They all have a resultset that has the same structure, and all get inserted into a temp table.  Is there any performance benefit to revamping the workload such that it issues a single:

insert into (...) select ... UNION select ... UNION select

as opposed to 3 separate "insert into () select ..." statements?

I could figure it out empirically, but the queries are really slow on my dev laptop and I don't have access to the staging system at the moment.  Also, it requires revamping a fair bit of code, so I figured it never hurts to ask.  I don't have a sense of whether postgres is able to parallelize multiple subqueries via a single scan.
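For concreteness, here is a minimal sketch of the two shapes being compared, using sqlite3 as a stand-in for Postgres (the table and column names are invented for illustration; the thread does not give the real schema):

```python
import sqlite3

# Hypothetical schema -- the original table and columns are not given.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER, category TEXT, value INTEGER);
    CREATE TEMP TABLE results_a (id INTEGER, value INTEGER);
    CREATE TEMP TABLE results_b (id INTEGER, value INTEGER);
    INSERT INTO big_table VALUES
        (1, 'x', 10), (2, 'y', 20), (3, 'z', 30), (4, 'x', 40);
""")

# Shape 1: three separate INSERT ... SELECT statements,
# each implying its own scan of big_table.
for cat in ("x", "y", "z"):
    conn.execute(
        "INSERT INTO results_a SELECT id, value FROM big_table WHERE category = ?",
        (cat,),
    )

# Shape 2: one INSERT fed by a UNION ALL over the three selects.
conn.execute("""
    INSERT INTO results_b
    SELECT id, value FROM big_table WHERE category = 'x'
    UNION ALL
    SELECT id, value FROM big_table WHERE category = 'y'
    UNION ALL
    SELECT id, value FROM big_table WHERE category = 'z'
""")

a = conn.execute("SELECT COUNT(*) FROM results_a").fetchone()[0]
b = conn.execute("SELECT COUNT(*) FROM results_b").fetchone()[0]
print(a, b)  # both shapes load the same 4 rows
```

Both shapes produce the same rows; the question in the thread is purely whether the second one lets the planner share a single physical scan.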

Re: multiple table scan performance

From:
Claudio Freire
Date:
On Tue, Mar 29, 2011 at 7:16 PM, Samuel Gendler
<sgendler@ideasculptor.com> wrote:
> Is there any performance benefit to revamping the workload such that it issues
> a single:
> insert into (...) select ... UNION select ... UNION select
> as opposed to 3 separate "insert into () select ..." statements.

I wouldn't expect any difference - provided you use UNION ALL (a plain
UNION would be equivalent to insert into () select DISTINCT ...)

Re: multiple table scan performance

From:
Marti Raudsepp
Date:
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler <sgendler@ideasculptor.com> wrote:
> I've got some functionality that necessarily must scan a relatively large table

> Is there any performance benefit to revamping the workload such that it issues
> a single:
> insert into (...) select ... UNION select ... UNION select
> as opposed to 3 separate "insert into () select ..." statements.

Apparently not, as explained by Claudio Freire. This seems like a
missed opportunity for the planner, however. If it scanned all three
UNION subqueries in parallel, the synchronized seqscans feature would
kick in and the physical table would only be read once, instead of 3 times.

(I'm assuming that seqscan disk access is your bottleneck)

You can trick Postgres (8.3.x and newer) into doing it in parallel
anyway: open 3 separate database connections and issue each of these
'INSERT INTO ... SELECT' parts separately.  This way all the queries
should execute in about 1/3 the time, compared to running them in one
session or with UNION ALL.

Regards,
Marti

Re: multiple table scan performance

From:
Samuel Gendler
Date:


On Tue, Mar 29, 2011 at 5:05 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Wed, Mar 30, 2011 at 01:16, Samuel Gendler <sgendler@ideasculptor.com> wrote:
>
> You can trick Postgres (8.3.x and newer) into doing it in parallel
> anyway: open 3 separate database connections and issue each of these
> 'INSERT INTO ... SELECT' parts separately.  This way all the queries
> should execute in about 1/3 the time, compared to running them in one
> session or with UNION ALL.

That's a good idea, but forces a lot of infrastructural change on me.  I'm inserting into a temp table, then deleting everything from another table before copying over.  I could insert into an ordinary table, but then I've got to deal with ensuring that everything is properly cleaned up, etc.  Since nothing is actually blocked, waiting for the queries to return, I think I'll just let them churn for now. It won't make much difference in production, where the whole table will fit easily into cache.  I just wanted things to be faster in my dev environment.
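The fill-staging-then-swap flow Samuel describes can be wrapped in a single transaction so readers never observe the emptied target table. A hypothetical sqlite3 sketch (all names invented; the thread doesn't give the real tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER, value INTEGER);
    CREATE TEMP TABLE staging (id INTEGER, value INTEGER);
    INSERT INTO target VALUES (99, 0);             -- stale data
    INSERT INTO staging VALUES (1, 10), (2, 20);   -- freshly computed rows
""")

# DELETE and copy inside one transaction: readers see either the old
# contents or the new ones, never an empty table in between.
with conn:
    conn.execute("DELETE FROM target")
    conn.execute("INSERT INTO target SELECT id, value FROM staging")

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 10), (2, 20)]
```

This is also why the multi-connection trick forces infrastructural change: the staging table would have to become a regular table visible to all three sessions, with its own cleanup.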


Re: multiple table scan performance

From:
Craig James
Date:
On 3/29/11 3:16 PM, Samuel Gendler wrote:
> I've got some functionality that necessarily must scan a relatively large table.  Even worse, the total workload is
> actually 3 similar, but different queries, each of which requires a table scan.  They all have a resultset that has the
> same structure, and all get inserted into a temp table.  Is there any performance benefit to revamping the workload such
> that it issues a single:
>
> insert into (...) select ... UNION select ... UNION select
>
> as opposed to 3 separate "insert into () select ..." statements.
>
> I could figure it out empirically, but the queries are really slow on my dev laptop and I don't have access to the
> staging system at the moment.  Also, it requires revamping a fair bit of code, so I figured it never hurts to ask.  I
> don't have a sense of whether postgres is able to parallelize multiple subqueries via a single scan.

You don't indicate how complex your queries are.  If it's just a single table and the conditions are relatively simple,
could you do something like this?

   insert into (...) select ... where (...) OR (...) OR (...)

Craig
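Craig's suggestion in runnable form (sqlite3 stand-in; table and predicates invented for illustration). Folding the three predicates into one WHERE clause guarantees a single pass over the table, with no planner cooperation needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER, category TEXT, value INTEGER);
    CREATE TEMP TABLE results (id INTEGER, value INTEGER);
    INSERT INTO big_table VALUES
        (1, 'x', 10), (2, 'y', 20), (3, 'z', 30), (4, 'q', 40);
""")

# One scan, three OR'd conditions, instead of three scans.
conn.execute("""
    INSERT INTO results
    SELECT id, value FROM big_table
    WHERE (category = 'x') OR (category = 'y') OR (category = 'z')
""")

n = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(n)  # 3 rows matched by a single pass over big_table
```

One caveat: the semantics differ slightly from the UNION forms. A row satisfying two of the conditions appears once with OR, twice with UNION ALL, and once with plain UNION, so this rewrite only applies when the three selects share the same projection and that behavior is acceptable.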