Обсуждение: Get Postgres to use multiple proc cores?

Поиск
Список
Период
Сортировка

Get Postgres to use multiple proc cores?

От
"Jonathan Hoover"
Дата:
With my hard drive issues from an earlier email solved (thanks all) I am now running queries against my 65M rows. I
noticedin atop that postmaster is using 100% of just one processor core. Is there any way to tell it to use both (or at
leastsome of the other one). System is RHEL5, PG is version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC)4.1.2 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM. 

The query in question has been running for 15 minutes, and the only thing atop is reporting in the "red" is the
processor.Although, just now the proc has died down and the drive is apparently now "red" (the main delay point). Here
isthe query (any help on improving it?) 

------------------------------
|       a     |      b       |
--------------|---------------
|guid vc(32)  | guid vc(32)  |
|unitid vc(15)| unitid vc(15)|
|ftime ts     |              |
|source vc(10)|              |
------------------------------
|65,000,000 rw| 500,000 rows |
------------------------------

SELECT DISTINCT guid FROM b
WHERE guid NOT IN (SELECT DISTINCT guid FROM a) AND
  unitid IN (SELECT DISTINCT unitid FROM a)


Eventually, I want to project b.unitid and group by it, with a count(b.guid).

Re: Get Postgres to use multiple proc cores?

От
Scott Marlowe
Дата:
On Fri, Nov 5, 2010 at 8:59 AM, Jonathan  Hoover <jhoover@yahoo-inc.com> wrote:
> With my hard drive issues from an earlier email solved (thanks all) I am now running queries against my 65M rows. I
noticedin atop that postmaster is using 100% of just one processor core. Is there any way to tell it to use both (or at
leastsome of the other one). System is RHEL5, PG is version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC)4.1.2 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM. 
>
> The query in question has been running for 15 minutes, and the only thing atop is reporting in the "red" is the
processor.Although, just now the proc has died down and the drive is apparently now "red" (the main delay point). Here
isthe query (any help on improving it?) 

A group by is almost always faster than a distinct.

And a single query uses a single core, sorry.

Re: Get Postgres to use multiple proc cores?

От
"Oliveiros d'Azevedo Cristina"
Дата:
Scott,

I'd like to take this chance to ask another (related though) question.

What is faster? A JOIN or a WHERE a IN (SELECT ... )

I've heard that the nested subquery has a tendency to be slower, but I'd
like to check it with people from the list.
Is one faster than the other? Or, all in all, they run at the same speed ?

Best,
Oliveiros

----- Original Message -----
From: "Scott Marlowe" <scott.marlowe@gmail.com>
To: "Jonathan Hoover" <jhoover@yahoo-inc.com>
Cc: <pgsql-admin@postgresql.org>
Sent: Friday, November 05, 2010 3:17 PM
Subject: Re: [ADMIN] Get Postgres to use multiple proc cores?


On Fri, Nov 5, 2010 at 8:59 AM, Jonathan  Hoover <jhoover@yahoo-inc.com>
wrote:
> With my hard drive issues from an earlier email solved (thanks all) I am
> now running queries against my 65M rows. I noticed in atop that postmaster
> is using 100% of just one processor core. Is there any way to tell it to
> use both (or at least some of the other one). System is RHEL5, PG is
> version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
> 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM.
>
> The query in question has been running for 15 minutes, and the only thing
> atop is reporting in the "red" is the processor. Although, just now the
> proc has died down and the drive is apparently now "red" (the main delay
> point). Here is the query (any help on improving it?)

A group by is almost always faster than a distinct.

And a single query uses a single core, sorry.

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: Get Postgres to use multiple proc cores?

От
Scott Marlowe
Дата:
On Fri, Nov 5, 2010 at 9:29 AM, Oliveiros d'Azevedo Cristina
<oliveiros.cristina@marktest.pt> wrote:
> Scott,
>
> I'd like to take this chance to ask another (related though) question.
>
> What is faster? A JOIN or a WHERE a IN (SELECT ... )
>
> I've heard that the nested subquery has a tendency to be slower, but I'd
> like to check it with people from the list.
> Is one faster than the other? Or, all in all, they run at the same speed ?

Usually the join is faster.  But there's history to consider.  Back in
the olden days, it was almost always faster to use a join than a where
in.  Then where in got a lot of work, and it's a lot faster.  So, it
depends a bit on which version your running and size of the data set
in the in.  Most of the time it's pretty close, but for larger numbers
of rows in the subselect, the join is usually faster.

Of course the only way to know for sure is to test your query and see.
Make sure to run it multiple times to overcome differences in caching.

Re: Get Postgres to use multiple proc cores?

От
"Kevin Grittner"
Дата:
"Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>
wrote:

> What is faster? A JOIN or a WHERE a IN (SELECT ... )

On 8.4 and later an EXISTS or NOT EXISTS will use semi-join or
anti-join (respectively).  These should usually be much faster than
the IN (SELECT DISTINCT ... ) technique.  Be careful of the
difference in semantics between NOT EXISTS and NOT IN if any of the
columns involved in the matching can be null -- they are *not* the
same thing.

-Kevin