Interesting group by behavior

Поиск
Список
Период
Сортировка
От Culberson, Philip
Тема Interesting group by behavior
Дата
Msg-id 83CC9259BB25D31186F400A0C9975D9C9CE216@datmail02.dat.com
обсуждение исходный текст
Список pgsql-general
Greetings:

I am somewhat baffled by the horrendous time difference between doing two
separate queries versus a single query with a group by clause.  The table in
question has ~2.5 million rows, of which 847,324 are in the date range
selected.  An explain plan on all three queries show that the cost is the
same for each, although the select .. group by takes 50 times longer.

Below I've included output from both PSQL and the postmaster log.  Note that
the select .. group by query had a 100% buffer hit rate but a huge jump in
the "Direct blocks" reads and writes.

I have two question's:

1) What is the difference between the "Shared/Local/Direct" block stats?
2) Is the long running "select .. group by" an undocumented feature (bug)?

Regards,
Phil Culberson

test=> \d posting
Table    = posting
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| date                             | date                             |
4 |
| comp_id                          | int4                             |
4 |
| posting_type                     | char()                           |
1 |
| equip_type                       | char()                           |
2 |
| partial_flag                     | bool                             |
1 |
| origin_id                        | int4                             |
4 |
| dest_id                          | int4                             |
4 |
+----------------------------------+----------------------------------+-----
--+
Indices:  date_idx
          dest_idx
          orig_idx

test=> \d date_idx
Table    = date_idx
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| date                             | date                             |
4 |
+----------------------------------+----------------------------------+-----
--+



Log output
==========
991006.12:36:15.659  [1321] StartTransactionCommand
991006.12:36:15.660  [1321] query: SELECT COUNT(posting_type) FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'L';
991006.12:36:15.667  [1321] ProcessQuery
! system usage stats:
!       41.747049 elapsed 39.280000 user 0.240000 system sec
!       [243.440000 user 2.450000 sys total]
!       1/0 [19/6] filesystem blocks in/out
!       474/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [11/13] messages rcvd/sent
!       475/503 [7472/3142] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:       1800 read,          0 written, buffer hit rate
= 99.79%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
991006.12:36:57.408  [1321] CommitTransactionCommand

Elapsed time: ~42 seconds

PSQL output
===========
SELECT COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'L';
 count
------
805124
(1 row)


Log output
==========
991006.12:36:57.418  [1321] StartTransactionCommand
991006.12:36:57.418  [1321] query: SELECT COUNT(posting_type) FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'E';
991006.12:36:57.424  [1321] ProcessQuery
! system usage stats:
!       27.584268 elapsed 27.550000 user 0.000000 system sec
!       [271.000000 user 2.450000 sys total]
!       0/0 [19/6] filesystem blocks in/out
!       0/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [12/14] messages rcvd/sent
!       0/528 [7473/3670] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate
= 100.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
991006.12:37:25.004  [1321] CommitTransactionCommand

Elapsed time: ~28 seconds

PSQL output
===========
SELECT COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'E';
count
-----
42200
(1 row)


Log output
==========
991006.12:37:25.014  [1321] StartTransactionCommand
991006.12:37:25.015  [1321] query: SELECT posting_type, COUNT(posting_type)
FROM posting WHERE date BETWEEN 19990801::date AND 19990831::date GROUP BY
posting_type;
991006.12:37:25.019  [1321] ProcessQuery
! system usage stats:
!       3515.408886 elapsed 3508.790000 user 2.050000 system sec
!       [3779.800000 user 4.510000 sys total]
!       4/19 [23/25] filesystem blocks in/out
!       0/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [13/15] messages rcvd/sent
!       41/34863 [7515/38533] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate
= 100.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:      10137 read,      10964 written
991006.13:36:00.425  [1321] CommitTransactionCommand

Elapsed time: ~58 MINUTES 35 SECONDS!

PSQL output
===========
SELECT posting_type, COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date
GROUP BY posting_type;
posting_type| count
------------+------
E           | 42200
L           |805124
(2 rows)

В списке pgsql-general по дате отправления:

Предыдущее
От: Max Buvry
Дата:
Сообщение: [Q] TRIGGER-RULE
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] what does this mean - pg_psort.1922.0 in database directory?]