Performance of tqueue.c's tuple remapping logic
From:    Tom Lane
Subject: Performance of tqueue.c's tuple remapping logic
Date:    
Msg-id:  32763.1469821037@sss.pgh.pa.us
List:    pgsql-hackers
$SUBJECT sucks.  Create a table containing lots of composite arrays:

regression=# create table foo (f1 int8_tbl[]);
CREATE TABLE
regression=# insert into foo
  select array[row(1,2),row(3,4)]::int8_tbl[] from generate_series(1,10000000);
INSERT 0 10000000
regression=# vacuum analyze foo;
VACUUM

Establish a baseline for how long it takes to scan this table:

regression=# explain analyze select f1 from foo;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..263935.06 rows=10000006 width=101) (actual time=0.027..1461.236 rows=10000000 loops=1)
 Planning time: 0.149 ms
 Execution time: 1996.995 ms
(3 rows)

... or to select a non-composite value out of it:

regression=# explain analyze select f1[1].q1 from foo;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..263935.06 rows=10000006 width=8) (actual time=1.122..3736.121 rows=10000000 loops=1)
 Planning time: 0.077 ms
 Execution time: 4285.872 ms
(3 rows)

Now let's try those same queries in parallel mode:

regression=# set force_parallel_mode to 1;
SET
regression=# explain analyze select f1[1].q1 from foo;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1264935.66 rows=10000006 width=8) (actual time=11.402..12753.782 rows=10000000 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on foo  (cost=0.00..263935.06 rows=10000006 width=8) (actual time=0.182..4523.724 rows=10000000 loops=1)
 Planning time: 0.081 ms
 Execution time: 13736.321 ms
(7 rows)

regression=# explain analyze select f1 from foo;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1264935.66 rows=10000006 width=101) (actual time=6.659..22693.798 rows=10000000 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on foo  (cost=0.00..263935.06 rows=10000006 width=101) (actual time=0.780..2579.707 rows=10000000 loops=1)
 Planning time: 0.073 ms
 Execution time: 25925.709 ms
(7 rows)

So, having to do record-type remapping in the fairly trivial case of
two-element composite arrays nearly doubles the already rather excessive
runtime for a parallel query returning lots of data.

Just to add insult to injury, the backend's memory consumption bloats to
something over 5.5G during that last query.  Which is not terribly
surprising given all the cavalier use of TopMemoryContext in tqueue.c.

Since the entire point of parallel query is to deal with large data
volumes, I think this means that tqueue.c is simply unfit for production
use as it stands.  I propose to undertake a thorough code review, and
possibly significant rewrite, of tqueue.c over the next few days.

			regards, tom lane