Discussion: Why does Postgres use so little memory on Windows?
Hello
I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram
explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;
Here is the result:
Nested Loop  (cost=319.42..4879348246.58 rows=32820035265 width=1525) (actual time=64656.747..5594654.189 rows=3617090 loops=1)
  ->  Index Scan using sym_data_pkey on sym_data d  (cost=0.00..3671742.82 rows=3867095 width=1525) (actual time=9.775..12465.153 rows=3866359 loops=1)
        Filter: ((channel_id)::text = 'sale_transaction'::text)
  ->  Bitmap Heap Scan on sym_data_gap g  (cost=319.42..1133.51 rows=8487 width=8) (actual time=1.438..1.439 rows=1 loops=3866359)
        Recheck Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
        Filter: (g.status = 'GP'::bpchar)
        ->  Bitmap Index Scan on sym_data_gap_pkey  (cost=0.00..317.30 rows=8487 width=0) (actual time=1.436..1.436 rows=1 loops=3866359)
              Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
http://explain.depesz.com/s/c3DT
I have run VACUUM FULL. Here is my PostgreSQL config:
shared_buffers = 2GB
work_mem = 64MB
maintenance_work_mem = 1GB
wal_buffers = 256
effective_cache_size = 4GB
checkpoint_segments = 256
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 256
random_page_cost = 3.5
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 250
max_locks_per_transaction = 2000
When I check Task Manager, I see the postgres process is using only 4-5MB.

What is happening with my PostgreSQL? Please help me.
Thank you in advance.
Tuan Hoang Anh
On 02/20/2016 08:46 AM, tuanhoanganh wrote:
> Hello
>
> I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram
>
> explain analyze select d.data_id, d.table_name, d.event_type,
> d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id,
> d.channel_id, d.transaction_id, d.source_node_id, d.external_data, ''
> from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id
> between g.start_id and g.end_id where d.channel_id='sale_transaction'
> order by d.data_id asc;

Took the liberty of reformatting the above here: http://sqlformat.darold.net/

EXPLAIN ANALYZE
SELECT
    d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data,
    d.old_data, d.create_time, d.trigger_hist_id, d.channel_id,
    d.transaction_id, d.source_node_id, d.external_data, ''
FROM
    sym_data d
    INNER JOIN sym_data_gap g ON g.status = 'GP'
        AND d.data_id BETWEEN g.start_id AND g.end_id
WHERE
    d.channel_id = 'sale_transaction'
ORDER BY
    d.data_id ASC;

The thing that stands out to me is that I do not see that sym_data and sym_data_gap are actually joined on anything.

Also, is it possible to see the schema definitions for the two tables?
> [snip]

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> Took the liberty of reformatting the above here:
> ...
> FROM
>     sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
>         AND d.data_id BETWEEN g.start_id AND g.end_id
> WHERE
>     d.channel_id = 'sale_transaction'
> ORDER BY
>     d.data_id ASC;

> The thing that stands out to me is that I do not see that sym_data and
> sym_data_gap are actually joined on anything.

The "d.data_id BETWEEN g.start_id AND g.end_id" part is a join condition ... but not one that can be handled by either hash or merge join, because those require simple equality join conditions. So the nestloop plan shown here is really about as good as you're going to get without redesigning the query and/or the data representation.

It looks like the bitmap heap scan generally returns exactly one row for each outer row, which makes me wonder if the BETWEEN couldn't be replaced with some sort of equality. But that might take some rethinking of the data.

			regards, tom lane
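[If the gaps never overlap, one way to realize the equality idea Tom describes is to materialize the containing gap for each row once, and then join on plain equality thereafter. A rough sketch under that assumption; the gap_start_id column is invented here for illustration:]

```sql
-- Hypothetical: assumes each data_id falls inside at most one gap.
ALTER TABLE sym_data ADD COLUMN gap_start_id bigint;

-- Pay the BETWEEN cost once, up front.
UPDATE sym_data d
SET    gap_start_id = g.start_id
FROM   sym_data_gap g
WHERE  g.status = 'GP'
  AND  d.data_id BETWEEN g.start_id AND g.end_id;

-- Later queries can then use a hash- or merge-joinable condition:
--   ... FROM sym_data d
--       JOIN sym_data_gap g ON g.start_id = d.gap_start_id ...
```

The trade-off is that the mapping has to be maintained (e.g. by trigger) as gaps change.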
On Sat, Feb 20, 2016 at 7:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> FROM
>     sym_data d INNER JOIN sym_data_gap g ON g.status = 'GP'
>         AND d.data_id BETWEEN g.start_id AND g.end_id
> .....
> The thing that stands out to me is that I do not see that sym_data and
> sym_data_gap are actually joined on anything.

Yes they are, although the formatting hid it somewhat.

It is a classic: sym_data_gap defines intervals via start+end ids over sym_data, and he wants to join every data row with its corresponding gap. It is a hard optimization problem without knowing more about the data distribution; maybe range types and GiST indexes could help him.

When faced with this kind of structure, depending on the data distribution, I've solved it via two parallel queries ( gaps sorted by start plus end, data sorted by id, sweep them in parallel joining by code; the typical tape-update problem. It works like a charm for non-overlapping ranges, and even for some overlapping ones with a couple of queues ). And he seems to want all of the data ( sometimes this goes faster if you can add a couple of range conditions on data.data_id / gap.start_id/end_id ).

> Also is it possible to see the schema definitions for the two tables?

My bet is on something like data_id ~serial primary key, gap.start_id/end_id foreign keys to that.

Francisco Olarte.
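[The range-type idea Francisco mentions would need an upgrade first: range types and their GiST support arrived in PostgreSQL 9.2, and the OP is on 9.0. A hedged sketch of what it could look like, with column names assumed from the query in this thread:]

```sql
-- Hypothetical (PostgreSQL 9.2+): store each gap as a single range value
-- and index it, so containment lookups can use the GiST index.
ALTER TABLE sym_data_gap ADD COLUMN id_range int8range;
UPDATE sym_data_gap SET id_range = int8range(start_id, end_id, '[]');
CREATE INDEX sym_data_gap_range_idx ON sym_data_gap USING gist (id_range);

-- The join condition becomes an element-containment test (@>) that the
-- index understands, instead of a pair of inequalities:
SELECT d.data_id
FROM   sym_data d
       JOIN sym_data_gap g ON g.status = 'GP'
                          AND g.id_range @> d.data_id::int8
WHERE  d.channel_id = 'sale_transaction'
ORDER  BY d.data_id;
```

This still probes the gap table once per data row, but against a purpose-built index rather than re-deriving the range from two columns.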
On Sat, Feb 20, 2016 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It looks like the bitmap heap scan generally returns exactly one row for
> each outer row, which makes me wonder if the BETWEEN couldn't be replaced
> with some sort of equality.

Mm, I'm not good at reading explains, but that seems to confirm my suspicion that the gaps partition the id range into non-overlapping ranges.

> But that might take some rethinking of the data.

If data_id is a series and each gap defines a range, he can do something with an auxiliary query, like:

    select start_id as a, 0 as b from sym_data_gap where status = 'GP'
    union all
    select data_id as a, 1 as b from sym_data
    union all
    select end_id as a, 2 as b from sym_data_gap where status = 'GP'
    order by a, b

which would give all the data ids (the rows with b=1) surrounded by (0,2) pairs when inside a gap and by (2,0) pairs when outside (sorting the end marker with b=2 after the data rows at the same id keeps the BETWEEN end inclusive). Then, with a creative window clause or a small function, filter that and join back with sym_data on data_id. I suppose adding a third column c, null on b=1 and equal to b on b=0/2, and selecting the previous non-null value in the sequence could do it, but that's somewhat above my window-fu; I'm more of a code guy and would do it with two nested loops in a function.

Francisco Olarte.
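[The event-sweep sketch above can be completed without the "previous non-null" trick by keeping a running count of open gaps instead. A hypothetical completion, with column names assumed from the thread; window aggregates need 8.4+, so this runs on the OP's 9.0:]

```sql
-- Tag gap starts (+1), data rows (0) and gap ends (-1), order the events,
-- and keep a running sum.  A data row lies inside a gap exactly when the
-- running count of open gaps is positive.  ord makes a start sort before,
-- and an end sort after, a data row with the same id, so both BETWEEN
-- bounds stay inclusive.
WITH events AS (
    SELECT start_id AS a, 0 AS ord, +1 AS delta
    FROM   sym_data_gap WHERE status = 'GP'
    UNION ALL
    SELECT data_id, 1, 0
    FROM   sym_data WHERE channel_id = 'sale_transaction'
    UNION ALL
    SELECT end_id, 2, -1
    FROM   sym_data_gap WHERE status = 'GP'
),
swept AS (
    SELECT a, ord,
           SUM(delta) OVER (ORDER BY a, ord) AS open_gaps
    FROM   events
)
SELECT a AS data_id
FROM   swept
WHERE  ord = 1 AND open_gaps > 0
ORDER  BY a;
```

Joining the resulting id list back to sym_data fetches the full rows; the point is that this replaces 3.8 million index probes with two sorted scans merged in one pass, and it tolerates overlapping gaps too.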
On 02/20/2016 10:39 AM, Francisco Olarte wrote:
> It is a classic: sym_data_gap defines intervals via start+end ids over
> sym_data, and he wants to join every data row with its corresponding
> gap. It is a hard optimization problem without knowing more about the
> data distribution.
> [snip]

Thanks to you and Tom for enlightening me. I am going to have to spend some time puzzling this out to convert what you have shown into something I can wrap my head around.

>> Also is it possible to see the schema definitions for the two tables?
>
> My bet is on something like data_id ~serial primary key,
> gap.start_id/end_id foreign keys to that.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
On Sat, Feb 20, 2016 at 8:46 AM, tuanhoanganh <hatuan05@gmail.com> wrote:
> Hello
>
> I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram

9.0 is no longer supported. You should work toward upgrading to a newer version. It might not solve this problem, but it would give you better tools for diagnosing the problem, which is a pretty good step toward solving it.

> When I check Task Manager, I see the postgres process is using only 4-5MB

Other people have explained the details of how the query is being run and why it is being run that way. But I would like to take a step back from that, and tell you that the reason PostgreSQL is not using more memory is that it doesn't think using more memory would help.

Cheers,

Jeff
Thanks for all the help, everyone.

I changed effective_cache_size to 24GB and now it runs well.
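[As a closing note, summarizing documented behavior rather than anything stated in the thread: effective_cache_size is only a planner estimate of how much data the OS and shared buffers together can cache. It allocates nothing, which is why Task Manager still shows small per-process numbers; raising it just makes large index scans look cheaper to the planner. A sketch of the relevant postgresql.conf lines, with the values from this thread:]

```ini
# postgresql.conf (sketch) -- values from this thread
shared_buffers = 2GB           # actual shared-memory allocation
effective_cache_size = 24GB    # planner estimate only, no allocation;
                               # roughly half of the machine's 48GB RAM
```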
Tuan Hoang Anh