Discussion: [GENERAL] Bad planning data resulting in OOM killing of postgres
I'm having trouble with purges related to a large table. The delete
query consumes RAM until postgres crashes due to OOM. I have a very
large table called log_raw. There are half a dozen related tables,
such as 'urls' and 'titles'. log_raw.urlid = urls.urlid, and urls
contains the text of the various urls, for example.

Each time I try to purge these side tables the unit OOMs.

psql:postgres@cipafilter = explain DELETE FROM titles WHERE NOT EXISTS
( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid );
                                        QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────
 Delete on titles  (cost=22166473.44..24850954.67 rows=870382 width=12)
   ->  Hash Anti Join  (cost=22166473.44..24850954.67 rows=870382 width=12)
         Hash Cond: (titles.titleid = log_raw.titleid)
         ->  Seq Scan on titles  (cost=0.00..17871.64 rows=870664 width=10)
         ->  Hash  (cost=12744792.64..12744792.64 rows=542011264 width=10)
               ->  Seq Scan on log_raw  (cost=0.00..12744792.64 rows=542011264 width=10)
(6 rows)

psql:postgres@cipafilter = select count(*) from (select titleid from
log_raw group by titleid) as a;
 count
────────
 872210
(1 row)

cipafilter=# select n_distinct from pg_stats where tablename =
'log_raw' and attname = 'titleid';
 n_distinct
------------
        282
(1 row)

The planning data is wildly low for each of these fields, and I wonder
if, because of that error, the planner thinks it can keep all these ids
in RAM while it works. ANALYZE doesn't fix it. Increasing the
statistics target improves the n_distinct data, but not considerably:
increasing it 3- or 4-fold still leaves it wildly off. ALTER TABLE SET
n_distinct doesn't seem to be used by the planner, as it doesn't change
any of the plans I've generated or seem to be taken into account in the
row estimates. I'm out of ideas.

Anybody have any ideas?

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
Hi
2017-02-13 18:40 GMT+01:00 David Hinkle <hinkle@cipafilter.com>:
what is your work_mem setting?
Regards
Pavel
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Feb 13, 2017 at 9:40 AM, David Hinkle <hinkle@cipafilter.com> wrote:
Your data on log_raw.titleid is probably clustered, so that any given page of the table all has the same value for titleid. This really messes up the sampling algorithm used by ANALYZE. To overcome that, you would have to increase the statistics target by 3 or 4 orders of magnitude, not a factor of 3 or 4.
However, that doesn't seem to be the actual problem. Surprisingly enough, a hash anti-join doesn't automatically de-duplicate the hash table as it is being built. So n_distinct correctly does not have an influence on the estimated RAM usage, because it doesn't influence the actual RAM usage either.
It sounds like your work_mem is set way too high. What is it set to? And what version of PostgreSQL are you using?
Cheers,
Jeff
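Jeff's two points above can be checked and applied with a couple of statements. This is only a sketch using the column names from the thread; the 10000 target is the hard upper limit on the per-column setting, shown as an example rather than a recommendation.

```sql
-- How physically clustered are titleid values? A correlation near 1.0 (or
-- -1.0) means on-disk row order closely tracks value order, which is the
-- pattern that skews ANALYZE's block-sampled n_distinct estimate.
SELECT attname, correlation, n_distinct
FROM pg_stats
WHERE tablename = 'log_raw' AND attname = 'titleid';

-- Raise the per-column statistics target by orders of magnitude
-- (10000 is the maximum) so ANALYZE samples many more pages.
ALTER TABLE log_raw ALTER COLUMN titleid SET STATISTICS 10000;
ANALYZE log_raw;
```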
Thanks guys, here's the information you requested:

psql:postgres@cipafilter = show work_mem;
 work_mem
──────────
 10MB
(1 row)

psql:postgres@cipafilter = select version();
                                                version
───────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
(1 row)

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
On Mon, Feb 13, 2017 at 11:53 AM, David Hinkle <hinkle@cipafilter.com> wrote:
Thanks guys, here's the information you requested:
psql:postgres@cipafilter = show work_mem;
work_mem
──────────
10MB
(1 row)
OK, new theory then. Do you have triggers on or foreign key constraints to the table you are deleting from? It queues up each deleted row to go back and fire the trigger or validate the constraint at the end of the statement. You might need to drop the constraint, or delete in smaller batches by adding some kind of dummy condition to the WHERE clause which you progressively move.
Or select the rows you want to keep into a new table, and then drop the old one, rename the new one, and rebuild any constraints or indexes and other dependencies. This can be pretty annoying if there are a lot of them.
Cheers,
Jeff
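Jeff's batching idea might be sketched like this (illustrative only; the window bounds and step size are made-up values to tune for the workload, and each batch can run in its own transaction):

```sql
-- Purge orphaned titles in bounded primary-key slices so each DELETE
-- touches a limited set of rows; advance the window and repeat until
-- the whole titleid range has been covered.
DELETE FROM titles t
WHERE t.titleid >= 0 AND t.titleid < 100000   -- the moving window
  AND NOT EXISTS (
        SELECT 1 FROM log_raw
        WHERE log_raw.titleid = t.titleid
      );
```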
Thanks Jeff,

No triggers or foreign key constraints:

psql:postgres@cipafilter = \d+ titles
                                               Table "public.titles"
 Column  │       Type        │                        Modifiers                         │ Storage  │ Stats target │ Description
─────────┼───────────────────┼──────────────────────────────────────────────────────────┼──────────┼──────────────┼─────────────
 title   │ character varying │                                                          │ extended │              │
 titleid │ integer           │ not null default nextval('titles_titleid_seq'::regclass) │ plain    │              │
Indexes:
    "titles_pkey" PRIMARY KEY, btree (titleid)
    "titles_md5_title_idx" btree (md5(title::text))

Do you see anything in there that would be problematic?

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
On Mon, Feb 13, 2017 at 12:43 PM, David Hinkle <hinkle@cipafilter.com> wrote:
I'm out of ideas here. What happens if you just select the rows, rather than deleting them? Does it have memory problems then? If not, can you post the explain (analyze, buffers) of doing that?
Cheers,
Jeff
psql:postgres@cipafilter = EXPLAIN (ANALYZE, BUFFERS) select titleid
from titles WHERE NOT EXISTS ( SELECT 1 FROM log_raw WHERE
log_raw.titleid = titles.titleid );
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Nope, that pops too. The query runs for a long time at a somewhat normal rate of RAM consumption, using ~1G of RSS, then suddenly spikes to about 6G, at which point the OOM killer pops it. The box has 8G of RAM and 4G of swap.

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
I managed to get this version to finish:

psql:postgres@cipafilter = explain (ANALYZE, BUFFERS) select count(*)
from (select titleid from log_raw group by titleid) as a;
                                                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=14099827.15..14099827.16 rows=1 width=0) (actual time=248805.549..248805.549 rows=1 loops=1)
   Buffers: shared hit=598 read=7324082 dirtied=34
   ->  HashAggregate  (cost=14099820.80..14099823.62 rows=282 width=4) (actual time=248504.756..248760.382 rows=874750 loops=1)
         Group Key: log_raw.titleid
         Buffers: shared hit=598 read=7324082 dirtied=34
         ->  Seq Scan on log_raw  (cost=0.00..12744792.64 rows=542011264 width=4) (actual time=0.002..145554.907 rows=544654818 loops=1)
               Buffers: shared hit=598 read=7324082 dirtied=34
 Planning time: 0.072 ms
 Execution time: 248807.285 ms
(9 rows)

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
I manually updated the pg_statistic data by literally setting it to an appropriate amount, and the planner picked a new plan, and the new plan worked. Any idea what I should do about this? Is manually updating these values my best bet?

psql:daemon@cipafilter = update pg_statistic set stadistinct = 850000 where starelid = 16678;

psql:daemon@cipafilter = EXPLAIN (ANALYZE, BUFFERS) delete from titles
WHERE NOT EXISTS ( SELECT 1 FROM log_raw WHERE log_raw.titleid =
titles.titleid );
                                                                      QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Delete on titles  (cost=111982485.81..117653772.60 rows=26291 width=12) (actual time=726777.769..726777.769 rows=0 loops=1)
   Buffers: shared hit=769918 read=7323762 dirtied=1, temp read=1944017 written=2177366
   ->  Merge Anti Join  (cost=111982485.81..117653772.60 rows=26291 width=12) (actual time=726777.766..726777.766 rows=0 loops=1)
         Merge Cond: (titles.titleid = log_raw.titleid)
         Buffers: shared hit=769918 read=7323762 dirtied=1, temp read=1944017 written=2177366
         ->  Index Scan using titles_pkey on titles  (cost=0.42..44451.84 rows=876291 width=10) (actual time=0.006..316.375 rows=874829 loops=1)
               Buffers: shared hit=769000 dirtied=1
         ->  Materialize  (cost=109899462.67..112609518.99 rows=542011264 width=10) (actual time=545806.339..687770.786 rows=425297005 loops=1)
               Buffers: shared hit=918 read=7323762, temp read=1944017 written=2177366
               ->  Sort  (cost=109899462.67..111254490.83 rows=542011264 width=10) (actual time=545806.336..652402.716 rows=425297005 loops=1)
                     Sort Key: log_raw.titleid
                     Sort Method: external merge  Disk: 11013224kB
                     Buffers: shared hit=918 read=7323762, temp read=1944017 written=2177366
                     ->  Seq Scan on log_raw  (cost=0.00..12744792.64 rows=542011264 width=10) (actual time=0.003..274920.970 rows=544670242 loops=1)
                           Buffers: shared hit=918 read=7323762
 Planning time: 0.158 ms
 Execution time: 727533.213 ms
(17 rows)

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
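For comparison with the direct pg_statistic update, the supported per-column override would look like the sketch below. One caveat from the PostgreSQL documentation: the override is only folded into the statistics at the next ANALYZE, so running the ALTER TABLE alone (as tried earlier in the thread) would not change any plans.

```sql
-- Store the (known) true distinct count as a per-column option...
ALTER TABLE log_raw ALTER COLUMN titleid SET (n_distinct = 850000);
-- ...then re-ANALYZE so the override replaces the sampled estimate.
ANALYZE log_raw;
```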
David Hinkle <hinkle@cipafilter.com> writes:
> Thanks guys, here's the information you requested:
> psql:postgres@cipafilter = show work_mem;
>  work_mem
> ──────────
>  10MB
> (1 row)

[ squint... ]  It should absolutely not have tried to hash a 500M-row table if it thought work_mem was only 10MB.  I wonder if there's an integer-overflow problem or something like that.

			regards, tom lane
I wrote:
> [ squint... ]  It should absolutely not have tried to hash a 500M-row
> table if it thought work_mem was only 10MB.  I wonder if there's an
> integer-overflow problem or something like that.

Ah, nah, scratch that: I was thinking of the hash aggregation case, where there's no run-time ability to spill to disk, so the planner will not risk using hash aggregation if it estimates the hash table would exceed work_mem. Hash joins do have the ability to restrict memory consumption by increasing the number of batches, so the planner doesn't worry about it in that case.

I think what must be happening is that there's some one value of log_raw.titleid that occurs a ridiculous number of times, so that the executor is unable to split up that particular hash bucket, leading to OOM when it tries to load all those rows to process the hash bucket.

The planner does attempt to estimate the worst-case bucket size, but in what now seems like brain fade, it doesn't do more with that information than charge an appropriate number of tuple comparisons. That would somewhat discourage use of a hash join, but a merge join on this many tuples would be pretty expensive too, so it's not overly surprising that it went with hashing anyway. I am thinking we ought to fix it so it rejects (or at least heavily penalizes) a hash join if it estimates that the rows containing the inner side's most common value wouldn't all fit in work_mem.

What's not completely certain, however, is whether this diagnosis is accurate for your case, or whether the proposed remedy would fix it. It would be useful to see the contents of pg_stats.most_common_freqs for log_raw.titleid.

			regards, tom lane
On Mon, Feb 13, 2017 at 03:47:08PM -0600, David Hinkle wrote:
> Nope, that pops too. The query runs for a long time at a somewhat
> normal rate of ram consumption, using ~1G of RSS then suddenly spikes
> to about 6G, at which point the OOM killer pops it. Box has 8G of ram
> and 4G of swap.

By any chance:

- when it happens, has the kernel considered using swap?
- which kernel are you running?

There have been (for some workloads) massive problems with RAM exhaustion / swapping / the OOM killer going wild with 4.7/4.8/some 4.9 kernels.

Just a shot in the dark,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wed, Feb 15, 2017 at 01:04:51PM +0100, Karsten Hilbert wrote:
> There's been (for some workloads) massive problems with RAM
> exhaustion / swapping / OOM killer going wild with
> 4.7/4.8/some 4.9 kernels.

I guess what I'm trying to say is that it may actually not be PostgreSQL's fault but rather the kernel invoking the OOM killer way prematurely.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Thanks for your help! Karsten: The system does fill up swap before it blows up. This particular model has 8G of ram and 4G of swap and runs kernel 4.4.0-53-generic #74~14.04.1-Ubuntu. Tom, there are three columns in this table that exhibit the problem, here is the statistics data after an analyze, and the real data to compare it to. cipafilter=# select attname, n_distinct, most_common_freqs from pg_stats where tablename = 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname = 'hostid'); attname | n_distinct | most_common_freqs ---------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- urlid | 38963 | 
{0.0188,0.009,0.00853333,0.0079,0.00686667,0.0068,0.00616667,0.0061,0.00606667,0.005,0.0049,0.00483333,0.00463333,0.00456667,0.0044,0.0044,0.0039,0.0032,0.00263333,0.00263333,0.00256667,0.00256667,0.00246667,0.0023,0.0023,0.00223333,0.00203333,0.002,0.002,0.002,0.0019,0.00186667,0.00183333,0.0018,0.0018,0.00176667,0.00176667,0.00176667,0.00176667,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.0016,0.00156667,0.00153333,0.0015,0.00146667,0.00143333,0.0014,0.0014,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00133333,0.0013,0.0013,0.0013,0.0013,0.00126667,0.00126667,0.00123333,0.00123333,0.00123333,0.00123333,0.0012,0.0012,0.00113333,0.0011,0.00106667,0.00106667,0.001,0.001,0.001,0.001,0.000966667,0.000966667,0.000966667,0.000966667,0.000966667,0.000933333,0.000933333,0.000933333,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.000866667,0.000866667,0.000866667,0.000866667} hostid | 2478 | {0.2109,0.0330667,0.0321667,0.0245,0.0223,0.0177667,0.0165,0.0152,0.0151333,0.014,0.0132,0.0115667,0.00993333,0.00963333,0.00873333,0.00853333,0.00853333,0.00746667,0.0074,0.00653333,0.0065,0.0065,0.00646667,0.00513333,0.00506667,0.00503333,0.00496667,0.00493333,0.0049,0.00486667,0.00483333,0.00453333,0.0044,0.00433333,0.00426667,0.00413333,0.00396667,0.00386667,0.00383333,0.00363333,0.00363333,0.00363333,0.00346667,0.00323333,0.00323333,0.0032,0.00316667,0.00303333,0.00303333,0.0029,0.00286667,0.00273333,0.00273333,0.00266667,0.00263333,0.00263333,0.0026,0.0026,0.0025,0.00243333,0.00243333,0.0024,0.00236667,0.00226667,0.00223333,0.00216667,0.00213333,0.0021,0.0021,0.00206667,0.00206667,0.00203333,0.002,0.002,0.00193333,0.0019,0.0019,0.00186667,0.00186667,0.00186667,0.0018,0.0018,0.0018,0.00176667,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00166667,0.00163333,0.00163333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667} titleid | 292 | {0.767167} (3 rows) I have to patch the pg_stats table to get postgres to run the following 
queries without crashing:

cipafilter=# UPDATE pg_statistic AS s
cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'titles')
cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-#     relname = 'log_raw' and attname = 'titleid';
UPDATE 1
cipafilter=# UPDATE pg_statistic AS s
cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'urls')
cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-#     relname = 'log_raw' and attname = 'urlid';
UPDATE 1
cipafilter=# UPDATE pg_statistic AS s
cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'hosts')
cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
cipafilter-#     relname = 'log_raw' and attname = 'hostid';
UPDATE 1

cipafilter=# select attname, n_distinct from pg_stats where tablename = 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname = 'hostid');
 attname | n_distinct
---------+-------------
 urlid   | 1.51452e+08
 hostid  | 303756
 titleid | 879485

cipafilter=# select titleid, count(titleid) from log_raw group by titleid order by count(titleid) desc limit 10;
 titleid |   count
---------+-----------
       1 | 423848049
   49547 |    403432
     238 |    188151
      12 |    151640
       5 |    149524
    6196 |    139445
   32014 |    123828
     200 |     88682
   58921 |     86451
      10 |     84264
(10 rows)

cipafilter=# select urlid, count(urlid) from log_raw group by urlid order by count(urlid) desc limit 10;
 urlid  |  count
--------+----------
 129991 | 10843088
      1 |  4953757
     21 |  4345503
   2765 |  4266981
     12 |  3654127
    920 |  3609054
   1135 |  3562185
     20 |  3495023
 283567 |  3019675
   2494 |  2655301
(10 rows)

cipafilter=# select hostid, count(hostid) from log_raw group by hostid order by count(hostid) desc limit 10;
 hostid |   count
--------+-----------
      7 | 117342686
      5 |  18016481
     53 |  17408992
     57 |  12947564
    543 |  12698269
      1 |  10068246
 127544 |   8746204
     27 |   8618595
     40 |   8507278
     36 |   7424412
(10 rows)

Fun fact, hostid 7 is 'google.com'.

On Wed, Feb 15, 2017 at 6:11 AM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> On Wed, Feb 15, 2017 at 01:04:51PM +0100, Karsten Hilbert wrote:
>
>> > Nope, that pops too.  The query runs for a long time at a somewhat
>> > normal rate of ram consumption, using ~1G of RSS then suddenly spikes
>> > to about 6G, at which point the OOM killer pops it.  Box has 8G of ram
>> > and 4G of swap.
>>
>> By any chance:
>>
>> - when it happens has the kernel considered using swap ?
>>
>> - which kernel are you running ?
>>
>> There's been (for some workloads) massive problems with RAM
>> exhaustion / swapping / OOM killer going wild with
>> 4.7/4.8/some 4.9 kernels.
>
> I guess what I'm trying to say is that it may actually not be
> PostgreSQL's fault but rather the kernel invoking the OOM
> killer way prematurely.
>
> Karsten
> --
> GPG key ID E4071346 @ eu.pool.sks-keyservers.net
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
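[Editor's note] The huge gap between the sampled n_distinct and the true counts above is exactly what a bounded row sample produces under heavy skew. A scaled-down simulation sketches the effect; all row counts and skew fractions here are invented for illustration, and a raw distinct-count over the sample stands in for ANALYZE's actual (more sophisticated) estimator:

```python
import random

random.seed(42)

# Scaled-down model of log_raw.titleid: roughly 78% of rows share one
# value, the rest are spread thinly over many distinct ids.  All numbers
# are illustrative, not taken from the real table.
N_ROWS = 1_000_000
N_RARE_IDS = 200_000
rows = [1] * int(N_ROWS * 0.78)
rows += [random.randrange(2, 2 + N_RARE_IDS) for _ in range(N_ROWS - len(rows))]

true_distinct = len(set(rows))

# ANALYZE only examines a bounded sample of rows; as a stand-in for its
# estimator, count distinct values inside a random sample.  Most sampled
# rows are the dominant value, so the rare ids are barely represented.
sample = random.sample(rows, 30_000)
sample_distinct = len(set(sample))

print(f"true distinct: {true_distinct}, seen in sample: {sample_distinct}")
```

The same mechanism explains why raising the statistics target helps only modestly: a larger sample still spends most of its draws resampling the dominant value.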
Maybe also useful to know:

cipafilter=# select reltuples from pg_class where relname = 'log_raw';
  reltuples
-------------
 5.40531e+08
(1 row)

On Wed, Feb 15, 2017 at 7:55 PM, David Hinkle <hinkle@cipafilter.com> wrote:
> Thanks for your help!
>
> Karsten: The system does fill up swap before it blows up.  This
> particular model has 8G of ram and 4G of swap and runs kernel
> 4.4.0-53-generic #74~14.04.1-Ubuntu.
>
> Tom, there are three columns in this table that exhibit the problem,
> here is the statistics data after an analyze, and the real data to
> compare it to.
>
> cipafilter=# select attname, n_distinct, most_common_freqs from
> pg_stats where tablename = 'log_raw' and (attname = 'urlid' or attname
> = 'titleid' or attname = 'hostid');
>  attname | n_distinct | most_common_freqs
> ---------+------------+-------------------
>  urlid   | 38963      | {0.0188,0.009,0.00853333,0.0079,0.00686667,0.0068,0.00616667,0.0061,0.00606667,0.005,0.0049,0.00483333,0.00463333,0.00456667,0.0044,0.0044,0.0039,0.0032,0.00263333,0.00263333,0.00256667,0.00256667,0.00246667,0.0023,0.0023,0.00223333,0.00203333,0.002,0.002,0.002,0.0019,0.00186667,0.00183333,0.0018,0.0018,0.00176667,0.00176667,0.00176667,0.00176667,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.0016,0.00156667,0.00153333,0.0015,0.00146667,0.00143333,0.0014,0.0014,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00133333,0.0013,0.0013,0.0013,0.0013,0.00126667,0.00126667,0.00123333,0.00123333,0.00123333,0.00123333,0.0012,0.0012,0.00113333,0.0011,0.00106667,0.00106667,0.001,0.001,0.001,0.001,0.000966667,0.000966667,0.000966667,0.000966667,0.000966667,0.000933333,0.000933333,0.000933333,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.000866667,0.000866667,0.000866667,0.000866667}
>  hostid  | 2478       | {0.2109,0.0330667,0.0321667,0.0245,0.0223,0.0177667,0.0165,0.0152,0.0151333,0.014,0.0132,0.0115667,0.00993333,0.00963333,0.00873333,0.00853333,0.00853333,0.00746667,0.0074,0.00653333,0.0065,0.0065,0.00646667,0.00513333,0.00506667,0.00503333,0.00496667,0.00493333,0.0049,0.00486667,0.00483333,0.00453333,0.0044,0.00433333,0.00426667,0.00413333,0.00396667,0.00386667,0.00383333,0.00363333,0.00363333,0.00363333,0.00346667,0.00323333,0.00323333,0.0032,0.00316667,0.00303333,0.00303333,0.0029,0.00286667,0.00273333,0.00273333,0.00266667,0.00263333,0.00263333,0.0026,0.0026,0.0025,0.00243333,0.00243333,0.0024,0.00236667,0.00226667,0.00223333,0.00216667,0.00213333,0.0021,0.0021,0.00206667,0.00206667,0.00203333,0.002,0.002,0.00193333,0.0019,0.0019,0.00186667,0.00186667,0.00186667,0.0018,0.0018,0.0018,0.00176667,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00166667,0.00163333,0.00163333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667}
>  titleid | 292        | {0.767167}
> (3 rows)
>
> I have to patch the pg_stats table to get postgres to run the
> following queries without crashing:
>
> cipafilter=# UPDATE pg_statistic AS s
> cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'titles')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#     relname = 'log_raw' and attname = 'titleid';
> UPDATE 1
> cipafilter=# UPDATE pg_statistic AS s
> cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'urls')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#     relname = 'log_raw' and attname = 'urlid';
> UPDATE 1
> cipafilter=# UPDATE pg_statistic AS s
> cipafilter-#     SET stadistinct = (select reltuples from pg_class where relname = 'hosts')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid = s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#     relname = 'log_raw' and attname = 'hostid';
> UPDATE 1
>
> cipafilter=# select attname, n_distinct from pg_stats where tablename
> = 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname =
> 'hostid');
>  attname | n_distinct
> ---------+-------------
>  urlid   | 1.51452e+08
>  hostid  | 303756
>  titleid | 879485
>
> cipafilter=# select titleid, count(titleid) from log_raw group by
> titleid order by count(titleid) desc limit 10;
>  titleid |   count
> ---------+-----------
>        1 | 423848049
>    49547 |    403432
>      238 |    188151
>       12 |    151640
>        5 |    149524
>     6196 |    139445
>    32014 |    123828
>      200 |     88682
>    58921 |     86451
>       10 |     84264
> (10 rows)
>
> cipafilter=# select urlid, count(urlid) from log_raw group by urlid
> order by count(urlid) desc limit 10;
>  urlid  |  count
> --------+----------
>  129991 | 10843088
>       1 |  4953757
>      21 |  4345503
>    2765 |  4266981
>      12 |  3654127
>     920 |  3609054
>    1135 |  3562185
>      20 |  3495023
>  283567 |  3019675
>    2494 |  2655301
> (10 rows)
>
> cipafilter=# select hostid, count(hostid) from log_raw group by hostid
> order by count(hostid) desc limit 10;
>  hostid |   count
> --------+-----------
>       7 | 117342686
>       5 |  18016481
>      53 |  17408992
>      57 |  12947564
>     543 |  12698269
>       1 |  10068246
>  127544 |   8746204
>      27 |   8618595
>      40 |   8507278
>      36 |   7424412
> (10 rows)
>
> Fun fact, hostid 7 is 'google.com'.
>
> On Wed, Feb 15, 2017 at 6:11 AM, Karsten Hilbert
> <Karsten.Hilbert@gmx.net> wrote:
>> On Wed, Feb 15, 2017 at 01:04:51PM +0100, Karsten Hilbert wrote:
>>
>>> > Nope, that pops too.  The query runs for a long time at a somewhat
>>> > normal rate of ram consumption, using ~1G of RSS then suddenly spikes
>>> > to about 6G, at which point the OOM killer pops it.  Box has 8G of ram
>>> > and 4G of swap.
>>>
>>> By any chance:
>>>
>>> - when it happens has the kernel considered using swap ?
>>>
>>> - which kernel are you running ?
>>>
>>> There's been (for some workloads) massive problems with RAM
>>> exhaustion / swapping / OOM killer going wild with
>>> 4.7/4.8/some 4.9 kernels.
>>
>> I guess what I'm trying to say is that it may actually not be
>> PostgreSQL's fault but rather the kernel invoking the OOM
>> killer way prematurely.
>>
>> Karsten
>> --
>> GPG key ID E4071346 @ eu.pool.sks-keyservers.net
>> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> David Hinkle
> Senior Software Developer
> Phone: 800.243.3729x3000
> Email: hinkle@cipafilter.com
> Hours: Mon-Fri 8:00AM-5:00PM (CT)

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
David Hinkle <hinkle@cipafilter.com> writes:
> Tom, there are three columns in this table that exhibit the problem,
> here is the statistics data after an analyze, and the real data to
> compare it to.

> attname | n_distinct | most_common_freqs

> titleid | 292        | {0.767167}

Ouch.  That's saying there's some single value of titleid that accounts
for more than three-quarters of the entries ... does that square with
reality?  That'd certainly explain why a hash join goes nuts.

                        regards, tom lane
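[Editor's note] A rough back-of-envelope makes the "goes nuts" concrete. The per-entry overhead below is an editorial assumption, not PostgreSQL's exact figure, but it shows why building a hash over every log_raw row cannot fit an 8 GB box, while a hash over only the distinct ids would be tiny:

```python
# Planner figures from the thread: the hash side of the anti join is a
# seq scan of log_raw (rows=542011264, width=10).  Assume ~48 bytes per
# in-memory hash entry (tuple + header + bucket overhead -- a guess).
ROWS = 542_011_264
DISTINCT_TITLEIDS = 880_000      # roughly the true count from the thread
BYTES_PER_ENTRY = 48

full_gb = ROWS * BYTES_PER_ENTRY / 1024**3
dedup_mb = DISTINCT_TITLEIDS * BYTES_PER_ENTRY / 1024**2
print(f"hash all rows: ~{full_gb:.0f} GB; hash only distinct ids: ~{dedup_mb:.0f} MB")
```

Hash joins normally spill to disk in batches, but all 420-odd million rows with titleid = 1 hash to the same batch, and a single batch cannot be split further, which would fit the sudden RSS spike described earlier in the thread.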
Yep, 420ish million records out of 540 million records have a titleid
of 1.  There are about 880,000 other unique values, but most of the
records are 1.  Of course, n_distinct is only 292.  I'm surprised
it's not eliminating the duplicates while it builds that hash table.

This is what I'm doing for a work around right now.  Getting
n_distinct right seems to be preventing the system from breaking.
It's going to be executed once a week during the weekly maintenance.
It's setting the n_distinct of each column to the number of rows in
the associated table.

CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column
varchar, _string_table varchar)
    RETURNS real AS
$$
DECLARE _cnt REAL;
BEGIN
    SELECT reltuples INTO _cnt from pg_class where relname = _string_table;
    EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
        || ' SET (n_distinct=' || _cnt || ')';
    RETURN _cnt;
END
$$ LANGUAGE plpgsql;

select patch_ndistinct('log_raw', 'titleid', 'titles');
select patch_ndistinct('log_raw', 'urlid', 'urls');
select patch_ndistinct('log_raw', 'hostid', 'hosts');
ANALYZE log_raw;

On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Hinkle <hinkle@cipafilter.com> writes:
>> Tom, there are three columns in this table that exhibit the problem,
>> here is the statistics data after an analyze, and the real data to
>> compare it to.
>
>> attname | n_distinct | most_common_freqs
>
>> titleid | 292 | {0.767167}
>
> Ouch.  That's saying there's some single value of titleid that accounts
> for more than three-quarters of the entries ... does that square with
> reality?  That'd certainly explain why a hash join goes nuts.
>
>                         regards, tom lane

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
I guess this doesn't work, latest test run crashed.  It still uses the
bad plan for the hostid column even after n_distinct is updated.

cipafilter=# select attname, n_distinct from pg_stats where tablename
cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname =
cipafilter(# 'hostid');
 attname | n_distinct
---------+-------------
 urlid   | 1.51625e+08
 hostid  | 304710
 titleid | 886499
(3 rows)

cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Delete on hosts  (cost=22249475.67..74844813.47 rows=1 width=12)
   ->  Hash Anti Join  (cost=22249475.67..74844813.47 rows=1 width=12)
         Hash Cond: (hosts.hostid = log_raw.hostid)
         ->  Seq Scan on hosts  (cost=0.00..5017.10 rows=304710 width=10)
         ->  Hash  (cost=12799395.52..12799395.52 rows=543645052 width=10)
               ->  Seq Scan on log_raw  (cost=0.00..12799395.52 rows=543645052 width=10)
(6 rows)

I guess I will also try throwing in 'set enable_hashjoin = false;' and
see if that gets these purges to go.

On Thu, Feb 16, 2017 at 2:22 PM, David Hinkle <hinkle@cipafilter.com> wrote:
> Yep, 420ish million records out of 540 million records have a titleid
> of 1.  There are about 880,000 other unique values, but most of the
> records are 1.  Of course, n_distinct is only 292.  I'm surprised
> it's not eliminating the duplicates while it builds that hash table.
>
> This is what I'm doing for a work around right now.  Getting
> n_distinct right seems to be preventing the system from breaking.
> It's going to be executed once a week during the weekly maintenance.
> It's setting the n_distinct of each column to the number of rows in
> the associated table.
>
> CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column
> varchar, _string_table varchar)
>     RETURNS real AS
> $$
> DECLARE _cnt REAL;
> BEGIN
>     SELECT reltuples INTO _cnt from pg_class where relname = _string_table;
>     EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
>         || ' SET (n_distinct=' || _cnt || ')';
>     RETURN _cnt;
> END
> $$ LANGUAGE plpgsql;
>
> select patch_ndistinct('log_raw', 'titleid', 'titles');
> select patch_ndistinct('log_raw', 'urlid', 'urls');
> select patch_ndistinct('log_raw', 'hostid', 'hosts');
> ANALYZE log_raw;
>
> On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> David Hinkle <hinkle@cipafilter.com> writes:
>>> Tom, there are three columns in this table that exhibit the problem,
>>> here is the statistics data after an analyze, and the real data to
>>> compare it to.
>>
>>> attname | n_distinct | most_common_freqs
>>
>>> titleid | 292 | {0.767167}
>>
>> Ouch.  That's saying there's some single value of titleid that accounts
>> for more than three-quarters of the entries ... does that square with
>> reality?  That'd certainly explain why a hash join goes nuts.
>>
>>                         regards, tom lane
>
> --
> David Hinkle
> Senior Software Developer
> Phone: 800.243.3729x3000
> Email: hinkle@cipafilter.com
> Hours: Mon-Fri 8:00AM-5:00PM (CT)

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@cipafilter.com
Hours: Mon-Fri 8:00AM-5:00PM (CT)
On Thu, Feb 16, 2017 at 3:27 PM, David Hinkle <hinkle@cipafilter.com> wrote:
> I guess this doesn't work, latest test run crashed.  It still uses the
> bad plan for the hostid column even after n_distinct is updated.
>
> cipafilter=# select attname, n_distinct from pg_stats where tablename
> cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
> or attname =
> cipafilter(# 'hostid');
>  attname | n_distinct
> ---------+-------------
>  urlid   | 1.51625e+08
>  hostid  | 304710
>  titleid | 886499
> (3 rows)
>
> cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
> log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
>                                         QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Delete on hosts  (cost=22249475.67..74844813.47 rows=1 width=12)
>    ->  Hash Anti Join  (cost=22249475.67..74844813.47 rows=1 width=12)
>          Hash Cond: (hosts.hostid = log_raw.hostid)
>          ->  Seq Scan on hosts  (cost=0.00..5017.10 rows=304710 width=10)
>          ->  Hash  (cost=12799395.52..12799395.52 rows=543645052 width=10)
>                ->  Seq Scan on log_raw  (cost=0.00..12799395.52 rows=543645052 width=10)
> (6 rows)
>
> I guess I will also try throwing in 'set enable_hashjoin = false;' and
> see if that gets these purges to go.
Another option would be to force the de-dup to happen, with:
explain with t as (select distinct hostid from log_raw) delete from hosts where not exists (select 1 from t where t.hostid=hosts.hostid)
That way you can use the hash join without running out of memory, in case the hash join is actually faster than the merge join. Also, it just seems cleaner than fiddling with enable_* parameters and then having to remember to reset them when done.
Cheers,
Jeff
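[Editor's note] Jeff's rewrite can be sketched in miniature with a toy model (the ids below are made up) to show why deduplicating first shrinks the hash side from one entry per row to one entry per distinct id:

```python
# Toy model of the CTE rewrite: "with t as (select distinct hostid from
# log_raw) delete from hosts where not exists (select 1 from t ...)".
log_raw_hostids = [7, 7, 7, 5, 5, 53, 7, 5, 543]   # many repeats, as in log_raw
hosts = {1, 5, 7, 53, 543, 999}

# The CTE's "select distinct" collapses the rows before hashing:
t = set(log_raw_hostids)            # 4 distinct ids instead of 9 rows

# The anti join then keeps only hosts with no match in t:
to_delete = {h for h in hosts if h not in t}
print(sorted(to_delete))            # prints [1, 999]
```

At the real table's scale the same collapse turns ~542 million hash entries into under a million, which is why the hash join can fit in memory once the de-dup is forced.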