Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
Dear All,

I've just joined this list, and I'd like to request some advice.

I have a table (1 GB in size) with 24 columns and 5.6 million rows. Of these, we're interested in two columns: parcel_id_code and exit_state.

parcel_id_code has a fairly uniform distribution of integers from 1-99999; it's never null.

exit_state has 3 possible values: 1, 2 and NULL. Almost all the rows are 1, about 0.1% have the value 2, and only 153 rows are NULL.

The query I'm trying to optimise looks like this:

  SELECT * FROM tbl_tracker WHERE parcel_id_code='53030' AND exit_state IS NULL;

So, I have a partial index:

  "tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE exit_state IS NULL

which works fine if it's the only index. BUT, for other queries (unrelated to this question), I also have to have full indexes on these columns:

  "tbl_tracker_exit_state_idx" btree (exit_state)
  "tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)

The problem is, when I now run my query, the planner ignores the dedicated index "tbl_tracker_performance_1_idx", and instead uses both of the full indexes... resulting in a much, much slower query (9 ms vs 0.08 ms).

A psql session is below. This shows that, if I force the planner to use the partial index, by dropping the others, then it's fast. But as soon as I put the full indexes back (which I need for other queries), the query planner chooses them instead, and is slow.
Thanks very much for your help,

Richard

fsc_log=> \d tbl_tracker
        Column        |           Type           |        Modifiers
 ---------------------+--------------------------+------------------
  id                  | bigint                   | not null default nextval('master_id_seq'::regclass)
  dreq_timestamp_1    | timestamp with time zone |
  barcode_1           | character varying(13)    |
  barcode_2           | character varying(13)    |
  barcode_best        | character varying(13)    |
  entrance_point      | character varying(13)    |
  induct              | character varying(5)     |
  entrance_state_x    | integer                  |
  dreq_count          | integer                  |
  parcel_id_code      | integer                  |
  host_id_code        | bigint                   |
  original_dest       | integer                  |
  drep_timestamp_n    | timestamp with time zone |
  actual_dest         | integer                  |
  exit_state          | integer                  |
  chute               | integer                  |
  original_dest_state | integer                  |
  srep_timestamp      | timestamp with time zone |
  asn                 | character varying(9)     |
  is_asn_token        | boolean                  |
  track_state         | integer                  |
  warning             | boolean                  |
 Indexes:
  "tbl_tracker_pkey" PRIMARY KEY, btree (id) CLUSTER
  "tbl_tracker_barcode_best_idx" btree (barcode_best)
  "tbl_tracker_chute_idx" btree (chute)
  "tbl_tracker_drep_timestamp_n_idx" btree (drep_timestamp_n) WHERE drep_timestamp_n IS NOT NULL
  "tbl_tracker_dreq_timestamp_1_idx" btree (dreq_timestamp_1) WHERE dreq_timestamp_1 IS NOT NULL
  "tbl_tracker_exit_state_idx" btree (exit_state)
  "tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
  "tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE exit_state IS NULL
  "tbl_tracker_performance_2_idx" btree (host_id_code, id)
  "tbl_tracker_performance_3_idx" btree (srep_timestamp) WHERE exit_state = 1 AND srep_timestamp IS NOT NULL
  "tbl_tracker_srep_timestamp_idx" btree (srep_timestamp) WHERE srep_timestamp IS NOT NULL

fsc_log=> explain analyse select * from tbl_tracker where parcel_id_code='53030' AND exit_state IS NULL;

 Bitmap Heap Scan on tbl_tracker  (cost=8.32..10.84 rows=1 width=174) (actual time=9.334..9.334 rows=0 loops=1)
   Recheck Cond: ((parcel_id_code = 53030) AND (exit_state IS NULL))
   ->  BitmapAnd  (cost=8.32..8.32 rows=1 width=0) (actual time=9.329..9.329 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx  (cost=0.00..3.67 rows=57 width=0) (actual time=0.026..0.026 rows=65 loops=1)
               Index Cond: (parcel_id_code = 53030)
         ->  Bitmap Index Scan on tbl_tracker_exit_state_idx  (cost=0.00..4.40 rows=150 width=0) (actual time=9.289..9.289 rows=93744 loops=1)
               Index Cond: (exit_state IS NULL)
 Total runtime: 9.366 ms
(8 rows)

fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where parcel_id_code='53030' AND exit_state IS NULL;

 Bitmap Heap Scan on tbl_tracker  (cost=3.67..145.16 rows=1 width=174) (actual time=0.646..0.646 rows=0 loops=1)
   Recheck Cond: (parcel_id_code = 53030)
   Filter: (exit_state IS NULL)
   ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx  (cost=0.00..3.67 rows=57 width=0) (actual time=0.024..0.024 rows=65 loops=1)
         Index Cond: (parcel_id_code = 53030)
 Total runtime: 0.677 ms
(6 rows)

fsc_log=> drop index tbl_tracker_parcel_id_code_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where parcel_id_code='53030' AND exit_state IS NULL;

 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker  (cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0 loops=1)
   Index Cond: (parcel_id_code = 53030)
 Total runtime: 0.080 ms
(3 rows)

Server hardware: 8 core, 2.5 GHz, 24 GB RAM, SSD in RAID-1.
Postgresql config (non-default):

 version                     | PostgreSQL 9.1.6 on x86_64
 checkpoint_segments         | 128
 client_encoding             | UTF8
 commit_delay                | 50000
 commit_siblings             | 5
 default_statistics_target   | 5000
 effective_cache_size        | 12000MB
 lc_collate                  | en_GB.UTF-8
 lc_ctype                    | en_GB.UTF-8
 log_line_prefix             | %t
 log_min_duration_statement  | 50
 maintenance_work_mem        | 2GB
 max_connections             | 100
 max_stack_depth             | 4MB
 port                        | 5432
 random_page_cost            | 2.5
 server_encoding             | UTF8
 shared_buffers              | 6000MB
 ssl                         | on
 standard_conforming_strings | off
 temp_buffers                | 128MB
 TimeZone                    | GB
 wal_buffers                 | 16MB
 work_mem                    | 256MB
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Sergey Konoplev
Date:
Hi,

On Wed, Dec 19, 2012 at 1:13 PM, Richard Neill <rn214@richardneill.org> wrote:
> Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
> (cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
> loops=1)
>    Index Cond: (parcel_id_code = 53030)

It looks like your index is bloated. Have you had a lot of updates/deletes on rows with exit_state is null?

Try to reindex tbl_tracker_performance_1_idx.

To reindex it without locks, create a new index concurrently under a temporary name, delete the old one, and rename the new one to the old name.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp
Phones: USA +1 415 867 9984, Russia, Moscow +7 901 903 0499, Russia, Krasnodar +7 988 888 1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
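The lock-free rebuild described above might look like this, using the index definition from earlier in the thread (a sketch; the temporary name is illustrative, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

```sql
-- Build a replacement index without blocking writes:
CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_tmp
    ON tbl_tracker (parcel_id_code)
    WHERE exit_state IS NULL;

-- Swap it in; each of these takes only a brief exclusive lock:
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_tmp
    RENAME TO tbl_tracker_performance_1_idx;
```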
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
On 19/12/12 22:59, Sergey Konoplev wrote:
> On Wed, Dec 19, 2012 at 1:13 PM, Richard Neill <rn214@richardneill.org> wrote:
>> Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
>> (cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
>> loops=1)
>>    Index Cond: (parcel_id_code = 53030)
>
> It looks like your index is bloated. Have you had a lot of
> updates/deletes on rows with exit_state is null?
>
> Try to reindex tbl_tracker_performance_1_idx.
>
> To reindex it without locks create a new index with temporary name
> concurrently, delete the old one and rename the new one using the old
> name.

Hi Sergey,

Thanks for your suggestion. Yes, I can see what you mean: over the 3 weeks during which we deployed the system, every single row has at some point had exit_state as NULL, before being updated.

Essentially, as time moves on, new rows are added, initially with exit_state NULL; then a few minutes later we update them to exit_state 1; then a few weeks later, they are removed.

[Explanation: the system tracks books around a physical sortation machine. The sorter uses a "parcel_id_code" which (for some really daft reason) suffers wraparound at 99999, i.e. about every 3 hours. Books whose exit_state is NULL are those which are still on the sortation machine; once they exit, the state is either 1 (successful delivery) or 2 (collision, and down the dump chute).]

BUT...

* The reindex solution doesn't work. I just tried it, and the query planner is still using the wrong indexes.

* If tbl_tracker_performance_1_idx had indeed become bloated, wouldn't that have meant that when the query planner was forced to use it (by deleting the alternative indexes), it would have been slow?

Also, I thought that reindex wasn't supposed to be needed in normal operation.

Best wishes,

Richard
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Sergey Konoplev
Date:
On Wed, Dec 19, 2012 at 3:49 PM, Richard Neill <rn214@richardneill.org> wrote:
> * The reindex solution doesn't work. I just tried it, and the query planner
> is still using the wrong indexes.

Can you show the explain analyze with tbl_tracker_performance_1_idx straight after reindex (eg. before it has been bloated again)?

> * If the tbl_tracker_performance_1_idx had indeed become bloated, wouldn't
> that have meant that when the query planner was forced to use it (by
> deleting the alternative indexes), it would have been slow?

It is hard to say. There might be a bloating threshold after which it will be slow. Also, it depends on the index column values.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
Thanks for your help,

On 20/12/12 00:08, Sergey Konoplev wrote:
> On Wed, Dec 19, 2012 at 3:49 PM, Richard Neill <rn214@richardneill.org> wrote:
>> * The reindex solution doesn't work. I just tried it, and the query planner
>> is still using the wrong indexes.
>
> Can you show the explain analyze with tbl_tracker_performance_1_idx
> straight after reindex (eg. before it has been bloated again)?

Sure. Just done it now... the system has been fairly lightly loaded for the last few hours - though I did have to change the specific number of the parcel_id_code in the query.

fsc_log=> explain analyse select * from tbl_tracker where parcel_id_code=92223 and exit_state is null;

 Index Scan using tbl_tracker_exit_state_idx on tbl_tracker  (cost=0.00..6.34 rows=1 width=174) (actual time=0.321..1.871 rows=1 loops=1)
   Index Cond: (exit_state IS NULL)
   Filter: (parcel_id_code = 92223)
 Total runtime: 1.905 ms
(4 rows)

And now, force it, by dropping the other index (temporarily):

fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX
fsc_log=> explain analyse select * from tbl_tracker where parcel_id_code=92223 and exit_state is null;

 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker  (cost=0.00..7.78 rows=1 width=174) (actual time=0.040..0.041 rows=1 loops=1)
   Index Cond: (parcel_id_code = 92223)
 Total runtime: 0.077 ms
(3 rows)

As far as I can tell, the query planner really is just getting it wrong.

BTW, there is a significant effect on speed caused by running the same query twice (it pulls stuff from disk into the OS disk-cache), but I've already accounted for this.

Richard
Richard Neill <rn214@richardneill.org> writes:
> The problem is, when I now run my query, the planner ignores the
> dedicated index "tbl_tracker_performance_1_idx", and instead uses both
> of the full indexes... resulting in a much much slower query (9ms vs
> 0.08ms).

> A psql session is below. This shows that, if I force the planner to use
> the partial index, by dropping the others, then it's fast. But as soon
> as I put the full indexes back (which I need for other queries), the
> query planner chooses them instead, and is slow.

[ experiments with a similar test case ... ]

I think the reason why the planner is overestimating the cost of using the partial index is that 9.1 and earlier fail to account for the partial-index predicate when estimating the number of index rows that will be visited. Because the partial-index predicate is so highly selective in this case, that results in a significant overestimate of how much of the index will be traversed.

We fixed this for 9.2 in
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=21a39de5809cd3050a37d2554323cc1d0cbeed9d
but did not want to risk back-patching such a behavioral change. If you're stuck on 9.1 you might want to think about applying that as a local patch though.

(BTW, the "fudge factor" change in that patch has been criticized recently; we've changed it again already for 9.3 and might choose to back-patch that into 9.2.3. But it's the rest of it that you care about anyway.)

			regards, tom lane
On Wednesday, December 19, 2012, Richard Neill wrote:
> Thanks for your help,
>
> On 20/12/12 00:08, Sergey Konoplev wrote:
>> On Wed, Dec 19, 2012 at 3:49 PM, Richard Neill <rn214@richardneill.org> wrote:
>>> * The reindex solution doesn't work. I just tried it, and the query planner
>>> is still using the wrong indexes.
It switched to a better one of the wrong indices, though, and got several times faster.
How did it get so bloated in the first place? Is the table being updated so rapidly that the statistics might be wrong even immediately after analyze finishes?
In any case, I can't get it to prefer the full index in 9.1.6 at all. The partial index wins hands down unless the table is physically clustered by the parcel_id_code column. In that case, the partial index wins by only a little bit.
This is what I did for the table:
create table tbl_tracker as
  select
    case when random() < 0.001 then 2
         else case when random() < 0.00003 then NULL else 1 end
    end as exit_state,
    (random()*99999)::int as parcel_id_code
  from generate_series(1,5000000);
Cheers,
Jeff
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
Dear Tom,

Thanks very much for your advice.

>> A psql session is below. This shows that, if I force the planner to use
>> the partial index, by dropping the others, then it's fast. But as soon
>> as I put the full indexes back (which I need for other queries), the
>> query planner chooses them instead, and is slow.
>
> [ experiments with a similar test case ... ]  I think the reason why the
> planner is overestimating the cost of using the partial index is that
> 9.1 and earlier fail to account for the partial-index predicate when
> estimating the number of index rows that will be visited. Because the
> partial-index predicate is so highly selective in this case, that
> results in a significant overestimate of how much of the index will be
> traversed.

That seems likely to me. I'll try out 9.2 and see if it helps. As it's a production server, I have to wait for some downtime, probably Friday night, before I can find out - will report back.

Best wishes,

Richard
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
Dear Jeff,

Thanks for your help,

>> * The reindex solution doesn't work. I just tried it, and
>> the query planner is still using the wrong indexes.
>
> It switched to a better one of the wrong indices, though, and got
> several times faster.

I think that this is a red herring. The switching between the two "wrong" indices seems to be caused by non-uniformity in the parcel_id_code: although it's distributed fairly well across 1-99999, it's not perfect.

As for the speed-up, I think that's mostly caused by the fact that running "Analyse" is pulling the entire table (and the relevant index) into RAM and flushing other things out of that cache.

> How did it get so bloated in the first place? Is the table being
> updated so rapidly that the statistics might be wrong even immediately
> after analyze finishes?

I don't think it is. We're doing about 10 inserts and 20 updates per second on that table. But when I tested it, production had stopped for the night - so the system was quiescent between the analyse and the select.

> In any case, I can't get it to prefer the full index in 9.1.6 at all.
> The partial index wins hands down unless the table is physically
> clustered by the parcel_id_code column. In that case, the partial
> index wins by only a little bit.

Interesting that you should say that... the original setup script did choose to cluster the table on that column.

Also, I wonder whether it matters which order the indexes are created in?

Best wishes,

Richard
Richard Neill <rn214@richardneill.org> writes:
> Also, I wonder whether it matters which order the indexes are created in?

IIRC, if the estimated costs of using two different indexes come out the same (to within 1% or so), then the planner keeps the first-generated path, which will result in preferring the index with smaller OID. This effect doesn't apply to your problem query though, since we can see from the drop-experiments that the estimated costs are quite a bit different.

A more likely explanation, if you see some effect that looks like order dependency, is that the more recently created index has accumulated less bloat, and thus has a perfectly justifiable cost advantage.

			regards, tom lane
Jeff Janes <jeff.janes@gmail.com> writes:
> In any case, I can't get it to prefer the full index in 9.1.6 at all. The
> partial index wins hands down unless the table is physically clustered by
> the parcel_id_code column. In that case, the partial index wins by
> only a little bit.

> This is what I did for the table:

> create table tbl_tracker as select case when random()<0.001 then 2 else
> case when random()< 0.00003 then NULL else 1 end end as exit_state,
> (random()*99999)::int as parcel_id_code from generate_series(1,5000000) ;

What I did to try to duplicate Richard's situation was to create a test table in which all the exit_state values were NULL, then build the index, then UPDATE all but a small random fraction of the rows to 1, then vacuum. This results in a rather bloated partial index, but I think that's probably what he's got, given that every record initially enters the table with NULL exit_state. It would take extremely frequent vacuuming to keep the partial index from accumulating a lot of dead entries.

In this scenario, with 9.1, I got overly large estimates for the cost of using the partial index, which matches up with his reports.

			regards, tom lane
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
Dear Tom,

Thanks again for your help on this.

On 20/12/12 03:06, Tom Lane wrote:
> Richard Neill <rn214@richardneill.org> writes:
>> The problem is, when I now run my query, the planner ignores the
>> dedicated index "tbl_tracker_performance_1_idx", and instead uses both
>> of the full indexes... resulting in a much much slower query (9ms vs
>> 0.08ms).

I've now installed 9.2. As you said, thanks to the change in 9.2, it initially prefers the partial index.

BUT, after 1 cycle of inserting 500k rows, then deleting them all, then starting to insert again, I find that the planner has reverted to the former bad behaviour.

Reindexing only takes a couple of seconds, and restores correctness.

What's going on? Do I need to run reindex in a cron-job? I thought that reindex wasn't "normally" needed, and that index bloat happened only after every row had changed value hundreds of times.

Thanks,

Richard

---------------------

Here's the same session again. [Please ignore the dreq_timestamp_1 check - I mistakenly failed to simplify it out of the query, and now that I've reindexed, I can't redo the experiment. I don't think it makes any difference.]
fsc_log=> explain analyse select * from tbl_tracker WHERE parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 > timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');

 Bitmap Heap Scan on tbl_tracker  (cost=17.35..19.86 rows=1 width=174) (actual time=8.056..8.056 rows=0 loops=1)
   Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 90820))
   Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp without time zone)
   ->  BitmapAnd  (cost=17.35..17.35 rows=1 width=0) (actual time=8.053..8.053 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_tracker_exit_state_idx  (cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277 loops=1)
               Index Cond: (exit_state IS NULL)
         ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx  (cost=0.00..8.73 rows=58 width=0) (actual time=0.025..0.025 rows=72 loops=1)
               Index Cond: (parcel_id_code = 90820)
 Total runtime: 8.090 ms
(9 rows)

fsc_log=> REINDEX index tbl_tracker_performance_1_idx;    -- this only took a couple of seconds

fsc_log=> explain analyse select * from tbl_tracker WHERE parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 > timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');

 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker  (cost=0.00..5.27 rows=1 width=174) (actual time=0.019..0.019 rows=0 loops=1)
   Index Cond: (parcel_id_code = 90820)
   Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp without time zone)
 Total runtime: 0.047 ms
(4 rows)
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
On 21/12/12 02:34, Richard Neill wrote:
>
> Reindexing only takes a couple of seconds, and restores correctness.

Interestingly, the partial index (after reindexing) is only 16 kB in size, whereas the table is 1.1 GB, and the normal single-column indexes are about 250 MB in size.

In terms of what's physically happening in reality:

- tbl_tracker keeps a record of all books that move through the system over a period of one month (at a rate of about 20/second, or 1 million/day), after which they are deleted.

- the partial index, tbl_tracker_performance_1_idx, tracks only those books which are currently "in flight" - books remain in flight for about 200 seconds as they go round the machine. (While in flight, these have exit_state = NULL.)

- the partial index is used to overcome a design defect(*) in the sorter machine, namely that it doesn't number each book uniquely, but wraps the parcel_id_code every few hours. Worse, some books can remain on the sorter for several days (if they jam), so the numbering isn't a clean "wraparound", but more like a fragmented (and occasionally lossy) filesystem.

- What I'm trying to do is trace the history of the books through the system and assign each one a proper unique id. So, if I see a book with "parcel_id_code = 37", is it a new book (after pid wrap), or is it the same book I saw 1 minute ago, that hasn't exited the sorter?

So... is there some way to, for example, set a trigger that will reindex every time the index exceeds 1000 rows?

Richard

(*) Readers of The Daily WTF might appreciate another curious anomaly: this machine originally had an RS-232 port; it now uses ethernet, but TxD and RxD use different TCP sockets on different network ports!
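Short of a trigger, one cron-driven approximation of "reindex when bloated" (a sketch, not from the thread; the threshold is arbitrary, and note REINDEX takes an exclusive lock on the index while it runs):

```sql
-- How big is the partial index now? (~16 kB when freshly rebuilt)
SELECT pg_size_pretty(pg_relation_size('tbl_tracker_performance_1_idx'));

-- If it has grown far beyond that, rebuild it
-- (takes only a couple of seconds on this table):
REINDEX INDEX tbl_tracker_performance_1_idx;
```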
On Thursday, December 20, 2012, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> In any case, I can't get it to prefer the full index in 9.1.6 at all. The
>> partial index wins hands down unless the table is physically clustered by
>> the parcel_id_code column. In that case, the partial index wins by
>> only a little bit.
>>
>> This is what I did for the table:
>>
>> create table tbl_tracker as select case when random()<0.001 then 2 else
>> case when random()< 0.00003 then NULL else 1 end end as exit_state,
>> (random()*99999)::int as parcel_id_code from generate_series(1,5000000) ;
>
> What I did to try to duplicate Richard's situation was to create a test
> table in which all the exit_state values were NULL, then build the
> index, then UPDATE all but a small random fraction of the rows to 1,
> then vacuum. This results in a rather bloated partial index, but I
> think that's probably what he's got, given that every record initially
> enters the table with NULL exit_state. It would take extremely frequent
> vacuuming to keep the partial index from accumulating a lot of dead
> entries.
I played with this scenario too, but still the only way I could get it to spurn the partial index was to rebuild the full one (to remove the bloat) but not the partial one. And still, the costs were always in the 8 to 11 range for either index with default cost settings. It is hard to imagine the amount of bloat needed to drive it up to 5000, as in his initial report before he rebuilt it.
Cheers,
Jeff
On Thursday, December 20, 2012, Richard Neill wrote:
> Dear Tom,
>
> Thanks again for your help on this.
>
> On 20/12/12 03:06, Tom Lane wrote:
>> Richard Neill <rn214@richardneill.org> writes:
>>> The problem is, when I now run my query, the planner ignores the
>>> dedicated index "tbl_tracker_performance_1_idx", and instead uses both
>>> of the full indexes... resulting in a much much slower query (9ms vs
>>> 0.08ms).
>
> I've now installed 9.2. As you said, thanks to the change in 9.2 it
> initially prefers the partial index.
>
> BUT, after 1 cycle of inserting 500k rows, then deleting them all, then
> starting to insert again, I find that the planner has reverted to the
> former bad behaviour.
Presumably the real work load has this type of turn over happen one row at a time, rather than all in one giant mass update transaction, right? That makes a big difference in the way space is re-used.
> Reindexing only takes a couple of seconds, and restores correctness.
Even your slow query is pretty fast. If you can't afford that, can you afford to take an exclusive lock for a couple of seconds every few minutes?
> What's going on? Do I need to run reindex in a cron-job? I thought that
> reindex wasn't "normally" needed, and that index bloat happened only
> after every row had changed value hundreds of times.
The partial index is highly leveraged. If every tuple in the table is updated once, that amounts to every tuple in the index being updated 25,000 times (5.6 million rows passing through an index that only ever holds a couple of hundred live entries at a time).
For the same reason, it is probably not getting vacuumed often enough. The default settings have the table vacuumed once 20% of its rows turn over, but that means the partial index has been turned over many many times. You could crank down the auto-vacuum settings for that table, or run manual vacuum with a cron job.
Vacuum will not unbloat the index, but if you run it often enough it will keep the bloat from getting too bad in the first place.
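The per-table autovacuum tuning mentioned above can be set like this (the values are illustrative, not from the thread):

```sql
-- Vacuum/analyze tbl_tracker after ~1% of its rows turn over,
-- instead of the default 20%:
ALTER TABLE tbl_tracker SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.01
);
```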
But what I think I'd do is change one of your full indexes to contain the other column as well, and get rid of the partial index. It might not be quite as efficient as the partial index might theoretically be, but it should be pretty good and also be less fragile.
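Such a combined index might be (the index name is illustrative):

```sql
-- One two-column index can serve both the plain parcel_id_code
-- lookups and the "parcel_id_code = ? AND exit_state IS NULL" query:
CREATE INDEX tbl_tracker_parcel_exit_idx
    ON tbl_tracker (parcel_id_code, exit_state);
```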
> ->  Bitmap Index Scan on tbl_tracker_exit_state_idx  (cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277 loops=1)
This is finding 100 times more rows than it thinks it will. If that could be fixed, surely this plan would not look as good. But then, it would probably just switch to another plan that is not the one you want, either.
Cheers,
Jeff
On Thursday, December 20, 2012, Richard Neill wrote:
> - What I'm trying to do is trace the history of the books
> through the system and assign each one a proper unique id.
> So, if I see a book with "parcel_id_code = 37",
> is it a new book (after pid wrap), or is it the same book I saw 1
> minute ago, that hasn't exited the sorter?
I'm not sure how you are implementing this goal, but I don't think it is best done by looping over all books (presumably from some other table?) and issuing an individual query for each one, if that is what you are doing. Some kind of bulk join would probably be more efficient.
Cheers,
Jeff
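A bulk join along those lines might look like the following sketch; tbl_reports and its columns are hypothetical, standing in for however the delivery reports arrive:

```sql
-- For each incoming report, match the most recent in-flight row
-- with the same parcel_id_code and mark it delivered:
UPDATE tbl_tracker t
SET    exit_state  = 1,
       actual_dest = r.destination
FROM   tbl_reports r
WHERE  t.parcel_id_code = r.parcel_id_code
  AND  t.exit_state IS NULL
  AND  t.id = (SELECT max(t2.id)
               FROM   tbl_tracker t2
               WHERE  t2.parcel_id_code = r.parcel_id_code
                 AND  t2.exit_state IS NULL);
```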
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
>> I've now installed 9.2. As you said, thanks to the change in 9.2 it
>> initially prefers the partial index.
>>
>> BUT, after 1 cycle of inserting 500k rows, then deleting them all,
>> then starting to insert again, I find that the planner has reverted
>> to the former bad behaviour.
>
> Presumably the real work load has this type of turn over happen one row
> at a time, rather than all in one giant mass update transaction, right?
> That makes a big difference in the way space is re-used.

Sorry - I meant a "real" workload here. I replayed a whole day's worth of real data into the DB, and that's what I meant by a cycle. Everything was row-at-a-time. (It currently takes about an hour to do this.)

>> Reindexing only takes a couple of seconds, and restores correctness.
>
> Even your slow query is pretty fast. If you can't afford that, can you
> afford to take an exclusive lock for a couple of seconds every few minutes?

Yes, I can. If that's the root cause, I'll do that. But it seems to me that I've stumbled upon some rather awkward behaviour that I need to understand fully, and if the index is bloating that badly and that quickly, then perhaps it's a PG bug (or at least cause for a logfile warning).

BTW, the index has gone from 16 kB to 4.5 MB in 6 hours of runtime today. It still only has 252 matching rows.

>> What's going on? Do I need to run reindex in a cron-job? I thought
>> that reindex wasn't "normally" needed, and that index bloat happened
>> only after every row had changed value hundreds of times.
>
> The partial index is highly leveraged. If every tuple in the table is
> updated once, that amounts to every tuple in the index being updated
> 25,000 times.

How so? That sounds like O(n^2) behaviour.

> For the same reason, it is probably not getting vacuumed often enough.
> The default settings have the table vacuumed once 20% of its rows
> turns over, but that means the partial index has been turned over many
> many times.
> You could crank down the auto-vacuum settings for that
> table, or run manual vacuum with a cron job.
>
> Vacuum will not unbloat the index, but if you run it often enough it
> will keep the bloat from getting too bad in the first place.

Thanks. I've reduced autovacuum_vacuum_scale_factor from 0.2 to 0.05 (and set autovacuum_analyze_scale_factor = 0.05 for good measure). As I understand it, both of these can run in parallel, and I have 7 cores usually idle, while the other is maxed out.

> But what I think I'd do is change one of your full indexes to contain
> the other column as well, and get rid of the partial index. It might
> not be quite as efficient as the partial index might theoretically be,
> but it should be pretty good and also be less fragile.

I'll try that.

Thanks,

Richard
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From: Richard Neill
Date:
On 21/12/12 05:15, Jeff Janes wrote: > > > - What I'm trying to do is trace the history of the books > through the system and assign each one a proper unique id. > So, if I see a book with "parcel_id_code = 37", > is it a new book (after pid wrap), or is it the same book I saw 1 > minute ago, that hasn't exited the sorter? > > I'm not sure how you are implementing this goal, but I don't think it is > best done by looping over all books (presumably from some other table?) > and issuing an individual query for each one, if that is what you are > doing. Some kind of bulk join would probably be more efficient. It would be nice to do a bulk join, but it's not possible: the query is time sensitive. Consider: id/pkey pid timestamp exit_state destination 1 77 -24 hours 1 212 2 77 -18 hours 1 213 3 77 -12 hours 1 45 4 77 -6 hours 1 443 5 77 0 hours null [in future...] 5 77 0 hours 1 92 6 77 4 hours null At time +5 minutes, I receive a report that a book with parcel_id 77 has successfully been delivered to destination 92. So, what I have to do is: * First, find the id of the most recent book which had pid=77 and where the exit state is null. (hopefully, but not always, this yields exactly one row, which in this case is id=5) * Then update the table to set the destination to 92, where the id=5. It's a rather cursed query, because: - the sorter machine doesn't give me full info in each message, only deltas, and I have to reconstruct the global state. - pids are reused within hours, but don't increase monotonically, (more like drawing repeatedly from a shuffled deck, where cards are only returned to the deck sporadically. - some pids get double-reported - 1% of books fall off the machine, or get stuck on it. - occasionally, messages are lost. - the sorter state isn't self-consistent (it can be restarted) The tracker table is my attempt to consistently combine all the state we know, and merge in the deltas as we receive messages from the sorter machine. 
It ends up reflecting reality about 99% of the time. Richard
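The two-step lookup-then-update described above could be sketched in SQL roughly as follows. This is only an illustration: the choice of dreq_timestamp_1 as the "most recent" ordering column and of actual_dest/exit_state as the columns to update are assumptions based on the schema shown earlier in the thread, not something the original poster confirmed.

```sql
-- Sketch only: ordering column and updated columns are assumptions.
-- Step 1: find the most recent still-open record for this pid.
SELECT id
  FROM tbl_tracker
 WHERE parcel_id_code = 77
   AND exit_state IS NULL
 ORDER BY dreq_timestamp_1 DESC
 LIMIT 1;          -- hopefully exactly one row; here, id = 5

-- Step 2: mark that record as delivered to destination 92.
UPDATE tbl_tracker
   SET exit_state  = 1,
       actual_dest = 92
 WHERE id = 5;
```

Step 1 is exactly the query shape this whole thread is about, which is why the partial index on (parcel_id_code) WHERE exit_state IS NULL matters so much here.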
Dear All,

I think periodic reindex may be the solution. Even after reducing the autovacuum fraction to 0.05, the index still seems to bloat. After another couple of days' runtime, the index is using 11MB, and I get a query that takes 2.448ms. Then I reindex (takes about 3 sec), and the index falls to 16kB, and the query takes 0.035ms.

So... problem solved for me: I just have to reindex every few hours. BUT, this suggests a few remaining things:

1. The documentation still suggests that reindex should not be needed in "normal" operation... is this true? Or are the docs wrong? Or have I got such an edge case? Does this suggest that an auto-reindexer would be a useful feature?

2. Is there any way to force the planner to use (or ignore) a specific index, for testing purposes, short of actually dropping the index? This would be very useful for debugging, especially given that query plans can only really be fully tested on production systems, and that dropping indexes is rather a bad thing to do when live operation is simultaneously happening on that server!

Thanks again for your help.
Best wishes,

Richard

fsc_log=> explain analyse select * from tbl_tracker WHERE parcel_id_code='32453' AND exit_state IS NULL;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_tracker  (cost=20.81..23.32 rows=1 width=174) (actual time=2.408..2.408 rows=1 loops=1)
   Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 32453))
   ->  BitmapAnd  (cost=20.81..20.81 rows=1 width=0) (actual time=2.403..2.403 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_tracker_exit_state_idx  (cost=0.00..9.25 rows=132 width=0) (actual time=2.378..2.378 rows=5 loops=1)
               Index Cond: (exit_state IS NULL)
         ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx  (cost=0.00..11.30 rows=62 width=0) (actual time=0.022..0.022 rows=65 loops=1)
               Index Cond: (parcel_id_code = 32453)
 Total runtime: 2.448 ms

fsc_log=> REINDEX;

fsc_log=> explain analyse select * from tbl_tracker WHERE parcel_id_code='32453' AND exit_state IS NULL;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker  (cost=0.00..5.27 rows=1 width=174) (actual time=0.007..0.008 rows=1 loops=1)
   Index Cond: (parcel_id_code = 32453)
 Total runtime: 0.035 ms
(3 rows)
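For reference, the periodic reindex Richard describes might look like the sketch below. Plain REINDEX takes an exclusive lock on the table for the duration; the second variant, using CREATE INDEX CONCURRENTLY and a rename, is a common workaround for rebuilding an index on a live system (the temporary index name here is made up for illustration).

```sql
-- Option 1: simple, but blocks writers while it runs (~3 s in this case):
REINDEX INDEX tbl_tracker_performance_1_idx;

-- Option 2: rebuild without blocking writers, then swap names:
CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_new
    ON tbl_tracker (parcel_id_code) WHERE exit_state IS NULL;
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_new
    RENAME TO tbl_tracker_performance_1_idx;
```

Either could be driven from cron via psql -c, as Richard suggests later in the thread.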
On Mon, Dec 24, 2012 at 06:37:11PM +0000, Richard Neill wrote:
> [...]
> So... problem solved for me: I just have to reindex every few hours.
> BUT, this suggests a few remaining things:
> [...]
> 2. Is there any way to force the planner to use (or ignore) a
> specific index, for testing purposes, short of actually dropping the
> index?
> This would be very useful for debugging, especially given that query
> plans can only really be fully tested on production systems, and
> that dropping indexes is rather a bad thing to do when live
> operation is simultaneously happening on that server!

I believe that:

  BEGIN;
  drop index ....
  explain analyze ...
  explain analyze ...
  ROLLBACK;

will do what you want. IIUC, Postgres allows DDL within transactions, so the drop can be rolled back, and the operations within the transaction aren't visible to your other queries running outside the transaction.

http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
and
http://www.postgresql.org/docs/9.2/static/sql-dropindex.html

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell) 603-643-9300 x 111
On Saturday, December 22, 2012, Richard Neill wrote:
The partial index is highly leveraged. If every tuple in the table is
updated once, that amounts to every tuple in the index being updated
25,000 times.
How so? That sounds like O(n^2) behaviour.
If the table has 5 million rows while the index has 200 (active) rows at any given time, then updating every row in the table to null and back again would be 100% turnover of the table. But each such change would lead to an addition and then a deletion from the index. So 100% turnover of the table would be a 5 million / 200 = 25,000-fold turnover of the index.
There is some code that allows a btree index entry to get killed (and so the slot to be reused) without any vacuum, if a scan follows that entry and finds the corresponding tuple in the table no longer visible to anyone. I have not examined this code, and don't know whether it is doing its job but just isn't enough to prevent the bloat, or if for some reason it is not applicable to your situation.
Cheers,
Jeff
On Monday, December 24, 2012, John Rouillard wrote:
On Mon, Dec 24, 2012 at 06:37:11PM +0000, Richard Neill wrote:
> [...]
> So... problem solved for me: I just have to reindex every few hours.
> BUT, this suggests a few remaining things:
> [...]
> 2. Is there any way to force the planner to use (or ignore) a
> specific index, for testing purposes, short of actually dropping the
> index?
> This would be very useful for debugging, especially given that query
> plans can only really be fully tested on production systems, and
> that dropping indexes is rather a bad thing to do when live
> operation is simultaneously happening on that server!
I believe that:
BEGIN;
drop index ....
explain analyze ...
explain analyze ...
ROLLBACK;
There are two cautions here. One is that dropping the index takes an access-exclusive lock on the table, and so brings all other connections to a screeching halt. That is not much nicer to do on a production system than actually dropping the index, so don't dilly-dally before doing the rollback: roll back first, then ruminate on the results of the explain.
Also, this will forcibly cancel any autovacuums occurring on the table. I think one of the reasons he needs to reindex so much is that he is already desperately short of vacuuming behavior.
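Putting John's suggestion and Jeff's cautions together, a test session for the query from the start of the thread might look like this sketch:

```sql
BEGIN;
-- DROP INDEX takes an AccessExclusiveLock: every other session touching
-- tbl_tracker now blocks, so keep this transaction as short as possible.
DROP INDEX tbl_tracker_exit_state_idx;
DROP INDEX tbl_tracker_parcel_id_code_idx;
EXPLAIN ANALYZE
SELECT * FROM tbl_tracker
 WHERE parcel_id_code = 53030 AND exit_state IS NULL;
ROLLBACK;  -- the indexes reappear instantly; study the plan afterwards
```

The rollback undoes the drops without any rebuild cost, since the index files were never actually removed while the transaction was open.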
Cheers,
Jeff
On Monday, December 24, 2012, Richard Neill wrote:
Dear All,
I think periodic reindex may be the solution. Even after reducing the autovacuum fraction to 0.05, the index still seems to bloat.
Since your index is so tiny compared to the table, I'd lower it even more. 0.0001, maybe.
However, two vacuums of the same table can't overlap, so you can't get the table vacuumed more often than the time it takes to run the vacuum (which took around 4 minutes at default vacuum cost settings on a toy system), and so you may need to lower autovacuum_vacuum_cost_delay to 0 for this table. (I suspect it is all in cache anyway, so the default settings are too pessimistic.) Finally, you might need to lower autovacuum_naptime, because the table also usually won't be vacuumed any more often than that.
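The per-table parts of Jeff's advice correspond to storage parameters and could be sketched as follows (the values are illustrative, taken from the numbers discussed above):

```sql
-- Per-table autovacuum tuning for the hot tracker table.
ALTER TABLE tbl_tracker SET (
    autovacuum_vacuum_scale_factor = 0.0001,  -- vacuum after ~0.01% turnover
    autovacuum_vacuum_cost_delay   = 0        -- don't throttle this vacuum
);
```

Note that autovacuum_naptime is a server-wide setting in postgresql.conf; unlike the two parameters above, it cannot be set per table.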
1. The documentation still suggests that reindex should not be needed in "normal" operation... is this true? Or are the docs wrong? Or have I got such an edge case?
Your case seems pretty far out there to me.
Cheers,
Jeff
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From:
Richard Neill
Date:
>>> The partial index is highly leveraged. If every tuple in the
>>> table is updated once, that amounts to every tuple in the index
>>> being updated 25,000 times.
>>
>> How so? That sounds like O(n^2) behaviour.
>
> If the table has 5 million rows while the index has 200 (active) rows
> at any given time, then to update every row in the table to null and
> back again would be 100% turnover of the table. But each such change
> would lead to an addition and then a deletion from the index. So 100%
> turnover of the table would be a 5 million / 200 = 25,000-fold
> turnover of the index.

Sorry, I was being dense. I misread that as: "every time a single tuple in the table is updated, the entire index (every row) is updated". Yes, of course your explanation makes sense.

> There is some code that allows a btree index entry to get killed (and
> so the slot to be reused) without any vacuum, if a scan follows that
> entry and finds the corresponding tuple in the table no longer visible
> to anyone. I have not examined this code, and don't know whether it is
> doing its job but just isn't enough to prevent the bloat, or if for
> some reason it is not applicable to your situation.

It looks like my solution is going to be a REINDEX invoked from cron, or maybe just every 100k inserts.

In terms of trying to improve this behaviour for other PG users in the future, are there any more diagnostics I can do for you? Having found a special case, I'd like to help permanently resolve it if I can.

Thanks very much again.

Best wishes,

Richard
On Thursday, December 20, 2012, Jeff Janes wrote:
On Thursday, December 20, 2012, Richard Neill wrote:
-> Bitmap Index Scan on tbl_tracker_exit_state_idx (cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277 loops=1)

This is finding 100 times more rows than it thinks it will. If that could be fixed, surely this plan would not look as good. But then, it would probably just switch to another plan that is not the one you want, either.
I guess the issue here is that the histogram postgres uses to estimate the number of rows that will be found is based on visible rows, and it is correctly estimating the number of visible rows that will be found. And that is the relevant thing to pass up to a higher join for its estimation. But for estimating the number of blocks a given index scan will access, the right thing would be the number of tuples visited, not the number of them found to be visible. So that is where this plan goes systematically wrong.
I guess the correct thing would be for postgres to keep two histograms, one of all tuples and one of all visible tuples, and to produce different selectivity estimates for different purposes. But I don't see that change getting made. It is only meaningful in cases where there is a fundamental skew in distribution between visible tuples and invisible-but-as-yet-unvacuumed tuples.
I think that that fundamental skew is the source of both the underestimation of the bitmap scan cost, and the overestimation of the partial index scan (although I can't get it to overestimate that by anywhere near the amount you were seeing).
I still think your best bet is to get rid of the partial index and trade the full one on (parcel_id_code) for one on (parcel_id_code,exit_state). I think that will be much less fragile than reindexing in a cron job.
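Jeff's suggested trade, written out as DDL (the new index name is made up for illustration):

```sql
DROP INDEX tbl_tracker_performance_1_idx;    -- the fragile partial index
DROP INDEX tbl_tracker_parcel_id_code_idx;   -- the full single-column index
CREATE INDEX tbl_tracker_parcel_exit_idx
    ON tbl_tracker (parcel_id_code, exit_state);
```

The composite index can satisfy both "parcel_id_code = N" alone and "parcel_id_code = N AND exit_state IS NULL", so it replaces both dropped indexes, and it never shrinks to a handful of live entries the way the partial index does, which is why it should bloat far less.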
Cheers,
Jeff
Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
From:
Richard Neill
Date:
On 27/12/12 16:17, Jeff Janes wrote:
> I still think your best bet is to get rid of the partial index and
> trade the full one on (parcel_id_code) for one on
> (parcel_id_code,exit_state). I think that will be much less fragile
> than reindexing in a cron job.

So, at the moment, I have 3 indexes:

  full:    parcel_id_code
  full:    exit_state
  partial: parcel_id_code where exit_state is null

Am I right that when you suggest just a single, joint index (parcel_id_code, exit_state) instead of all 3 of the others, it will allow me to optimally run all of the following?

1. SELECT * from tbl_tracker WHERE parcel_id_code=22345 AND exit_state IS NULL
   (this is the one we've been discussing)

2. SELECT * from tbl_tracker where parcel_id_code=44533

3. SELECT * from tbl_tracker where exit_state = 2

(2 and 3 are examples of queries I need to run for other purposes, unrelated to this thread, but which use the other indexes.)

Thanks,

Richard
Richard Neill <rn214@richardneill.org> writes:
> So, at the moment, I have 3 indexes:
> full: parcel_id_code
> full: exit_state
> partial: parcel_id_code where exit_state is null
> Am I right that when you suggest just a single, joint index
> (parcel_id_code,exit_state) instead of all 3 of the others,

I think he was just recommending replacing the first and third indexes.

> it will allow me to optimally run all of the following? :
> 1. SELECT * from tbl_tracker WHERE parcel_id_code=22345 AND exit_state
> IS NULL
> 2. SELECT * from tbl_tracker where parcel_id_code=44533
> 3. SELECT * from tbl_tracker where exit_state = 2

You will need an index with exit_state as the first column to make #3 perform well --- at least, assuming that an index is going to help at all anyway. The rule of thumb is that if a query is going to fetch more than a few percent of a table, an index is not useful, because it's going to be touching most table pages anyway, so a seqscan will win. I've forgotten now what you said the stats for exit_state values other than null were, but it's quite possible that an index is useless for #3.

These considerations are mostly covered in the manual:
http://www.postgresql.org/docs/9.2/static/indexes.html

regards, tom lane
On Thursday, December 27, 2012, Richard Neill wrote:
On 27/12/12 16:17, Jeff Janes wrote:
I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code,exit_state).
I think that will be much less fragile than reindexing in a cron job.
So, at the moment, I have 3 indexes:
full: parcel_id_code
full: exit_state
full: parcel_id_code where exit state is null
Am I right that when you suggest just a single, joint index
(parcel_id_code,exit_state)
instead of all 3 of the others,
No, just instead of 1 and 3. You still need an index on (exit_state) in order to efficiently satisfy query 3 below.
Alternatively, you could keep index 1, and replace 2 and 3 with one on (exit_state, parcel_id_code). And in fact this might be the better way to go, because a big problem you are facing is that the (exit_state) index is looking falsely attractive, and the easiest way to overcome that is to get rid of that index and replace it with one that can do everything it can do, but more.
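This alternative of Jeff's, sketched as DDL (the new index name is made up for illustration):

```sql
-- Keep tbl_tracker_parcel_id_code_idx (index 1) as-is.
DROP INDEX tbl_tracker_exit_state_idx;       -- index 2: full, on exit_state
DROP INDEX tbl_tracker_performance_1_idx;    -- index 3: the partial index
CREATE INDEX tbl_tracker_exit_parcel_idx
    ON tbl_tracker (exit_state, parcel_id_code);
```

With exit_state leading, the composite index serves both the "exit_state = 2" query and the "exit_state IS NULL AND parcel_id_code = N" query, while the bare (exit_state) index that was tempting the planner into the bad bitmap plan is gone.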
Theoretically, there is a technique called "loose scan" or "skip scan" which could allow one index, (exit_state, parcel_id_code), to replace all 3 of the above, but postgres does not yet implement that technique. I think there is a way to achieve the same thing using recursive SQL, but I doubt it would be worth it, as too much index maintenance is not your root problem.
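For reference, the recursive-SQL emulation of a loose/skip scan that Jeff alludes to usually looks something like the following sketch, which finds the distinct exit_state values one index probe at a time rather than scanning the whole index (illustrative only, and, as Jeff says, probably not worth it here):

```sql
WITH RECURSIVE t AS (
    SELECT min(exit_state) AS val FROM tbl_tracker
    UNION ALL
    SELECT (SELECT min(exit_state)
              FROM tbl_tracker
             WHERE exit_state > t.val)
      FROM t
     WHERE t.val IS NOT NULL
)
SELECT val FROM t WHERE val IS NOT NULL;  -- distinct non-null exit_states
```

Each step of the recursion is a single probe into an index led by exit_state, so the cost scales with the number of distinct values rather than the number of rows.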
3. SELECT * from tbl_tracker where exit_state = 2
Cheers,
Jeff
On Thursday, December 20, 2012, Jeff Janes wrote:
On Thursday, December 20, 2012, Tom Lane wrote:
What I did to try to duplicate Richard's situation was to create a test
table in which all the exit_state values were NULL, then build the
index, then UPDATE all but a small random fraction of the rows to 1,
then vacuum. This results in a rather bloated partial index, but I
think that's probably what he's got given that every record initially
enters the table with NULL exit_state. It would take extremely frequent
vacuuming to keep the partial index from accumulating a lot of dead
entries.
Once I cranked up default_statistics_target, I could start reproducing the very high estimates (5000) for the partial index in 9.1.
As you say, switching to 9.2 or above lowers it quite a bit, but I still get some pretty high estimates: ~100 when 8 would be more accurate.
The problem is in genericcostestimate:
if (index->pages > 1 && index->tuples > 1)
numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);
The index->pages should probably not include index pages which are empty. Even with aggressive vacuuming, most of the pages in the partial index seem to be empty at any given time.
However, I don't know if that number is exposed readily. And it seems to be updated too slowly to be useful, if pg_freespace is to be believed.
But I wonder if it couldn't be clamped so that there can be no more pages than there are tuples:
numIndexPages = ceil(numIndexTuples * Min(1,index->pages / index->tuples));
Cheers,
Jeff