Обсуждение: multi column query
Hi
When I update a table that has 20 columns and the where clause includes 16 of the columns (this is a data warehousing type update on aggregate fields),
The bitmap scan is not used by the optimizer. The table is indexed on 3 of the 20 fields. The update takes really long to finish (on a 6 million row table)
Do I need to do some “magic” with configuration to turn on bitmap scans.
On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote: > Hi > > When I update a table that has 20 columns and the where clause includes > 16 of the columns (this is a data warehousing type update on aggregate > fields), > > The bitmap scan is not used by the optimizer. The table is indexed on 3 > of the 20 fields. The update takes really long to finish (on a 6 million > row table) > > Do I need to do some "magic" with configuration to turn on bitmap scans. No. What's explain analyze of the query show? What's it doing now? Seqscan? You might try set enable_seqscan=off and see what that does. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to? And the output of \d chkpfw_tr_dy_dimension. The
costfor that index scan looks way too high.
And please reply-all so that the list is included.
> -----Original Message-----
> From: Sriram Dandapani [mailto:sdandapani@counterpane.com]
> Sent: Wednesday, April 12, 2006 7:48 PM
> To: Jim Nasby
> Subject: RE: [PERFORM] multi column query
>
>
> I executed enable_seqscan=off and then ran an explain plan on
> the query
>
> UPDATE chkpfw_tr_dy_dimension
> SET summcount = a.summcount + b.summcount,
> bytes = a.bytes + b.bytes,
> duration = a.duration + b.duration
> from chkpfw_tr_dy_dimension a,
> c_chkpfw_dy_tr_updates b
> WHERE a.firstoccurrence = b.firstoccurrence
> AND a.customerid_id = b.customerid_id
> AND a.sentryid_id = b.sentryid_id
> AND a.node_id = b.node_id
> AND a.interface_id = b.interface_id
> AND a.source_id = b.source_id
> AND a.destination_id = b.destination_id
> AND a.sourceport_id = b.sourceport_id
> AND a.destinationport_id = b.destinationport_id
> AND a.inoutbound_id = b.inoutbound_id
> AND a.action_id = b.action_id
> AND a.protocol_id = b.protocol_id
> AND a.service_id = b.service_id
> AND a.sourcezone_id = b.sourcezone_id
> AND a.destinationzone_id =
> b.destinationzone_id;
>
>
>
> Here is the query plan
>
>
> "Nested Loop (cost=200000036.18..221851442.39 rows=1 width=166)"
> " -> Merge Join (cost=100000036.18..121620543.75 rows=1 width=96)"
> " Merge Cond: (("outer".firstoccurrence =
> "inner".firstoccurrence) AND ("outer".sentryid_id =
> "inner".sentryid_id)
> AND ("outer".node_id = "inner".node_id))"
> " Join Filter: (("outer".customerid_id = "inner".customerid_id)
> AND ("outer".interface_id = "inner".interface_id) AND
> ("outer".source_id
> = "inner".source_id) AND ("outer".destination_id =
> "inner".destination_id) AND ("outer".sourceport_id = "inner".s (..)"
> " -> Index Scan using chkpfw_tr_dy_idx1 on
> chkpfw_tr_dy_dimension a (cost=0.00..21573372.84 rows=6281981
> width=88)"
> " -> Sort (cost=100000036.18..100000037.38 rows=480
> width=136)"
> " Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id"
> " -> Seq Scan on c_chkpfw_dy_tr_updates b
> (cost=100000000.00..100000014.80 rows=480 width=136)"
> " -> Seq Scan on chkpfw_tr_dy_dimension
> (cost=100000000.00..100168078.81 rows=6281981 width=70)"
>
> -----Original Message-----
> From: Jim C. Nasby [mailto:jnasby@pervasive.com]
> Sent: Wednesday, April 12, 2006 5:44 PM
> To: Sriram Dandapani
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] multi column query
>
> On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
> > Hi
> >
> > When I update a table that has 20 columns and the where clause
> includes
> > 16 of the columns (this is a data warehousing type update
> on aggregate
> > fields),
> >
> > The bitmap scan is not used by the optimizer. The table is
> indexed on
> 3
> > of the 20 fields. The update takes really long to finish (on a 6
> million
> > row table)
> >
> > Do I need to do some "magic" with configuration to turn on bitmap
> scans.
>
> No. What's explain analyze of the query show? What's it doing now?
> Seqscan? You might try set enable_seqscan=off and see what that does.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
Hi Jim
The problem is fixed. The destination table that was being updated had 3
separate indexes. I combined them to a multi-column index and the effect
was amazing.
Thanks for your input
Sriram
-----Original Message-----
From: Jim Nasby [mailto:jnasby@pervasive.com]
Sent: Thursday, April 13, 2006 9:42 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: RE: [PERFORM] multi column query
You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to?
And the output of \d chkpfw_tr_dy_dimension. The cost for that index
scan looks way too high.
And please reply-all so that the list is included.
> -----Original Message-----
> From: Sriram Dandapani [mailto:sdandapani@counterpane.com]
> Sent: Wednesday, April 12, 2006 7:48 PM
> To: Jim Nasby
> Subject: RE: [PERFORM] multi column query
>
>
> I executed enable_seqscan=off and then ran an explain plan on
> the query
>
> UPDATE chkpfw_tr_dy_dimension
> SET summcount = a.summcount + b.summcount,
> bytes = a.bytes + b.bytes,
> duration = a.duration + b.duration
> from chkpfw_tr_dy_dimension a,
> c_chkpfw_dy_tr_updates b
> WHERE a.firstoccurrence = b.firstoccurrence
> AND a.customerid_id = b.customerid_id
> AND a.sentryid_id = b.sentryid_id
> AND a.node_id = b.node_id
> AND a.interface_id = b.interface_id
> AND a.source_id = b.source_id
> AND a.destination_id = b.destination_id
> AND a.sourceport_id = b.sourceport_id
> AND a.destinationport_id = b.destinationport_id
> AND a.inoutbound_id = b.inoutbound_id
> AND a.action_id = b.action_id
> AND a.protocol_id = b.protocol_id
> AND a.service_id = b.service_id
> AND a.sourcezone_id = b.sourcezone_id
> AND a.destinationzone_id =
> b.destinationzone_id;
>
>
>
> Here is the query plan
>
>
> "Nested Loop (cost=200000036.18..221851442.39 rows=1 width=166)"
> " -> Merge Join (cost=100000036.18..121620543.75 rows=1 width=96)"
> " Merge Cond: (("outer".firstoccurrence =
> "inner".firstoccurrence) AND ("outer".sentryid_id =
> "inner".sentryid_id)
> AND ("outer".node_id = "inner".node_id))"
> " Join Filter: (("outer".customerid_id = "inner".customerid_id)
> AND ("outer".interface_id = "inner".interface_id) AND
> ("outer".source_id
> = "inner".source_id) AND ("outer".destination_id =
> "inner".destination_id) AND ("outer".sourceport_id = "inner".s (..)"
> " -> Index Scan using chkpfw_tr_dy_idx1 on
> chkpfw_tr_dy_dimension a (cost=0.00..21573372.84 rows=6281981
> width=88)"
> " -> Sort (cost=100000036.18..100000037.38 rows=480
> width=136)"
> " Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id"
> " -> Seq Scan on c_chkpfw_dy_tr_updates b
> (cost=100000000.00..100000014.80 rows=480 width=136)"
> " -> Seq Scan on chkpfw_tr_dy_dimension
> (cost=100000000.00..100168078.81 rows=6281981 width=70)"
>
> -----Original Message-----
> From: Jim C. Nasby [mailto:jnasby@pervasive.com]
> Sent: Wednesday, April 12, 2006 5:44 PM
> To: Sriram Dandapani
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] multi column query
>
> On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
> > Hi
> >
> > When I update a table that has 20 columns and the where clause
> includes
> > 16 of the columns (this is a data warehousing type update
> on aggregate
> > fields),
> >
> > The bitmap scan is not used by the optimizer. The table is
> indexed on
> 3
> > of the 20 fields. The update takes really long to finish (on a 6
> million
> > row table)
> >
> > Do I need to do some "magic" with configuration to turn on bitmap
> scans.
>
> No. What's explain analyze of the query show? What's it doing now?
> Seqscan? You might try set enable_seqscan=off and see what that does.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>