Обсуждение: Operator performance question
Hi all, I need your help on a small performance problem. I have a table of which I have to do a bunch of counts of various conditions. The worst case scenario where I have to iterate over every record in the table performs just a little bit too slow (800ms). That particular query will be hit a lot (it will be on the index of our web app). PostgreSQL uses a sequential scan (it should IMO) - I think my bottleneck is in the operators on the various columns. My queries look like this: SELECT COUNT(NULLIF(max_persons BETWEEN 5 AND 8, false)) AS "persons 5-8", -- And other variations COUNT(NULLIF(country_id = 74, false)) AS "LOCATION_NETHERLANDS", -- Basically for every country in Europe COUNT(NULLIF(specifications & '00000000000000000000000001000000', 0::bit(32))) AS "washing machine", -- And a bunch more of these; the bit mask is almost fully covered COUNT(*) AS all FROM table; The plan is: QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=7371.23..7371.55 rows=1 width=18) (actual time=803.374..803.376 rows=1 loops=1) -> Seq Scan on fewo_property_location (cost=0.00..828.84 rows=41538 width=18) (actual time=0.036..147.211 rows=41492 loops=1) Filter: ((location_id IS NOT NULL) AND (property_state_id = 3)) Total runtime: 804.398 ms (4 rows) The table definition is like: Column | Type | Modifiers -------------------+----------+---------------------- property_id | integer | not null property_state_id | integer | not null location_id | integer | min_persons | smallint | not null max_persons | smallint | not null specifications | bit(32) | default (0)::bit(32) country_id | integer | Indexes: "fewo_property_location_pkey" PRIMARY KEY, btree (property_id) "fewo_property_location_country_idx" btree (country_id) WHERE location_id IS NOT NULL "fewo_property_location_country_location_idx" btree (country_id, location_id) CLUSTER "fewo_property_location_location_online_idx" btree (location_id) WHERE location_id IS NOT NULL AND property_state_id = 3 "fewo_property_location_property_location_idx" btree (property_id, location_id) WHERE location_id IS NOT NULL AND property_state_id = 3 "fewo_property_location_specifications_idx" btree (specifications) Foreign-key constraints: "fewo_property_location_location_id_fkey" FOREIGN KEY (location_id) REFERENCES fewo_location(location_id) MATCH FULL "fewo_property_location_property_state_id_fkey" FOREIGN KEY (property_state_id) REFERENCES fewo_property_state(property_state_id) MATCH FULL My conclusion is that this query time is mostly limited to the somewhat complex COUNT expressions. Is there any way to do this more efficiently? For the record, if I constrain this query to specific countries it performs in about 80ms (10x as fast). The hardware is a dual Opteron64x2, 4G RAM and some kind of RAID setup (software, don't know what type) running in a Xen host - it's our development DB-server. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Shouldn't these be using HAVING? SELECT COUNT(max_persons) ... GROUP BY NULL HAVING max_persons >= 5 AND max_persons <= 8; -- Brandon Aiken CS/IT Systems Engineer -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Alban Hertroys Sent: Tuesday, January 09, 2007 11:07 AM To: Postgres General Subject: [GENERAL] Operator performance question Hi all, I need your help on a small performance problem. I have a table of which I have to do a bunch of counts of various conditions. The worst case scenario where I have to iterate over every record in the table performs just a little bit too slow (800ms). That particular query will be hit a lot (it will be on the index of our web app). PostgreSQL uses a sequential scan (it should IMO) - I think my bottleneck is in the operators on the various columns. My queries look like this: SELECT COUNT(NULLIF(max_persons BETWEEN 5 AND 8, false)) AS "persons 5-8", -- And other variations COUNT(NULLIF(country_id = 74, false)) AS "LOCATION_NETHERLANDS", -- Basically for every country in Europe COUNT(NULLIF(specifications & '00000000000000000000000001000000', 0::bit(32))) AS "washing machine", -- And a bunch more of these; the bit mask is almost fully covered COUNT(*) AS all FROM table; The plan is: QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------- Aggregate (cost=7371.23..7371.55 rows=1 width=18) (actual time=803.374..803.376 rows=1 loops=1) -> Seq Scan on fewo_property_location (cost=0.00..828.84 rows=41538 width=18) (actual time=0.036..147.211 rows=41492 loops=1) Filter: ((location_id IS NOT NULL) AND (property_state_id = 3)) Total runtime: 804.398 ms (4 rows) The table definition is like: Column | Type | Modifiers -------------------+----------+---------------------- property_id | integer | not null property_state_id | integer | not null location_id | integer | min_persons | smallint | not null max_persons | smallint | not null specifications | bit(32) | default (0)::bit(32) country_id | integer | Indexes: "fewo_property_location_pkey" PRIMARY KEY, btree (property_id) "fewo_property_location_country_idx" btree (country_id) WHERE location_id IS NOT NULL "fewo_property_location_country_location_idx" btree (country_id, location_id) CLUSTER "fewo_property_location_location_online_idx" btree (location_id) WHERE location_id IS NOT NULL AND property_state_id = 3 "fewo_property_location_property_location_idx" btree (property_id, location_id) WHERE location_id IS NOT NULL AND property_state_id = 3 "fewo_property_location_specifications_idx" btree (specifications) Foreign-key constraints: "fewo_property_location_location_id_fkey" FOREIGN KEY (location_id) REFERENCES fewo_location(location_id) MATCH FULL "fewo_property_location_property_state_id_fkey" FOREIGN KEY (property_state_id) REFERENCES fewo_property_state(property_state_id) MATCH FULL My conclusion is that this query time is mostly limited to the somewhat complex COUNT expressions. Is there any way to do this more efficiently? For the record, if I constrain this query to specific countries it performs in about 80ms (10x as fast). The hardware is a dual Opteron64x2, 4G RAM and some kind of RAID setup (software, don't know what type) running in a Xen host - it's our development DB-server. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Alban Hertroys <alban@magproductions.nl> writes: > My conclusion is that this query time is mostly limited to the somewhat > complex COUNT expressions. Is there any way to do this more efficiently? Offhand I would bet on the bitstring-AND operations being the bottleneck; you could test this by comparing the speed of queries that are doing different mixes of the same number of COUNT()s. If you're happy with a fixed-width 32-bit field, consider using an integer field and integer & operations, instead of bitstring. Bitstring is a pass-by-reference type and so inherently a lot less efficient than an integer. Another suggestion is to replace count(nullif(boolean_expr, false)) with sum((boolean_expr)::int) I think this would be a marginal speed win at best (basically replacing a Const and a NullIf node with a Cast node), but it just seems to me to be more natural ... it took me a bit to figure out what your query was trying to accomplish. regards, tom lane
Brandon Aiken wrote: > Shouldn't these be using HAVING? > > SELECT COUNT(max_persons) ... > GROUP BY NULL > HAVING max_persons >= 5 AND max_persons <= 8; I don't think so; I need multiple COUNT results for multiple combinations; 5-8 persons isn't the only one. I also need counts for 1-4, 9-12 and 13 and up. The output I currently get is something like this: "persons 1-4" | 123 "persons 5-8" | 234 "persons 9-12" | 345 ... "LOCATION_NETHERLANDS" | 987 "LOCATION_BELGIUM" | 765 ... "washing machine" | 432 "dish washer" | 210 I don't see how I would get that with a HAVING clause. And I don't think you can have multiple HAVING clauses... The crux of the query is that I get all kinds of conditions counted - with a single query, output as a single record. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Tom Lane wrote: > Alban Hertroys <alban@magproductions.nl> writes: >> My conclusion is that this query time is mostly limited to the somewhat >> complex COUNT expressions. Is there any way to do this more efficiently? > > Offhand I would bet on the bitstring-AND operations being the > bottleneck; you could test this by comparing the speed of queries that > are doing different mixes of the same number of COUNT()s. If you're > happy with a fixed-width 32-bit field, consider using an integer field > and integer & operations, instead of bitstring. Bitstring is a > pass-by-reference type and so inherently a lot less efficient than an > integer. Hmm... I picked bitstrings because a quick test seemed to show it performing better than ints. Apparently my test wasn't right. Definitely a thing to test again. So far I have some bits of the 32 to spare, but the set of options to filter on isn't fixed yet - I expect it to grow, and I don't know by how much yet. I might (although I doubt it) get beyond 64 bits, in which case even a bigint wouldn't suffice... > Another suggestion is to replace > > count(nullif(boolean_expr, false)) > > with > > sum((boolean_expr)::int) I hadn't realized true::int = 1 and false::int = 0. Thanks for the suggestions. > I think this would be a marginal speed win at best (basically replacing > a Const and a NullIf node with a Cast node), but it just seems to me > to be more natural ... it took me a bit to figure out what your query > was trying to accomplish. It took me a bit to come up with that solution; I agree it's not a very obvious one. I basically hacked my expression to evaluate to NULL if false, so that count wouldn't pick it up. NULLIF did almost what I wanted. I would've been happier with a function NULLIF(boolean_expr) - with just one boolean parameter. It may be even better to have a COUNTIF(boolean_expr) aggregate that would only count values where the expression evaluates true - Come to think of it, that looks awfully familiar... Oracle maybe? Anyway, I now have something to go on again. Thanks. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //