Discussion: Bad performance of SELECT ... where id IN (...)
Hi,

I have a big performance problem in my SQL select query:

========================================
select * from event where user_id in
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
========================================

The above SELECT always takes about 1200 ms.
The EXPLAIN ANALYZE result for it is:

                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on event  (cost=73685.08..5983063.49 rows=662018 width=36) (actual time=24.857..242.826 rows=134289 loops=1)
   Recheck Cond: (user_id = ANY ('{499,498,497, ... ,2,1,0}'::integer[]))
   ->  Bitmap Index Scan on event_user_id_idx  (cost=0.00..71699.03 rows=662018 width=0) (actual time=24.610..24.610 rows=134289 loops=1)
         Index Cond: (user_id = ANY ('{499,498,497, ... ,2,1,0}'::integer[]))
--------------------------------------------------------------------------------------------------------

My table's structure is:

=====================
CREATE TABLE event (
    id integer NOT NULL,
    user_id integer NOT NULL,
    action_type integer NOT NULL,
    resource_type integer NOT NULL,
    resource_sn integer NOT NULL,
    result_type integer,
    result_sn integer,
    created_date timestamp with time zone NOT NULL
);
=====================

The event table has more than 100,000,000 rows, and I have a btree index, event_user_id_idx, on user_id; the index size is 2171 MB.

Does anyone have good ideas to optimize this query? Thanks very much.

--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
> I have a big performance problem in my SQL select query:
>
> ========================================
> select * from event where user_id in
> (500,499,498, ... ,1,0);
> ========================================

What happens if you change the query to

select * from event where user_id >= 0 and user_id <= 500;

? :-)

--
regards
Claus
Xia Qingran wrote:
> Hi,
> I have a big performance problem in my SQL select query:
>
> ========================================
> select * from event where user_id in
> (500,499,498, ... ,1,0);
> ========================================
>
> The above SELECT always takes about 1200 ms.

If your user_id is always in a narrow range like this, or even in any range that is a small fraction of the total, then add a range condition, like this:

select * from event where user_id <= 500 and user_id >= 0 and user_id in (...)

I did this exact same thing in my application and it worked well.

Craig
On 26-Sep-2009, at 10:16 PM, Claus Guttesen wrote:
>> I have a big performance problem in my SQL select query:
>>
>> ========================================
>> select * from event where user_id in
>> (500,499,498, ... ,1,0);
>> ========================================
>
> What happens if you change the query to
>
> select * from event where user_id >= 0 and user_id <= 500;
>
> ? :-)

or

select * from event where user_id <= 500;

:)

Besides, your index seems quite huge (>2 GB), and it usually takes some time to process the result, even though it is already indexed with a btree.
Xia Qingran <qingran.xia@gmail.com> writes:
> I have a big performance problem in my SQL select query:
> select * from event where user_id in
> (500,499,498, ... ,1,0);
> The above SELECT always takes about 1200 ms.

Your EXPLAIN ANALYZE shows that the actual runtime is only about 240 ms. So either the planning time is about 1000 ms, or transmitting and displaying the 134K rows produced by the query takes that long, or some combination of the two. I wouldn't be too surprised if it's the data display that's slow; but if it's the planning time that you're unhappy about, updating to a more recent PG release might possibly help. What version is this anyway?

			regards, tom lane
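One way to separate planning time from execution time, as suggested above, is to time PREPARE and EXECUTE individually in psql; on 8.4 a prepared statement is planned at PREPARE time. A minimal sketch (evt_q is an arbitrary statement name, and the IN list is shortened here for illustration):

\timing

-- Planning of the IN list happens here, at PREPARE time (PostgreSQL 8.4).
PREPARE evt_q AS
SELECT * FROM event WHERE user_id IN (500, 499, 498, 497, 496);

-- This step is execution plus result transfer only.
EXECUTE evt_q;

DEALLOCATE evt_q;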
If you reuse that set a lot, how about storing it in a table and doing the join on the DB side?
If it is large, it sometimes makes sense to create a temp table just for a single query (I use that sort of thing for comparisons involving a few million records).
But temp tables in that case have to be short-lived, as they can't reuse space (no FSM in the temporary-table world, I'm afraid; I hope it will be fixed at some stage, though).
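A minimal sketch of that approach, using the thread's event table (wanted_users is an assumed name, and the generate_series fill stands in for however the real ID set is loaded):

BEGIN;

-- Dropped automatically at COMMIT, which keeps the temp table short-lived
-- as cautioned above.
CREATE TEMP TABLE wanted_users (user_id integer PRIMARY KEY) ON COMMIT DROP;

-- Stand-in for loading the real ID set (COPY or a batch of INSERTs in practice).
INSERT INTO wanted_users (user_id)
SELECT g FROM generate_series(0, 500) AS g;

ANALYZE wanted_users;   -- give the planner accurate row counts for the join

SELECT e.*
FROM event e
JOIN wanted_users w ON w.user_id = e.user_id;

COMMIT;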
On Sun, Sep 27, 2009 at 1:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Xia Qingran <qingran.xia@gmail.com> writes:
>> I have a big performance problem in my SQL select query:
>> select * from event where user_id in
>> (500,499,498, ... ,1,0);
>> The above SELECT always takes about 1200 ms.
>
> Your EXPLAIN ANALYZE shows that the actual runtime is only about 240 ms.
> So either the planning time is about 1000 ms, or transmitting and
> displaying the 134K rows produced by the query takes that long, or some
> combination of the two. I wouldn't be too surprised if it's the data
> display that's slow; but if it's the planning time that you're unhappy
> about, updating to a more recent PG release might possibly help. What
> version is this anyway?
>
> regards, tom lane

Oh, it is a problem. I forgot to mention my platform: I am running PostgreSQL 8.4.0 on a FreeBSD 7.2-amd64 box, which has dual Xeon 5410 CPUs, 8 GB memory and 2 SATA disks.

And my postgresql.conf is listed as follows:
---------------------------------------------------------------------------------------
listen_addresses = '*'            # what IP address(es) to listen on;
port = 5432                       # (change requires restart)
max_connections = 88              # (change requires restart)
superuser_reserved_connections = 3
ssl = off                         # (change requires restart)
tcp_keepalives_idle = 0           # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 0       # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 0          # TCP_KEEPCNT;
shared_buffers = 2048MB           # min 128kB or max_connections*16kB
temp_buffers = 32MB               # min 800kB
max_prepared_transactions = 150   # can be 0 or more, 0 to shutdown the prepared transactions.
work_mem = 8MB                    # min 64kB
maintenance_work_mem = 1024MB     # min 1MB
max_stack_depth = 8MB             # min 100kB
max_files_per_process = 16384     # min 25
vacuum_cost_delay = 100           # 0-1000 milliseconds
vacuum_cost_page_hit = 1          # 0-10000 credits
vacuum_cost_page_miss = 10        # 0-10000 credits
vacuum_cost_page_dirty = 20       # 0-10000 credits
vacuum_cost_limit = 500           # 1-10000 credits
bgwriter_delay = 500ms            # 10-10000ms between rounds
bgwriter_lru_maxpages = 100       # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 2.0     # 0-10.0 multiplier on buffers scanned/round
fsync = off                       # turns forced synchronization on or off
synchronous_commit = off          # immediate fsync at commit
wal_sync_method = fsync           # the default is the first option
full_page_writes = off            # recover from partial page writes
wal_buffers = 2MB                 # min 32kB
wal_writer_delay = 200ms          # 1-10000 milliseconds
commit_delay = 50                 # range 0-100000, in microseconds
commit_siblings = 5               # range 1-1000
checkpoint_segments = 32          # in logfile segments, min 1, 16MB each
checkpoint_timeout = 6min         # range 30s-1h
checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s          # 0 is off
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
seq_page_cost = 1.8               # measured on an arbitrary scale
random_page_cost = 2              # same scale as above
cpu_tuple_cost = 0.15             # same scale as above
cpu_index_tuple_cost = 0.07       # same scale as above
cpu_operator_cost = 0.03          # same scale as above
effective_cache_size = 3072MB
geqo = on
geqo_threshold = 20
geqo_effort = 7                   # range 1-10
geqo_pool_size = 0                # selects default based on effort
geqo_generations = 0              # selects default based on effort
geqo_selection_bias = 2.0         # range 1.5-2.0
default_statistics_target = 500   # range 1-1000
constraint_exclusion = partition
from_collapse_limit = 20
join_collapse_limit = 20          # 1 disables collapsing of explicit
log_destination = 'syslog'
syslog_facility = 'LOCAL2'
syslog_ident = 'postgres'
client_min_messages = notice      # values in order of decreasing detail:
log_min_messages = error          # values in order of decreasing detail:
log_error_verbosity = terse       # terse, default, or verbose messages
log_min_error_statement = panic   # values in order of decreasing detail:
log_min_duration_statement = -1   # -1 is disabled, 0 logs all statements
silent_mode = on
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = off
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = on
log_hostname = off
log_line_prefix = ''              # special values:
log_lock_waits = off              # log lock waits >= deadlock_timeout
log_statement = 'none'            # none, ddl, mod, all
log_temp_files = -1               # log temporary files equal or larger
track_activities = on
track_counts = on
update_process_title = off
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on                   # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 10  # -1 disables, 0 logs all actions and
autovacuum_max_workers = 3        # max number of autovacuum subprocesses
autovacuum_naptime = 10min        # time between autovacuum runs
autovacuum_vacuum_threshold = 100 # min number of row updates before
autovacuum_analyze_threshold = 50 # min number of row updates before
autovacuum_vacuum_scale_factor = 0.2  # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
autovacuum_vacuum_cost_delay = 30 # default vacuum cost delay for
autovacuum_vacuum_cost_limit = 200 # default vacuum cost limit for
datestyle = 'iso, mdy'
client_encoding = utf-8           # actually, defaults to database
lc_messages = 'C'                 # locale for system error message
lc_monetary = 'C'                 # locale for monetary formatting
lc_numeric = 'C'                  # locale for number formatting
lc_time = 'C'                     # locale for time formatting
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 60s
max_locks_per_transaction = 32    # min 10
regex_flavor = basic              # advanced, extended, or basic
---------------------------------------------------------------------------------------

Thanks a lot.

--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
On Sat, Sep 26, 2009 at 10:59 PM, Craig James <craig_james@emolecules.com> wrote:
>
> If your user_id is always in a narrow range like this, or even in any range
> that is a small fraction of the total, then add a range condition, like
> this:
>
> select * from event where user_id <= 500 and user_id >= 0 and user_id in
> (...)
>
> I did this exact same thing in my application and it worked well.
>
> Craig

It is a good idea, but in my application most of the queries' user_id sets are random and hard to confine to a range. Thanks anyway.

--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
Xia Qingran wrote:
> On Sun, Sep 27, 2009 at 1:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Xia Qingran <qingran.xia@gmail.com> writes:
>>> I have a big performance problem in my SQL select query:
>>> select * from event where user_id in
>>> (500,499,498, ... ,1,0);
>>> The above SELECT always takes about 1200 ms.
>>
>> Your EXPLAIN ANALYZE shows that the actual runtime is only about 240 ms.
>> So either the planning time is about 1000 ms, or transmitting and
>> displaying the 134K rows produced by the query takes that long, or some
>> combination of the two. I wouldn't be too surprised if it's the data
>> display that's slow; but if it's the planning time that you're unhappy
>> about, updating to a more recent PG release might possibly help. What
>> version is this anyway?
>>
>> regards, tom lane
>
> Oh, it is a problem.

I don't see where the "Total runtime" information is in your first message. Also, did you run VACUUM FULL ANALYZE lately?

> I forgot to mention my platform: I am running PostgreSQL 8.4.0 on a
> FreeBSD 7.2-amd64 box, which has dual Xeon 5410 CPUs, 8 GB memory and 2
> SATA disks.
>
> And my postgresql.conf is listed as follows:
> ---------------------------------------------------------------------------------------
> listen_addresses = '*'            # what IP address(es) to listen on;
> port = 5432                       # (change requires restart)
> max_connections = 88              # (change requires restart)
> superuser_reserved_connections = 3
> ssl = off                         # (change requires restart)
> tcp_keepalives_idle = 0           # TCP_KEEPIDLE, in seconds;
> tcp_keepalives_interval = 0       # TCP_KEEPINTVL, in seconds;
> tcp_keepalives_count = 0          # TCP_KEEPCNT;
> shared_buffers = 2048MB           # min 128kB or max_connections*16kB

For a start, I think you will need to make shared_buffers larger than your index to get decent performance: try setting it to 4096 MB and see if it helps.

> temp_buffers = 32MB               # min 800kB
> max_prepared_transactions = 150   # can be 0 or more, 0 to shutdown the prepared transactions.
> work_mem = 8MB                    # min 64kB

Depending on the type of your workload (how many clients are connected and how complex the queries are), you might also want to increase work_mem. Try 16 MB - 32 MB or more and see if it helps.

> fsync = off                       # turns forced synchronization on or off
> synchronous_commit = off          # immediate fsync at commit

Off-topic: you probably know what you are doing by disabling these, right?
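Put as postgresql.conf settings, the suggestions above amount to something like this (a sketch; the values are starting points to test, not measured optima):

---------------------------------------------------------------------------------------
shared_buffers = 4096MB           # larger than the 2171 MB event_user_id_idx index
work_mem = 32MB                   # per sort/hash memory; watch the total with max_connections = 88
---------------------------------------------------------------------------------------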
Hi Xia,

Try this patch:

http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch

It's a hack, but it works for us. I think you're probably spending most of your query time planning, and this patch helps speed things up 10x over here.

Regards,
Omar

On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran.xia@gmail.com> wrote:
> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
> <craig_james@emolecules.com> wrote:
>>
>> If your user_id is always in a narrow range like this, or even in any range
>> that is a small fraction of the total, then add a range condition, like
>> this:
>>
>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
>> (...)
>>
>> I did this exact same thing in my application and it worked well.
>>
>> Craig
>
> It is a good idea, but in my application most of the queries' user_id
> sets are random and hard to confine to a range.
> Thanks anyway.
On Sun, Oct 4, 2009 at 9:58 PM, Omar Kilani <omar.kilani@gmail.com> wrote:
> Hi Xia,
>
> Try this patch:
>
> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>
> It's a hack, but it works for us. I think you're probably spending
> most of your query time planning, and this patch helps speed things up
> 10x over here.

Woof. I can see that helping in some situations, but what a foot-gun!

...Robert
Robert,

On Mon, Oct 5, 2009 at 11:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 4, 2009 at 9:58 PM, Omar Kilani <omar.kilani@gmail.com> wrote:
>> Try this patch:
>>
>> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>>
>> It's a hack, but it works for us.
>
> Woof. I can see that helping in some situations, but what a foot-gun!

We've run that patch for about 4 years (originally coded for us by Neil Conway for 8.2, I think) and have never seen any negatives from it. I'm not really sure what the alternatives are -- it never really makes sense to get the selectivity for thousands of items in the IN clause. I've never seen a different plan for the same query against a DB with that patch vs. without -- it just takes a huge amount of time longer to run without it. :)

But yeah, definitely a hack, and should only be used if needed -- hopefully there's some sort of official solution on the horizon. :)

Regards,
Omar
On Mon, Oct 5, 2009 at 1:24 PM, Omar Kilani <omar.kilani@gmail.com> wrote:
> I'm not really sure what the alternatives are -- it never really makes
> sense to get the selectivity for thousands of items in the IN clause.
> I've never seen a different plan for the same query against a DB with
> that patch vs. without -- it just takes a huge amount of time longer to
> run without it. :)
>
> But yeah, definitely a hack, and should only be used if needed --
> hopefully there's some sort of official solution on the horizon. :)
Start using temporary tables, transactions, and joins. Depending on the source of the data: if the source is another query, just combine it into one query with a join; otherwise create a temp table, fill it with the data, and run the query with a join (see the temp-table sketch earlier in the thread). If you do all that in one transaction, it will be very fast.

--
GJ
On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar.kilani@gmail.com> wrote:
> Hi Xia,
>
> Try this patch:
>
> http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
>
> It's a hack, but it works for us. I think you're probably spending
> most of your query time planning, and this patch helps speed things up
> 10x over here.

Thanks! I am trying it.

Regards,

Xia Qingran

--
夏清然
Xia Qingran
qingran.xia@gmail.com
Sent from Beijing, 11, China
On Fri, Oct 09, 2009 at 08:31:54PM +0800, Xia Qingran wrote:
> On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar.kilani@gmail.com> wrote:
> > Try this patch:
> >
> > http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
> >
> > It's a hack, but it works for us. I think you're probably spending
> > most of your query time planning, and this patch helps speed things up
> > 10x over here.
>
> Thanks!
> I am trying it.

We have a similar situation when using DSPAM with a PostgreSQL backend. In that case we used a function like the following to speed up the lookups. I do not know if it would be useful in your situation, but I thought I would post it for the group.

The original query was of the form:

SELECT uid, token, spam_hits, innocent_hits
FROM dspam_token_data
WHERE uid = 'xxx' AND token IN (...);

The faster version of the query in the current code is:

SELECT * FROM lookup_tokens(%d, '{...}');

where lookup_tokens is defined as follows:

create function lookup_tokens(integer, bigint[])
returns setof dspam_token_data
language plpgsql stable
as '
declare
    v_rec record;
begin
    for v_rec in
        select * from dspam_token_data
        where uid = $1
          and token in (select $2[i]
                        from generate_series(array_lower($2, 1),
                                             array_upper($2, 1)) s(i))
    loop
        return next v_rec;
    end loop;
    return;
end;';

Anyway, you may want to try a similar approach instead of the posted code change.

Regards,
Ken
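For concreteness, a call to that function would look like this (a sketch; the uid and token values are made up):

SELECT * FROM lookup_tokens(42, '{1001,1002,1003}'::bigint[]);

Because the array is expanded inside the function's IN (SELECT ...) subquery, the planner treats it as an ordinary subselect rather than estimating selectivity for each element of a huge literal IN list, which is what this thread identifies as the planning-time sink.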