Обсуждение: sql indexing suggestions needed
i'm going crazy trying to optimize this select. The table has ~25 columns, the select is based on 10. There are approx 5 million records in the table and growing. No matter how I index + analyze this table, including making an index of every related column on the search, pg keeps doing a sequential scan and never includes an index -- which takes ~2minutes to do. I really need to cut this down. SELECT * FROM table_a WHERE ( bool_a = False ) AND ( bool_b= False ) AND ( int_c IS NOT NULL ) AND ( int_c <= 10 ) AND ( bool_d = False ) AND ( bool_e= True ) AND ( timestamp_f IS NULL ) AND ( bool_g IS False ) AND ( int_h= 1 ) AND ( bool_i = False ) ORDER BY id ASC LIMIT 100 can anyone suggest an indexing approach that might get pg to use the indexes ? this is driving me crazy. thanks.
I wonder if converting all of your bools to a bit string and storing that string as a number would make things faster? Benjamin On Mar 20, 2007, at 11:54 AM, Jonathan Vanasco wrote: > i'm going crazy trying to optimize this select. > > The table has ~25 columns, the select is based on 10. There are > approx 5 million records in the table and growing. > > No matter how I index + analyze this table, including making an > index of every related column on the search, pg keeps doing a > sequential scan and never includes an index -- which takes > ~2minutes to do. I really need to cut this down. > > SELECT > * > FROM > table_a > WHERE > ( bool_a = False ) > AND > ( bool_b= False ) > AND > ( int_c IS NOT NULL ) > AND > ( int_c <= 10 ) > AND > ( bool_d = False ) > AND > ( bool_e= True ) > AND > ( timestamp_f IS NULL ) > AND > ( bool_g IS False ) > AND > ( int_h= 1 ) > AND > ( bool_i = False ) > ORDER BY > id ASC > LIMIT 100 > > can anyone suggest an indexing approach that might get pg to use > the indexes ? this is driving me crazy. > > thanks. > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org/ >
On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote: > No matter how I index + analyze this table, including making an > index of every related column on the search, pg keeps doing a > sequential scan and never includes an index -- which takes > ~2minutes to do. I really need to cut this down. a friend just chimed in off-list -- i had missed putting the 'id' column in the large index of all the used columns , a common mistake. works like a charm now.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/20/07 13:54, Jonathan Vanasco wrote: > i'm going crazy trying to optimize this select. > > The table has ~25 columns, the select is based on 10. There are approx > 5 million records in the table and growing. > > No matter how I index + analyze this table, including making an index of > every related column on the search, pg keeps doing a sequential scan and > never includes an index -- which takes ~2minutes to do. I really need > to cut this down. > > SELECT > * > FROM > table_a > WHERE > ( bool_a = False ) > AND > ( bool_b= False ) > AND > ( int_c IS NOT NULL ) If it's less than 10, it can't be NULL. No need for this predicate. > AND > ( int_c <= 10 ) > AND > ( bool_d = False ) > AND > ( bool_e= True ) > AND > ( timestamp_f IS NULL ) > AND > ( bool_g IS False ) > AND > ( int_h= 1 ) > AND > ( bool_i = False ) > ORDER BY > id ASC > LIMIT 100 > > can anyone suggest an indexing approach that might get pg to use the > indexes ? this is driving me crazy. The problem is that the bool columns only have 2 values, and so it's more efficient to scan the whole table than to use indexes. How many *distinct* values are there in int_c? What percentage of them match "int_c <= 10"? Same questions, but for int_h. If int_h is relatively unique, then this index might help: (INT_H, INT_C) If that helps, try (INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I) HTH. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGADT0S9HxQb37XmcRAubuAJ0do/zu0vkaw5XzVQyPeJnFB2cJtwCeMCna cH3p6UGwqes8ZbAc5QfE1ok= =pPl0 -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/20/07 14:11, Jonathan Vanasco wrote: > > On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote: > >> No matter how I index + analyze this table, including making an index >> of every related column on the search, pg keeps doing a sequential >> scan and never includes an index -- which takes ~2minutes to do. I >> really need to cut this down. > > a friend just chimed in off-list -- i had missed putting the 'id' column > in the large index of all the used columns , a common mistake. works > like a charm now. If the WHERE clause filters out most of the records, having ID in the index shouldn't be needed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGADVyS9HxQb37XmcRAjJZAKCX1VmuSVdS9VfK6mXrYNl7mVATjQCgpWef MVFm2rH8+paCqnzyw9NyKZk= =kbvB -----END PGP SIGNATURE-----
On Mar 20, 2007, at 3:24 PM, Ron Johnson wrote: > How many *distinct* values are there in int_c? What percentage of > them match "int_c <= 10"? right now there are 14, and 80% match -- but next week that number will be 20 , 60% ... and so on > Same questions, but for int_h. same answer > If int_h is relatively unique, then this index might help: > (INT_H, INT_C) If that helps, try > (INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I) i'll try both, thanks. the current 'super index' works - i'm more worried for speed right now than disk space , so its worth physical overhead.
On Mar 20, 2:54 pm, postg...@2xlp.com (Jonathan Vanasco) wrote: > can anyone suggest an indexing approach that might get pg to use the > indexes ? this is driving me crazy. Have you tried an expression-based index? http://www.postgresql.org/docs/8.1/interactive/indexes-expressional.html I'm assuming the right side of each equals sign is constant for you. Basically, you could create an expression-based index on all these conditions, which would evaluate to true or false for every row in your table. Caveat: If you have heavy updates/inserts this might be a bad idea. Also, the planner may still decide to seq scan, based on your configuration and resultset size. Mark