Обсуждение: Very slow select
I defined a table with two indexed text columns, a and b, of variable length. This table contains 287195 rows. SELECT FROM t WHERE a='my string'; works very fast. The same select made on b is very slow (2 or 3 seconds), as if no indexexists for column b. I'd like to know why. Note that b contains very long strings, more than 500 chars. Is there a limit in indexing? Thanks. Stefano -- Dott. Stefano Bargioni Biblioteca della Pontificia Universita' della Santa Croce - Roma <mailto:bargioni@usc.urbe.it> <http://www.usc.urbe.it> Personal web page: <http://www.usc.urbe.it/html/php.script?bargioni> --- "Si apud bibliothecam hortulum habes, nihil deerit" (Cicerone) ---
Stefano Bargioni <bargioni@usc.urbe.it> writes: > SELECT FROM t WHERE a='my string'; works very fast. The same select > made on b is very slow (2 or 3 seconds), as if no index exists for > column b. What does EXPLAIN say about the two queries? regards, tom lane
> Stefano Bargioni writes: > > SELECT FROM t WHERE a='my string'; works very fast. The same select > > made on b is very slow (2 or 3 seconds), as if no index exists for > > column b. > > What does EXPLAIN say about the two queries? > > regards, tom lane > Dear Tom, thanks for your help. Here is the explain results. As you note, the second select is not index driven. This is why it is slow. Are indexes built correctly / used / when fieldscontain very large text strings? Can an hash index be useful? Bye (Merry 2K Xmas!). Stefano ucc2=> explain select count(*) from document where xtitle='a'; NOTICE: QUERY PLAN: Aggregate (cost=70.61 rows=759 width=4) -> Index Scan using xtitle_index on document (cost=70.61 rows=759 width=4) EXPLAIN ucc2=> explain select count(*) from document where xothertitles='a'; NOTICE: QUERY PLAN: Aggregate (cost=26005.44 rows=287195 width=4) -> Seq Scan on document (cost=26005.44 rows=287195 width=4) EXPLAIN ucc2=> \d document Table = document +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | id | int4 not null default nextval('" | 4 | | title | text | var | | xtitle | text | var | | othertitles | text | var | | xothertitles | text | var | | author | text | var | | xauthor | text | var | | otherauthors | text | var | | xotherauthors | text | var | | subject | text | var | | xsubject | text | var | | publisher | text | var | | xpublisher | text | var | | collection | text | var | | xcollection | text | var | | yearpub | int4 | 4 | | xyearpub | int4 | 4 | | locationpub | text | var | | xlocationpub | text | var | | languagecode | text | var | | classification | text | var | | digest | text | var | +----------------------------------+----------------------------------+-------+ Indices: digest_document_index document_id_key xauthor_index xcollection_index xotherauthors_index xothertitles_index xpublisher_index xsubject_index xtitle_index indexes where created with create index xothertitles_index on document (xothertitles); create index xtitle_index on document (xtitle); -- Dott. Stefano Bargioni Biblioteca della Pontificia Universita' della Santa Croce - Roma <mailto:bargioni@usc.urbe.it> <http://www.usc.urbe.it> Personal web page: <http://www.usc.urbe.it/html/php.script?bargioni> --- "Si apud bibliothecam hortulum habes, nihil deerit" (Cicerone) ---
Stefano Bargioni <bargioni@usc.urbe.it> writes: >> What does EXPLAIN say about the two queries? > ucc2=> explain select count(*) from document where xtitle='a'; > NOTICE: QUERY PLAN: > Aggregate (cost=70.61 rows=759 width=4) > -> Index Scan using xtitle_index on document (cost=70.61 rows=759 width=4) > EXPLAIN > ucc2=> explain select count(*) from document where xothertitles='a'; > NOTICE: QUERY PLAN: > Aggregate (cost=26005.44 rows=287195 width=4) > -> Seq Scan on document (cost=26005.44 rows=287195 width=4) The reason for the difference in plan is the difference in estimated number of rows retrieved. You probably have some extremely common value in xothertitles, which is skewing the statistics that the planner uses for this estimate. Sometimes, what's skewing the stats is a value that's actually just a placeholder for "unknown", such as an empty string or 'UNKNOWN' or something like that. If possible, I recommend coding your application to use NULL for this purpose, rather than any ordinary data value. The stats account for NULLs separately, so the system won't be fooled by having a lot of them. regards, tom lane