both force order of evaluation and hit index

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема both force order of evaluation and hit index
Дата
Msg-id 20190410190326.GA31889@telsasoft.com
обсуждение исходный текст
Список pgsql-general
I wrote a script to import CSV query logs as described here:
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

I also created indices for searching:
|CREATE INDEX ON postgres_log_2019_04_09_2300 ((regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float))
WHEREmessage~'^duration:'
 

I'm having an issue that queries on long duration 1) need to specify "WHERE
message~'^duration'" to hit the index; 2) need to *evaluate* that condition
before the next condition:
|regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float))>9999

I've tried using CASE and OFFSET 0 and WITH, it seems like any effort I make to
force evaluation of "duration>9999" to happen before "message~'^duration'" also
causes a seq scan rather than index scan on query duration.

I'm hesitating to use ORDER BY just to encourage index scan, since it doesn't
even guarantee that it's going to avoid the error.

This is doing index scan on other/"wrong" index:
|postgres=# explain SELECT date_trunc('minute',log_time)log_time, error_severity, session_id, user_name, database,
left(message,99),left(query,99)
 
|FROM postgres_log_2019_04_10_1000 WHERE log_time>'2019-04-10 10:51' AND 
|CASE WHEN message~'^duration:' THEN regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float ELSE NULL END >
9999AND
 
|(application_name!='psql' OR user_name='postgres') ORDER BY 1 DESC;

| Sort  (cost=74908.17..74955.87 rows=19080 width=101)
|   Sort Key: (date_trunc('minute'::text, log_time)) DESC
|   ->  Bitmap Heap Scan on postgres_log_2019_04_10_1000  (cost=22.24..73551.61 rows=19080 width=101)
|         Recheck Cond: (log_time > '2019-04-10 10:51:00-04'::timestamp with time zone)
|         Filter: (((application_name <> 'psql'::text) OR (user_name = 'postgres'::text)) AND (CASE WHEN (message ~
'^duration:'::text)THEN (regexp_replace(message, '^duration: ([.0-9]+) ms.*'::text, '\1'::text))::double precision ELSE
NULL::doubleprecision END > '9999'::double precision))
 
|         ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_log_time_idx  (cost=0.00..17.47 rows=58197 width=0)
|               Index Cond: (log_time > '2019-04-10 10:51:00-04'::timestamp with time zone)

This gives a nice plan, but doesn't avoid errors due to casting to float lines
which don't match the duration regex.

postgres=# explain SELECT date_trunc('minute',log_time)log_time, dur, error_severity, session_id, user_name, database,
left(message,99),left(query,99)
 
FROM (SELECT *,regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float AS dur FROM
postgres_log_2019_04_10_1000WHERE log_time>'2019-04-10 10:50' AND message~'^duration:')x WHERE
 
dur>9999 AND (application_name!='psql' OR user_name='postgres') ;

| Bitmap Heap Scan on postgres_log_2019_04_10_1000  (cost=5214.86..50141.68 rows=19742 width=109)
|   Recheck Cond: ((log_time > '2019-04-10 10:50:00-04'::timestamp with time zone) AND ((regexp_replace(message,
'^duration:([.0-9]+) ms.*'::text, '\1'::text))::double precision > '9999'::double precision) AND (message ~
'^duration:'::text))
|   Filter: ((application_name <> 'psql'::text) OR (user_name = 'postgres'::text))
|   ->  BitmapAnd  (cost=5214.86..5214.86 rows=19819 width=0)
|         ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_log_time_idx  (cost=0.00..17.50 rows=59462 width=0)
|               Index Cond: (log_time > '2019-04-10 10:50:00-04'::timestamp with time zone)
|         ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_duration_idx  (cost=0.00..5187.23 rows=238241 width=0)
|               Index Cond: ((regexp_replace(message, '^duration: ([.0-9]+) ms.*'::text, '\1'::text))::double precision
>'9999'::double precision)
 
|(8 rows)

|ERROR:  invalid input syntax for type double precision: "temporary file: path
"pg_tblspc/3292386340/PG_11_201809051/pgsql_tmp/pgsql_tmp16493.0",size 603619328"
 

If I write as CTE or OFFSET 0 subq, that doesn't allow scanning on "duration"
index, and scans on time index instead:

|postgres=# explain analyze WITH x AS (SELECT *,regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float AS
durFROM postgres_log_2019_04_10_1000 WHERE log_time>'2019-04-10 10:50' AND message~'^duration:') SELECT
date_trunc('minute',log_time)log_time,dur, error_severity, session_id, user_name, database, left(message,99),
left(query,99)
FROM x WHERE dur>9999 AND (application_name!='psql' OR user_name='postgres') ;

|postgres=# explain analyze SELECT date_trunc('minute',log_time)log_time, dur, error_severity, session_id, user_name,
database,left(message,99), left(query,99)
 
FROM (SELECT *,regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float AS dur FROM
postgres_log_2019_04_10_1000WHERE log_time>'2019-04-10 10:50' AND message~'^duration:' OFFSET 0)x WHERE
 
dur>9999 AND (application_name!='psql' OR user_name='postgres') ;

Thanks in advance for any advice.

Justin



В списке pgsql-general по дате отправления:

Предыдущее
От: Michel Pelletier
Дата:
Сообщение: Re: How to use full-text search URL parser to filter query results bydomain name?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_indexes doesn't show indexes for partitioned tables - bug or intended?