Re: Help with optimizing a query over hierarchical data

Поиск
Список
Период
Сортировка
От Damon Snyder
Тема Re: Help with optimizing a query over hierarchical data
Дата
Msg-id CACkQbuhbP=JguiAVMkv-gWKmj_BLqnVOMfTMddzh=0mLYvEG_w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Help with optimizing a query over hierarchical data  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Help with optimizing a query over hierarchical data  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-performance
Hi Claudio,
See my comments inline below.

Um... I think your problem is a misuse of CTE. Your CTE is building an
intermediate of several thousands of rows only to select a dozen
afterwards. You may want to consider a view or subquery, though I'm
not sure pg will be able to optimize much given your use of window
functions, which forces a materialization of that intermediate result.

The application requires that we find an element and it's neighbors within a sorted set at a given offset after filtering by category and status. In the examples provided, we need position 50000, 6 above, and 6 below. Is there a way do to that more efficiently without first determining the position of each element within the set using a window function? How would a subquery help?

The only solution I could come up with was to materialize the intermediate result with the CTE (since you don't know ahead of time how many objects match the status and category criteria) then use the window to include the position or index.

The only alternative that I can think of would be to materialize the elements of the set with an index on the lookup attributes and the attribute used to order them. That is, make what the CTE is doing materialized. Even in that case you will still need a window to determine the position after filtering but you won't have any joins.

I think you need to re-think your queries to be smarter about that.

As I mentioned above, we need the position and it's neighbors to support a feature. Do you have any suggestions as to how we might re-think them?

Thanks,
Damon



On Mon, Mar 3, 2014 at 1:52 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
Um... I think your problem is a misuse of CTE. Your CTE is building an
intermediate of several thousands of rows only to select a dozen
afterwards. You may want to consider a view or subquery, though I'm
not sure pg will be able to optimize much given your use of window
functions, which forces a materialization of that intermediate result.

I think you need to re-think your queries to be smarter about that.

On Mon, Mar 3, 2014 at 2:55 PM, Damon Snyder <damon@huddler-inc.com> wrote:
> Hi Claudio,
> Thanks for responding. Here is the explain (http://explain.depesz.com/s/W3W)
> for the ordering by meta container starting on line 192
> (https://gist.github.com/drsnyder/9277054#file-object-ordering-setup-sql-L192).
>
> Here is the explain (http://explain.depesz.com/s/d1O) for the ordering by
> score starting on line 192
> (https://gist.github.com/drsnyder/9277054#file-object-ordering-setup-sql-L216).
>
> Both of the explains were done with (ANALYZE, BUFFERS).
>
> Thanks for the suggestion regarding de-normalizing. I'll consider that
> approach for the score based query.
>
> I've also included the server config changes made from updates to
> postgresql.conf on the box that I'm testing on. See below.
>
> Thanks,
> Damon
>
>                                                   version
> --------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-3), 64-bit
> (1 row)
>
>              name             |  current_setting   |        source
> ------------------------------+--------------------+----------------------
>  application_name             | psql               | client
>  checkpoint_completion_target | 0.9                | configuration file
>  checkpoint_segments          | 16                 | configuration file
>  DateStyle                    | ISO, MDY           | configuration file
>  default_tablespace           | ssd2               | user
>  default_text_search_config   | pg_catalog.english | configuration file
>  effective_cache_size         | 5632MB             | configuration file
>  lc_messages                  | en_US.UTF-8        | configuration file
>  lc_monetary                  | en_US.UTF-8        | configuration file
>  lc_numeric                   | en_US.UTF-8        | configuration file
>  lc_time                      | en_US.UTF-8        | configuration file
>  listen_addresses             | *                  | configuration file
>  log_destination              | stderr             | configuration file
>  log_directory                | pg_log             | configuration file
>  log_filename                 | postgresql-%a.log  | configuration file
>  log_line_prefix              | %d %m %c %x:       | configuration file
>  log_min_duration_statement   | 500ms              | configuration file
>  log_min_error_statement      | error              | configuration file
>  log_min_messages             | error              | configuration file
>  log_rotation_age             | 1d                 | configuration file
>  log_rotation_size            | 0                  | configuration file
>  log_timezone                 | UTC                | configuration file
>  log_truncate_on_rotation     | on                 | configuration file
>  logging_collector            | on                 | configuration file
>  maintenance_work_mem         | 480MB              | configuration file
>  max_connections              | 80                 | configuration file
>  max_stack_depth              | 2MB                | environment variable
>  port                         | 5432               | command line
>  shared_buffers               | 1920MB             | configuration file
>  TimeZone                     | UTC                | configuration file
>  wal_buffers                  | 16MB               | configuration file
>  work_mem                     | 8MB                | configuration file
> (32 rows)
>
>
>
> On Sat, Mar 1, 2014 at 5:02 PM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Fri, Feb 28, 2014 at 5:01 PM, Damon Snyder <damon@huddler-inc.com>
>> wrote:
>> > The primary query that I'm trying to optimize executes in about 1600ms
>> > on my
>> > laptop and about 800ms on production-like hardware (more for the score
>> > version). My target is to get the data fetch down below 100ms if
>> > possible.
>>
>> Could you post some explain analyze of those particular queries?
>>
>> > If you have any suggestions it would be greatly appreciated. Am I
>> > missing
>> > something obvious? Is there a logically equivalent alternative that
>> > would be
>> > more efficient?
>>
>> I'd suggest de-normalizing a bit. For instance, why don't you put the
>> score right into the object? I'm sure the indirection is hurting.
>
>

В списке pgsql-performance по дате отправления:

Предыдущее
От: Venkata Balaji Nagothi
Дата:
Сообщение: Re: Query taking long time
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Help with optimizing a query over hierarchical data