Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From: Antonin Houska
Subject: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Date:
Msg-id: 1960.1657781484@antos.home
In response to: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity  (Julien Rouhaud <rjuju123@gmail.com>)
Responses: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity  (Julien Rouhaud <rjuju123@gmail.com>)
List: pgsql-hackers
Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud <rjuju123@gmail.com> wrote:

> On Wed, Jun 22, 2022 at 11:05:54PM +0000, Imseih (AWS), Sami wrote:
> > >    Can you describe how it's kept in sync, and how it makes sure that the property
> > >    is maintained over restart / gc?  I don't see any change in the code for the
> > >    2nd part so I don't see how it could work (and after a quick test it indeed
> > >    doesn't).
> >
> > There is a routine called qtext_hash_sync which removes all entries from
> > the qtext_hash and reloads it with all the query ids from pgss_hash.
> > [...]
> > At all the points where it's called, an exclusive lock is held. This allows
> > for syncing all the present queryids in pgss_hash with qtext_hash.
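
To check my own understanding as a would-be reviewer: if the description above
is accurate, the routine amounts to something like the sketch below. This is
only my reconstruction, not the actual patch; the qtextEntry layout is a guess
based on this thread, while pgssEntry, pgss_hash and the dynahash calls are the
usual ones from pg_stat_statements.c and utils/hsearch.h.

#include "postgres.h"
#include "utils/hsearch.h"

/* Guessed layout of a mapping entry: queryid -> location of the text. */
typedef struct qtextEntry
{
    uint64      queryid;        /* hash key */
    Size        query_offset;   /* offset of the text in the external file */
    int         query_len;      /* length of the text, not including '\0' */
} qtextEntry;

/*
 * Sketch only: drop the whole mapping and rebuild it from pgss_hash.
 * Assumes the caller holds pgss->lock exclusively, as described above.
 * pgssEntry comes from pg_stat_statements.c.
 */
static void
qtext_hash_sync(HTAB *qtext_hash, HTAB *pgss_hash)
{
    HASH_SEQ_STATUS hash_seq;
    pgssEntry  *entry;
    qtextEntry *qentry;
    bool        found;

    /* Remove every existing entry from the mapping hash ... */
    hash_seq_init(&hash_seq, qtext_hash);
    while ((qentry = (qtextEntry *) hash_seq_search(&hash_seq)) != NULL)
        hash_search(qtext_hash, &qentry->queryid, HASH_REMOVE, NULL);

    /* ... and re-insert one entry per distinct queryid found in pgss_hash. */
    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = (pgssEntry *) hash_seq_search(&hash_seq)) != NULL)
    {
        qentry = (qtextEntry *) hash_search(qtext_hash, &entry->key.queryid,
                                            HASH_ENTER, &found);
        if (!found)
        {
            /* first occurrence of this queryid wins */
            qentry->query_offset = entry->query_offset;
            qentry->query_len = entry->query_len;
        }
    }
}

If that is roughly right, the full drop-and-rebuild under the exclusive lock is
also what makes the eviction-time cost discussed further down a concern.
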
>
> So your approach is to let the current gc / file loading behavior happen as
> before and construct your mapping hash using the resulting query text / offset
> info.  That can't work.
>
> > >    2nd part so I don't see how it could work (and after a quick test it indeed
> > >    doesn't).
> >
> > Can you tell me what test you used to determine it is not in sync?
>
> What test did you use to determine it is in sync?  Have you checked how the gc/
> file loading actually work?
>
> In my case I just checked the size of the query text file after running some
> script that makes sure that there are the same few queries ran by multiple
> different roles, then:
>
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
> pg_ctl restart
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B
>
> > >    Can you share more details on the benchmarks you did?  Did you also run
> > >    benchmark on workloads that induce entry eviction, with and without need for
> > >    gc?  Eviction in pgss is already very expensive, and making things worse just
> > >    to save a bit of disk space doesn't seem like a good compromise.
> >
> > Sorry, this was poorly explained by me. I went back and did some benchmarks. Attached are
> > the script and results, but here is a summary.
> > On an EC2 r5.2xlarge, the benchmark I performed is:
> > 1. create 10k tables
> > 2. create 5 users
> > 3. run a pgbench script that, per transaction, performs a select on
> > a randomly chosen table for each of the 5 users
> > 4. run 2 variants of the test: one with the default pg_stat_statements.max = 5000
> > and one with a larger pg_stat_statements.max = 10000.
>
> But you wrote:
>
> > ##################################
> > ## pg_stat_statements.max = 15000
> > ##################################
>
> So which one is it?
>
> >
> > So 10-15% is not accurate. I originally tested on a less powerful machine. For this
> > benchmark I see a 6% increase in TPS (732k vs 683k) when a larger
> > pg_stat_statements.max is used and there are fewer gc/deallocations.
> > Both tests show a drop in gc/deallocations and a net increase
> > in TPS. Less GC makes sense since the external file has fewer duplicate SQLs.
>
> On the other hand you're rebuilding the new query_offset hashtable every time
> there's an entry eviction, which seems quite expensive.
>
> Also, as I mentioned you didn't change any of the heuristics for
> need_gc_qtexts().  So if the query texts are indeed deduplicated, doesn't it
> mean that gc will artificially be called less often?  The wanted target of
> "50% bloat" will become "50% bloat assuming no deduplication is done" and the
> average query text file size will stay the same whether the query texts are
> deduplicated or not.
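
For readers following along: the existing heuristic is roughly the following
(paraphrased from memory, so check need_gc_qtexts() in
contrib/pg_stat_statements/pg_stat_statements.c for the exact conditions).  It
should make clear why leaving it untouched while deduplicating the texts simply
moves the point at which gc triggers.

/*
 * Rough paraphrase of the need_gc_qtexts() logic, for illustration only;
 * not a verbatim copy of the real function.
 */
static bool
need_gc_qtexts_sketch(Size extent, Size mean_query_len, int pgss_max)
{
    /* Ignore small files: allow some minimal space per possible entry. */
    if (extent < 512 * (Size) pgss_max)
        return false;

    /*
     * The "50% bloat" target: gc runs only once the file exceeds about
     * twice the space that pgss_max texts of mean length would need.
     * With deduplicated texts the file holds fewer than pgss_max copies,
     * so this threshold is reached later and gc runs less often even
     * though the heuristic itself is unchanged.
     */
    if (extent < mean_query_len * (Size) pgss_max * 2)
        return false;

    return true;
}
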
>
> I'm wondering whether the improvements you see are due to the patch or simply
> due to artificially calling gc less often.  What are the results if instead of
> using vanilla pg_stat_statements you patch it to perform roughly the same
> number of gc passes as your version does?
>
> Also your benchmark workload is very friendly with your feature, what are the
> results with other workloads?  Having the results with query texts that aren't
> artificially long would be interesting for instance, after fixing the problems
> mentioned previously.
>
> Also, you said that if you run that benchmark with a single user you don't see
> any regression.  I don't see how rebuilding an extra hashtable in
> entry_dealloc(), i.e. while holding an exclusive lwlock, without saving any
> other work elsewhere, could be free.
>
> Looking at the script, you have:
> echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \
>
> Is that really necessary?  Couldn't you upgrade the gc message to a higher
> level for your benchmarking needs, or expose some new counter in
> pg_stat_statements_info maybe?  Have you done the benchmark using a debug
> build or a normal build?
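
A counter would indeed be cheap to add.  A placeholder sketch of what that
could look like follows; the function name and the gc_passes field are made up
for this sketch rather than existing identifiers in pg_stat_statements.c, while
the spinlock pattern mirrors how the existing dealloc counter is maintained.

/*
 * Placeholder sketch, meant to live in pg_stat_statements.c: bump a
 * gc-pass counter in shared memory so that a benchmark can read it via
 * SQL instead of scraping DEBUG1 log output.  gc_passes is a hypothetical
 * field next to the existing dealloc counter; the value would then be
 * returned as an extra column by pg_stat_statements_info().
 */
static void
pgss_count_gc_pass(void)
{
    volatile pgssSharedState *s = (volatile pgssSharedState *) pgss;

    SpinLockAcquire(&s->mutex);
    s->stats.gc_passes++;       /* invented field, for illustration */
    SpinLockRelease(&s->mutex);
}
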
>
>

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


