Re: Query execution in Perl TAP tests needs work

From Andrew Dunstan
Subject Re: Query execution in Perl TAP tests needs work
Date
Msg-id 4f3038c7-2b37-2625-c4c6-ebf7cbcb076d@dunslane.net
In reply to Query execution in Perl TAP tests needs work  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Query execution in Perl TAP tests needs work  (Thomas Munro <thomas.munro@gmail.com>)
Re: Query execution in Perl TAP tests needs work  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers


On 2023-08-28 Mon 01:29, Thomas Munro wrote:
Hi,

Every time we run a SQL query, we fork a new psql process and a new
cold backend process.  It's not free on Unix, and quite a lot worse on
Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
example.  It runs 272 subtests firing off a stream of queries, and
completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
CI.

Here are some timestamps I captured from CI by instrumenting various
Perl and C bits:

0.000s: IPC::Run starts
0.023s:   postmaster socket sees connection
0.025s:   postmaster has created child process
0.033s:     backend starts running main()
0.039s:     backend has reattached to shared memory
0.043s:     backend connection authorized message
0.046s:     backend has executed and logged query
0.070s: IPC::Run returns

I expected process creation to be slow on that OS, but it seems like
something happening at the end is even slower.  CI shows Windows
consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
that finishes in 2-3 minutes everywhere else with the same number of
CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
nearby?  It seems like there must be either a busy loop or a busted
sleep/wakeup... somewhere?  But even if there's a weird bug here
waiting to be discovered and fixed, I guess it'll always be too slow
at ~10ms per process spawned, with two processes to spawn, and it's
bad enough on Unix.

As an experiment, I hacked up a not-good-enough-to-share experiment
where $node->safe_psql() would automatically cache a BackgroundPsql
object and reuse it, and the times for that test dropped ~51 -> ~9s on
Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
(well it's certainly non-ideal the way I hacked it up anyway...).  I
suppose there are quite a few ways we could do better:

1.  Don't fork anything at all: open (and cache) a connection directly
from Perl.
1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
popular Perl xsub library?
1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
of some existing one.
2.  Use long-lived psql sessions.
2a.  Something building on BackgroundPsql.
2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
protocol that is more fun to talk to from Perl/machines?

In some other languages one can do FFI pretty easily so we could use
the in-tree libpq without extra dependencies:

import ctypes
libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
libpq.PQlibVersion()
170000

... but it seems you can't do either static C bindings or runtime FFI
from Perl without adding a new library/package dependency.  I'm not
much of a Perl hacker so I don't have any particular feeling.  What
would be best?

This message brought to you by the Lorax.

Thanks for raising this. Windows test times have bothered me for ages.

The standard Perl DBI library has a connect_cached method. Of course, we don't want to depend on it, especially since we might have changed libpq in what we're testing, and it would place a substantial new burden on testers such as buildfarm owners.
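
The caching idea itself doesn't need DBI. As a rough illustration only (in Python, like the ctypes snippet quoted above), here is a minimal sketch of a cache that hands back an existing connection for the same connection string while it is still usable. ConnectionCache, connect and is_alive are hypothetical names invented for this sketch, not anything that exists in the tree or in DBI.

```python
class ConnectionCache:
    """Hypothetical sketch: reuse one handle per connection string."""

    def __init__(self, connect, is_alive):
        # connect(connstr) opens a new handle; is_alive(handle) checks it.
        # Both are supplied by the caller, so nothing here depends on a
        # particular client library.
        self._connect = connect
        self._is_alive = is_alive
        self._handles = {}

    def get(self, connstr):
        handle = self._handles.get(connstr)
        # Open a fresh connection only if none is cached or the cached
        # one has gone away (e.g. the node was restarted).
        if handle is None or not self._is_alive(handle):
            handle = self._connect(connstr)
            self._handles[connstr] = handle
        return handle
```

The liveness check matters for tests that stop and restart a node, since a cached handle would otherwise outlive the backend it was connected to.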

I like the idea of using a pure-Perl pq implementation, not least because it could expand our ability to test things at the protocol level. I'm not sure how much work it would be, but I'm willing to help if we want to go that way.
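
To give a feel for what talking the protocol directly involves, here is a minimal sketch (again in Python rather than Perl, purely for illustration) that encodes a version 3.0 StartupMessage and decodes the header of a backend message. The function names are made up for this example, and a real client would also need authentication, query messages and error handling.

```python
import struct

PROTOCOL_V3 = 3 << 16  # protocol version 3.0, i.e. 196608


def build_startup_message(params):
    """Encode a StartupMessage: int32 length, int32 version,
    NUL-terminated key/value strings, then a final NUL."""
    body = struct.pack("!I", PROTOCOL_V3)
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"
    # The length field counts its own 4 bytes plus the body.
    return struct.pack("!I", len(body) + 4) + body


def parse_message_header(data):
    """Split a backend message header into (type character, length).
    The length covers itself and the payload, but not the type byte."""
    msg_type, length = struct.unpack("!cI", data[:5])
    return msg_type.decode(), length
```

Being able to build messages byte by byte like this is what would make it possible to send deliberately malformed or unusual traffic in a test, which psql and libpq won't let us do.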

Yes, you need an external library to use FFI in Perl, but there's one that's pretty tiny. See <https://metacpan.org/pod/FFI::Library>. There is also FFI::Platypus, but it involves building a library. OTOH, that's the one that's available as standard on my Fedora and Ubuntu systems. I haven't tried using either. Maybe we could use some logic that would use the FFI interface if it's available, and fall back on current usage.
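
The fallback logic could look something like the following sketch. It is written in Python with ctypes only because that is what the quoted example above uses; the Perl version would probe for an FFI module instead. load_libpq, choose_backend and run_query_via_psql are hypothetical names, and the psql invocation is just an approximation of what the tests do today.

```python
import ctypes
import subprocess


def load_libpq(path):
    """Return a ctypes handle for libpq, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def choose_backend(libpq_path):
    """Prefer the in-process FFI route; otherwise keep spawning psql."""
    lib = load_libpq(libpq_path)
    if lib is not None:
        return ("ffi", lib)
    return ("psql", None)


def run_query_via_psql(connstr, sql):
    """Current-style fallback: one psql process per query."""
    result = subprocess.run(
        ["psql", "-XAtq", "-d", connstr, "-c", sql],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")
```

With this shape, machines without the FFI module keep working exactly as now, and only those that have it get the faster path.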


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
