Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date:
Msg-id: c262e889315625e0fc0d77ca78fe2eac@postgrespro.ru
In response to: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors  (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
On 11-09-2018 18:29, Fabien COELHO wrote:
> Hello Marina,
> 
>> Hmm, but we can say the same for serialization or deadlock errors that 
>> were not retried (the client test code itself could not run correctly 
>> or the SQL sent was somehow wrong, which is also the client's fault), 
>> can't we?
> 
> I think not.
> 
> If a client asks for something "legal", but some other client in
> parallel happens to make an incompatible change which results in a
> serialization or deadlock error, the clients are not responsible for
> the raised errors, it is just that they happen to ask for something
> incompatible at the same time. So there is no user error per se, but
> the server is reporting its (temporary) inability to process what was
> asked for. For these errors, retrying is fine. If the client was
> alone, there would be no such errors, you cannot deadlock with
> yourself. This is really an isolation issue linked to parallel
> execution.

You can get other errors that cannot happen with only one client if you 
use shell commands in meta commands:

starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
          1.636           0  BEGIN;
          1.497           0  \setshell var mkdir my_directory && echo 1
          0.007           0  \sleep 1 us
          1.465           0  \setshell var rmdir my_directory && echo 1
          1.622           0  END;
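For reference, the whole benchmark script can be read off the per-statement report; assuming the report lists every command, pgbench_meta_concurrent_error.sql would be:

```
-- Reconstructed from the per-statement report above;
-- the actual file may differ in comments or whitespace.
BEGIN;
\setshell var mkdir my_directory && echo 1
\sleep 1 us
\setshell var rmdir my_directory && echo 1
END;
```

With two clients, one client's mkdir can run between another client's mkdir and rmdir, so it fails with "File exists" and the \setshell meta-command reports a failure, as in the second run: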

starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of 
meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
          2.731           0  BEGIN;
          2.909           1  \setshell var mkdir my_directory && echo 1
          0.231           0  \sleep 1 us
          2.366           0  \setshell var rmdir my_directory && echo 1
          2.664           0  END;

Or if you use untrusted procedural languages in SQL expressions (see the 
file used in the attachments):

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not 
create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not 
create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
          1.566           2  DO $$

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
          1.893           0  DO $$
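The script pgbench_concurrent_error.sql itself is only in the attachments; a hypothetical sketch consistent with the reported PL/Perl error (the actual attached file may differ) could be:

```sql
-- Hypothetical reconstruction, not the actual attachment:
-- an untrusted PL/Perl block in which concurrent clients
-- race on creating and removing the same directory.
DO $$
    mkdir "my_directory"
        or die qq{could not create the directory "my_directory": $!};
    rmdir "my_directory"
        or die qq{could not remove the directory "my_directory": $!};
$$ LANGUAGE plperlu;
```

This requires the plperlu extension; when two clients interleave, the second mkdir finds the directory already present, and Perl's $! reports "File exists".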

Or if you try to create a function and perhaps replace an existing one:

starting vacuum...end.
client 0 got an error in command 0 (SQL) of script 0; ERROR:  duplicate 
key value violates unique constraint "pg_proc_proname_args_nsp_index"
DETAIL:  Key (proname, proargtypes, pronamespace)=(my_function, , 2200) 
already exists.

client 0 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR:  tuple 
concurrently updated

transaction type: pgbench_create_function.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failures: 10 (50.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 10 (50.000%)
maximum number of tries: 1
latency average = 82.881 ms (including failures)
tps = 12.065492 (including connections establishing)
tps = 12.092216 (excluding connections establishing)
statement latencies in milliseconds and failures:
         82.549          10  CREATE OR REPLACE FUNCTION my_function() 
RETURNS integer AS 'select 1;' LANGUAGE SQL;
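Here the whole script is visible in the report; pgbench_create_function.sql presumably consists of this single statement:

```sql
-- Reconstructed from the per-statement report above.
CREATE OR REPLACE FUNCTION my_function() RETURNS integer AS 'select 1;' LANGUAGE SQL;
```

Two clients running this concurrently race on the same pg_proc row: one can hit the unique-index violation while the function is first being created, and afterwards concurrent replacements of the same row raise "tuple concurrently updated".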

>> Why not handle client errors that may or may not occur in the same 
>> way? (For example, always abort the client, or conversely never 
>> abort in these cases.) Here's an example of such an error:
> 
>> client 5 got an error in command 1 (SQL) of script 0; ERROR:  division 
>> by zero
> 
> This is an interesting case. For me we must stop the script because
> the client is asking for something "stupid", and retrying the same
> won't change the outcome, the division will still be by zero. It is
> the client responsability not to ask for something stupid, the bench
> script is buggy, it should not submit illegal SQL queries. This is
> quite different from submitting something legal which happens to fail.
> ...
>>> I'm not sure that having "--debug" imply this option is useful:
>>> as there are two distinct options, the user could be allowed to
>>> trigger one or the other as they wish?
>> 
>> I'm not sure that the main debugging output will give a good clue of 
>> what's happened without full messages about errors, retries and 
>> failures...
> 
> I'm arguing more for letting the user decide what they want.
> 
>> These lines are quite long - do you suggest to wrap them this way?
> 
> Sure, if it is too long, then wrap.

Ok!

>>> Function getTransactionStatus name does not seem to correspond fully 
>>> to what the function does. There is a passthru case which should be 
>>> either avoided or clearly commented.
>> 
>> I don't quite understand you - do you mean that in fact this function 
>> finds out whether we are in a (failed) transaction block or not? Or do 
>> you mean that the case of PQTRANS_INTRANS is also ok?...
> 
> The former: although the function is named "getTransactionStatus", it
> does not really return the "status" of the transaction (aka
> PQstatus()?).

Thank you, I'll think about how to improve it. Perhaps the name 
checkTransactionStatus would be better...

>>> I'd insist in a comment that "cnt" does not include "skipped" 
>>> transactions
>>> (anymore).
>> 
>> If you mean CState.cnt I'm not sure if this is practically useful 
>> because the code uses only the sum of all client transactions 
>> including skipped and failed... Maybe we can rename this field to 
>> nxacts or total_cnt?
> 
> I'm fine with renaming the field if it makes things clearer. They are
> all counters, so naming them "cnt" or "total_cnt" does not help much.
> Maybe "succeeded" or "success" to show what is really counted?

Perhaps renaming StatsData.cnt is better than just adding a comment to 
this field. But IMO we have the same problem ("They are all counters, 
so naming them 'cnt' or 'total_cnt' does not help much.") for 
CState.cnt, which cannot be named in the same way because it also 
includes skipped and failed transactions.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
