Re: pgbench - allow to create partitioned tables
| От | Fabien COELHO | 
|---|---|
| Тема | Re: pgbench - allow to create partitioned tables | 
| Дата | |
| Msg-id | alpine.DEB.2.21.1909100818130.1895@lancre обсуждение исходный текст | 
| Ответ на | Re: pgbench - allow to create partitioned tables (Amit Kapila <amit.kapila16@gmail.com>) | 
| Ответы | Re: pgbench - allow to create partitioned tables Re: pgbench - allow to create partitioned tables | 
| Список | pgsql-hackers | 
Hello Amit,
Thanks for the feedback.
> + case 11: /* partitions */
> + initialization_option_set = true;
> + partitions = atoi(optarg);
> + if (partitions < 0)
> + {
> + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> + optarg);
> + exit(1);
> + }
> + break;
>
> Is there a reason why we treat "partitions = 0" as a valid value?
Yes. It is an explicit "do not create partitioned tables", which differ 
from 1 which says "create a partitionned table with just one partition".
> Also, shouldn't we keep some max limit for this parameter as well?
I do not think so. If someone wants to test how terrible it is to use 
100000 partitions, we should not prevent it.
> Forex. how realistic it will be if the user gives the value of
> partitions the same or greater than the number of rows in
> pgbench_accounts table?
Although I agree that it does not make much sense, for testing purposes 
why not, to test overheads in critical cases for instance.
> I understand it is not sensible to give such a value, but I guess the 
> API should behave sanely in such cases as well.
Yep, it should work.
> I am not sure what will be the good max value for it, but I
> think there should be one.
I disagree. Pgbench is a tool for testing performance for given 
parameters. If postgres accepts a parameter there is no reason why pgbench 
should reject it.
> @@ -3625,6 +3644,7 @@ initCreateTables(PGconn *con)
>  const char *bigcols; /* column decls if accountIDs are 64 bits */
>  int declare_fillfactor;
>  };
> +
>  static const struct ddlinfo DDLs[] = {
>
> Spurious line change.
Indeed.
> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
> +    "  --partition-method=(range|hash)\n"
> +    "                           partition account table with this
> method (default: range)\n"
>
> Refer complete table name like pgbench_accounts instead of just account. 
> It will be clear and in sync with what we display in some other options 
> like --skip-some-updates.
Ok.
> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
>
> /defaults/default.
Ok.
> I think we should print the information about partitions in
> printResults.  It can help users while analyzing results.
Hmmm. Why not, with some hocus-pocus to get the information out of 
pg_catalog, and trying to fail gracefully so that if pgbench is run 
against a no partitioning-support version.
> *
> +enum { PART_NONE, PART_RANGE, PART_HASH }
> + partition_method = PART_NONE;
> +
>
> I think it is better to follow the style of QueryMode enum by using
> typedef here, that will make look code in sync with nearby code.
Hmmm. Why not. This means inventing a used-once type name for 
partition_method. My great creativity lead to partition_method_t.
> *
> - int i;
>
>  fprintf(stderr, "creating tables...\n");
>
> - for (i = 0; i < lengthof(DDLs); i++)
> + for (int i = 0; i < lengthof(DDLs); i++)
>
> This is unnecessary change as far as this patch is concerned.  I
> understand there is no problem in writing either way, but let's not
> change the coding pattern here as part of this patch.
The reason I did that is that I had a stupid bug in a development version 
which was due to an accidental reuse of this index, which would have been 
prevented by this declaration style. Removed anyway.
> + if (partitions >= 1)
> + {
> + int64 part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
> + char ff[64];
> + ff[0] = '\0';
> + append_fillfactor(ff, sizeof(ff));
> +
> + fprintf(stderr, "creating %d partitions...\n", partitions);
> +
> + for (int p = 1; p <= partitions; p++)
> + {
> + char query[256];
> +
> + if (partition_method == PART_RANGE)
> + {
>
> part_size can be defined inside "if (partition_method == PART_RANGE)"
> as it is used here.
I just wanted to avoid recomputing the value in the loop, but indeed it 
may be computed needlessly. Moved.
> In general, this part of the code can use some comments.
Ok.
Attached an updated version.
-- 
Fabien.
		
	Вложения
В списке pgsql-hackers по дате отправления: