Discussion: typical active table count?

typical active table count?

From: Jeremy Schneider
Date:
Question for other PostgreSQL users

On your moderately busy DB, how many different tables might receive at least one change/DML during a 10 second window?

10? 50? 100? More? Ballpark guess off the top of your head.

I'm in a discussion & there are questions about whether it's unusual to have more than 10 or so. The answer isn't clear to me.

Probably worthwhile to call out partitioning explicitly (ie. if this factors into an answer then mention that fact)

-Jeremy

--

Re: typical active table count?

From: Ben Chobot
Date:
We certainly have databases where far more than 100 tables are updated within a 10 second period. Is there a specific concern you have?

Jeremy Schneider wrote on 6/27/23 9:01 AM:
> Question for other PostgreSQL users
>
> On your moderately busy DB, how many different tables might receive at least one change/DML during a 10 second window?
>
> 10? 50? 100? More? Ballpark guess off the top of your head.
>
> I'm in a discussion & there are questions about whether it's unusual to have more than 10 or so. The answer isn't clear to me.
>
> Probably worthwhile to call out partitioning explicitly (ie. if this factors into an answer then mention that fact)
>
> -Jeremy

--

Re: typical active table count?

From: Jeremy Schneider
Date:
On 6/27/23 9:32 AM, Ben Chobot wrote:
> We certainly have databases where far more than 100 tables are updated
> within a 10 second period. Is there a specific concern you have?
> 

Thanks Ben, not a concern but I'm trying to better understand how common
this might be. And I think sharing general statistics about how people
use PostgreSQL is a great help to the developers who build and maintain it.

One really nice thing about PostgreSQL is that two quick copies of
pg_stat_all_tables and you can easily see this sort of info.
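
For example, something along these lines should work (dml_snap is just an
illustrative name; run each statement separately with autocommit so the
second read of pg_stat_all_tables isn't served from the same stats
snapshot, and note the counters are flushed with a small delay, so very
short windows are approximate):

create temp table dml_snap as
  select relid, n_tup_ins + n_tup_upd + n_tup_del as dml
  from pg_stat_all_tables;

-- wait out the measurement window
select pg_sleep(10);

-- tables whose insert/update/delete counters moved during the window
select count(*) as active_tables
from pg_stat_all_tables s
join dml_snap b on b.relid = s.relid
where s.n_tup_ins + s.n_tup_upd + s.n_tup_del > b.dml;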

If you have a database where more than 100 tables are updated within a
10 second period - this seems really uncommon to me - I'm very curious
about the workload.

For example:

1) Is the overall total number of tables for this database in the
thousands, 10s of thousands or 100s of thousands?

2) How many CPUs or cores does the server have?

3) Are you using partitions and counting each one? What's the number if
you count each partitioned table as a single table? (A rough rollup
query is sketched after this list.)

4) Would you characterize this database as SaaS, ie. many copies of a
similar schema?  Or is it one very large schema of many different tables?
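
For question 3, here's a rough sketch of one way to roll partitions up
to their parent with pg_inherits, reusing the dml_snap table from above
(it only handles a single level of partitioning; nested partitions would
need a recursive query):

-- count "logical" tables: each partition is folded into its parent
select count(distinct coalesce(i.inhparent, s.relid)) as active_logical_tables
from pg_stat_all_tables s
join dml_snap b on b.relid = s.relid
left join pg_inherits i on i.inhrelid = s.relid
where s.n_tup_ins + s.n_tup_upd + s.n_tup_del > b.dml;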

-Jeremy


-- 
http://about.me/jeremy_schneider




Re: typical active table count?

From: Ron
Date:
On 6/27/23 13:47, Jeremy Schneider wrote:
> On 6/27/23 9:32 AM, Ben Chobot wrote:
>> We certainly have databases where far more than 100 tables are updated
>> within a 10 second period. Is there a specific concern you have?
>
> Thanks Ben, not a concern but I'm trying to better understand how common
> this might be. And I think sharing general statistics about how people
> use PostgreSQL is a great help to the developers who build and maintain it.
>
> One really nice thing about PostgreSQL is that two quick copies of
> pg_stat_all_tables and you can easily see this sort of info.
>
> If you have a database where more than 100 tables are updated within a
> 10 second period - this seems really uncommon to me - I'm very curious
> about the workload.

100 table updates just means a possibly complicated schema, not necessarily high volume.

More important is the number of tables updated in a single transaction.  Are you updating (really modifying: inserts, updates, deletes) lots of rows in all 100 tables in a single transaction, or are multiple users performing one of 20 separate transactions, each modifying 5 tables?  Because that makes a huge difference.

And honestly, 100 tables in 10 seconds is 10 tables/second.  If each gets one insert, that's a laughably slow transaction rate.  (Unless of course there's 85 indices per table, and foreign keys don't have supporting indices.)

> For example:
>
> 1) Is the overall total number of tables for this database in the
> thousands, 10s of thousands or 100s of thousands?
>
> 2) How many CPUs or cores does the server have?
>
> 3) Are you using partitions and counting each one? What's the number if
> you count each partitioned table as a single table?
>
> 4) Would you characterize this database as SaaS, ie. many copies of a
> similar schema?

Why not multiple databases with the same definition?

>   Or is it one very large schema of many different tables?


--
Born in Arizona, moved to Babylonia.

Re: typical active table count?

From: Adrian Klaver
Date:
On 6/27/23 11:47 AM, Jeremy Schneider wrote:
> On 6/27/23 9:32 AM, Ben Chobot wrote:
>> We certainly have databases where far more than 100 tables are updated
>> within a 10 second period. Is there a specific concern you have?
>>
> 
> Thanks Ben, not a concern but I'm trying to better understand how common
> this might be. And I think sharing general statistics about how people
> use PostgreSQL is a great help to the developers who build and maintain it.

Given that Postgres is used up into the petabyte range, it is reasonable
to assume that it can handle large numbers of tables. This of course is
based on sufficient hardware and proactive tuning. Personally I think you
are getting into the range of premature optimization. There are so many
ways to use Postgres that unless you provide a detailed example of how
you want to use it, the survey you seem to be requesting will likely have
more cases that do not apply than those that do. To me the way forward is
to create a plan for what you want to accomplish and then ask specific
questions based on that, or build a test/dev setup that implements the
plan and deal with any deviations from it.

> -Jeremy
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: typical active table count?

From: Jeremy Schneider
Date:
On 6/27/23 12:08 PM, Ron wrote:
> On 6/27/23 13:47, Jeremy Schneider wrote:
>> On 6/27/23 9:32 AM, Ben Chobot wrote:
>>> We certainly have databases where far more than 100 tables are updated
>>> within a 10 second period. Is there a specific concern you have?
>>>
>> Thanks Ben, not a concern but I'm trying to better understand how common
>> this might be. And I think sharing general statistics about how people
>> use PostgreSQL is a great help to the developers who build and maintain it.
>>
>> One really nice thing about PostgreSQL is that two quick copies of
>> pg_stat_all_tables and you can easily see this sort of info.
>>
>> If you have a database where more than 100 tables are updated within a
>> 10 second period - this seems really uncommon to me - I'm very curious
>> about the workload.
> 
> 100 table updates just means a /possibly complicated schema/, not
> necessarily high volume.
> 
> ...
> 
> And honestly, 100 tables in 10 seconds is 10 tables/second.  If each
> gets one insert, that's a laughably slow transaction rate.  (Unless of
> course there's 85 indices per table, and foreign keys don't have
> supporting indices.)

I don't think the math actually works this way on highly concurrent
systems. In fact, this morning I connected with a coworker who works on
Amazon fulfillment center DBs and there was almost no difference in the
number of tables with insert/update/delete regardless of whether you
looked at a 10 second window or a 2 second window. I was also able to
chat with another coworker at Amazon who got numbers from a couple of
their PG databases, and connected with one person on Slack at a different
company who passed along numbers, and got a few emails from Oracle folks.

The numbers reported back to me ranged from 29 to over a hundred.
Obviously there are also lots of small databases behind wordpress
websites with much less activity, but I found this to be an interesting
measure of some respectably busy systems.

The original context was a conversation related to logical replication
of DB changes.

But then I got interested in the general question and topic - and
someone on the Oracle side mentioned system tables which is a really
good point that hadn't occurred to me yet. The original conversation was
concerned with user tables and not system ones, but there would be a
fair amount of ongoing system table activity too.

Besides partitioning, another interesting dimension of the conversation
has been thinking about different categories of workloads. For example:
SaaS or multitenant applications with many copies of a similar schema,
ISVs, ERPs, or large enterprise databases with lots of development
history. All of these categories can easily ramp up the counts.

I'm still interested in more data - if anyone reading this can grab a
couple snapshots of pg_stat_all_tables and report back numbers for a 10
second window and a 2 second window, that would be amazing!
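
Something like this pasted into psql (with autocommit, so each statement
sees fresh stats) would report both windows from one baseline; t0 is just
an illustrative name, and it uses pg_stat_user_tables so system catalogs
don't get counted:

create temp table t0 as
  select relid, n_tup_ins + n_tup_upd + n_tup_del as dml
  from pg_stat_user_tables;

select pg_sleep(2);

-- tables with any DML in the first ~2 seconds
select count(*) as active_tables_2s
from pg_stat_user_tables s
join t0 on t0.relid = s.relid
where s.n_tup_ins + s.n_tup_upd + s.n_tup_del > t0.dml;

select pg_sleep(8);

-- tables with any DML in the full ~10 seconds since the baseline
select count(*) as active_tables_10s
from pg_stat_user_tables s
join t0 on t0.relid = s.relid
where s.n_tup_ins + s.n_tup_upd + s.n_tup_del > t0.dml;

drop table t0;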

-Jeremy


-- 
http://about.me/jeremy_schneider




Re: typical active table count?

From: Ben Chobot
Date:
Jeremy Schneider wrote on 6/27/23 11:47 AM:
> Thanks Ben, not a concern but I'm trying to better understand how common
> this might be. And I think sharing general statistics about how people
> use PostgreSQL is a great help to the developers who build and maintain it.
>
> One really nice thing about PostgreSQL is that two quick copies of
> pg_stat_all_tables and you can easily see this sort of info.
>
> If you have a database where more than 100 tables are updated within a
> 10 second period - this seems really uncommon to me - I'm very curious
> about the workload.

Well, in our case we have a SaaS model where a moderately complicated schema is replicated hundreds of times per db. It doesn't take much load to end up scattering writes across many tables (not to mention their indices). We do have table partitioning too, but it's a relatively small part of our schema and the partitioning is done by date, so we really only have one hot partition at a time. FWIW, most of our dbs have 32 cores.

All that aside, as others have said there are many reasonable ways to reach the threshold you have set.