Discussion: [PATCH] pg_stat_toast
Hello -hackers!

Please have a look at the attached patch, which implements some statistics
for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints and valuable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming etc.
before refining this further. I'm not a C person and I s**k at git, so
please be kind with me! ;-) Also, I'm not subscribed here, so a CC would be
much appreciated!

Why gather TOAST statistics?
============================

TOAST is transparent and opaque at the same time. Whilst we know that it's
there and we know _that_ it works, we cannot generally tell _how well_ it
works. What we can't answer (easily) are questions like e.g.
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?

The patch adds some functionality that will eventually be able to answer
these (and probably more) questions.
Currently, #1 - #4 can be answered based on the view contained in
"pg_stats_toast.sql":

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname           | public
reloid               | 16829
attnum               | 2
relname              | test
attname              | lz4
externalizations     | 0
compressions         | 100001
compressionsuccesses | 100001
compressionsizesum   | 6299710
originalsizesum      | 320403204
-[ RECORD 2 ]--------+----------
schemaname           | public
reloid               | 16829
attnum               | 3
relname              | test
attname              | std
externalizations     | 0
compressions         | 100001
compressionsuccesses | 100001
compressionsizesum   | 8198819
originalsizesum      | 320403204

Implementation
==============

I added some callbacks in backend/access/table/toast_helper.c to
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c. The latter
(and the other additions there) are essentially 1:1 copies of the function
statistics. Those were the perfect template, as IMHO the TOAST activities
(well, what we're interested in at least) are very much comparable to
function calls:
a) It doesn't really matter if the TOASTed data was committed, as "the
   damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to touch
   the relation or attribute statistics

I didn't find anything that could have been used as a hash key, so the
PgStat_StatToastEntry uses the shiny new PgStat_BackendAttrIdentifier
(containing relid Oid, attr int). For persisting in the statsfile, I chose
the identifier 'O' (as 'T' was taken).

What's working?
===============
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)

What's missing (yet)?
===============
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size &
  time sums?

Thanks in advance & best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
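The two size sums in the sample output above already answer question #4 ("how effective is the compression?"); a quick back-of-the-envelope calculation, sketched in Python using the exact numbers from the example records (the function name is made up for illustration):

```python
# Compression effectiveness from the pg_stat_toast sample output above:
# fraction of the original size remaining after compression.
def compression_ratio(compressionsizesum: int, originalsizesum: int) -> float:
    """Ratio of compressed to original bytes (lower is better)."""
    return compressionsizesum / originalsizesum

lz4_ratio = compression_ratio(6299710, 320403204)   # "lz4" column
std_ratio = compression_ratio(8198819, 320403204)   # "std" (pglz) column

print(f"lz4 : {lz4_ratio:.2%} of original size")
print(f"pglz: {std_ratio:.2%} of original size")
```

For this (highly repetitive) md5-repeat test data, both methods shrink the values to roughly 2-2.6% of the original size, with lz4 slightly ahead of pglz.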
Attachments
Hi,

On 2021-12-12 17:20:58 +0100, Gunnar "Nick" Bluth wrote:
> Please have a look at the attached patch, which implements some statistics
> for TOAST.
>
> The idea (and patch) have been lurking here for quite a while now, so I
> decided to dust it off, rebase it to HEAD and send it out for review today.
>
> A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
> hacking, some very useful hints and valueable feedback early this year.
>
> I'd like to get some feedback about the general idea, approach, naming etc.
> before refining this further.

I'm worried about the additional overhead this might impose. For some
workload it'll substantially increase the amount of stats traffic. Have you
tried to qualify the overheads? Both in stats size and in stats management
overhead?

Greetings,

Andres Freund
Am 12.12.21 um 22:52 schrieb Andres Freund:
> Hi,
>
> On 2021-12-12 17:20:58 +0100, Gunnar "Nick" Bluth wrote:
>> Please have a look at the attached patch, which implements some statistics
>> for TOAST.
>>
>> The idea (and patch) have been lurking here for quite a while now, so I
>> decided to dust it off, rebase it to HEAD and send it out for review today.
>>
>> A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
>> hacking, some very useful hints and valueable feedback early this year.
>>
>> I'd like to get some feedback about the general idea, approach, naming etc.
>> before refining this further.
>
> I'm worried about the additional overhead this might impose. For some workload
> it'll substantially increase the amount of stats traffic. Have you tried to
> qualify the overheads? Both in stats size and in stats management overhead?

I'd lie if I claimed so...

Regarding stats size; it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
something in that ballpark) per TOASTable attribute, I can't see that make
any system break sweat ;-)

A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with and
without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
at least the call overhead seems to be negligible.

Obviously, this was really a quick run and doesn't reflect real life. I'll
have the machine run some reasonable tests asap, also looking at stat size,
of course!

Best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
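The quoted benchmark numbers translate to a relative overhead of well under one percent; a quick cross-check (a Python sketch, with an illustrative function name, using the two timings from the mail above):

```python
# Relative call overhead of the quick benchmark above:
# 1,000,000 INSERTs with pgstat_track_toast off vs. on.
def relative_overhead(baseline_ms: float, tracked_ms: float) -> float:
    """Fractional slowdown of the tracked run relative to the baseline."""
    return (tracked_ms - baseline_ms) / baseline_ms

overhead = relative_overhead(12792.882, 12810.557)
print(f"tracking overhead: {overhead:.2%}")
```

That comes out at roughly 0.14%, i.e. within run-to-run noise for a single measurement.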
Hi,

On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:
> Regarding stats size; it adds one PgStat_BackendToastEntry
> (PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
> something in that ballpark) per TOASTable attribute, I can't see that make
> any system break sweat ;-)

That's actually a lot. The problem is that all the stats data for a database
is loaded into private memory for each connection to that database, and that
the stats collector regularly writes out all the stats data for a database.

> A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with and
> without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
> at least the call overhead seems to be neglectible.

Yea, you'd probably need a few more tables and a few more connections for it
to have a chance of mattering meaningfully.

Greetings,

Andres Freund
Am 13.12.21 um 00:41 schrieb Andres Freund:
> Hi,
>
> On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:
>> Regarding stats size; it adds one PgStat_BackendToastEntry
>> (PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
>> something in that ballpark) per TOASTable attribute, I can't see that make
>> any system break sweat ;-)
>
> That's actually a lot. The problem is that all the stats data for a database
> is loaded into private memory for each connection to that database, and that
> the stats collector regularly writes out all the stats data for a database.

My understanding is that the stats file is only pulled into the backend when
the SQL functions (for the view) are used (see
"pgstat_fetch_stat_toastentry()"). Otherwise, a backend just initializes an
empty hash, right?

Of which I reduced the initial size from 512 to 32 for the below tests (I
guess the "truth" lies somewhere in between here), along with making the GUC
parameter an actual GUC parameter and disabling the elog() calls I scattered
all over the place ;-) for the v0.2 patch attached.

>> A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with and
>> without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
>> at least the call overhead seems to be neglectible.
>
> Yea, you'd probably need a few more tables and a few more connections for it
> to have a chance of mattering meaningfully.

So, I went ahead and
* set up 2 clusters with "track_toast" off and on resp.
* created 100 DBs
* each with 100 tables
* with one TOASTable column in each table
* filling those with 32000 bytes of md5 garbage

These clusters sum up to ~ 2GB each, so differences should _start to_ show
up, I reckon.

$ du -s testdb*
2161208 testdb
2163240 testdb_tracking
$ du -s testdb*/pg_stat
4448    testdb/pg_stat
4856    testdb_tracking/pg_stat

The db_*.stat files are 42839 vs. 48767 bytes each (so confirmed, the
differences do show).
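Those db_*.stat sizes can be cross-checked against the 56-64 byte per-entry estimate from earlier in the thread: each database holds 100 tables with exactly one TOASTable column each, so the file growth divided by 100 approximates the on-disk footprint of one entry (a Python sketch; variable names are illustrative):

```python
# Per-attribute stats-file overhead, derived from the db_*.stat sizes above.
with_tracking = 48767     # bytes, db_*.stat with track_toast on
without_tracking = 42839  # bytes, db_*.stat with track_toast off
entries = 100             # 100 tables x 1 TOASTable column per database

per_entry = (with_tracking - without_tracking) / entries
print(f"~{per_entry:.0f} bytes per TOAST stats entry on disk")
```

That lands at about 59 bytes per entry, i.e. squarely inside the 56-64 byte ballpark estimated for PgStat_BackendAttrIdentifier + PgStat_ToastCounts.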
No idea if this is telling us anything, tbth, but the
/proc/<pid>/smaps_rollup for a backend serving one of these DBs look like
this ("0 kB" lines omitted):

track_toast OFF
===============
Rss:           12428 kB
Pss:            5122 kB
Pss_Anon:       1310 kB
Pss_File:       2014 kB
Pss_Shmem:      1797 kB
Shared_Clean:   5864 kB
Shared_Dirty:   3500 kB
Private_Clean:  1088 kB
Private_Dirty:  1976 kB
Referenced:    11696 kB
Anonymous:      2120 kB

track_toast ON (view not called yet):
=====================================
Rss:           12300 kB
Pss:            4883 kB
Pss_Anon:       1309 kB
Pss_File:       1888 kB
Pss_Shmem:      1685 kB
Shared_Clean:   6040 kB
Shared_Dirty:   3468 kB
Private_Clean:   896 kB
Private_Dirty:  1896 kB
Referenced:    11572 kB
Anonymous:      2116 kB

track_toast ON (view called):
=============================
Rss:           15408 kB
Pss:            7482 kB
Pss_Anon:       2083 kB
Pss_File:       2572 kB
Pss_Shmem:      2826 kB
Shared_Clean:   6616 kB
Shared_Dirty:   3532 kB
Private_Clean:  1472 kB
Private_Dirty:  3788 kB
Referenced:    14704 kB
Anonymous:      2884 kB

That backend used some memory for displaying the result too, of course...
A backend with just two TOAST columns in one table (filled with 1.000.001
rows) looks like this before and after calling the "pg_stat_toast" view:

Rss:           146208 kB
Pss:           116181 kB
Pss_Anon:        2050 kB
Pss_File:        2787 kB
Pss_Shmem:     111342 kB
Shared_Clean:    6636 kB
Shared_Dirty:   45928 kB
Private_Clean:   1664 kB
Private_Dirty:  91980 kB
Referenced:    145532 kB
Anonymous:       2844 kB

Rss:           147736 kB
Pss:           103296 kB
Pss_Anon:        2430 kB
Pss_File:        3147 kB
Pss_Shmem:      97718 kB
Shared_Clean:    6992 kB
Shared_Dirty:   74056 kB
Private_Clean:   1984 kB
Private_Dirty:  64704 kB
Referenced:    147092 kB
Anonymous:       3224 kB

After creating 10.000 more tables (view shows 10.007 rows now), before and
after calling "TABLE pg_stat_toast":

Rss:            13816 kB
Pss:             4898 kB
Pss_Anon:        1314 kB
Pss_File:        1755 kB
Pss_Shmem:       1829 kB
Shared_Clean:    5972 kB
Shared_Dirty:    5760 kB
Private_Clean:    832 kB
Private_Dirty:   1252 kB
Referenced:     13088 kB
Anonymous:       2124 kB

Rss:           126816 kB
Pss:            55213 kB
Pss_Anon:        5383 kB
Pss_File:        2615 kB
Pss_Shmem:      47215 kB
Shared_Clean:    6460 kB
Shared_Dirty:  113028 kB
Private_Clean:   1600 kB
Private_Dirty:   5728 kB
Referenced:    126112 kB
Anonymous:       6184 kB

That DB's stat-file is now 4.119.254 bytes (3.547.439 without track_toast).
After VACUUM ANALYZE, the size goes up to 5.919.812 (5.348.768). The "100
tables" DBs' go to 97.910 (91.868) bytes.

In total:
$ du -s testdb*/pg_stat
14508   testdb/pg_stat
15472   testdb_tracking/pg_stat

IMHO, this would be ok to at least enable temporarily (e.g. to find out if
MAIN or EXTERNAL storage/LZ4 compression would be ok/better for some
columns).

All the best,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachments
Dear Gunnar,

> postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
> postgres=# INSERT INTO test SELECT
> i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
> generate_series(0,100000) x(i);
> postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
> -[ RECORD 1 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 2
> relname              | test
> attname              | lz4
> externalizations     | 0
> compressions         | 100001
> compressionsuccesses | 100001
> compressionsizesum   | 6299710
> originalsizesum      | 320403204
> -[ RECORD 2 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 3
> relname              | test
> attname              | std
> externalizations     | 0
> compressions         | 100001
> compressionsuccesses | 100001
> compressionsizesum   | 8198819
> originalsizesum      | 320403204

I'm not sure about TOAST, but currently compression methods are
configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast?
ToastAttrInfo *attr->tai_compression represents how the data is compressed,
so I think it's easy to add.
Or, is it not needed because pg_attr has the information?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Am 20.12.2021 um 04:20 schrieb kuroda.hayato@fujitsu.com:
> Dear Gunnar,

Hi Kuroda-San!

>> postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
>> postgres=# INSERT INTO test SELECT
>> i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
>> generate_series(0,100000) x(i);
>> postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
>> -[ RECORD 1 ]--------+----------
>> schemaname           | public
>> reloid               | 16829
>> attnum               | 2
>> relname              | test
>> attname              | lz4
>> externalizations     | 0
>> compressions         | 100001
>> compressionsuccesses | 100001
>> compressionsizesum   | 6299710
>> originalsizesum      | 320403204
>> -[ RECORD 2 ]--------+----------
>> schemaname           | public
>> reloid               | 16829
>> attnum               | 3
>> relname              | test
>> attname              | std
>> externalizations     | 0
>> compressions         | 100001
>> compressionsuccesses | 100001
>> compressionsizesum   | 8198819
>> originalsizesum      | 320403204
>
> I'm not sure about TOAST, but currently compression methods are
> configurable:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294
>
> How about adding a new attribute "method" to pg_stat_toast?
> ToastAttrInfo *attr->tai_compression represents how the data is compressed,
> so I think it's easy to add.
> Or, is it not needed because pg_attr has the information?

That information could certainly be included in the view, grabbing the
information from pg_attribute.attcompression. It probably should!

I guess the next step will be to include that view in the catalog anyway,
so I'll do that next.

Thx for the feedback!
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Am 12.12.21 um 17:20 schrieb Gunnar "Nick" Bluth:
> Hello -hackers!
>
> Please have a look at the attached patch, which implements some
> statistics for TOAST.

The attached v0.3 includes
* a proper GUC "track_toast" incl. postgresql.conf.sample line
* gathering timing information
* the system view "pg_stat_toast"
* naming improvements more than welcome!
* columns "storagemethod" and "compressmethod" added as per Hayato Kuroda's
  suggestion
* documentation (pointing out the potential impacts as per Andres Freund's
  reservations)

Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test that INSERTs some long text into a column, checks that
the view counts are raised and that "compressedsize" is smaller than
"originalsize" suffice?!?

Cheers & best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 21.12.21 um 13:51 schrieb kuroda.hayato@fujitsu.com:
> Dear Gunnar,

Hayato-san, all,

thanks for the feedback!

>> * gathering timing information
>
> Here is a minor comment:
> I'm not sure when we should start to measure time.
> If you want to record time spent for compressing, toast_compress_datum()
> should be sandwiched by INSTR_TIME_SET_CURRENT() to calculate the time
> duration.
> Currently time_spent is calculated in pgstat_report_toast_activity(),
> but this has a systematic error.
> If you want to record time spent for inserting/updating, however,
> I think we should start measuring at the upper layer, maybe
> heap_toast_insert_or_update().

Yes, both toast_compress_datum() and toast_save_datum() are sandwiched the
way you mentioned, as that's exactly what we want to measure (time spent
doing compression and/or externalizing data).

Implementation-wise, I (again) took "track_functions" as a template there,
assuming that jumping into pgstat_report_toast_activity() and only then
checking if "track_toast = on" isn't too costly (we call
pgstat_init_function_usage() and pgstat_end_function_usage() a _lot_).

I did miss though that
    INSTR_TIME_SET_CURRENT(time_spent);
should be called right after entering pgstat_report_toast_activity(), as
that might need additional clock ticks for setting up the hash etc. That's
fixed now.

What I can't assess is the cost of the unconditional call to
INSTR_TIME_SET_CURRENT(start_time) in both toast_tuple_try_compression()
and toast_tuple_externalize(). Would it be wise (cheaper) to add a check
like
    if (pgstat_track_toast)
before querying the system clock?

>> Any hints on how to write meaningful tests would be much appreciated!
>> I.e., will a test
>
> I searched hints from PG sources, and I thought that the modules/
> subdirectory can be used. Following is the example:
> https://github.com/postgres/postgres/tree/master/src/test/modules/commit_ts
>
> I attached a patch to create a new test.
> Could you rewrite the sql and the output file?

Thanks for the kickstart ;-)

Some tests (as meaningful as they may get, I'm afraid) are now in
src/test/modules/track_toast/. "make check-world" executes them
successfully, although only after I introduced a "SELECT pg_sleep(1);" to
them.

pg_stat_toast_v0.4.patch attached.

Best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 03.01.22 um 16:52 schrieb Gunnar "Nick" Bluth:
> pg_stat_toast_v0.4.patch attached.

Aaaand I attached a former version of the patch file... sorry, I'm kind of
struggling with all the squashing/rebasing...
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 03.01.22 um 17:50 schrieb Justin Pryzby:
> On Mon, Jan 03, 2022 at 05:00:45PM +0100, Gunnar "Nick" Bluth wrote:
>> Am 03.01.22 um 16:52 schrieb Gunnar "Nick" Bluth:
>>
>>> pg_stat_toast_v0.4.patch attached.
>
> Note that the cfbot says this fails under windows

Thanks for the heads up!

> http://cfbot.cputube.org/gunnar-quotnickquot-bluth.html
> ...
> [16:47:05.347] Could not determine contrib module type for track_toast
> [16:47:05.347] at src/tools/msvc/mkvcbuild.pl line 31.

Not only Window$... as it turns out, one of the checks was pretty bogus.
Kicked that one and instead wrote two (hopefully) meaningful ones. Also, I
moved the tests to regress/, as they're not really for a module anyway.
Let's see how this fares!

>> Aaaand I attached a former version of the patch file... sorry, I'm kind of
>> struggling with all the squashing/rebasing...
>
> Soon you will think this is fun :)

As long as you're happy with plain patches like the attached one, I may ;-)

All the best,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
> Am 03.01.22 um 17:50 schrieb Justin Pryzby:
>> Soon you will think this is fun :)
>
> As long as you're happy with plain patches like the attached one, I may ;-)

Well, with a zero-byte patch, not very much ...

BTW why do you number with a "0." prefix? It could just be "4" and "5" and
so on. There's no value in two-part version numbers for patches. Also, may
I suggest that "git format-patch -vN" with varying N is an easier way to
generate patches to attach?

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Am 03.01.22 um 19:30 schrieb Alvaro Herrera:
> On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
>
>> Am 03.01.22 um 17:50 schrieb Justin Pryzby:
>
>>> Soon you will think this is fun :)
>>
>> As long as you're happy with plain patches like the attached one, I may ;-)
>
> Well, with a zero-byte patch, not very much ...

D'oh!!!!

> BTW why do you number with a "0." prefix? It could just be "4" and "5"
> and so on. There's no value in two-part version numbers for patches.
> Also, may I suggest that "git format-patch -vN" with varying N is an
> easier way to generate patches to attach?

Not when you have a metric ton of commits in the history... I'll hopefully
find a way to start over soon :/

9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
../v5-0004-adds-some-groundwork-for-pg_stat_toast-to-pgstat..patch
../v5-0005-fixes-wrong-type-for-pgstat_track_toast-GUC.patch
../v5-0006-introduces-PgStat_BackendAttrIdentifier-OID-attr-.patch
../v5-0007-implements-and-calls-pgstat_report_toast_activity.patch
../v5-0008-Revert-adds-some-debugging-messages-in-toast_help.patch
../v5-0009-adds-more-detail-to-logging.patch
../v5-0010-adds-toastactivity-to-table-stats-and-many-helper.patch
../v5-0011-fixes-missed-replacement-in-comment.patch
../v5-0012-adds-SQL-support-functions.patch
../v5-0013-Add-SQL-functions.patch
../v5-0014-reset-to-HEAD.patch
../v5-0015-makes-DEBUG2-messages-more-precise.patch
../v5-0016-adds-timing-information-to-stats-and-view.patch
../v5-0017-adds-a-basic-set-of-tests.patch
../v5-0018-adds-a-basic-set-of-tests.patch
../v5-0019-chooses-a-PGSTAT_TOAST_HASH_SIZE-of-64-changes-ha.patch
../v5-0020-removes-whitespace-trash.patch
../v5-0021-returns-to-PGDG-master-.gitignore.patch
../v5-0022-pg_stat_toast_v0.5.patch
../v5-0023-moves-tests-to-regress.patch

But alas! INT versioning it is from now on!
-- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
> 9:38 $ git format-patch PGDG/master -v5 -o ..
> ../v5-0001-ping-pong-of-thougths.patch
> ../v5-0002-ping-pong-of-thougths.patch
> ../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
> ...

Hmm, in such cases I would suggest to create a separate branch and then
"git merge --squash" for submission. You can keep your development branch
separate, with other merges if you want.

I've found this to be easier to manage, though I don't always follow that
workflow myself.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)
Am 03.01.22 um 20:11 schrieb Alvaro Herrera:
> On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
>
>> 9:38 $ git format-patch PGDG/master -v5 -o ..
>> ../v5-0001-ping-pong-of-thougths.patch
>> ../v5-0002-ping-pong-of-thougths.patch
>> ../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
>> ...
>
> Hmm, in such cases I would suggest to create a separate branch and then
> "git merge --squash" for submission. You can keep your development
> branch separate, with other merges if you want.
>
> I've found this to be easier to manage, though I don't always follow
> that workflow myself.

Using --stdout does help ;-)

I wonder why the "track_toast.sql" test fails on Windows (with "ERROR:
compression method lz4 not supported"), but "compression.sql" doesn't. Any
hints?

Anyway, I shamelessly copied "wait_for_stats()" from the "stats.sql" file
and the tests _should_ now work at least on the platforms with lz4.

v6 attached!
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachments
Overall I think this is a good feature to have; assessing the need for
compression is important for tuning, so +1 for the goal of the patch.

I didn't look into the patch carefully, but here are some minor comments:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:

> @@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext *ttc, int attribute)
>  	Datum	   *value = &ttc->ttc_values[attribute];
>  	Datum		new_value;
>  	ToastAttrInfo *attr = &ttc->ttc_attr[attribute];
> +	instr_time	start_time;
>
> +	INSTR_TIME_SET_CURRENT(start_time);
>  	new_value = toast_compress_datum(*value, attr->tai_compression);
>
>  	if (DatumGetPointer(new_value) != NULL)

Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an
expensive syscall. Find a way to only do it if the feature is enabled.

This also suggests that perhaps it'd be a good idea to allow this to be
enabled for specific tables only, rather than system-wide. (Maybe in order
for stats to be collected, the user should have to both set the GUC option
*and* set a per-table option? Not sure.)

> @@ -82,10 +82,12 @@ typedef enum StatMsgType
>  	PGSTAT_MTYPE_DEADLOCK,
>  	PGSTAT_MTYPE_CHECKSUMFAILURE,
>  	PGSTAT_MTYPE_REPLSLOT,
> +	PGSTAT_MTYPE_CONNECTION,

I think this new enum value doesn't belong in this patch.

> +/* ----------
> + * PgStat_ToastEntry			Per-TOAST-column info in a MsgFuncstat
> + * ----------

Not in "a MsgFuncstat", right?

> +-- function to wait for counters to advance
> +create function wait_for_stats() returns void as $$

I don't think we want a separate copy of wait_for_stats; see commit
fe60b67250a3 and the discussion leading to it. Maybe you'll want to move
the test to stats.sql. I'm not sure what to say about relying on LZ4; maybe
you'll want to leave that part out, and just verify in an LZ4-enabled build
that some 'l' entry exists. BTW, don't we have any decent way to turn that
'l' into a more reasonable, descriptive string?
Also, perhaps make the view-defining query turn an empty compression method
into whatever the default is. Or even better, stats collection should store
the real compression method used rather than the empty string, to avoid
confusing things when some stats are collected, then the default is
changed, then some more stats are collected.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"
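Regarding turning 'l' into a descriptive string: pg_attribute.attcompression holds a single character ('p' for pglz, 'l' for lz4, unset when the attribute falls back to the default), so the view could decode it with a simple CASE expression. The mapping, sketched in Python for illustration (the function and dict names are made up, not part of the patch):

```python
# Decoding of pg_attribute.attcompression characters into readable names,
# mirroring the values core PostgreSQL stores ('p' = pglz, 'l' = lz4).
# An unset value means default_toast_compression applies.
ATTCOMPRESSION_NAMES = {
    "p": "pglz",
    "l": "lz4",
    "": "default",  # unset: default_toast_compression applies
}

def describe_compression(attcompression: str) -> str:
    """Return a human-readable name for an attcompression character."""
    return ATTCOMPRESSION_NAMES.get(attcompression,
                                    f"unknown ({attcompression!r})")

print(describe_compression("l"))  # lz4
```

In the view itself this would naturally be a CASE expression over pg_attribute.attcompression rather than a lookup table.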
Am 03.01.22 um 22:03 schrieb Justin Pryzby:
>> +pgstat_report_toast_activity(Oid relid, int attr,
>> +							 bool externalized,
>> +							 bool compressed,
>> +							 int32 old_size,
>> +							 int32 new_size,
> ...
>> +	if (new_size)
>> +	{
>> +		htabent->t_counts.t_size_orig+=old_size;
>> +		if (new_size)
>> +		{
>
> I guess one of these is supposed to say old_size?

Didn't make a difference, tbth, as they'd both be 0 or have a value.
Streamlined the whole block now.

>> +CREATE TABLE toast_test (cola TEXT, colb TEXT COMPRESSION lz4, colc TEXT , cold TEXT, cole TEXT);
>
> Is there a reason this uses lz4 ?

I thought it might help later on, but alas! the LZ4 column mainly broke
things, so I removed it for the time being.

> If that's needed for stable results, I think you should use pglz, since
> that's what's guaranteed to exist. I imagine LZ4 won't be required any
> time soon, seeing as zlib has never been required.

Yeah. It didn't prove anything whatsoever.

>> + Be aware that this feature, depending on the amount of TOASTable columns in
>> + your databases, may significantly increase the size of the statistics files
>> + and the workload of the statistics collector. It is recommended to only
>> + temporarily activate this to assess the right compression and storage method
>> + for (a) column(s).
>
> saying "a column" is fine

Changed.

>> + <structfield>schemaname</structfield> <type>name</type>
>> + Attribute (column) number in the relation
>> + <structfield>relname</structfield> <type>name</type>

>> + <entry role="catalog_table_entry"><para role="column_definition">
>> + <structfield>compressmethod</structfield> <type>char</type>
>> + </para>
>> + <para>
>> + Compression method of the attribute (empty means default)

> One thing to keep in mind is that the current compression method is only
> used for *new* data - old data can still use the old compression method.
> It probably doesn't need to be said here, but maybe you can refer to the
> docs about that in alter_table.
>> + Number of times the compression was successful (gained a size reduction)
>
> It's more clear to say "was reduced in size"

Changed the wording a bit, I guess it is clear enough now. The question is
if the column should be there at all, as it's simply fetched from
pg_attribute...

>> + /* we assume this inits to all zeroes: */
>> + static const PgStat_ToastCounts all_zeroes;
>
> You don't have to assume; static/global allocations are always zero unless
> otherwise specified.

Copy-pasta ;-) Removed.

Thx for looking into this! Patch v7 will be in the next mail.
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Am 03.01.22 um 22:23 schrieb Alvaro Herrera:
> Overall I think this is a good feature to have; assessing the need for
> compression is important for tuning, so +1 for the goal of the patch.

Much appreciated!

> I didn't look into the patch carefully, but here are some minor
> comments:
>
> On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
>
>> @@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext *ttc, int attribute)
>>  	Datum	   *value = &ttc->ttc_values[attribute];
>>  	Datum		new_value;
>>  	ToastAttrInfo *attr = &ttc->ttc_attr[attribute];
>> +	instr_time	start_time;
>>
>> +	INSTR_TIME_SET_CURRENT(start_time);
>>  	new_value = toast_compress_datum(*value, attr->tai_compression);
>>
>>  	if (DatumGetPointer(new_value) != NULL)
>
> Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an
> expensive syscall. Find a way to only do it if the feature is enabled.

Yeah, I was worried about that (and asking if it would be required)
already. Adding the check was easier than I expected, though I'm absolutely
clueless if I did it right!

#include "pgstat.h"
extern PGDLLIMPORT bool pgstat_track_toast;

> This also suggests that perhaps it'd be a good idea to allow this to be
> enabled for specific tables only, rather than system-wide. (Maybe in
> order for stats to be collected, the user should have to both set the
> GUC option *and* set a per-table option? Not sure.)

That would of course be nice, but I seriously doubt the required additional
logic would be justified. The patch currently tampers with as few internal
structures as possible, and for good reason... ;-)

>> @@ -82,10 +82,12 @@ typedef enum StatMsgType
>>  	PGSTAT_MTYPE_DEADLOCK,
>>  	PGSTAT_MTYPE_CHECKSUMFAILURE,
>>  	PGSTAT_MTYPE_REPLSLOT,
>> +	PGSTAT_MTYPE_CONNECTION,
>
> I think this new enum value doesn't belong in this patch.

Yeah, did I mention I'm struggling with rebasing? ;-|

>> +/* ----------
>> + * PgStat_ToastEntry			Per-TOAST-column info in a MsgFuncstat
>> + * ----------
>
> Not in "a MsgFuncstat", right?
Obviously... fixed! > >> +-- function to wait for counters to advance >> +create function wait_for_stats() returns void as $$ > > I don't think we want a separate copy of wait_for_stats; see commit > fe60b67250a3 and the discussion leading to it. Maybe you'll want to > move the test to stats.sql. I'm not sure what to say about relying on Did so. > LZ4; maybe you'll want to leave that part out, and just verify in an > LZ4-enabled build that some 'l' entry exists. BTW, don't we have any > decent way to turn that 'l' into a more reasonable, descriptive string? > Also, perhaps make the view-defining query turn an empty compression > method into whatever the default is. I'm not even sure that having it in there is useful at all. It's simply JOINed in from pg_attribute. Which is where I'd see that "make it look nicer" change happening, tbth. ;-) > Or even better, stats collection > should store the real compression method used rather than empty string, > to avoid confusing things when some stats are collected, then the > default is changed, then some more stats are collected. I was thinking about that already, but came to the conclusion that it a) would blow up the size of these statistics by quite a bit and b) would be quite tricky to display in a useful way. I mean, the use case of track_toast is pretty limited anyway; you'll probably turn this feature on with a specific column in mind, of which you'll probably know which compression method is used at the time. Thanks for the feedback! v7 attached. -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 04.01.22 um 12:29 schrieb Gunnar "Nick" Bluth: > Am 03.01.22 um 22:23 schrieb Alvaro Herrera: >> Overall I think this is a good feature to have; assessing the need for >> compression is important for tuning, so +1 for the goal of the patch. > > Much appreciated! > > >> I didn't look into the patch carefully, but here are some minor >> comments: >> >> On 2022-Jan-03, Gunnar "Nick" Bluth wrote: >> >>> @@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext >>> *ttc, int attribute) >>> Datum *value = &ttc->ttc_values[attribute]; >>> Datum new_value; >>> ToastAttrInfo *attr = &ttc->ttc_attr[attribute]; >>> + instr_time start_time; >>> + INSTR_TIME_SET_CURRENT(start_time); >>> new_value = toast_compress_datum(*value, attr->tai_compression); >>> if (DatumGetPointer(new_value) != NULL) >> >> Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an >> expensive syscall. Find a way to only do it if the feature is enabled. > > Yeah, I was worried about that (and asking if it would be required) > already. > Adding the check was easier than I expected, though I'm absolutely > clueless if I did it right! > > #include "pgstat.h" > extern PGDLLIMPORT bool pgstat_track_toast; > As it works and nobody objected, it seems to be the right way... v8 (applies cleanly to today's HEAD/master) attached. Any takers? -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
Hi, On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote: > v8 (applies cleanly to today's HEAD/master) attached. This doesn't apply anymore, likely due to my recent pgstat changes - which you'd need to adapt to... http://cfbot.cputube.org/patch_37_3457.log Marked as waiting on author. Greetings, Andres Freund
Am 22.03.22 um 02:17 schrieb Andres Freund: > Hi, > > On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote: >> v8 (applies cleanly to today's HEAD/master) attached. > > This doesn't apply anymore, likely due to my recent pgstat changes - which > you'd need to adapt to... Now, that's been quite an overhaul... kudos! > http://cfbot.cputube.org/patch_37_3457.log > > Marked as waiting on author. v9 attached. TBTH, I don't fully understand all the external/static stuff, but it applies to HEAD/master, compiles and passes all tests, so... ;-) Best regards, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 22.03.22 um 12:23 schrieb Gunnar "Nick" Bluth: > Am 22.03.22 um 02:17 schrieb Andres Freund: >> Hi, >> >> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote: >>> v8 (applies cleanly to today's HEAD/master) attached. >> >> This doesn't apply anymore, likely due to my recent pgstat changes - which >> you'd need to adapt to... > > Now, that's been quite an overhaul... kudos! > > >> http://cfbot.cputube.org/patch_37_3457.log >> >> Marked as waiting on author. > > v9 attached. > > TBTH, I don't fully understand all the external/static stuff, but it > applies to HEAD/master, compiles and passes all tests, so... ;-) And v10 catches up to master once again. Best, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
Am 31.03.22 um 15:14 schrieb Gunnar "Nick" Bluth: > Am 22.03.22 um 12:23 schrieb Gunnar "Nick" Bluth: >> Am 22.03.22 um 02:17 schrieb Andres Freund: >>> Hi, >>> >>> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote: >>>> v8 (applies cleanly to today's HEAD/master) attached. >>> >>> This doesn't apply anymore, likely due to my recent pgstat changes - which >>> you'd need to adapt to... >> >> Now, that's been quite an overhaul... kudos! >> >> >>> http://cfbot.cputube.org/patch_37_3457.log >>> >>> Marked as waiting on author. >> >> v9 attached. >> >> TBTH, I don't fully understand all the external/static stuff, but it >> applies to HEAD/master, compiles and passes all tests, so... ;-) > > And v10 catches up to master once again. > > Best, That was meant to say "v10", sorry! -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de> wrote: > That was meant to say "v10", sorry! Hi, From my point of view, at least, it would be preferable if you'd stop changing the subject line every time you post a new version. Based on the test results in http://postgr.es/m/42bfa680-7998-e7dc-b50e-480cdd986ffc@pro-open.de and the comments from Andres in https://www.postgresql.org/message-id/20211212234113.6rhmqxi5uzgipwx2%40alap3.anarazel.de my judgement would be that, as things stand today, this patch has no chance of being accepted, due to overhead. Now, Andres is currently working on an overhaul of the statistics collector and perhaps that would reduce the overhead of something like this to an acceptable level. If it does, that would be great news; I just don't know whether that's the case. As far as the statistics themselves are concerned, I am somewhat skeptical about whether it's really worth adding code for this. According to the documentation, the purpose of the patch is to allow you to assess choice of storage and compression method settings for a column and is not intended to be enabled permanently. However, it seems to me that you could assess that pretty easily without this patch: just create a couple of different tables with different settings, load up the same data via COPY into each one, and see what happens. Now you might answer that with the patch you would get more detailed and accurate statistics, and I think that's true, but it doesn't really look like the additional level of detail would be critical to have in order to make a proper assessment. You might also say that creating multiple copies of the table and loading the data multiple times would be expensive, and that's also true, but you don't really need to load it all. A representative sample of 1GB or so would probably suffice in most cases, and that doesn't seem likely to be a huge load on the system. 
Also, as we add more compression options, it's going to be hard to assess this sort of thing without trying stuff anyway. For example if you can set the lz4 compression level, you're not going to know which level is actually going to work best without trying out a bunch of them and seeing what happens. If we allow access to other sorts of compression parameters like zstd's "long" option, similarly, if you really care, you're going to have to try it. So my feeling is that this feels like a lot of machinery and a lot of worst-case overhead to solve a problem that's really pretty easy to solve without any new code at all, and therefore I'd be inclined to reject it. However, it's a well-known fact that sometimes my feelings about things are pretty stupid, and this might be one of those times. If so, I hope someone will enlighten me by telling me what I'm missing. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
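[For illustration, the trial-load approach Robert describes can be sketched in a few lines of SQL; the table names and the sample file path here are hypothetical:]

```sql
-- Two trial tables that differ only in their compression setting
-- (hypothetical names; other storage/compression variants could be added).
CREATE TABLE trial_pglz (payload text COMPRESSION pglz);
CREATE TABLE trial_lz4  (payload text COMPRESSION lz4);

-- Load the same representative sample (~1GB, as suggested) into both, e.g.:
-- COPY trial_pglz FROM '/path/to/sample.csv';
-- COPY trial_lz4  FROM '/path/to/sample.csv';

-- Then compare the on-disk footprint, TOAST tables included:
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('trial_pglz', 'trial_lz4');
```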
Am 05.04.22 um 18:17 schrieb Robert Haas: > On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth > <gunnar.bluth@pro-open.de> wrote: >> That was meant to say "v10", sorry! > > Hi, Hi Robert, and thx for looking at this. > From my point of view, at least, it would be preferable if you'd stop > changing the subject line every time you post a new version. Terribly sorry, I believed I was doing the right thing! I removed the "suffix" now for good. > Based on the test results in > http://postgr.es/m/42bfa680-7998-e7dc-b50e-480cdd986ffc@pro-open.de > and the comments from Andres in > https://www.postgresql.org/message-id/20211212234113.6rhmqxi5uzgipwx2%40alap3.anarazel.de > my judgement would be that, as things stand today, this patch has no > chance of being accepted, due to overhead. Now, Andres is currently > working on an overhaul of the statistics collector and perhaps that > would reduce the overhead of something like this to an acceptable > level. If it does, that would be great news; I just don't know whether > that's the case. AFAICT, Andres' work is more about the structure (e.g. 13619598f1080d7923454634a2570ca1bc0f2fec). Or have I missed something...? The attached v11 incorporates the latest changes in the area, btw. Anyway, my (undisputed up to now!) understanding still is that only backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast view) actually read the data. So, the 10-15% more space used for pg_stat only affect the stats collector and _some few_ backends. And those 10-15% were gathered with 10,000 tables containing *only* TOASTable attributes. So the actual percentage would probably go down quite a bit once you add some INTs or such. Back then, I was curious about the impact myself and just ran a few synthetic tests quickly hacked together. I'll happily go ahead and run some tests on real-world schemas if that helps clarify matters!
> As far as the statistics themselves are concerned, I am somewhat > skeptical about whether it's really worth adding code for this. > According to the documentation, the purpose of the patch is to allow > you to assess choice of storage and compression method settings for a > column and is not intended to be enabled permanently. However, it TBTH, the wording there is probably a bit over-cautious. I very much respect Andres and thus his reservations, and I know how careful the project is about regressions of any kind (see below for some elaborations on the latter). I toned down the <note> part a bit for v11. > seems to me that you could assess that pretty easily without this > patch: just create a couple of different tables with different > settings, load up the same data via COPY into each one, and see what > happens. Now you might answer that with the patch you would get more > detailed and accurate statistics, and I think that's true, but it > doesn't really look like the additional level of detail would be > critical to have in order to make a proper assessment. You might also > say that creating multiple copies of the table and loading the data > multiple times would be expensive, and that's also true, but you don't > really need to load it all. A representative sample of 1GB or so would > probably suffice in most cases, and that doesn't seem likely to be a > huge load on the system. At the end of the day, one could argue like you did there for almost all (non-attribute) stats. "Why track function execution times? Just set up a benchmark and call the function 1 million times and you'll know how long it takes on average!". "Why track IO timings? Run a benchmark on your system and ..." etc. I maintain a couple of DBs that house TBs of TOASTable data (mainly XML containing encrypted payloads). In just a couple of columns per cluster. I'm completely clueless whether TOAST compression makes a difference there. Or externalization.
And I'm not allowed to copy that data anywhere outside production without unrolling a metric ton of red tape. Guess why I started writing this patch ;-) *I* would certainly leave the option on, just to get an idea of what's happening... > Also, as we add more compression options, it's going to be hard to > assess this sort of thing without trying stuff anyway. For example if > you can set the lz4 compression level, you're not going to know which > level is actually going to work best without trying out a bunch of > them and seeing what happens. If we allow access to other sorts of > compression parameters like zstd's "long" option, similarly, if you > really care, you're going to have to try it. Funny that you mention it. When writing the first version, I was thinking about the LZ4 patch authors and was wondering how they tested/benchmarked all of it and why they didn't implement something like this patch for their tests ;-) Yes, you're gonna try it. And you're gonna measure it. Somehow. Externally, as things are now. With pg_stat_toast, you'd get the byte-by-byte and - maybe even more importantly - ms-by-ms comparison of the different compression and externalization strategies straight from the core of the DB. I'd fancy that! And if you get these stats by just flicking a switch (or leaving it on permanently...), you might start looking at the pg_stat_toast view from time to time, maybe realizing that your DB server spent hours of CPU time trying to compress data that's compressed already. Or of which you _know_ that it's only gonna be around for a couple of seconds... Mind you, a *lot* of people out there aren't even aware that TOAST exists. Granted, most probably just don't care... ;-) Plus: this would (potentially, one day) give us information we could eventually incorporate into EXPLAIN [ANALYZE]. Like, "estimated time for (un)compressing TOAST values" or so.
> So my feeling is that this feels like a lot of machinery and a lot of > worst-case overhead to solve a problem that's really pretty easy to > solve without any new code at all, and therefore I'd be inclined to > reject it. However, it's a well-known fact that sometimes my feelings > about things are pretty stupid, and this might be one of those times. > If so, I hope someone will enlighten me by telling me what I'm > missing. Most DBAs I've met will *happily* donate a few CPU cycles (and MBs) to gather as much first-hand information about their *live* systems as possible. Why is pg_stat_statements so popular? Even if it costs 5-10% CPU cycles...? If I encounter a tipped-over plan and have a load1 of 1200 on my production server, running pgbadger on 80GB of (compressed) full statement logs will just not give me the information I need _now_ (actually, an hour ago). So I happily deploy pg_stat_statements everywhere, *hoping* that I'll never really need it... Why is "replica" now the default WAL level? Because essentially everybody changed it anyway, _just in case_. People looking for the last couple of % disk space will tune it down to "minimal"; for everybody else, the gain in *options* vastly outweighs the additional disk usage. Why is everybody asking for live execution plans? Or a progress indication? The effort to get these is ridiculous from what I know, but I'd still fancy them a lot! One of my clients is currently spending a lot of time (and thus $$$) trying to get some profiling software (forgot the name) for their DB2 to work (and not push AIX into OOM situations, actually ;). And compared to PostgreSQL, I'm pretty sure you get a lot more insights from a stock DB2 already. As that's what customers ask for... In essence: if *I* read in the docs "this will give you useful information" (and saves you effort for testing it in a separate environment) "but may use up some RAM and disk space for pg_stats", I flick that switch on and probably leave it there.
And in real-world applications, you'd almost certainly never notice a difference (we're discussing ~ 50-60 bytes per attribute, after all). I reckon most DBAs (and developers) would give this a spin and leave it on, out of curiosity first and out of sheer convenience later. Like, if I run a DB relying heavily on stored procedures, I'll certainly enable "track_functions". Now show me the DB without any TOASTable attributes! ;-) TBTH, I imagine this to be a default "on" GUC parameter *eventually*, which some people with *very* special needs (and braindead schemas causing the "worst-case overhead" you mention) turn "off". But alas! that's not how we add features, is it? Also, I wouldn't call ~ 583 LOC plus docs & tests "a lot of machinery" ;-). Again, thanks a lot for peeking at this and best regards, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachments
Hi, On 2022-04-06 00:08:13 +0200, Gunnar "Nick" Bluth wrote: > AFAICT, Andres' work is more about the structure (e.g. > 13619598f1080d7923454634a2570ca1bc0f2fec). Or I've missed something...? That was just the prep work... I'm about to send slightly further polished version, but here's the patchset from yesterday: https://www.postgresql.org/message-id/20220405030506.lfdhbu5zf4tzdpux%40alap3.anarazel.de > The attached v11 incorporates the latest changes in the area, btw. > > > Anyway, my (undisputed up to now!) understanding still is that only > backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast > view) actually read the data. So, the 10-15% more space used for pg_stat > only affect the stats collector and _some few_ backends. It's not so simple. That stats collector constantly writes these stats out to disk. And disk bandwidth / space is of course a shared resource. Greetings, Andres Freund
On Tue, Apr 5, 2022 at 10:34 PM Andres Freund <andres@anarazel.de> wrote: > > Anyway, my (undisputed up to now!) understanding still is that only > > backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast > > view) actually read the data. So, the 10-15% more space used for pg_stat > > only affect the stats collector and _some few_ backends. > > It's not so simple. That stats collector constantly writes these stats out to > disk. And disk bandwidth / space is of course a shared resource. Yeah, and just to make it clear, this really becomes an issue if you have hundreds of thousands or even millions of tables. It's a lot of extra data to be writing, and in some cases we're rewriting it all, like, once per second. Now if we're only incurring that overhead when this feature is enabled, then in fairness that problem is a lot less of an issue, especially if this is also disabled by default. People who want the data can get it and pay the cost, and others aren't much impacted. However, experience has taught me that a lot of skepticism is warranted when it comes to claims about how cheap extensions to the statistics system will be. -- Robert Haas EDB: http://www.enterprisedb.com
On 2022-Apr-06, Robert Haas wrote: > Now if we're only incurring that overhead when this feature is > enabled, then in fairness that problem is a lot less of an issue, > especially if this is also disabled by default. People who want the > data can get it and pay the cost, and others aren't much impacted. > However, experience has taught me that a lot of skepticism is > warranted when it comes to claims about how cheap extensions to the > statistics system will be. Maybe this feature should provide a way to be enabled for tables individually, so you pay the overhead only where you need it and don't swamp the system with stats for uninteresting tables. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
On Tue, Apr 5, 2022 at 6:08 PM Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de> wrote: > At the end of the day, one could argue like you did there for almost all > (non-attribute) stats. "Why track function execution times? Just set up > a benchmark and call the function 1 mio times and you'll know how long > it takes on average!". "Why track IO timings? Run a benchmark on your > system and ..." etc. pp. > > I maintain a couple of DBs that house TBs of TOASTable data (mainly XML > containing encrypted payloads). In just a couple of columns per cluster. > I'm completely clueless if TOAST compression makes a difference there. > Or externalization. > And I'm not allowed to copy that data anywhere outside production > without unrolling a metric ton of red tape. > Guess why I started writing this patch ;-) > *I* would certainly leave the option on, just to get an idea of what's > happening... I feel like if you want to know whether externalization made a difference, you can look at the size of the TOAST table. And by selecting directly from that table, you can even see how many chunks it contains, and how many are full-sized chunks vs. partial chunks, and stuff like that. And for compression, how about looking at pg_column_size(col1) vs. pg_column_size(col1||'') or something like that? You might get a 1-byte varlena header on the former and a 4-byte varlena header on the latter even if there's no compression, but any gains beyond 3 bytes have to be due to compression. > Most DBAs I met will *happily* donate a few CPU cycles (and MBs) to > gather as much first hand information about their *live* systems. > > Why is pg_stat_statements so popular? Even if it costs 5-10% CPU > cycles...? If I encounter a tipped-over plan and have a load1 of 1200 on > my production server, running pgbadger on 80GB of (compressed) full > statement logs will just not give me the information I need _now_ > (actually, an hour ago). 
So I happily deploy pg_stat_statements > everywhere, *hoping* that I'll never really need it... > > [ additional arguments ] I'm not trying to argue that instrumentation in the database is *in general* useless. That would be kinda ridiculous, especially since I've spent time working on it myself. But all cases are not the same. If you don't use something like pg_stat_statements or auto_explain or log_min_duration_statement, you don't have any good way of knowing which of your queries are slow and how slow they are, and you really need some kind of instrumentation to help you figure that out. On the other hand, you CAN find out how effective compression is, at least in terms of space, without instrumentation, because it leaves state on disk that you can examine whenever you like. The stuff that the patch tells you about how much *time* was consumed is data you can't get after-the-fact, so maybe there's enough value there to justify adding code to measure it. I'm not entirely convinced, though, because I think that for most people in most situations doing trial loads and timing them will give sufficiently good information that they won't need anything else. I'm not here to say that you must be wrong if you don't agree with me; I'm just saying what I think based on my own experience. -- Robert Haas EDB: http://www.enterprisedb.com
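[Using the "test" table from the first message of this thread, the pg_column_size() comparison Robert suggests might look like the following sketch; as he notes, varlena-header differences of up to 3 bytes blur the numbers slightly:]

```sql
-- pg_column_size() on the stored column reports the on-disk (possibly
-- compressed) size; appending '' forces a detoasted copy, so the second
-- sum approximates the uncompressed size.
SELECT sum(pg_column_size(lz4))       AS stored_bytes,
       sum(pg_column_size(lz4 || '')) AS detoasted_bytes,
       round(100.0 * sum(pg_column_size(lz4))
             / sum(pg_column_size(lz4 || '')), 1) AS pct_of_original
FROM test;
```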
On Wed, Apr 6, 2022 at 11:49 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2022-Apr-06, Robert Haas wrote: > > Now if we're only incurring that overhead when this feature is > > enabled, then in fairness that problem is a lot less of an issue, > > especially if this is also disabled by default. People who want the > > data can get it and pay the cost, and others aren't much impacted. > > However, experience has taught me that a lot of skepticism is > > warranted when it comes to claims about how cheap extensions to the > > statistics system will be. > > Maybe this feature should provide a way to be enabled for tables > individually, so you pay the overhead only where you need it and don't > swamp the system with stats for uninteresting tables. Maybe. Or maybe once Andres finishes fixing the stats collector the cost goes down so much that it's just not a big issue any more. I'm not sure. For me the first question is really around how useful this data really is. I think we can take it as given that the data would be useful to Gunnar, but I can't think of a situation when it would have been useful to me, so I'm curious what other people think (and why). -- Robert Haas EDB: http://www.enterprisedb.com
Am 06.04.22 um 17:22 schrieb Robert Haas: > On Tue, Apr 5, 2022 at 10:34 PM Andres Freund <andres@anarazel.de> wrote: >>> Anyway, my (undisputed up to now!) understanding still is that only >>> backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast >>> view) actually read the data. So, the 10-15% more space used for pg_stat >>> only affect the stats collector and _some few_ backends. >> >> It's not so simple. That stats collector constantly writes these stats out to >> disk. And disk bandwidth / space is of course a shared resource. > > Yeah, and just to make it clear, this really becomes an issue if you > have hundreds of thousands or even millions of tables. It's a lot of > extra data to be writing, and in some cases we're rewriting it all, > like, once per second. Fair enough. At that point, a lot of things become unexpectedly painful. How many % of the installed base may that be though? I'm far from done reading the patch and mail thread Andres mentioned, but I think the general idea is to move the stats to shared memory, so that reading (and thus, persisting) pg_stats is required far less often, right? > Now if we're only incurring that overhead when this feature is > enabled, then in fairness that problem is a lot less of an issue, > especially if this is also disabled by default. People who want the > data can get it and pay the cost, and others aren't much impacted. That's the idea, yes. I reckon folks with millions of tables will scan through the docs (and postgresql.conf) very thoroughly anyway. Hence the note there. > However, experience has taught me that a lot of skepticism is > warranted when it comes to claims about how cheap extensions to the > statistics system will be. Again, fair enough! Maybe we first need statistics about statistics collection and handling? 
;-) Best, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Am 06.04.22 um 17:49 schrieb Robert Haas: > I feel like if you want to know whether externalization made a > difference, you can look at the size of the TOAST table. And by > selecting directly from that table, you can even see how many chunks > it contains, and how many are full-sized chunks vs. partial chunks, > and stuff like that. And for compression, how about looking at > pg_column_size(col1) vs. pg_column_size(col1||'') or something like > that? You might get a 1-byte varlena header on the former and a 4-byte > varlena header on the latter even if there's no compression, but any > gains beyond 3 bytes have to be due to compression. I'll probably give that a shot! It does feel a bit like noting your mileage on fuel receipts though, as I did until I got my first decent car; it works, and will keep working, to this day, but it certainly feels a bit dated (and requires some basic math skills ;-). Best, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de> wrote: > Fair enough. At that point, a lot of things become unexpectedly painful. > How many % of the installed base may that be though? I don't have statistics on that, but it's large enough that the expense associated with the statistics collector is a reasonably well-known pain point, and for some users, a really severe one. Also, if we went out and spun up a billion new PostgreSQL instances that were completely idle and had no data in them, that would decrease the percentage of the installed base with high table counts, but it wouldn't be an argument for or against this patch. The people who are using PostgreSQL heavily are both more likely to have a lot of tables and also more likely to be interested in more obscure statistics. The question isn't - how likely is a random PostgreSQL installation to have a lot of tables? - but rather - how likely is a PostgreSQL installation that cares about this feature to have a lot of tables? I don't know either of those percentages but surely the second must be significantly higher than the first. > I'm far from done reading the patch and mail thread Andres mentioned, > but I think the general idea is to move the stats to shared memory, so > that reading (and thus, persisting) pg_stats is required far less often, > right? Right. I give Andres a lot of props for dealing with this mess, actually. Infrastructure work like this is a ton of work and hard to get right and you can always ask yourself whether the gains are really worth it, but your patch is not anywhere close to the first one where the response has been "but that would be too expensive!". So we have to consider not only the direct benefit of that work in relieving the pain of people with large database clusters, but also the indirect benefits of maybe unblocking some other improvements that would be beneficial. 
I'm fairly sure it's not going to make things so cheap that we can afford to add all the statistics anybody wants, but it's so painful that even modest relief would be more than welcome. > > However, experience has taught me that a lot of skepticism is > > warranted when it comes to claims about how cheap extensions to the > > statistics system will be. > > Again, fair enough! > Maybe we first need statistics about statistics collection and handling? ;-) Heh. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2022-04-06 12:24:20 -0400, Robert Haas wrote: > On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth > <gunnar.bluth@pro-open.de> wrote: > > Fair enough. At that point, a lot of things become unexpectedly painful. > > How many % of the installed base may that be though? > > I don't have statistics on that, but it's large enough that the > expense associated with the statistics collector is a reasonably > well-known pain point, and for some users, a really severe one. Yea. I've seen well over 100MB/s of write IO solely due to stats files writes on production systems, years ago. > I'm fairly sure it's not going to make things so cheap that we can afford to > add all the statistics anybody wants, but it's so painful that even modest > relief would be more than welcome. It definitely doesn't make stats free. But I'm hopeful that avoiding the regular writing out / reading back in, and the ability to only optionally store some stats (by varying allocation size or just having different kinds of stats), will reduce the cost sufficiently that we can start keeping more stats. Which is not to say that these stats are the right ones (nor that they're the wrong ones). I think if I were to tackle providing more information about toasting, I'd start not by adding a new stats view, but by adding a function to pgstattuple that scans the relation and collects stats for each toasted column. An SRF returning one row for each toastable column. With information like:

- column name
- #inline datums
- #compressed inline datums
- sum(uncompressed inline datum size)
- sum(compressed inline datum size)
- #external datums
- #compressed external datums
- sum(uncompressed external datum size)
- sum(compressed external datum size)

IIRC this shouldn't require visiting the toast table itself. Perhaps also an SRF that returns information about each compression method separately (i.e. collect above information, but split by compression method)?
Perhaps even with the ability to measure how big the gains of recompressing
into another method would be?

> > > However, experience has taught me that a lot of skepticism is
> > > warranted when it comes to claims about how cheap extensions to the
> > > statistics system will be.
> >
> > Again, fair enough!
> > Maybe we first need statistics about statistics collection and handling? ;-)
>
> Heh.

I've wondered about adding pg_stat_stats the other day, actually :)
https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de

Greetings,

Andres Freund
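[To make the proposal above concrete, the SQL-level shape of such a pgstattuple SRF might look roughly like the sketch below. This is purely hypothetical: the function name pgstattuple_toast and all column names are made up for illustration and do not exist in the pgstattuple extension today.]

```sql
-- Hypothetical sketch of the proposed SRF: one row per toastable column,
-- with counts and byte sums split into inline vs. external datums.
-- Neither this function nor these column names exist in pgstattuple.
CREATE FUNCTION pgstattuple_toast(rel regclass)
RETURNS TABLE (
    attname              name,    -- column name
    inline_datums        bigint,  -- #inline datums
    inline_compressed    bigint,  -- #compressed inline datums
    inline_orig_bytes    bigint,  -- sum(uncompressed inline datum size)
    inline_comp_bytes    bigint,  -- sum(compressed inline datum size)
    external_datums      bigint,  -- #external datums
    external_compressed  bigint,  -- #compressed external datums
    external_orig_bytes  bigint,  -- sum(uncompressed external datum size)
    external_comp_bytes  bigint   -- sum(compressed external datum size)
) AS 'MODULE_PATHNAME', 'pgstattuple_toast'
LANGUAGE C STRICT PARALLEL SAFE;

-- Usage would then be a plain scan-and-report call, e.g.:
--   SELECT * FROM pgstattuple_toast('test');
```

[Since varlena headers carry the compression flag and the external toast pointer stores both raw and stored sizes, all of this should indeed be collectible from the main relation alone, without visiting the toast table.]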
On 06.04.22 at 18:55, Andres Freund wrote:
> Hi,
>
> On 2022-04-06 12:24:20 -0400, Robert Haas wrote:
>> On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth
>> <gunnar.bluth@pro-open.de> wrote:
>>> Fair enough. At that point, a lot of things become unexpectedly painful.
>>> How many % of the installed base may that be though?
>>
>> I don't have statistics on that, but it's large enough that the
>> expense associated with the statistics collector is a reasonably
>> well-known pain point, and for some users, a really severe one.
>
> Yea. I've seen well over 100MB/s of write IO solely due to stats file
> writes on production systems, years ago.

Wow. Yeah, I tend to forget there are systems like ads' out there ;-)

>> I'm fairly sure it's not going to make things so cheap that we can afford to
>> add all the statistics anybody wants, but it's so painful that even modest
>> relief would be more than welcome.
>
> It definitely doesn't make stats free. But I'm hopeful that avoiding the
> regular writing out / reading back in, and the ability to only optionally
> store some stats (by varying allocation size or just having different
> kinds of stats), will reduce the cost sufficiently that we can start
> keeping more stats.

Knock on wood!

> Which is not to say that these stats are the right ones (nor that they're
> the wrong ones).

;-)

> I think if I were to tackle providing more information about toasting, I'd
> start not by adding a new stats view, but by adding a function to
> pgstattuple that scans the relation and collects stats for each toasted
> column. An SRF returning one row for each toastable column. With
> information like
>
> - column name
> - #inline datums
> - #compressed inline datums
> - sum(uncompressed inline datum size)
> - sum(compressed inline datum size)
> - #external datums
> - #compressed external datums
> - sum(uncompressed external datum size)
> - sum(compressed external datum size)
>
> IIRC this shouldn't require visiting the toast table itself.
But it would still require a seqscan and quite a few cycles. However, sure,
something like that is an option.

> Perhaps also an SRF that returns information about each compression method
> separately (i.e. collect above information, but split by compression
> method)?
> Perhaps even with the ability to measure how big the gains of recompressing
> into another method would be?

Even more of the above, but yeah, sounds nifty.

>>>> However, experience has taught me that a lot of skepticism is
>>>> warranted when it comes to claims about how cheap extensions to the
>>>> statistics system will be.
>>>
>>> Again, fair enough!
>>> Maybe we first need statistics about statistics collection and handling? ;-)
>>
>> Heh.
>
> I've wondered about adding pg_stat_stats the other day, actually :)
> https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de

OMG LOL!

-- 
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
On 06.04.22 at 17:49, Alvaro Herrera wrote:
> On 2022-Apr-06, Robert Haas wrote:
>
>> Now if we're only incurring that overhead when this feature is
>> enabled, then in fairness that problem is a lot less of an issue,
>> especially if this is also disabled by default. People who want the
>> data can get it and pay the cost, and others aren't much impacted.
>> However, experience has taught me that a lot of skepticism is
>> warranted when it comes to claims about how cheap extensions to the
>> statistics system will be.
>
> Maybe this feature should provide a way to be enabled for tables
> individually, so you pay the overhead only where you need it and don't
> swamp the system with stats for uninteresting tables.

That would obviously be very nice (and Georgios pushed heavily in that
direction ;-). However, I intentionally bound those stats to the database
level (see my very first mail).

The changes needed to bind them to attributes (or tables) would have
required meddling with quite a lot of very elemental stuff (e.g. attribute
stats only get refreshed by ANALYZE, and their structure would have to be
changed significantly, bloating them even when the feature is inactive).
It also would have made the stats updates synchronous (at TX end), would
have been "blind" to all TOAST efforts done by rolled-back TXs, etc.

What you can do, of course, is (just like with track_functions):

ALTER DATABASE under_surveillance SET track_toast = [on|off];

Best,

-- 
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato