Discussion: re: constant crashing
The full error reads:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
error: connection to server was lost

PostgreSQL 16.2

I also believe it is a resource issue which can be rectified with a setting, but which setting? If you were updating 100 million records, what settings would you adjust?
Here are the updates I am performing on the 100 million records:
UPDATE table SET category_modified = UPPER(category);
UPDATE table SET category_modified = REGEXP_REPLACE(REPLACE(REPLACE(category_modified, '''', '-'), '`', '-'), '\s{2,}', ' ', 'g') WHERE LENGTH(category_modified) > 1 AND POSITION('--' IN category_modified) > 0;
UPDATE table SET category_modified = REPLACE(category_modified,' ','-');
UPDATE table SET category_modified = CASE WHEN category_modified IS NOT NULL THEN regexp_replace(category_modified, '[^a-zA-Z]$', '') ELSE NULL END;
UPDATE table SET category_modified = regexp_replace(category_modified, '-{2,}', '-', 'g');
UPDATE table SET category_modified = SUBSTRING(category_modified FROM 1 FOR LENGTH(category_modified) - 1) WHERE LENGTH(category_modified)>1 AND category_modified LIKE '%-';
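For reference, the six statements amount to the following per-row transformation, sketched here in Python with the WHERE clauses as conditionals (the function name and sample input are illustrative only, not part of the load process):

```python
import re

def transform(category):
    """Sketch of the six UPDATE passes applied to one value."""
    if category is None:
        return None
    v = category.upper()                    # pass 1: UPPER(category)
    # pass 2: quotes/backticks -> '-', collapse whitespace runs,
    # only where the value has length > 1 and contains '--'
    if len(v) > 1 and '--' in v:
        v = v.replace("'", '-').replace('`', '-')
        v = re.sub(r'\s{2,}', ' ', v)
    v = v.replace(' ', '-')                 # pass 3: spaces -> '-'
    v = re.sub(r'[^a-zA-Z]$', '', v)        # pass 4: strip one trailing non-letter
    v = re.sub(r'-{2,}', '-', v)            # pass 5: collapse '-' runs
    if len(v) > 1 and v.endswith('-'):      # pass 6: drop a trailing '-'
        v = v[:-1]
    return v

print(transform('rock  and roll'))  # ROCK-AND-ROLL
```

Note that pass 2 only runs on rows already containing '--', so a stray apostrophe in a row without doubled dashes survives; that may or may not be intended.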
On 4/14/24 09:20, jack wrote:
> The full error reads:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> error: connection to server was lost

Look at the OS system log.

> PostgreSQL 16.2
>
> I also believe it is a resource issue which can be rectified with a
> setting, but which setting?
> If you were updating 100 million records what settings would you adjust?
>
> Here are the updates I am performing on the 100 million records:
> UPDATE table SET category_modified = UPPER(category);
> UPDATE table SET category_modified =
> REGEXP_REPLACE(REPLACE(REPLACE(category_modified, '''','-'), '`', '-'),
> '\s{2,}', ' ', 'g') WHERE LENGTH(category_modified)>1 AND
> POSITION('--' IN category_modified)>0;
> UPDATE table SET category_modified = REPLACE(category_modified,' ','-');
> UPDATE table SET category_modified = CASE WHEN category_modified IS NOT
> NULL THEN regexp_replace(category_modified, '[^a-zA-Z]$', '') ELSE NULL END;
> UPDATE table SET category_modified = regexp_replace(category_modified,
> '-{2,}', '-', 'g');
> UPDATE table SET category_modified = SUBSTRING(category_modified FROM 1
> FOR LENGTH(category_modified) - 1) WHERE LENGTH(category_modified)>1 AND
> category_modified LIKE '%-';

Is the above all being done in one script/transaction?

Again, what are the table definitions for the tables being copied into and/or modified?

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 4/14/24 09:20, jack wrote:
>> The full error reads:
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> error: connection to server was lost

> Look at the OS system log.

Really the most detail would be in postgres' own log. I'd only expect the system log to contain relevant info if the problem turns out to be an overeager OOM killer --- but if that's what's happening, we should be able to diagnose from the postmaster's log too (because it'd show up as the backend dying from SIGKILL).

			regards, tom lane
Jack:

On Sun, 14 Apr 2024 at 18:20, jack <jack4pg@a7q.com> wrote:
> The full error reads:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> error: connection to server was lost

This is a CLIENT-side error. What people are asking you for is the corresponding error in the SERVER log. When the server has a severe error (probably a resource exhaustion, which normally leads to abnormal termination, as has been pointed out previously) the connection gets closed, and you should go to the server to look for the actual cause.

> PostgreSQL 16.2
> I also believe it is a resource issue which can be rectified with a setting, but which setting?

Not all resource issues can be solved; sometimes one asks for something which is too much for the available hardware. Anyway, as pointed out, posting the configuration, the server error, and maybe monitoring memory (the usual culprit) with top may help someone. Look at memory, because a common cause of this kind of thing on Linux is having overcommit on (overcommit is nice, but basically it lies to postgres: it tells the server she can have X Gb and then, when it tries to use them, kills -9 it).

> If you were updating 100 million records what settings would you adjust?

From what you told earlier (loading from CSV) and with...

> Here are the updates I am performing on the 100 million records:
> UPDATE table SET category_modified = UPPER(category);
> UPDATE table SET category_modified = REGEXP_REPLACE(REPLACE(REPLACE(category_modified, '''','-'), '`', '-'), '\s{2,}', ' ', 'g') WHERE LENGTH(category_modified)>1 AND POSITION('--' IN category_modified)>0;
> UPDATE table SET category_modified = REPLACE(category_modified,' ','-');
> UPDATE table SET category_modified = CASE WHEN category_modified IS NOT NULL THEN regexp_replace(category_modified, '[^a-zA-Z]$','') ELSE NULL END;
> UPDATE table SET category_modified = regexp_replace(category_modified, '-{2,}', '-', 'g');
> UPDATE table SET category_modified = SUBSTRING(category_modified FROM 1 FOR LENGTH(category_modified) - 1) WHERE LENGTH(category_modified)>1 AND category_modified LIKE '%-';

...this kind of updates, I would recommend stream-filtering the data on load. I do not know how it does it presently, but those are six hundred million updates (unless some are suppressed, which is what I do not remember now), which makes for huge transactions and is asking for problems (or maybe only 100_000_000 if you are autocommitting). If postgres does not suppress redundant updates (lots of people can answer that), I would start by conditioning the updates (UPDATE table SET category_modified = UPPER(category) WHERE category_modified IS NOT NULL AND category_modified <> UPPER(category);); this can shave time and resource usage.

But if your updates are of this kind (transformation of data on the current row (your sample is on a single value of a row, even easier) without touching other things in the database), I would encourage you to use a small perl/awk/python/whatever program to filter and transform the data before loading it.

It has several advantages: one is speed, another is minimizing server load, another is testability (pipe sample lines to the filter, check the result; when satisfied, pipe the full file and pipe the result to psql), and choosing the right language (I would say perl; it was designed practically for this sort of thing) makes the filtering really easy. And it seems you are forcing it: I see null handling in some updates but not others, which hints at relying on strict behaviour (functions returning null on null input).

The only apparent difficulty is parsing CSV, but there are millions of modules parsing the one understood by postgres (if you generate your data, the default text format for postgres is much easier to parse, i.e., in perl normally a simple split// call does the trick).

Doing it with a filtering stream would allow you to easily process gigabytes of data using a few megabytes of RAM in the client. I'm not sure about the server, but stream filtering lends itself to very easy batching of copies, and from what I read your server is beefy.

Francisco Olarte.
On Sun, Apr 14, 2024 at 10:20 AM Francisco Olarte <folarte@peoplecall.com> wrote:
> If postgres does not suppress redundant updates (lots of people can
> answer that)
It does not. My other usual piece of advice along these lines, if doing the transform outside the database is not desirable, is to at least ensure you are doing them on a staging table that is defined as either being temporary or unlogged. WAL production probably isn't causing the crash but can definitely be an issue and is pointless to incur during most transformation processing.
David J.
On Sun, 14 Apr 2024 at 21:50, jack <jack4pg@a7q.com> wrote:
> The full error reads:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> error: connection to server was lost
>
> PostgreSQL 16.2
>
> I also believe it is a resource issue which can be rectified with a setting, but which setting?
> If you were updating 100 million records what settings would you adjust?
>
> Here are the updates I am performing on the 100 million records:
> UPDATE table SET category_modified = UPPER(category);
> UPDATE table SET category_modified = REGEXP_REPLACE(REPLACE(REPLACE(category_modified, '''','-'), '`', '-'), '\s{2,}', ' ', 'g') WHERE LENGTH(category_modified)>1 AND POSITION('--' IN category_modified)>0;
> UPDATE table SET category_modified = REPLACE(category_modified,' ','-');
> UPDATE table SET category_modified = CASE WHEN category_modified IS NOT NULL THEN regexp_replace(category_modified, '[^a-zA-Z]$', '') ELSE NULL END;
> UPDATE table SET category_modified = regexp_replace(category_modified, '-{2,}', '-', 'g');
> UPDATE table SET category_modified = SUBSTRING(category_modified FROM 1 FOR LENGTH(category_modified) - 1) WHERE LENGTH(category_modified)>1 AND category_modified LIKE '%-';
Independent of best practices, I just want to check if there is a leak.

I created a sample table with text data and ran updates like yours, and I could not see memory growth, but I have a small VM and of course your category_modified field might be more complex than simple text fields of 30-40 chars.

Can you grab the pid of your psql backend and (if you have pidstat installed) monitor resource usage for that pid?
postgres@pg:~/udemy/16$ psql
psql (16.2 (Ubuntu 16.2-1.pgdg22.04+1))
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           1214
(1 row)
# pidstat 2 100 -rud -h -p 1214
(get all stats for that pid) That might help to figure out if there is a leak, or whether the server has other things competing for memory and your updates were picked by the OOM killer.
Linux 5.15.0-101-generic (pg) 04/15/24 _x86_64_ (1 CPU)
# Time UID PID %usr %system %guest %wait %CPU CPU minflt/s majflt/s VSZ RSS %MEM kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
00:40:25 113 1214 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 354112 220940 24.18 0.00 0.00 0.00 0 postgres
# Time UID PID %usr %system %guest %wait %CPU CPU minflt/s majflt/s VSZ RSS %MEM kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
00:40:27 113 1214 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 354112 220940 24.18 0.00 0.00 0.00 0 postgres
....
Of course, if there is a genuine leak, then more digging might be needed: "Finding memory leaks in Postgres C code" (enterprisedb.com).
Otherwise the OOM killer may just kill the process requesting more memory than is available: "Memory context: how PostgreSQL allocates memory" (cybertec-postgresql.com).
Thanks,
Vijay
Ignore my thread; I guess there might be a bug, given it segfaulted.
On Mon, Apr 15, 2024, 12:48 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote: