Обсуждение: crazy logging from PG 8.4
After several days of running a simple query over a very large dataset (more than a terabyte), I logged in to note that postgresql had decided to log *billions* of log lines. The query looks like this: create table result_table as select TEXTFIELD, count(distinct INTFIELD) from SOME_HUGE_TABLE group by TEXTFIELD; At some point, PG started logging stuff that looked like this (over and over again): SELECT 2704204951 user@dbname LOG: 00000: performsort starting: CPU 0.00.... The 2704204951 number corresponds to the "%l" (session line number) in log_line_prefix. Thus, by the time I saw this, it had logged 2.7 billion lines. An strace of the process showed this, over and over again: getrusage(....) write(2, <logging info>) and nothing else. What happened here? It seems like PG just went crazy and got itself into some sort of loop. I had to kill the query. This is PostgreSQL 8.4.13 on x86_64, Linux, using Scientific Linux 6.4. -- Jon
On 23 ÅÃjen 2013, 16:00, Jon Nelson wrote: > After several days of running a simple query over a very large dataset > (more than a terabyte), I logged in to note that postgresql had > decided to log *billions* of log lines. > > The query looks like this: > > create table result_table as > select TEXTFIELD, count(distinct INTFIELD) from SOME_HUGE_TABLE group > by TEXTFIELD; > > At some point, PG started logging stuff that looked like this (over > and over again): > > SELECT 2704204951 user@dbname LOG: 00000: performsort starting: CPU > 0.00.... > > The 2704204951 number corresponds to the "%l" (session line number) > in log_line_prefix. Thus, by the time I saw this, it had logged 2.7 > billion lines. > > > An strace of the process showed this, over and over again: > > getrusage(....) > write(2, <logging info>) > > and nothing else. What happened here? It seems like PG just went > crazy and got itself into some sort of loop. I had to kill the query. First guess is that you enabled detailed logging in config. See log_*_stats in http://www.postgresql.org/docs/8.4/static/runtime-config-statistics.html I'd guess it's log_executor_stats, but as you've copied a single line (and not even that one was complete), it's hard to say. Those are the only lines referring to getrusage in the source code, although I'm not sure any of those could that produce the number of messages you've mentioned. > This is PostgreSQL 8.4.13 on x86_64, Linux, using Scientific Linux 6.4. Not really the most updated 8.4 install. Please consider updating to 8.4.18. Tomas
"Tomas Vondra" <tv@fuzzy.cz> writes: > On 23 ÅÃjen 2013, 16:00, Jon Nelson wrote: >> After several days of running a simple query over a very large dataset >> (more than a terabyte), I logged in to note that postgresql had >> decided to log *billions* of log lines. > First guess is that you enabled detailed logging in config. I think it's trace_sort. regards, tom lane
On Wed, Oct 23, 2013 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Tomas Vondra" <tv@fuzzy.cz> writes: >> On 23 =C5=98=C3=ADjen 2013, 16:00, Jon Nelson wrote: >>> After several days of running a simple query over a very large dataset >>> (more than a terabyte), I logged in to note that postgresql had >>> decided to log *billions* of log lines. > >> First guess is that you enabled detailed logging in config. > > I think it's trace_sort. I think you are right. Let's say that there would end up being 2.7+ billion groups. Does that scenario help to explain? I guess I'd like to see trace_sort have a new value in addition to just "on" and "off", perhaps "external". I don't really care about internal sorts but I do care about external sorts. Right now, they're the only (minimal!) insight I have into how the query is progressing. --=20 Jon
On 23.10.2013 20:56, Jon Nelson wrote: > On Wed, Oct 23, 2013 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Tomas Vondra" <tv@fuzzy.cz> writes: >>> On 23 ÅÃjen 2013, 16:00, Jon Nelson wrote: >>>> After several days of running a simple query over a very large dataset >>>> (more than a terabyte), I logged in to note that postgresql had >>>> decided to log *billions* of log lines. >> >>> First guess is that you enabled detailed logging in config. >> >> I think it's trace_sort. > > I think you are right. > > Let's say that there would end up being 2.7+ billion groups. Does > that scenario help to explain? I guess I'd like to see trace_sort have > a new value in addition to just "on" and "off", perhaps "external". I > don't really care about internal sorts but I do care about external > sorts. Right now, they're the only (minimal!) insight I have into how > the query is progressing. Yes, that sounds plausible, because COUNT(DISTINCT) ends with a sort for each group. At least that's my experience. If by internal/external you mean in-memory/on-disk sorting, then you can disable trace_sort and use log_temp_files instead. It will probably log some other operations using temp files, but it's unlikely to happen as frequently as trace_sort. Tomas