Discussion: How different is AWS-RDS postgres?


How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
I have what purports to be Postgres 12 ( PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit ) running on RDS.  I am testing a new release of my service with an admittedly large data set (which may be swamping my AWS bandwidth).  But my app logs tell me a table was created and loaded (and later, read).

From my tomcat log
00:09:58.567 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - STAGING TABLE CREATED: bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594"
00:09:58.585 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: started COPY work at 1621987798585
00:09:58.586 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594": Begin bulk copy segment
00:10:01.597 [https-jsse-nio-10.0.2.28-15002-exec-11] ERROR edu.utah.camplab.servlet.PayloadSaveServlet - received payload
00:10:01.598 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : false
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG edu.utah.camplab.jx.PayloadFromMux - 074d449b-c3ba-499f-83e3-f48427fe0156: start transaction at 1621987801599
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - set savepoint
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk transfer of 2528447 took 8.925s
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] DEBUG edu.utah.camplab.jx.PayloadWriterThread - staged in 8925 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: Total segment save took 9486 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.AbstractPayload - ff3ba73c-d927-431c-bd29-9687a47f1594: closing process ff3ba73c-d927-431c-bd29-9687a47f1594
00:10:07.608 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG org.jooq.impl.DefaultConnectionProvider - release savepoint
00:10:07.609 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: end transaction at 1621987807609
Which claims to have written 2,528,447 records in roughly 9 seconds into the newly created table bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594". Nice.

However, no such table exists, even though later processing renames it by appending "_done" to the name (being careful to remain under 64 characters).

My middleware does receive an exception notice
00:10:55.101 [https-jsse-nio-10.0.2.28-15002-exec-3] ERROR edu.utah.camplab.jx.AbstractPayload - run ff3ba73c-d927-431c-bd29-9687a47f1594: Exception from db write: SQL [insert into segment select * from bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594" as s where s.probandset_id >= 'a0000000-0000-0000-0000-000000000000' and s.probandset_id < 'b0000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.: {}

which confirms(?) that the table /was/ there and had been read from 10 times before this, since I copy from this temporary, bulk-loaded table to the actual target in 16 chunks by diddling the first hex digit from 0 through f.  Here, zero through nine apparently didn't have a problem.  These 16 inserts are in a single transaction, separate from the bulk copy.  (There are a dozen more of these files that were processed and have disappeared.)
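For readers following along, a minimal, hypothetical JDBC sketch of that chunking scheme is below: the first hex digit of probandset_id is stepped from 0 through f so that each INSERT ... SELECT moves roughly 1/16th of the staging table into segment. The staging table name is taken from the log above; the JDBC URL and the treatment of the final chunk's upper bound are assumptions, not the actual application code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical illustration of the 16 chunked INSERT ... SELECT statements described above.
public class ChunkedTransferSketch {
    public static void main(String[] args) throws Exception {
        String staging = "bulk.\"rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594\"";
        try (Connection conn = DriverManager.getConnection(System.getenv("JDBC_URL"))) { // placeholder URL
            conn.setAutoCommit(false);                       // all 16 inserts in one transaction
            try (Statement st = conn.createStatement()) {
                for (int i = 0; i < 16; i++) {               // first hex digit 0 .. f
                    String lo = String.format("%x0000000-0000-0000-0000-000000000000", i);
                    String sql = "insert into segment select * from " + staging + " as s"
                            + " where s.probandset_id >= '" + lo + "'";
                    if (i < 15) {                            // last chunk has no upper bound
                        String hi = String.format("%x0000000-0000-0000-0000-000000000000", i + 1);
                        sql += " and s.probandset_id < '" + hi + "'";
                    }
                    st.executeUpdate(sql);
                }
            }
            conn.commit();                                   // nothing is visible until this succeeds
        }
    }
}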

My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs?  Is this frailty inherent in AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.







Re: How different is AWS-RDS postgres?

From:
Ron
Date:
On 5/26/21 12:50 PM, Rob Sargent wrote:
I have what purports to be Postgres 12 ( PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit ) running on RDS.  I am testing a new release of my service with an admittedly large data set (which may be swamping my AWS bandwidth).  But my app logs tell me a table was created and loaded (and later, read).

From my tomcat log
00:09:58.567 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - STAGING TABLE CREATED: bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594"
00:09:58.585 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: started COPY work at 1621987798585
00:09:58.586 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594": Begin bulk copy segment
00:10:01.597 [https-jsse-nio-10.0.2.28-15002-exec-11] ERROR edu.utah.camplab.servlet.PayloadSaveServlet - received payload
00:10:01.598 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : false
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG edu.utah.camplab.jx.PayloadFromMux - 074d449b-c3ba-499f-83e3-f48427fe0156: start transaction at 1621987801599
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - set savepoint
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk transfer of 2528447 took 8.925s
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] DEBUG edu.utah.camplab.jx.PayloadWriterThread - staged in 8925 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: Total segment save took 9486 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.AbstractPayload - ff3ba73c-d927-431c-bd29-9687a47f1594: closing process ff3ba73c-d927-431c-bd29-9687a47f1594
00:10:07.608 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG org.jooq.impl.DefaultConnectionProvider - release savepoint
00:10:07.609 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: end transaction at 1621987807609
Which claims to have written 2,528,447 records in roughly 9 seconds into the newly created table bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594". Nice.

However, no such table exists, though later processing renames it by appending "_done" to the name (being careful to remain under 64 char)

My middleware does receive an exception notice
00:10:55.101 [https-jsse-nio-10.0.2.28-15002-exec-3] ERROR edu.utah.camplab.jx.AbstractPayload - run ff3ba73c-d927-431c-bd29-9687a47f1594: Exception from db write: SQL [insert into segment select * from bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594" as s where s.probandset_id >= 'a0000000-0000-0000-0000-000000000000' and s.probandset_id < 'b0000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.: {}

which confirms(?) that the table /was/ there and read from 10 times prior since I copy from this temp, bulk loaded table to the actual target in 16 chunks by diddling the first hex digit from 0 through f.  Here zero through nine apparently didn't have a problem.  These 16 inserts are in a single transaction, separate from the bulk copy.  (There are a dozen more  of these files processed and disappeared.)

My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs?  Is this frailty inherent in AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.

The big differences I notice are:

1. "postgres" is not a superuser,
2. viewing logs is a hassle.

Otherwise, they're really similar.  We've pumped about 6TB of data into an instance, and it's been rock solid.  JBoss is quite happy, and there haven't been any problems.
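If it helps anyone checking the first point, a quick JDBC query against pg_roles shows whether the connected role is a real superuser; nothing here is RDS-specific, and the connection URL is a placeholder.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Report whether the current role is a superuser; on RDS the master user typically
// shows rolsuper = false and works through the rds_superuser role instead.
public class SuperuserCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(System.getenv("JDBC_URL")); // placeholder URL
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select rolname, rolsuper from pg_roles where rolname = current_user")) {
            while (rs.next()) {
                System.out.printf("%s superuser=%s%n", rs.getString(1), rs.getBoolean(2));
            }
        }
    }
}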

--
Angular momentum makes the world go 'round.

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:

My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs?  Is this frailty inherent in AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.

The big differences I notice are:

1. "postgres" is not a superuser,
2. viewing logs is a hassle.

Otherwise, they're really similar.  We've pumped about 6TB of data into an instance, and it's been rock solid.  JBoss is quite happy, and there haven't been any problems.

--
Angular momentum makes the world go 'round.
Good to hear. 
I pushed this same data set into our previous server (version 10), so I was surprised it blew up this time, though in that case these separate files were more spread out in arrival time.  Nothing in the (available) logs other than lots of auto-vacuuming and the suggestion to increase wal size (which I thought a little surprising).  I really would like to know just what went south, but that isn't obvious yet - OOM? network? disk full? - and I can't find any smoking gun on the AWS/RDS pages.

rjs

Re: How different is AWS-RDS postgres?

From:
Ron
Date:
On 5/26/21 1:24 PM, Rob Sargent wrote:

My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs?  Is this frailty inherent in AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.

The big differences I notice are:

1. "postgres" is not a superuser,
2. viewing logs is a hassle.

Otherwise, they're really similar.  We've pumped about 6TB of data into an instance, and it's been rock solid.  JBoss is quite happy, and there haven't been any problems.

--
Angular momentum makes the world go 'round.
Good to hear. 
I pushed this same data set into our previous server (Version 10) so I was surprised it blew up this time, though in that case these separate files were more spread out in arrival time.  Nothing in the (available) logs other that lots of auto-vacuuming  and the suggestion to increase wal size (which I thought a little surprising).  I really would like to know just what went south but that isn't obvious yet - OOM?, network?, disk full? but I can't find any smoking gun on the AWS/RDS pages.

The difficulty in viewing logs needs to be emphasized.

--
Angular momentum makes the world go 'round.

Re: How different is AWS-RDS postgres?

From:
Ian Harding
Date:
There is an option to send the logs to cloudwatch which makes it less awful to look at them. 

On Wed, May 26, 2021, 11:52 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 5/26/21 1:24 PM, Rob Sargent wrote:

My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs?  Is this frailty inherent in AWS/RDS infrastructure?

Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.

The big differences I notice are:

1. "postgres" is not a superuser,
2. viewing logs is a hassle.

Otherwise, they're really similar.  We've pumped about 6TB of data into an instance, and it's been rock solid.  JBoss is quite happy, and there haven't been any problems.

--
Angular momentum makes the world go 'round.
Good to hear. 
I pushed this same data set into our previous server (Version 10) so I was surprised it blew up this time, though in that case these separate files were more spread out in arrival time.  Nothing in the (available) logs other that lots of auto-vacuuming  and the suggestion to increase wal size (which I thought a little surprising).  I really would like to know just what went south but that isn't obvious yet - OOM?, network?, disk full? but I can't find any smoking gun on the AWS/RDS pages.

The difficulty in viewing logs needs to be emphasized.

--
Angular momentum makes the world go 'round.

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:


On May 26, 2021, at 4:37 PM, Ian Harding <harding.ian@gmail.com> wrote:


There is an option to send the logs to cloudwatch which makes it less awful to look at them. 
I have that but precious little of interest there. Lots of autovac, a smattering of hints to increase wal size!?  I have yet to spot anything which corresponds to the “I/O failure” which the middleware gets.

I don’t have query logging on, but I do see reports from my psql session fat-fingering.

As to the logs UI, the search is pretty feeble; I don’t understand why there are four channels of logs; the graphs are wearing the same rose-coloured glasses as the logs.
And 24 hours without a peep from AWS support. (I don’t call mailing me what I sent them “contact”.)

My guess right now is that the entire tomcat connection pool is in a single transaction? That’s the only way the tables could disappear.  I am making separate calls to JDBC getConnection() for each doPost.
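As an aside, a bare-bones sketch of the "one getConnection() per doPost" pattern being described might look like the following; the servlet and field names are placeholders, not the actual application code. The point is that the whole request runs on one pooled connection, so if the pool reclaims that connection mid-request, everything done on it (including the staging table) rolls back.

import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

// Placeholder servlet: one pooled connection per doPost, transaction scoped to the request.
public class PayloadServletSketch extends HttpServlet {
    private final DataSource pool;

    public PayloadServletSketch(DataSource pool) { this.pool = pool; }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        try (Connection conn = pool.getConnection()) {
            conn.setAutoCommit(false);
            // ... create the staging table, COPY into it, run the chunked inserts ...
            conn.commit();              // if the pool abandons the connection first, this never runs
        } catch (SQLException e) {
            resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
        }
    }
}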

Re: How different is AWS-RDS postgres?

From:
Philip Semanchuk
Date:

> On May 26, 2021, at 10:04 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>
>
>
>> On May 26, 2021, at 4:37 PM, Ian Harding <harding.ian@gmail.com> wrote:
>>
>> 
>> There is an option to send the logs to cloudwatch which makes it less awful to look at them.
> I have that but precious little of interest there. Lots of autovac, a smattering of hints to increase wal size!?  I have yet to spot anything which corresponds to the “I/O failure” which the middle ware gets.
>
> I don’t have query logging on, but I do see reports from my psql session fat-fingering.
>
> As to the logs UI, the search is pretty feeble; I don’t understand why there are four channels of logs; the graphs are wearing the same rose-coloured as the logs.
> And 24 hours without a peep from AWS support. (I don’t call mailing me what I sent them “contact”.)
>
> My guess right now is that the entire tomcat connection pool is in a single transaction? That’s the only way the tables could disappear.  I am making separate calls to JDBC getConnection () for each doPost.

We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful.

Cheers
Philip



Attachments

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
> We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful.
>

> Cheers
> Philip
Thank you.
>
>
> <get_aws_logs.py>



Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
On 5/27/21 7:45 AM, Philip Semanchuk wrote:

On May 26, 2021, at 10:04 PM, Rob Sargent <robjsargent@gmail.com> wrote:



On May 26, 2021, at 4:37 PM, Ian Harding <harding.ian@gmail.com> wrote:


There is an option to send the logs to cloudwatch which makes it less awful to look at them. 
I have that but precious little of interest there. Lots of autovac, a smattering of hints to increase wal size!?  I have yet to spot anything which corresponds to the “I/O failure” which the middle ware gets. 

I don’t have query logging on, but I do see reports from my psql session fat-fingering.

As to the logs UI, the search is pretty feeble; I don’t understand why there are four  channels of logs; the graphs are wearing the same rose-coloured as the logs.  
And 24 hours without a peep from AWS support. (I don’t call mailing me what I sent them “contact”.)

My guess right now is that the entire tomcat connection pool is in a single transaction? That’s the only way the tables could disappear.  I am making separate calls to JDBC getConnection () for each doPost. 
We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful. 

Cheers
Philip


I may have found another difference: JDBC connections are not logged?!
I just reproduced my report, and the CloudWatch view of the logs shows some psql interaction from before and after the test, but no mention of losing 7.5M records.

Re: How different is AWS-RDS postgres?

From:
Sam Gendler
Date:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

On Thu, May 27, 2021 at 2:01 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 5/27/21 7:45 AM, Philip Semanchuk wrote:
On May 26, 2021, at 10:04 PM, Rob Sargent <robjsargent@gmail.com> wrote:



On May 26, 2021, at 4:37 PM, Ian Harding <harding.ian@gmail.com> wrote:


There is an option to send the logs to cloudwatch which makes it less awful to look at them. 
I have that but precious little of interest there. Lots of autovac, a smattering of hints to increase wal size!?  I have yet to spot anything which corresponds to the “I/O failure” which the middle ware gets. 

I don’t have query logging on, but I do see reports from my psql session fat-fingering.

As to the logs UI, the search is pretty feeble; I don’t understand why there are four  channels of logs; the graphs are wearing the same rose-coloured as the logs.  
And 24 hours without a peep from AWS support. (I don’t call mailing me what I sent them “contact”.)

My guess right now is that the entire tomcat connection pool is in a single transaction? That’s the only way the tables could disappear.  I am making separate calls to JDBC getConnection () for each doPost. 
We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful. 

Cheers
Philip


I may have found another difference: JDBC connections are not logged?!
I just reproduce my report, and the CloudWatch view of the logs shows some psql interaction from before and after the test, but no mention of losing 7.5M records.

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
On 5/27/21 3:08 PM, Sam Gendler wrote:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.

However, looking at AWS "Performance Insights" I see the sql statements generated by my app being executed on the server.  Not coincidentally, this is from the "Top SQL (10)/Load by waits" view.  Now that view does NOT, in point of fact, name the database in which the sql is running, but the rest of my environment pretty much rules out silently switching tracks on myself.  I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this and I've successfully analyzed and stored results for small starting data sets.

I just noticed the "Top database" tab on that screen:  there is only one and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it's a size thing, but which part of the system is being maxed out is not clear.  Not CPU, but that's the only resource "Performance Insights" mentions (suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It was filled from a file which has 7.5M records. It's clear in the tomcat logs that it was created and written to, and one record read.  Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?
the "bulk" table is the staging table. The params are filled in with a quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-00000000\
0000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
So this would have been the fourth such insert statement read from that staging table.

That table is not deleted by my code.  It is renamed after the last insert into segment, by appending "_done" to the name.  But that table, by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log starts with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux - bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6": Begin transfer from bulk to segment
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770338 sessioncount 0
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770476 sessioncount 1
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:51.847 WARNING [Tomcat JDBC Pool Cleaner[1731185718:1622133381802]] org.apache.tomcat.jdbc.pool.ConnectionPool.abandon Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgCo\
nnection@1622ead9]:java.lang.Exception
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1163)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:816)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:660)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:198)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:132)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:90)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:123)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:105)
        at edu.utah.camplab.servlet.PayloadSaveServlet.doPost(PayloadSaveServlet.java:50)
....
20:02:51.854 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback to savepoint
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit      : true
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] ERROR edu.utah.camplab.jx.AbstractPayload - run c9e224ca-85d2-40b4-ad46-327cfb9f0ac6: Exception from db write: SQL [insert into segment select * from bulk.\
"rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred \
while sending to the backend.: {}
org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-000000000000' an\
d s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
        at org.jooq_3.14.7.POSTGRES.debug(Unknown Source)
        at org.jooq.impl.Tools.translate(Tools.java:2880)
        at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:757)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:389)
        at edu.utah.camplab.jx.PayloadFromMux.insertAllSegments(PayloadFromMux.java:177)
        at edu.utah.camplab.jx.PayloadFromMux.lambda$completeSegmentSave$3(PayloadFromMux.java:165)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:612)
        at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:543)
        at org.jooq.impl.Tools$35$1.block(Tools.java:5203)
        at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
        at org.jooq.impl.Tools$35.get(Tools.java:5200)
        at org.jooq.impl.DefaultDSLContext.transactionResult0(DefaultDSLContext.java:595)
        at org.jooq.impl.DefaultDSLContext.transactionResult(DefaultDSLContext.java:512)
        at org.jooq.impl.DefaultDSLContext.transaction(DefaultDSLContext.java:612)
        at edu.utah.camplab.jx.PayloadFromMux.completeSegmentSave(PayloadFromMux.java:164)
        at edu.utah.camplab.jx.PayloadFromMux.writedb(PayloadFromMux.java:45)
        at edu.utah.camplab.jx.AbstractPayload.lambda$write$0(AbstractPayload.java:77)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)

...
       Suppressed: org.jooq.exception.DataAccessException: Cannot rollback transaction
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:142)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:223)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 45 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.rollback(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:139)
                ... 47 common frames omitted
        Suppressed: org.jooq.exception.DataAccessException: Cannot set autoCommit
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:222)
                at org.jooq.impl.DefaultTransactionProvider.brace(DefaultTransactionProvider.java:246)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:229)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 33 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.setAutoCommit(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:219)
                ... 36 common frames omitted
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
        at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
        at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
        at com.sun.proxy.$Proxy14.execute(Unknown Source)
        at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:214)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:375)
        ... 50 common frames omitted
Caused by: javax.net.ssl.SSLException: Socket closed
        at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
        at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1680)
        at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1054)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)



Re: How different is AWS-RDS postgres?

From:
Ron
Date:
On 5/27/21 4:58 PM, Rob Sargent wrote:
On 5/27/21 3:08 PM, Sam Gendler wrote:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.

However, looking at AWS "Performance Insights" is see the sql  statements generate by my app begin executed on the server.  Not coincidentally this is from the "Top SQL (10)/Load by waits" view.  Now that view does NOT, in point of fact, name the database in which the sql is running, but the rest of my environment pretty much rules out silently switching tracks on myself.  I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this and I've successfully analyzed and stored result for small starting data sets.

I just notice the "Top database" tab on that screen:  there is only one and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it a size thing, but which part of the system is being max'd out is not clear.  Not CPU, but that's the only resource the "Performance Insight" mentions (suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It filled from a file which has 7.5M records. It's clear in the tomcat logs that is was created and written to, one record read.  Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?
the "bulk" table is the staging table. The params are filled in with a quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-00000000\
0000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
So this would have been the fourth such insert statement read from that staging table.

That table is not deleted by my code.  It is renamed after the last insert into segment, by appending "_done" to the name.  But that table, by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log start with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux -

Is there any way to replicate this in psql?

--
Angular momentum makes the world go 'round.

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
On 5/27/21 4:10 PM, Ron wrote:
On 5/27/21 4:58 PM, Rob Sargent wrote:
On 5/27/21 3:08 PM, Sam Gendler wrote:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.

However, looking at AWS "Performance Insights" is see the sql  statements generate by my app begin executed on the server.  Not coincidentally this is from the "Top SQL (10)/Load by waits" view.  Now that view does NOT, in point of fact, name the database in which the sql is running, but the rest of my environment pretty much rules out silently switching tracks on myself.  I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this and I've successfully analyzed and stored result for small starting data sets.

I just notice the "Top database" tab on that screen:  there is only one and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it a size thing, but which part of the system is being max'd out is not clear.  Not CPU, but that's the only resource the "Performance Insight" mentions (suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It filled from a file which has 7.5M records. It's clear in the tomcat logs that is was created and written to, one record read.  Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?
the "bulk" table is the staging table. The params are filled in with a quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-00000000\
0000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
So this would have been the fourth such insert statement read from that staging table.

That table is not deleted by my code.  It is renamed after the last insert into segment, by appending "_done" to the name.  But that table, by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log start with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux -

Is there any way to replicate this in psql?

I have the json file which feeds the staging table and I have code which is designed to load such files.  I suppose an sql file with those 16 insert statements would approximate the app's calls.  Let me give that a shot.

--
Angular momentum makes the world go 'round.

Re: How different is AWS-RDS postgres?

From:
Sam Gendler
Date:
That sure looks like something is causing your connection to have a transaction rollback.  I haven't worked in Java in far too long, but it seems like your connection pool is under the impression your connection was abandoned, so it reclaims it and rolls back the transaction, which would explain why you aren't seeing the table when all is said and done - all of the work is being undone at the end.

One possibility, based on the catalina log you provided - if you have either end of the connection set up to automatically close idle connections after a period of time, then you might receive a closed connection from the pool, which will just error out when you attempt to run a query. In which case, you need to set up your connection pool to test a connection before it returns it to the requester.  Usually something as simple as "select 2" will be sufficient to determine if the database connection is open. I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested. 
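For anyone wanting the concrete knobs, a hedged sketch of that advice against the Tomcat JDBC pool (the org.apache.tomcat.jdbc.pool classes visible in the stack traces) looks roughly like this; the property names are real pool attributes, but the values are only illustrative.

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

// Validate each connection as it is borrowed from the pool, so a dead connection is
// replaced instead of being handed to the servlet.
public class ValidatedPoolSketch {
    public static DataSource build(String url, String user, String password) {
        PoolProperties p = new PoolProperties();
        p.setUrl(url);                         // e.g. jdbc:postgresql://<rds-endpoint>:5432/<db>
        p.setDriverClassName("org.postgresql.Driver");
        p.setUsername(user);
        p.setPassword(password);
        p.setTestOnBorrow(true);               // run the validation query on borrow
        p.setValidationQuery("SELECT 1");      // any cheap statement ("select 2" works just as well)
        p.setValidationInterval(30000);        // ms; skip re-validating very recently used connections
        DataSource ds = new DataSource();
        ds.setPoolProperties(p);
        return ds;
    }
}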

On Thu, May 27, 2021 at 2:58 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 5/27/21 3:08 PM, Sam Gendler wrote:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.

However, looking at AWS "Performance Insights" is see the sql  statements generate by my app begin executed on the server.  Not coincidentally this is from the "Top SQL (10)/Load by waits" view.  Now that view does NOT, in point of fact, name the database in which the sql is running, but the rest of my environment pretty much rules out silently switching tracks on myself.  I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this and I've successfully analyzed and stored result for small starting data sets.

I just notice the "Top database" tab on that screen:  there is only one and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it a size thing, but which part of the system is being max'd out is not clear.  Not CPU, but that's the only resource the "Performance Insight" mentions (suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It filled from a file which has 7.5M records. It's clear in the tomcat logs that is was created and written to, one record read.  Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?
the "bulk" table is the staging table. The params are filled in with a quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-00000000\
0000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
So this would have been the fourth such insert statement read from that staging table.

That table is not deleted by my code.  It is renamed after the last insert into segment, by appending "_done" to the name.  But that table, by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log start with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux - bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6": Begin transfer from bulk to segment
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770338 sessioncount 0
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770476 sessioncount 1
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:51.847 WARNING [Tomcat JDBC Pool Cleaner[1731185718:1622133381802]] org.apache.tomcat.jdbc.pool.ConnectionPool.abandon Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgCo\
nnection@1622ead9]:java.lang.Exception
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1163)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:816)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:660)
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:198)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:132)
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:90)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:123)
        at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:105)
        at edu.utah.camplab.servlet.PayloadSaveServlet.doPost(PayloadSaveServlet.java:50)
....
20:02:51.854 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback to savepoint
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit      : true
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] ERROR edu.utah.camplab.jx.AbstractPayload - run c9e224ca-85d2-40b4-ad46-327cfb9f0ac6: Exception from db write: SQL [insert into segment select * from bulk.\
"rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred \
while sending to the backend.: {}
org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-000000000000' an\
d s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
        at org.jooq_3.14.7.POSTGRES.debug(Unknown Source)
        at org.jooq.impl.Tools.translate(Tools.java:2880)
        at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:757)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:389)
        at edu.utah.camplab.jx.PayloadFromMux.insertAllSegments(PayloadFromMux.java:177)
        at edu.utah.camplab.jx.PayloadFromMux.lambda$completeSegmentSave$3(PayloadFromMux.java:165)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:612)
        at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:543)
        at org.jooq.impl.Tools$35$1.block(Tools.java:5203)
        at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
        at org.jooq.impl.Tools$35.get(Tools.java:5200)
        at org.jooq.impl.DefaultDSLContext.transactionResult0(DefaultDSLContext.java:595)
        at org.jooq.impl.DefaultDSLContext.transactionResult(DefaultDSLContext.java:512)
        at org.jooq.impl.DefaultDSLContext.transaction(DefaultDSLContext.java:612)
        at edu.utah.camplab.jx.PayloadFromMux.completeSegmentSave(PayloadFromMux.java:164)
        at edu.utah.camplab.jx.PayloadFromMux.writedb(PayloadFromMux.java:45)
        at edu.utah.camplab.jx.AbstractPayload.lambda$write$0(AbstractPayload.java:77)
        at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)

...
       Suppressed: org.jooq.exception.DataAccessException: Cannot rollback transaction
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:142)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:223)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 45 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.rollback(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:139)
                ... 47 common frames omitted
        Suppressed: org.jooq.exception.DataAccessException: Cannot set autoCommit
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:222)
                at org.jooq.impl.DefaultTransactionProvider.brace(DefaultTransactionProvider.java:246)
                at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:229)
                at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
                ... 33 common frames omitted
        Caused by: java.sql.SQLException: Connection has already been closed.
                at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
                at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
                at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
                at com.sun.proxy.$Proxy4.setAutoCommit(Unknown Source)
                at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:219)
                ... 36 common frames omitted
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
        at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
        at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
        at com.sun.proxy.$Proxy14.execute(Unknown Source)
        at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:214)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:375)
        ... 50 common frames omitted
Caused by: javax.net.ssl.SSLException: Socket closed
        at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
        at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1680)
        at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1054)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)



Re: How different is AWS-RDS postgres?

From:
Sam Gendler
Date:
The docs for the connection pool you are using: https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html and a screenshot for the relevant configuration params.

On Thu, May 27, 2021 at 3:24 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 5/27/21 4:10 PM, Ron wrote:
On 5/27/21 4:58 PM, Rob Sargent wrote:
On 5/27/21 3:08 PM, Sam Gendler wrote:
The same JDBC connection that is resulting in lost data?  Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.

However, looking at AWS "Performance Insights" is see the sql  statements generate by my app begin executed on the server.  Not coincidentally this is from the "Top SQL (10)/Load by waits" view.  Now that view does NOT, in point of fact, name the database in which the sql is running, but the rest of my environment pretty much rules out silently switching tracks on myself.  I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this and I've successfully analyzed and stored result for small starting data sets.

I just notice the "Top database" tab on that screen:  there is only one and it's the correct one.

I've reproduced the behaviour.  I'm pretty convinced it a size thing, but which part of the system is being max'd out is not clear.  Not CPU, but that's the only resource the "Performance Insight" mentions (suggesting this UI wasn't designed by a database person).

The loss of the staging table is most spectacular.  It filled from a file which has 7.5M records. It's clear in the tomcat logs that is was created and written to, one record read.  Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?
the "bulk" table is the staging table. The params are filled in with a quasi-uuid which grab roughly 1/16th of the data.  In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where  s.probandset_id >= '30000000-0000-0000-0000-00000000\
0000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
So this would have been the fourth such insert statement read from that staging table.

That table is not deleted by my code.  It is renamed after the last insert into segment, by appending "_done" to the name.  But that table, by either name, is nowhere to be seen on the server.

For those scoring at home, the trouble in the tomcat log start with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux -

Is there any way to replicate this in psql?

I have the json file which feeds the staging table and I have code which is designed to load such files.  I suppose an sql file with those 16 insert statements would approximate the the apps call.  Let me give that a shot.

--
Angular momentum makes the world go 'round.

Attachments

Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
On 5/27/21 4:25 PM, Sam Gendler wrote:
That sure looks like something is causing your connection to have a transaction rollback.  I haven't worked in Java in far too long, but it seems like your connection pool is under the impression your connection was abandoned so it reclaims it and rollback the transaction, which would explain why you aren't seeing the table when all is said and done - all of the work is being undone at the end.

One possibility, based on the catalina log you provided - if you have either end of the connection set up to automatically close idle connections after a period of time, then you might receive a closed connection from the pool, which will just error out when you attempt to run a query. In which case, you need to set up your connection pool to test a connection before it returns it to the requester.  Usually something as simple as "select 2" will be sufficient to determine if the database connection is open. I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested. 


Well I /was/ doing
      contextResource.setProperty("validationQuery", "SELECT 1");
but I see that I lost that when I switched to using a properties file.  Thanks for pointing me there.

The loop of 16 insert statements is in a single transaction, on a single connection, so I'm not sure who's choking first.  Is the connection idle after the I/O error, or is the I/O error from a dead connection?  (Small disclaimer:  there is no catalina involved here, just an embedded tomcat instance.)




Re: How different is AWS-RDS postgres?

From:
Sam Gendler
Date:
Unless something has changed in recent years, the core servlet engine of tomcat IS catalina.  Embedded tomcat is embedded catalina. It looks like the I/O error is a result of attempting to send a query on an already dead connection.  I'd look for something that is limiting how long a connection can be open - either an explicit or default value for a timeout in the connection pool or on the server side.  If you don't get the same behaviour when running against a database locally, I'd definitely look at the default settings in RDS.  It may be automatically closing connections if they are idle for even a brief period.
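One cheap way to check the server-side half of that advice is to ask PostgreSQL for its timeout settings from the same JDBC connection the app uses; on RDS these come from the DB parameter group. A minimal sketch, with the URL assumed:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Print the server-side settings most likely to kill or time out a long-running connection.
public class ServerTimeoutCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(System.getenv("JDBC_URL")); // placeholder URL
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select name, setting, unit from pg_settings where name in ("
                     + "'statement_timeout', 'idle_in_transaction_session_timeout', "
                     + "'tcp_keepalives_idle')")) {
            while (rs.next()) {
                System.out.printf("%s = %s %s%n", rs.getString(1), rs.getString(2), rs.getString(3));
            }
        }
    }
}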

On Thu, May 27, 2021 at 3:35 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 5/27/21 4:25 PM, Sam Gendler wrote:
That sure looks like something is causing your connection to have a transaction rollback.  I haven't worked in Java in far too long, but it seems like your connection pool is under the impression your connection was abandoned so it reclaims it and rollback the transaction, which would explain why you aren't seeing the table when all is said and done - all of the work is being undone at the end.

One possibility, based on the catalina log you provided - if you have either end of the connection set up to automatically close idle connections after a period of time, then you might receive a closed connection from the pool, which will just error out when you attempt to run a query. In which case, you need to set up your connection pool to test a connection before it returns it to the requester.  Usually something as simple as "select 2" will be sufficient to determine if the database connection is open. I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested. 


Well I /was/ doing
      contextResource.setProperty("validationQuery",                    "SELECT 1");
but I see that I lost that when I switched to using a properties file.  Thanks for point me there.

The loop of 16 insert statement is in a single transaction, single connection so I'm not sure who's choking first.  Is the connection idle after the I/O error or is the I/O error from a dead connection?  (Small disclaimer:  there is no catalina involved here, just an embedded tomcat instance.)




Re: How different is AWS-RDS postgres?

From:
Gmail
Date:
Not important who does what with whom re Catalina/Tomcat ;)

I will indeed re-examine timeouts and such, as inserting hundreds of thousands of records is not instantaneous by any stretch.  Tomcat is the new kid on my block, as prior to this release I managed a naked port with a Selector, and that had no trouble with this same test data.  My biggest concern is the lack of server-side indications.  I also need to confirm the table create/bulk-copy and the update to the target table (segment) are in separate transactions.  Doesn't look like it, but they're supposed to be.  I want the data written to the db in bulk, then come back round and load it into the final table in chunks.
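A sketch of that intended separation, using the jOOQ transaction API already visible in the stack traces; the table names and the LIKE-based create are placeholders for whatever the real payload code does.

import org.jooq.DSLContext;
import org.jooq.impl.DSL;

// Keep the staging load and the chunked transfer in two independent transactions,
// so the staged rows survive even if the later inserts into segment fail.
public class TwoPhaseSaveSketch {
    private final DSLContext dsl;

    public TwoPhaseSaveSketch(DSLContext dsl) { this.dsl = dsl; }

    public void save(String stagingTable) {
        // Transaction 1: create and bulk-load the staging table, committed when the lambda returns.
        dsl.transaction(cfg -> {
            DSL.using(cfg).execute("create table bulk.\"" + stagingTable + "\" (like segment)");
            // ... COPY the payload into bulk."<stagingTable>" here ...
        });

        // Transaction 2: the 16 chunked inserts into segment, then the "_done" rename.
        dsl.transaction(cfg -> {
            // ... insert into segment select * from bulk."<stagingTable>" ... (16 chunks) ...
            DSL.using(cfg).execute("alter table bulk.\"" + stagingTable
                    + "\" rename to \"" + stagingTable + "_done\"");
        });
    }
}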




On May 27, 2021, at 5:20 PM, Sam Gendler <sgendler@ideasculptor.com> wrote:

Unless something has changed in recent years, the core servlet engine of tomcat IS catalina.  Embedded tomcat is embedded catalina. It looks like the I/O error is a result of attempting to send a query on an already dead connection.  I'd look for something that is limiting how long a connection can be open - either an explicit or default value for a timeout in the connection pool or on the server side.  If you don't get the same behaviour when running against a database locally, I'd definitely look at the default settings in RDS.  It may be automatically closing connections if they are idle for even a brief period.

On Thu, May 27, 2021 at 3:35 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 5/27/21 4:25 PM, Sam Gendler wrote:
That sure looks like something is causing your connection to have a transaction rollback.  I haven't worked in Java in far too long, but it seems like your connection pool is under the impression your connection was abandoned so it reclaims it and rollback the transaction, which would explain why you aren't seeing the table when all is said and done - all of the work is being undone at the end.

One possibility, based on the catalina log you provided - if you have either end of the connection set up to automatically close idle connections after a period of time, then you might receive a closed connection from the pool, which will just error out when you attempt to run a query. In which case, you need to set up your connection pool to test a connection before it returns it to the requester.  Usually something as simple as "select 2" will be sufficient to determine if the database connection is open. I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested. 


Well I /was/ doing
      contextResource.setProperty("validationQuery",                    "SELECT 1");
but I see that I lost that when I switched to using a properties file.  Thanks for point me there.

The loop of 16 insert statement is in a single transaction, single connection so I'm not sure who's choking first.  Is the connection idle after the I/O error or is the I/O error from a dead connection?  (Small disclaimer:  there is no catalina involved here, just an embedded tomcat instance.)




Re: How different is AWS-RDS postgres?

From:
Rob Sargent
Date:
Found it.  Though there are indeed separate transactions for the bulk copy vs. target update, I had missed that there is an over-arching transaction in play, and of course when things go south, /all/ things go south.

I get away with this, for now, by bumping the tomcat idle time to an hour, which is serious overkill.
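The post doesn't name the exact setting, but on the Tomcat JDBC pool the obvious candidate, given the "Connection has been abandoned" warnings earlier in the thread, is the abandoned-connection timeout; a hedged sketch of bumping it to an hour:

import org.apache.tomcat.jdbc.pool.PoolProperties;

// One plausible reading of "bumping the tomcat idle time to an hour": let borrowed
// connections run for up to 3600 s before the pool declares them abandoned and reclaims them.
public class AbandonedTimeoutSketch {
    public static PoolProperties relax(PoolProperties p) {
        p.setRemoveAbandoned(true);
        p.setRemoveAbandonedTimeout(3600);  // seconds; "an hour, which is serious overkill"
        p.setLogAbandoned(true);            // keep the stack trace identifying the borrower
        return p;
    }
}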